Data drives business. Vast amounts of data are created every day, distributed across numerous systems, and the information they contain can be business-critical for any company. Describing and merging this data are among the most challenging tasks for analytical applications. While the term "ETL" (Extract, Transform, Load; or ELT, when the data is loaded before it is transformed) traditionally described the classic batch-driven process, today the term "Data Integration" covers all methods of integration: batch or real-time, inside or outside a database, and between any systems.
In addition to physical data integration, purely logical integration ("Data Virtualization") is increasingly used because of its greater flexibility and agility, especially in modern "Data Factory" architectures.
What is Data Integration?
Data Integration describes all measures, tools, and processes needed to transfer data from source systems into a target system (often a data warehouse or data lake). This usually includes options for connecting to the source systems ("connectivity"), different delivery speeds (batch vs. real-time), and logic for transforming the data or bringing it into a uniform schema.
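A minimal sketch of this pattern, assuming Python with the standard-library sqlite3 module: the database file names (crm_source.db, warehouse.db), the customers source table, and the dim_customer target table are hypothetical placeholders, not part of any particular product. The extract step reads from the source system, transform brings the rows into a uniform target schema, and load writes them into the target.

```python
import sqlite3

# Hypothetical example: both "source" and "target" are local SQLite files here.
# In practice these would be connections to an operational system and to the
# data warehouse or data lake.
SOURCE_DB = "crm_source.db"   # assumed source system
TARGET_DB = "warehouse.db"    # assumed target system


def extract(conn: sqlite3.Connection) -> list[tuple]:
    """Extract: read raw customer rows from the (assumed) source table."""
    return conn.execute(
        "SELECT id, name, country_code, created_at FROM customers"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: bring the rows into the uniform target schema
    (trim names, normalize country codes to upper case)."""
    return [
        (cid, name.strip(), country.upper(), created_at)
        for cid, name, country, created_at in rows
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the transformed rows into the target table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dim_customer (
               customer_id   INTEGER PRIMARY KEY,
               customer_name TEXT,
               country_code  TEXT,
               created_at    TEXT)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    with sqlite3.connect(SOURCE_DB) as src, sqlite3.connect(TARGET_DB) as tgt:
        load(tgt, transform(extract(src)))
```

In an ELT variant, the same rows would be loaded into the target first and the transformation would run inside the target database.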
In classic Data Integration, data is physically copied from the source to the target, which offers shared access with predictable performance but incurs storage and development costs and reduces agility. With Data Virtualization, the data remains in its original location and a logical data model replaces the physical one; the agility gained comes at the cost of performance challenges and more limited transformation logic.
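The contrast can be illustrated with a small sketch, again in Python with sqlite3. A single in-memory database stands in for both the source and the analytical layer here (real virtualization tools federate queries across separate systems); the table and view names are made up for the example. The physical approach copies the data into a new table, while the virtual approach stores only a view definition and reads the source data at query time.

```python
import sqlite3

# Illustrative only: one SQLite database plays both roles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 120.0, "open"), (2, 75.5, "shipped"), (3, 310.0, "shipped")],
)

# Physical integration: the relevant data is copied into its own table.
# Queries against it do not touch the source, but the copy consumes storage
# and must be refreshed when the source changes.
conn.execute(
    "CREATE TABLE shipped_orders AS "
    "SELECT id, amount FROM orders WHERE status = 'shipped'"
)

# Data virtualization (simplified): only the logical definition is stored.
# Every query reads the source at access time, so results are always current,
# but the source system carries the query load.
conn.execute(
    "CREATE VIEW v_shipped_orders AS "
    "SELECT id, amount FROM orders WHERE status = 'shipped'"
)

print(conn.execute("SELECT SUM(amount) FROM shipped_orders").fetchone())
print(conn.execute("SELECT SUM(amount) FROM v_shipped_orders").fetchone())
```

Both queries return the same result; the difference lies in where the data lives, how current it is, and which system does the work.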