Challenges of classic Data Warehouses
In the Data Warehouse environment, two well-known modeling approaches, those of Kimball and Inmon, have been used to store data for many years. However, they face a growing number of challenges.
It is therefore questionable whether these approaches are still appropriate for all of today's issues and requirements. This consideration gave rise to the Data Vault modeling approach.
What is Data Vault?
Data Vault is a modeling technique that is particularly suitable for agile Data Warehouses. It offers high flexibility for extensions and complete historization of the data, and it allows the data loading processes to run in parallel. This hybrid approach combines advantages of the third normal form with those of the star schema. Companies today need to transform their businesses in ever shorter cycles and map these transformations in the Data Warehouse. Data Vault supports exactly these requirements without significantly increasing the complexity of the Data Warehouse over time. Compared with the Kimball and Inmon approaches, this avoids the steadily rising IT costs associated with extensive implementation and testing cycles and a long list of potential dependencies.
Procedure for Data Vault
The data integration architecture of the Data Vault approach has robust standards and definition methods for bringing information together so that it can be used in a meaningful way. The model consists of three basic table types:
- Hub (blue): Contains a list of unique business keys, such as customer numbers.
- Link (orange): Establishes relationships between business keys. Links are often used to handle changes in data granularity and to reduce the impact of adding a new business key to a linked hub.
- Satellite (turquoise): Contains descriptive attributes that may change over time. Where hubs and links form the structure of the data model, satellites hold the temporal and descriptive attributes, along with the metadata that ties each satellite to its parent hub or link table.
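The three table types above can be sketched as a small schema. This is a minimal illustration using Python's built-in `sqlite3` module, not a reference implementation: the table and column names (`hub_customer`, `sat_customer_details`, `load_date`, `record_source`, the `*_hk` keys) are assumptions chosen for the example, and the surrogate key values are hard-coded placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: one row per unique business key (here: customer number).
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,      -- surrogate key (assumed convention)
    customer_no   TEXT NOT NULL UNIQUE,  -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    TEXT PRIMARY KEY,
    product_no    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: relationship between business keys, referencing the hubs.
CREATE TABLE link_customer_product (
    link_hk       TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk    TEXT NOT NULL REFERENCES hub_product(product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Satellite: descriptive attributes of a hub, versioned by load date.
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
""")

# One customer, one product, their relationship, and one descriptive row.
conn.execute("INSERT INTO hub_customer VALUES ('hk_c1', 'C-1001', '2024-01-01', 'CRM')")
conn.execute("INSERT INTO hub_product VALUES ('hk_p1', 'P-42', '2024-01-01', 'ERP')")
conn.execute(
    "INSERT INTO link_customer_product VALUES ('hk_l1', 'hk_c1', 'hk_p1', '2024-01-01', 'ERP')"
)
conn.execute(
    "INSERT INTO sat_customer_details VALUES ('hk_c1', '2024-01-01', 'Alice', 'Berlin', 'CRM')"
)

# Reassemble a business view by joining hub, satellite, and link.
row = conn.execute("""
    SELECT h.customer_no, s.name, p.product_no
    FROM hub_customer h
    JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
    JOIN link_customer_product l ON l.customer_hk = h.customer_hk
    JOIN hub_product p ON p.product_hk = l.product_hk
""").fetchone()
print(row)
```

Because descriptive attributes live only in the satellite, adding a new attribute group in this sketch would mean adding another satellite table rather than altering the hub or the link.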
Advantages of Data Vault
- Massive reduction in development time when implementing business requirements
- Earlier return on investment (ROI)
- Scalable Data Warehouse
- Traceability of all data back to the source system
- Near-real-time loading (in addition to classic batch runs)
- Big data processing (terabytes and beyond)
- Iterative, agile development cycles with incremental expansion of the DWH
- Few, automatable ETL patterns
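
To illustrate what a small set of repeatable load patterns could look like, the following sketch shows an insert-only hub load and a satellite load that appends a row only when an attribute has changed. It is an assumption-laden example rather than a prescribed method: the helper names (`hash_key`, `load_hub`, `load_satellite`), the MD5-based key, the table layout, and the source system name "CRM" are all hypothetical.

```python
import hashlib
import sqlite3


def hash_key(*parts: str) -> str:
    """Derive a deterministic surrogate key from one or more business keys."""
    normalized = "|".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(conn, business_keys, load_date, record_source):
    """Insert-only hub load: business keys that already exist are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
        [(hash_key(k), k, load_date, record_source) for k in business_keys],
    )


def load_satellite(conn, rows, load_date, record_source):
    """Append a satellite row only when the attribute differs from the latest one."""
    for customer_no, city in rows:
        hk = hash_key(customer_no)
        latest = conn.execute(
            "SELECT city FROM sat_customer WHERE customer_hk = ? "
            "ORDER BY load_date DESC LIMIT 1",
            (hk,),
        ).fetchone()
        if latest is None or latest[0] != city:
            conn.execute(
                "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
                (hk, load_date, city, record_source),
            )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY, customer_no TEXT,
    load_date TEXT, record_source TEXT
);
CREATE TABLE sat_customer (
    customer_hk TEXT, load_date TEXT, city TEXT, record_source TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
""")

# Two batch runs from a hypothetical source system "CRM".
batches = [
    ("2024-01-01", [("C-1", "Berlin"), ("C-2", "Hamburg")]),
    ("2024-01-02", [("C-1", "Munich"), ("C-2", "Hamburg")]),
]
for load_date, batch in batches:
    load_hub(conn, [no for no, _ in batch], load_date, "CRM")
    load_satellite(conn, batch, load_date, "CRM")

hub_count = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
sat_count = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
print(hub_count, sat_count)
```

After both runs the hub holds two customers and the satellite holds three rows: the original rows for C-1 and C-2 plus a new version for C-1 after the city changed. Existing rows are never updated, which is what preserves the history, and every row carries its `record_source` so it can be traced back to the source system.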