With the arrival of newer terms such as Delta Lake and Lakehouse, I thought I would put together a brief summary for anyone wondering what these model/storage approaches are.
Please note that some of these approaches are less contemporary than others; for example, the Corporate Information Factory has effectively been replaced by the Data Vault.
| Approach | Description |
|---|---|
| Corporate Information Factory | Combines sources and transforms them into a repository in the integration layer. It is highly normalised. |
| Operational Data Store | Integrates real-time updates with master and transactional data for use by operational reports. It is normalised. |
| Data Lake | Stores structured and unstructured data at scale, without transformation, so the data remains raw. Despite the lack of upfront modelling, users can still access a Data Lake to deliver analytics. |
| Data Lakehouse | Combines the capabilities of data lakes and data warehouses, so that both BI and ML workloads can run against the same data. |
| Delta Lake | An open-source storage layer that adds reliability to data lakes. It provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. |
| Data Mart | Designed for reporting current and historical data at a line-of-business level. It is often a subset of the data warehouse and is de-normalised. |
| Data Vault | A set of normalised tables (hubs, links and satellites) that support one or more functional areas of the business. It is commonly split into two layers, the Raw Vault and the Business Vault. It is highly normalised. |
| Data Warehouse | Designed for reporting current and historical data at an enterprise level. It is de-normalised. |
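To make the normalised vs de-normalised distinction in the table a little more concrete, here is a minimal Python sketch. It assumes a Data Vault 2.0-style hash key (MD5 over the trimmed, upper-cased business key), which is a common convention but an assumption here. It splits a flat source record into a hub row and a satellite row, then re-joins them into one wide, mart-style row. The column names (`customer_hk`, `record_source = "crm"`) are hypothetical, and the "tables" are plain Python lists and dicts rather than a real database.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash key in an assumed Data Vault 2.0 style: MD5 of the trimmed,
    upper-cased business keys joined with a delimiter."""
    joined = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def to_vault(record: dict, load_ts: datetime) -> tuple[dict, dict]:
    """Normalise: split one flat source record into a hub row (business key)
    and a satellite row (descriptive attributes plus load timestamp)."""
    hk = hash_key(record["customer_id"])
    hub = {"customer_hk": hk, "customer_id": record["customer_id"],
           "load_ts": load_ts, "record_source": "crm"}  # hypothetical source name
    sat = {"customer_hk": hk, "load_ts": load_ts,
           "name": record["name"], "city": record["city"]}
    return hub, sat


def to_mart(hubs: list[dict], sats: list[dict]) -> list[dict]:
    """De-normalise: join each hub row to its latest satellite row."""
    latest: dict[str, dict] = {}
    for sat in sats:
        current = latest.get(sat["customer_hk"])
        if current is None or sat["load_ts"] > current["load_ts"]:
            latest[sat["customer_hk"]] = sat
    return [
        {"customer_id": h["customer_id"],
         "name": latest[h["customer_hk"]]["name"],
         "city": latest[h["customer_hk"]]["city"]}
        for h in hubs
        if h["customer_hk"] in latest
    ]


# Two loads of the same customer: the hub keeps one row, the satellite keeps history.
hubs: dict[str, dict] = {}
sats: list[dict] = []
loads = [
    (datetime(2024, 1, 1, tzinfo=timezone.utc), {"customer_id": "C001", "name": "Ada", "city": "Leeds"}),
    (datetime(2024, 2, 1, tzinfo=timezone.utc), {"customer_id": "C001", "name": "Ada", "city": "York"}),
]
for ts, rec in loads:
    hub, sat = to_vault(rec, ts)
    hubs.setdefault(hub["customer_hk"], hub)  # insert each business key only once
    sats.append(sat)                          # satellites are insert-only history

mart = to_mart(list(hubs.values()), sats)
print(mart)  # [{'customer_id': 'C001', 'name': 'Ada', 'city': 'York'}]
```

The vault side keeps every change as a new satellite row, whereas the mart row only shows the current picture, which is one reason the two usually sit in different layers.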
As you can see, there are many options (and those above are only a subset). In many cases they also build on each other (e.g. a Data Mart is a subset of a Data Warehouse), so it is worth looking at how they are often combined.
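As one illustration of how these approaches are commonly combined (an assumption about a typical layered design, not the only option), the sketch below lands raw payloads in a Data Lake unchanged, integrates them into an enterprise-wide warehouse view, and then carves out a Data Mart as a subset for one line of business. The field names (`order_id`, `bu`, `amount`) are hypothetical, the layers are in-memory Python structures, and in practice a Data Vault integration layer often sits between the lake and the presentation layer; it is omitted here for brevity.

```python
import json


def land(lake: list[str], payload: str) -> None:
    """Data Lake layer: keep the source payload exactly as received."""
    lake.append(payload)


def integrate(lake: list[str]) -> dict[str, dict]:
    """Data Warehouse layer: parse, conform and integrate the raw payloads.
    Later payloads for the same order_id overwrite earlier ones (latest wins)."""
    warehouse: dict[str, dict] = {}
    for raw in lake:
        rec = json.loads(raw)
        warehouse[rec["order_id"]] = {
            "order_id": rec["order_id"],
            "business_unit": rec["bu"].strip().lower(),  # conform inconsistent codes
            "amount": round(float(rec["amount"]), 2),    # conform mixed types
        }
    return warehouse


def build_mart(warehouse: dict[str, dict], business_unit: str) -> list[dict]:
    """Data Mart layer: a subset of the warehouse for one line of business."""
    return [row for row in warehouse.values() if row["business_unit"] == business_unit]


lake: list[str] = []
for payload in [
    '{"order_id": "1", "bu": "Retail ", "amount": "10.5"}',
    '{"order_id": "2", "bu": "wholesale", "amount": 99}',
    '{"order_id": "1", "bu": "retail", "amount": "12"}',  # a later correction to order 1
]:
    land(lake, payload)

warehouse = integrate(lake)
retail_mart = build_mart(warehouse, "retail")
print(retail_mart)  # [{'order_id': '1', 'business_unit': 'retail', 'amount': 12.0}]
```

The lake still holds all three payloads untouched, so the warehouse and its marts can be rebuilt if the integration rules change.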
If you are asked “which one is best”, please don’t get stuck in a fundamentalist position: they all have their merits, depending on the architecture (cloud or on-premises), data, budget and requirements you are presented with.
Just as importantly, consider that the model/storage approach needs to be complemented by the tools and the team you have. For example, I promise you that hand-coding a Data Vault (i.e. rather than using metadata-driven ETL) with a team that has never built one will not be an easy experience!
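To show what "metadata-driven" means in this context, here is a minimal Python sketch that generates hub-loading SQL from a small metadata list instead of hand-writing each statement. Everything in it is hypothetical: the `raw_vault` schema, the `hub_<name>` naming convention, the staging tables, and the assumption that staging already carries a computed `<name>_hk` hash key and a `load_ts` column. The SQL is generic and may need adjusting for a particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """One metadata row describing a hub (all names are hypothetical)."""
    name: str           # entity name, used for hub_<name> and <name>_hk
    business_key: str   # business key column in the staging table
    source_table: str   # staging table that feeds the hub
    record_source: str  # literal written to the record_source column


# Insert-only hub load: add business keys not yet in the hub, keeping the
# earliest load timestamp seen in staging for each key. Metadata is assumed
# to be trusted, so no SQL escaping is done here.
HUB_TEMPLATE = """INSERT INTO raw_vault.hub_{name} ({name}_hk, {bk}, load_ts, record_source)
SELECT s.{name}_hk, s.{bk}, MIN(s.load_ts), '{rs}'
FROM {src} AS s
WHERE NOT EXISTS (
    SELECT 1 FROM raw_vault.hub_{name} AS h WHERE h.{name}_hk = s.{name}_hk
)
GROUP BY s.{name}_hk, s.{bk};"""


def generate_hub_loads(specs: list[HubSpec]) -> list[str]:
    """Render one load statement per hub from the metadata."""
    return [
        HUB_TEMPLATE.format(name=s.name, bk=s.business_key,
                            src=s.source_table, rs=s.record_source)
        for s in specs
    ]


metadata = [
    HubSpec("customer", "customer_id", "staging.crm_customer", "crm"),
    HubSpec("product", "product_code", "staging.erp_product", "erp"),
]
for statement in generate_hub_loads(metadata):
    print(statement, end="\n\n")
```

Adding a new hub then becomes one more `HubSpec` row rather than another hand-written statement; link and satellite loads can be templated the same way, which is the main appeal of this style of tooling for a team new to Data Vault.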