Third Normal Form (3NF) and the Data Vault
OR How not to be afraid of the Data Vault
From time to time we receive an interesting question from people interested in using the Datavault Builder: Does the Datavault Builder DWH automation tool supports 3NF?
The interesting answer is: yes.
Though we have actively decided against a 3NF core because we believe that splitting the data into individual functions such as keys (hubs), relationships (links), and context/attributes (satellites) solves many problems with 3NF DWHs.
This includes the creation of models, which can now be created from different starting points and can easily grow together, as well as easier maintenance in case of model changes and additionally also the independence of individual loading routes. And these are only three of many advantages.
So, we usually ask what the business requirement is to use 3NF. Most of the answers can be summarized that the data consumers were used to a 3NF core and are not sure if they can also handle queries from a Data Vault model.
This is the moment to emphasize clearly: nobody should query the core directly. In my opinion, this was already not the case for Inmon models and applies even more explicitly to Data Vault cores.
Clearly, interfaces must be created for the queries. These bring the data into a format that is optimized for queries and reports.
Very often these interfaces are created as a dimensional model since many modern reporting tools are optimized for this. However, since we have a higher level of normalization in the Data Vault, we can also provide interfaces at a finer level like 3NF.
How to create a 3NF Layer
If you have captured the expected cardinalities in the data modeling of the data vault (one-to-many, many-to-many, etc.) you even have all the information to deterministically translate a data vault model into a 3NF interface:
- The hub business key or hash is assigned to the PK
- All many-to-one or one-to-one connected hub business keys/hashes become foreign keys
- All fields from the satellites become attributes
Whether you create this as an As-of-Now or As-of-Then view doesn’t matter. You can simply go through the list of hubs and create the 3NF view per hub without any user input.
If you now add a satellite in the Data Vault, you simply have to regenerate the view for that hub. Easy to create, easy to adjust.
Since we offer APIs for all functions in the Datavault Builder, it is easy to use them to create interface views fully automatically according to your own rules (e.g. do the foreign keys have an FK_ prefix, or are they named exactly the same as the target field or use the BK or hash as PK). In theory, the data consumers will never know that their data is stored in Data Vault format.
Data Vault and 3NF?
So our answer is: yes we support 3NF to present the data, but always use a Data Vault to store it. This could well be the message to convince also data consumers who want the assurance that they can continue to work with existing approaches.
And if that doesn’t work you could even hide it from their eyes how you organize your backroom. After all, when we go shopping, we don’t visit the mall’s warehouse but are happy about the well-stocked shelves.