- How is the raw data historized after loading, so it can be reloaded in an error case?
In the data integration flow, data is only historized and persisted once it has been loaded into the data vault (persistent staging / raw vault / business vault). The staging area itself is always cleared when a new dataset is loaded. If the data needs to be historized exactly as it arrived in the staging area, there are two approaches:
Define a persistent staging load (based on the source's technical key) that loads the data into the data vault before integrating on the business key.
Write a manual process that is triggered after each staging run and archives the loaded data.
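The second approach can be sketched as a small post-staging step that copies every staged row into an archive table together with the load timestamp. This is a minimal illustration only; the table and column names (`stg_customer`, `arch_customer`, `load_ts`) are hypothetical and not Datavault Builder objects.

```python
import sqlite3
from datetime import datetime, timezone

def archive_staging(conn, staging_table, archive_table):
    """Copy all staged rows into an archive table, stamped with the load time.

    Illustrative sketch: in a real setup this would run against the
    warehouse database right after each staging run.
    """
    load_ts = datetime.now(timezone.utc).isoformat()
    # Read the staging table's column names so the copy stays schema-agnostic
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({staging_table})")]
    col_list = ", ".join(cols)
    conn.execute(
        f"INSERT INTO {archive_table} (load_ts, {col_list}) "
        f"SELECT ?, {col_list} FROM {staging_table}",
        (load_ts,),
    )
    conn.commit()
    return load_ts

# In-memory demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (customer_id TEXT, name TEXT)")
conn.execute("CREATE TABLE arch_customer (load_ts TEXT, customer_id TEXT, name TEXT)")
conn.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                 [("C1", "Alice"), ("C2", "Bob")])
archive_staging(conn, "stg_customer", "arch_customer")
print(conn.execute("SELECT COUNT(*) FROM arch_customer").fetchone()[0])  # 2
```

Because the archive carries the load timestamp, any historical staging state can later be reloaded by filtering on `load_ts`.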
- What is the difference between a general subset load and a delta load?
The difference between a general subset and a delta subset is the way the Datavault Builder handles implicit deletions in the source. This mainly affects the historization of the data in the data vault.
With an active General Subset clause, the loaded data set is treated as if it were still the complete data set from the source, i.e., as a full load. This means that if a key value is no longer provided by the source, it is marked as deleted in the tracking satellite. This is the same behavior as a normal full load, just applied to a reduced data set that is always delimited in the same way.
An example of this type is customer data: we are the country branch of an international company and always extract only the customer data of our own country.
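The deletion-detection rule for a full load or general subset can be reduced to a set difference: any previously tracked key that is missing from the new load is marked as deleted. A minimal sketch, with illustrative key sets:

```python
def mark_deletions(tracked_keys, loaded_keys):
    """Keys to flag as deleted in the tracking satellite.

    For a full load / general subset, the new load is assumed complete,
    so every tracked key not delivered this time counts as deleted.
    """
    return tracked_keys - loaded_keys

# Previously tracked keys within our subset (e.g. customers of our country)
tracked = {"C1", "C2", "C3"}
# Current load: C3 is no longer delivered by the source
loaded = {"C1", "C2"}
print(sorted(mark_deletions(tracked, loaded)))  # ['C3']
```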
With a delta subset, only part of the data is extracted from the source in each run, mainly for performance reasons. In this case, the loaded data is regarded as an effective delta. For historization, this means that a key value that does not appear in a load cannot be regarded as deleted, because the key may simply not be part of the loaded subset. Therefore, a "Last Seen" timestamp is carried in the tracking satellite. This information can also be used for permanent delta loading.
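By contrast, a delta load only refreshes the "Last Seen" timestamp of the delivered keys and leaves absent keys untouched. A minimal sketch of this behavior, with a plain dict standing in for the tracking satellite (names and dates are illustrative):

```python
def apply_delta(last_seen, delta_keys, load_ts):
    """Update last-seen timestamps for keys delivered in this delta load.

    Keys absent from the delta are NOT marked deleted -- they simply
    keep their previous last-seen timestamp.
    """
    for key in delta_keys:
        last_seen[key] = load_ts  # refresh existing key or register a new one
    return last_seen

# Tracking satellite state after an earlier load
tracking = {"C1": "2024-01-01", "C2": "2024-01-01"}
# Delta load delivers only C2 and a new key C3
apply_delta(tracking, {"C2", "C3"}, "2024-02-01")
print(tracking["C1"])  # 2024-01-01 -- untouched, not marked deleted
print(tracking["C2"])  # 2024-02-01 -- refreshed
```

A downstream process can then treat keys whose last-seen timestamp falls below some threshold as candidates for deletion, instead of deciding per load.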