Preparing target database

To start the Setup of the Datavault Builder, some Objects (Mainly Tables for Config and Logging and Metadata Views) need to be installed on the target Database.

When using a separate, non-containered Version of the Database, then this can be achieved by starting up the containerized version of the clientdabase once, connecting into the container and executing the install scripts.

Exasol, MS SQL, Oracle, Postgres

Run a helper container and connect to the shell using bash or windows powershell

docker run -it --rm --entrypoint=/bin/bash --mount type=bind,ro=true,src=$(pwd)/secrets/authenticator_password.txt,dst=/run/secrets/authenticator_password docker.datavault-builder.com/datavaultbuilder/clientdb_<target_db>:8.3.2.1
Hint

where <target_db> is mssql, oracle, postgres or exasol

the secret file for the authenticator user under secrets/authenticator_password.txt must exists

change into folder with source code
cd /dvb_sql/
execute shell script which deploys source code, depending on the database, the script execution may vary

Postgres

./create_db.sh <host> <port> <database-name> <admin-user> <admin-password>

MS SQL

Hint

If the named database does not exists, it will be created
Make sure to either specify an instance_name or a port, but not both!

./create_db.sh <host>[/<instance_name>|,<port>] <sys-user> <password> <database-name>

Sample

./create_db.sh internal.domain.com,1433 sysusername password123 dvb_warehouse

Exasol

./create_db.sh <host>:<port> <sys-user> <password>

Oracle

Hint

the sys user must have have the sysdba role for this script to work.

./create_db.sh '<sys-user>/"<password>"@"<host>:<port>/<service>.<domain>" as sysdba'

Google Big Query

Please follow the instructions on our Knowledge Base:

https://support.portal.datavault-builder.com/servicedesk/customer/portal/5/article/135102466

MS Fabric SQL DB

Please follow the instructions on our Knowledge Base:

https://support.portal.datavault-builder.com/servicedesk/customer/portal/5/article/124747792

MS Synapse (Dedicated SQL Pool)

Please contact us for an install package you can run in SSMS

Snowflake

Install the Snowflake python module:

python -m pip install --upgrade pip
pip install --upgrade snowflake-connector-python

Execute python script to install DVB structures:

python .\snowflake_by_schema_install.py -d <db_name> -w <warehouse_name> -a <snowflake_account> -u <username> -P [ --dvb-connection-password "<authenticator_password>" ]

Hint

More options for the install can be displayed python .\snowflake_by_schema_install.py --help.

Hint

If you don’t specify the --dvb-connection-password, then the created users AUTHENTICATOR and DVB_SECURITY_ADMIN are disabled by default. Make sure to manually set their password (same password for both), enable the users and store the password in the authenticator_password.txt secret.

Hint

The python script will be available in the install package zip file, which can be downloaded in our knowledge base: https://support.portal.datavault-builder.com/servicedesk/customer/portal/5/article/11993850#Install-packages Make sure to use the correct casing for database, warehouse and username.

Database Specific Limitations

Oracle:

Does not know transactions for ddl-statements. Therefore, it can happen that structures may remain in an inconsistent state when an error occurs.
Loading of CLOBs and BLOBs into the Data Vault is currently not supported.
Data Viewer / Data Preview does not (fully) support tables/columns with CLOBs and BLOBs.

Snowflake:

Does not know transactions for ddl-statements. Therefore, it can happen that structures may remain in an inconsistent state when an error occurs.
Missing dependencies metadata restricts data lineage calculation.

MS Synapse:

No Field length longer than 8000 bytes as large datatypes can not participate in column store index.
No transactions: Everything is running in auto-commit mode.
TimeTZ not loadable.
No FKs on synapse - no validation for referential integrity.

Snowflake Source Connector:

Limitations with processing timestamps due to a handling issue in the JDBC Driver

Google Big Query:

Generic Behavior:
- User Impersonation: Impersonation is not implemented yet, Datavault Builder runs in superuser mode.
- Performance of DDL’s: Creating objects (Staging Tables, Hubs, Hub Loads, Satellites, Links, Business Objects, Business Rules) is rather slow (5-20 sec) due to limitations by Big Query itself, there might be ways to tune this though.
- Autocommit is disabled and transactions should be correctly executed. There might be some ‘timetravel’- or ‘object not found’-errors when quickly (re-)creating and loading an object. These messages should disappear automatically after some time the object exists.
- The timezone is cut off and therefore always displaying UTC currently.
- BigQuery limits manipulations on its objects, there might appear ‘Quota reached’-messages (i.e. for too many ALTER-Statements on a table or too many INSERTS in a specific time period).
Data Processing:
- Timestamps: For computing logging and load times the timestamp of the core service is used.
- Loaded row size in staging is limited to 10 MB due to max request size (Quotas and limits | BigQuery | Google Cloud )
- Staging datatype mappings: There might be some (exotic) datatypes that are not mapped correctly yet, in this case they can be overwritten manually in the staging module.
- Runtime data is not cleaned up and may show entries for non-existing objects.
Lineage:
- Big Query does not offer metadata on object dependencies. Dependencies are therefore retrieved by ddl parsing and might not be 100% accurate.