Attention

This document is no longer being updated. For the most recent documentation, including the latest release notes for version 6, please refer to Documentation Version 7.

Operations

The operations module is all about orchestrating and monitoring the loading process of staging and data vault loads.

Status

The status lineage gives you an immediate overview of the current loading states. Unlike the data lineage, its flows are not colored by system, but by the state of each staging and data vault load.

../_images/status.png
  1. Search Bar

    Use the search bar to filter the status lineage. Filtering works the same way as in the data lineage.

  2. Zoom Toggle

    Set zooming: rescales the lineage so that all node labels are visible. Reset zooming: rescales the lineage to the current window size.

  3. Load

    Loads are colored by their current status.

    • Green: Successfully loaded.

    • Light blue: Initialized, but waiting.

    • Blue: Loading.

    • Red: Failed.

    When hovering over a load, a window appears with details about the load state, such as start time, duration and loaded rows.

    ../_images/load1.png
  4. Circular Dependencies

    Once a filter is applied, use this switch to hide business vault dependencies from the lineage.

  5. Loads Filter

    Only show loads and related objects for the selected load states.

Jobs

The jobs module is the heart of the Datavault Builder, automating and orchestrating its data flows. The module allows you to define multiple packages of staging and data vault loads, giving you maximum flexibility to define custom jobs. Even when you compose jobs manually, the Datavault Builder takes away the work of defining the correct and most efficient loading order: focus on the loads to include in a job, and the Datavault Builder will optimize and parallelize them as far as possible.

Definition

A job is a package of loads (data vault / staging) at the granularity of one source system. It is possible to have multiple jobs per system, but not multiple systems per job.

Usage

At system creation, a default job is created that automatically takes up all newly created loads. Once you start modelling, the job is automatically updated in the background with each modification of your data integration process, so that you can directly trigger a reload of your integration flow.

Besides all this automation, many parts can be customized, as described in the following chapters.

../_images/jobs.png
  1. Job List

    When clicking into the search bar, a dropdown will appear, listing all existing jobs in the Datavault Builder.

  2. Job Load Status / Load Selection
    On the canvas, the loads within a job as well as the dependencies are visualized.
    In the status view, the color indicates the current status of each load. Using the view toggler, you can switch to the selection view, which allows you to compose your custom job.
  3. Circular Dependencies

    Once a filter is applied, use this switch to hide business vault dependencies from the lineage.

  4. View Toggler

    Click on the toggler to switch between load status and load selection on the canvas.

  5. Loads Filter

    Only show loads and related objects for the selected load states. Can also be used to filter included / excluded loads for the job.

  6. Job Actions
    ../_images/jobActions.bmp
    • Circled Arrow: Initiate Job Load

    • Switch: Enable/Disable Job

    • Screw nut: Edit Job

    • Trash bin: Delete Job

  7. Job Details

    Lists when the job last started, how long it ran, and the next scheduled run. If you have multiple schedules defined for the same job, only the next run overall is shown.

  8. Schedules

    Lists defined schedules and lets you add another schedule to a job.

  9. SQL Query

    It is possible to define custom SQL commands, which will be executed every time the job completes. This allows you, for instance, to trigger the copy of data into another database or to program notifications about the job status. Please see the example below (the {{job_completed}} variable it uses is explained in the Post-Job SQL Query section further down):

    DO $$
    DECLARE
        _result TEXT;
    BEGIN

        -- Only run the custom command if every load in the job succeeded.
        IF ({{job_completed}}) THEN
            _result := dvb_core.f_execute_update_fdb_query_with_log(
                query => 'EXECUTE sp_whatever',
                function_name => 'PostSqlScript',
                action => 'Execute stored procedure',
                attribute_list => ''
            );
            IF (_result IS NOT NULL) THEN
                RAISE EXCEPTION 'Error while executing stored procedure: %', _result;
            END IF;
        END IF;

    END;
    $$;
    
  10. Triggered by / Triggered on Completion: Jobs

    Lists and allows you to define dependencies between jobs, which causes jobs to run after one another.

Creating a Job

As stated before, a default job is created on system creation. This default job is configured to automatically include newly created loads. The default job can be modified or deleted if needed.

Also, it is possible to create additional jobs for a system. This can be useful if some parts of the source should be loaded more often than others to improve the efficiency of the integration flow.

Steps
  1. Navigate to the jobs module

  2. Click on Add Job

  3. Fill out the dialogue explained below

  4. Complete the creation dialogue.

  5. Continue with defining individual loads, schedules and job dependencies.

../_images/addJob.png
  1. Activation Switch

    Enable or disable the job as a whole.

  2. Source System

    Select a source system which was previously created in the staging module.

  3. Job Name

    Displayed name of the job.

  4. Job Suffix ID

    Partial technical ID of the job. Will compose the job ID in the form:

    [Source System ID] + ``_j_`` + [Job Suffix ID]

    For example, a source system with ID ``crm`` and job suffix ``daily`` results in the job ID ``crm_j_daily``.
    
  5. Max. Parallel Running Loads

    The maximum number of loads that are able to run in parallel in this job, where “-1” means unlimited. Available load slots are always limited by the system config parameters.

  6. Auto Add New Elements

    When creating a new staging table or data vault element, the loads can automatically be added to a job. Enable this continuous integration by ticking the box. Otherwise new loads need to be added manually.

  7. Comment

    Custom notes about the job. Will appear in the documentation.

  8. Add Job

    Button to complete the job creation.

Defining loads within a Job

A job can be configured in two ways:

  1. Automatically take up new loads

  2. Only contain manually specified loads

In both cases, loads can be individually enabled/disabled. If a load is disabled in the first case (automatically take up new loads), new loads will still be added to the job automatically, but the disabled ones are skipped.
In the second case (manually specified loads only), new loads are not automatically added: loads have to be picked individually, and only selected loads will be contained in the job.

Note

This behavior is also reflected in the state export. E.g. for a “manual” job, the job's loads are the ones being executed; for an “auto” job, the job's loads are the ones being skipped.

If a load is removed in the Data Vault or Staging, it will also automatically be removed from the job. Be aware that if you previously excluded a load from a job, then removed it from the data vault and added it again, it has to be disabled again.

You can also reconfigure a job from automatically adding new loads to manually selecting loads. In this case, the state is preserved, meaning that all loads added to the job up to this point will be directly included in the manual job.

Steps
  1. Go to the jobs module and open up an existing job from the list

  2. Toggle the view to activate the “Selection” appearance of the canvas

  3. Click on a load to add it to or remove it from the job

In the selection view, you will notice that the connections in the lineage are colored in three different ways:

  • Purple: Selected load, will be executed on job execution

  • Dark Gray: Available load, could be selected, won’t be executed on job execution

  • Light Gray: Independent data flows, can’t be selected, represent virtual layers or loads from different source systems.

../_images/definingLoads.png

The selection composer will only allow you to select loads of the underlying source system of the job. If you would like to initiate dependent loads on job completion, please have a look at job dependencies.

Schedule a Job

The Datavault Builder includes an integrated scheduling agent, allowing you to directly manage and configure the automation of reloading processes within the environment.

Definition

A schedule triggers one individual job at a given timing.

Steps
  1. Navigate to the jobs module, open an existing job and click on the symbol located behind “Schedules” in the Job properties slider.

  2. Fill out the Base tab, which allows you to define a start and end date.

  3. Fill out the Scope tab, defining whether to perform a full or delta load.

  4. Indicate the timing to initiate the job at.

Note

It is possible to have overlapping schedules, as well as job dependencies that can lead to a job being initiated while it is already loading. To prevent the system from queuing up, the job will first check whether it is already running and not initiate a second time.

Base

../_images/base4.png
  1. Activation Switch

    Enables/disables the schedule.

  2. Schedule Name

    Displayed name of the Schedule.

  3. Start Date / Time

    Only run the schedule past this initial date and time.

  4. End Date / Time

    Only run the schedule before this final date and time. Can be left empty if the schedule should run indefinitely.

  5. Comment

    Individual notes about the schedule.

  6. Schedule ID

    Technical ID of the schedule. Is directly derived from the Name but can manually be edited.

Scope

The scope is the place where you can define a schedule as loading full or delta in the staging.

In case of a full load, all data from the contained source tables in the job will be copied.

In case of a delta load, the where-clause parameter specified for each staging table will be used to define a delta subset in the source. In this case, any parameters used can be assigned a fixed or calculated value in the schedule definition.
For details: Working with full / delta loads

Note

If a schedule is configured as a delta load and has dependent jobs which will be initiated, then the parameter values from the first job triggered by the schedule will be passed down to the next job to be used in its delta load.

../_images/scope.png
  1. Schedule Type

    Defines whether to load full or only delta. For a full load, no parameters have to be specified.

  2. Delta Load Parameters
    Name: Type in the name of the parameter used in the Delta-Load-Where-Clause of the staging table(s).
    Value: Assign a fixed or dynamic value to the parameter, which will be filled in on each execution.

    Note

    Make sure to confirm adding a new parameter using the plus symbol.

    The value can also be the result of a function call. For that purpose, three execution environments are possible:

    • Execute on source: Just add the function call to the value; it will be added to the where clause and passed as is to the source.

    • Execute in the Datavault Builder Core: For your convenience, default functions can be added with the prefix DVB_CORE_FUNC.*

    • Execute on your client database: Prefix your value statement with DVB_CDB_FUNC. and the code after the prefix will be executed on the client database.

    First, you have to create a function. For instance, create a function called get_transaction_id in the schema dvb_functions (a minimal sketch of such a function is shown at the end of this section). Then, you can use the created function in the value field as follows: DVB_CDB_FUNC.SELECT dvb_functions.get_transaction_id():

    ../_images/delta_param.png

    Hint

    Make sure that the function call returns a single value (e.g. text/varchar, int) as a result!

    See also

    The parameter transaction_id created this way can then be used in the Delta Load Clause when adding a staging table:

    ../_images/delta_load_clause.png

    Description
    • The variable transaction_id is filled with the result of the previously created function SELECT dvb_functions.get_transaction_id(), which in this case is a number.

    • You then compare the transaction_id from the result with the max_transaction_id (which is a number as well).

    • In this example, only those rows are loaded where transaction_id is greater than max_transaction_id.

    Hint

    • If you compare string values, you need to add quotation marks in the Delta-Load-Where-Clause field.

    • If you compare dates, you have to explicitly add a cast. Here is an example with Oracle syntax:

    ../_images/delta_load_clause_date.png

    Important: Place quotation marks around the curly brackets as well to prevent errors when casting a string to a date.
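
    The following is a minimal, hypothetical sketch of such a client-database function (used via the DVB_CDB_FUNC. prefix), assuming a PostgreSQL client database. The schema and function name follow the example above; the table and column it reads from (dvb_functions.staging_log, transaction_id) are placeholders for illustration. The only requirement is that the call returns a single scalar value:

    CREATE SCHEMA IF NOT EXISTS dvb_functions;

    -- Placeholder logic: return the highest transaction id processed so far.
    -- Adapt the queried table and column to your environment.
    CREATE OR REPLACE FUNCTION dvb_functions.get_transaction_id()
    RETURNS BIGINT
    LANGUAGE sql
    AS $func$
        SELECT coalesce(max(transaction_id), 0)
        FROM dvb_functions.staging_log;
    $func$;

    In the schedule definition, the function is then referenced in the parameter value field as DVB_CDB_FUNC.SELECT dvb_functions.get_transaction_id(), as shown above.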

Timing

In the last step of adding a new schedule to a job, the timing is assigned.

Available options are:
  • Every Minute

  • Every Hour (minute)

  • Every Day (hours,minute)

  • Every Week (weekday,hour,minute)

  • Every Month (monthday/lastofmonth,hour,minute)

  • Every Year(month,monthday/lastofmonth,hour,minute)

Note

It is also possible to select multiple options within one step, as displayed in the example below, where the job will only run during workdays and working hours.

../_images/timing.png
  1. Timing Selection

    Definition of the timing for a schedule. Multiple selections within a row are possible.

  2. Add Schedule

    Button to complete the creation of the schedule.

Job dependencies

Jobs can not only be triggered by schedules, but also by other jobs. In this case, the staging load type (full / delta) as well as the parameters are inherited from the triggering job. The dialogues for the jobs Triggered by and Triggered on Completion work the same way, so only one of them is shown below.

Definition

A dependent job is triggered on completion of another job.

When adding a job to Triggered on Completion, that job will run after the first job is finished. Therefore, the triggering job will also appear in the Triggered by jobs list of the dependent job. Direct circular dependencies are not possible (a dependent job being the trigger for its triggering job). However, indirect circular dependencies are possible.

../_images/jobDependencies.png
  1. Job List

    Search the list for a specific job.

  2. Job Status / Action
    ../_images/status_action.png
    • Toggle: Indicates whether the chosen dependent job is activated. To enable/disable it, open up the job.

    • Trash bin: Removes the dependency.

Post-Job SQL Query

Besides job dependencies, the Datavault Builder also supports the execution of Post-Job SQL Queries, allowing you to perform customized actions once the loading process of the job is finished.

Definition

A Post-Job SQL Query is custom code executed within the job, after all loads but before triggering dependent jobs.

Steps
  1. Navigate to the jobs module, open an existing job and click on the edit symbol located behind “Default SQL Query” in the Job properties slider.

  2. Fill out the Pop-Up, defining your custom code.

  3. Activate the SQL Query.

Configuration

In dvb_config.config, you can define a parameter called run_job_sql_query_on_core, allowing you to specify whether the query will be executed on the core or on the client/process database.
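
As an illustration only, and assuming dvb_config.config is a simple key/value configuration table (the actual column names may differ in your installation), the parameter could be set with a statement along these lines:

-- Hypothetical sketch: run the post-job SQL query on the core.
-- The column names (parameter, value) are assumptions; check the actual
-- definition of dvb_config.config in your installation before running this.
UPDATE dvb_config.config
SET value = 'TRUE'
WHERE parameter = 'run_job_sql_query_on_core';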

Example
The following example shows a case in which executing the post-job SQL query on the core makes sense.

Note

To run this example, set run_job_sql_query_on_core in the config to TRUE.

In this case, the job loads data from CSV files. Once those files are loaded, they should no longer be in the source folder; otherwise they would be reloaded on each and every run. Therefore, we move them into a subfolder after the load.
DO $$
DECLARE
    _moved_files_count INTEGER;
BEGIN
    -- Only move the files away if the job was successful (no load failed).
    IF ({{job_completed}}) THEN
        _moved_files_count := dvb_core.f_move_files('test*.csv', 'loaded_files/bak');
        RAISE NOTICE 'Post Job SQL Script moved % source files.', _moved_files_count;
    END IF;
END;
$$;

As you can see, for the movement we use the core function dvb_core.f_move_files, specifying which files should be moved into which directory. Also, we only move the files away if the job was actually successful, meaning no load failed. For this purpose, the variable {{job_completed}} can be used, which is automatically evaluated by the core at execution time.

Deleting a job

Before deleting a job, make sure that it is not needed anymore, because removing a job will also directly remove:

  • All schedules defined for the job

  • All SQL Queries within a job

  • All job dependencies

  • Inclusion logic for manually disabled / enabled loads within a job

Steps
  1. Open up the job in the jobs module

  2. Click on the trash bin in the job properties.

  3. Confirm the deletion.

Load States

There are four basic load types in the Datavault Builder:

  • Staging load (St): A single staging table load is performed.

  • System load (S): All staging tables for a system are loaded.

  • Data Vault load (DV): A single load into a Data Vault object (Hub / Link / Satellite).

  • Job load (J): A (group of) Staging and / or Data Vault load(s).

Returned State  Meaning                             Summarized Load Step  Applicable to
Initiating      Finding load parts to execute       Running               St, S, DV, J
Waiting         Waiting for a free processing slot  Running               St, S, DV, J
Loading         Staging Data or Loading the Vault   Running               St, S, DV, J
Incomplete      Parts of the load failed            Failed                S, J
Failed          The load failed                     Failed                St, S, DV, J
Succeeded       The load was successful             Succeeded             St, S, DV, J