Operations

The operations module is all about orchestrating and monitoring the loading process of staging and data vault loads.

Status

The status lineage allows to get an immediate overview of the current loading states. As a difference to the data lineage, its flows are not colored by the system, but the state of each staging and data vault load.

../_images/status.png
  1. Search Bar

    Use the search bar to filter the status-lineage. Filtering works the same as in the data lineage.

  2. Load

    Loads are colored by their current status.

    • Green: Successfully loaded.

    • Light blue: Initialized, but waiting.

    • Blue: Loading.

    • Red: Failed.

    When hovering a load, the window will appear, giving details about the load state, such as start-time, duration and loaded rows.

    ../_images/load1.png

Jobs

The jobs module is the heart, automating and orchestrating the data flows in the Datavault Builder. The module allows to define multiple different packages of staging and data vault loads, giving you the maximum flexibility to define your custom jobs. However, if you compose manual jobs, the Datavault Builder will help you create an execution order by taking away the work of manually defining the correct and most efficient loading order. Focus on the loads to include in a job, and the Datavault Builder will optimize and parallelize the loads as good as possible.

Definition

A job is a package of loads (datavault / staging) on the granularity of one source system. It is possible to have multiple jobs per system, but not multiple systems per job.

Usage

At system creation, a default job is created, automatically taking up all newly created loads. Once you start modelling, in the background the job is automatically updated with each modification in your data integration process, in the end allowing you to directly trigger a reload of your integration flow.

Besides all the Automation, many parts can be customized, as you can see in the following chapters.

../_images/jobs.png
  1. Job List

    When clicking into the search-bar, a drowdown will appear, listing all existing jobs in the Datavault Builder.

  2. Job Load Status / Load Selection
    On the canvas, the loads within a job as well as the dependencies are visualized.
    In the status view, the color indicates the current status for each load. When using the view toggler, the selection view allows you to compose your custom job.
  3. View Toggler

    Click onto the toggler to switch between load status and load selection on the canvas.

  4. Job Actions
    ../_images/jobActions.bmp
    • Circled Arrow: Initiate Job Load

    • Switch: Enable/Disable Job

    • Screw nut: Edit Job

    • Trash bin: Delete Job

  5. Job Details

    Lists when the job started the last time, how long it did run as well as the next schedule. If you have multiple schedules defined for the same job, only the next run overall will appear.

  6. Schedules

    Lists defined schedules and lets you add another schedule to a job.

  7. SQL Query

    It is possible to define custom SQL commands, which will be executed every time the job completes. This allows you for instance to trigger the copy of data into another database or programm notifications about job status.

  8. Triggered by / Triggered on Completion: Jobs

    Lists and allows to define dependencies between jobs, which causes jobs to run after onea nother.

Creating a Job

As stated before, a default job is created on system creation. This default job is configured to automatically include newly created loads. The default job can be modified or deleted if needed.

Also, it is possible to create more jobs for a system. This can be the case, if only parts of the source should be loaded more often than others to improve the effectivety of the integration flow.

Steps
  1. Navigate to the jobs module

  2. Click onto Add Job

  3. Fill out the dialogue explained below

  4. Complete the creation dialogue.

  5. Continue with defining individual loads, schedules and job dependencies.

../_images/addJob.png
  1. Activation Switch

    Enable or disable the job as a whole.

  2. Source System

    Select a source system which was previously created in the staging module.

  3. Job Name

    Displayed name of the job.

  4. Job Suffix ID

    Partial technical ID of the job. Will compose the job ID in form of:

    [Source System ID] + ``_j_`` + [Job Suffix ID]
    
  5. Max. Parallel Running Loads

    Defines the number of parallely running loads overall in the system. If another job is executed at the same time, this will also impact the number of available loads.

  6. Auto Add New Elements

    When creating a new staging table or data vault element, the loads can automatically be added to a job. Enable this continuous integration by ticking the box. Otherwise new loads need to be added manually.

  7. Comment

    Custom notes about the job. Will appear in the documentation.

  8. Add Job

    Button to complete the job creation.

Defining loads within a Job

A job can be configured in two ways:

  1. Automatically take up new loads

  2. Only contain manually specified loads

In both cases, loads can be individually enabled/disabled. If a load is disabled in the first case (automatically take up new loads), then new loads will still be automatically added to the loads, but the selected ones are skipped.
In the second case (manually specified loads only), new loads are not automatically added, so that loads have to be picked indivually and only selected loads will be contained in the job.

If a load is removed in the Datavault or Staging, the load will as well be automatically be removed from the job. Be aware, that if you have previously excluded a load from a job, then removed it from the data vault and added it again, it has to be disabled again.

You can as well reconfigure a job from automatically adding new loads to manually selecting loads. In this case, the state is preserved, meaning that all loads added to the job up to this point will directly be included in the manual job.

Steps
  1. Go to the jobs module and open up an existing job from the list

  2. Toggle the view to activate the “Selection”-Appearance of the canvas

  3. Click onto a load to add it to / remove it from the job

When you are on the selection view, you will notice, that the connections in the lineage are colored in three different ways:

  • Purple: Selected load, will be executed on job execution

  • Dark Gray: Available load, could be selected, won’t be executed on job execution

  • Light Gray: Independent data flows, can’t be selected, represent virtual layers or loads from different source systems.

../_images/definingLoads.png

The selection composer will only allow you to select loads of the underlying source system of the job. If you would like to initiate dependent loads on job completion, please have a look at job dependencies.

Schedule a Job

The Datavault Builder consists of an integrated scheduling agent, allowing you to directly manage and configure the automation of reloading processes within the environment.

Definition

A schedule will trigger one individual job on a given timing.

Steps
  1. Navigate to the jobs module, open an existing job and click onto the located behind “Schedules” in the Job properties slider.

  2. Fill out the Base tab, allowing to define a start and end date.

  3. Fill out the scope, defining to load as full- or delta-load.

  4. Indicate the timing to initiate the job at.

Note

It is possible to have overlapping schedules, as well as job dependencies leading to a job possibly being initiate while already loading. To prevent the system from being queued up, the job will first check the existence of a running job and not initiate a second time.

Base

../_images/base4.png
  1. Activation Switch

    Enables/Disabled the schedule.

  2. Schedule Name

    Displayed name of the Schedule.

  3. Start Date / Time

    Only run the schedule past this initial date and time.

  4. End Date / Time

    Only run the schedule before this final date and time. Can be left out if infinite.

  5. Comment

    Individual notes about the schedule.

  6. Schedule ID

    Technical ID of the schedule. Is directly derived from the Name but can manually be edited.

Scope

The scope is the place where you can define a schedule as loading full or delta in the staging.

In case of a full load, all data from the contained source tables in the job will be copied.

In case of a delta load, the where clause parameter specified for each staging table will be used to define a delta-subset in the source. In this case, eventually used parameters can be assigned a fix or calculated value in the schedule definition.
For details: Working with full / delta loads

Note

If a schedule is configured as delta load and has dependent jobs which will be initiated, then the parameter-values from the first job triggered by a schedule will be passed down the the next job to be used in the delta-load.

../_images/scope.png
  1. Schedule Type

    Defines to load full or only delta. If full load, then no parameters have to be specified.

  2. Delta Load Parameters
    Name: Type in the name of the used parameter in the Delta-Load-Where-Clause of the staging table(s)
    Value: Assign a fix or dynamic value to the parameter, which will be filled in on each execution

    The value can as well be the result of a function call. For that purpose, three execution environments are possible:

    • Execute on source: Just add the functions to the value, it will be added to the where-clause and passed as is to the source

    • Execute in the Datavault Builder Core: For your convenience default functions can be added with the prefix DVB_CORE_FUNC.*

    • Execute on your client-database: Prefix your value with DVB_CDB_FUNC. and it will be run on the client-database.

    Note

    Make sure, that the function call returns a single string (text/varchar) as a result!

    For instance, create your own function in schema dvb_functions called get_last_fullmoon_date. To use it in the value field, add DVB_CDB_FUNC.dvb_functions.get_last_fullmoon_date().

    Note

    Pay attention to confirm the adding of a new parameter using the plus-symbol.

Timing

In the last step of adding a new schedule to a job, the timing is assigned.

Available options are:
  • Every Minute

  • Every Hour (minute)

  • Every Day (hours,minute)

  • Every Week (weekday,hour,minute)

  • Every Month (monthday/lastofmonth,hour,minute)

  • Every Year(month,monthday/lastofmonth,hour,minute)

Note

It is possible to also select multiple options within one step, as displayed in the example below, where the job will only run during the workdays and working hours.

../_images/timing.png
  1. Timing Selection

    Definition of the timing for a schedule. Multiple selections within a row are possible.

  2. Add Schedule

    Button to complete the creation of the schedule.

Job dependencies

Jobs can not only be triggered by schedules, but as well by other jobs. In this case, the staging load type (full / delta) as well as the parameters are inherited from the triggering job. The dialogues for the jobs Triggered by and Triggered on Completion work the same way, so we will only show one of both below.

Definition

A dependent job is triggered on completion of another job.

When adding a job to Triggered on Completion, that job will run after the first job is finished. Therefore, the triggering job will also appear in the Triggered by jobs list of the dependent job. Direct circular dependencies are not possible (a dependent job being the trigger for its triggering job). However, indirect circular dependencies are possible.

../_images/jobDependencies.png
  1. Job List

    Search the list for a specific job.

  2. Job Status / Action
    ../_images/status_action.png
    • Toggle: Indicates, whether the chosen dependent job is activated. For enabling/disabling open up the job.

    • Trash bin: Removes the dependency.

Post-Job SQL Query

Besides Job-Dependencies, the Datavault Builder as well supports execution of Post-Job SQL Queries, allowing to perform customized actions ones the loading process of the job is finished.

Definition

A Post-Job SQL Query is custom code, executed within the job, after all loads, but before triggering dependent jobs.

Steps
  1. Navigate to the jobs module, open an existing job and click onto the edit-symbol located behind “Default SQL Query” in the Job properties slider.

  2. Fill out the Pop-Up, defining your custom code.

  3. Activate the SQL Query.

Configuration

In the dvb_config.config, you can define a paramter called run_job_sql_query_on_core, allowing you to specify, whether the query will be executed on our core, or on the client/process database.

Example
The following code shows, when the code execution of the post-jobs sql query on the core makes sense.

Note

To run this example, set run_job_sql_query_on_core in the config to TRUE.

In this case, the job is loading data from CSV-Files. Ones those files are loaded, they should no longer be in the source folder, otherwise they would be reloaded each and every run. Therefore, we will move them into a subfolder after the load.
DO $$
DECLARE
_moved_files_count INTEGER;
BEGIN
    IF ({{job_completed}}) THEN
        _moved_files_count = dvb_core.f_move_files('test*.csv', 'loaded_files/bak');
        RAISE NOTICE 'Post Job SQL Script moved % source files.', _moved_files_count;
    END IF;
END;
$$;

As you notice, for the movement we use the core-function dvb_core.f_move_files, specifying which files should be moved into what directory. Also, we only move the files away, if the job was actually successful, meaning no load failed. For this purpose, the variable {{job_completed}} can be used, which will automatically be evaluated on execution time by the core.

Deleting a job

Before deleting a job, make sure, that it is not needed anymore because removing a job will also directly remove:

  • All schedules defined for the job

  • All SQL Queries within a job

  • All job dependencies

  • Inclusion logic for manually disabled / enabled loads within a job

Steps
  1. Open up the job in the jobs module

  2. Click onto the trash bin in the job properties.

  3. Confirm the deletion.

Load States

The are four basic load types in the Datavault Builder:

  • Staging load (St): A single staging table load is performed.

  • System load (S): All staging tables for a system are loaded.

  • Data Vault load (DV): A single load into a Data Vault object (Hub / Link / Satellite).

  • Job load (J): A (group of) Staging and / or Data Vault load(s).

Returned State

Meaning

Summarized Load Step

Applicable to

Initiating

Finding load parts to execute

Running

St, S, DV, J

Waiting

Waiting for a free processing slot

Running

St, S, DV, J

Loading

Staging Data or Loading the Vault

Running

St, S, DV, J

Incomplete

Parts of the load failed

Failed

S, J

Failed

The load failed

Failed

St, S, DV, J

Succeeded

The load was successful

Succeeded

St, S, DV, J