
cimt-ag / data_vault_pipelinedescription


A concept and syntax to provide a universal data format for storing all essential information needed to implement or generate a data loading process for a data vault model.

Home Page: https://www.cimt-ag.de/leistungen/data-vault-pipeline-description/

License: Apache License 2.0

PLpgSQL 2.46% Python 95.38% Java 2.12% Batchfile 0.04%
data-warehousing datavault20 data-vault

data_vault_pipelinedescription's Introduction

Data Vault Pipeline Description (DVPD)

Concept and reference implementation

(C) Matthias Wegner, cimt ag

Creative Commons License CC BY-ND 4.0

This repository contains the documentation of the "Data Vault Pipeline Description" concept and a reference implementation with multiple test cases and examples.

The concept in "3 words"

The Data Vault Pipeline Description (DVPD) defines a document syntax to describe all metadata that is needed to implement a process which loads one source object into a data vault model.

This provides a standardized interface between all steps of the implementation workflow and allows a decoupling of the tools that are used during design and implementation. As a document, the DVPD also represents an encapsulated, deployable artifact and therefore supports the implementation of automated CI/CD workflows.

The full documentation is in this repository. The best starting point is DVPD_Introduction_and_orientation.md.
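To make the idea concrete, the following minimal sketch shows what a pipeline description could contain for a single source object. The key names and structure are illustrative assumptions, not the official DVPD syntax; the authoritative definition is the core syntax reference in this repository.

```python
import json

# Hypothetical, minimal pipeline description for loading one source object
# into a hub and a satellite. Key names are illustrative, not the official
# DVPD syntax (see the core syntax reference in this repository).
dvpd_sketch = {
    "pipeline_name": "load_customer_from_crm",
    "record_source": "crm.customer_export",
    "fields": [
        {"field_name": "customer_number", "type": "VARCHAR(20)",
         "targets": [{"table": "hub_customer", "column": "customer_number"}]},
        {"field_name": "customer_name", "type": "VARCHAR(200)",
         "targets": [{"table": "sat_customer_crm", "column": "customer_name"}]},
    ],
    "data_vault_model": [
        {"table_name": "hub_customer", "stereotype": "hub",
         "hub_key_column_name": "hk_customer"},
        {"table_name": "sat_customer_crm", "stereotype": "sat",
         "parent_table": "hub_customer"},
    ],
}

print(json.dumps(dvpd_sketch, indent=2))
```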

Motivation

Loading data into a data warehouse is a complex task, even when using the Data Vault methods, which provide a lot of standardization and generalization. Many tools and frameworks try to support the modelling and implementation process.

The functions needed are: specification of the use case, specification and analysis of the source data structure, modelling of the data vault and mapping of the data, implementation of the load process (fetch data from the source, transform it and load it into the data vault model), deployment of the processes, scheduling and execution of the processes, and monitoring of progress. Each of these steps carries deep complexity of its own. A product that supports all of these phases with equally appropriate quality and functional flexibility is nearly impossible to implement.*

So data warehouse platforms often contain a bundle of tools with a mix of commercial products and self-written code. One major function needed in these workflows is the communication of the metadata that is forged during the analysis and modelling steps. This metadata is needed for the implementation and, in the best case, can be used to generate the processing automatically.

DVPD provides a format to solve this problem.

*Such a product needs to cover a high variety of scenarios, but from the perspective of a single project only a small subset is needed. You don't want to pay the price for 300 functions when you only need 10 of them.

What you find in this repository

Concept Documentation

  1. Description of the concept
  2. Reference of the core syntax of DVPD
  3. Analysis of the use case variations to be covered by the syntax:
     a. Data mapping variation taxonomy
     b. Data mapping dependent process generation
     c. Partitioned deletion scenarios

Reference implementation

  1. PostgreSQL tables and views to implement a DVPD compiler
  2. Documentation about the structure and usage of the DVPD views
  3. PostgreSQL tables and views to implement automated testing of the compiler
  4. Test sets
  5. Python scripts to deploy the tables and views automatically

data_vault_pipelinedescription's People

Contributors

albincekaj, jvonhein, mattywausb, velimird


data_vault_pipelinedescription's Issues

add testcase "Hub only"

"This should only load a hub.
Main concern, if a plan will be generated whithout any ""leaf table"""

Add deletion detection properties

"A source may have a complete or partitioned set of a business objects. On this knowledge an internal deletion detection can be implemented. "

  • define syntax
  • load and check syntax with compiler
  • provide derivations by compiler
  • provide test cases for different scenarios
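A minimal sketch of the full-set case, assuming the delivered business keys and the keys currently active in the vault are available as plain Python sets; all names are hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical full-set deletion detection: the source delivers the complete
# set of business keys, so every key that is active in the vault but absent
# from the delivery can be marked as deleted.
def detect_deletions(delivered_keys: set, active_vault_keys: set) -> dict:
    """Return the keys to mark as deleted, plus an audit timestamp."""
    deleted_keys = active_vault_keys - delivered_keys
    return {
        "deleted_keys": sorted(deleted_keys),
        "detected_at": datetime.now(timezone.utc).isoformat(),
    }

# Example: key "C-003" is no longer part of the (complete) delivery.
print(detect_deletions({"C-001", "C-002"}, {"C-001", "C-002", "C-003"}))
```

The partitioned case would work the same way, but restricted to the keys of the delivered partition.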

Check for conflict of mapping when using multiple relations

When a table is targeted by multiple relations, the resulting column structure must be the same for all relations:
  • same number of columns
  • same column names and types

If this is violated, the compiler must log a proper error message and stop.

There is already a stub function in the code:
check_multifield_mapping_consistency_of_column
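A sketch of what such a check could look like, mirroring the intent of the stub; the data shape (columns per relation as name-to-type mappings) and the function signature are assumptions for illustration:

```python
# Hypothetical consistency check over the columns each relation maps into the
# same target table. The data shape is invented; only the intent follows the
# stub check_multifield_mapping_consistency_of_column.
def check_multirelation_mapping_consistency(columns_per_relation: dict) -> list:
    """Return error messages when the relations disagree on the column structure."""
    errors = []
    reference_relation, reference_columns = next(iter(columns_per_relation.items()))
    for relation, columns in columns_per_relation.items():
        if len(columns) != len(reference_columns):
            errors.append(f"relation '{relation}' maps {len(columns)} columns, "
                          f"'{reference_relation}' maps {len(reference_columns)}")
        for name, column_type in columns.items():
            if reference_columns.get(name) != column_type:
                errors.append(f"column '{name}' differs between relations "
                              f"'{relation}' and '{reference_relation}'")
    return errors

# Example: the second relation declares a different type for 'amount'.
print(check_multirelation_mapping_consistency({
    "rel_1": {"order_id": "VARCHAR(20)", "amount": "NUMERIC(10,2)"},
    "rel_2": {"order_id": "VARCHAR(20)", "amount": "INTEGER"},
}))
```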

suppress redundant row hash stage columns

"When the same satellite content is loaded for all field groups, currently the process mapping and staging provide a row hash for every field group. Maybe it is possible zu reduce this to one row hash.
The probability of occurence for this constellation is very low, so dont hurry."
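A sketch of the possible deduplication, assuming the compiler knows which columns each field group maps into the satellite; all names are hypothetical:

```python
# Hypothetical deduplication: if several field groups carry exactly the same
# satellite content, they could share a single row hash stage column.
def shared_row_hash_columns(field_groups: dict) -> dict:
    """Assign a row hash column per field group, reusing one column for identical content."""
    column_for_content = {}
    assignment = {}
    for group_name, mapped_columns in field_groups.items():
        content_key = tuple(mapped_columns)
        column = column_for_content.setdefault(content_key, f"rh_{group_name}")
        assignment[group_name] = column
    return assignment

# fg_2 maps exactly the same content as fg_1 and therefore reuses its row hash column.
print(shared_row_hash_columns({
    "fg_1": ["customer_name", "country_code"],
    "fg_2": ["customer_name", "country_code"],
    "fg_3": ["customer_name"],
}))
```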

Test: modify order in key hash

This test should determine whether the declaration of the field order for a key hash is transported correctly to the DVPI and has an impact on the DVPI summary.

  • Test created
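The sketch below illustrates the order dependence the test should make visible: the same business key values hashed under two different declared field orders yield different key hashes. The concatenation rule (pipe delimiter, MD5) is an assumption for illustration, not necessarily the convention of the reference implementation.

```python
import hashlib

# Illustrative key hash: concatenate the business key values in the declared
# field order and hash the result. Delimiter and hash algorithm are assumptions.
def key_hash(ordered_values: list) -> str:
    return hashlib.md5("|".join(ordered_values).encode("utf-8")).hexdigest()

# The same values in a different declared order produce a different hash,
# which should be reflected in the DVPI and its summary.
print(key_hash(["4711", "DE"]))   # declared order: customer_number, country_code
print(key_hash(["DE", "4711"]))   # declared order: country_code, customer_number
```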

Test: modify priority in diff hash

This test should determine whether the declaration of the field priority for a diff hash is transported correctly to the DVPI and has an impact on the DVPI summary.

  • Test created

add ink_key_explicit_content_order[]

  • Challenge: How to explicitly declare the recursive content? Allow the usage of field names, indicated by prefixing them with "!" (maybe force the use of field names in the case of recursive parents).

Use position in fields array as definition of column position

Currently the position of a field in the incoming result set must be declared explicitly. This is because PostgreSQL array expansion has no way to provide the index of the array element. If this can be fixed somehow, we could remove the explicit index.
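On the Python side of the processing, the position could in principle be derived from the order of the entries in the fields array, as in the hypothetical sketch below. On the PostgreSQL side, expanding the array via a set-returning function with WITH ORDINALITY might be one way to obtain the index, but that is only a suggestion, not part of the current implementation.

```python
import json

# Hypothetical: derive the stage column position from the position of each
# field entry in the DVPD "fields" array instead of an explicit index.
dvpd_fragment = json.loads("""
{"fields": [
    {"field_name": "customer_number"},
    {"field_name": "customer_name"},
    {"field_name": "country_code"}
]}
""")

field_positions = {
    field["field_name"]: position
    for position, field in enumerate(dvpd_fragment["fields"], start=1)
}
print(field_positions)  # {'customer_number': 1, 'customer_name': 2, 'country_code': 3}
```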

Standardize declaration and processing of data types

"Depends on target database and source system or fetch processing
Translation and normalisation is respobility of processing engine.
Recommendenstion: upper case, remove spaces, check syntax
Possible support: separate configuration JSON, mapping of source to target types for every product (needs a optional product specification in the DVPD for source and targets)"
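A minimal sketch of the recommended normalisation, with an optional product-specific mapping that would come from such a separate configuration JSON; the mapping content, the regular expression and the function name are assumptions:

```python
import re

# Hypothetical product-specific mapping of source types to target types
# (would live in a separate configuration JSON per product combination).
SOURCE_TO_TARGET_TYPES = {"NUMBER": "NUMERIC", "VARCHAR2": "VARCHAR"}

# Very loose syntax check: a type name, optionally followed by (n) or (n,m).
TYPE_SYNTAX = re.compile(r"^[A-Z_]+(\(\d+(,\d+)?\))?$")

def normalize_type(declared_type: str) -> str:
    """Upper-case, remove spaces, optionally map the base type, check the syntax."""
    cleaned = declared_type.upper().replace(" ", "")
    base, separator, precision = cleaned.partition("(")
    mapped = SOURCE_TO_TARGET_TYPES.get(base, base) + separator + precision
    if not TYPE_SYNTAX.match(mapped):
        raise ValueError(f"unsupported type declaration: {declared_type!r}")
    return mapped

print(normalize_type("varchar2 (200)"))  # -> VARCHAR(200)
print(normalize_type("Number(10, 2)"))   # -> NUMERIC(10,2)
```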

Describe development and deployment using DVPD in more detail to make the ecosystem clearer

A first draft of the document has been created. Further writing, according to the list below, will be added later.

use case analysis

Information representation needed, annotated with source endpoints and fields

Source specification

Breakdown of the endpoint data structure into a single-table representation

Fields, types, business key parts, parsing rules, increment pattern, tracking/deletion detection method

Tools: metadata discovery, content analysis

Result: Pipelines with field list

data vault modelling

Table structure and mapping of fields

Tool/Resource: Modelling tool, already established model

Vault model completion and verification

Full/conform naming of tables and columns
Essential naming of key and diff hash columns
Integration into the established model (no conflicts)

Tools: Generators, check routines

implementation

Deployable and ""executable"" Artifact for

  • Deployment of DB tables
  • Processing and loading incoming data
    Bandwidth of methods
  • Can be just the dvpd = full generic engine.
  • Dvpd + copy of current engine
  • Generated process (dvpd only provided as documentation)
  • Generated template, with final manual work

(Discussion about pro/cons of full coded artifacts against generic solutions)

Generation of Fetch

Test of pipeline

  • All increment scenarios
  • All historization scenarios

Tools: generated vault-to-source views, generated test data (variety, change over time)

Deployment

  • Schedule

operations

usage of data

Tools: Vault model, columns, types, comments, lineage
