
cimt-ag / data_vault_pipelinedescription

A concept and syntax providing a universal data format for storing all essential information needed to implement or generate a data loading process for a data vault model.

Home Page: https://www.cimt-ag.de/leistungen/data-vault-pipeline-description/

License: Apache License 2.0

PLpgSQL 2.16% Python 95.41% Java 1.86% Batchfile 0.58%
data-warehousing datavault20 data-vault

data_vault_pipelinedescription's Issues

add testcase "Hub only"

This should only load a hub.
Main concern: whether a plan will be generated without any "leaf table".

Add deletion detection properties

A source may deliver a complete or a partitioned set of business objects. Based on this knowledge, an internal deletion detection can be implemented.

  • define syntax
  • load and check syntax with compiler
  • provide derivations by compiler
  • provide test cases for different scenarios
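The derivation the compiler would provide could look roughly like the following sketch in Python (the repository's main language). The function name, its signature, and modelling the complete/partitioned distinction via a `partition` argument are illustrative assumptions, not the project's actual API:

```python
# Hedged sketch of internal deletion detection. partition=None means the
# source delivers a complete set of business objects; otherwise only keys
# inside the delivered partition can be detected as deleted.
def detect_deletions(vault_keys: set, delivered_keys: set,
                     partition: set = None) -> set:
    # Restrict the comparison scope to the delivered partition, if any.
    scope = vault_keys if partition is None else vault_keys & partition
    # Every key in scope that was not delivered is considered deleted.
    return scope - delivered_keys
```

For a complete delivery, `detect_deletions({"a", "b", "c"}, {"a", "c"})` flags `"b"` as deleted; with `partition={"a", "b"}`, key `"c"` is out of scope and never flagged.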

Check for conflict of mapping when using multiple relations

When a table is targeted by multiple relations, the resulting column structure must be the same for all relations:

  • same number of columns
  • same column names and types

If this is violated, the compiler must log a proper error message and stop.

There is already a stub function in the code:
check_multifield_mapping_consistency_of_column
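A sketch of what the completed check might look like, assuming a simplified internal model where each relation resolves to an ordered list of column dicts; the function name, signature, and data shapes are illustrative and do not reflect the actual stub:

```python
# Hypothetical consistency check: all relations targeting the same table
# must produce the same column structure (count, names, types).
import logging

log = logging.getLogger("dvpd_compiler")

def check_multifield_mapping_consistency(relations: dict) -> bool:
    """relations maps a relation name to its ordered column list:
    [{"name": ..., "type": ...}, ...]. Returns False on conflict."""
    items = list(relations.items())
    ref_name, ref_columns = items[0]
    reference = [(c["name"], c["type"]) for c in ref_columns]
    for name, columns in items[1:]:
        candidate = [(c["name"], c["type"]) for c in columns]
        if len(candidate) != len(reference):
            log.error("relation %s maps %d columns, but %s maps %d",
                      name, len(candidate), ref_name, len(reference))
            return False
        for (rn, rt), (cn, ct) in zip(reference, candidate):
            if (rn, rt) != (cn, ct):
                log.error("relation %s: column %s %s conflicts with "
                          "%s %s from relation %s", name, cn, ct,
                          rn, rt, ref_name)
                return False
    return True
```

The caller would abort compilation when the function returns False.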

test: modify order in key hash

This test should determine whether the declaration of the field order for a key hash is transported correctly to the DVPI and has an impact on the DVPI summary.

  • Test created
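To illustrate why the declared field order matters: a key hash is typically computed over the concatenated field values, so swapping the order changes the hash. The delimiter, hash function, and value rendering below are assumptions for the sketch, not the project's actual hashing rules:

```python
# Minimal sketch: the declared field order determines the hash input order.
import hashlib

def key_hash(row: dict, ordered_fields: list) -> str:
    # Concatenate values in declared order (delimiter "|" is an assumption).
    payload = "|".join(str(row[f]) for f in ordered_fields)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

row = {"customer_id": 42, "account_no": "A-7"}
h1 = key_hash(row, ["customer_id", "account_no"])
h2 = key_hash(row, ["account_no", "customer_id"])
assert h1 != h2  # a different declared order yields a different key hash
```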

Describe development and deployment using DVPD in more detail to make ecosystem more clear

A first draft of the document has been created. Further writing, following the list below, will be added later.

use case analysis

Information representation needed, annotated with source endpoints and fields

Source specification

Breakdown of Endpoint data structure to single table representation

Fields, types, business key parts, parsing rules, increment pattern, tracking/deletion detection method

Tools: metadata discovery, content analysis

Result: Pipelines with field list

data vault modelling

Table structure and mapping of fields

Tool/Resource: Modelling tool, already established model

Vault model completion and verification

Full/Conform naming of tables and columns
Essential Naming of key and diff hash columns
Integration into established model (no conflicts)

Tools: Generators, check routines

implementation

Deployable and "executable" artifact for

  • Deployment of DB tables
  • Processing and loading incoming data
    Bandwidth of methods:
  • Just the DVPD (interpreted by a fully generic engine)
  • DVPD plus a copy of the current engine
  • A generated process (DVPD provided only as documentation)
  • A generated template, with final manual work

(Discussion about pros/cons of fully coded artifacts versus generic solutions)

Generation of Fetch

Test of pipeline

  • All increment scenarios
  • All historization scenarios

Tools: generated vault to source views, generated testdata (variety, change over time)

Deployment

  • Schedule

operations

usage of data

Tools: Vault model, columns, types, comments, lineage

Standardize declaration and processing of data types

Depends on the target database and on the source system or fetch processing.
Translation and normalisation are the responsibility of the processing engine.
Recommendation: upper case, remove spaces, check syntax.
Possible support: a separate configuration JSON mapping source to target types for every product (needs an optional product specification in the DVPD for sources and targets).
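A minimal Python sketch of the recommendation above (upper case, remove spaces, check syntax) combined with a per-product mapping lookup; the mapping content and the function name are hypothetical examples, not shipped configuration:

```python
# Sketch: normalise a declared data type, then translate its base type via
# a product-specific mapping. The mapping excerpt is a made-up example.
import re

TYPE_MAP_POSTGRES = {
    "VARCHAR": "VARCHAR",
    "NVARCHAR": "VARCHAR",
    "DATETIME": "TIMESTAMP",
}

def normalize_type(declared: str) -> str:
    # Recommendation from the issue: upper case and remove spaces.
    cleaned = declared.upper().replace(" ", "")
    base = re.match(r"[A-Z0-9_]+", cleaned)
    if base is None or base.group(0) not in TYPE_MAP_POSTGRES:
        raise ValueError(f"unknown data type declaration: {declared!r}")
    # Translate the base type, keep any length/precision suffix.
    return TYPE_MAP_POSTGRES[base.group(0)] + cleaned[base.end():]
```

For example, `normalize_type("nvarchar (50)")` yields `"VARCHAR(50)"`, while an unmapped type raises an error so the compiler can stop with a clear message.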

add ink_key_explicit_content_order[]

  • Challenge: How to declare the recursive content explicitly? Allow usage of field names, indicated by prefixing with "!" (maybe force the use of field names in case of recursive parents)
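A hypothetical parser for such a content-order list, where a "!" prefix marks a field name and any other entry is treated as a different kind of reference; the entry shapes and tags are assumptions based on the proposal above:

```python
# Sketch: classify explicit content-order entries. "!" prefix => field name,
# anything else => other reference (e.g. recursively derived content).
def parse_content_order(entries: list) -> list:
    parsed = []
    for entry in entries:
        if entry.startswith("!"):
            parsed.append(("field", entry[1:]))   # strip the "!" marker
        else:
            parsed.append(("reference", entry))
    return parsed
```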

Test: modify priority in diff hash

This test should determine whether the declaration of the field priority for a diff hash is transported correctly to the DVPI and has an impact on the DVPI summary.

  • Test created
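For illustration, a sketch in which fields are sorted by their declared priority before the diff hash is computed, so changing a priority alters the hash input order; the hashing details are assumptions, not the project's actual rules:

```python
# Sketch: field priority determines the order of values in the diff hash.
import hashlib

def diff_hash(row: dict, priorities: dict) -> str:
    # Lower priority value comes first (an assumption for this sketch).
    ordered = sorted(priorities, key=lambda f: priorities[f])
    payload = "|".join(str(row[f]) for f in ordered)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

row = {"name": "Ada", "city": "Berlin"}
assert diff_hash(row, {"name": 1, "city": 2}) != diff_hash(row, {"name": 2, "city": 1})
```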

suppress redundant row hash stage columns

When the same satellite content is loaded for all field groups, the process mapping and staging currently provide a row hash for every field group. It may be possible to reduce this to one row hash.
The probability of this constellation occurring is very low, so there is no hurry.
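A sketch of the proposed reduction: group the field groups by their (ordered) satellite content, so one row-hash stage column per distinct content set suffices. The data shapes are illustrative assumptions:

```python
# Sketch: field groups with identical (ordered) satellite content can share
# a single row hash instead of one per group.
def dedupe_row_hashes(field_groups: dict) -> dict:
    """field_groups maps a group name to its ordered field tuple.
    Returns one entry per distinct content, listing the groups sharing it."""
    shared = {}
    for group, fields in field_groups.items():
        shared.setdefault(fields, []).append(group)
    return shared
```

With `{"g1": ("a", "b"), "g2": ("a", "b"), "g3": ("c",)}`, only two row hashes are needed instead of three.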

Use position in fields array as definition of column position

Currently the position of a field in the incoming result set must be declared explicitly. This is due to the fact that PostgreSQL array expansion provides no way to obtain the index of the array element. If this can somehow be fixed, we could remove the explicit index.
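If positions were instead derived from the order of the fields array, a sketch could be as simple as the following (the field-entry shape is an assumption):

```python
# Sketch: derive column positions from array order instead of an explicit
# declaration. Field dicts are a hypothetical shape, not the DVPD schema.
def assign_positions(fields: list) -> list:
    return [{**f, "field_position": i + 1} for i, f in enumerate(fields)]

fields = [
    {"field_name": "customer_id", "type": "INT"},
    {"field_name": "name", "type": "VARCHAR(100)"},
]
positioned = assign_positions(fields)
```

On the PostgreSQL side, `WITH ORDINALITY` on set-returning functions such as `unnest` or `jsonb_array_elements` exposes the element index during expansion, which might be the fix the issue hints at.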
