First draft of this document. Further writing, following the outline below, will be added later.
use case analysis
Representation of the information needed, annotated with source endpoints and fields
Source specification
Breakdown of the endpoint data structure into a single-table representation
Fields, types, business key parts, parsing rules, increment pattern, tracking/deletion detection method
Tools: metadata discovery, content analysis
Result: Pipelines with field list
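The source specification above (fields, types, business key parts, parsing rules, increment pattern) could be captured in a machine-readable form. A minimal sketch, assuming a dictionary-based structure; all names here are illustrative, not the official DVPD schema:

```python
# Hypothetical, minimal specification for one source endpoint.
# Field and attribute names are illustrative only.
endpoint_spec = {
    "endpoint": "crm/customers",
    "increment_pattern": "delta_by_modified_timestamp",
    "deletion_detection": "full_key_comparison",
    "fields": [
        {"name": "customer_id", "type": "varchar(20)",
         "business_key": True, "parsing_rule": "trim"},
        {"name": "customer_name", "type": "varchar(200)",
         "business_key": False, "parsing_rule": None},
        {"name": "modified_at", "type": "timestamp",
         "business_key": False, "parsing_rule": "iso8601"},
    ],
}

def business_key_fields(spec):
    """Return the names of all fields marked as business key parts."""
    return [f["name"] for f in spec["fields"] if f["business_key"]]
```

A metadata discovery tool would produce such a structure per endpoint; the resulting field list is what the pipeline definition is built from.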
data vault modelling
Table structure and mapping of fields
Tool/Resource: Modelling tool, already established model
Vault model completion and verification
Full/conformant naming of tables and columns
Essential naming of key and diff hash columns
Integration into established model (no conflicts)
Tools: Generators, check routines
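The key and diff hash columns could be derived as follows. A minimal sketch, assuming MD5 over pipe-delimited, trimmed and upper-cased values; the delimiter, casing, and NULL handling conventions vary between projects and would be fixed by the established model:

```python
import hashlib

def hash_columns(values, delimiter="|"):
    """Concatenate normalized values and return the MD5 hex digest.
    NULLs become empty strings; trim/uppercase rules are project-specific."""
    normalized = ["" if v is None else str(v).strip().upper() for v in values]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()

# Key hash over the business key parts; diff hash over descriptive attributes.
# Column name prefixes (hk_, rh_) are hypothetical conventions.
hk_customer = hash_columns(["42"])
rh_customer = hash_columns(["Jane Doe", "Berlin", None])
```

A check routine would verify that every hub/link gets exactly one key hash column and every satellite one diff hash column, named according to the convention.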
implementation
Deployable and "executable" artifact for
- Deployment of DB tables
- Processing and loading incoming data
Bandwidth of methods
- Can be just the DVPD = fully generic engine
- DVPD + a copy of the current engine
- Generated process (DVPD provided only as documentation)
- Generated template, with final manual work
(Discussion of the pros and cons of fully coded artifacts versus generic solutions)
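At the generic end of this spectrum, the engine interprets the pipeline metadata at load time instead of running generated code. A minimal, self-contained sketch of such an engine step (an in-memory dict stands in for the hub table; all names are hypothetical):

```python
import hashlib

def key_hash(values):
    """MD5 over pipe-joined values; conventions are project-specific."""
    return hashlib.md5("|".join(str(v) for v in values).encode()).hexdigest()

def load_hub(records, bk_fields, hub_store):
    """Generic hub load: compute the key hash per record and insert
    business keys not yet present. Returns the number of new keys."""
    loaded = 0
    for rec in records:
        hk = key_hash([rec[f] for f in bk_fields])
        if hk not in hub_store:
            hub_store[hk] = {f: rec[f] for f in bk_fields}
            loaded += 1
    return loaded

hub = {}
new_keys = load_hub(
    [{"customer_id": "42"}, {"customer_id": "42"}, {"customer_id": "43"}],
    ["customer_id"], hub)
```

A generated process would instead emit this logic as concrete code (or SQL) per pipeline; the trade-off discussed above is flexibility and central fixes versus transparency and per-pipeline optimization.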
Generation of the fetch process
Test of pipeline
- All increment scenarios
- All historization scenarios
Tools: generated vault-to-source views, generated test data (variety, change over time)
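Generated test data for the increment and historization scenarios could look like this: a sketch of a hypothetical generator that emits one batch per load cycle, with a stable business key whose attribute changes over time (exercises historization) and a key that only appears in later cycles (exercises delta inserts):

```python
import datetime

def generate_test_increments(base_date, cycles):
    """Yield one illustrative batch per load cycle.
    customer_id 42 changes its city each cycle; 43 arrives from cycle 1 on."""
    for i in range(cycles):
        load_date = base_date + datetime.timedelta(days=i)
        batch = [{"customer_id": "42", "city": f"Berlin_v{i}",
                  "load_date": load_date}]
        if i >= 1:  # a new business key arriving in a later increment
            batch.append({"customer_id": "43", "city": "Hamburg",
                          "load_date": load_date})
        yield batch

batches = list(generate_test_increments(datetime.date(2024, 1, 1), 3))
```

Running the pipeline over these batches and reading the result back through the generated vault-to-source views lets the test compare input and output per scenario.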
Deployment
operations
usage of data
Tools: Vault model, columns, types, comments, lineage