bcodmo / pipeline-generator
Generates a pipeline .yml file
Make a dataflows .py script for pivot/unpivot and see whether it works for our identified use cases.
If this works out, we can add flows directly into the pipeline YAML files.
Example custom flow called from a pipeline:
https://github.com/frictionlessdata/datapackage-pipelines#dataflows-integration
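As a starting point for testing the use cases, here is a minimal, library-free sketch of what pivot and unpivot steps would do to rows. It assumes rows arrive as Python dicts, as they do in dataflows. The function names `unpivot_rows` and `pivot_rows`, the default `variable`/`value` column names, and the sample fields are all hypothetical. dataflows also ships its own `unpivot` processor, which is worth checking before writing a custom one.

```python
from typing import Dict, Iterable, Iterator, List


def unpivot_rows(rows: Iterable[Dict], key_fields: List[str],
                 value_fields: List[str], name_field: str = "variable",
                 value_field: str = "value") -> Iterator[Dict]:
    """Wide -> long: emit one output row per (input row, value field)."""
    for row in rows:
        keys = {k: row[k] for k in key_fields}
        for field in value_fields:
            yield {**keys, name_field: field, value_field: row[field]}


def pivot_rows(rows: Iterable[Dict], key_fields: List[str],
               name_field: str = "variable",
               value_field: str = "value") -> List[Dict]:
    """Long -> wide: group by key fields, spread name/value pairs into columns."""
    grouped: Dict[tuple, Dict] = {}
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        out = grouped.setdefault(key, {k: row[k] for k in key_fields})
        out[row[name_field]] = row[value_field]
    return list(grouped.values())


if __name__ == "__main__":
    # Hypothetical wide table: one temperature column per depth.
    wide = [
        {"station": "A", "temp_10m": 12.1, "temp_20m": 11.4},
        {"station": "B", "temp_10m": 13.0, "temp_20m": 12.2},
    ]
    long_rows = list(unpivot_rows(wide, ["station"], ["temp_10m", "temp_20m"]))
    for r in long_rows:
        print(r)
    # Pivoting the long rows should restore the original wide layout.
    assert pivot_rows(long_rows, ["station"]) == wide
```

`pivot_rows` keeps groups in first-seen order, so a round trip preserves the original row order.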
rename_fields
reorder_fields
fixed_width with load
https://github.com/BCODMO/pipeline-generator/releases/tag/v0.0.2
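The three items listed for this release can be illustrated together in plain Python: slice fixed-width lines into fields, rename fields, then reorder them. This is a hedged sketch of the transformations, not the pipeline-generator processors themselves. The column specs, function names, and sample data are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (field name, start offset, end offset); end is exclusive, like a Python slice.
ColumnSpec = Tuple[str, int, int]


def parse_fixed_width(lines: Iterable[str],
                      specs: List[ColumnSpec]) -> Iterator[Dict[str, str]]:
    """Slice each non-blank line at the given offsets and strip the padding."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        yield {name: line[start:end].strip() for name, start, end in specs}


def rename_fields(rows: Iterable[Dict[str, str]],
                  mapping: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Rename keys per mapping; fields not in the mapping keep their names."""
    for row in rows:
        yield {mapping.get(k, k): v for k, v in row.items()}


def reorder_fields(rows: Iterable[Dict[str, str]],
                   order: List[str]) -> Iterator[Dict[str, str]]:
    """Put the listed fields first, then any remaining fields in original order."""
    for row in rows:
        ordered = {k: row[k] for k in order if k in row}
        ordered.update({k: v for k, v in row.items() if k not in ordered})
        yield ordered


if __name__ == "__main__":
    lines = ["ST01  12.5 2019", "ST02   9.8 2019"]
    specs = [("stn", 0, 4), ("temp", 4, 10), ("yr", 10, 15)]
    rows = parse_fixed_width(lines, specs)
    rows = rename_fields(rows, {"stn": "station", "yr": "year"})
    for row in reorder_fields(rows, ["year", "station"]):
        print(row)
```

All three functions are generators, so they can be chained lazily the way pipeline steps stream rows.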
RE: @mbiddle-bcodmo's request to include custom Python.
Investigate Dataflows and how they are incorporated into FDPs
https://github.com/frictionlessdata/datapackage-pipelines#dataflows-integration
Need a processor that takes a list of files and creates all the steps needed to concatenate them into one resource.
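A sketch of the requested generator follows. It assumes steps are represented as dicts that would later be dumped into the pipeline YAML. The step names and parameter keys (`load`, `concatenate`, `from`, `name`, `sources`, `target`, `fields`) are assumptions modeled loosely on datapackage-pipelines processors and should be checked against the processor documentation. `concat_steps` is a hypothetical helper.

```python
import json
import os
from typing import Dict, List


def concat_steps(paths: List[str], target: str, fields: List[str]) -> List[Dict]:
    """Build one load step per file, then a single concatenate step.

    Step names and parameter keys are assumptions, not a confirmed API.
    """
    steps: List[Dict] = []
    sources: List[str] = []
    for path in paths:
        # Derive a resource name from the file name: "data/cruise_1.csv" -> "cruise_1".
        name = os.path.splitext(os.path.basename(path))[0]
        if name in sources:
            raise ValueError(f"duplicate resource name {name!r} from {path!r}")
        sources.append(name)
        steps.append({"run": "load", "parameters": {"from": path, "name": name}})
    steps.append({
        "run": "concatenate",
        "parameters": {
            "sources": sources,
            "target": {"name": target},
            # Assumed semantics: each target field is fed by the same-named source field.
            "fields": {f: [f] for f in fields},
        },
    })
    return steps


if __name__ == "__main__":
    files = ["data/cruise_1.csv", "data/cruise_2.csv"]
    print(json.dumps(concat_steps(files, "all_cruises", ["station", "depth"]), indent=2))
```

Generating the steps rather than hand-writing them keeps the YAML in sync with the file list; the duplicate-name check guards against two files with the same base name in different directories.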
If we decide to integrate dataflows into our pipelines, there could potentially be three different ways to do find-and-replace.
The approach we currently use, the pipeline's built-in processor, has an issue: blank cells that do not match the find pattern are filled in with the string "None". Currently, the only way to avoid this is a second find-and-replace with the find pattern None (without quotes) and the replace-with field left blank.
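One plausible way a find-and-replace produces the literal string "None" is converting an empty (None) cell with `str()` before applying the regular expression. The sketch below is a hypothesis about the mechanism, not the built-in processor's code. It contrasts that behavior with a version that passes empty cells through untouched. `naive_replace` and `safe_replace` are hypothetical names.

```python
import re
from typing import Optional


def naive_replace(value: Optional[str], pattern: str, repl: str) -> str:
    # str(None) == "None", so an empty cell that does not match comes back as "None".
    return re.sub(pattern, repl, str(value))


def safe_replace(value: Optional[str], pattern: str, repl: str) -> Optional[str]:
    # Leave empty cells alone instead of stringifying them.
    if value is None:
        return None
    return re.sub(pattern, repl, value)


if __name__ == "__main__":
    for cell in ["nd", "12.5", None]:
        print(repr(naive_replace(cell, "nd", "")), repr(safe_replace(cell, "nd", "")))
```

If dataflows is adopted, a row function along the lines of `safe_replace` would remove the need for the second "None" cleanup pass.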
To fully leverage the "profile" property, let's make the BCO-DMO-specific metadata an inline data resource.
https://frictionlessdata.io/specs/data-resource/
{
  ...
  "resources": [
    {
      "id": "http://datadocs.bco-dmo.org/submissiox/yz123",
      ...the description...
    },
    {
      "profile": "http://schema.bco-dmo.org/odo.json",
      "format": "json",
      "data": {
        "@context": {
          "odo": "http://ocean-data.org/schema/"
        },
        "@graph": [
          {
            "@type": "odo:Dataset",
            "@id": "http://datadocs.bco-dmo.org/submissiox/yz123"
          }
        ]
      }
    }
  ],
  ...
}
The profile JSON file http://schema.bco-dmo.org/odo.json would help us validate the information required for ingest: for example, does the Dataset have a name, and are all resources in the data package described?
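Here is a rough, stdlib-only sketch of the kinds of checks the profile could encode, run against a package shaped like the example above. A real profile would presumably be a JSON Schema. The checks, the `odo:name` property, and the function name `check_bcodmo_package` are hypothetical.

```python
from typing import Dict, List

ODO_PROFILE = "http://schema.bco-dmo.org/odo.json"


def check_bcodmo_package(package: Dict) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    resources = package.get("resources", [])
    meta = [r for r in resources if r.get("profile") == ODO_PROFILE]
    if len(meta) != 1:
        return [f"expected exactly one {ODO_PROFILE} resource, found {len(meta)}"]

    problems: List[str] = []
    graph = meta[0].get("data", {}).get("@graph", [])
    datasets = [n for n in graph if n.get("@type") == "odo:Dataset"]
    if not datasets:
        problems.append("no odo:Dataset node in @graph")
    for ds in datasets:
        # "odo:name" is a hypothetical property name for the dataset title.
        if not ds.get("odo:name"):
            problems.append(f"dataset {ds.get('@id')} has no name")

    # Every other resource should be described by some node in the graph.
    described = {n.get("@id") for n in graph}
    for r in resources:
        if r is meta[0]:
            continue
        if r.get("id") not in described:
            problems.append(f"resource {r.get('id')} is not described in @graph")
    return problems


if __name__ == "__main__":
    pkg = {
        "resources": [
            {"id": "http://datadocs.bco-dmo.org/submissiox/yz123"},
            {"profile": ODO_PROFILE, "format": "json",
             "data": {"@graph": [{"@type": "odo:Dataset",
                                  "@id": "http://datadocs.bco-dmo.org/submissiox/yz123"}]}},
        ]
    }
    print(check_bcodmo_package(pkg))
```

Returning a list of problems rather than stopping at the first one gives submitters a complete report in one pass.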
https://github.com/frictionlessdata?utf8=%E2%9C%93&q=datapackage-pipelines&type=&language=
Is there a pattern here that we should pivot towards?
Installation of custom processors into a Docker container:
https://github.com/frictionlessdata/datapackage-pipelines/blob/master/Dockerfile#L9
Example: datapackage-pipelines-aws provides aws.dump.to_s3
https://github.com/frictionlessdata/datapackage-pipelines-aws
QUESTION: How do we package our bcodmo_pipeline so that we can also include our custom processors in a Dockerfile, with something like:
FROM frictionlessdata/datapackage-pipelines:1.7.1
COPY bcodmo_pipeline /bcodmo_pipeline
ENV DPP_PROCESSOR_PATH=/bcodmo_pipeline
????
@akariv says, "You can pip install [custom processors] in case these are datapackage-pipelines extension packages (such as datapackage-pipelines-aws). If these are processors of your own, you should add them in the container (using Docker's ADD or COPY commands) and then set the DPP_PROCESSOR_PATH environment variable so that they become discoverable by dpp."
QUESTION: Do we make a datapackage-pipelines-bcodmo_pipeline extension package?
https://github.com/frictionlessdata/datapackage-pipelines#plugins-and-source-descriptors
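To compare the two options @akariv describes, here is a simplified, hypothetical model of how a processor name could be found in each case. Option 1 searches the directories listed in DPP_PROCESSOR_PATH. Option 2 reads the aws.dump.to_s3 example as plugin `aws` plus processor `dump.to_s3` inside a `datapackage_pipelines_<plugin>` package with a `processors` subdirectory.

Several details are assumptions to confirm against the plugins section linked above:
- the os.pathsep separator for DPP_PROCESSOR_PATH;
- the rule that maps dotted names to file paths;
- the package layout used for plugins.

dpp's actual resolution order may differ. The helper names and the `rename_fields` processor file are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def processor_rel_path(name: str) -> str:
    """Map a dotted name to a relative file path: "dump.to_s3" -> "dump/to_s3.py"."""
    return os.path.join(*name.split(".")) + ".py"


def find_on_processor_path(name: str,
                           env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Option 1 (our own processors): search DPP_PROCESSOR_PATH directories in order.

    Assumes the variable holds an os.pathsep-separated list of directories.
    """
    env = os.environ if env is None else env
    rel_path = processor_rel_path(name)
    for directory in env.get("DPP_PROCESSOR_PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. when the variable is unset
        candidate = os.path.join(directory, rel_path)
        if os.path.isfile(candidate):
            return candidate
    return None


def plugin_processor_path(name: str) -> Tuple[str, str]:
    """Option 2 (extension package): split "plugin.rest" into package and file.

    Assumed layout: datapackage_pipelines_<plugin>/processors/<rest as path>.py
    """
    plugin, _, rest = name.partition(".")
    if not plugin or not rest:
        raise ValueError(f"expected '<plugin>.<processor>', got {name!r}")
    package = "datapackage_pipelines_" + plugin
    return package, os.path.join(package, "processors", processor_rel_path(rest))


if __name__ == "__main__":
    # Mirrors the Dockerfile idea above; prints None unless the file exists.
    print(find_on_processor_path("rename_fields", {"DPP_PROCESSOR_PATH": "/bcodmo_pipeline"}))
    print(plugin_processor_path("aws.dump.to_s3"))
```

Under this reading, a pip-installable datapackage-pipelines-bcodmo_pipeline distribution would presumably contain an importable package named datapackage_pipelines_bcodmo_pipeline. Its processors would then be invoked as bcodmo_pipeline.<processor>, with no DPP_PROCESSOR_PATH setting needed.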
What needs to be done in the flow