Stream2db

Stream JSON documents or a CSV file to a backend. MongoDB and Elasticsearch are currently supported. More backends could be added easily using the node etl driver.

Install

npm install -g git+https://[email protected]:kyv/stream2db.git

Examples

Import compranet from streaming source to elasticsearch.

Since ellison hashes documents before sending them over the wire, those streams will get checked for data corruption.

stream2db https://excel2json.herokuapp.com/https://compranetinfo.funcionpublica.gob.mx/descargas/cnet/Contratos2013.zip

Use CODIGO_CONTRATO as _id

If you do not provide an ID field (--id), a random ID will be generated. If you do set the ID, new documents with the same ID will replace their predecessors.

stream2db -i CODIGO_CONTRATO https://excel2json.herokuapp.com/https://compranetinfo.funcionpublica.gob.mx/descargas/cnet/Contratos2013.zip
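
The replacement behaviour described above can be illustrated with a small in-memory sketch. This is not stream2db's own code: the Map-based store, the save helper and the importe field are hypothetical, and the real backends (Elasticsearch or MongoDB) handle the write themselves.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical in-memory stand-in for an index or collection keyed by _id.
function makeStore() {
  return new Map();
}

// Store one document. `idField` mirrors the --id option: when it is set, the
// value of that field becomes the _id; when it is absent, a random _id is used.
function save(store, doc, idField) {
  const id = idField ? String(doc[idField]) : randomUUID();
  store.set(id, { ...doc, _id: id }); // an existing _id is overwritten
  return id;
}

const withId = makeStore();
save(withId, { CODIGO_CONTRATO: 'A-1', importe: 100 }, 'CODIGO_CONTRATO');
save(withId, { CODIGO_CONTRATO: 'A-1', importe: 250 }, 'CODIGO_CONTRATO');
console.log(withId.size); // 1: the second document replaced the first

const withoutId = makeStore();
save(withoutId, { CODIGO_CONTRATO: 'A-1', importe: 100 });
save(withoutId, { CODIGO_CONTRATO: 'A-1', importe: 250 });
console.log(withoutId.size); // 2: each document got its own random _id
```

Setting an ID field therefore makes repeated imports of the same source idempotent, while omitting it appends a fresh copy of every document on each run.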

Import CSV into cargografias index on elasticsearch

You can use a CSV file as your data source.

stream2db -d cargografias ~/Downloads/Cargografias\ v5\ -\ Nuevos_Datos_CHEQUEATON.csv

Options

You can set some options on the command line.

stream2db -h|--help
--backend DATA BACKEND   Backend to save data to. [mongo|elastic]
--db INDEX|DB            Name of the index (elastic) or database (mongo) where data is written
--type TYPE|COLLECTION   Mapping type (elastic) or collection (mongo).
--id ID                  Specify a field to be used as _id. If hash is specified, the object hash will be used
--uris URIS              Space separated list of URLs to stream
--host HOST              Host to stream to. Default is localhost
--port PORT              Port to stream to. Defaults to 9500 (elastic) or 27017 (mongo)
--converter JAVASCRIPT MODULE   Pass data through some predefined conversion function
--help                   Print this usage guide.

Debugging

The --verbose flag triggers the debugging mode of the DB driver. In elasticsearch this is set to log: trace. The mongo driver allows for configuration by way of environment variables.

Conversion

You can add arbitrary data conversion by exporting a default function from a file in the converters directory and passing the name of that file with the --converter option. A conversion to OCDS has been added as an example. To use it, add --converter ocds to your command line.
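
As a rough sketch of what such a converter might look like, the function below renames one field. The calling convention (one parsed document in, one document out) is an assumption, and renameFields, contractId and importe are hypothetical names chosen for illustration.

```javascript
// Hypothetical converter: takes one parsed document and returns the
// document that should be written to the backend.
function renameFields(doc) {
  const { CODIGO_CONTRATO, ...rest } = doc;
  return { ...rest, contractId: CODIGO_CONTRATO };
}

// In a converter file this function would be the default export, for
// example `export default renameFields;`. It is called directly here so
// that the sketch runs as a plain script.
const converted = renameFields({ CODIGO_CONTRATO: 'A-1', importe: 100 });
console.log(converted);
```

Keeping the converter a pure function of a single document makes it easy to test in isolation before running a full import.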

Notes

Because we are targeting local data management, DB authorization has not been added yet. It will be added to the parameters.

Cleanup

Strings are normalized and trimmed.

Type coercion

We do very simple type coercion. Numbers should work. Anything else you want to do can be easily implemented with a converter.
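
The cleanup and coercion steps could look roughly like the sketch below. It is an illustration under assumptions, not the tool's actual implementation: NFC is assumed as the normalization form, and the helper names are hypothetical.

```javascript
// Trim a string and apply Unicode normalization (NFC is an assumption).
function cleanString(value) {
  return value.normalize('NFC').trim();
}

// Turn numeric-looking strings into numbers; leave everything else alone.
function coerce(value) {
  if (typeof value !== 'string') return value;
  const cleaned = cleanString(value);
  if (cleaned !== '' && Number.isFinite(Number(cleaned))) return Number(cleaned);
  return cleaned;
}

// Apply the cleanup to every top-level field of a document.
function cleanDocument(doc) {
  return Object.fromEntries(
    Object.entries(doc).map(([key, value]) => [key, coerce(value)])
  );
}

const cleanedDoc = cleanDocument({ name: '  Ana  ', amount: ' 42.5 ', code: 'A-1' });
console.log(cleanedDoc); // { name: 'Ana', amount: 42.5, code: 'A-1' }
```

Dates, booleans and nested values are deliberately left untouched here; those are the cases a converter would handle.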

Hashes

We add the field hash to the indexed document. You can use it however you like.
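
To show how a content hash can be attached to a document, here is a minimal sketch using Node's built-in crypto module. The algorithm (SHA-256), the sorted-key serialization and the helper names are assumptions; stream2db may compute its hash differently.

```javascript
const { createHash } = require('node:crypto');

// Serialize with sorted keys so that key order does not change the hash.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Return a copy of the document with a hex digest stored in the `hash` field.
function withHash(doc) {
  const hash = createHash('sha256').update(stableStringify(doc)).digest('hex');
  return { ...doc, hash };
}

const first = withHash({ id: 1, name: 'Ana' });
const second = withHash({ name: 'Ana', id: 1 });
console.log(first.hash === second.hash); // true: same content, same hash
```

A hash of this kind can be used to detect duplicates or to check whether a document changed between two imports.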

k8s

We produce a Docker image that you can use with the *CronJob.yaml files found here to run this code as a CronJob on Kubernetes.
