orenelias / datayoga

This project forked from datayoga-io/datayoga

streaming data pipeline platform

Home Page: https://datayoga-io.github.io/datayoga/

License: Apache License 2.0

Python 100.00%

Introduction

DataYoga is a framework for building and running streaming or batch data pipelines. It takes a low-code approach: pipelines are defined declaratively in YAML files.

Concepts

Job - A Job is composed of a series of Steps that read information from a source, perform transformations, and write to a target. Many sources and targets are supported, including relational databases, non-relational databases, file formats, cloud storage, and HTTP servers.

Step - Each Step runs a Block that uses specific business logic. The output of each Step is fed into the next Step, creating a chain of transformations.

Block - A Block defines the business logic. Blocks can:

  • Read from and write to relational and non-relational databases
  • Read, write, and parse data from local storage and cloud storage
  • Perform transformations, modify structure, add computed fields, rename fields, or remove fields
  • Enrich data from external sources and APIs
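The Job, Step, and Block relationship described above can be sketched in plain Python. This is an illustration only, not DataYoga's API: `rename_field`, `add_field`, `remove_field`, and `run_job` are hypothetical names, and a block is assumed to be a function that maps one record to another.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
# Assumption for this sketch: a block is a function from one record to another.
Block = Callable[[Record], Record]


def rename_field(old: str, new: str) -> Block:
    """Return a block that renames a field, if present."""
    def block(record: Record) -> Record:
        out = dict(record)
        if old in out:
            out[new] = out.pop(old)
        return out
    return block


def add_field(name: str, compute: Callable[[Record], object]) -> Block:
    """Return a block that adds a computed field."""
    def block(record: Record) -> Record:
        out = dict(record)
        out[name] = compute(record)
        return out
    return block


def remove_field(name: str) -> Block:
    """Return a block that drops a field."""
    def block(record: Record) -> Record:
        return {k: v for k, v in record.items() if k != name}
    return block


def run_job(source: Iterable[Record], steps: List[Block]) -> List[Record]:
    """Feed each record through the steps in order; each step's output
    becomes the next step's input."""
    results = []
    for record in source:
        for step in steps:
            record = step(record)
        results.append(record)
    return results


steps = [
    rename_field("fname", "first_name"),
    add_field("initial", lambda r: str(r["first_name"])[0].upper()),
    remove_field("credit_card"),
]
result = run_job([{"fname": "john", "credit_card": "1234"}], steps)
print(result)  # [{'first_name': 'john', 'initial': 'J'}]
```

Because every step has the same record-in, record-out shape, steps can be reordered or swapped without touching the rest of the chain.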

DataYoga Runtime

DataYoga provides a standalone stream processing engine, the DataYoga Runtime, that validates and runs Transformation Jobs. The Runtime provides:

  • Validation
  • Error handling
  • Metrics and observability
  • Credentials management

The Runtime supports multiple stream processing strategies, including buffering and rate limiting. It supports async processing, multi-threading, and multi-processing to enable maximum throughput with a low footprint.
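To make the terms buffering and rate limiting concrete, here is a minimal generator-based sketch of the two ideas. It does not reflect how the DataYoga Runtime implements them; `buffered` and `rate_limited` are hypothetical helpers written for this illustration.

```python
import time
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group records into batches of at most `size` items."""
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch


def rate_limited(items: Iterable[T], max_per_second: float) -> Iterator[T]:
    """Yield items no faster than `max_per_second`."""
    min_interval = 1.0 / max_per_second
    last: Optional[float] = None
    for item in items:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item


# Batch seven records in groups of three, emitting at most 100 batches per second.
batches = list(rate_limited(buffered(range(7), size=3), max_per_second=100))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Buffering trades a little latency for fewer, larger writes to the target, while rate limiting protects a downstream system from being overwhelmed.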

Quickstart

pip install datayoga

Verify that the installation completed successfully by running this command:

datayoga --version

Create New DataYoga Project

To create a new DataYoga project, use the init command:

datayoga init hello_world
cd hello_world

Directory structure

Run Your First Job

Let's run our first job. The init command creates it in the samples folder:

datayoga run sample.hello

If all goes well, you should see some startup logs, and eventually:

{"id": "1", "fname": "john", "lname": "doe", "credit_card": "1234-1234-1234-1234", "country_code": "972", "country_name": "israel", "gender": "M", "full_name": "John Doe", "greeting": "Hello Mr. John Doe"}
{"id": "2", "fname": "jane", "lname": "doe", "credit_card": "1000-2000-3000-4000", "country_code": "972", "country_name": "israel", "gender": "F", "full_name": "Jane Doe", "greeting": "Hello Ms. Jane Doe"}
{"id": "3", "fname": "bill", "lname": "adams", "credit_card": "9999-8888-7777-666", "country_code": "1", "country_name": "usa", "gender": "M", "full_name": "Bill Adams", "greeting": "Hello Mr. Bill Adams"}

That's it! You've run your first job, which loads data from a CSV file, runs it through a series of transformation steps, and writes the result to standard output. A good start. Read on for a more detailed tutorial or check out the reference to see the different block types currently available.
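For intuition, the derived `full_name` and `greeting` fields in the output above could be produced by logic like the following. This is a plain-Python approximation inferred from the output, not the sample job's actual definition: the input CSV is a hypothetical reconstruction using a subset of the fields, and `add_full_name` and `add_greeting` are illustrative names.

```python
import csv
import io
import json

# Hypothetical input, reconstructed from some of the fields in the output above.
SAMPLE_CSV = """id,fname,lname,gender
1,john,doe,M
2,jane,doe,F
"""


def add_full_name(record):
    """Add a capitalized full name built from fname and lname."""
    out = dict(record)
    out["full_name"] = f"{record['fname'].capitalize()} {record['lname'].capitalize()}"
    return out


def add_greeting(record):
    """Add a greeting whose title depends on the gender field."""
    title = "Mr." if record["gender"] == "M" else "Ms."
    out = dict(record)
    out["greeting"] = f"Hello {title} {record['full_name']}"
    return out


rows = []
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    row = add_greeting(add_full_name(row))
    rows.append(row)
    # One JSON document per line, as in the job output.
    print(json.dumps(row))
```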
