This project forked from dagster-io/dagster

A Python library for building data applications: ETL, ML, Data Pipelines, and more.

License: Apache License 2.0

Dagster

Dagster is a system for building modern data applications.

  • Elegant programming model: Dagster is a set of abstractions for building self-describing, testable, and reliable data applications. It embraces the principles of functional data programming; gradual, optional typing; and testability as a first-class value.

  • Flexible & incremental: Dagster integrates with your existing tools and infrastructure, and can invoke any computation, whether it be Spark, Python, a Jupyter notebook, or SQL. It is also designed to deploy to any workflow engine, such as Airflow.

  • Beautiful tools: Dagster's development environment, dagit, is designed for data engineers, machine learning engineers, and data scientists, and enables highly productive local development.
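
To make "gradual, optional typing" concrete, here is a minimal plain-Python sketch of the idea. It is not Dagster's implementation and uses none of its API: the `check_inputs` decorator and the `greet` function are hypothetical names introduced only for illustration. Arguments whose parameters carry a type annotation are checked at call time, while unannotated parameters are passed through untouched. The sketch assumes annotations are plain classes such as `str` or `int`.

```python
import inspect


def check_inputs(func):
    """Validate only the arguments whose parameters carry a type annotation."""
    signature = inspect.signature(func)

    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            # Unannotated parameters are skipped: typing is opt-in.
            if expected is inspect.Parameter.empty:
                continue
            # Assumes the annotation is a plain class (str, int, ...).
            if not isinstance(value, expected):
                raise TypeError(
                    "{}: expected {}, got {}".format(
                        name, expected.__name__, type(value).__name__
                    )
                )
        return func(*args, **kwargs)

    return wrapper


@check_inputs
def greet(name: str, punctuation):
    # 'name' is checked against str; 'punctuation' has no annotation.
    return "Hello, {}{}".format(name, punctuation)
```

Because the check is opt-in per parameter, existing functions keep working unannotated and can be tightened one argument at a time, which is the sense in which the typing is gradual.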

Getting Started

Installation

pip install dagster dagit

This installs two modules:

  • dagster: The core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
  • dagit: A UI and rich development environment for Dagster, including a DAG browser, a type-aware config editor, and a streaming execution interface.

Hello dagster 👋

hello_dagster.py

from dagster import execute_pipeline, pipeline, solid


@solid
def get_name(_):
    return 'dagster'


@solid
def hello(context, name: str):
    context.log.info('Hello, {name}!'.format(name=name))


@pipeline
def hello_pipeline():
    hello(get_name())

Let's execute our first pipeline via any of three different mechanisms:

  • From arbitrary Python scripts, use dagster’s Python API

    if __name__ == "__main__":
        execute_pipeline(hello_pipeline)  # Hello, dagster!
  • From the command line, use the dagster CLI

    $ dagster pipeline execute -f hello_dagster.py -n hello_pipeline
  • From the Dagit GUI

    $ dagit -f hello_dagster.py -n hello_pipeline

    And then navigate to http://localhost:3000 to start using Dagit
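
Whichever mechanism is used, the pipeline has to be run in dependency order: get_name first, then hello with its output. The following is a rough, standard-library-only sketch of that idea, not a description of how Dagster's engines work internally. The `STEPS` registry and `run_steps` helper are hypothetical names made up for this illustration, the two functions are plain-Python stand-ins for the solids above (without the `context` argument), and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def get_name():
    return "dagster"


def hello(name):
    return "Hello, {name}!".format(name=name)


# Hypothetical registry: step name -> (function, names of upstream steps).
STEPS = {
    "get_name": (get_name, []),
    "hello": (hello, ["get_name"]),
}


def run_steps(steps):
    """Run each step after its upstream steps, passing their outputs along."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        # Upstream outputs become positional arguments, in declared order.
        outputs[name] = func(*(outputs[dep] for dep in deps))
    return outputs


if __name__ == "__main__":
    print(run_steps(STEPS)["hello"])
```

Keeping each step a plain function of its inputs is what lets the same definition be driven from a script, a CLI, or a UI without changes.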

Learn

Next, jump right into our tutorial, or read our complete documentation. If you're actively using Dagster or have questions on getting started, we'd love to hear from you.


Contributing

For details on contributing or running the project for development, check out our contributing guide.

Integrations

Dagster works with the tools and systems that you're already using with your data, including:

Integration (Dagster Library): description

  • Apache Airflow (dagster-airflow): Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs.
  • Apache Spark (dagster-spark · dagster-pyspark): Libraries for interacting with Apache Spark and PySpark.
  • Dask (dagster-dask): Provides a Dagster integration with Dask / Dask.Distributed.
  • Datadog (dagster-datadog): Provides a Dagster resource for publishing metrics to Datadog.
  • Jupyter / Papermill (dagstermill): Built on the papermill library, dagstermill is meant for integrating productionized Jupyter notebooks into Dagster pipelines.
  • PagerDuty (dagster-pagerduty): A library for creating PagerDuty alerts from Dagster workflows.
  • Snowflake (dagster-snowflake): A library for interacting with the Snowflake Data Warehouse.

Cloud Providers

  • AWS (dagster-aws): A library for interacting with Amazon Web Services. Provides integrations with S3, EMR, and (coming soon!) Redshift.
  • GCP (dagster-gcp): A library for interacting with Google Cloud Platform. Provides integrations with BigQuery and Cloud Dataproc.

This list is growing as we are actively building more integrations, and we welcome contributions!

Example Projects

Several example projects are provided under the examples folder demonstrating how to use Dagster, including:

  1. examples/airline-demo: A substantial demo project illustrating how these tools can be used together to manage a realistic data pipeline.
  2. examples/event-pipeline-demo: An example illustrating a typical web event processing pipeline with S3, Scala Spark, and Snowflake.

Contributors

schrockn, mgasner, alangenfeld, helloworld, prha, bengotow, freiksenet, asingh16, abegong, yuhan, natekupp, catherinewu, puma314, kevinrodriguez-io, pedronauck, aylr, cclauss, jbrambledc, thethingstheycoded, rparrapy, ramshackle-jamathon, pseudopixels, habibutsu, jinnovation, jkimbo, jtmiclat, jmswaney, kdungs, rockymeza, zzztimbo
