
frantic / dagster


This project forked from dagster-io/dagster


A data orchestrator for machine learning, analytics, and ETL.

Home Page: https://dagster.io

License: Apache License 2.0

Makefile 0.04% Python 79.85% Jupyter Notebook 1.28% Shell 0.05% HTML 0.07% TypeScript 18.11% JavaScript 0.12% Dockerfile 0.17% Scala 0.19% Mako 0.03% CSS 0.02% Smarty 0.06%

dagster's Introduction



Dagster

Dagster is a data orchestrator for machine learning, analytics, and ETL

Dagster lets you define pipelines in terms of the data flow between reusable, logical components, then test locally and run anywhere. With a unified view of pipelines and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke.

Dagster is designed for data platform engineers, data engineers, and full-stack data scientists. Building a data platform with Dagster makes your stakeholders more independent and your systems more robust. Developing data pipelines with Dagster makes testing easier and deploying faster.

Develop and test on your laptop, deploy anywhere

With Dagster’s pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premise or in any cloud.
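
The idea of pluggable execution can be illustrated without Dagster at all. The sketch below is plain standard-library Python, not Dagster's API or implementation: `InProcessExecutor`, `square`, and `run_pipeline` are hypothetical names chosen for illustration. It shows one pipeline definition running unchanged on two different executors.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import List


class InProcessExecutor(Executor):
    """Hypothetical executor that runs each call immediately in the caller's thread."""

    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def square(x: int) -> int:
    return x * x


def run_pipeline(executor: Executor, values: List[int]) -> int:
    # The pipeline only talks to the Executor interface, so the
    # execution backend can be swapped without touching this code.
    return sum(executor.map(square, values))


values = [1, 2, 3]
local_total = run_pipeline(InProcessExecutor(), values)
with ThreadPoolExecutor(max_workers=2) as pool:
    pooled_total = run_pipeline(pool, values)

print(local_total, pooled_total)  # 14 14
```

Because `run_pipeline` depends only on the executor interface, moving from local to pooled execution is a change at the call site rather than in the pipeline itself.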

Model and type the data produced and consumed by each step

Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
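
As a rough illustration of why typing at step boundaries catches bugs early, here is a plain-Python sketch. It is not Dagster's type system: `check_type`, `run_step`, and the example steps are hypothetical helpers, and only plain-class annotations such as `str` are checked.

```python
import inspect
from typing import Any, Callable


def check_type(step_name: str, label: str, value: Any, expected: Any) -> None:
    """Raise TypeError if `value` does not match a plain-class annotation."""
    if expected is inspect.Signature.empty:
        return  # no annotation, so nothing to check
    if isinstance(expected, type) and not isinstance(value, expected):
        raise TypeError(
            "{step}: {label} expected {exp}, got {got}".format(
                step=step_name,
                label=label,
                exp=expected.__name__,
                got=type(value).__name__,
            )
        )


def run_step(step: Callable[..., Any], **inputs: Any) -> Any:
    """Run one step, checking annotated inputs and the annotated output."""
    sig = inspect.signature(step)
    for name, value in inputs.items():
        check_type(step.__name__, "input '%s'" % name, value,
                   sig.parameters[name].annotation)
    output = step(**inputs)
    check_type(step.__name__, "output", output, sig.return_annotation)
    return output


def get_name() -> str:
    return "dagster"


def greet(name: str) -> str:
    return "Hello, {name}!".format(name=name)


def bad_name() -> str:
    return 42  # deliberately wrong: annotated as str


print(run_step(greet, name=run_step(get_name)))  # Hello, dagster!

try:
    run_step(greet, name=run_step(bad_name))
except TypeError as err:
    print("caught:", err)  # caught: bad_name: output expected str, got int
```

The error is raised at the boundary of the step that produced the bad value, rather than somewhere downstream where the value is eventually used.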

Link data to computations

Dagster’s Asset Manager tracks the data sets and ML models produced by your pipelines, so you can understand how they were generated and trace issues when they don’t look how you expect.

Build a self-service data platform

Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.

Avoid dependency nightmares

Dagster’s repository model lets you isolate codebases, so that problems in one pipeline don’t bring down the rest. Each pipeline can have its own package dependencies and Python version. Pipelines run in isolated processes so user code issues can't bring the system down.
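
The benefit of process isolation can be shown with a small standard-library sketch. This is not how Dagster launches runs; `run_isolated` is a hypothetical helper that executes a snippet of user code in a fresh Python interpreter, so a failure there is reported to the parent instead of crashing it.

```python
import subprocess
import sys


def run_isolated(source: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `source` in a separate Python process and capture its result."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


healthy = run_isolated("print('step ok')")
broken = run_isolated("raise RuntimeError('user code bug')")

# The parent process is still running and can inspect both outcomes.
print("healthy exit code:", healthy.returncode)
print("broken exit code:", broken.returncode)
```

The parent sees a non-zero exit code and the captured traceback in `broken.stderr`, and can decide how to report or retry the failed step.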

Debug pipelines from a rich UI

Dagit, Dagster’s web interface, includes expansive facilities for understanding the pipelines it orchestrates. When inspecting a pipeline run, you can query over logs, discover the most time-consuming tasks via a Gantt chart, re-execute subsets of steps, and more.

Getting Started

Installation

pip install dagster dagit

This installs two modules:

  • Dagster: the core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
  • Dagit: the UI for developing and operating Dagster pipelines, including a DAG browser, a type-aware config editor, and a live execution interface.

Hello dagster 👋

hello_dagster.py

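from dagster import execute_pipeline, pipeline, solid


# A solid is a unit of computation; this one takes no inputs and returns a name.
@solid
def get_name(_):
    return 'dagster'


# The type annotation on `name` declares the expected input type.
@solid
def hello(context, name: str):
    context.log.info('Hello, {name}!'.format(name=name))


# The pipeline wires the output of get_name into the input of hello.
@pipeline
def hello_pipeline():
    hello(get_name())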

Save the code above in a file named hello_dagster.py. You can execute the pipeline using any one of the following methods:

(1) Dagster Python API

if __name__ == "__main__":
    execute_pipeline(hello_pipeline)   # Hello, dagster!

(2) Dagster CLI

$ dagster pipeline execute -f hello_dagster.py

(3) Dagit web UI

$ dagit -f hello_dagster.py

Learn

Next, jump right into our tutorial, or read our complete documentation. If you're actively using Dagster or have questions on getting started, we'd love to hear from you.


Contributing

For details on contributing or running the project for development, check out our contributing guide.

Integrations

Dagster works with the tools and systems that you're already using with your data, including:

Integration (Dagster library)

  • Apache Airflow (dagster-airflow): Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs.
  • Apache Spark (dagster-spark · dagster-pyspark): Libraries for interacting with Apache Spark and PySpark.
  • Dask (dagster-dask): Provides a Dagster integration with Dask / Dask.Distributed.
  • Datadog (dagster-datadog): Provides a Dagster resource for publishing metrics to Datadog.
  • Jupyter / Papermill (dagstermill): Built on the papermill library, dagstermill is meant for integrating productionized Jupyter notebooks into dagster pipelines.
  • PagerDuty (dagster-pagerduty): A library for creating PagerDuty alerts from Dagster workflows.
  • Snowflake (dagster-snowflake): A library for interacting with the Snowflake Data Warehouse.

Cloud Providers

  • AWS (dagster-aws): A library for interacting with Amazon Web Services. Provides integrations with Cloudwatch, S3, EMR, and Redshift.
  • Azure (dagster-azure): A library for interacting with Microsoft Azure.
  • GCP (dagster-gcp): A library for interacting with Google Cloud Platform. Provides integrations with GCS, BigQuery, and Cloud Dataproc.

This list is growing as we are actively building more integrations, and we welcome contributions!

dagster's People

Contributors

abegong, ajnadel, alangenfeld, asingh16, aylr, bengotow, catherinewu, cclauss, davidkatz-il, dpeng817, fishmanl, freiksenet, gibsondan, hellendag, helloworld, jbrambledc, jmsanders, johannkm, kevinrodriguez-io, kinghuang, mgasner, nancydyc, natekupp, pedronauck, prha, puma314, rexledesma, sd2k, sryza, yuhan
