
This project was forked from superdao-inc/airflow-dags.


Superdao airflow

This repository contains Airflow pipelines (DAGs) that derive blockchain data from a dedicated node via Ethereum ETL into a Postgres database with a fairly flat structure.

Required airflow connections (referenced by ID from within the DAGs, as sketched below):

  • clickhouse-eth-data - connection to ClickHouse for raw data and DBT models
  • google-cloud - connection to Google Cloud Storage (for transferring datasets between databases)
  • pg-prod-scoring-api - connection to the Postgres database where the showcase API is hosted
  • slack_notifications - connection to Slack (for notifications)
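
A minimal sketch of how a task might resolve one of these connections by its ID (this is generic Airflow hook usage for illustration, not code from this repository):

    from airflow.hooks.base import BaseHook

    # Look up a connection configured in Airflow by its ID.
    # "clickhouse-eth-data" is one of the required connection IDs listed above.
    conn = BaseHook.get_connection("clickhouse-eth-data")

    # The Connection object exposes whatever was configured in the Airflow UI.
    print(conn.host, conn.port, conn.schema)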

Project structure

  • /dags - the main package containing definitions of our airflow dags
  • /common - common modules
  • /deployments/local - contains a docker-compose file for local debugging
  • /dbt_clickhouse - contains DBT models

Directory Descriptions

  • analytics: Contains DAGs related to analytics tasks, such as generating reports, data analysis, or running machine learning models.
  • api: Includes DAGs related to API integration tasks, such as fetching and processing data from external APIs.
  • attributes: Contains DAGs focused on extracting and processing attribute-related data, such as wallet attributes or labels.
  • audiences: Holds DAGs related to audience-related tasks, such as creating and updating audience segments.
  • chains/ethereum_etl: Contains DAGs specifically related to Ethereum ETL tasks, handling the extraction, transformation, and loading of Ethereum blockchain data.
  • control: Contains DAGs that serve as control mechanisms or orchestrators for other DAGs.
  • dbt: Holds DAGs related to DBT (Data Build Tool) tasks, responsible for transforming and modeling the extracted data.
  • deanonimization: Includes DAGs related to de-anonymization tasks, linking anonymized data to specific individuals or entities.
  • ens: Contains DAGs related to Ethereum Name Service (ENS) tasks, including data extraction, processing, and storage.
  • erc_1155: Holds DAGs specifically related to ERC-1155 token tasks, including data extraction, processing, and storage.
  • external: Includes DAGs that interact with external systems or services outside of the Airflow environment.
  • ml: Contains DAGs related to machine learning tasks, such as training models and running data pipelines.
  • nft_holders: Holds DAGs specifically related to NFT (Non-Fungible Token) holder tasks, including data extraction, processing, and storage.
  • report: Contains DAGs related to generating and delivering reports based on the extracted data.
  • snapshot: Holds DAGs that create snapshots of the blockchain data at specific points in time.
  • tests: Includes DAGs specifically designed for testing purposes, such as testing data pipelines or validating data quality.
  • top_collections: Contains DAGs related to tasks involving top collections or popular items within a collection.
  • utility: Holds utility or helper DAGs providing common functions or tasks used across other DAGs.
  • monodag.py: A single DAG definition that encapsulates one specific workflow or task (a minimal example of such a definition is sketched below).
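
For orientation, here is a rough sketch of what a DAG definition inside /dags might look like; the dag_id, schedule, and task below are hypothetical and only illustrate the general shape of an Airflow DAG:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_records():
        # Hypothetical task body; real DAGs here would call shared logic
        # from the /common package.
        pass


    with DAG(
        dag_id="example_extract",        # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="extract_records",
            python_callable=extract_records,
        )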

Pipeline Details

The airflow DAGs in this repository follow this pipeline (a condensed, hypothetical sketch appears after the list):

  • Load raw data: The DAG uses the open-source tool named Ethereum ETL (https://github.com/blockchain-etl/ethereum-etl) to download raw data from the Ethereum blockchain.

  • Insert into ClickHouse: The downloaded raw data is then inserted into ClickHouse, a columnar database optimized for analytics.

  • Business Analytics with DBT: The data stored in ClickHouse is further processed using DBT (https://github.com/dbt-labs/dbt-core) to calculate several business analytics entities. DBT provides a powerful toolkit for transforming and modeling data.

  • API Integration: The resulting tables or models from DBT are then utilized in an API, allowing users to access the derived analytics data.
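
Condensed into a single hypothetical DAG, the first three steps might be wired together as below. The ethereumetl, clickhouse-client, and dbt invocations mirror the tools linked above, but every ID, path, table name, and block range is a placeholder; the API integration step happens outside Airflow, in the service that reads the resulting tables.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="eth_pipeline_sketch",    # hypothetical
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # 1. Load raw data from the dedicated node with Ethereum ETL.
        export = BashOperator(
            task_id="export_blocks_and_transactions",
            bash_command=(
                "ethereumetl export_blocks_and_transactions "
                "--start-block {{ params.start }} --end-block {{ params.end }} "
                "--provider-uri $ETH_NODE_URI "   # node URI: assumed env var
                "--blocks-output /tmp/blocks.csv "
                "--transactions-output /tmp/transactions.csv"
            ),
            params={"start": 0, "end": 999},
        )

        # 2. Insert the exported CSV into ClickHouse (illustrative table name).
        load = BashOperator(
            task_id="load_into_clickhouse",
            bash_command=(
                "clickhouse-client --query "
                "'INSERT INTO eth.blocks FORMAT CSVWithNames' < /tmp/blocks.csv"
            ),
        )

        # 3. Build the business-analytics models defined in /dbt_clickhouse.
        transform = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /dbt_clickhouse",
        )

        export >> load >> transform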

Debug on localhost

make run-local - runs a local airflow instance with CeleryExecutor and all production connections (postgres, clickhouse, google-cloud)

make down - stops the local airflow instance and deletes all containers and volumes

make delete - the same as make down above, but also removes all images

