
zksync-era-ETL: on-chain data tool

Introduction

As Ethereum continues to evolve, the role of Layer 2 (L2) solutions like rollups becomes increasingly pivotal. These innovations are crucial in reducing transaction costs on Ethereum, but they also present new challenges, such as fragmented liquidity. In this rapidly changing landscape, leading L2 platforms are gaining prominence, and I anticipate that in the near future, a select few will handle the majority of significant transactions.

In this regard, zkSync stands out as a potential leader. Its continuous optimization positions it alongside other major L2 solutions like Optimism and Arbitrum. Recognizing zkSync's potential to become a 'Super Rollup', I developed zkSync-ETL. This tool is designed for efficient and real-time access to on-chain data, a crucial need for developers and analysts in the Ethereum ecosystem.

zkSync-ETL is an ongoing project, and we warmly welcome ideas, feedback, and contributions. Your input helps ensure it remains a valuable resource for anyone looking to leverage the power of zkSync in their Ethereum-based applications.

Architecture

High-Level

zkSync-ETL is structured into two primary components: the /data directory for data storage, and the /era module for data processing, which in turn comprises the /rpc, /json, and /db stages described below.

Data Acquisition (/rpc Module): This module interfaces with the zkSync RPC; running a local node is advisable (see the external node documentation for guidance). It retrieves raw block and transaction data in JSON format.

Data Processing (/json Module): Within the /json module, raw data undergoes cleaning and processing, transforming it into seven core tables:

  • accounts
  • balances
  • blocks
  • contracts
  • SyncSwap swaps
  • token transfers
  • transactions

Future updates aim to include data from mainstream DEXs, NFTs, and derivative protocols.
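To make the table shapes concrete, here is a hypothetical sketch of how a raw ERC-20 Transfer log could be flattened into one clean token-transfers row. The field names are illustrative only; the project's actual schema lives in /json/structures.

```python
# keccak256("Transfer(address,address,uint256)") — the standard ERC-20 event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log: dict) -> dict:
    """Flatten a raw Transfer event log into one clean record (illustrative fields)."""
    assert log["topics"][0] == TRANSFER_TOPIC
    return {
        "token": log["address"],
        "from": "0x" + log["topics"][1][-40:],  # indexed sender, right-padded topic
        "to": "0x" + log["topics"][2][-40:],    # indexed recipient
        "value": int(log["data"], 16),          # transfer amount
        "tx_hash": log["transactionHash"],
    }
```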

Database Management (/db Module): The db module is responsible for creating PostgreSQL tables and data schemas. It imports all data in CSV format into these tables. This setup enables the development of custom data programs akin to Dune, Nansen, and The Graph, utilizing zkSync data. Additionally, these datasets can be instrumental in researching the Ethereum and zkSync ecosystems.
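As a rough illustration of the import step, the snippet below builds a PostgreSQL COPY statement for bulk-loading a clean CSV file. The table and column names are placeholders, and the actual /db/exporter implementation may differ.

```python
def copy_sql(table: str, columns: list[str]) -> str:
    """Build a COPY ... FROM STDIN statement for a header-bearing clean CSV file."""
    cols = ", ".join(columns)
    return f"COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)"

# With psycopg2 this would be used roughly as (sketch, not run here):
#   with open("transactions.csv") as f, conn.cursor() as cur:
#       cur.copy_expert(copy_sql("transactions", ["hash", "block_number"]), f)
```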

Low-Level

  • /data

    • /json_raw_data: Raw JSON data of blocks & transactions.
    • /json_clean_data: Clean JSON data of all tables.
    • /json_to_csv: Clean CSV data of all tables, prepared for import into the PostgreSQL DB.
  • /era

    • /rpc: Get raw JSON data from zkSync RPC.

      • /fetch: Calls to fetch raw block and transaction data.
      • /trace: Call to get raw trace data.
    • /json: Convert raw JSON data into clean JSON/CSV data, plus application-specific cleaners.

      • /structures: Define the data structure of the base tables.
      • /resolver: Helpers for converting the base tables from raw data to clean data.
      • /cleaner: Important module to convert all raw JSON data to clean JSON and CSV data. Parsing for more applications will also be encapsulated in this module.
    • /db: Module for importing data into a database.

      • /schemas: Define the data structure of all tables in the PostgreSQL database.
      • /exporter: Import clean CSV data from all tables into the database.
    • /setup: Some basic setup.

      • /config: Block ranges, file size, folder size, RPC URL, etc.
      • /tokens: Token addresses for balance data.
    • /utils: All the utils crates used as dependencies of the module crates above.

How to use it

Create a VENV:

It is recommended to run the ETL inside a virtual environment.

# Create venv
brew install pyenv
pyenv virtualenv 3.11.4 myenv

# Activate (either method works)
pyenv activate myenv
source ~/.pyenv/versions/myenv/bin/activate

Setup

In the /setup module, configure the block range, folder size, and RPC URL for data retrieval.

Block Range: Select the specific range of blocks to source your on-chain data.

File Size and Folder Size: By default, data is stored in files of 10,000 blocks each and folders of 100,000 blocks each. Adjust these settings based on your storage preferences.

RPC URL: While the default setting is the zkSync public RPC, it's advisable to use a local node for performance. For setup details, please refer to the zkSync team's external node documentation.

# Example
FILE_SIZE = 10000  # 10k
FOLDER_SIZE = 100000  # 100k

START_BLOCK = 0
END_BLOCK = 1000000  # block 0 to 999,999
BATCH_SIZE = 100
MULTI_BATCH_SIZE = 100

BALANCE_BATCH_SIZE = 10

RPC_URL = 'https://mainnet.era.zksync.io'
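The FILE_SIZE and FOLDER_SIZE constants determine where each block's data lands on disk. As an illustration (the project's actual layout logic may differ), a block number can be mapped to its file and folder boundaries like this:

```python
# Illustrative helper: map a block number to the folder/file that would hold it,
# using the default sizes from the config above.
FILE_SIZE = 10_000    # blocks per file
FOLDER_SIZE = 100_000  # blocks per folder

def block_location(block: int) -> tuple[int, int]:
    """Return (folder_start, file_start) for the file holding `block`."""
    folder_start = (block // FOLDER_SIZE) * FOLDER_SIZE
    file_start = (block // FILE_SIZE) * FILE_SIZE
    return folder_start, file_start

print(block_location(123_456))  # → (100000, 120000)
```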

Data Processing Procedure

# Get raw data from RPC
python -m era.rpc.fetch.call

# Get clean data from raw data
python -m era.json.cleaner.all

# Create schemas for DB
python -m era.db.schemas.create

# Import all data into DB
python -m era.db.exporter.all
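Under the hood, the fetch step talks to the RPC endpoint over standard Ethereum JSON-RPC, which zkSync Era supports. Below is a minimal, hypothetical sketch of fetching one raw block with eth_getBlockByNumber; it is not the project's actual fetch code.

```python
import json
import urllib.request

RPC_URL = "https://mainnet.era.zksync.io"

def get_block_payload(number: int) -> dict:
    """Build an eth_getBlockByNumber JSON-RPC request for one block."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": [hex(number), True],  # True = include full transaction objects
    }

def fetch_block(number: int) -> dict:
    """Fetch one raw block from the RPC endpoint (performs a network call)."""
    req = urllib.request.Request(
        RPC_URL,
        data=json.dumps(get_block_payload(number)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]
```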

Contribution

Contributions of any kind are welcome! 🎉

zksync-era-etl's People

Contributors

luozhuzhang
