blockchain-etl / eos-etl

ETL scripts for EOS.

License: MIT License

Dockerfile 0.43% Python 99.57%
eos eosio etl apache-beam blockchain-analytics crypto cryptocurrency data-analytics data-engineering gcp

eos-etl's Introduction

EOS ETL

Install EOS ETL:

pip install eos-etl

Export blocks, transactions and actions (Schema, Reference):

> eosetl export_blocks --start-block 1 --end-block 500000 \
--provider-uri http://api.main.alohaeos.com \
--blocks-output blocks.json --transactions-output transactions.json --actions-output actions.json

Stream blockchain data continually to console:

> pip install eos-etl[streaming]
> eosetl stream -p http://api.main.alohaeos.com --start-block 500000

For the latest version, check out the repo and call

> pip install -e .[streaming] 
> python eosetl.py

Table of Contents

Schema

Exporting the Blockchain

  1. Install python 3.6.0+ https://www.python.org/downloads/

  2. Install an EOS node or get access to an EOS node maintained by someone else (running your own node is not easy). Some docs:

  3. Make sure it has downloaded the blocks that you need by executing in the terminal:
curl --request POST \
  --url https://localhost:8080/v1/chain/get_info \
  --header 'accept: application/json'

You can export blocks below last_irreversible_block_num; there is no need to wait for the full sync.

  4. Install EOS ETL:

    > pip install eos-etl
  5. Export blocks, transactions and actions:

    > eosetl export_all --start 1 --end 499999  \
    --provider-uri http://api.main.alohaeos.com

    In case eosetl command is not available in PATH, use python -m eosetl instead.

    The result will be in the output subdirectory, partitioned in Hive style:

    output/blocks/start_block=00000000/end_block=00000099/blocks_00000000_00000099.csv
    output/blocks/start_block=00000100/end_block=00000199/blocks_00000100_00000199.csv
    ...
    output/transactions/start_block=00000000/end_block=00000099/transactions_00000000_00000099.csv
    ...
    output/actions/start_block=00000000/end_block=00000099/actions_00000000_00000099.csv
    ...
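
The node check and the output layout described above can be sketched in Python. This is only an illustration, not code from eos-etl: fetch_chain_info, is_exportable and partition_path are hypothetical helper names, and the 8-digit zero padding is read off the example paths. Only the /v1/chain/get_info endpoint, the last_irreversible_block_num field and the path layout come from the steps above.

```python
import json
import urllib.request


def fetch_chain_info(node_url):
    """POST to /v1/chain/get_info on an EOS node and return the parsed JSON.

    Hypothetical helper: mirrors the curl call above, needs network access.
    """
    request = urllib.request.Request(
        node_url.rstrip("/") + "/v1/chain/get_info",
        data=b"",
        headers={"accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def is_exportable(end_block, chain_info):
    """True if end_block is not above the last irreversible block."""
    return end_block <= chain_info["last_irreversible_block_num"]


def partition_path(entity, start_block, end_block, extension="csv"):
    """Build a Hive-style path like the ones listed above (assumed 8-digit padding)."""
    start = f"{start_block:08d}"
    end = f"{end_block:08d}"
    return (
        f"output/{entity}/start_block={start}/end_block={end}/"
        f"{entity}_{start}_{end}.{extension}"
    )
```

For example, is_exportable(499999, fetch_chain_info("http://api.main.alohaeos.com")) would tell you whether the range used in step 5 is already final on that node.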

Running in Docker

  1. Install Docker https://docs.docker.com/install/

  2. Build a docker image

    > docker build -t eos-etl:latest .
    > docker image ls
  3. Run a container out of the image

    > MSYS_NO_PATHCONV=1 docker run -v $HOME/output:/eos-etl/output eos-etl:latest \
        export_blocks --max-workers 50 --start-block 30000000 \
        --end-block 30000100 --provider-uri http://your_eos_node:node_port \
        --blocks-output ./output/blocks.csv --transactions-output ./output/transactions.csv \
        --actions-output ./output/actions.csv
  4. Run streaming to console or Pub/Sub

    > MSYS_NO_PATHCONV=1 docker build -t eos-etl:latest-streaming -f Dockerfile_with_streaming .
    > echo "Stream to console"
    > MSYS_NO_PATHCONV=1 docker run eos-etl:latest-streaming stream -p http://api.main.alohaeos.com --start-block 500000
    > echo "Stream to Pub/Sub"
    > MSYS_NO_PATHCONV=1 docker run -v /path_to_credentials_file/:/eos-etl/ --env GOOGLE_APPLICATION_CREDENTIALS=/eos-etl/credentials_file.json eos-etl:latest-streaming stream -p http://api.main.alohaeos.com --start-block 500000 --output projects/your-project/topics/crypto_eos
  5. Refer to https://github.com/blockchain-etl/blockchain-etl-streaming for deploying the streaming app to Google Kubernetes Engine.

Command Reference

All the commands accept -h parameter for help, e.g.:

> python eosetl.py export_blocks --help
Usage: eosetl.py export_blocks [OPTIONS]

  Export blocks, transactions and actions.

Options:
  -s, --start-block INTEGER   Start block
  -e, --end-block INTEGER     End block  [required]
  -p, --provider-uri TEXT     The URI of the remote EOS node
  -w, --max-workers INTEGER   The maximum number of workers.
  --blocks-output TEXT        The output file for blocks. If not provided
                              blocks will not be exported. Use "-" for stdout
  --transactions-output TEXT  The output file for transactions. If not
                              provided transactions will not be exported. Use
                              "-" for stdout
  --actions-output TEXT       The output file for actions. If not provided
                              transactions will not be exported. Use "-"
                              for stdout
  --help                      Show this message and exit.

For the --blocks-output, --transactions-output and --actions-output parameters, the supported type is json. The format is inferred from the output file name.

export_blocks

> python eosetl.py export_blocks --start-block 1 --end-block 500000 \
  --provider-uri http://api.main.alohaeos.com \
  --blocks-output blocks.json --transactions-output transactions.json --actions-output actions.json

Omit any of the --blocks-output, --transactions-output or --actions-output options to skip exporting blocks, transactions or actions respectively.

You can tune --max-workers for performance.

get_block_range_for_date

> python eosetl.py get_block_range_for_date --provider-uri http://api.main.alohaeos.com --date=2018-06-09

export_all

> python eosetl.py export_all --provider-uri http://api.main.alohaeos.com --start 2018-06-08 --end 2018-06-09

You can tune --export-batch-size, --max-workers for performance.

stream

> python eosetl.py stream --provider-uri http://api.main.alohaeos.com --start-block 500000
  • This command outputs blocks and transactions to the console by default.
  • Use --output option to specify the Google Pub/Sub topic where to publish blockchain data, e.g. projects/your-project/topics/eos_blockchain.
  • The command saves its state to last_synced_block.txt file where the last synced block number is saved periodically.
  • Specify either --start-block or --last-synced-block-file option. --last-synced-block-file should point to the file where the block number, from which to start streaming the blockchain data, is saved.
  • Use the --lag option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to handle chain reorganizations - they are less likely the further a block is from the head.
  • You can tune --period-seconds, --batch-size, --max-workers for performance.
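
The bookkeeping described in the bullets above (a last-synced-block file, a lag behind the head, and a batch size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the eos-etl implementation: the function names are hypothetical, and the default batch size of 10 is an arbitrary value chosen for the example.

```python
import os


def read_last_synced_block(path, start_block=None):
    """Return the block number stored in path.

    If the file does not exist yet, fall back to start_block - 1 so that
    streaming begins at start_block (hypothetical convention for this sketch).
    """
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read().strip())
    if start_block is None:
        raise ValueError("Provide a start block or an existing last-synced-block file")
    return start_block - 1


def write_last_synced_block(path, block_number):
    """Persist the last synced block number so a restart can resume from it."""
    with open(path, "w") as f:
        f.write(str(block_number) + "\n")


def next_batch(last_synced_block, head_block, lag=0, batch_size=10):
    """Return (first, last) block numbers to sync next, or None if caught up.

    The target never gets closer than `lag` blocks to the chain head and never
    spans more than `batch_size` blocks.
    """
    target = min(head_block - lag, last_synced_block + batch_size)
    if target <= last_synced_block:
        return None
    return last_synced_block + 1, target
```

A loop built on this would call next_batch once per period, export the returned range, and then call write_last_synced_block with the last block of that range.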

Running Tests

> pip install -e .[dev]
> echo "The below variables are optional"
> export EOSETL_PROVIDER_URI=http://api.main.alohaeos.com
> pytest -vv

Running Tox Tests

> pip install tox
> tox

Public Datasets in BigQuery

TODO: https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them

eos-etl's People

Contributors

medvedev1088, vasiliy-bondarenko

eos-etl's Issues

Overview: Implement ETL scripts for blocks, transactions, and actions

The scope of this task is:

  1. Implement a job for exporting blocks, transactions and actions using export_blocks_and_transactions.py as a basis. The result should be in 3 CSV files: blocks.csv, transactions.csv, actions.csv.
  2. Implement tests using test_export_blocks_job.py as a basis.
  3. Implement get_block_range_for_date using get_block_range_for_date as a basis.
    Optional:
  4. Implement Dockerfile
  5. Write documentation in README using Exporting the Blockchain as a basis.
  6. Test the scripts on a bigger range of blocks (5 days) on the latest data to make sure the scripts don't fail.

Notes/questions:

  1. There are a few EOS clients in Python to choose from https://github.com/Netherdrake/awesome-eos#python. The simplest one with minimal dependencies is preferable.
  2. Does the EOS API allow batching requests into a single HTTP payload?
  3. Does this API https://developers.eos.io/eosio-nodeos/reference#get_block return full transactions or only hashes?
  4. Are block timestamps in EOS monotonic? If yes, get_block_range_for_date can be optimized. Read this for reference https://twitter.com/EvgeMedvedev/status/1073844856009576448.
  5. Reuse blockchainetl package from bitcoin-etl https://github.com/blockchain-etl/bitcoin-etl/tree/master/blockchainetl. Don't worry about copy-pasting the code for now. This package will be moved to a separate repo later and included as a dependency.
  6. Read this article to understand the overall architecture https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset.
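
If block timestamps are monotonic (note 4), the block range for a date can be found with binary search instead of a linear scan. The sketch below shows that idea only. It is not the project's get_block_range_for_date: the function names are hypothetical, and get_timestamp stands in for whatever call returns a block's UTC timestamp.

```python
from datetime import datetime, timedelta, timezone


def first_block_at_or_after(get_timestamp, low, high, moment):
    """Smallest block number in [low, high] whose timestamp is >= moment.

    Returns high + 1 if every block in the range is earlier than moment.
    Correct only if timestamps never decrease as block numbers grow.
    """
    while low <= high:
        mid = (low + high) // 2
        if get_timestamp(mid) >= moment:
            high = mid - 1
        else:
            low = mid + 1
    return low


def block_range_for_date(get_timestamp, first_block, last_block, date):
    """Return (start_block, end_block) covering the given UTC date."""
    day_start = datetime(date.year, date.month, date.day, tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1)
    start = first_block_at_or_after(get_timestamp, first_block, last_block, day_start)
    end = first_block_at_or_after(get_timestamp, first_block, last_block, day_end) - 1
    if start > end:
        raise ValueError("No blocks found for " + date.isoformat())
    return start, end
```

Each lookup costs about log2(last_block - first_block) timestamp requests, so caching get_timestamp results would further reduce calls to the node.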

Invalid \escape error on blocks 24224015 to 24396469

[2019-06-29 08:14:25,150] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 08:14:25,150] {build_export_dag.py:76} INFO - Calling get_block_range_for_date(http://api.main.alohaeos.com, 2018-10-30T12:00:00+00:00, ...)
[2019-06-29 08:14:41,906] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 08:14:41,906] {build_export_dag.py:92} INFO - Calling export_blocks(24224015, 24396469, 10, http://api.main.alohaeos.com, 11, ...)
[2019-06-29 08:14:41,908] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 08:14:41,908] {progress_logger.py:51} INFO - Started work. Items to process: 172455.
[2019-06-29 08:56:40,559] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 08:56:40,556] {progress_logger.py:70} INFO - 17246 items processed. Progress is 10%.
[2019-06-29 09:35:36,825] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 09:35:36,825] {progress_logger.py:70} INFO - 34491 items processed. Progress is 20%.
[2019-06-29 10:12:28,643] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 10:12:28,643] {progress_logger.py:70} INFO - 51737 items processed. Progress is 30%.
[2019-06-29 10:27:10,412] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 10:27:10,409] {progress_logger.py:70} INFO - 68982 items processed. Progress is 40%.
[2019-06-29 10:51:57,767] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 10:51:57,767] {progress_logger.py:70} INFO - 86228 items processed. Progress is 50%.
[2019-06-29 11:04:54,944] {base_task_runner.py:101} INFO - Job 4314: Subtask export_blocks [2019-06-29 11:04:54,943] {progress_logger.py:70} INFO - 103473 items processed. Progress is 60%.
[2019-06-29 11:09:19,241] {models.py:1796} ERROR - Invalid \escape: line 1 column 19136 (char 19135)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1659, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 103, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 108, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/airflow/gcs/dags/eosetl_airflow/build_export_dag.py", line 161, in python_callable_with_fallback
    raise 
  File "/home/airflow/gcs/dags/eosetl_airflow/build_export_dag.py", line 155, in python_callable_with_fallback
    python_callable(**kwargs)
  File "/home/airflow/gcs/dags/eosetl_airflow/build_export_dag.py", line 101, in export_blocks_command
    actions_output=os.path.join(tempdir, "actions.json")
  File "/home/airflow/gcs/dags/eosetl/cli/export_blocks.py", line 66, in export_blocks
    job.run()
  File "/home/airflow/gcs/dags/blockchainetl_common/jobs/base_job.py", line 30, in run
    self._end()
  File "/home/airflow/gcs/dags/eosetl/jobs/export_blocks_job.py", line 102, in _end
    self.batch_work_executor.shutdown()
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/batch_work_executor.py", line 96, in shutdown
    self.executor.shutdown()
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/fail_safe_executor.py", line 39, in shutdown
    self._check_completed_futures()
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/fail_safe_executor.py", line 47, in _check_completed_futures
    future.result()
  File "/opt/python3.6/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/python3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/airflow/gcs/dags/blockchainetl_common/jobs/base_job.py", line 28, in run
    self._export()
  File "/home/airflow/gcs/dags/eosetl/jobs/export_blocks_job.py", line 70, in _export
    total_items=self.end_block - self.start_block + 
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/batch_work_executor.py", line 58, in execute
    self.executor.submit(self._fail_safe_execute, work_handler, batch)
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/fail_safe_executor.py", line 31, in submit
    self._check_completed_futures()
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/fail_safe_executor.py", line 47, in _check_completed_futures
    future.result()
  File "/opt/python3.6/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/python3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/python3.6/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/airflow/gcs/dags/blockchainetl_common/executors/batch_work_executor.py", line 62, in _fail_safe_execute
    work_handler(batch)
  File "/home/airflow/gcs/dags/eosetl/jobs/export_blocks_job.py", line 74, in _export_batch
    blocks = self.eos_service.get_blocks(block_number_batch)
  File "/home/airflow/gcs/dags/eosetl/service/eos_service.py", line 49, in get_blocks
    return [self.get_block(x) for x in block_number_batch]
  File "/home/airflow/gcs/dags/eosetl/service/eos_service.py", line 49, in <listcomp>
    return [self.get_block(x) for x in block_number_batch]
  File "/home/airflow/gcs/dags/eosetl/service/eos_service.py", line 36, in get_block
    return self.eos_rpc.getblock(block_number)
  File "/home/airflow/gcs/dags/eosetl/rpc/eos_rpc.py", line 54, in getblock
    'block_num_or_id': block_num_or_id
  File "/home/airflow/gcs/dags/eosetl/rpc/eos_rpc.py", line 50, in call
    return raw_response.json(parse_float=decimal.Decimal)
  File "/opt/python3.6/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/opt/python3.6/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/opt/python3.6/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/python3.6/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 19136 (char 19135)
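
The trace ends in Python's json decoder, which rejects a backslash followed by a character that is not a valid JSON escape. The sketch below reproduces that class of error and shows one possible workaround: escaping stray backslashes and retrying. It is an assumption about what the node response may have contained, not a confirmed diagnosis and not the fix adopted by the project. escape_stray_backslashes and loads_lenient are hypothetical names.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string.
_VALID_ESCAPES = set('"\\/bfnrtu')


def escape_stray_backslashes(text):
    """Double every backslash that does not start a valid JSON escape.

    Backslash pairs are consumed together, so an already escaped backslash
    is left untouched. Malformed \\u sequences are not repaired.
    """
    def fix(match):
        char = match.group(1)
        if char in _VALID_ESCAPES:
            return match.group(0)
        return "\\\\" + char

    return re.sub(r"\\(.)", fix, text, flags=re.DOTALL)


def loads_lenient(text, **kwargs):
    """Parse JSON, retrying once with stray backslashes escaped."""
    try:
        return json.loads(text, **kwargs)
    except json.JSONDecodeError:
        return json.loads(escape_stray_backslashes(text), **kwargs)


if __name__ == "__main__":
    bad = '{"memo": "a\\qb"}'  # the raw text contains the invalid escape \q
    try:
        json.loads(bad)
    except json.JSONDecodeError as error:
        print("strict parser failed:", error)
    print(loads_lenient(bad))
```

A retry like this changes the data slightly (the stray backslash is kept as a literal character), so logging the affected block number alongside it would make such cases easier to audit.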
