qzshucsz / btcspark

This project forked from jeremyrubin/btcspark

A toolkit for using Apache Spark to efficiently query Bitcoin blockchain data.

License: GNU Affero General Public License v3.0

Shell 7.02% Python 92.98%

Bitcoin Spark Framework (BTCSpark)

What is BTCSpark?

BTCSpark is a layer for accessing the Bitcoin Blockchain from Apache Spark.

The goal of BTCSpark is to offer high-quality, easy-to-use, performant, and free software to Bitcoin developers and analysts.

NOTE: BTCSpark is currently unmaintained. BlockSci is a similar project with better performance.

Benchmarks

The following benchmark computes the Transaction Output Amount Distribution (TOAD). On a 6-node AWS m3.large cluster (5 slaves, 1 master), with the blockchain stored in Hadoop on ephemeral storage, it takes 8.4 minutes to run using the nativ_lazy_blockchain implementation.

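    # Load the chain as an RDD of lazily parsed blocks.
    block_objs = sb.fetch_chain()
    # Each lazy object is a thunk: calling it forces the parse.
    unlazy = lambda x: x()
    txns = block_objs.map(unlazy)\
                     .flatMap(lambda b: b.txns)\
                     .map(unlazy)
    # Round each output value down to a multiple of 2**14 and count the
    # outputs per bucket. Assumes tx_outs is a plain sequence of lazy
    # thunks, so it is iterated directly rather than via a .map method.
    txns.flatMap(lambda txn:
                 [((unlazy(txo).value >> 14) << 14, 1)
                  for txo in txn.tx_outs])\
        .reduceByKey(lambda x, y: x + y)\
        .saveAsTextFile("txouts_values")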
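
The shift pair in the benchmark (`>> 14` followed by `<< 14`) clears the low 14 bits of each output value, rounding it down to a multiple of 16384 so that nearby amounts share one histogram key. The following is a minimal plain-Python sketch of that bucketing and the count-per-key step, without Spark. The names `bucket` and `toad` and the sample values are illustrative assumptions, not part of BTCSpark.

```python
from collections import Counter

def bucket(value, bits=14):
    """Round value down to a multiple of 2**bits by clearing its low bits."""
    return (value >> bits) << bits

def toad(values, bits=14):
    """Count values per bucket (a local stand-in for reduceByKey)."""
    return Counter(bucket(v, bits) for v in values)

# Hypothetical output amounts, chosen only to exercise the bucket edges.
sample = [0, 1, 16383, 16384, 20000, 32768, 5000000000]
histogram = toad(sample)
for key in sorted(histogram):
    print(key, histogram[key])
```

A larger shift gives coarser buckets and fewer distinct keys for the reduce step; a smaller shift keeps more resolution at the cost of a larger result.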

Finding the BIP100 Blocks takes 5.0 minutes on the same cluster.

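    # The first transaction of each block is the coinbase; its single input's
    # signature script is where miners embed tags such as "BIP100".
    block_objs.map(unlazy)\
              .map(lambda b: b.txns[0]().tx_ins[0]().signature_script)\
              .filter(lambda f: "BIP100" in f)\
              .saveAsTextFile(result_name("BIP100_Blocks"))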
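
The filter in the BIP100 example is a substring test on each block's coinbase script. Here is a small sketch of the same test outside Spark, under the assumption that the script arrives as raw bytes, in which case Python 3 requires the tag to be a bytes literal as well. The `has_tag` helper and the sample scripts are made up for illustration.

```python
def has_tag(script, tag=b"BIP100"):
    """Return True if the raw script bytes contain the tag."""
    return tag in script

# Hypothetical coinbase scripts, invented for this example.
scripts = [
    b"\x03\x5a\x8b\x05/BIP100/",
    b"\x03\x5b\x8b\x05/some pool/",
    b"\x03\x5c\x8b\x05BV8000000 BIP100",
]
tagged = [s for s in scripts if has_tag(s)]
print(len(tagged))
```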

Note: Unless you have a lot of memory or have substantially reduced the working set, caching is not recommended, because the overhead of re-parsing is tolerable.
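
The trade-off in this note can be illustrated with a toy sketch in plain Python. It is not BTCSpark's implementation: `parse`, `make_lazy`, and `make_cached` are hypothetical names. A thunk that re-parses on every call holds nothing in memory between uses, while a memoized thunk parses once and keeps the parsed result alive afterwards.

```python
parse_calls = 0

def parse(raw):
    """Stand-in parser that counts invocations so re-parse cost is visible."""
    global parse_calls
    parse_calls += 1
    return raw.split(",")

def make_lazy(raw):
    """Thunk that re-parses on every call and retains nothing in between."""
    return lambda: parse(raw)

def make_cached(raw):
    """Thunk that parses once and keeps the result in memory afterwards."""
    box = []
    def thunk():
        if not box:
            box.append(parse(raw))
        return box[0]
    return thunk

unlazy = lambda x: x()

lazy = make_lazy("a,b,c")
unlazy(lazy)
unlazy(lazy)
lazy_calls = parse_calls                 # parses performed by the lazy thunk

cached = make_cached("a,b,c")
unlazy(cached)
unlazy(cached)
cached_calls = parse_calls - lazy_calls  # parses performed by the cached thunk
print(lazy_calls, cached_calls)
```

Caching saves the repeated parse but pins every parsed object in memory, which is the cost the note warns about for a working set the size of the full chain.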

License

BTCSpark is released under the terms of the AGPL license. See COPYING for more information. A non-free license may also be purchased from Jeremy Rubin by organizations that are unable to use AGPL-licensed software.

Contributors

jeremyrubin, hrldcpr
