outlierventures / blockchaindevreport

Find out how active blockchain devs are on an organisation-by-organisation basis.

License: Apache License 2.0

Python 99.30% Shell 0.70%

blockchaindevreport's Introduction

Blockchain Development Report

Source code and full methodology for Outlier Ventures' Blockchain Development Reports. Latest one can be found here

Setup

Install

Requires Python 3.

pip3 install -r requirements.txt

Add GitHub PATs

GitHub Personal Access Tokens (PATs) are required for all large data-pulling operations, because requests to the GitHub API are rate-limited to 5,000 requests per hour per authenticated user. No scope/access is required for the tokens. PS: If you have private repos, be sure to use a token that has only the public_repo scope. Create a .env file (refer to env.sample) to store all the GitHub PATs in a single space-separated list. These PATs are used in round-robin order to access the various GitHub organisations and repositories.
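As a minimal sketch of the round-robin idea described above (not the repository's actual code), the space-separated token list can be split and cycled so that consecutive requests use different tokens. The function names and the sample token values are hypothetical.

```python
import itertools


def make_token_cycle(raw_tokens):
    """Return an endless iterator over the tokens in a space-separated string."""
    tokens = raw_tokens.split()
    if not tokens:
        raise ValueError("no GitHub PATs configured")
    return itertools.cycle(tokens)


def auth_header(token):
    """Build the HTTP header GitHub accepts for PAT authentication."""
    return {"Authorization": f"token {token}"}


# Hypothetical value; in practice the string would come from the .env file.
token_cycle = make_token_cycle("tokenA tokenB tokenC")
first_four = [next(token_cycle) for _ in range(4)]
print(first_four)
```

With three tokens, the fourth request wraps around to the first token again, which spreads the hourly quota evenly across all of them.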

Update Config (optional)

The config.ini file contains three categories of protocols/projects:

  • Blockchain
  • DeFi
  • NFT (& Metaverse)

Each category contains the protocols/projects analysed for the Blockchain Development Trends 2021 Report. To run the analysis for a particular category, uncomment the corresponding section and run the script(s) for the Blockchain/DeFi/NFT protocols/projects. You can also add protocols/projects that you want the scripts to analyse.

Update Protocols (optional)

The analysis is based on the core repositories of each protocol. Electric Capital’s crowdsourced Crypto Ecosystems index is used as the base, and we have manually curated the relevant organisations per ecosystem based on thorough research. We therefore advise against updating the protocol TOML files, as doing so would overwrite the manual curation of organisations.

All of the ecosystems are specified in TOML configuration files. To update the TOML files of protocols/projects that you have added to the config for comparison, you can follow either of these two approaches:

  • Automated: Comment out all categories of protocols/projects in config.ini and create a new variable called chains in config.ini containing their names in a single space-separated list. Ensure that the names are the same as the .toml file names of the corresponding Electric Capital Crypto Ecosystems. Then run the following command:
python3 updateProtocols.py
  • Manual: For each protocol/project, create a file in the protocols sub-folder with the same name as the corresponding TOML file in the Electric Capital Crypto Ecosystems, and copy and paste that file's contents into it.
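To illustrate the automated option, the sketch below shows how a space-separated chains value could be read with Python's standard configparser. The layout of the real config.ini is not shown in this README, so placing chains in the DEFAULT section is an assumption, as are the example names.

```python
import configparser

# Hypothetical config.ini content; the real section layout may differ.
SAMPLE_CONFIG = """
[DEFAULT]
chains = bitcoin ethereum polkadot
"""


def read_chains(config_text):
    """Return the names listed in the space-separated `chains` value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.defaults().get("chains", "").split()


chains = read_chains(SAMPLE_CONFIG)
print(chains)
```

Each name would then be expected to match a .toml file name in the Crypto Ecosystems index.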

Usage

Protocol core development

python3 dev.py [PROTOCOL_NAME]

This analyses historical commits, code changes and statistics for each of the GitHub organisations belonging to the protocol, summed across repositories for the default branch (main/master). Results are written to two files:

[PROTOCOL_NAME]_stats.json: Latest stats, such as star count and code churn in the last month.

[PROTOCOL_NAME]_history.json: Historical commits and code churn (additions and deletions) on a week-by-week basis.

Protocol core contributing developers

python3 contr.py ./protocols/[PROTOCOL_NAME].toml

The total number of contributors active in the past year is printed, and their usernames are written to [PROTOCOL_NAME]_contributors_.json. All repositories already processed are recorded in [PROTOCOL_NAME]_repos_seen.txt. If an error occurs, rerunning the script resumes the analysis from the point where it crashed, skipping all repositories already seen.
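The resume behaviour can be pictured roughly as follows. This is a sketch under assumptions, not the logic of contr.py itself: the helper names are hypothetical, and the seen file is assumed to hold one repository name per line.

```python
from pathlib import Path


def load_seen(path):
    """Return the set of repository names already recorded in the seen file."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def process_repos(repos, seen_path, analyse):
    """Analyse each repo not yet seen, recording it only after it succeeds."""
    seen = load_seen(seen_path)
    for repo in repos:
        if repo in seen:
            continue  # already handled in a previous run
        analyse(repo)  # if this raises, the repo is not marked as seen
        with seen_path.open("a") as handle:
            handle.write(repo + "\n")
        seen.add(repo)
```

Because a repository is appended to the file only after its analysis completes, a rerun after a crash starts again at the repository that failed.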

Visualizing results

Once you have run both of the above scripts for all the protocols/projects, you can visualize the results using the following command.

python3 vis.py

Results are written to the files commits.csv, commits.png, commits_change.png, churn.csv, churn.png, churn_change.png, devs.csv, devs.png and devs_change.png. Note that churn refers to the number of code changes.

One stop shell script

Methodology

We have based our analysis on core repositories for each protocol using Electric Capital’s crowdsourced Crypto Ecosystems index as the base, with manual curation of relevant organisations per ecosystem.

All the core repositories of each of a protocol's GitHub organizations were taken, and forked repositories, when marked as such on GitHub, were ignored. Forking repositories is a very common practice, and it leads to the development activity of one ecosystem being counted in another. Including all forks in the analysis adds a lot more noise than signal. For similar reasons, only activity on the default branch (main or master) of each repository was included. In these ‘unforked’ repositories, all commits to the default branch were indexed and analyzed.
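The fork filter can be sketched as below, assuming each repository is represented by the dictionary the GitHub REST API returns, which carries a boolean fork field and a full_name field. The function name and sample repositories are hypothetical.

```python
def drop_forks(repos):
    """Keep only repositories that GitHub does not mark as forks."""
    return [repo for repo in repos if not repo.get("fork", False)]


# Hypothetical repository records, reduced to the two fields used here.
sample_repos = [
    {"full_name": "example-org/core", "fork": False},
    {"full_name": "example-org/forked-lib", "fork": True},
]
kept = [repo["full_name"] for repo in drop_forks(sample_repos)]
print(kept)
```

A repository that was copied without using GitHub's fork feature is not flagged, which is why the text limits the filter to forks "marked as such on GitHub".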

We attribute the development activity of each GitHub organization to a single protocol, and we do not include individual repositories outside of those organizations, so that the figures reflect the core development of each protocol as accurately as possible.

For the Blockchain Development Trends 2021 report, GitHub data was pulled for the duration of 27 January - 31 December 2020. The TOML configuration files used for organisations and repositories analysed for the core development and developer count are in the protocols folder.

Core protocol development: historical commits and code changes

Commits and code changes are pulled directly from the GitHub API. These are pulled per-repository, and then summed for all repositories in a given organisation for their default branch (main/master).

The data points used are the total number of commits and the total number of code changes (additions + deletions) each week, summed across all repositories on their default branch.
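A minimal sketch of this aggregation, assuming each repository's data has already been reduced to (week, commits, additions, deletions) tuples. The tuple layout and function name are hypothetical. GitHub's code frequency statistics report deletions as negative numbers, so the absolute value is taken before adding.

```python
from collections import defaultdict


def sum_weekly(per_repo_weeks):
    """Sum commits and churn per week across all repositories."""
    totals = defaultdict(lambda: {"commits": 0, "churn": 0})
    for weeks in per_repo_weeks:
        for week, commits, additions, deletions in weeks:
            totals[week]["commits"] += commits
            # Churn is additions plus deletions, both counted as positive.
            totals[week]["churn"] += additions + abs(deletions)
    return dict(sorted(totals.items()))


# Hypothetical weekly data for two repositories of one organisation.
repo_a = [("2020-02-02", 3, 100, -20), ("2020-02-09", 1, 10, 0)]
repo_b = [("2020-02-02", 2, 5, -5)]
weekly_totals = sum_weekly([repo_a, repo_b])
print(weekly_totals)
```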

In the visualisation, a 4-week moving average is taken to smooth the data.
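The smoothing step can be sketched as a trailing moving average. Whether vis.py uses a trailing or a centred window is not stated here, so the trailing form is an assumption, as is the function name.

```python
def moving_average(values, window=4):
    """Trailing moving average; the first window - 1 points are omitted."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical weekly commit counts.
smoothed = moving_average([4, 8, 12, 16, 20])
print(smoothed)  # [10.0, 14.0]
```

Each output point averages the current week with the three before it, so the smoothed curve trails the raw data.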

The data collection is in dev.py and the visualization is in vis.py.

Core developers contributing to a protocol

All commits are pulled from each repo, and the date and author (GitHub username) of each commit are returned. Any commits dated more than one year in the past are filtered out. The process is repeated for all repos in the .toml file, and the resulting lists of contributors are combined and de-duplicated.
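The filter-and-deduplicate step might look like the sketch below. It assumes the commits from all repos have been flattened into (username, datetime) pairs with timezone-aware dates; the function name and the 365-day cutoff are assumptions.

```python
from datetime import datetime, timedelta, timezone


def active_contributors(commits, now=None, days=365):
    """Return the sorted, de-duplicated usernames with a commit in the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Commits without a GitHub username (user is None) are skipped.
    return sorted({user for user, when in commits if user and when >= cutoff})


# Hypothetical commits gathered from several repositories.
utc = timezone.utc
sample_commits = [
    ("alice", datetime(2020, 6, 1, tzinfo=utc)),
    ("bob", datetime(2019, 6, 1, tzinfo=utc)),    # older than one year
    ("alice", datetime(2020, 12, 1, tzinfo=utc)),  # duplicate author
    (None, datetime(2020, 7, 1, tzinfo=utc)),      # no linked username
    ("carol", datetime(2020, 3, 1, tzinfo=utc)),
]
active = active_contributors(sample_commits, now=datetime(2021, 1, 1, tzinfo=utc))
print(active)  # ['alice', 'carol']
```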

The data collection is in contr.py and the visualization is in vis.py.

GitHub Statistics

The sum total of stars, forks and releases across the core repositories of each protocol's GitHub organisations, used as a rough indicator of the protocol's popularity and activity.

Visualization, including growth calculation

Commit and churn charts are visualised using a 4-week moving average to smooth data. Therefore, the curve lags by approximately 2 weeks.

Developer activity charts display the raw data.

Growth charts (percentage change) take an average of the last 8 weeks of the year and compare this figure to the first 8 weeks of the year, rounding to the nearest whole percentage point. This applies to all growth charts: commit, churn and developer activity.
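The growth figure described above can be sketched as follows. The function name is hypothetical, and requiring at least 16 weeks of data (so that the two windows do not overlap) is an added assumption.

```python
def growth_percent(weekly_values, window=8):
    """Percentage change from the mean of the first weeks to the mean of the last."""
    if len(weekly_values) < 2 * window:
        raise ValueError("need at least two non-overlapping windows of data")
    start = sum(weekly_values[:window]) / window
    end = sum(weekly_values[-window:]) / window
    if start == 0:
        raise ValueError("growth is undefined when the starting average is zero")
    return round((end - start) / start * 100)


# Hypothetical year: 10 commits/week at the start, 15 commits/week at the end.
weekly_commits = [10] * 8 + [0] * 10 + [15] * 8
growth = growth_percent(weekly_commits)
print(growth)  # 50
```

Averaging over eight weeks at each end makes the figure less sensitive to a single unusually busy or quiet week.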

blockchaindevreport's People

Contributors

aronvanammers, dependabot[bot], muditmarda, ryjones, theoturner, zbraiterman

blockchaindevreport's Issues

'TypeError' object has no attribute 'status' - for quite a lot of the ecosystems

To recreate:

Maker as an example,

run 'python3 dev.py maker 1'
code runs successfully until it gets to the repo - 'Fetching repo data for makerdao/deployed-collateral-contracts'
which then throws the exception:
Exception occured while fetching single repo data 'TypeError' object has no attribute 'status'

I wonder if there has been a solution for this - it seems to happen within most of the ecosystems and means that the final {ecosystem}_stats.json file is not created.

I've attempted to fix it by removing the erroring ecosystems from the TOML files, but that hasn't worked so far.
