github / innovationgraph

GitHub Innovation Graph

Home Page: https://innovationgraph.github.com/

License: Creative Commons Zero v1.0 Universal


This repo contains structured data files of public activity on GitHub, aggregated by economy on a quarterly basis from 2020 onward.

Through offerings such as the GitHub Innovation Graph, we hope to inform research and public policy that could benefit from data on software development activity globally. We welcome developers, data analysts, researchers, policymakers, and all other interested stakeholders to explore the data, discover insights, create visualizations, and much more.

The GitHub Innovation Graph provides data on the following areas:

See the datasheet for more information.

Exploring Innovation Graph data

For an overview of the dataset, check out the charts and tables at the GitHub Innovation Graph website.

To dive deeper into the data and run your own analyses, feel free to fork this repo, explore the structured data files using the exploratory data analysis tool of your choice, and share your findings on our Discussions page.
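As one possible starting point, the sketch below uses only the Python standard library to read a data file and pull out the most recent quarterly value per economy. The column names (`economy`, `year`, `quarter`, `developers`), the economy codes, and the sample rows are assumptions made up for illustration; check the header row of the actual file you load.

```python
import csv
import io

# Made-up sample rows standing in for one of the repo's CSV files.
# The column names are assumptions; check the real file's header row.
SAMPLE_CSV = """economy,year,quarter,developers
AA,2020,1,1200
AA,2020,2,1300
BB,2020,1,400
BB,2020,2,450
"""


def latest_value_by_economy(csv_text, metric):
    """Return {economy: metric value in the most recent quarter present}."""
    latest = {}  # economy -> ((year, quarter), value)
    for row in csv.DictReader(io.StringIO(csv_text)):
        period = (int(row["year"]), int(row["quarter"]))
        value = int(row[metric])
        economy = row["economy"]
        # Tuples compare element by element, so (2020, 2) > (2020, 1).
        if economy not in latest or period > latest[economy][0]:
            latest[economy] = (period, value)
    return {economy: value for economy, (_, value) in latest.items()}


latest_developers = latest_value_by_economy(SAMPLE_CSV, "developers")
```

To run this against a file from a fork of the repo, read the file's text and pass it in place of `SAMPLE_CSV`, adjusting `metric` to a column that exists in that file.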

Limitations

The GitHub Innovation Graph dataset contains data on (1) public activity (2) on GitHub (3) aggregated by economy (4) on a quarterly basis. As such, this dataset would not be useful for understanding activity:

  1. that is private;
  2. outside of GitHub;
  3. at a more granular geographic level than economy; or
  4. at a more granular temporal level than quarterly.

Additionally, economies that have fewer developers on GitHub (which generally correlates with the population of an economy) will have less data associated with them in this dataset.

See the datasheet for more information on limitations.

Representativeness of Innovation Graph data

How many economies are included?

We endeavor to publish as much data about public activity on GitHub as possible. However, the number of developers varies considerably by economy. Out of an abundance of caution for developers’ privacy, in some cases we decline to publish specific statistics for economies with fewer than 100 unique developers performing the relevant activity during the specified quarter. You can find more information on our methodology in the datasheet.
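To make the threshold concrete, here is a minimal sketch of a filter that drops any row with fewer than 100 unique developers. It is a simplified reading of the rule described above, not the actual publication pipeline (the datasheet is the authority on methodology). The function name, the field names, and the rows are all hypothetical.

```python
MIN_UNIQUE_DEVELOPERS = 100  # threshold stated in the text above


def publishable(rows, min_developers=MIN_UNIQUE_DEVELOPERS):
    """Keep only rows whose unique-developer count meets the threshold."""
    return [row for row in rows if row["unique_developers"] >= min_developers]


# Made-up rows: unique developers performing one activity in one quarter.
rows = [
    {"economy": "AA", "quarter": "2020-Q1", "unique_developers": 2500},
    {"economy": "BB", "quarter": "2020-Q1", "unique_developers": 99},
    {"economy": "CC", "quarter": "2020-Q1", "unique_developers": 100},
]

# "Fewer than 100" is suppressed, so exactly 100 is kept.
kept = [row["economy"] for row in publishable(rows)]
```

A consequence for analysis: an economy missing from a given file and quarter may mean its count fell below the threshold, not that there was no activity.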

The heatmap below shows the count of economies reported for each data file by quarter:

Count of economies by data file by quarter

A heatmap of the count of economies for each GitHub Innovation Graph data file by quarter, which shows that the data for repositories and developers are fairly comprehensive, with over 215 distinct economies represented since Q1 2020. The other data files (with the exception of the topics data file) have fewer economies represented, ranging from about 110 to 180 economies. The topics data file shows distinct economy counts ranging from about 45 to 130 over time.

You can also find the CSV for this heatmap in the data/representativeness_data directory.
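A count like the one in the heatmap can be reproduced in a few lines. The sketch below assumes each observation has been reduced to a (data file, year, quarter, economy) tuple; that shape, the function name, and the sample values are made up for illustration and do not come from the repo.

```python
from collections import defaultdict

# Made-up observations: (data file, year, quarter, economy).
observations = [
    ("developers", 2020, 1, "AA"),
    ("developers", 2020, 1, "BB"),
    ("developers", 2020, 1, "AA"),  # a repeated economy is counted once
    ("topics", 2020, 1, "AA"),
    ("developers", 2020, 2, "BB"),
]


def count_economies(observations):
    """Return {(data_file, year, quarter): number of distinct economies}."""
    economies = defaultdict(set)
    for data_file, year, quarter, economy in observations:
        economies[(data_file, year, quarter)].add(economy)
    return {key: len(values) for key, values in economies.items()}


counts = count_economies(observations)
```

Using a set per (data file, quarter) key means that files with several rows per economy in a quarter still contribute one count per economy.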

Which economies are included?

We aggregate GitHub activity for economies using a definition broader than recognized UN member states. For example, AQ reports activity from developers stationed in Antarctica. The heatmap below reports the count of data files for each economy by quarter:

A heatmap of the count of GitHub Innovation Graph data files for each economy by quarter, which shows that the more populous economies are more likely to be represented in more data files.

You can also find the CSV for this heatmap in the data/representativeness_data directory.

License

This project is released under CC0-1.0.

Maintainers

See CODEOWNERS

Support

See SUPPORT
