vdaubry / github-awards

Discover your ranking on GitHub

Home Page: http://git-awards.com

License: MIT License

Ruby 84.13% JavaScript 0.36% HTML 14.63% CSS 0.88%
github rankings language star

github-awards's Introduction

Important notice : Github Awards becomes Git Awards !

Git Awards

Git Awards gives your ranking on GitHub by language and by location (city, country and worldwide) based on the number of stars on your repos.

How does it work ?

In order to calculate your ranking on GitHub we:

  • Get all GitHub users with their location
  • Geocode their location
  • Get all GitHub repositories with language and number of stars

With this information we are able to compute your ranking for a given language in a given city.

Step 1 : Get all users and repositories

There are over 10 million users and 15 million repositories on GitHub, so we cannot simply call the GitHub API for each user and their repos.

However, the GitHub list API returns 100 results at a time with basic information.

With this, one can fetch up to 500k users or repos per hour, which is enough to get the entire list of users and repositories with basic information (username, repo name, etc.).
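
As a rough illustration of the crawl described above, here is a minimal Ruby sketch. It is not the project's actual rake task: `fetch_page` is a hypothetical callable standing in for a list API call, the cursor is assumed to be the last id seen, and the 5,000 requests per hour figure is an assumption chosen because 5,000 × 100 matches the 500k per hour quoted above.

```ruby
# Hypothetical sketch: walk an id-paginated list endpoint page by page.
# `fetch_page` is an assumed callable that takes the last seen id and returns
# an array of hashes with at least an "id" key (empty array when exhausted).
def crawl(fetch_page, since: 0)
  results = []
  loop do
    page = fetch_page.call(since)
    break if page.empty?
    results.concat(page)
    since = page.last["id"] # the next page starts after the last id seen
  end
  results
end

# Back-of-the-envelope crawl duration; both defaults are assumptions.
def hours_to_crawl(total_items, per_page: 100, requests_per_hour: 5_000)
  (total_items.to_f / (per_page * requests_per_hour)).ceil
end

# Stand-in for the real API: 250 fake users served 100 at a time.
fake_users = (1..250).map { |i| { "id" => i, "login" => "user#{i}" } }
fake_fetch = ->(since) { fake_users.select { |u| u["id"] > since }.first(100) }

users = crawl(fake_fetch)
puts users.size                 # prints 250
puts hours_to_crawl(10_000_000) # prints 20
puts hours_to_crawl(15_000_000) # prints 30
```

Under these assumptions the full user list takes about 20 hours and the repository list about 30 hours.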

Rake tasks are :

rake user:crawl
rake repo:crawl

Now we need to get detailed information such as location, language, and number of stars.

Step 2 : Use Google BigQuery to get details about active users and repositories

GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

The GitHub Archive dataset is public. With Google BigQuery we can filter the dataset to get only the latest event for each repo and user. Unfortunately the GitHub Archive events start from 2011, so we won't get ranking information for users and repos that have been inactive since 2011.

  • Request for users :

users.sql

  • Request for repositories :

repos.sql

We can then download the results as JSON, parse the result, and fill missing information about users and repos.
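
A minimal Ruby sketch of the "parse the result" step, assuming the export is newline-delimited JSON. The field names (`repo`, `language`, `stars`, `created_at`) are hypothetical and may not match what users.sql and repos.sql actually return; the real parsing lives in the rake tasks listed next.

```ruby
require "json"
require "time"

# Keep only the most recent event for each repository.
# All field names here are assumptions made for illustration.
def latest_by_repo(json_lines)
  json_lines
    .each_line
    .map(&:strip)
    .reject(&:empty?)
    .map { |line| JSON.parse(line) }
    .group_by { |event| event["repo"] }
    .transform_values { |events| events.max_by { |e| Time.parse(e["created_at"]) } }
end

export = <<~JSONL
  {"repo":"alice/app","language":"Ruby","stars":10,"created_at":"2014-01-01T00:00:00Z"}
  {"repo":"alice/app","language":"Ruby","stars":42,"created_at":"2015-02-01T00:00:00Z"}
  {"repo":"bob/lib","language":"Go","stars":7,"created_at":"2014-06-01T00:00:00Z"}
JSONL

latest = latest_by_repo(export)
puts latest["alice/app"]["stars"] # prints 42
```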

Rake tasks are :

rake user:parse_users
rake repo:parse_repos

We now have each user's location, and each repository's language and number of stars. In order to get country and world ranks we need to geocode user locations.

Step 3 : Geocoding user locations

Location on GitHub is a plain text field, and about 1 million profiles have a location set. Free geocoding APIs usually have strict rate limits. The first step is to geocode only distinct locations, which leaves about 100k locations to geocode. A solution to speed up the geocoding is to use a combination of :

Rake task is :

rake user:geocode_locations
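
The "geocode only distinct locations" idea can be sketched in Ruby as follows. This is an illustration under assumptions, not the code behind the rake task: `geocoder` is a hypothetical callable (location string in, coordinates or nil out), and trimming plus lowercasing is just one possible way to decide that two free-text locations are the same.

```ruby
# Geocode each distinct location once and return a lookup table.
# `geocoder` is an assumed callable: location string -> [lat, lng] or nil.
def geocode_distinct(locations, geocoder)
  normalized = locations.compact.map { |loc| loc.strip.downcase }.reject(&:empty?)
  normalized.uniq.each_with_object({}) do |loc, coords|
    coords[loc] = geocoder.call(loc) # one lookup per distinct location
  end
end

# Fake geocoder that counts how often it is called.
calls = 0
fake_geocoder = lambda do |loc|
  calls += 1
  { "paris" => [48.8566, 2.3522], "paris, france" => [48.8566, 2.3522] }[loc]
end

profiles = ["Paris", " paris ", "PARIS", "Paris, France", nil, ""]
coords = geocode_distinct(profiles, fake_geocoder)
puts calls # prints 2: six profile values collapse to two lookups
```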

We now have all the information we need to compute ranking.

Step 4 : Compute rankings by language and by location (city/country/world)

To get rankings we first calculate a score for each user in each language using this formula :

sum(stars) + (1.0 - 1.0/count(repositories))
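
As a worked example of the formula above, here is a small Ruby sketch. The method name `score` and the handling of a user with no repositories are assumptions. Because the second term is always below 1, it can only break ties between users with the same star total, in favor of the one with more repositories.

```ruby
# score = sum(stars) + (1.0 - 1.0 / count(repositories))
# Returning 0.0 for an empty list is an assumption to avoid dividing by zero.
def score(stars_per_repo)
  return 0.0 if stars_per_repo.empty?
  stars_per_repo.sum + (1.0 - 1.0 / stars_per_repo.size)
end

puts score([15])         # prints 15.0 (one repo, no bonus)
puts score([10, 5])      # prints 15.5 (same stars, two repos)
puts score([0, 0, 0, 0]) # prints 0.75 (no stars, four repos)
```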

Then we use the Postgres ROW_NUMBER() window function to rank each developer against other developers with repositories in the same language and in the same location (by city, by country or worldwide).
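
To make the partitioning concrete, here is an in-memory Ruby sketch that mimics what a query such as ROW_NUMBER() OVER (PARTITION BY language, city ORDER BY score DESC) would produce. The row layout and the helper name `rank_within` are hypothetical; the real query is in rank.sql and may differ.

```ruby
# Number rows 1..n inside each partition, highest score first.
# Like ROW_NUMBER(), ties get distinct ranks in an unspecified order.
def rank_within(rows, *partition_keys)
  rows
    .group_by { |row| row.values_at(*partition_keys) }
    .flat_map do |_, group|
      group
        .sort_by { |row| -row[:score] }
        .each_with_index
        .map { |row, i| row.merge(rank: i + 1) }
    end
end

rows = [
  { login: "alice", language: "ruby", city: "paris", country: "france", score: 15.5 },
  { login: "bob",   language: "ruby", city: "paris", country: "france", score: 42.0 },
  { login: "carol", language: "ruby", city: "lyon",  country: "france", score: 3.0 }
]

city_ranks    = rank_within(rows, :language, :city)
country_ranks = rank_within(rows, :language, :country)
city_ranks.each { |r| puts "#{r[:login]}: ##{r[:rank]} in #{r[:city]}" }
```

Changing the partition keys from city to country (or to language alone for the world rank) yields the other rankings from the same scores.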

Ok, now we have all GitHub users' ranking :)

In order to speed up queries based on user ranks, we create a table with all ranking information. Once all ranking information is in a single table we can index it properly, which gives acceptable response times when we query it from a web application.

The query to create the language_rankings table can be found here :

rank.sql

Step 5 : VOILA ! Look for your ranking and have fun :)

Next steps :

  • GitHub connect
  • Manually refresh your information
  • Automating data update
  • Improve UI

Contributing :

  • Fork it (https://github.com/vdaubry/github-awards/fork)
  • Create your feature branch (git checkout -b my-new-feature)
  • Commit your changes (git commit -am 'Add some feature')
  • Push to the branch (git push origin my-new-feature)
  • Create a new Pull Request

License

This project is available under the MIT license. See the license file for more details.

github-awards's People

Contributors

alexisbernard, askl56, chrismissal, denheck, flexbox, ghecho, jimmithy, mikicaivosevic, nunogoncalves, peterdavehello, schneems, shashankanataraj, tanuck, tonkpils, vdaubry

github-awards's Issues

Data not updated

Thank you for your awesome project, but the data has not been updated for days.

[Feature request] Link to user repositories on the specific language

It would be nice to have a link to GitHub listing the user's repositories in the specific language shown. It could appear in places like the language ranking or the user profile.

Suggestion

GitHub provides a simple way to generate this link, using its search query syntax:

https://github.com/search?q=user:$USERNAME+language:$LANGUAGE

  • $LANGUAGE is replaced with ruby, javascript ...
  • $USERNAME is replaced with the username, paul, tenderlove ...
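
A minimal sketch of how such a link could be built in Ruby. The helper name `language_search_url` is hypothetical; the query is URL-encoded so that language names such as C++ survive in the query string.

```ruby
require "uri"

# Build a GitHub search link for one user's repositories in one language.
def language_search_url(username, language)
  query = "user:#{username} language:#{language}"
  "https://github.com/search?q=#{URI.encode_www_form_component(query)}"
end

puts language_search_url("tenderlove", "ruby")
# prints https://github.com/search?q=user%3Atenderlove+language%3Aruby
puts language_search_url("paul", "C++")
# prints https://github.com/search?q=user%3Apaul+language%3AC%2B%2B
```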

There are also many other parameters for filtering and sorting.

If you find the idea interesting, I would be happy to open a pull request.

Can't find my city

Hi,

At first I thought this was because my GitHub location was "Australia", but I've updated it to refer to a city, and several days later this tool still claims it can't find my city and goes on to present some defaults (?) for a search of Cuba.

Separate ranking for organizations versus users

Would like to see separate rankings for organizations versus users.

Organizations often have multiple users contributing to their repositories. At the risk of sounding like a whiner, it's not "fair" to compare organizations with a large number of contributors, to independent developers.

[Feature request] Need finer grain locality support.

Grouping results by city is not accurate enough. For example, the city "Manchester" exists in England, and there are many cities "Manchester" (36) in the United States.

Filtering by states/provinces/territories would be great.

Unable to refresh project

How can I refresh an organization page in github-awards (oblac)? Only my profile gets refreshed, not the organization.

With great fun comes great responsibility :) Please make it work, some serious bet is in place ;)))

ghost ranking

When checking "Top C GitHub developers in France",
there is no 3rd place.
It goes straight from 2nd to 4th.

As if there was some ghost data still counting in the ranking but not displayable (partially deleted, but not completely).

Remove profiles

Users must be able to remove themselves from all ranking.

Here are some reasons why a user may prefer not to be listed.

  • For instance, a user may prefer not to be listed in a list of users in their city (the information is publicly accessible, but they would prefer not to be aggregated).
  • A user may have other privacy concerns, not just of their location
  • A user may believe that ranking is not productive, and hence would prefer not to be used to rank other developers.

(Personally, I subscribe to the third, but I can imagine others who might subscribe to 1 or 2.)

So, minimally, please remove me from all ranking -- and, ideally, add functionality so that others can do the same for themselves.

Thanks,
joshua

My profile is empty

My profile kevnz is blank with no information at all. This despite the following stats I was able to gather from github api
My repo counts
JavaScript: 278,
null: 47,
'C#': 15,
Ruby: 7,
CoffeeScript: 3,
CSS: 10,
ASP: 1,
ActionScript: 1,
Python: 3,
HTML: 4,
C: 2,
Perl: 1,
LiveScript: 1,
PowerShell: 1,
Java: 2,
'Objective-C': 3,
'C++': 4

and my stars per repo

JavaScript: 136,
null: 4,
'C#': 3,
Ruby: 5,
CoffeeScript: 2,
CSS: 4,
ASP: 0,
ActionScript: 0,
Python: 2,
HTML: 2,
C: 1,
Perl: 0,
LiveScript: 0,
PowerShell: 0,
Java: 0,
'Objective-C': 1,
'C++': 3

Maybe the null returns are a problem?

Profile Edits

Are profiles refreshed after editing? Is there any way to trigger this? Fixed my GitHub location information, but still getting “we couldn’t find your city from your location”.

[Feature request] Give overall ranking

A ranking considering all repositories of a user, including all languages.

P.S. The language list gives the impression that you may have thought about implementing it. I see a blank field at the end of the language list. (example city page)

Seems that the ranking charts don't update?

I am unfortunately not a rubyist, or this would come in the form of a PR instead of an issue. But it seems that despite the new ability to update your user data, the rankings for a region do not update. Seems to me like that would be a good addition, maybe just reshuffle that data when a user that exists in it updates their profile?

Stars per language may be wildly inaccurate.

First off, this is a really cool project!

I did notice that in some cases the ranking seems to be completely wrong...

If you look up who the top Atlanta-based users in CSS are, user Alindeman is miles ahead of everyone else. But when you dive in to see why that is, there is only one CSS-based repo with any stars at all, and that repo has nothing to do with CSS: upgradingtorails4

Maybe someone with more knowledge of how GitHub determines the primary languages can chime in here. I see that GitHub marks the repo in question as over 97% CSS:

[screenshot]

But looking at the files themselves, there are only 5 (very minimal) CSS files:

[screenshot]

It seems that even if GitHub didn't take Markdown into consideration, Ruby should still be considered the repo's primary language.

I only checked the top 3 CSS contributors in Atlanta, but all of them seemed to be suffering from the same issue.

I don't have any brilliant solution to offer at the moment.. but I thought it was worth starting a discussion.

Rank by stars on repos that the user has ownership of

I feel a more accurate representation would come from including all stars for every repository that the user has ownership of. A lot of developers will use their local profiles for fiddling around but release their open source software via an organization.

Stars missing for 2 days

Great great great job on the github-awards. We have some bets going on here (or a competition) and sadly github-awards hasn't updated my profile and a teammate's profile for 2 days or so. I know the missing stars are a github-archives issue, but is there a way you can help us out fixing it?
Cheers

All search inputs don't handle whitespace

If you have whitespace at the start or end of a search, be it for a user, city or country, the search fails. Fairly simple principle to trim both ends of the search text?
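
A one-line Ruby sketch of the trimming suggested above. The helper name `normalize_search` is hypothetical, and where it would be called in the application is not specified here.

```ruby
# Trim leading and trailing whitespace from a search term before looking it up.
# `to_s` makes a nil parameter behave like an empty search.
def normalize_search(term)
  term.to_s.strip
end

puts normalize_search("  San Francisco \n").inspect # prints "San Francisco"
puts normalize_search(nil).inspect                  # prints ""
```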

Regards,

Haxe is broken

Check my profile for instance

Also check the world ranking for Haxe

Take into account organization repositories

Some users have repositories in organization accounts - it would be nice to take this into account. Maybe one could look at the fraction of commits made to a repo and share out the stars that way. So say two users contribute to a 1000 star repo equally, they would both get 500 stars. Or something like this?
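
The proportional split suggested above can be sketched in Ruby like this. The helper name `share_stars` and the input shape (a hash of commit counts per user) are assumptions made for illustration.

```ruby
# Split a repository's stars between contributors by their share of commits.
def share_stars(stars, commits_by_user)
  total = commits_by_user.values.sum
  return {} if total.zero?
  commits_by_user.transform_values { |commits| stars * commits.to_f / total }
end

p share_stars(1000, { "ann" => 50, "ben" => 50 })
# prints {"ann"=>500.0, "ben"=>500.0}
p share_stars(1000, { "ann" => 75, "ben" => 25 })
# prints {"ann"=>750.0, "ben"=>250.0}
```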

Number of stars is incorrect

The reported number of stars is incorrect. According to GitHub Awards I have 3577 stars for Objective-C:

But my most popular repo, ViewDeck, alone has 4200+ stars:

So that can't be right (and I did not gain 700+ overnight or even overweek, so it's not a "just wait for the cache to be updated" issue, I think).

Any ideas?

Still can not find my profile

When I press the refresh button and am redirected to the github-awards page, I get an error saying "We're sorry, but something went wrong." Could you please check the log file and fix this problem?

Branding and naming goes against GitHub's policies

GitHub Awards has been confused with an actual GitHub project. GitHub explicitly asks that you avoid naming projects anything that makes it seem like an endorsement:

Naming projects and products

Please avoid naming your projects anything that implies GitHub’s endorsement. This also applies to domain names.

The background for the site also seems to be a modified version of the Octocat, which is again against their policies:

Please don’t do these things

Create a modified version of the Octocat or GitHub logo

Whilst GitHub Awards does describe what the project does, I feel that it needs to be renamed or it may be closed down by GitHub.

Source: https://github.com/logos

My account can't be found

Maybe because I used to have "Germany / Denmark" in my location field? I recently changed it to the city I'm currently living in. Thanks :)

Re-collecting user data

First of all I think this is a really cool project! 🎉

I was wondering if it was possible to re-index geolocation data. I added my location recently and it's not present on the site. Is there any way to re-index that data?

Thanks

Usernames are case sensitive

Hey,

Nice project :)

Tried to "github-awards" me but didn't work. My username is gawen but the page answers "User gawen not found".

As you see, I do exist ;-)
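
One possible fix, sketched in Ruby under the assumption that the stored login differs from the searched one only in letter case. GitHub itself treats logins as case-insensitive. The helper name `find_user` and the in-memory list are hypothetical.

```ruby
# Look up a user by login, ignoring letter case and surrounding whitespace.
def find_user(users, login)
  wanted = login.to_s.strip.downcase
  users.find { |user| user[:login].downcase == wanted }
end

users = [{ login: "Gawen" }, { login: "vdaubry" }]
puts find_user(users, "gawen")[:login] # prints Gawen
puts find_user(users, "nobody").inspect # prints nil
```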

[Feature Request] Support for repo in organization (use more accurate ranking method)

Star-based ranking from our own repository is alright for a start.
However, this doesn't take into account contributions to someone else's repository.

For example, Jonny has one repo with 100 stars. Andy has no repo, but is a regular open-source contributor and has more than 100 commits on Linux repo (which has almost 20K stars)

It would be much better if the ranking method takes contributions into account as well.
