Comments (11)
Although the above might work, there is no need to be unopinionated here. I tend to go with an opinionated version instead, which, in the simplest case, means deciding on weights for the stats.
For example, with the following coefficients:
- 1 point per commit
- 20 points per contributor
- 8 points for anything else
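A minimal sketch of the weighted sum above, in Python. The function and argument names are illustrative only and are not taken from the project's code; "anything else" is assumed here to mean each PR, issue, or similar event.

```python
# Weights from the list above; the names are illustrative, not the project's.
COMMIT_POINTS = 1
CONTRIBUTOR_POINTS = 20
OTHER_POINTS = 8  # assumed to apply to each PR, issue, and so on


def krihelimeter(commits, contributors, others):
    """Return the flat weighted sum of a repo's activity statistics."""
    return (
        commits * COMMIT_POINTS
        + contributors * CONTRIBUTOR_POINTS
        + others * OTHER_POINTS
    )


print(krihelimeter(commits=100, contributors=5, others=10))  # 100 + 100 + 80 = 280
```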
Alternatively, @cool-RR suggested the following dynamic calculation: for each statistic (commits, contributors, merged PRs, etc.), calculate the repo's percentile. Then add the percentiles of all statistics together to get the krihelimeter. This ensures that the krihelimeter is bounded between 0 and 100 * num_of_statistics.
If someone has a better idea for calculating the krihelimeter, please do tell.
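The percentile idea could be sketched roughly as follows. This is only an illustration: the percentile is assumed to be the share of repos whose value is less than or equal to the repo's own (other definitions exist), and the statistic names and sample numbers are made up.

```python
from bisect import bisect_right


def percentile_rank(value, population):
    """Percentage of values in `population` that are <= `value`."""
    ordered = sorted(population)
    return 100 * bisect_right(ordered, value) / len(ordered)


def percentile_krihelimeter(repo, all_repos):
    """Sum the repo's percentile rank over every statistic."""
    return sum(
        percentile_rank(repo[stat], [other[stat] for other in all_repos])
        for stat in repo
    )


# Made-up sample data, for illustration only.
repos = [
    {"commits": 10, "contributors": 1, "merged_prs": 0},
    {"commits": 50, "contributors": 4, "merged_prs": 7},
    {"commits": 200, "contributors": 2, "merged_prs": 3},
    {"commits": 5, "contributors": 9, "merged_prs": 1},
]
scores = [percentile_krihelimeter(repo, repos) for repo in repos]
print(scores)
```

With three statistics, every score stays at or below 300, whatever the raw numbers look like.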
from krihelinator.
See f3d2ec2. The above basic calculation was implemented.
Looks good :)
I'm satisfied with the current calculation. See no reason to change it soon. Therefore, I'm closing the ticket.
FWIW, I think that adding a flat number of points per contributor creates a bit of an imbalance for smaller projects.
Let's say I have:
- a project with 3 authors and 20 commits (18 by one author and 1 each by the other two, contributed via pull requests that fixed reported issues): 112 pts (3 * 20 + 20 * 1 + 4 * 8)
- another project with one author who made 60 commits: 80 pts (1 * 20 + 60 * 1)
I suggest adding points for commits, PRs, and issues and then multiplying them by a coefficient for the authors. That way, projects with more contributors get a slightly higher rating than ones with fewer contributors, without being completely imbalanced.
First, thanks for the feedback!
> I suggest adding points for commits, PRs, and issues and then multiplying them by a coefficient for the authors.

I'm not quite sure what you mean. With this suggestion, the difference between the two scenarios you provided would be even bigger, wouldn't it?
Assuming all the weights for commits / issues / PRs remain the same:
- the first scenario will get: (20 * 1 + 4 * 8) * 3 = 156.
- and the second scenario will get: (60 * 1) * 1 = 60.
Maybe I've completely misunderstood your suggestion. Can you please elaborate?
> I'm not quite sure what you mean. With this suggestion, the difference between the two scenarios you provided would be even bigger, wouldn't it?

If you multiply by the number of authors, yes. But that is not what I meant. Let's say, just for example's sake, that you multiply by 1 + 0.1 * (authors - 1). Then you get:
(20 * 1 + 4 * 8) * 1.2 = 62.4
(60 * 1) * 1 = 60
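To make the comparison concrete, here is a small Python sketch of both formulas applied to the two scenarios. The function names and the `step` parameter are illustrative only; the 0.1 coefficient is just the example value used above, not a proposed final weight.

```python
def flat_score(commits, authors, others):
    """Current metric: 1 per commit, 20 per author, 8 per PR or issue."""
    return commits * 1 + authors * 20 + others * 8


def scaled_score(commits, authors, others, step=0.1):
    """Suggested metric: activity points scaled by 1 + step * (authors - 1)."""
    activity = commits * 1 + others * 8
    return activity * (1 + step * (authors - 1))


# The two scenarios from the discussion: (commits, authors, PRs + issues).
small_team = (20, 3, 4)
solo = (60, 1, 0)

for name, stats in [("small team", small_team), ("solo", solo)]:
    print(name, flat_score(*stats), round(scaled_score(*stats), 1))
```

With the flat metric the small team leads by 32 points; with the scaled one the two projects end up within a few points of each other.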
I also think that PRs and issues should not weigh so much more than commits; they should also be rebalanced. It might take a bit more effort to come up with a formula that represents activity properly across multiple projects. And I am not even getting into the time component, which might also be interesting (for example, which project is more active: one that gets a couple of commits every day, or one that gets a bunch on one day of the month and nothing for the rest of the time?). As you can see, it can become quite complicated. The question is whether you are aiming for that or for a simple but (imo) inaccurate number.
I think that the best way to decide if the suggested metric is better is to generate a new "most active" list based on it and investigate the results. After all, it is all very subjective.
I will do this for the entire DB, and maybe for the Python language, as it is both very active and one I'm relatively familiar with. Would you like to see the results for other languages?
Top 50 repos
Current metric | Suggested metric |
---|---|
CocoaPods/Specs | CocoaPods/Specs |
Microsoft/vscode | Microsoft/azure-docs |
kubernetes/kubernetes | NixOS/nixpkgs |
Microsoft/azure-docs | kubernetes/kubernetes |
NixOS/nixpkgs | githubschool/open-enrollment-classes-introduction-to-github |
aburasali/cs362w17online | Microsoft/vscode |
BlissRoms/platform_frameworks_base | ansible/ansible |
ansible/ansible | rust-lang/rust |
githubschool/open-enrollment-classes-introduction-to-github | dotnet/corefx |
rust-lang/rust | gentoo/gentoo |
dotnet/corefx | caskroom/homebrew-cask |
caskroom/homebrew-cask | Automattic/wp-calypso |
freebsd/freebsd-ports | tensorflow/tensorflow |
gentoo/gentoo | tgstation/tgstation |
tgstation/tgstation | aburasali/cs362w17online |
ampproject/amphtml | jlord/patchwork |
Automattic/wp-calypso | Homebrew/homebrew-core |
tensorflow/tensorflow | DefinitelyTyped/DefinitelyTyped |
jlord/patchwork | hashicorp/terraform |
flutter/flutter | facebook/react-native |
hashicorp/terraform | DroidKaigi/conference-app-2017 |
everypolitician/everypolitician-data | saltstack/salt |
DefinitelyTyped/DefinitelyTyped | dart-lang/sdk |
Homebrew/homebrew-core | freebsd/freebsd-ports |
DroidKaigi/conference-app-2017 | golang/go |
angular/angular-cli | ampproject/amphtml |
saltstack/salt | docker/docker |
docker/docker | JuliaLang/julia |
facebook/react-native | flutter/flutter |
dotnet/roslyn | dotnet/coreclr |
dart-lang/sdk | angular/angular-cli |
JuliaLang/julia | liferay/liferay-portal |
golang/go | apple/swift |
dotnet/coreclr | dotnet/roslyn |
krexus/frameworks_base | nodejs/node |
earl/llvm-mirror | elastic/elasticsearch |
llvm-mirror/llvm | d3athrow/vgstation13 |
liferay/liferay-portal | home-assistant/home-assistant |
apple/swift | mantidproject/mantid |
NixOS/nixpkgs-channels | servo/servo |
convox/rack | everypolitician/everypolitician-data |
elastic/elasticsearch | openstack/openstack |
openstack/openstack | docker/docker.github.io |
nodejs/node | cockroachdb/cockroach |
freebsd/freebsd | joomla/joomla-cms |
dimagi/commcare-hq | dimagi/commcare-hq |
cockroachdb/cockroach | librenms/librenms |
d3athrow/vgstation13 | ManageIQ/manageiq |
beagleboard/linux | code-dot-org/code-dot-org |
joomla/joomla-cms | llvm-mirror/llvm |
Top 50 Python repos
Current metric | Suggested metric |
---|---|
ansible/ansible | ansible/ansible |
saltstack/salt | saltstack/salt |
dimagi/commcare-hq | home-assistant/home-assistant |
home-assistant/home-assistant | dimagi/commcare-hq |
odoo/odoo | odoo/odoo |
LLNL/spack | LLNL/spack |
mozilla/addons-server | mozilla/addons-server |
wikimedia/mediawiki-extensions | wikimedia/mediawiki-extensions |
edx/edx-platform | edx/edx-platform |
rg3/youtube-dl | rg3/youtube-dl |
fchollet/keras | cloudmesh/classes |
zulip/zulip | zulip/zulip |
cloudmesh/classes | ros/rosdistro |
ros/rosdistro | duckduckgo/zeroclickinfo-fathead |
duckduckgo/zeroclickinfo-fathead | fchollet/keras |
Azure/azure-cli | coala/coala |
AdguardTeam/AdguardFilters | Azure/azure-cli |
openshift/openshift-ansible | inasafe/inasafe |
coala/coala | statsmodels/statsmodels |
statsmodels/statsmodels | openshift/openshift-ansible |
google/ggrc-core | ipython/ipython |
Theano/Theano | uclouvain/osis |
inasafe/inasafe | buildbot/buildbot |
pandas-dev/pandas | frappe/erpnext |
matplotlib/matplotlib | pandas-dev/pandas |
conda/conda | matplotlib/matplotlib |
frappe/erpnext | Theano/Theano |
scikit-learn/scikit-learn | google/ggrc-core |
ipython/ipython | mirumee/saleor |
rcbops/rpc-openstack | scikit-learn/scikit-learn |
uclouvain/osis | rcbops/rpc-openstack |
mirumee/saleor | python/mypy |
buildbot/buildbot | bigchaindb/bigchaindb |
pisilinux/main | django/django |
openshift/openshift-tools | pymedusa/Medusa |
python/mypy | airbnb/superset |
openembedded/openembedded-core | ManageIQ/integration_tests |
bigchaindb/bigchaindb | terasolunaorg/guideline |
kubernetes-incubator/kargo | django-oscar/django-oscar |
ManageIQ/integration_tests | Cloud-CV/EvalAI |
django/django | kubernetes-incubator/kargo |
pfnet/chainer | openbmc/openbmc |
airbnb/superset | AdguardTeam/AdguardFilters |
Cloud-CV/EvalAI | pfnet/chainer |
blueboxgroup/ursula | openstates/openstates |
pymedusa/Medusa | astropy/astropy |
django-oscar/django-oscar | galaxyproject/galaxy |
xonsh/xonsh | edx/configuration |
getsentry/sentry | SatelliteQE/robottelo |
terasolunaorg/guideline | conda/conda |
I don't actually want to crunch statistics, but those lists, without the numbers that led to this outcome, don't give me enough information to see whether it got better (imo) or not. :)
You are absolutely right!
Here is a .csv file with all of the repos data currently in the DB. Waiting to see what you get ;-)