Comments (5)
Apologies. This is a long comment. There's a lot of context that goes with it.
So... way back in #38 it was identified that some users were, oddly, not appearing in the contributors response. This was fixed by adding a short list of users who were known to be missing from that response. The problem goes well beyond those two users. Let's take a look at the main Rust repo. Here's what GitHub has to say about the number of contributors:
Let's take a look at the contributors request (specifically the header in the response):
$ http --print=h "https://api.github.com/repos/rust-lang/rust/contributors?per_page=100"
HTTP/1.1 200 OK
...
Link: <https://api.github.com/repositories/724712/contributors?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/724712/contributors?per_page=100&page=5>; rel="last"
...
That shows there are only five pages of 100 contributors each, so at most 500 contributors! That must mean that Highfive is greeting repeat contributors all of the time in rust-lang/rust. (You probably already see this, but I don't pay very close attention there right now.)
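As a rough illustration of how the page count can be read out of that Link header, here is a minimal sketch. The helper name last_page is hypothetical and not part of Highfive; it assumes the URLs in the header contain no commas, which holds for the responses shown here.

```python
import re
from urllib.parse import urlparse, parse_qs


def last_page(link_header):
    """Return the page number of the rel="last" link, or None if absent.

    Hypothetical helper: splits the header on commas, which is only safe
    when the URLs themselves contain no commas.
    """
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "last":
            query = parse_qs(urlparse(match.group(1)).query)
            return int(query["page"][0])
    return None


# The Link header from the response above.
header = (
    '<https://api.github.com/repositories/724712/contributors?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/repositories/724712/contributors?per_page=100&page=5>; rel="last"'
)

# Upper bound on contributors the endpoint will list: pages * per_page.
print(last_page(header) * 100)
```

With per_page=100, the last page number times 100 gives an upper bound on how many contributors the endpoint reports.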
What is going on?! Well, let's try the anon parameter in the contributors request documentation.
$ http --print=h "https://api.github.com/repos/rust-lang/rust/contributors?per_page=100&anon=1"
HTTP/1.1 200 OK
...
Link: <https://api.github.com/repositories/724712/contributors?per_page=100&anon=1&page=2>; rel="next", <https://api.github.com/repositories/724712/contributors?per_page=100&anon=1&page=24>; rel="last"
...
That shows there are twenty-four pages of 100 contributors each. It also lets us finally see User Aatch (I'm not @-ing him since I would guess he's not inclined to participate in this discussion).
$ http "https://api.github.com/repos/rust-lang/rust/contributors?per_page=100&anon=1" | jq 'map(select(.name == "James Miller"))'
[
{
"email": "[email protected]",
"name": "James Miller",
"type": "Anonymous",
"contributions": 159
}
]
It seems like it's collapsed all of his contributions into that one email address because I didn't find his other email address in any of the responses. The numbers don't quite add up.
Anyway, twenty-four pages is closer, but now it's higher than expected. Here's the critical part from the documentation:
GitHub identifies contributors by author email address. This endpoint groups contribution counts by GitHub user, which includes all associated email addresses. To improve performance, only the first 500 author email addresses in the repository link to GitHub users. The rest will appear as anonymous contributors without associated GitHub user information.
This seems like a critical issue with the way Highfive is currently identifying first-time contributors in popular repositories since Highfive is attempting to match usernames.
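To make the failure mode concrete, here is a minimal sketch of a username-based check run against a contributors response that includes anonymous entries. The function names and the sample data are hypothetical (the second entry and the placeholder email are invented for illustration); the only things taken from the responses above are the shape of the anonymous entry and the fact that it has no username to match on.

```python
def known_logins(contributors):
    """Collect the usernames present in a contributors response.

    Anonymous entries carry only "name" and "email", so they contribute
    nothing to this set.
    """
    return {entry["login"] for entry in contributors if "login" in entry}


def is_first_time_by_login(username, contributors):
    """Naive check: treat a user as new if their login is not listed."""
    return username not in known_logins(contributors)


# Hypothetical sample mirroring the response shapes: one anonymous entry
# (email replaced by a placeholder) and one entry linked to a GitHub user.
sample = [
    {"email": "placeholder@example.com", "name": "James Miller",
     "type": "Anonymous", "contributions": 159},
    {"login": "someuser", "type": "User", "contributions": 12},
]

# 159 contributions, yet the login-based check reports a first-timer.
print(is_first_time_by_login("Aatch", sample))
```

Any contributor whose email falls outside the first 500 ends up in the anonymous bucket, so a check like this treats them as new on every pull request.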
Here are a couple alternatives we could look at:
- Use the commit search API. That appears to allow searching the master branch for commits by email address, for instance. This sounds pretty perfect, except the API is a preview and is subject to change without notice.
- The payloads Highfive receives via the webhook contain an attribute called author_association. Its possible values are here and include things like CONTRIBUTOR, FIRST_TIMER, and FIRST_TIME_CONTRIBUTOR. That looks pretty nice, but I haven't been able to fully grok its behavior yet. For instance, it behaves strangely in this repository. I'm guessing it's because it's a fork, but you'll note I don't have the little "Contributor" badge next to my name at the top of this comment, so at least the weirdness is consistent with the UI.
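The second alternative could look something like the sketch below. It assumes the webhook payload for a pull request event carries author_association under its pull_request object; the function name is hypothetical, and the fork behavior described above is not handled.

```python
# Values named above that indicate someone without prior merged commits.
FIRST_TIME_ASSOCIATIONS = {"FIRST_TIMER", "FIRST_TIME_CONTRIBUTOR"}


def is_first_time_contributor(payload):
    """Return True if the payload marks the PR author as a first-timer.

    Assumes author_association lives under payload["pull_request"];
    a missing field is treated as "not a first-timer".
    """
    association = payload.get("pull_request", {}).get("author_association")
    return association in FIRST_TIME_ASSOCIATIONS


# Hypothetical, heavily trimmed payloads for illustration.
print(is_first_time_contributor(
    {"pull_request": {"author_association": "FIRST_TIME_CONTRIBUTOR"}}))
print(is_first_time_contributor(
    {"pull_request": {"author_association": "CONTRIBUTOR"}}))
```

This avoids the contributors endpoint entirely, but it is only as reliable as the field itself.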
from highfive.
That must mean that Highfive is greeting repeat contributors all of the time in rust-lang/rust. (You probably already see this, but I don't pay very close attention there right now.)
There's an example of this in rust-lang/rust#49329. An earlier, already merged, PR from that user is in rust-lang/rust#48076.
from highfive.
The commit search API looks like it does what Highfive really wants (easily determine if a given user has commits on the default branch). Like the author_association field, however, it does not work for forks. I've submitted a question to figure out if there's a way around this.
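For reference, a commit search request could be put together roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the repo and author-email qualifiers and the total_count field reflect my reading of the search documentation, and the preview media type may change since the API is a preview. No request is actually sent here.

```python
from urllib.parse import urlencode

# Assumption: media type required while commit search is in preview.
PREVIEW_ACCEPT = "application/vnd.github.cloak-preview"


def commit_search_url(repo, email):
    """Build a commit search URL for commits in `repo` authored by `email`."""
    query = "repo:{} author-email:{}".format(repo, email)
    return "https://api.github.com/search/commits?" + urlencode({"q": query})


def has_commits(search_response):
    """Interpret a decoded search response: any hits means a prior commit."""
    return search_response.get("total_count", 0) > 0


# Placeholder email address for illustration only.
print(commit_search_url("rust-lang/rust", "placeholder@example.com"))
```

On a fork, as noted above, the search reports zero commits for everyone, so has_commits would return False there regardless of history.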
from highfive.
Given that GH can see this because it gives you the option of highlighting PRs from first-time contributors, I assume there must be an API for it. Probably commit search is it, but there might also be something in the GraphQL API?
I don't understand the fork issue. Does that just mean it doesn't work for Rust Highfive, for example?
from highfive.
Given that GH can see this because it gives you the option of highlighting PRs from first-time contributors, I assume there must be an API for it. Probably commit search is it, but there might also be something in the GraphQL API?
There might be. I can take a look.
I don't understand the fork issue. Does that just mean it doesn't work for Rust Highfive, for example?
That's right. Commit searches on fork repositories like rust-lang-nursery/highfive say that everyone has made zero commits.
from highfive.