Comments (5)

davidalber commented on June 12, 2024

Apologies. This is a long comment. There's a lot of context that goes with it.

So...way back in #38 it was identified that some users were, oddly, not appearing in the contributors response. This was fixed by adding a short list of the users who were known to be missing from that response. The problem goes well beyond those two users. Let's take a look at the main Rust repo. Here's what GitHub has to say about the number of contributors:

[image: GitHub's contributor count for rust-lang/rust]

Let's take a look at the contributors request (specifically the header in the response):

$ http --print=h "https://api.github.com/repos/rust-lang/rust/contributors?per_page=100"
HTTP/1.1 200 OK
...
Link: <https://api.github.com/repositories/724712/contributors?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/724712/contributors?per_page=100&page=5>; rel="last"
...

That shows there are only five pages of contributors, with 100 users each! That must mean that Highfive is greeting repeat contributors all of the time in rust-lang/rust. (You probably already see this, but I don't pay very close attention there right now.)
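To make the page count concrete, here is a minimal sketch of pulling the rel="last" page number out of a Link header like the one above. The helper name `last_page` is hypothetical, and the parsing assumes the URLs contain no commas, which holds for the header shown here.

```python
import re
from typing import Optional


def last_page(link_header: str) -> Optional[int]:
    """Return the page number of the rel="last" link, or None if there is none."""
    # Assumption: URLs in the header contain no commas, so splitting on "," is safe.
    for part in link_header.split(","):
        match = re.search(r'<([^>]*)>\s*;\s*rel="last"', part)
        if match:
            # Match "page=" only as a whole parameter name, not inside "per_page=".
            page = re.search(r"[?&]page=(\d+)", match.group(1))
            if page:
                return int(page.group(1))
    return None


header = (
    '<https://api.github.com/repositories/724712/contributors?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/repositories/724712/contributors?per_page=100&page=5>; rel="last"'
)
pages = last_page(header)
print(pages)        # 5
print(pages * 100)  # upper bound on listed contributors: 500
```

With `per_page=100`, the last page number times 100 gives only an upper bound, since the final page may be partly empty.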

What is going on?! Well, let's try the anon parameter in the contributors request documentation.

$ http --print=h "https://api.github.com/repos/rust-lang/rust/contributors?per_page=100&anon=1"
HTTP/1.1 200 OK
...
Link: <https://api.github.com/repositories/724712/contributors?per_page=100&anon=1&page=2>; rel="next", <https://api.github.com/repositories/724712/contributors?per_page=100&anon=1&page=24>; rel="last"
...

That shows there are twenty-four pages of contributors, with 100 users each. It also lets us finally see user Aatch (I'm not @-ing him since I would guess he's not inclined to participate in this discussion).

$ http "https://api.github.com/repos/rust-lang/rust/contributors?per_page=100&anon=1" | jq 'map(select(.name == "James Miller"))'
[
  {
    "email": "[email protected]",
    "name": "James Miller",
    "type": "Anonymous",
    "contributions": 159
  }
]

It seems like it's collapsed all of his contributions into that one email address because I didn't find his other email address in any of the responses. The numbers don't quite add up.

Anyway, twenty-four pages is closer, but now it's higher than expected. Here's the critical part from the documentation:

GitHub identifies contributors by author email address. This endpoint groups contribution counts by GitHub user, which includes all associated email addresses. To improve performance, only the first 500 author email addresses in the repository link to GitHub users. The rest will appear as anonymous contributors without associated GitHub user information.

This seems like a critical issue with the way Highfive is currently identifying first-time contributors in popular repositories since Highfive is attempting to match usernames.
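As a rough illustration of why a username-only match misses people, here is a sketch of a lookup over entries from an `anon=1` contributors response. It assumes linked entries carry a `login` field and anonymous entries carry `email` and `name`, as in the output above. The function name `find_contributor` and the sample data (including the example.com address) are hypothetical, and this is not how Highfive currently does it.

```python
from typing import Iterable, Optional


def find_contributor(entries: Iterable[dict], login: Optional[str] = None,
                     emails: Iterable[str] = ()) -> Optional[dict]:
    """Return the first entry matching the login or one of the emails, else None."""
    wanted = {email.lower() for email in emails}
    for entry in entries:
        if entry.get("type") == "Anonymous":
            # Anonymous entries have no login, so only an email match is possible.
            if (entry.get("email") or "").lower() in wanted:
                return entry
        elif login is not None and (entry.get("login") or "").lower() == login.lower():
            return entry
    return None


# Hypothetical sample shaped like the response above.
entries = [
    {"login": "someuser", "type": "User", "contributions": 12},
    {"email": "dev@example.com", "name": "Some Dev", "type": "Anonymous", "contributions": 159},
]
print(find_contributor(entries, login="somedev"))              # None: username match fails
print(find_contributor(entries, emails=["dev@example.com"]))   # finds the anonymous entry
```

The catch is that Highfive would need the author's commit email addresses to do this, and it would still have to walk every page of the response.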

Here are a couple alternatives we could look at:

  • Use the commit search API. That appears to allow searching the master branch for commits by email address, for instance. This sounds pretty perfect, except the API is a preview and is subject to change without notice.
  • The payloads Highfive receives via the webhook contain an attribute called author_association. Its possible values include things like CONTRIBUTOR, FIRST_TIMER, and FIRST_TIME_CONTRIBUTOR. That looks pretty nice, but I haven't been able to fully grok its behavior yet. For instance, it behaves strangely in this repository. I'm guessing it's because it's a fork, but you'll note I don't have the little "Contributor" badge next to my name at the top of this comment, so at least the weirdness is consistent with the UI.
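For the second alternative, a minimal sketch of what the check could look like. It assumes the pull request webhook payload carries the field at `pull_request.author_association`. The function name `is_first_time_contributor` is hypothetical, and given the fork weirdness described above the result should be treated as a hint, not ground truth.

```python
from typing import Optional

# Values named above that indicate someone new to the repository.
FIRST_TIME_ASSOCIATIONS = {"FIRST_TIMER", "FIRST_TIME_CONTRIBUTOR"}


def is_first_time_contributor(payload: dict) -> Optional[bool]:
    """Return True/False from author_association, or None if the field is missing."""
    # Assumption: the field lives under the "pull_request" key of the payload.
    pull_request = payload.get("pull_request") or {}
    association = pull_request.get("author_association")
    if association is None:
        return None  # unknown; the caller decides how to fall back
    return association in FIRST_TIME_ASSOCIATIONS


print(is_first_time_contributor({"pull_request": {"author_association": "FIRST_TIME_CONTRIBUTOR"}}))  # True
print(is_first_time_contributor({"pull_request": {"author_association": "CONTRIBUTOR"}}))             # False
print(is_first_time_contributor({}))                                                                  # None
```

Returning None for a missing field keeps "unknown" separate from "not a first-timer", which matters if the field turns out to be unreliable on forks.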

from highfive.

davidalber commented on June 12, 2024

> That must mean that Highfive is greeting repeat contributors all of the time in rust-lang/rust. (You probably already see this, but I don't pay very close attention there right now.)

There's an example of this in rust-lang/rust#49329. An earlier, already merged, PR from that user is rust-lang/rust#48076.

davidalber commented on June 12, 2024

The commit search API looks like it does what Highfive really wants (easily determine if a given user has commits on the default branch). Like the author_association field, however, it does not work for forks. I've submitted a question to figure out if there's a way around this.
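Roughly, the commit search check could be sketched like this. It assumes the `/search/commits` endpoint with `repo:`, `author:`, and `author-email:` qualifiers and a `total_count` field in the response. The helper names are hypothetical, no request is actually sent, and any preview Accept header the API requires is left to the caller.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/commits"


def commit_search_url(repo: str, author: Optional[str] = None,
                      author_email: Optional[str] = None) -> str:
    """Build a commit search URL restricted to one repository and one author."""
    terms = [f"repo:{repo}"]
    if author:
        terms.append(f"author:{author}")
    if author_email:
        terms.append(f"author-email:{author_email}")
    # Only the total count matters, so ask for a single result per page.
    return f"{SEARCH_URL}?{urlencode({'q': ' '.join(terms), 'per_page': 1})}"


def has_commits(search_response: dict) -> bool:
    """Interpret a decoded search response: any hit means a returning contributor."""
    return search_response.get("total_count", 0) > 0


print(commit_search_url("rust-lang/rust", author="davidalber"))
# https://api.github.com/search/commits?q=repo%3Arust-lang%2Frust+author%3Adavidalber&per_page=1
print(has_commits({"total_count": 0, "items": []}))  # False
```

Because of the fork limitation, a zero count from a fork repository cannot be read as "first-time contributor".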

nrc commented on June 12, 2024

Given that GH can see this because it gives you the option of highlighting PRs from first-time contributors, I assume there must be an API for it. Probably commit search is it, but there might also be something in the GraphQL API?

I don't understand the fork issue. Does that just mean it doesn't work for Rust Highfive, for example?

davidalber commented on June 12, 2024

> Given that GH can see this because it gives you the option of highlighting PRs from first-time contributors, I assume there must be an API for it. Probably commit search is it, but there might also be something in the GraphQL API?

There might be. I can take a look.

> I don't understand the fork issue. Does that just mean it doesn't work for Rust Highfive, for example?

That's right. Commit searches on fork repositories like rust-lang-nursery/highfive say that everyone has made zero commits.
