
Comments (16)

JADEUSC avatar JADEUSC commented on August 16, 2024 2

This would certainly be possible. The workflow is the way it is right now because we never really intended to have teams submit their forecasts automatically and wanted all submissions and possible failures to be visible in the PR tab.
We only automated some submissions because the teams that run these models do not provide their data in our desired format.

Pushing directly to the master branch automatically could also lead to conflicts, especially when multiple teams use a setup like this within a short time frame (Tuesday in our case). Therefore, I don't think that pushing forecasts directly to the main repository's master branch is the right way to do this.

The "optimal" solution, at least in my opinion, would execute all the pre-processing steps in the team's fork and only create an automatic PR to this repository (the action would therefore run not on this repository but on the team's fork). Otherwise we would end up with a lot more jobs that have to be managed in one repository.

I do agree with you that this could be improved upon, but I have to talk to @jbracher about it.

from covid19-forecast-hub-de.

JADEUSC avatar JADEUSC commented on August 16, 2024 1

Hi Sam,

I executed a manual rerun, you can see the output here: https://github.com/KITmetricslab/covid19-forecast-hub-de/runs/1701140680?check_suite_focus=true
The "problem" seems to be the fact that the files are already present in the hub repository. If this is the case the job should work next week.


seabbs avatar seabbs commented on August 16, 2024 1

Fair enough. It sounds like we should provide our data in the wrong format then 😆.

I see the point about having too many jobs on master. Is your code adaptable for automating a PR from a fork, or do you have suggestions for that? As for the conflicts, I think there shouldn't be any, as presumably teams aren't submitting other teams' forecasts?

Looking forward to the developments 👀 .


JADEUSC avatar JADEUSC commented on August 16, 2024 1

Yeah, in a perfect world there would be no conflicts, but I personally would not take this risk on the repository's master branch. I will have a quick look at the packages used; if I am able to get an automatic PR to the upstream repository to work, I will report it here.


seabbs avatar seabbs commented on August 16, 2024 1

No problem Johannes as I said before it has been good to clarify this issue.


JADEUSC avatar JADEUSC commented on August 16, 2024

Hi Sam @seabbs,

the GitHub Action did go through this week. However, I realized that the forecasts are pushed directly to the master branch of the Forecast Hub repository. This is not the intended way of submitting forecasts because it basically bypasses our submission checks (which are only executed whenever a pull request is opened). Therefore, I would like to ask you to change the action to create a PR instead of committing directly to the master branch.

This can be done in multiple ways:

  1. You run the action on your own fork of this repository and manually create a PR every Tuesday.
  2. A fully automated solution can be implemented as well. The basic idea is as follows:
    • Currently your GitHub Actions job modifies the master branch; this should be changed. I currently do this by checking out a dedicated "model-branch" for each automated model (I'm pretty sure that this could be improved upon).
    • Create a PR from the action script. This is a little tricky because PRs created like this do not trigger PR hooks (which are necessary to run the validation script). As a workaround I created this file, which can be called from the action script.
    • You can see the whole workflow here (line 43 ff.).


seabbs avatar seabbs commented on August 16, 2024

Hi @JADEUSC,

Thanks for clarifying, it might be good to have some docs on this as it has been quite a time sink.

We would like to move away from manual submission if possible. I will have a look at updating to a PR approach as you describe, though I am a little busy this week so it may take a while. In the interim, disabling the action might be the best solution, and hopefully I will be able to sort something out for next week or, if not, the week after.


seabbs avatar seabbs commented on August 16, 2024

Looking at your PR check (the step "- run: pip3 install -r github-actions/pr_requirements.txt"), I see a call to two Python scripts. Wouldn't it be easier to check out master as here and then, as an extra step prior to pushing, run those validations directly without trying to trigger a PR (i.e. directly calling the Python scripts)? As a failed validation would raise an error, it would prevent the next step from running, so no commit?
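
A minimal sketch of that idea, assuming the validations can be run as ordinary commands: run each check in order and stop at the first failure, so that a later push step never executes. The helper name `run_checks` and any script names passed to it are hypothetical, not the hub's actual validation scripts.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command in order; return False at the first non-zero exit."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Report which check failed and stop; later checks are skipped.
            print(f"Check failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True
```

In a workflow step this could be called with something like `run_checks([[sys.executable, "validate_a.py"], [sys.executable, "validate_b.py"]])` (placeholder script names), exiting with a non-zero status when it returns False so that the commit and push steps are skipped.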

Also, looking at these, couldn't you do a "has this changed" check (i.e. something like this (my Python is bad): https://stackoverflow.com/questions/33733453/get-changed-files-using-gitpython) and so only validate forecasts that have actually been updated or added rather than all forecasts present? If doing so in a PR, you would need to do something more complex and check for files that exist in the PR but not in master.
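
A rough sketch of such a check, using plain `git diff` through `subprocess` rather than GitPython. The base ref `origin/master`, the `data-processed/` directory, and the `.csv` suffix are assumptions for illustration only; the real repository layout may differ.

```python
import subprocess


def changed_files(base="origin/master", head="HEAD"):
    """List files added or modified on `head` since it diverged from `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def select_forecasts(paths, prefix="data-processed/", suffix=".csv"):
    """Keep only paths that look like forecast files (hypothetical layout)."""
    return [p for p in paths if p.startswith(prefix) and p.endswith(suffix)]
```

The validation scripts would then only need to loop over `select_forecasts(changed_files())` instead of every forecast file in the repository.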


JADEUSC avatar JADEUSC commented on August 16, 2024

I currently use PyGithub for automated interactions with GitHub. It seems like this issue is exactly what you are looking for. However, if I understand it correctly, it is NOT possible to set a different repository's branch as base (upstream in our case). See here. This is exactly what we would need. It does work in the other direction, though. This means that you can pull in changes from forks of the repository if you have the required credentials and permissions for the upstream repo (KITmetricslab/covid-19-forecast-hub).

This boils down to the following:

Executing

from github import Github  # PyGithub

g = Github("your-access-token")  # placeholder for your credentials
repo = g.get_repo("KITmetricslab/covid19-forecast-hub-de")
repo.create_pull(title="This is a test", body="Test", head="JADEUSC:master", base="master")

works and pulls the changes from JADEUSC:master (my fork) into KITmetricslab:master.

However:

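from github import Github  # PyGithub

g = Github("your-access-token")  # placeholder for your credentials
repo = g.get_repo("JADEUSC/covid19-forecast-hub-de")
# Fails: base cannot name a branch in another repository
repo.create_pull(title="This is a test", body="Test", head="master", base="KITMetricslab:master")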

does not work. The second case is the one we would need.

The result of this is that it is necessary to have the Github credentials of someone in the developer team of the upstream repository to create the PR (otherwise the "get_repo" part fails), which is obviously not feasible (The same applies to the solution mentioned in the issue I linked).

The only solution I can come up with at the moment is a GitHub Action that is executed in the upstream repo (the main Forecast Hub repo), iterates over a list of known forks and creates a PR for every one of them. In this case the forecasters would only have to update their fork (this could be done with the action you already wrote plus some minor changes).
With this setup you still need the credentials, but you use them from within the main repository, where they can be stored as GitHub Secrets and you have full control over them.
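
The loop described above could look roughly like this. It is only a sketch: `open_fork_prs` is a hypothetical helper, `repo` is assumed to behave like the PyGithub repository object from the working example earlier in this comment, and the fork list, PR title and body are made up for illustration.

```python
def open_fork_prs(repo, forks, base="master"):
    """Open one PR per fork branch against `base`; return the created PRs.

    `repo` is expected to behave like a PyGithub Repository for the
    upstream hub; `forks` is a list of (owner, branch) tuples.
    """
    created = []
    for owner, branch in forks:
        # head uses the "owner:branch" form, as in the working example.
        pr = repo.create_pull(
            title=f"Automated forecast submission from {owner}",
            body="Opened automatically by a scheduled job in the hub.",
            head=f"{owner}:{branch}",
            base=base,
        )
        created.append(pr)
    return created
```

With PyGithub, `repo` would come from `g.get_repo("KITmetricslab/covid19-forecast-hub-de")`, with the token read from a repository secret. A real job would probably also need to handle forks that have no new commits or that already have an open PR.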


JADEUSC avatar JADEUSC commented on August 16, 2024

But this whole workflow would basically only save the forecasters one click on the "create pull request" button.


seabbs avatar seabbs commented on August 16, 2024

Hmm, that is a shame as it's so close.

I came across hub, which looks like it might be the answer: usage, repository, and docs for hub-pull-request.

It's surprising you would think that, but if other groups are like ours (we have 3 to 4 submissions due on a Monday plus our own scheduled publishing jobs), it all adds up when things need to be manually checked. I'd much rather use the cycles looking at a plot of the forecast or exploring the hub interactive than opening a PR and typing the same thing every week. It also means it has to be done every week, and even with the best will in the world people forget.


JADEUSC avatar JADEUSC commented on August 16, 2024

I will have a look at it!

Maybe I have to rephrase a bit. I am all for automation of repeated tasks! However, after taking all the above (no processing jobs in the Hub, PRs via the "PR tab", restrictions of the GitHub API) into consideration, I don't think that the main repository of the Hub is the right place to do this.

My proposed solution expects a branch with a team's latest forecasts and automates the PR process. If this helps the team - great! However, teams would still need to update their branch manually; that's why I said it basically "only" saves one click.

In your case, I think the easiest way would be to set up GitHub Actions in your forked repository and execute the script you already ran on the hub's master branch on your fork instead (probably on a dedicated "forecasts" branch), and you are set. If we then added the PR part in the main Hub (e.g. pull seabbs:forecasts into master at 7pm every Tuesday), everything would be automated.
But this part is individual for each team (some teams upload their forecasts to their forks by hand and create a PR afterwards). That's why I hesitate to think that we can provide more than the automated PR. I don't think that the hub's "reach" should go further than the creation of a pull request.

I could try to set up a "bare-bones example fork" that could maybe serve as some guidance for my proposed workflow, but the main pieces and scripts (where are the forecasts coming from? How should they be preprocessed?) have to be provided by each team itself.


seabbs avatar seabbs commented on August 16, 2024

They wouldn't need to update the PR if using hub but I see your point.

I'll remove our action from this repo and work on our internal solution instead. I think it would be sensible to add some documentation surfacing the comments in this thread to others, especially considering that a fairly high percentage of the Hub's submissions are automated via GitHub Actions in the hub repository (it certainly confused me).


seabbs avatar seabbs commented on August 16, 2024

Closing with #602


seabbs avatar seabbs commented on August 16, 2024

Note: hub has now been superseded by the GitHub CLI: https://cli.github.com/manual/gh_pr_create
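
For anyone driving this from a script, a small sketch of building a `gh pr create` call from Python. The helper names and the default title and body are made up for illustration; it assumes `gh` is installed and already authenticated in the environment where it runs.

```python
import subprocess


def gh_pr_create_args(upstream, head, base="master",
                      title="Forecast submission",
                      body="Automated submission."):
    """Build the argument list for `gh pr create` against `upstream`."""
    return [
        "gh", "pr", "create",
        "--repo", upstream,   # repository that receives the PR
        "--head", head,       # "user:branch" on the fork
        "--base", base,       # target branch in the upstream repo
        "--title", title,
        "--body", body,
    ]


def create_pr(upstream, head, **kwargs):
    """Run `gh pr create`; raises CalledProcessError if gh exits non-zero."""
    subprocess.run(gh_pr_create_args(upstream, head, **kwargs), check=True)
```

For example, `create_pr("KITmetricslab/covid19-forecast-hub-de", "seabbs:forecasts")` would attempt to open a PR from that fork branch into master.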


jbracher avatar jbracher commented on August 16, 2024

Hi @seabbs and @JADEUSC,

I didn't have time to follow this discussion yesterday evening and given my limited github skills wouldn't have been much of a help. However, I confirm Jannik's statement that we can't allow pushing forecast files directly to the master branch (as a matter of fact, not even the Hub team are allowed to do that). That's also documented in the Wiki ("To ensure a standardized data format across all participating teams, new forecast data has to be submitted using pull requests from a forked version of this repository.", https://github.com/KITmetricslab/covid19-forecast-hub-de/wiki/Submission-via-github-and-commandline). We probably should have noticed this issue with your scripts earlier. But as Jannik said we did not originally intend to have teams' code run in our repo and therefore never added documentation or had specific procedures to check things. I'm sorry that this is taking up your time, but our main concern is to keep the system working for all teams. I hope you'll understand that. We'll add a brief statement to the documentation on the role of automated processing scripts.

Best, Johannes

