Comments (16)
This would certainly be possible. The workflow is the way it is right now because we never really intended to have teams submit their forecasts automatically, and we wanted all submissions and possible failures to be visible in the PR tab.
We only automated some submissions because the teams that run these models do not provide their data in our desired format.
Pushing directly to the master branch automatically could also lead to conflicts, especially when multiple teams use a setup like this within a short timeframe (Tuesday in our case). Therefore, I don't think that pushing forecasts directly to the main repository's master branch is the right way to do this.
The "optimal" solution, at least in my opinion, would execute all the pre-processing steps in the team's fork and only create an automatic PR to this repository (so the action would run on the team's fork rather than on this repository). Otherwise we would end up with a lot more jobs that have to be managed in one repository.
I do agree with you that this could be improved upon, but I have to talk to @jbracher about it.
from covid19-forecast-hub-de.
Hi Sam,
I executed a manual rerun, you can see the output here: https://github.com/KITmetricslab/covid19-forecast-hub-de/runs/1701140680?check_suite_focus=true
The "problem" seems to be the fact that the files are already present in the hub repository. If this is the case the job should work next week.
Fair enough. It sounds like we should provide our data in the wrong format then.
I see the point about having too many jobs on master. Is your code adaptable for automating a PR from a fork, or do you have suggestions for that? As for conflicts, I think there shouldn't be any, as presumably teams aren't submitting other teams' forecasts?
Looking forward to the developments
Yeah, in a perfect world there would be no conflicts, but I personally would not take this risk on the repository's master branch. I will have a quick look at the packages used; if I am able to get an automatic PR to the upstream repository to work, I will report it here.
No problem, Johannes. As I said before, it has been good to clarify this issue.
Hi Sam @seabbs,
the GitHub Action did go through this week; however, I realized that the forecasts are pushed directly to the master branch of the Forecast Hub repository. This is not the intended way of submitting forecasts because it bypasses our submission checks (which are only executed when a pull request is opened). Therefore, I would like to ask you to change the action to create a PR instead of committing directly to the master branch.
This can be done in multiple ways:
- You run the action on your own fork of this repository and manually create a PR every Tuesday.
- A fully automated solution can be implemented as well. The basic idea is as follows:
  - Currently your GitHub Actions job modifies the master branch; this should be changed. I currently do this by checking out a dedicated "model-branch" for each automated model (I'm pretty sure that this could be improved upon).
  - Create a PR from the action script. This is a little tricky because PRs created like this do not trigger PR hooks (which are necessary to run the validation script). As a workaround I created this file, which can be called from the action script.
  - You can see the whole workflow here (line 43ff).
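A minimal sketch of the branch step in the automated variant, for illustration only. It just builds the git commands and does not run them; the function name, the `model-branch-<model>` naming scheme and the example file path are assumptions and are not taken from the hub's actual workflow.

```python
from typing import List


def plan_branch_submission(model: str, files: List[str], message: str) -> List[List[str]]:
    """Return the git commands that commit `files` to a per-model branch.

    Nothing is executed here; each entry can be passed to subprocess.run
    by the calling action script.
    """
    branch = f"model-branch-{model}"  # hypothetical naming scheme
    return [
        # -B creates the branch, or resets it if it already exists
        ["git", "checkout", "-B", branch],
        ["git", "add", "--", *files],
        ["git", "commit", "-m", message],
        # forced push: assumes the branch is written only by this automation
        ["git", "push", "--force", "origin", branch],
    ]
```

The PR would then be opened from that branch, and the validation script called explicitly, since the PR hooks are not triggered for PRs created this way.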
Hi @JADEUSC,
Thanks for clarifying. It might be good to have some docs on this, as it has been quite a time sink.
We would like to move away from manual submission if possible. I will have a look at switching to a PR approach as you describe, though I am a little busy this week so it may take a while. In the interim, disabling the action might be the best solution; hopefully I will be able to sort something out for next week or, if not, the week after.
Looking at your PR check, I see a call to two Python scripts. Wouldn't it be easier to check out master as here and then, as an extra step prior to pushing, run those validations without trying to trigger a PR (i.e. directly calling the Python scripts)? As a failed validation would raise an error, it would prevent the next step from running, so there would be no commit.
Also, looking at these, couldn't you do a "has this changed" check (i.e. something like this (my Python is bad): https://stackoverflow.com/questions/33733453/get-changed-files-using-gitpython) and so only validate forecasts that have actually been updated or added rather than all forecasts present? If doing so in a PR, you would need to do something more complex and check for files that exist in the PR but not in master.
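A rough sketch of the "has this changed" check suggested above. It calls plain `git diff` through `subprocess` rather than GitPython, so it is a swapped-in approach and not the one from the linked answer. The `data-processed/` prefix, the `origin/master` base and both function names are assumptions for illustration; they are not taken from the hub's validation scripts.

```python
import subprocess
from typing import Iterable, List


def changed_files(base: str = "origin/master") -> List[str]:
    """List files added or modified relative to `base`.

    Must be run inside a git checkout that has `base` fetched.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", base],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def forecasts_to_validate(paths: Iterable[str], prefix: str = "data-processed/") -> List[str]:
    """Keep only CSV files under the (hypothetical) forecast folder."""
    return [p for p in paths if p.startswith(prefix) and p.endswith(".csv")]
```

The validation scripts could then be given `forecasts_to_validate(changed_files())` instead of every forecast in the repository.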
I currently use PyGithub for automated interactions with GitHub. It seems like this issue is exactly what you are looking for. However, if I understand it correctly, it is NOT possible to set a different repository's branch as base (upstream in our case); see here. This is exactly what we would need. It does work in the other direction, though: you can pull in changes from forks of the repository if you have the required credentials and permissions for the upstream repo (KITmetricslab/covid-19-forecast-hub).
This boils down to the following:
Executing
from github import Github  # PyGithub
g = Github("your-credentials")  # placeholder: replace with a personal access token
repo = g.get_repo("KITmetricslab/covid19-forecast-hub-de")
# head is "<fork owner>:<branch>"; base is a branch of the repo the call is made on
repo.create_pull(title="This is a test", body="Test", head="JADEUSC:master", base="master")
works and pulls the changes from JADEUSC:master (my fork) into KITmetricslab:master.
However,
g = Github("your-credentials")  # same placeholder as above
repo = g.get_repo("JADEUSC/covid19-forecast-hub-de")
# here base names a branch in a different repository
repo.create_pull(title="This is a test", body="Test", head="master", base="KITMetricslab:master")
does not work. The second case is the one we would need.
The result of this is that it is necessary to have the GitHub credentials of someone in the developer team of the upstream repository to create the PR (otherwise the "get_repo" part fails), which is obviously not feasible. (The same applies to the solution mentioned in the issue I linked.)
The only solution I can come up with at the moment is a GitHub Action that is executed in the upstream repo (the main Forecast Hub repo), iterates over a list of known forks and creates a PR for each of them. In this case the forecasters would only have to update their fork. (This could be done with the action you already wrote plus some minor changes.)
With this setup you still need the credentials, but you use them from within the main repository, where they can be stored as GitHub Secrets and you have full control over them.
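A minimal sketch of that idea, reusing the `create_pull` call that worked in the first example above. The function names, the PR title and body, the `HUB_TOKEN` variable name and the fork list are all assumptions for illustration. The `upstream_repo` argument only needs to behave like a PyGithub repository object, so the loop can be tried out with a stand-in.

```python
import os
from typing import Iterable, List, Tuple


def open_prs_for_forks(
    upstream_repo, forks: Iterable[Tuple[str, str]], base: str = "master"
) -> List[str]:
    """Open one PR per (owner, branch) pair against `upstream_repo`.

    Only `upstream_repo.create_pull` is used. Returns the heads submitted.
    """
    submitted = []
    for owner, branch in forks:
        head = f"{owner}:{branch}"  # same "<owner>:<branch>" form as above
        upstream_repo.create_pull(
            title=f"Automated forecast submission from {owner}",
            body="Opened by the scheduled submission job.",
            head=head,
            base=base,
        )
        submitted.append(head)
    return submitted


def main() -> None:
    """Entry point for the scheduled job in the upstream repository."""
    from github import Github  # PyGithub, imported lazily

    gh = Github(os.environ["HUB_TOKEN"])  # hypothetical secret name
    repo = gh.get_repo("KITmetricslab/covid19-forecast-hub-de")
    open_prs_for_forks(repo, [("seabbs", "forecasts")])  # illustrative fork list
```

This sketch does not handle the case where a fork already has an open PR or has no new commits; the job would need to catch the resulting error for that fork and carry on with the next one.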
But this whole workflow would basically only save the forecasters one click on the "create pull request" button.
Hmm, that is a shame as it's so close.
I came across hub, which looks like it might be the answer: see its usage, repository, and the docs for hub-pull-request.
It's surprising you would think that, but if other groups are like ours, we have 3 to 4 submissions due on a Monday plus our own scheduled publishing jobs, and it all adds up when things need to be manually checked. I'd much rather use the cycles looking at a plot of the forecast or exploring the hub interactive than opening a PR and typing the same thing every week. It also means it has to be done every week, and even with the best will in the world people forget.
I will have a look at it!
Maybe I have to rephrase a bit. I am all for automation of repeated tasks! However, after taking all of the above (no processing jobs in the Hub, PRs via the "PR tab", restrictions of the GitHub API) into consideration, I don't think that the main repository of the Hub is the right place to do this.
My proposed solution expects a branch with a team's latest forecasts and automates the PR process. If this helps the team - great! However, teams would still need to update their branch manually; that's why I said it basically "only" saves one click.
In your case, I think the easiest way would be to set up GitHub Actions in your forked repository and run the script that you previously ran against the Hub's master branch on your fork instead (probably on a dedicated "forecasts" branch), and you are set. If we then added the PR part in the main Hub (e.g. pull seabbs:forecasts into master at 7pm every Tuesday), everything would be automated.
But this part is individual for each team (some teams upload their forecasts to their forks by hand and create a PR afterwards). That's why I hesitate to think that we can provide more than the automated PR. I don't think that the Hub's "reach" should go further than the creation of a pull request.
I could try to set up a "bare-bones example fork" that could offer some guidance for my proposed workflow, but the main pieces and scripts (where are the forecasts coming from? how should they be preprocessed?) have to be provided by each team itself.
They wouldn't need to update the PR if using hub, but I see your point.
I'll remove our action from this repo and work on our internal solution instead. I think it would be sensible to add some documentation surfacing the comments in this thread to others, especially considering that a fairly high percentage of the Hub's submissions are automated via GitHub Actions in the hub repository (it certainly confused me).
Closing with #602
Note: hub is now superseded by cli: https://cli.github.com/manual/gh_pr_create
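For anyone landing here later, a small sketch of how `gh pr create` could be driven from a Python job on a fork. The flags used (`--repo`, `--base`, `--head`, `--title`, `--body`) are documented on the manual page linked above; the function names and example values are assumptions, and `gh` has to be installed and authenticated for `open_pr` to succeed.

```python
import subprocess
from typing import List


def gh_pr_create_args(
    upstream: str, head: str, title: str, body: str, base: str = "master"
) -> List[str]:
    """Build the argument list for `gh pr create` against `upstream`.

    `head` uses the "<owner>:<branch>" form to point at a branch on a fork.
    """
    return [
        "gh", "pr", "create",
        "--repo", upstream,
        "--base", base,
        "--head", head,
        "--title", title,
        "--body", body,
    ]


def open_pr(upstream: str, head: str, title: str, body: str) -> None:
    """Run the command; raises CalledProcessError if gh exits non-zero."""
    subprocess.run(gh_pr_create_args(upstream, head, title, body), check=True)
```

For example, `open_pr("KITmetricslab/covid19-forecast-hub-de", "seabbs:forecasts", "Weekly forecast", "Automated submission")` would open a PR from the fork's `forecasts` branch into the hub's master branch, so the submission checks still run on it.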
I didn't have time to follow this discussion yesterday evening, and given my limited GitHub skills I wouldn't have been much help. However, I confirm Jannik's statement that we can't allow pushing forecast files directly to the master branch (as a matter of fact, not even the Hub team is allowed to do that). That's also documented in the Wiki ("To ensure a standardized data format across all participating teams, new forecast data has to be submitted using pull requests from a forked version of this repository.", https://github.com/KITmetricslab/covid19-forecast-hub-de/wiki/Submission-via-github-and-commandline). We probably should have noticed this issue with your scripts earlier. But as Jannik said, we did not originally intend to have teams' code run in our repo and therefore never added documentation or had specific procedures to check things. I'm sorry that this is taking up your time, but our main concern is to keep the system working for all teams. I hope you'll understand that. We'll add a brief statement to the documentation on the role of automated processing scripts.
Best, Johannes