
regro / cf-scripts


Flagship repo for cf-regro-autotick-bot

License: Other

Languages: Python 96.86%, Shell 2.78%, Dockerfile 0.18%, Batchfile 0.14%, Xonsh 0.03%, Go 0.01%

cf-scripts's People

Contributors

beckermr, bgruening, bollwyvl, chrisburr, cj-wright, dbast, dependabot[bot], duncanmmacleod, ericdill, h-vetinari, henryiii, hmaarrfk, isuruf, jaimergp, jakirkham, jayfurmanek, jdblischak, justcalamari, leofang, maresb, mariusvniekerk, mbargull, minrk, ocefpaf, regro-cf-autotick-bot, scopatz, viniciusdc, wolfv, xhochy, ytausch


cf-scripts's Issues

Write out human readable graph and other scripts

@CJ-Wright commented on Tue Feb 27 2018

Human-readable graphs (e.g. in YAML) take a long time to write out. Additionally, we have some scripts that we want to run but that shouldn't take time away from the actual work of the bot (top-100 lists, checking for cyclic deps, etc.). I think we can tack some of these onto 01, since that only uses about 15 minutes of time total.

Spoof different OSs

@CJ-Wright commented on Fri Feb 23 2018

We may need to spoof different OSs for reading the meta.yaml (for packages which have different urls/sha for different OSs)

eg
https://github.com/conda-forge/git-lfs-feedstock/pull/6/files
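One stdlib-only way to approximate this (a sketch with hypothetical helper names, not the bot's actual code) is to pre-process the raw meta.yaml text once per target OS, keeping only lines whose trailing selector comment matches:

```python
import re

# Trailing line selectors look like:  url: ...  # [win]  or  # [not win]
SELECTOR_RE = re.compile(r"#\s*\[([^\]]+)\]\s*$")

def filter_meta_for_platform(text, platform):
    """Keep only lines whose trailing selector matches `platform`.

    Handles the simple `[win]` / `[not win]` forms only; real conda
    selectors can be arbitrary Python expressions.
    """
    kept = []
    for line in text.splitlines():
        m = SELECTOR_RE.search(line)
        if m is None:
            kept.append(line)  # no selector: keep unconditionally
            continue
        expr = m.group(1).strip()
        negated = expr.startswith("not ")
        name = expr[4:].strip() if negated else expr
        if (name == platform) != negated:
            # Selector matches: drop the comment, keep the line itself
            kept.append(line[:m.start()].rstrip())
    return "\n".join(kept)

# Made-up recipe fragment with per-OS urls/checksums
meta = """\
source:
  url: https://example.invalid/pkg-linux.tar.gz  # [linux]
  url: https://example.invalid/pkg-win.zip  # [win]
  sha256: aaa  # [linux]
  sha256: bbb  # [win]
"""
```

Calling `filter_meta_for_platform(meta, "win")` yields a view of the recipe as Windows would see it, which could then be fed to the normal parsing path.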


@CJ-Wright commented on Thu Mar 01 2018

Also conda-forge/git-feedstock#33


@jakirkham commented on Thu Mar 01 2018

IDK if you are using conda-build 2 or 3. If 2, I can point you to some code in conda-smithy/conda-build-all for this. If 3, there is a feature in conda-build that you can use.


@CJ-Wright commented on Thu Mar 01 2018

Currently we are using 2, but I don't think there is any reason we can't use 3, so whichever will be easier to implement.


@jakirkham commented on Thu Mar 01 2018

Probably both are easy. Will leave it up to you.

We use a function called fudge_subdir to do this in conda-smithy (though it will be gone in conda-smithy 3). In conda-build 3, there is a function called render.

Moving to a web service

It's certainly reasonable to start out with a cron job for these sorts of things. Also, as we resolve some technical debt, the cron job is very helpful. That said, we have generally found in conda-forge that cron jobs inevitably struggle to scale.

To solve this problem, we have ultimately moved all of them to web services that use webhooks. This allows them to handle notifications as they come in and respond by doing some task. This approach seems well suited for updates. However, it will require some thought into how we can get notifications from package indexes, GitHub, etc. I expect this will iron out any issues related to load.

version missed

@CJ-Wright commented on Thu Mar 01 2018

conda-forge/plumpy-feedstock#2
https://github.com/conda-forge/plumpy-feedstock/blob/master/recipe/meta.yaml#L2


@CJ-Wright commented on Thu Mar 01 2018

At this point we should have a matrix of find-and-replace rules, which also includes some good regex matches (arbitrary number of spaces) and a correcting system (normalize the output).
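A minimal sketch of the idea (hypothetical, not the bot's actual replacement code): a regex that tolerates arbitrary spacing inside the jinja tag, and a substitution that normalizes the output to one canonical form:

```python
import re

# Match `{% set version = "X" %}` with arbitrary internal spacing
# and either quote style.
VERSION_SET_RE = re.compile(
    r'\{%\s*set\s+version\s*=\s*["\']([^"\']+)["\']\s*%\}'
)

def bump_version(meta_text, new_version):
    """Replace the version jinja variable, normalizing spacing as we go."""
    return VERSION_SET_RE.sub(
        '{{% set version = "{}" %}}'.format(new_version), meta_text
    )

print(bump_version('{%set version="1.0"%}', "1.1"))
# -> {% set version = "1.1" %}
```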


@jakirkham commented on Fri Mar 02 2018

This becomes more important when we start to consider downstream packages that have tight version constraints on upstream packages. Admittedly that can get into a whole other can of worms finding the dependencies (w/version constraints) and using them to solve how best to update the graph.


@CJ-Wright commented on Fri Mar 02 2018

I meant just normalizing the jinja2 variables. Yes, optimal bumping on the graph would be awesome. It may even be possible as soon as we get rid of the cycles in our otherwise perfect DAG (although maybe we can excise the cycles from the DAG and handle them separately?)
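The "excise the cycles and handle them separately" idea can be sketched with the stdlib's graphlib (Python 3.9+), which reports the offending cycle when a topological sort fails. The dependency data here is made up; the bot's real graph lives in networkx:

```python
from graphlib import TopologicalSorter, CycleError

# Toy dependency map: node -> set of dependencies (made-up data)
deps = {
    "numpy": set(),
    "scipy": {"numpy"},
    # Hypothetical cycle: a depends on b, b depends on a
    "a": {"b"},
    "b": {"a"},
}

def split_cycles(graph):
    """Return (topological order of the acyclic part, cyclic nodes).

    For brevity this strips one reported cycle per pass; a real
    implementation would collect all strongly connected components.
    """
    try:
        return list(TopologicalSorter(graph).static_order()), []
    except CycleError as err:
        cycle = set(err.args[1])  # args[1] is the list of nodes in the cycle
        acyclic = {n: d - cycle for n, d in graph.items() if n not in cycle}
        order, _ = split_cycles(acyclic)
        return order, sorted(cycle)

order, cyclic = split_cycles(deps)
```

The acyclic part can then be bumped in dependency order while the cyclic nodes get special-cased.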

DOC: encourage maintainers to push to bot branches

In the PR body we should include a statement which encourages maintainers to push to bot branches as needed. The bot itself doesn't push to these branches (unless we specify a redo in the graph, which is rare) and the bot devs aren't really equipped to troubleshoot each feedstock's issues.

Multi package/source recipes

With conda build 3 there is a capacity for multiple package/source recipes. This may create a whole bunch of issues for the bot.

  1. Our whole replacement setup is based on singular jinja2 variables and values for keys; this could break things in many ways. (This has already come up with multi-OS checksums.)
  2. I'm not certain what this does to the structure of the graph.

Overwrites {{ build }} jinja

@tkelman commented on Fri Feb 23 2018

see conda-forge/awscli-feedstock#29


@CJ-Wright commented on Fri Feb 23 2018

Thank you for reporting!


@CJ-Wright commented on Fri Feb 23 2018

This came about because you were able to update the version before the bot realized there is a new version. Normally that PR would have included a version bump.

@scopatz thoughts?
This is being overwritten because, in the replacements, a) there is no {{ build }} in our replacement setup, and b) there is also a replacement which overwrites the build setting. I don't suppose that there is a meaningful way to do this in the regex?

To be fair, having the build number as a jinja variable is a bit uncommon, especially as it is usually only used once, hence why we didn't include it in the initial setup.


@scopatz commented on Mon Feb 26 2018

Well, you could check and make sure that the build: NUM really is just a simple number before overwriting it. If it isn't, you can punt.


@scopatz commented on Mon Feb 26 2018

Basically, this comes down to the fact that the ticker assumes a specific form of meta.yaml; if a recipe isn't in that form, it will fail.

Refresh metadata

Sometimes during version updates, URLs and summaries change. It would be nice if the bot could add these changes as well. This should be possible in Python by running python setup.py egg_info inside the sdist and parsing *.egg-info/PKG-INFO. There may also be a way to get this info from the PyPI API. Might be possible to pick up on Python version support as well (e.g. is Python 2 supported or now dropped?).

This overlaps to some extent with issue ( #22 ).
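Since PKG-INFO is a set of RFC 822-style headers, the stdlib email parser can read it directly; no extra dependency is needed. The sample content below is made up:

```python
from email.parser import HeaderParser

# A made-up PKG-INFO, as found in *.egg-info/PKG-INFO inside an sdist.
PKG_INFO = """\
Metadata-Version: 2.1
Name: examplepkg
Version: 1.2.3
Summary: An example package
Home-page: https://example.invalid
"""

# PKG-INFO is RFC 822-style headers, so HeaderParser handles it.
meta = HeaderParser().parsestr(PKG_INFO)
summary = meta["Summary"]
home = meta["Home-page"]
```

The parsed Summary and Home-page fields are exactly the metadata the bot would diff against the recipe's about section.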

Try adding noarch

noarch (almost) all of the things!

In the process of bumping version numbers on many feedstocks, I found that a lot of them may be good candidates for noarch. It would be good to have something that goes around and tries to build noarch versions of feedstocks matching criteria that make a project likely to be noarch-friendly.
I expect that some of the criteria would be:

  1. no selectors
  2. no toolchain dep
  3. python installed

I don't know if that is a comprehensive list, but it may be better to just put out the PRs and see if they work.
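The criteria above could be screened with crude text heuristics before bothering to open a PR. This is a hypothetical sketch; the recipe strings are made up and real selector/requirement parsing is more involved:

```python
import re

def looks_noarch_friendly(meta_text):
    """Crude heuristics mirroring the criteria above:
    no selectors, no toolchain dep, python in the requirements."""
    has_selector = re.search(r"#\s*\[[^\]]+\]", meta_text) is not None
    has_toolchain = re.search(r"^\s*-\s*toolchain\b", meta_text, re.M) is not None
    has_python = re.search(r"^\s*-\s*python\b", meta_text, re.M) is not None
    return (not has_selector) and (not has_toolchain) and has_python

# Made-up recipe fragments
pure = "requirements:\n  build:\n    - python\n    - pip\n"
compiled = "requirements:\n  build:\n    - toolchain\n    - python\n  host:\n    - zlib  # [unix]\n"
```

A pure-Python recipe like `pure` passes the screen; `compiled` fails on both the toolchain dep and the selector, matching the intuition that such candidates are just filtered out rather than proven noarch-safe.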

More efficient use of Travis/API calls

Currently we have split up the scripts over different repos; however, that may not be the most efficient use of Travis or our API calls. We could consider backfilling our current scripts with 03 to use up the API calls, and with other status scripts (cycle checking, current PR/out-of-date, etc.) to fill up the time.

Current usage status:

Script  API calls    Time
00      few          4:12
01      3756 (many)  21:09
02      few          7:10
03      few          43:43

For example we could tack a run of 03 onto 00 to almost double our time creating PRs.

Note that the main changes required are:

  1. Each worker needs to run a generic script which we can update to have subscripts
  2. Each worker needs to keep track of its internal time better than we currently do, so we can stop before we get timed out (otherwise we may not write critical graph information back to the repo)
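Point 2 amounts to a time-budget loop: stop scheduling new work once the remaining CI budget falls below a safety margin, so the final graph write-back always has room. A hypothetical sketch:

```python
import time

def run_with_budget(tasks, budget_seconds, safety_margin=60):
    """Run tasks until the CI time budget is nearly exhausted.

    The check happens between tasks, so a single long-running task can
    still blow the budget; the margin should cover the longest task plus
    the final graph push.
    """
    deadline = time.monotonic() + budget_seconds - safety_margin
    done = []
    for task in tasks:
        if time.monotonic() >= deadline:
            break  # stop early; leave time to push the graph
        done.append(task())
    return done
```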

restart bad and bad upstream

As we make enhancements to the code, we should consider cleaning out bad and bad upstream. That way, things that we couldn't process before get another shot.

manual/auto version bump PRs yield different CI results

There are two PRs in this feedstock, one created manually by me, and the other was created by the autotick-bot:

conda-forge/menuinst-feedstock#1

and

conda-forge/menuinst-feedstock#2

The changeset is exactly the same, but CI succeeds in the manual PR and fails in the bot PR with this error:

INFO:conda.gateways.disk.delete:rm_rf failed for c:\users\appveyor\appdata\local\temp\1\tmpl4pwk8
Traceback (most recent call last):
  File "C:\Miniconda-x64\Scripts\conda-build-script.py", line 5, in <module>
    sys.exit(conda_build.cli.main_build.main())
  File "C:\Miniconda-x64\lib\site-packages\conda_build\cli\main_build.py", line 342, in main
    execute(sys.argv[1:])
  File "C:\Miniconda-x64\lib\site-packages\conda_build\cli\main_build.py", line 333, in execute
    noverify=args.no_verify)
  File "C:\Miniconda-x64\lib\site-packages\conda_build\api.py", line 97, in build
    need_source_download=need_source_download, config=config)
  File "C:\Miniconda-x64\lib\site-packages\conda_build\build.py", line 1524, in build_tree
    config=config)
  File "C:\Miniconda-x64\lib\site-packages\conda_build\build.py", line 1159, in build
    built_package = bundlers[output_dict.get('type', 'conda')](output_dict, m, config, env)
  File "C:\Miniconda-x64\lib\site-packages\conda_build\build.py", line 939, in bundle_conda
    path_to_package=tmp_path)
  File "C:\Miniconda-x64\lib\site-packages\conda_verify\verify.py", line 30, in verify_package
    getattr(package_check, method)() is not None]
  File "C:\Miniconda-x64\lib\site-packages\conda_verify\checks.py", line 309, in check_windows_arch
    file_object_type = get_object_type(file_header)
  File "C:\Miniconda-x64\lib\site-packages\conda_verify\utilities.py", line 117, in get_object_type
    return "DLL " + DLL_TYPES.get(i)
TypeError: cannot concatenate 'str' and 'NoneType' objects
Command exited with code 1

Why is this happening? I imagine the build config differs between the bot and manual PRs, but there is probably something to be fixed here.

Awesome data visualization

We now have quite the treasure trove of data on how packages depend on one another. It would be awesome to have some cool visualization of this data.

track build and run reqs

@CJ-Wright commented on Sat Feb 24 2018

This might be nice for understanding the highest build and run deps separately.


@jakirkham commented on Fri Mar 02 2018

What sorts of things are you thinking about here? Are you contemplating what rebuilds of packages might be needed based on an upstream version change?


@CJ-Wright commented on Fri Mar 02 2018

Yes, also separating the most build depended on packages from the most run depended on packages (maybe for stress testing).

Triggering rebuilds of dependency chains

Admittedly this may not always make sense. However in case that it does, this would be very useful. Namely would be good to trigger rebuilds of downstream dependencies when an upstream dependency is rebuilt. As a simple example, oniguruma and jq.

Handling updates where version is more complex

We have some cases, like this one, where the version in Jinja ends up being a function of other Jinja variables. This is typically motivated by two things AFAIAA.

  1. The URL is some more complex combination of version components.
  2. The version is needed to pin its dependencies somehow.

These might be rare enough that the answer is we adjust the recipes so the bot can update them more easily. Figured I'd raise the issue anyways though to see if anyone had other ideas.

Not super clear what's going on if only the main package is being updated

@pkgw commented on Sat Feb 24 2018

I just got this PR from the bot. I was able to figure out the intent, but the initial message that it posted was not super clear to me, due to the empty table of "pending dependencies" (I'm not sure what "pending" means here). If I'm understanding the purpose of the bot properly, I think it would be helpful to have some text in the PR message along these lines:

This PR updates $PACKAGE to the latest version on PyPI, $VERSION, from $OLD_VERSION. It also updates the following dependencies in the meta.yaml file: $BLAH ...


@CJ-Wright commented on Sat Feb 24 2018

Thank you for reporting!

The purpose of the bot is to tick versions, we currently don't have a way to update which dependencies are in the recipe. The pending dependencies are stated dependencies which also need to be version bumped (since one may want to wait for the deps to be updated before updating the downstream packages).

The pending dependencies table is being removed in #25 (if there are no pending deps).

We can add something to the effect of your statement, although we support more than PyPI, e.g.:

This PR updates $PACKAGE to $VERSION from $OLD_VERSION.

Keep track of status

  1. How many packages are currently out of date?
  2. How many packages are out of date and have a PR against them?
  3. How many packages have unknown upstream version numbers?
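Given per-package node attributes on the graph, these three counts reduce to simple filters. The attribute names (`version`, `new_version`, `pr_open`) are hypothetical stand-ins, not necessarily what cf-graph stores:

```python
def status_counts(nodes):
    """Summarize bot status from per-package attribute dicts."""
    out_of_date = [n for n, a in nodes.items()
                   if a.get("new_version") and a["new_version"] != a.get("version")]
    with_pr = [n for n in out_of_date if nodes[n].get("pr_open")]
    unknown = [n for n, a in nodes.items() if a.get("new_version") is None]
    return {"out_of_date": len(out_of_date),
            "with_pr": len(with_pr),
            "unknown_upstream": len(unknown)}
```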

Bot should check whether PRs are already open with update

@bsipocz commented on Fri Mar 02 2018

First of all, thank you for this bot; I'm sure many maintainers will agree that this is a great feature to have in conda-forge.

I have one feature request though. We usually include the conda-forge update in our release procedure, so it has already happened a few times that a version update PR is open (usually waiting for CIs to pass) when the bot opens one too, clogging the CI services even more. I think it would be rather awesome if the bot checked not only the main repo, but the content of already-opened PRs, too.


@CJ-Wright commented on Fri Mar 02 2018

Thank you for reporting!


@CJ-Wright commented on Fri Mar 02 2018

This may need its own script and/or worker since it may be GH API heavy.
My understanding of what needs to happen:

feedstocks = get_feedstocks()
for feedstock in feedstocks:
    for PR in feedstock.PRs():
        get_yaml()  # read the meta.yaml as proposed in the PR
        update_node_attributes_with_new_info()  # record pending updates on the graph node

@isuruf commented on Fri Mar 02 2018

Or the linter can write to a file/db on each PR and the bot could check this file/db


@CJ-Wright commented on Fri Mar 02 2018

This may need atomic-like operations on the graph.
I'll open another issue to discuss that.
See: https://github.com/regro/cf-graph/issues/52


@jakirkham commented on Fri Mar 02 2018

Definitely agree with this issue. Though I wonder to what extent this is a consequence of the bot recently coming online vs. a recurrent problem we will face well into the future (if not otherwise addressed).


@bsipocz commented on Fri Mar 02 2018

@jakirkham - you're probably right; if this bot becomes the default behaviour, I suspect most maintainers will stop opening those update PRs in the first place. However, in that case, having a way to opt out may be useful.


@CJ-Wright commented on Fri Mar 02 2018

@bsipocz although the bot is currently pushing the CI's rather hard, I think it will get easier once we enter steady state (and finish running through all the packages). At that point I think it would be ok to just close the bot's PRs. My assumption (which may not be true) is that the rate of version bumps will be slow enough that the bot opening an erroneous PR would not be too burdensome.

How to handle hash 404 errors

So it seems that some of the feedstocks we fail on are failing due to 404 errors.
Some of these are from the discrepancy between pypi.io and pypi.python.org.
Others are 404 errors from GitHub, where there is a mismatch between what 02 found and what exists.

Subscribing to packages

@jakirkham commented on Thu Mar 01 2018

Not sure how you are checking for updates currently, but you may find issue ( pypi/warehouse#1683 ) interesting. Basically asking PyPI to include some sort of feed for subscribing to specific packages for updates.


@CJ-Wright commented on Fri Mar 02 2018

That would be very cool.
Currently we trawl through all of the packages looking at their upstream URLs, but having all the PyPI packages managed through their own stream would be great!


@jakirkham commented on Fri Mar 02 2018

By upstream URLs do you mean home, dev_url, or something else?


@CJ-Wright commented on Fri Mar 02 2018

Wherever the meta.yaml describes url to be.

Track entire yaml

We should track the entire meta.yaml. There are just too many interesting introspections that need more than what we are currently tracking.

Capture conda build version

@CJ-Wright commented on Mon Feb 19 2018

I'm not certain if/how possible this is, but it might be nice to capture the conda build version being used and the binary compatibility information.


@msarahan commented on Mon Feb 19 2018

conda-build version used is easy. It's encoded in a package's about.json, so you don't need to record it at build time.

Binary compatibility is a lot harder. It's different on every platform, and I think you'd need to start up some kind of database to match up symbols provided with symbols used. Conda-build 3's run_exports is a decent approximation, but not truly tracking binary compatibility directly.

Have webservice write to the graph

@CJ-Wright commented on Wed Feb 21 2018

It would be nice to have the webservices write to the graph as it would eliminate some of the workers.

  1. On merge of a staged-recipes PR, add information to the graph about the new package (add the node and its meta.yaml info).
  2. On merge of a feedstock PR update the cf version number in the graph.

This would eliminate scripts 00 and 01.
From @isuruf

Writing to Graph is not Atomic

@justcalamari commented on Fri Mar 02 2018

We can run into problems when multiple jobs update the graph at the same time. We do not pull before pushing with doctr, so if the repo has been updated the push will fail. If we do pull, we can also have merge conflicts when the graph is updated by multiple people/bots, so we need a way to either prevent such merge conflicts or resolve them correctly.
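The standard workaround is a pull-rebase-then-push retry loop with backoff, so concurrent writers serialize against each other. A hypothetical sketch with the git operations injected as callables (in practice these would shell out to git):

```python
import random
import time

def push_with_retry(pull_rebase, push, attempts=5):
    """Pull, rebase, then push; retry with jittered backoff on failure.

    `pull_rebase` and `push` are injected callables standing in for the
    git commands. A rejected (non-fast-forward) push is expected to
    raise; we re-pull and try again. Rebasing does not resolve genuine
    content conflicts, so graph files should be written so that
    concurrent updates touch disjoint paths.
    """
    for attempt in range(attempts):
        pull_rebase()
        try:
            push()
            return True
        except RuntimeError:
            # jittered exponential backoff (scaled down for the sketch)
            time.sleep(random.uniform(0, 2 ** attempt) * 0.01)
    return False
```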

The jinja var for build number is ignored

@ocefpaf commented on Thu Mar 01 2018

See conda-forge/dropbox-feedstock#4 (comment)

IMO we should just drop the jinja variable for the build number. It is silly to have a variable that's used in one place. If that action is taken, there is nothing to do in cf-graph. If not, we need to fix the bot πŸ˜’


@dougalsutherland commented on Thu Mar 01 2018

The reason I've often used it is just that it makes it harder to forget to reset the build number to 0 when you're bumping the version yourself: the build number is specified right next to the version, instead of potentially 15 lines away. It makes just as much sense as a sha256 variable that's also only used once (as you noted there)....

If both sha256 and build variables are going away in general to make life easier for bots, that's fine in that the build will definitely break if you forget to update the checksum. :) (Alternatively, maybe the linter could check that the build number is 0 for package versions that don't have a build yet?)


@ocefpaf commented on Thu Mar 01 2018

The reason I've often used it is just that it makes it harder to forget to reset the build number to 0 when you're bumping the version yourself: the build number is specified right next to the version, instead of potentially 15 lines away. It makes just as much sense as a sha256 variable that's also only used once (as you noted there)....

I disagree, but I am more used to the conda recipe format than most people. Anyway, I don't oppose using the jinja, but I would love to get rid of the excess, like the file extension for example πŸ˜„

If both sha256 and build variables are going away in general to make life easier for bots, that's fine in that the build will definitely break if you forget to update the checksum. :)

Yeah, in light of the age of the bots, all this may change πŸ˜„


@CJ-Wright commented on Thu Mar 01 2018

I think we can implement either way. I like the idea of using the bot to PR basic maintenance to recipes (removing excess variables, fixing jinja variables; there was at least one instance of {%set... rather than {% set...).

At the end of the day though the bots serve the humans, so unless a change/feature is especially painful for the bot, I'd rather have things be human friendly than tailored to the bot.


@CJ-Wright commented on Fri Mar 02 2018

Attn: @jakirkham


@CJ-Wright commented on Fri Mar 02 2018

@justcalamari would you mind taking a look at this?

pre-releases

@CJ-Wright commented on Thu Mar 01 2018

It seems that some pre-releases are getting through.
I think we've blacklisted rc in new version numbers. Maybe we need to add dev as well. It might be nice if we had a format for pre-release tags.


@CJ-Wright commented on Thu Mar 01 2018

Attn: @ocefpaf


@ocefpaf commented on Thu Mar 01 2018

If you want to filter them out you can use pep440 to get a list of valid versions.
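For illustration, a crude stdlib-only filter along these lines (the packaging library implements the full PEP 440 grammar properly; this regex is a hypothetical sketch):

```python
import re

# PEP 440 pre/dev markers (aN, bN, rcN, .devN) plus common aliases.
PRERELEASE_RE = re.compile(
    r"(a|b|rc|alpha|beta|dev|pre|preview)[._-]?\d*$",
    re.IGNORECASE,
)

def is_prerelease(version):
    """True if the version string ends in a pre-release-looking tag.

    Crude by design: non-Python schemes like basemap's "2012b" would be
    misflagged, which is exactly the caveat about non-PEP-440 packages.
    """
    return PRERELEASE_RE.search(version) is not None
```

So `1.0rc1`, `2.1.dev3`, and `3.0a1` are filtered while `1.2.3` and `1.2.3.post1` pass through.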


@CJ-Wright commented on Thu Mar 01 2018

See:


@jakirkham commented on Fri Mar 02 2018

PEP440 will work with Python packages. However not all packages are Python (or follow Python versioning rules). Might be good to reach out to conda and conda-build devs about what is a good indicator of an rc, dev, ... version.

cc @kalefranz @msarahan @mingwandroid
