
CI fails are hard to view (rust-bitcoin, OPEN)

tcharding commented on June 25, 2024
CI fails are hard to view

from rust-bitcoin.

Comments (9)

apoelstra commented on June 25, 2024

You can enable GitHub notifications on CI jobs. (Somehow I did this for rust-bitcoin, but I forget how.)

On failing PRs you can also click the "Actions" tab up top which I think gives you a more accessible view.


dpc commented on June 25, 2024

GitHub Actions CI, not unlike most CI systems, breaks down the moment things are no longer trivial.

Generally, no matter where I go, sooner or later I end up with a handful of parallel CI machines/workflows for system-level parallelism, each with a disjoint workload. Each one then executes a bash script that actually takes care of running stuff with e.g. GNU Parallel, outputting only the things that failed, and doing all sorts of things that are too fancy to express in YAML.

In short: IMO the way to go is to avoid having to touch any YAML, and to use real programming to handle the problem in a scalable and portable way.

E.g. just recently I changed our backward compatibility matrix test, which runs all the tests against a matrix of component versions, to print only the output of the tests that failed:

fedimint/fedimint#4526

because otherwise we would have to download it and analyze manually.
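A minimal sketch of the "run everything in parallel, print only what failed, then a summary" pattern described above. This is not the fedimint script and does not use GNU Parallel; it swaps in Python's standard library as an illustration. The check names and commands are hypothetical placeholders, and a real script would list cargo invocations instead.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(name, argv):
    """Run one check, capturing stdout and stderr together."""
    proc = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return name, proc.returncode, proc.stdout


def run_all(checks, max_workers=4):
    """Run every check in parallel; return only the ones that failed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda item: run_one(*item), checks.items()))
    return [(name, code, out) for name, code, out in results if code != 0]


def report(failures, total):
    """Print captured output for failures only, then a one-line summary."""
    for name, code, out in failures:
        print(f"--- {name} failed (exit code {code}) ---")
        print(out, end="")
    print(f"{total - len(failures)} passed, {len(failures)} failed")


# Hypothetical checks: one noisy success and one failure.
checks = {
    "passes": [sys.executable, "-c", "print('noisy but fine')"],
    "fails": [sys.executable, "-c", "import sys; print('boom'); sys.exit(3)"],
}
failures = run_all(checks)
report(failures, len(checks))
```

The output of the passing check is captured and discarded, so the log contains only the failing check's output and the summary line. The same script runs unchanged on a developer machine.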

At this point I'm tempted to have a whole project with Rust command line tools for scripting CIs.


apoelstra commented on June 25, 2024

We have mostly done this: our GitHub CI pretty much just sets variables and calls our shell script. But we use it to set the variables because then we get parallelism in the GitHub UI and (I think) we get more compute than we would if we were doing local parallelism within a single CI runner.


dpc commented on June 25, 2024

GitHub will only allocate a fixed maximum number of VMs running at a time for a project (I think 20?), so above a certain number of jobs (depending on how many PRs typically run at the same time) things just start queuing and one needs to wait longer. In addition, every VM has an initialization cost, so too many of them just make things slower. But yeah, some VM-level (workflows, jobs) parallelism is great.
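A toy model of the queuing effect, to make the arithmetic concrete. It assumes jobs run in full "waves" of at most the concurrency cap, that the total test work is the same however it is split, and that every VM pays a fixed startup cost. The cap of 20 is the guess from the comment above; the 600 s of work and 30 s of startup are made-up numbers.

```python
import math


def estimate_wall_seconds(n_jobs, max_concurrent, job_seconds, startup_seconds):
    """Rough wall-clock time when jobs run in waves of at most max_concurrent."""
    waves = math.ceil(n_jobs / max_concurrent)
    return waves * (startup_seconds + job_seconds)


# Hypothetical numbers: 600 s of total test work, 30 s VM startup, cap of 20 VMs.
total_work = 600
many_small = estimate_wall_seconds(180, 20, total_work / 180, 30)
few_large = estimate_wall_seconds(20, 20, total_work / 20, 30)
print(f"180 jobs: {many_small:.0f}s, 20 jobs: {few_large:.0f}s")
```

Under these assumptions the 180-job split needs nine waves and pays the startup cost nine times in sequence, while the 20-job split pays it once. Real runners do not start in lockstep, so this only shows the direction of the effect.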

I now see rust-bitcoin has lots of jobs. IMO these would be better expressed as one GNU parallel run, with the benefit that it would run in parallel locally as well.

See the output from a GNU parallel run that failed: https://github.com/fedimint/fedimint/actions/runs/8269976680/job/22626478246 . We get the output only from the ones that failed, plus a summary. We could make the summary prettier or output only the things that failed, but no one has complained about it yet. :D

Another question is: do you really need to test so many of these on every PR? Possibly a PR could run a subset, and the MQ (merge queue) could run the whole thing, or something like that.
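One way the "subset on PRs, everything in the merge queue" idea could look inside a CI driver script, as a sketch only. The job names and the choice of subset are hypothetical. `GITHUB_EVENT_NAME` is the variable GitHub Actions sets for the triggering event, with `pull_request` and `merge_group` as the two values of interest here.

```python
import os

# Hypothetical job names; the real matrix is much larger.
ALL_JOBS = ["stable-test", "nightly-test", "msrv-test", "asan", "docs", "cross-arch"]
PR_SUBSET = ["stable-test", "msrv-test"]


def select_jobs(event_name):
    """Return the quick subset for pull requests and the full list otherwise."""
    if event_name == "pull_request":
        return list(PR_SUBSET)
    # "merge_group" (merge queue), "push", schedules, etc. run everything.
    return list(ALL_JOBS)


# Default to the full set when the variable is absent, e.g. on a local run.
event = os.environ.get("GITHUB_EVENT_NAME", "merge_group")
print(event, "->", " ".join(select_jobs(event)))
```

The subset here is fixed, not random, so a green PR run always means the same thing; the full list still gates the merge.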


tcharding commented on June 25, 2024

Anecdotally, I've found CI way slower since we added the 180-odd jobs.


tcharding commented on June 25, 2024

> Another question is: do you really need to test so many of these on every PR? Possibly a PR could run a subset, and the MQ (merge queue) could run the whole thing, or something like that.

I'd really like to avoid getting green CI and then having @apoelstra tell me that his local CI failed while he was trying to merge; that slows the process down. But I'd also like to run a more minimal set of CI jobs quickly to get better feedback. That is why I wrote `just sane`, and it works pretty well, but it's missing some pieces (e.g. using the correct nightly version). This does raise the question: if devs are using local builds to check their work, and the primary merge guy is using local builds to do pre-merge checks, what is CI doing?


junderw commented on June 25, 2024

> what is CI doing?

Proving to onlookers that the commit passes CI.

If I look back on a PR from years ago where some strange change to cargo or my OS causes attempts to build that commit to fail, but I see GitHub CI passed at that time... I am more likely to write it off as "old stuff breaks sometimes" instead of making "DID U GAIZ ACKSHUALLY TEST THIS GARBAGE!" accusations.


junderw commented on June 25, 2024

Also, it catches the case where someone looks at the code, thinks "this should not affect anything" and tries to merge it with just a utACK.

Which doesn't really matter for this project in particular, but it's nice.


apoelstra commented on June 25, 2024

CI covers several things that my local CI doesn't -- notably stuff like msan/asan and testing on other architectures. I might also not be testing examples; IIRC that was hard/manual to set up.

Conversely my local CI tests every commit and has a larger feature matrix for the unit tests.

I've seen many cases where one would pass but not the other.

As for having CI run a random subset ... Murphy's law says that literally the first PR after we do that will exhibit some intermittent CI failure that isn't detected and then master will be broken :).

> Anecdotally I've found CI way slower since we added the 180 odd jobs.

Yeah, I think this is true. Certainly, waiting for any specific job to run is way slower. I think we should re-consolidate some of the jobs and parallelize them using GNU parallel or something.
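As a rough picture of what re-consolidating could mean: a sketch that splits a list of jobs into a few disjoint shards, one per runner, with each runner then parallelizing its shard locally. The job names and the figure of 10 runners are hypothetical; only the 180 comes from the thread.

```python
def shard(jobs, n_shards):
    """Split jobs round-robin into n_shards disjoint lists."""
    return [jobs[i::n_shards] for i in range(n_shards)]


# Hypothetical: 180 job names consolidated onto 10 runners.
jobs = [f"job-{i:03d}" for i in range(180)]
shards = shard(jobs, 10)
for index, work in enumerate(shards):
    print(f"runner {index}: {len(work)} jobs, first is {work[0]}")
```

Round-robin assignment keeps the shards the same size but ignores how long each job takes; balancing by measured duration would be a further refinement.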

