We now have 187 CI jobs. When a job fails one has to scroll through the list to find i

CI fails are hard to view about rust-bitcoin HOT 9 OPEN

tcharding commented on June 25, 2024

CI fails are hard to view

from rust-bitcoin.

Comments (9)

apoelstra commented on June 25, 2024

You can enable GitHub notifications on CI jobs. (Somehow I did this for rust-bitcoin, but I forget how.)

On failing PRs you can also click the "Actions" tab up top which I think gives you a more accessible view.

dpc commented on June 25, 2024

GitHub Actions CI, not unlike most CI systems, breaks down the moment things are no longer trivial.

Generally, no matter where I go, sooner or later I just have a handful of parallel CI machines/workflows for system-level parallelism, each with a disjoint workload. Each one then executes a bash script that actually takes care of running stuff with e.g. GNU Parallel, outputting only things that failed, and doing all sorts of things that are too fancy to express in YAML.

In short: IMO the way to go is to try to avoid having to touch any YAML, and use real programming to actually handle the problem in a scalable and portable way.

E.g. just recently I changed our backward compatibility matrix test, which runs all the tests against a matrix of component versions, to print only the output of the test that failed:

fedimint/fedimint#4526

because otherwise we would have to download it and analyze manually.

At this point I'm tempted to have a whole project with Rust command line tools for scripting CIs.

apoelstra commented on June 25, 2024

We have mostly done this: our GitHub CI pretty much just sets variables and calls our shell script. But we use it to set the variables because then we get parallelism in the GitHub UI and (I think) we get more compute than we would if we were doing local parallelism within a single CI runner.

dpc commented on June 25, 2024

GitHub will only allocate a fixed maximum number of VMs running at a time for a project (I think 20?), so above a certain number of jobs (it depends on how many PRs typically run at the same time) things just start queuing and one needs to wait longer. In addition, every VM has an initialization cost, so too many of them just makes things slower. But yeah, some VM-level (workflows, jobs) parallelism is great.
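The trade-off described here can be made concrete with a rough back-of-the-envelope model. The sketch below assumes a fixed cap on concurrent runners, a flat per-VM startup cost, equal-sized jobs, and no other PRs competing for runners. The cap of 20 is the guess from the comment above, and the startup and work durations are invented for illustration, not measurements.

```python
import math


def wall_time(num_jobs: int, runner_cap: int, startup_s: float, work_s: float) -> float:
    """Estimate wall-clock seconds when jobs run in waves of at most runner_cap."""
    waves = math.ceil(num_jobs / runner_cap)
    return waves * (startup_s + work_s)


# Hypothetical numbers: 187 small jobs, 60 s of VM startup, 30 s of real work each.
many_small = wall_time(187, 20, 60.0, 30.0)

# The same total work (187 * 30 s) consolidated into 20 jobs, one per runner.
consolidated = wall_time(20, 20, 60.0, 187 * 30.0 / 20)

print(f"187 jobs: {many_small} s, 20 jobs: {consolidated} s")
```

Under these assumptions the many-small-jobs layout pays the startup cost once per wave, which is where the slowdown comes from. Real runs will differ, since job sizes are uneven and the queue is shared.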

I now see rust-bitcoin has lots of jobs. IMO these would be better expressed as one GNU parallel run, with the benefit that it would run in parallel locally as well.

See the output from a GNU parallel run that failed: https://github.com/fedimint/fedimint/actions/runs/8269976680/job/22626478246 . We get the output only from the ones that failed, plus a summary. We could make the summary prettier or output only things that failed, but no one has complained about it yet. :D
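The "output from failures only, plus a summary" pattern can be sketched without GNU parallel. The following is a minimal Python stand-in, not the fedimint script: it uses a thread pool instead of GNU parallel, and the function names and demo tasks are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_task(cmd):
    """Run one command, capturing stdout and stderr together."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return cmd, proc.returncode, proc.stdout


def run_all(tasks, workers=4):
    """Run tasks concurrently and return only the ones that failed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_task, tasks))
    return [result for result in results if result[1] != 0]


def report(tasks, failed):
    """Build a report with the output of failed tasks and a one-line summary."""
    lines = []
    for cmd, code, output in failed:
        lines.append(f"FAILED (exit {code}): {' '.join(cmd)}")
        lines.append(output.rstrip())
    lines.append(f"{len(tasks) - len(failed)} passed, {len(failed)} failed")
    return "\n".join(lines)


# Stand-in tasks: one succeeds quietly, one prints a message and exits non-zero.
demo_tasks = [
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "import sys; print('boom'); sys.exit(3)"],
]
demo_failed = run_all(demo_tasks)
print(report(demo_tasks, demo_failed))
```

Because the same script runs on a laptop and in a CI runner, the parallelism is available locally too, which is the benefit mentioned above.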

Another question is: do you really need to test so many of these on every PR? Possibly a PR could run a subset, and the MQ (merge queue) could run the whole thing, or something like that.
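One way to picture the subset idea is a selector that picks jobs based on the triggering event. This is only a sketch: the job names and the "smoke" tag are invented, and it assumes the script can see the event name (GitHub Actions uses `pull_request` for PRs and `merge_group` for merge queue runs).

```python
# Hypothetical job names; a job tagged "smoke" belongs to the quick PR subset.
JOBS = {
    "test-stable": {"smoke"},
    "test-nightly": set(),
    "test-msrv": {"smoke"},
    "asan": set(),
    "cross-arch": set(),
}


def select_jobs(event, jobs):
    """Return the smoke subset for pull requests and every job otherwise."""
    if event == "pull_request":
        return sorted(name for name, tags in jobs.items() if "smoke" in tags)
    return sorted(jobs)


print(select_jobs("pull_request", JOBS))
print(select_jobs("merge_group", JOBS))
```

The subset here is fixed rather than random, so a given PR always runs the same quick checks and the full matrix still gates the merge.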

tcharding commented on June 25, 2024

Anecdotally I've found CI way slower since we added the 180 odd jobs.

tcharding commented on June 25, 2024

> Another question is - do you really need to test so many of these on every PR. Possibly a PR could run a subset, and MQ (merge queue) the whole thing or something like that.

I'd really like to avoid getting green CI and then having @apoelstra tell me that his local CI failed while trying to merge; that slows the process down. But I'd also like to run a smaller set of CI jobs quickly to get better feedback. That is why I wrote `just sane`, and it works pretty well, but it's missing some pieces (e.g. using the correct nightly version). This does raise the question: if devs are using local builds to check their work and the primary merge guy is using local builds to do pre-merge checks, what is CI doing?

junderw commented on June 25, 2024

> what is CI doing?

Proving to onlookers that the commit passes CI.

If I look back on a PR from years ago where some strange change to cargo or my OS causes attempts to build that commit to fail, but I see that GitHub CI passed at the time... I am more likely to write it off as "old stuff breaks sometimes" instead of making "DID U GAIZ ACKSHUALLY TEST THIS GARBAGE!" accusations.

junderw commented on June 25, 2024

Also, it catches the case where someone looks at the code, thinks "this should not affect anything" and tries to merge it with just a utACK.

Which doesn't really matter for this project in particular, but it's nice.

apoelstra commented on June 25, 2024

CI covers several things that my local CI doesn't -- notably stuff like msan/asan and testing on other architectures. I might also not be testing examples; IIRC that was hard/manual to set up.

Conversely my local CI tests every commit and has a larger feature matrix for the unit tests.

I've seen many cases where one would pass but not the other.

As for having CI run a random subset ... Murphy's law says that literally the first PR after we do that will exhibit some intermittent CI failure that isn't detected and then master will be broken :).

> Anecdotally I've found CI way slower since we added the 180 odd jobs.

Yeah, I think this is true. Certainly, waiting for any specific job to run is way slower. I think we should re-consolidate some of the jobs and parallelize them using GNU parallel or something.
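Re-consolidating could look like splitting the job list into a handful of disjoint shards, one per CI runner, with each shard then running its jobs in parallel locally. A minimal sketch, assuming round-robin assignment; the shard count of 8 and the placeholder job names are invented.

```python
def shard(jobs, num_shards):
    """Split jobs round-robin into num_shards disjoint lists."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [jobs[i::num_shards] for i in range(num_shards)]


# 187 placeholder names, matching the job count mentioned at the top of the thread.
all_jobs = [f"job-{n:03d}" for n in range(187)]
shards = shard(all_jobs, 8)

# Shard sizes differ by at most one job.
print([len(s) for s in shards])
```

Round-robin keeps shard sizes even but ignores how long each job takes; balancing by measured duration would need timing data that is not available here.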
