Comments (4)
Is this actually a build problem? Shouldn't your implementation in fuzz.cpp e.g. take the current time and use that as a seed (or even better, take it via a command line flag or environment variable so that it's actually reproducible)? You could quantize that time if you need reproducible execution over a given period of time.
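A minimal sketch of that seeding idea, written in Python for brevity rather than as the actual fuzz.cpp code. The environment variable name FUZZ_SEED, the function name pick_seed, and the one-hour window are all assumptions made for illustration; none of them come from buck2 or from the original fuzzer.

```python
import os
import time


def pick_seed(env=None, now=None, window_seconds=3600):
    """Return a fuzzing seed: an explicit override first, else quantized time."""
    env = os.environ if env is None else env
    # FUZZ_SEED is a hypothetical variable name, not something buck2 defines.
    override = env.get("FUZZ_SEED")
    if override is not None:
        return int(override)
    now = time.time() if now is None else now
    # Integer division quantizes the clock: every call made inside the same
    # window yields the same seed, so runs within that window are repeatable.
    return int(now) // window_seconds


if __name__ == "__main__":
    # Log the seed so that a failing run can be replayed via FUZZ_SEED.
    print(f"seed={pick_seed()}")
```

Printing the chosen seed is what makes the override useful: a failure seen in one run can be replayed later by exporting the same value.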
It seems to me that the part that you really care about is not the buck2 build part, but rather the buck2 run one, which isn't cached at all.
from buck2.
Is this actually a build problem? Shouldn't your implementation in fuzz.cpp e.g. take the current time and use that as a seed (or even better, take it via a command line flag or environment variable so that it's actually reproducible)? You could quantize that time if you need reproducible execution over a given period of time.
It may not be a build problem so much as a more general automation problem, though that might come down to just semantics. I'd be quite happy for this to look something like:
buck2 run_pipeline //:some_pipeline
It's true you could use a consistent seed for fuzzing and just fuzz for n iterations, and it's possible to get reproducible outputs in that case. However, there are still use-cases where it's nice to have a full execution graph (which buck2 provides via DICE) where each node in the execution graph is not necessarily reproducible. A more concrete (though still toy) example of a non-reproducible execution graph might include:
- Use 5 different scanners to detect misconfigurations in a website, each outputting its own JSON file.
- Take said JSON files and convert them into a markdown file.
- Take the markdown file and convert it to a static HTML page for developers to view.
But let's say that the scanners take 40 minutes to run, and don't need to be run all that often. Having some caching involved would be great, but you wouldn't want the results to be cached indefinitely (which would be the case with buck2). This execution graph is also non-reproducible by definition, because the website being scanned is outside of your control.
Buck2 solves 90% of this automation problem by handling execution graphs, remote execution and caching etc. I'm aware that this doesn't necessarily fit with the primary goal of buck2 being a build system. But it has enough overlap for me to find it interesting as a generalised declarative automation framework. Does this sound too far out of left field for buck2? I'm aware that this is kind of build-system adjacent.
It seems to me that the part that you really care about is not the buck2 build part, but rather the buck2 run one, which isn't cached at all.
This is sort of true, although I think what I'm hoping for is something like buck2 run but re-using some of the execution graph semantics in the runtime space.
- Never cache
Yeah, so we've talked about adding support for this kind of a thing before, primarily under the name "volatile actions." I think the hypothetical API is that when you call ctx.actions.run, you can specify volatile = True and then your action will get rerun on every command. Obviously this would need to be used with care.
The use-case that we had in mind at the time is better integration with system toolchains; for example, maybe you want to invalidate all your rust library builds when you upgrade your rustc version. You could define a volatile action that prints the rustc version into a file, and then add that as a never-read input to every rustc action.
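As a rough sketch, the command run by such a volatile action could be a small script like the one below. The function name write_version_stamp and the stamp file are hypothetical, and the Buck2 side (the still-hypothetical volatile = True flag and wiring the stamp in as an input to every rustc action) is not shown.

```python
import subprocess
from pathlib import Path


def write_version_stamp(command, stamp_path):
    """Run `command`, store its stdout in `stamp_path`, and return the text.

    `command` is an argv list such as ["rustc", "--version"]. The stamp file
    is what downstream actions would list as an input: its contents only
    change when the tool reports a different version.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    version = result.stdout.strip()
    Path(stamp_path).write_text(version + "\n")
    return version


# Example call (assumes rustc is on PATH; the output path is illustrative):
#   write_version_stamp(["rustc", "--version"], "rustc_version.txt")
```

Because the file contents stay identical until the toolchain is actually upgraded, rerunning this on every command should be cheap as long as dependents are only invalidated when an input's contents change.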
I think the vibe on volatile actions is basically positive. Just needs someone to go and write some code I think.
- Cache expires after N seconds
- Cache expires on cron schedule
These two seem like they could be implemented on top of the first one. You can have a volatile action that prints the current timestamp / 3600 to a file, and then depend on that file from every other action - at the top of the hour, the contents of that file will change and your actions get invalidated.
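To make the "timestamp / 3600" trick concrete, here is a minimal sketch of what the volatile action's script might do. The function name write_time_bucket and its parameters are assumptions for illustration only.

```python
import time
from pathlib import Path


def write_time_bucket(out_path, period_seconds=3600, now=None):
    """Write the index of the current time bucket to `out_path`."""
    now = time.time() if now is None else now
    # Integer division: the value is constant within a period and increases
    # by one at each period boundary (the top of the hour for 3600).
    bucket = int(now) // period_seconds
    Path(out_path).write_text(f"{bucket}\n")
    return bucket
```

Every action that lists this file as an input would see unchanged contents for the rest of the period, and changed contents once the boundary passes.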
I suppose that's not exactly the same as "expire after 1 hour," but it's pretty close. If you don't care about RE, then you can actually modify this scheme to use incremental actions and then get exactly those semantics (have an action that writes the current timestamp to its output only if it's been more than 1 hour since the timestamp currently written there).
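The incremental variant could look roughly like this, assuming the action is allowed to read its own previous output (which is the point of incremental actions). The function name refresh_stamp and the file layout are hypothetical.

```python
import time
from pathlib import Path


def refresh_stamp(stamp_path, ttl_seconds=3600, now=None):
    """Rewrite the stamp only if it is missing, unreadable, or too old.

    Returns True if the stamp was rewritten, False if it was left alone.
    """
    now = time.time() if now is None else now
    path = Path(stamp_path)
    try:
        previous = float(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        previous = None  # no usable previous output: treat as expired
    if previous is not None and now - previous < ttl_seconds:
        return False  # still fresh: contents unchanged, dependents stay cached
    path.write_text(f"{now}\n")
    return True
```

Unlike the bucket approach, expiry here is measured from the last refresh rather than from a wall-clock boundary, which is what gives the "expire after 1 hour" behaviour.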
- Lazy cache evaluation i.e. immediately return cached artifact and then update it next time it's used.
This one I'm a bit more hesitant on. My concern isn't around the caching, but rather around the action execution management. Action executions are currently tied to the lifetime of a single command, i.e. they are executed as part of that command, need to finish before the command can finish, and are cancelled if the command is cancelled. What you're suggesting seems like it would be a deviation from that, which I think is probably hard to do correctly, both in principle and in practice.
I've thought about the fuzzing thing a number of times, and I sort of came to the conclusion that you probably want to fix the seeds in your fuzzing tests and try to have a reasonable amount of them if you expect them to run under buck2 test or whatnot. Actual major-scale runs of fuzzing e.g. with Clusterfuzz should probably be done by deploying some other kind of artifact (e.g. an OCI image to be deployed and probed).
But "Volatile actions" are also really useful for a lot of other random things where a program may need to invoke some kind of ambient side effect on the system, which can actually be used to improve the precision of dependency tracking. When combined with early cut-off, a lot of the time they aren't so bad, like this example:
You could define a volatile action that prints the rustc version into a file, and then add that as a never-read input to every rustc action.
This is actually a great example that I used to do all the time when using Shake (through a feature called "Oracles.") I think it's really important for some cases. For example, let's say a user builds a project with CC=gcc as the compiler, and it's just picked up off $PATH. Then they do a global upgrade of their whole system, getting a new C compiler. If the user then enters the project and tries to build, it won't rebuild anything, because nothing seems to have changed; the build system can't track anything more than the fact that it invokes "$CC" to compile objects, and as far as it can tell that command still exists just fine (probably /usr/bin/gcc, so even the path doesn't tell you anything), so there's nothing left to do.
In C or C++, this kind of mistake isn't so bad, because they have de-facto stabilized ABIs. This exact case can happen today in Buck2 with system_rust_toolchain and system_cxx_toolchain; but in the case of Rust, this error could cause catastrophic and hard-to-understand build failures. You can upgrade rustc, add a new library, run buck2 build, and now you might end up with rlibs that were cached and compiled previously with an old compiler, and rlibs that were compiled freshly with the upgraded compiler, and you will be lucky if the linker just explodes on them.