Bazel 7 broke nested action directory outputs

jmillikin commented on June 15, 2024

Bazel 7 broke nested action directory outputs

Comments (8)

jmillikin commented on June 15, 2024

cc @tjgq as this may be related to #18646

tjgq commented on June 15, 2024

This is an intentional change; the associated incompatible flag is actually --incompatible_strict_conflict_checks, which was flipped in Bazel 7 and will be deleted in Bazel 8 (see #16729).

jmillikin commented on June 15, 2024

Hmm. Is that intentional change documented anywhere? I might just be missing it, but I don't see it mentioned anywhere in the v7.0 release notes.

And also, is it possible to undo that change? I don't understand the motivation behind it, and nested output directories are extremely useful for working with certain language ecosystems.

For example, JavaScript package managers produce a node_modules/ output directory with per-package subdirectories, and those packages might themselves contain sub-packages (for TypeScript declarations, etc.). Representing those various directory pseudo-roots as File values is convenient and natural, but given a File(path = "node_modules") the only way I know of to obtain a File(path = "node_modules/somepkg") is to have the same rule generate them both as nested output artifacts.

tjgq commented on June 15, 2024

> Hmm. Is that intentional change documented anywhere? I might just be missing it, but I don't see it mentioned anywhere in the v7.0 release notes.

Actually, it was flipped in Bazel 6.0.0 (commit fe16965) but seems to be missing from the 6.0.0 release notes. We've had this issue before with changes that are cherry-picked into a release branch after it has already been cut from the main branch; I'll see if I can get it fixed.

> And also, is it possible to undo that change? I don't understand the motivation behind it, and nested output directories are extremely useful for working with certain language ecosystems.

No, sorry. The value of this feature doesn't justify its implementation cost; it broke an extremely useful invariant (that every output file is owned by exactly one declared artifact) whose absence was a constant source of bugs, and whose existence simplifies several subsystems (notably conflict checking and input prefetching).
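To make the invariant concrete, here is a minimal Python sketch of the kind of check that rejects nested outputs. It is a simplified model for illustration, not Bazel's actual conflict checker; the function name `find_nested_outputs` and the example paths are made up.

```python
def find_nested_outputs(paths):
    """Return (outer, inner) pairs where one declared output contains another.

    Simplified model for illustration only; not Bazel's conflict checker.
    """
    # Sorting places every path before the paths nested underneath it.
    normalized = sorted(p.strip("/") for p in paths)
    conflicts = []
    for i, outer in enumerate(normalized):
        for inner in normalized[i + 1:]:
            # Compare whole path segments so "pkg/a" does not match "pkg/ab".
            if inner.startswith(outer + "/"):
                conflicts.append((outer, inner))
    return conflicts


# Hypothetical declared outputs: "pkg/a/b" lives inside "pkg/a".
declared = ["pkg/a", "pkg/a/b", "pkg/ab"]
print(find_nested_outputs(declared))
```

With no nested pairs, every file under the output tree falls under at most one declared artifact, which is the ownership property described above.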

> For example, JavaScript package managers produce a node_modules/ output directory with per-package subdirectories, and those packages might themselves contain sub-packages (for TypeScript declarations, etc.). Representing those various directory pseudo-roots as File values is convenient and natural, but given a File(path = "node_modules") the only way I know of to obtain a File(path = "node_modules/somepkg") is to have the same rule generate them both as nested output artifacts.

Do the consumers of this rule receive the entire node_modules directory as an input, or individual modules?

If consumers always receive the entire node_modules, then there's no need to create nested artifacts. You can pass around the File for the entire node_modules, and construct paths to subdirectories (if needed for action arguments, etc) through string concatenation instead of by instantiating a File object.
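A minimal sketch of that suggestion, written as plain Python so it can run outside Bazel. The `SimpleNamespace` object stands in for a Starlark `File` (only its `path` field is modeled), and the helper name `subdir_path` and the example path are hypothetical.

```python
from types import SimpleNamespace


def subdir_path(directory, relative):
    """Build the path of a subdirectory inside a directory artifact.

    `directory` is any object with a `path` attribute, as a Starlark File has.
    The result is a plain string, not a new artifact.
    """
    return directory.path + "/" + relative.strip("/")


# Stand-in for the single declared node_modules directory (hypothetical path).
node_modules = SimpleNamespace(path="bazel-out/bin/app/node_modules")
print(subdir_path(node_modules, "somepkg"))
```

The action still lists only the outer directory among its inputs; the concatenated string is used purely for command-line arguments.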

If consumers receive only certain modules as inputs but not the rest, I'm afraid you'll have to split each module into its own action. (You might also want to look into how rules_js does it, as I'm not aware of them having been broken by this flag flip.)

tjgq commented on June 15, 2024

> Actually, it was flipped in Bazel 6.0.0 (commit fe16965) but seems to be missing from the 6.0.0 release notes. We've had this issue before with changes that are cherry-picked into a release branch after it has already been cut from the main branch; I'll see if I can get it fixed.

Correction: a flip was initially attempted for 6.0.0, but rolled back. A second attempt took place for 7.0.0 (commit: 7bd0ab6) and that one has stuck. tl;dr: 7.0.0 is the first release where --incompatible_strict_conflict_checks defaults to true.

jmillikin commented on June 15, 2024

> No, sorry. The value of this feature doesn't justify its implementation cost; it broke an extremely useful invariant (that every output file is owned by exactly one declared artifact) whose absence was a constant source of bugs, and whose existence simplifies several subsystems (notably conflict checking and input prefetching).

Do you think there would be any way to provide an equivalent Starlark API to rules authors, while maintaining that new invariant? It's frustrating that a feature present since Bazel v1.0 has been silently broken without providing a replacement.

I'm somewhat rusty on the Bazel internals at this point, but given the example rules fragment:

a = ctx.actions.declare_directory(name + "/a")
b = ctx.actions.declare_directory(name + "/a/b")
ctx.actions.run(outputs = [a, b], ...)

It seems that there is a path forward in which a is a File representing the full output directory artifact, and b is a File representing a scoped view into it. There would be a potential conflict if these were outputs of different actions, but they're from the same action. The merkle tree of ./a will always and by necessity include ./a/b as a node.
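The Merkle-tree point can be illustrated with a small Python sketch. This is a toy digest scheme chosen for the example, not the format Bazel or its remote caches use; the tree contents and variable names are made up.

```python
import hashlib


def digest(node):
    """Digest a file (bytes) or a directory (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file\0" + node).hexdigest()
    hasher = hashlib.sha256(b"dir\0")
    for name in sorted(node):
        # A directory's digest is built from the digests of its children.
        hasher.update(name.encode() + b"\0" + digest(node[name]).encode() + b"\0")
    return hasher.hexdigest()


# Toy version of ./a containing the nested directory ./a/b.
b = {"types.d.ts": b"export {};"}
a = {"index.js": b"module.exports = 1;", "b": b}

# Variant 1: only a file inside ./a/b changes.
b_changed = {"types.d.ts": b"export type T = number;"}
a_with_changed_b = {"index.js": a["index.js"], "b": b_changed}

# Variant 2: only a file outside ./a/b changes.
a_with_changed_index = {"index.js": b"module.exports = 2;", "b": b}

print(digest(a) != digest(a_with_changed_b))
print(digest(a_with_changed_index["b"]) == digest(b))
```

Any change under ./a/b changes the digest of ./a, while a change elsewhere in ./a leaves the digest of ./a/b alone, which is the sense in which ./a/b is a node of the tree for ./a.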

Alternatively, maybe a modified API like this would be acceptable?

a = ctx.actions.declare_directory(name + "/a")
b = ctx.actions.declare_directory("b", parent = a)
ctx.actions.run(outputs = [a], ...)
# b is a File that identifies output directory "{name}/a/b"

I'd prefer to maintain the old API if possible, but even a new API would be better than just being broken.

Note that the desire to identify subsets of a directory wouldn't be as necessary if Bazel supported marking files as non-symlinkable (#10299), because that would allow synthetic node_modules directories to be assembled as needed. But without that ability, the node_modules directory has to be assembled all at once in a single action, otherwise the Node runtime's habit of symlink peeking will bypass the Bazel output layout.

> Do the consumers of this rule receive the entire node_modules directory as an input, or individual modules?

> If consumers always receive the entire node_modules, then there's no need to create nested artifacts. You can pass around the File for the entire node_modules, and construct paths to subdirectories (if needed for action arguments, etc) through string concatenation instead of by instantiating a File object.

> If consumers receive only certain modules as inputs but not the rest, I'm afraid you'll have to split each module into its own action. (You might also want to look into how rules_js does it, as I'm not aware of them having been broken by this flag flip.)

It depends on the consumer. Given a tree of js_library targets and a few js_binary targets, the libraries will receive only the trees of the packages they directly declare dependencies on. When it comes time to bundle the transitive sources into a "binary", the entire node_modules directory needs to be depended on.

This strategy exists because the full node_modules directory might contain tens or hundreds of thousands of files, but the direct packages might only be 10-20 files. Having to assemble the entire node_modules tree is terribly slow, and minimizing the scope of depended-on files is important for build performance.

Public JS rulesets such as rules_js tend to be aimed at JS programmers trying to minimally integrate with Bazel, so their approach to sandboxing and repository rule hermeticity tends to be very different from the Bazel norm. I had to write my own rulesets for JS to avoid having a bunch of package.json and node_modules and other weird JS cruft in my repository.

tjgq commented on June 15, 2024

> There would be a potential conflict if these were outputs of different actions, but they're from the same action. The merkle tree of ./a will always and by necessity include ./a/b as a node.

Yes, that's strictly better than the nested artifacts being potentially created by different actions, but the fact that an action can consume the inner artifact but not the outer one still creates (created) a lot of implementation complexity and potential for bugs in various places, because the outer artifact can be in a state of having been "half produced" (if e.g. only the inner artifact is fetched from a disk/remote cache). A different API doesn't do away with this issue.

> Note that the desire to identify subsets of a directory wouldn't be as necessary if Bazel supported marking files as non-symlinkable (#10299), because that would allow synthetic node_modules directories to be assembled as needed. But without that ability, the node_modules directory has to be assembled all at once in a single action, otherwise the Node runtime's habit of symlink peeking will bypass the Bazel output layout.

I think the phrase "Node runtime's habit of symlink peeking" is doing a lot of work here, but you have to connect the dots for me because I'm not that familiar with Node.js :) Are you saying that, given a node_modules layout with no symlinks, and another where one of the subdirectories has been replaced with a symlink (but the contents behind the symlink are identical to the subdirectory in the former layout), Node.js behaves differently? So you always have to assemble the node_modules tree with the dependencies for a particular action by copying instead of symlinking?

jmillikin commented on June 15, 2024

Ah well, it sounds like there's no going back to the old capability, so I'll just ask that the change be made clearer in the release notes. The same goes for any similarly breaking changes in the future, since otherwise they come as a bit of a shock.

Regarding Node:

When a .js file contains import "some-module", the Node runtime will compute a set of directories to search for a file named some-module/index.js (etc). The set of directories is computed relative to the fully-resolved path of the importing file -- in Java terms, Node calls java.nio.file.Path.toRealPath() for every input file.

For Bazel, this has two consequences:

1. Source files (symlinked from the source tree) get their imports resolved relative to the source tree, not the sandbox. Importing generated files generally doesn't work, and it's very easy to accidentally depend on undeclared source files.
2. A node_modules directory containing multiple packages generally cannot be assembled from the outputs of multiple actions. For actions that resolve cross-package imports (such as bundlers, the JS equivalent of C++ linkers) that node_modules directory must contain all the transitive dependencies.
   - This means that if you want to be able to subset the (often very large) node_modules directory, that needs to happen in the same action using overlapping declare_directory artifacts.
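The lookup behavior described above can be modeled with a short Python sketch. It covers only the "walk up from the importing file" step, with and without resolving symlinks first; real Node resolution has more steps that are ignored here. The function names and the directory layout are made up for the example, and it assumes a platform where `os.symlink` is available.

```python
import os
import shutil
import tempfile


def node_modules_search_dirs(importing_file, follow_symlinks=True):
    """List the node_modules directories searched for a bare import, nearest first."""
    if follow_symlinks:
        # Models the toRealPath()-style resolution described above.
        path = os.path.realpath(importing_file)
    else:
        path = os.path.abspath(importing_file)
    current = os.path.dirname(path)
    dirs = []
    while True:
        if os.path.basename(current) != "node_modules":
            dirs.append(os.path.join(current, "node_modules"))
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent
    return dirs


def demo():
    """Symlink a source file into a fake sandbox and compare the nearest lookup dirs."""
    root = os.path.realpath(tempfile.mkdtemp())
    try:
        source_dir = os.path.join(root, "source_tree", "app")
        sandbox_dir = os.path.join(root, "sandbox", "app")
        os.makedirs(source_dir)
        os.makedirs(sandbox_dir)
        real_file = os.path.join(source_dir, "main.js")
        with open(real_file, "w") as f:
            f.write("require('some-module');\n")
        link = os.path.join(sandbox_dir, "main.js")
        os.symlink(real_file, link)
        resolved = node_modules_search_dirs(link)[0]
        unresolved = node_modules_search_dirs(link, follow_symlinks=False)[0]
        return os.path.relpath(resolved, root), os.path.relpath(unresolved, root)
    finally:
        shutil.rmtree(root)


print(demo())
```

With symlinks resolved, the nearest search directory sits next to the real file in the source tree rather than next to the symlink in the sandbox, which is why a node_modules tree assembled out of symlinks is not seen the way the sandbox layout suggests.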

For my own use I built a Remote Execution executor that can apply Node-specific fixups (replacing symlinks with their targets), but the resulting projects can't be built by Bazel's local executor. My hope is that Bazel will eventually provide enough control over artifact layout and sandbox setup that such projects can be built with plain vanilla Bazel.
