rust-secure-code / cargo-auditable
Make production Rust binaries auditable
License: Apache License 2.0
There are multiple examples in the docs that are marked ```rust,ignore because they require other crates that are normally not in the dependency tree. We should investigate whether adding extra dependencies in doctest mode only is possible.
Steps to reproduce:
rustup target add x86_64-unknown-linux-musl
AUDITABLE_TEST_TARGET=x86_64-unknown-linux-musl cargo test --all-features
The test test_cargo_auditable_workspaces fails:
---- test_cargo_auditable_workspaces stdout ----
Test fixture binary map: {"binary_and_cdylib_crate": ["/home/shnatsel/Code/cargo-auditable/cargo-auditable/tests/fixtures/workspace/target/x86_64-unknown-linux-musl/debug/binary_and_cdylib_crate"], "crate_with_features": ["/home/shnatsel/Code/cargo-auditable/cargo-auditable/tests/fixtures/workspace/target/x86_64-unknown-linux-musl/debug/crate_with_features_bin"]}
/home/shnatsel/Code/cargo-auditable/cargo-auditable/tests/fixtures/workspace/target/x86_64-unknown-linux-musl/debug/binary_and_cdylib_crate dependency info: VersionInfo { packages: [Package { name: "binary_and_cdylib_crate", version: Version { major: 0, minor: 1, patch: 0 }, source: "local", kind: Runtime, dependencies: [1], features: [] }, Package { name: "library_crate", version: Version { major: 0, minor: 1, patch: 0 }, source: "local", kind: Runtime, dependencies: [], features: [] }] }
thread 'test_cargo_auditable_workspaces' panicked at 'assertion failed: crate_with_features_bins.len() == 2', cargo-auditable/tests/it.rs:149:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Right now the source field in auditable_serde::Package is a String. It should be made into a #[non_exhaustive] enum instead.
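A minimal sketch of what such an enum could look like. The variant set and the "git"/"registry" labels are assumptions for illustration; only "crates.io" and "local" appear in the examples in this document. The Other(String) variant keeps unrecognized values instead of rejecting them.

```rust
use std::fmt;

/// Hypothetical replacement for the `source: String` field.
/// `#[non_exhaustive]` forces downstream `match`es to include a wildcard arm,
/// so new variants can be added later without a breaking change.
#[derive(Debug, Clone, PartialEq, Eq)]
#[non_exhaustive]
pub enum Source {
    CratesIo,
    Git,
    Local,
    Registry,
    /// Any value not recognized above, preserved verbatim.
    Other(String),
}

impl From<&str> for Source {
    fn from(s: &str) -> Self {
        match s {
            "crates.io" => Source::CratesIo,
            "git" => Source::Git,
            "local" => Source::Local,
            "registry" => Source::Registry,
            other => Source::Other(other.to_string()),
        }
    }
}

impl fmt::Display for Source {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let label = match self {
            Source::CratesIo => "crates.io",
            Source::Git => "git",
            Source::Local => "local",
            Source::Registry => "registry",
            Source::Other(s) => s.as_str(),
        };
        f.write_str(label)
    }
}
```

Within the defining crate #[non_exhaustive] has no effect, which is why the Display impl can match exhaustively; consumers in other crates would need a `_ =>` arm.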
Cargo.lock is not a stable format and is not expected to become stable. We need to store dependency information in a format we can eventually stabilize.
JSON isomorphic to Cargo.lock with some fields redacted sounds like a good start. Some discussion on the format can be found in the RFC: rust-lang/rfcs#2801
We now put the dependency data in a separate link section. Parsing the executable format should be sufficient to locate the start and end of the dependency data.
Having a rustc wrapper defined (like build caching with RUSTC_WRAPPER=sccache) makes the build fail:
$ export RUSTC_WRAPPER=sccache
$ cargo auditable build --release
error: failed to run `rustc` to learn about target-specific information
Caused by:
process didn't exit successfully: `/home/amousset/.cargo/bin/sccache /home/amousset/.cargo/bin/cargo-auditable rustc - --crate-name ___ --print=file-names --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit status: 2)
--- stderr
sccache: error: failed to execute compile
sccache: caused by: Compiler not supported: "Unrecognized command: \"-E\"\n"
Right now it is possible to represent cyclic dependencies in the data format. For example:
{"packages":[
{"name":"foo","version":"0.3.1","source":"local","dependencies":[1]},
{"name":"bar","version":"0.2.1","source":"crates.io","dependencies":[0]}
]}
But Cargo does not actually allow cyclic dependencies.
This quirk of the format requires any consumer of this data to check for cyclic dependencies first, or risk an infinite loop during traversal.
It would be nice to make cyclic dependencies impossible to represent, for example:
{"packages":[{"name":"foo","version":"0.3.1","source":"local"},{"name":"bar","version":"0.2.1","source":"crates.io"}]}
{"dependencies":{"0":[{"1":[]}]}}
Or something along those lines. The idea is to encode the relationships in a JSON tree that is guaranteed to be acyclic.
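Until the format makes cycles unrepresentable, every consumer has to guard against them itself. A minimal sketch of such a check over the current index-based format, assuming the dependencies fields have already been parsed into a Vec<Vec<usize>> (one entry per package); the function name is illustrative:

```rust
/// Detects whether an index-based dependency list (where `dependencies[i]`
/// holds the indices of package i's dependencies) contains a cycle.
/// Uses an iterative depth-first search with three-state marking.
/// Assumes every index is in bounds; malformed input would panic on indexing.
fn has_cycle(dependencies: &[Vec<usize>]) -> bool {
    #[derive(Clone, Copy, PartialEq)]
    enum State {
        Unvisited,
        InProgress, // on the current DFS path
        Done,       // fully explored, cannot lead back into the current path
    }

    let mut state = vec![State::Unvisited; dependencies.len()];
    for start in 0..dependencies.len() {
        if state[start] != State::Unvisited {
            continue;
        }
        // Explicit stack of (node, index of next child to visit) avoids deep recursion.
        let mut stack = vec![(start, 0usize)];
        state[start] = State::InProgress;
        while let Some(top) = stack.last_mut() {
            let node = top.0;
            if top.1 < dependencies[node].len() {
                let child = dependencies[node][top.1];
                top.1 += 1;
                match state[child] {
                    // Reaching a node that is still on the path is a back edge: a cycle.
                    State::InProgress => return true,
                    State::Unvisited => {
                        state[child] = State::InProgress;
                        stack.push((child, 0));
                    }
                    State::Done => {}
                }
            } else {
                state[node] = State::Done;
                stack.pop();
            }
        }
    }
    false
}
```

For the foo/bar example above, has_cycle(&[vec![1], vec![0]]) returns true.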
Dependency information will not be present in the final binary unless the data returned by auditable::inject_dependency_list!() is actually used somewhere in the binary (printed, put into test::black_box, etc.).
This is actually a bug in rustc: rust-lang/rust#47384
https://blog.rust-lang.org/2022/06/22/sparse-registry-testing.html discusses a sparse registry feature. This may impact the reported source URL for crates coming from crates.io; we need to check if our detection of crates.io still works with sparse registry.
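One way to keep crates.io detection working would be to accept both forms of the source ID. A minimal sketch: the git-index string is the crates.io source ID Cargo has traditionally reported, while the sparse form is an assumption that should be checked against real cargo metadata output with the sparse protocol enabled.

```rust
/// Source IDs treated as crates.io. The second entry is an assumed sparse-protocol
/// form, not verified against actual `cargo metadata` output.
const CRATES_IO_SOURCE_IDS: [&str; 2] = [
    "registry+https://github.com/rust-lang/crates.io-index",
    "sparse+https://index.crates.io/",
];

/// Returns true if the raw Cargo source ID refers to crates.io.
fn is_crates_io(source_id: &str) -> bool {
    CRATES_IO_SOURCE_IDS.iter().any(|&id| id == source_id)
}
```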
Right now rust-audit requires running a separate executable to get the Cargo.lock information. It'd be really useful to surface the Cargo.lock information to the running program itself, even as just a &'static str, so that it can be reported dynamically.
I am specifically thinking about some production code I own that returns build info through a /buildinfo HTTP GET endpoint. It'd be awesome if I could return the whole Cargo.lock through something like /buildinfo/dependencies.
Relevant lines in CI log: https://github.com/Shnatsel/rust-audit/runs/7619314273?check_suite_focus=true#step:6:149
It appears that Cargo creates many more binary artifacts than our tests expect, namely the .pdb, .lib and .exp files on top of the .dll and .exe that we expect.
cc @tofay who wrote the code for parsing Cargo output
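A minimal sketch of how the test could skip these auxiliary files, assuming the artifact paths have already been collected from Cargo's output. The function name is illustrative, and the exclusion list covers only the three extensions mentioned above; note that a genuine staticlib on windows-msvc also ends in .lib, so a real filter may need the crate type as well.

```rust
use std::path::Path;

/// Auxiliary files emitted next to `.exe`/`.dll` outputs on Windows:
/// debug symbols (`.pdb`), import libraries (`.lib`) and export files (`.exp`).
const AUXILIARY_EXTENSIONS: [&str; 3] = ["pdb", "lib", "exp"];

/// Returns true if the path looks like a binary that should carry audit data.
/// Matching is case-insensitive because Windows file names are.
fn is_audited_binary(path: &Path) -> bool {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some(ext) => !AUXILIARY_EXTENSIONS
            .iter()
            .any(|aux| ext.eq_ignore_ascii_case(aux)),
        // Extensionless files (e.g. Linux executables) are kept.
        None => true,
    }
}
```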
Right now all dependencies are stored together. We could split out build dependencies and store them separately.
A common case where this distinction would be useful is RUSTSEC-2018-0006: the affected crate is used by the current version of clap as a build dependency, where it poses no security risk; however, its use as a runtime dependency would be problematic.
We need to keep information about build dependencies because a bug in a code generator such as protobuf or Cap'n Proto that is only included as a build dependency may still pose a security risk at runtime.
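If each dependency edge recorded whether it is a build-dependency edge, the kind of every package could be derived while still keeping build dependencies in the data. A sketch under that assumption (the edge representation and function name are hypothetical): anything reachable from the root through normal edges is Runtime; anything reachable only through a build edge, including normal dependencies of a code generator, is Build. It ignores the fact that proc-macro crates are normal dependencies that only run at compile time.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DependencyKind {
    Runtime,
    Build,
}

/// `edges[i]` lists `(dependency_index, is_build_edge)` pairs for package `i`.
/// Returns the kind of every package reachable from `root`, or `None` if unreachable.
fn classify(edges: &[Vec<(usize, bool)>], root: usize) -> Vec<Option<DependencyKind>> {
    let mut kind = vec![None; edges.len()];

    // Pass 1: everything reachable through normal edges ends up in the binary.
    kind[root] = Some(DependencyKind::Runtime);
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        for &(dep, is_build) in &edges[node] {
            if !is_build && kind[dep].is_none() {
                kind[dep] = Some(DependencyKind::Runtime);
                stack.push(dep);
            }
        }
    }

    // Pass 2: everything else reachable at all is only needed at build time.
    let mut seen = vec![false; edges.len()];
    seen[root] = true;
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        for &(dep, _) in &edges[node] {
            if !seen[dep] {
                seen[dep] = true;
                if kind[dep].is_none() {
                    kind[dep] = Some(DependencyKind::Build);
                }
                stack.push(dep);
            }
        }
    }
    kind
}
```

A crate used both ways (directly at runtime and by a build script) comes out as Runtime, which is the conservative answer for auditing.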
I was recently thinking about what it would take to integrate something like this into a cargo install process. The biggest issue I see is that it requires modifying the binary sources to have the data added. I think a potentially more useful approach is a way to inject this data into an arbitrary binary build, maybe via a cargo wrapper such as cargo auditable build.
This would also avoid issues #9, #11 and #13 (but probably introduce others 😀).
Is there a reason to prefer the current approach where each binary needs to be configured to include the data?
The section name we use for Linux cannot be reused as-is: a Mach-O section specifier requires a segment and a section name separated by a comma.
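For illustration, per-platform section names could be selected with cfg_attr. The section name .dep-v0 and the choice of the __DATA segment are assumptions for this sketch. #[used] only stops the compiler from discarding the static; the linker may still garbage-collect it.

```rust
// Mach-O (Apple targets) wants "segment,section"; ELF and PE accept a bare name.
// ".dep-v0" is a hypothetical name. Mach-O section names are limited to 16 bytes
// and PE image section names to 8, so the name must fit both.
#[cfg_attr(target_vendor = "apple", link_section = "__DATA,.dep-v0")]
#[cfg_attr(not(target_vendor = "apple"), link_section = ".dep-v0")]
#[used]
static AUDIT_DATA: [u8; 15] = *b"{\"packages\":[]}";
```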
https://github.com/bnjbvr/cargo-machete implements configuration via [package.metadata.cargo-machete]; we should consider using this as a configuration mechanism for cargo-auditable.
auditable::version_info() returning &str instead of &[u8] would be much more ergonomic.
The problem is that we need to store the inline data in a variable rather than behind a pointer, and it has to be sized. AFAIK there's no statically-sized version of str, which is why we currently use [u8; weird_length_calculation()].
So we either need to store the version info twice, once as bytes and once as &str, or use unsafe to convert from bytes to &str without doing the full scan for non-UTF-8 characters on every call to auditable::version_info(). The latter is only sound if we can statically ensure that our slice is UTF-8. Using include_str! to verify UTF-8 compliance would leave us open to time-of-check/time-of-use attacks.
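A different technique than include_str!, sketched here as an option rather than a settled design: validate the bytes with a hand-written const fn and a const assertion, so a non-UTF-8 payload fails the build. The check and the unchecked conversion operate on the same constant, so in this sketch there is no gap between check and use. VERSION_INFO_BYTES is a hypothetical stand-in for the generated data; a static in a link section could be initialized from such a const.

```rust
/// Compile-time UTF-8 check following the Unicode well-formed byte sequence
/// table: rejects overlong encodings, surrogates and code points above U+10FFFF.
const fn is_valid_utf8(bytes: &[u8]) -> bool {
    let mut i = 0;
    while i < bytes.len() {
        let first = bytes[i];
        if first < 0x80 {
            i += 1;
            continue;
        }
        // Sequence width and the allowed range for the second byte.
        let (width, lo, hi): (usize, u8, u8) = match first {
            0xC2..=0xDF => (2, 0x80, 0xBF),
            0xE0 => (3, 0xA0, 0xBF),
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80, 0xBF),
            0xED => (3, 0x80, 0x9F),
            0xF0 => (4, 0x90, 0xBF),
            0xF1..=0xF3 => (4, 0x80, 0xBF),
            0xF4 => (4, 0x80, 0x8F),
            _ => return false,
        };
        if i + width > bytes.len() {
            return false;
        }
        let second = bytes[i + 1];
        if second < lo || second > hi {
            return false;
        }
        // Remaining bytes must be continuation bytes (10xxxxxx).
        let mut j = 2;
        while j < width {
            if (bytes[i + j] & 0xC0) != 0x80 {
                return false;
            }
            j += 1;
        }
        i += width;
    }
    true
}

// Hypothetical embedded data; in practice this would be the generated bytes.
const VERSION_INFO_BYTES: &[u8] = b"{\"packages\":[]}";

// Fails compilation, not the running program, if the bytes are not UTF-8.
const _: () = assert!(is_valid_utf8(VERSION_INFO_BYTES));

pub fn version_info() -> &'static str {
    // SAFETY: the const assertion above proves at compile time that these
    // exact bytes are valid UTF-8, so no runtime scan is needed.
    unsafe { std::str::from_utf8_unchecked(VERSION_INFO_BYTES) }
}
```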
It would be nice to verify that the recovered information is indeed read correctly by cargo auditable and/or the underlying rustsec crate, and that it does indeed report vulnerable versions when they're present.
There is a test advisory specifically for this purpose: https://github.com/rustsec/advisory-db/blob/main/crates/rustsec-example-crate/RUSTSEC-2019-0024.md
Could be possible to fix if we parse the Cargo.lock in more detail and doctor it.
This would automatically not be an issue for an implementation of the idea within Cargo.
The auditable crate currently adds ~30 seconds to compilation time due to its dependencies on syn and serde.
Since serde_json is what Cargo itself uses, we have to stick to it for this to be a reasonably faithful implementation of the Cargo RFC. This issue will disappear once this functionality is upstreamed into Cargo.
This stems from two issues:
build.rs when running tests, so the dependency file doesn't exist and the environment variable with a path to it is not set
cargo test, the test attribute is both set and not set at the same time, so we cannot even inject a dummy value (I tried)
Demo of Schrödinger's cfg attribute: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4538ce9eba5d72975dda87d696a717ea (make sure you select the "test" action, not "run")
Rust defaults to the x86_64-pc-windows-msvc platform on Windows, but both it and x86_64-pc-windows-gnu are Tier 1. We should test cargo auditable on both.
Cargo has made it possible to depend on the same version of a given crate with different feature sets, provided that one version is a runtime dependency and another is a build dependency.
The dependency resolution in rust-audit was written prior to that change, and it's possible that auditable-serde conflates these two packages.
The deduplication is done on the package ID from cargo-metadata, and we'll need to double-check that this is in fact correct even in the presence of the new Cargo feature resolver.
We should use std::path::MAIN_SEPARATOR instead of a hardcoded /
After converting the data with the json-to-toml example and feeding it to cargo-audit, cargo-audit reports that it has succeeded and found no vulnerabilities. However, vulnerabilities that are actually present are not reported.
For example, RUSTSEC-2021-0003 is not reported when the bundled hello-world sample depends on a vulnerable SmallVec version.
Since the dependency extraction logic is in build.rs of a dependency crate, it is not re-run whenever the toplevel crate is recompiled. This may lead to the embedded dependency info being stale compared to the actual state of affairs.
The external-injection branch implements a Cargo subcommand to inject the audit data without requiring a build script, avoiding several issues associated with that.
It currently assumes that only one binary artifact is being built, and handles cases where several binaries or an entire workspace is being built very poorly. Cargo doesn't make the information about which binary artifacts are being built readily available to subcommands.
According to @tofay:
You can get info on binaries built in an external tool, but you have to enable cargo json messages and then parse them (https://doc.rust-lang.org/cargo/reference/external-tools.html#artifact-messages). we have a wrapper at work that does this to do some post-processing of binaries, and that's the approach I've proposed for cargo-spdx at alilleybrinker/cargo-spdx#9
This will probably have to be implemented before cargo auditable can actually be used in the wild for arbitrary projects.
Dependency info is highly compressible. We should utilize that fact to reduce the binary size overhead.
Hey I tried
git clone https://github.com/Shnatsel/rust-audit.git
cd rust-audit
cargo build --release
and got:
error[E0004]: non-exhaustive patterns: `SectionIsMissing(_)` and `UnexpectedSectionType { .. }` not covered
--> auditable-extract/src/lib.rs:96:15
|
96 | match e {
| ^ patterns `SectionIsMissing(_)` and `UnexpectedSectionType { .. }` not covered
|
::: /home/harry/.cargo/git/checkouts/binfarce-e74c3427e3f3ff61/32b9eed/src/error.rs:5:5
|
5 | SectionIsMissing(&'static str),
| ---------------- not covered
6 | UnexpectedSectionType { expected: u32, actual: u32 },
| --------------------- not covered
|
= help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
= note: the matched value is of type `ParseError`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0004`.
error: could not compile `auditable-extract`
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed
This means you can't tell whether crates come from crates-io or an alternative registry, which may result in false positives in subsequent scanning.
Perhaps crates-io could be special-cased so that the URL is included? That would at least allow differentiation between crates-io and alternative registries. (I'm assuming rust-audit doesn't include the source from cargo_metadata directly for privacy reasons, e.g. leaking URLs or git repo structure.)
Right now we have the full extraction pipeline in examples, which is not super complicated but is nevertheless manual.
rust-audit-info shows how it's all tied together; we should just put that into a function and make that a crate.
I have a crate, mbot, for which auditable-build runs successfully and everything works on Stable, but for which auditable-build panics when building on Beta.
Some relevant command output:
jmn@neogreen:~/Projects/mprojects/mbot$ RUST_BACKTRACE=1 cargo +beta run --release --features auditable-data
Compiling mbot v0.1.0 (/home/jmn/Projects/mprojects/mbot)
error: failed to run custom build command for `mbot v0.1.0 (/home/jmn/Projects/mprojects/mbot)`
Caused by:
process didn't exit successfully: `/home/jmn/Projects/mprojects/mbot/target/release/build/mbot-35641d5a1f06aff7/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-changed=data/maddie.json
--- stderr
thread 'main' panicked at 'no entry found for key', cargo/registry/src/github.com-1ecc6299db9ec823/auditable-serde-0.1.0/src/lib.rs:307:51
stack backtrace:
0: rust_begin_unwind
at /rustc/9f0e6fa94be6f97c736e51811d7b58904edfa8cb/library/std/src/panicking.rs:475
1: core::panicking::panic_fmt
at /rustc/9f0e6fa94be6f97c736e51811d7b58904edfa8cb/library/core/src/panicking.rs:85
2: core::option::expect_failed
at /rustc/9f0e6fa94be6f97c736e51811d7b58904edfa8cb/library/core/src/option.rs:1213
3: core::option::Option<T>::expect
4: <std::collections::hash::map::HashMap<K,V,S> as core::ops::index::Index<&Q>>::index
5: <auditable_serde::VersionInfo as core::convert::TryFrom<&cargo_metadata::Metadata>>::try_from
6: auditable_build::collect_dependency_list
7: build_script_build::main
8: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
RISC-V is the only architecture for which the data exposed by the object crate is not sufficient, and information from the compiler internals is needed. That's why it is currently stubbed out in cargo-auditable/cargo-auditable/src/object_file.rs (lines 119 to 126 at commit a90fb30).
We need to find some way to deal with this.
It is technically possible to support WebAssembly, since WebAssembly modules allow custom sections: https://webassembly.github.io/spec/core/appendix/custom.html
This may be useful since the overhead of the audit info is just a few kilobytes and WebAssembly is being used not just for the web.
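The custom section encoding is simple enough to sketch from the spec linked above: section id 0, the section size as unsigned LEB128, then a LEB128 length-prefixed name followed by the raw payload. The function names are illustrative. When compiling to wasm32 with rustc, #[link_section] on a static is the usual way to emit a custom section, so hand-encoding like this is mainly relevant for post-processing an already-built module.

```rust
/// Appends `value` as unsigned LEB128, the variable-length integer encoding
/// used throughout the WebAssembly binary format.
fn write_uleb128(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let mut byte = (value & 0x7F) as u8;
        value >>= 7;
        if value != 0 {
            byte |= 0x80; // more bytes follow
        }
        out.push(byte);
        if value == 0 {
            break;
        }
    }
}

/// Encodes a complete custom section: id 0, contents size, name, then payload.
fn custom_section(name: &str, payload: &[u8]) -> Vec<u8> {
    let mut contents = Vec::new();
    write_uleb128(&mut contents, name.len() as u32);
    contents.extend_from_slice(name.as_bytes());
    contents.extend_from_slice(payload);

    let mut section = vec![0u8]; // section id 0 = custom section
    write_uleb128(&mut section, contents.len() as u32);
    section.extend_from_slice(&contents);
    section
}
```

Since custom sections may appear anywhere after the module header, appending the returned bytes to the end of a .wasm file should yield a valid module.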
I'm not sure how to call include_bytes! with a platform-agnostic path separator, so that it would insert / or \ depending on the host OS.
This works but is not portable: include_bytes!(concat!(env!("OUT_DIR"), "/myfile"));
This doesn't work, because concat! only accepts literals: include_bytes!(concat!(env!("OUT_DIR"), std::path::MAIN_SEPARATOR, "myfile"));
I've tried googling this but nothing comes up.
Now that the dependency info is part of the executable, we need to ensure that it allows for reproducible builds. We need to ensure that the package version info we embed is deterministic.
One potential source of non-determinism is the fact that we store data as ordered collections, but don't enforce any particular order.
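One way to remove that source of non-determinism is to canonicalize before serializing: sort packages by a stable key and rewrite the dependency indices to match. A sketch assuming the index-based format shown elsewhere in this document; the struct is a simplified stand-in, not auditable_serde's actual type.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Package {
    name: String,
    version: String,
    source: String,
    dependencies: Vec<usize>, // indices into the package list
}

/// Returns the packages sorted by (name, version, source), with every
/// dependency index remapped to the new positions and each list sorted.
/// Versions compare as strings here: deterministic, though not semver order.
fn canonicalize(packages: &[Package]) -> Vec<Package> {
    let mut order: Vec<usize> = (0..packages.len()).collect();
    order.sort_by(|&a, &b| {
        let (pa, pb) = (&packages[a], &packages[b]);
        (&pa.name, &pa.version, &pa.source).cmp(&(&pb.name, &pb.version, &pb.source))
    });

    // new_index[old_position] = position after sorting
    let mut new_index = vec![0; packages.len()];
    for (new_pos, &old_pos) in order.iter().enumerate() {
        new_index[old_pos] = new_pos;
    }

    order
        .iter()
        .map(|&old_pos| {
            let pkg = &packages[old_pos];
            let mut dependencies: Vec<usize> =
                pkg.dependencies.iter().map(|&d| new_index[d]).collect();
            dependencies.sort_unstable();
            Package { dependencies, ..pkg.clone() }
        })
        .collect()
}
```

Two builds that collect the same packages in different orders then serialize identically.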
Apparently there are a number of formats designed to encode package info already: https://gitbom.dev/glossary/sbom/
We need to check if any of them are suitable for our use case. Notably, we redact some fields such as git repo URLs, and also include information about enabled features, so they might not be 100% compatible.
Also, the degree of adoption of these formats needs to be understood; perhaps we should provide conversion utilities, even if we don't end up using the format internally.
Right now we include the data equivalent to the contents of Cargo.lock, which lists all dependencies declared in the workspace. Some of those dependencies may be disabled via features; including them may result in false positives in whatever tooling uses this data.
Windows-only dependencies are currently included in a Linux build and vice versa. We should filter deps by platform.
Hi, nice to see all the progress on the "injection" approach. On msys2 I see that
cargo build --release --target x86_64-pc-windows-gnu
works fine, but
cargo auditable build --release --target x86_64-pc-windows-gnu
fails with
error[E0463]: can't find crate for `std`
|
= note: the `x86_64-pc-windows-gnu` target may not be installed
= help: consider downloading the target with `rustup target add x86_64-pc-windows-gnu`
error: cannot find macro `println` in this scope
--> src\main.rs:2:5
|
2 | println!("Hello, world!");
| ^^^^^^^
error: requires `sized` lang_item
For more information about this error, try `rustc --explain E0463`.
I tried to create a minimal reproducible example here: https://github.com/niklasf/cargo-auditable-issue/runs/7665968148?check_suite_focus=true
cargo-audit already has decent developer penetration, so having another audit crate is confusing. Could I maybe suggest the name rust-traceable instead?
https://crates.io/crates/build-info
It exists and does something similar. Perhaps some of the concerns can be offloaded to it, or perhaps they've got some cool techniques I couldn't come up with.
Right now cargo auditable just has .unwrap() all over the place. That's fine for a prototype, but we'll need proper error handling to show nice error messages.
We should use the anyhow crate for error handling, because that's what Cargo already uses, and it will make upstreaming simpler.
auditable crate and replace them with cargo auditable
cargo auditable package
Right now auditable relies on the assumption that the target directory is somewhere below the directory containing Cargo.lock. This is true by default, with the build happening in target/, but the user can override it via CARGO_TARGET_DIR and violate this assumption.
The problem here is that Cargo does not support any kind of cross-crate communication via build.rs. Even though we can run literally arbitrary code from build.rs, we don't know where the code of the other crate is located. Specifically:
auditable, because by the time its build.rs runs, auditable is already compiled
auditable has no clean way to know what the toplevel crate is when it's in the dependency tree
As I see it, we have to either:
CARGO_TARGET_DIR is overridden. We can provide an env variable to explicitly point to the appropriate Cargo.lock as a workaround. This is in the spirit of making easy things easy and hard things possible.
This issue will no longer exist once this mechanism is moved to Cargo, since Cargo has full knowledge of the Cargo.lock of the toplevel crate.
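A sketch of the current heuristic plus the proposed escape hatch, assuming it runs in a build script, where Cargo sets OUT_DIR inside the target directory. The AUDITABLE_CARGO_LOCK variable name and the function names are hypothetical.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical override variable pointing directly at the toplevel Cargo.lock.
const OVERRIDE_VAR: &str = "AUDITABLE_CARGO_LOCK";

/// An explicit override wins; otherwise walk up from `start_dir` and return the
/// first Cargo.lock found. The walk only finds the right file when the target
/// directory lives below the workspace root, i.e. when CARGO_TARGET_DIR does
/// not point elsewhere.
fn locate_cargo_lock(override_path: Option<PathBuf>, start_dir: &Path) -> Option<PathBuf> {
    if override_path.is_some() {
        return override_path;
    }
    start_dir
        .ancestors()
        .map(|dir| dir.join("Cargo.lock"))
        .find(|candidate| candidate.is_file())
}

/// Build-script entry point: reads the override and OUT_DIR from the environment.
#[allow(dead_code)]
fn locate_from_build_script_env() -> Option<PathBuf> {
    let override_path = env::var_os(OVERRIDE_VAR).map(PathBuf::from);
    let out_dir = PathBuf::from(env::var_os("OUT_DIR")?);
    locate_cargo_lock(override_path, &out_dir)
}
```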
We need to call include_bytes! with a platform-agnostic path separator, so that it would insert / or \ depending on the host OS.
The best way so far is #[cfg(unix)] and #[cfg(windows)], which is what we currently use. But these are set depending on the target platform, not the host platform. There are no cfg options for the host platform.
Building on Mac without passing linker flags doesn't preserve the audit data. The Mac ld doesn't support the --undefined flag.
There is a documented flag -u that seems to do what we need it to, but trying to actually use it results in the following error:
ld: unknown option: -u AUDITABLE_VERSION_INFO
The -u flag is documented in the manpage, so it's weird to see the linker reject it.
Both -u AUDITABLE_VERSION_INFO (with the space) and -uAUDITABLE_VERSION_INFO (without the space) fail.
I only have a Linux machine to test on. We should write end-to-end tests and run them in CI on a variety of platforms.
The current format doesn't carry enough information to distinguish direct from transitive dependencies. In the following example, we know that ansi_term depends on bitflags, but the project may also depend on bitflags directly.
{
  "packages": [
    {
      "name": "ansi_term",
      "version": "0.12.1",
      "source": "crates.io",
      "dependencies": [
        1
      ]
    },
    {
      "name": "bitflags",
      "version": "1.2.1",
      "source": "crates.io",
      "features": [
        "default"
      ]
    }
  ]
}
For example, package-lock.json has a "" entry for the root package, so that we can tell which dependencies are direct.
https://github.com/firebase/functions-samples/blob/3515b7f38a3c598cdb20152a263372e81719ecda/package-lock.json#L7-L17
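If the format recorded which package is the root (analogous to the "" entry in package-lock.json), direct and transitive dependencies could be separated with a simple traversal. A sketch over the index-based dependencies field, where the root index is an assumed addition to the format and the function name is illustrative:

```rust
use std::collections::BTreeSet;

/// Splits every package reachable from `root` into direct dependencies
/// (listed by the root itself) and transitive-only ones (reachable, but not
/// listed by the root).
fn split_direct_transitive(
    dependencies: &[Vec<usize>],
    root: usize,
) -> (BTreeSet<usize>, BTreeSet<usize>) {
    let direct: BTreeSet<usize> = dependencies[root].iter().copied().collect();

    let mut reachable = BTreeSet::new();
    let mut stack: Vec<usize> = dependencies[root].clone();
    while let Some(node) = stack.pop() {
        // `insert` returns false for already-visited nodes, so each is expanded once.
        if reachable.insert(node) {
            stack.extend(dependencies[node].iter().copied());
        }
    }
    reachable.remove(&root);

    let transitive = reachable.difference(&direct).copied().collect();
    (direct, transitive)
}
```

For example, with the root at index 0 depending only on ansi_term (1), which depends on bitflags (2), bitflags is reported as transitive; adding bitflags to the root's list moves it to direct.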
auditable
cargo-lock
crate for parsing it