
Comments (7)

Shnatsel commented on May 31, 2024

Specifically, we need to understand:

  1. Does anyone actually use those SBOM formats?
  2. Are any of those formats a good fit for storing our data - perhaps we won't have to invent a custom format after all?

from cargo-auditable.

tofay commented on May 31, 2024

Dumping my notes on formats and SPDX here.


Suggested requirements for the data format:

  1. Needs to be able to convey Rust crate runtime and build dependencies.
  2. Needs to be extensible, so we can add extra information in the future, e.g. statically linked C libraries, or perhaps build tool versions such as rustc.
  3. Needs to be easily interoperable with other tools: parsable in Rust and other languages (in particular Go, as used by the syft/trivy SCA tools), and easy for tools to correlate with vulnerability databases (e.g. RustSec).
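
To make the three requirements concrete, here is a minimal Rust sketch of a data model that would satisfy them. All names (`Package`, `DependencyKind`, `extra`, `runtime_crates`) are hypothetical and are not cargo-auditable's actual schema: the kind enum separates runtime from build dependencies, an open-ended key/value map leaves room for future additions such as the rustc version, and `(name, version)` pairs are the key a scanner would look up in a database like RustSec.

```rust
use std::collections::BTreeMap;

/// How a crate is used by the final binary (requirement 1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DependencyKind {
    Runtime,
    Build,
}

/// One crate in the dependency graph (hypothetical shape, for illustration).
#[derive(Debug)]
struct Package {
    name: String,
    version: String,
    kind: DependencyKind,
    /// Indices of this package's dependencies within the same list.
    dependencies: Vec<usize>,
    /// Open-ended metadata for future additions, e.g. the rustc version or
    /// statically linked C libraries (requirement 2).
    extra: BTreeMap<String, String>,
}

/// (name, version) pairs of crates that end up in the binary at runtime:
/// what a scanner would match against a vulnerability database (requirement 3).
fn runtime_crates(packages: &[Package]) -> Vec<(&str, &str)> {
    packages
        .iter()
        .filter(|p| p.kind == DependencyKind::Runtime)
        .map(|p| (p.name.as_str(), p.version.as_str()))
        .collect()
}
```

A real format would also need to record where each crate came from (crates.io, git, local path), since name and version alone are ambiguous for non-registry crates.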

The creator of Trivy raised this in "Embed CPE names into binaries" (Issue #76, ossf/wg-vulnerability-disclosures on github.com).
The discussion there loosely points to SBOM formats being more appropriate as a data format than package identification formats (SWID/PURL). In particular, SBOM formats can express the nature of relationships (e.g. build vs. runtime dependency).

It was suggested on Zulip that SPDX is the SBOM format most likely to reach wide adoption, given its backing by OpenSSF and industry.

There's currently no standardized way to embed SPDX SBOMs into binaries; see "Embedding SPDX into binaries" (Issue #739, spdx/spdx-spec on github.com).

Some concerns over embedding SPDX SBOMs are:

  • Size: SBOMs can be very large, e.g. because of license information. It's not clear that license information is required for the vulnerability use case; in SPDX SBOMs, NOASSERTION could be used as the value of the various license fields (or an SPDX license identifier could be used instead of the full license text). The SBOM could also be compressed prior to embedding (ELF natively supports compressed sections too; unsure about PE/Mach-O).
  • Impact on reproducibility: the SPDX format includes creation timestamps. Also, if the binary is represented in the SPDX SBOM as a File, it needs a SHA1 checksum, which can't be accurate because the SBOM is embedded in the very file being hashed. This could be mitigated by representing the binary as the (root?) Package of the SBOM and not including file information for the binary itself.
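
The mitigation in the second bullet can be sketched in Rust: the document DESCRIBES the binary as its root Package rather than as a File, so no checksum of the binary is needed, and each dependency hangs off that package via DEPENDS_ON. This assumes the `SPDXRef-<name>-<version>` SPDXID scheme used in the example in this comment and the SPDX 2.2 `DESCRIBES`/`DEPENDS_ON` relationship types; the function names are hypothetical, and the `created` timestamp problem is not addressed here.

```rust
/// (spdxElementId, relationshipType, relatedSpdxElement)
type Relationship = (String, &'static str, String);

/// SPDXID scheme assumed from the example: "SPDXRef-<name>-<version>".
fn spdx_id(name: &str, version: &str) -> String {
    format!("SPDXRef-{}-{}", name, version)
}

/// Relationships for a document whose root element is the binary's Package.
/// No File entry (and therefore no checksum of the binary) is produced.
fn package_relationships(root: (&str, &str), deps: &[(&str, &str)]) -> Vec<Relationship> {
    let root_id = spdx_id(root.0, root.1);
    // The document describes the root package...
    let mut rels = vec![("SPDXRef-DOCUMENT".to_string(), "DESCRIBES", root_id.clone())];
    // ...and the root package depends on each crate.
    for &(name, version) in deps {
        rels.push((root_id.clone(), "DEPENDS_ON", spdx_id(name, version)));
    }
    rels
}
```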

An example representing a binary as an SPDX File looks like this:


{
  "spdxVersion": "SPDX-2.2",
  "dataLicense": "CC0-1.0",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "baz.spdx.json",
  "documentNamespace": "https://foo.bar/",
  "creationInfo": {
    "created": "2022-08-01T18:44:38Z",
    "creators": [
      "Tool: cargo-spdx 0.1.0"
    ]
  },
  "packages": [
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/bar@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "bar",
      "SPDXID": "SPDXRef-bar-0.1.0",
      "versionInfo": "0.1.0"
    },
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/baz@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "baz",
      "SPDXID": "SPDXRef-baz-0.1.0",
      "versionInfo": "0.1.0"
    },
    {
      "copyrightText": "NOASSERTION",
      "downloadLocation": "NOASSERTION",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:cargo/foo@0.1.0",
          "referenceType": "purl"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "NOASSERTION",
      "name": "foo",
      "SPDXID": "SPDXRef-foo-0.1.0",
      "versionInfo": "0.1.0"
    }
  ],
  "files": [
    {
      "checksums": [
        {
          "algorithm": "SHA1",
          "checksumValue": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
        }
      ],
      "copyrightText": "NOASSERTION",
      "fileName": "baz",
      "fileTypes": [
        "BINARY"
      ],
      "licenseConcluded": "NOASSERTION",
      "SPDXID": "SPDXRef-File-baz"
    }
  ],
  "relationships": [
    {
      "relatedSpdxElement": "SPDXRef-baz-0.1.0",
      "relationshipType": "GENERATED_FROM",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-bar-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-baz-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    },
    {
      "relatedSpdxElement": "SPDXRef-foo-0.1.0",
      "relationshipType": "DEPENDS_ON",
      "spdxElementId": "SPDXRef-File-baz"
    }
  ]
}
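
The package entries in the example follow a mechanical pattern, so a generator only needs each crate's name and version. Below is a minimal Rust sketch that renders one such entry, using `NOASSERTION` for the license, copyright and download fields as suggested above for keeping the size down. The function names are hypothetical, and the sketch assumes names and versions need neither JSON escaping nor purl percent-encoding. (Incidentally, the checksum in the example, `da39a3ee5e6b4b0d3255bfef95601890afd80709`, is the SHA-1 of empty input, i.e. a placeholder.)

```rust
/// Package URL for a crates.io crate, in the form used by the example.
/// No percent-encoding is applied (simplifying assumption).
fn cargo_purl(name: &str, version: &str) -> String {
    format!("pkg:cargo/{}@{}", name, version)
}

/// Render one SPDX 2.2 JSON package entry with the same fields as the example.
/// Assumes `name` and `version` contain no quotes or backslashes.
fn spdx_package_json(name: &str, version: &str) -> String {
    format!(
        concat!(
            "{{\"SPDXID\":\"SPDXRef-{n}-{v}\",\"name\":\"{n}\",\"versionInfo\":\"{v}\",",
            "\"downloadLocation\":\"NOASSERTION\",\"copyrightText\":\"NOASSERTION\",",
            "\"licenseConcluded\":\"NOASSERTION\",\"licenseDeclared\":\"NOASSERTION\",",
            "\"externalRefs\":[{{\"referenceCategory\":\"PACKAGE_MANAGER\",",
            "\"referenceType\":\"purl\",\"referenceLocator\":\"{purl}\"}}]}}"
        ),
        n = name,
        v = version,
        purl = cargo_purl(name, version)
    )
}
```

Emitting compact JSON with the license fields set to `NOASSERTION` keeps each entry to a few hundred bytes; full license texts would dominate the size otherwise.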

Rust support for the SPDX SBOM format:

  • doubleopen-project/spdx-rs ("SPDX Documents in Rust", github.com) exists for serializing/deserializing SPDX documents.

  • cargo-spdx has some serialization support for SPDX. We'd want to share SPDX support with it.

  • (There's a JSON schema for SPDX, so it should be straightforward to generate serde representations if the existing ones aren't suitable.)

More questions to consider regarding the use of SPDX in cargo-auditable:

Does it actually make it easier to use the embedded data?

  • Considering both Rust tooling (cargo audit) and external tools (go-rustaudit/syft)

Is it worth using a different format at all without a resolution to "Embed CPE names into binaries" (Issue #76, ossf/wg-vulnerability-disclosures on github.com)?

  • The win from that would be interoperability. If we used the SPDX format but in a non-standardized binary section, we'd still have to teach SCA tools to look in that location.
  • The existing format conveys information similar to Cargo.lock. An advantage of this is that SCA tools are generally capable of reading Cargo.lock files, so the existing format is likely to be easy to integrate with SCA tools' existing Rust support (this was the case when integrating with syft). It's unclear whether that would apply to non-crate information (e.g. the rustc version or statically linked C libraries).
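
To illustrate how close a flat package list is to Cargo.lock, here is a minimal Rust sketch that renders one as Cargo.lock-style `[[package]]` tables. The `Package` struct (name, version, dependency indices) is a hypothetical simplification, not the exact cargo-auditable schema. A real Cargo.lock also records `source` and `checksum`, and writes `"name version"` in `dependencies` when a crate appears in several versions.

```rust
/// Simplified package record (hypothetical, for illustration only).
struct Package<'a> {
    name: &'a str,
    version: &'a str,
    /// Indices of this package's dependencies within the same list.
    dependencies: Vec<usize>,
}

/// Render packages as Cargo.lock-style [[package]] tables.
/// Out-of-range dependency indices are skipped rather than panicking.
fn to_cargo_lock(packages: &[Package]) -> String {
    let mut out = String::new();
    for p in packages {
        out.push_str("[[package]]\n");
        out.push_str(&format!("name = \"{}\"\nversion = \"{}\"\n", p.name, p.version));
        if !p.dependencies.is_empty() {
            out.push_str("dependencies = [\n");
            for &i in &p.dependencies {
                if let Some(dep) = packages.get(i) {
                    out.push_str(&format!(" \"{}\",\n", dep.name));
                }
            }
            out.push_str("]\n");
        }
        out.push('\n');
    }
    out
}
```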


tofay commented on May 31, 2024

Re "does anyone actually use these formats": both trivy and grype (the vulnerability scanning tool that works with/uses syft) can read SBOMs in multiple formats, e.g. SPDX and CycloneDX.

If there were a standardized section name for embedding SBOMs, cargo-auditable could use it and these tools could be updated to detect it. Even without a standardized section name, cargo-auditable could use SPDX, and go-rustaudit could extract the SBOM and expose the JSON for these tools to parse with their existing parsers.
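
Extracting an embedded SBOM from a named section works the same regardless of which format is stored in it. Below is a minimal Rust sketch for 64-bit little-endian ELF only (go-rustaudit itself is written in Go, whose standard library offers `debug/elf` for this). The function names are hypothetical and the caller supplies the section name, since no standardized name exists yet; PE, Mach-O and compressed payloads are not handled.

```rust
use std::convert::TryFrom;

/// Read a little-endian unsigned integer of `len` bytes (at most 8) at `off`.
/// Returns None if it lies outside `data` or does not fit in usize.
fn read_le(data: &[u8], off: usize, len: usize) -> Option<usize> {
    let raw = data.get(off..off.checked_add(len)?)?;
    let value = raw.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    usize::try_from(value).ok()
}

/// Return the contents of the section named `wanted` in a 64-bit
/// little-endian ELF image, or None if absent or malformed.
fn find_elf64_section<'a>(elf: &'a [u8], wanted: &str) -> Option<&'a [u8]> {
    // e_ident: magic, EI_CLASS = ELFCLASS64 (2), EI_DATA = ELFDATA2LSB (1).
    if elf.len() < 64 || !elf.starts_with(b"\x7fELF") || elf[4] != 2 || elf[5] != 1 {
        return None;
    }
    let shoff = read_le(elf, 0x28, 8)?; // e_shoff: section header table offset
    let shentsize = read_le(elf, 0x3A, 2)?; // e_shentsize: size of one header
    let shnum = read_le(elf, 0x3C, 2)?; // e_shnum: number of headers
    let shstrndx = read_le(elf, 0x3E, 2)?; // e_shstrndx: index of the name table
    // File offset of the i-th section header, with overflow checks.
    let header = |i: usize| -> Option<usize> { shoff.checked_add(i.checked_mul(shentsize)?) };
    // sh_offset (+24) of the section-name string table.
    let names = read_le(elf, header(shstrndx)?.checked_add(24)?, 8)?;
    for i in 0..shnum {
        let h = header(i)?;
        // sh_name (+0) is an offset into the string table; names are NUL-terminated.
        let rest = elf.get(names.checked_add(read_le(elf, h, 4)?)?..)?;
        let name = &rest[..rest.iter().position(|&b| b == 0)?];
        if name == wanted.as_bytes() {
            // sh_offset (+24) and sh_size (+32) locate the section contents.
            let off = read_le(elf, h.checked_add(24)?, 8)?;
            let size = read_le(elf, h.checked_add(32)?, 8)?;
            return elf.get(off..off.checked_add(size)?);
        }
    }
    None
}
```

The returned bytes could then be handed, after any decompression, to a scanner's existing SPDX or CycloneDX parser.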


orangecms commented on May 31, 2024

Hi, I just heard you on the Rustacean Station podcast - really cool stuff here! :-)

I've been thinking, talking and exchanging about this whole topic here for a while now, so let me add some references:

When I asked who else would be interested in the topic, I was invited to the CycloneDX Slack, where people discuss the entire SBoM topic very broadly. Maybe that's also for you. :-)

Finally, I am quite involved in the oreboot firmware project, where I'm seeking to introduce SBOMs as well, likely based on CycloneDX, which also has a Rust implementation.

That shall be it for now; feel free to poke back at me should you have any further questions etc. 🥳


Shnatsel commented on May 31, 2024

Thanks for the links! Having SBOMs in firmware would certainly be cool!

So far I've found everything not specifically designed for inclusion into binaries unsuitable, for two reasons:

  1. Inclusion of dates messes up reproducible builds
  2. The formats are very verbose and/or require including lots of information that is not relevant for the purposes of a security audit, increasing the binary size considerably.

I'm looking to talk to some people who have worked on the SBOM embedded in Go binaries by default. They also rolled their own JSON-based format, and perhaps we could collaborate on something more generic or at least that could be shared between the two.

FWIW Syft can already convert from the cargo-auditable data format to CycloneDX.


jayvdb commented on May 31, 2024

https://github.com/google/osv-scanner supports "SPDX and CycloneDX SBOMs using Package URLs" - https://google.github.io/osv-scanner/usage/#specify-sbom

As an alternative or precursor to storing the dependency info in those SBOM formats, perhaps rust-audit-info could extract the existing format and do a "rough" conversion to these SBOM formats. That way, integration with these other tools can be explored, and we can determine what (if any) extra fields need to be stored in Rust binaries to get reasonable compatibility with these tools.
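
Such a "rough" conversion could start as small as this minimal Rust sketch, which turns extracted `(name, version)` pairs into a CycloneDX JSON document with a Package URL per component, since osv-scanner matches SBOM components via PURLs. The function name is hypothetical; the field names are assumed to follow the CycloneDX 1.4 JSON format, and escaping is skipped on the assumption that crate names and versions contain no quotes or backslashes.

```rust
/// Rough conversion of (name, version) pairs into a minimal CycloneDX JSON BOM.
/// Each component carries a purl of the form pkg:cargo/<name>@<version>.
fn to_cyclonedx(crates: &[(&str, &str)]) -> String {
    let components: Vec<String> = crates
        .iter()
        .map(|(name, version)| {
            format!(
                "{{\"type\":\"library\",\"name\":\"{n}\",\"version\":\"{v}\",\"purl\":\"pkg:cargo/{n}@{v}\"}}",
                n = name,
                v = version
            )
        })
        .collect();
    format!(
        "{{\"bomFormat\":\"CycloneDX\",\"specVersion\":\"1.4\",\"version\":1,\"components\":[{}]}}",
        components.join(",")
    )
}
```

Feeding the output to the tools mentioned above would show which extra fields, if any, they need beyond name, version and purl.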


Shnatsel commented on May 31, 2024

Syft can already perform such a conversion today.

