Comments (4)

mc-alt commented on September 22, 2024

Thanks for the response

I ended up creating a script that

  • parses the selected package
  • walks its dependencies (including packages from the workspace, and their dependencies etc.)
  • creates a new temp package which represents those dependencies, but flattened
  • copies in the pnpm lock file from the top level of the monorepo
  • performs a pnpm install (which is quite fast because it's just setting up links, and the copy of the lock file makes it use the already established versions)
  • generates the SBOM

Hacky script here:
https://gist.github.com/mc-alt/b0c27dd7621b3ea2f984b43a619877c2

This seems to work for us

Note: this would not be performant if not for the way pnpm's cache and linking approach works
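For readers who don't want to open the gist, the parse/walk/flatten steps could look roughly like the sketch below. This is not the gist's code. It assumes workspace packages live under packages/*/package.json (a real setup would read the globs from pnpm-workspace.yaml), and the function names and the "-flattened" suffix are made up for illustration.

```python
import json
from pathlib import Path


def load_workspace(root: Path) -> dict[str, dict]:
    """Map package name -> parsed package.json for packages under root/packages.

    Assumption: the workspace layout is packages/<name>/package.json.
    """
    manifests = {}
    for manifest_path in sorted(root.glob("packages/*/package.json")):
        manifest = json.loads(manifest_path.read_text())
        manifests[manifest["name"]] = manifest
    return manifests


def flatten_dependencies(selected: str, workspace: dict[str, dict]) -> dict[str, str]:
    """Collect the external dependencies of `selected`, walking into workspace packages."""
    flattened: dict[str, str] = {}
    seen: set[str] = set()
    stack = [selected]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(name)
        for dep, version in workspace[name].get("dependencies", {}).items():
            if dep in workspace:
                stack.append(dep)  # workspace package: follow its dependencies too
            else:
                flattened.setdefault(dep, version)  # external: keep the first range seen
    return flattened


def temp_manifest(selected: str, workspace: dict[str, dict]) -> dict:
    """Build the package.json content for the temporary, flattened package."""
    deps = flatten_dependencies(selected, workspace)
    return {
        "name": f"{selected}-flattened",  # hypothetical naming scheme
        "version": "0.0.0",
        "private": True,
        "dependencies": dict(sorted(deps.items())),
    }
```

The remaining steps are then: write this manifest to a temp directory, copy the top-level pnpm-lock.yaml next to it, run pnpm install there, and point syft at that directory.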

Unfortunately I've been pulled onto other things, but I will try to find time to prepare a public example repository for a setup like ours

from syft.

kzantow commented on September 22, 2024

Hi, @mc-alt, there are a few things to mention here, so let me start by suggesting a few options with what is available today: are you able to scan the subdirectories directly? If you wanted separate SBOMs, I'd think just scanning like syft project-root/packages/sub-package-1 could do the trick. If you tried this, I suspect the challenge you ran into is that, because this is a directory scan, it doesn't pick up any package.json information by default; you'd need to enable the javascript-package-cataloger (by using the flag --select-catalogers +javascript-package-cataloger).

This isn't perfect, though, as Syft won't resolve transitive dependencies; it only reads what it finds on the filesystem. This is one reason Syft prefers lock files, but what I understand about this setup is that the lock file only exists as the top-level pnpm-lock.yaml, so it wouldn't be read when scanning a subdirectory, and Syft wouldn't necessarily know how to determine which packages to exclude anyway. If, however, you had all the appropriate dependencies installed in node_modules, these would show up as you expect using the javascript-package-cataloger, with another caveat that it will also include build-time dependencies that are downloaded into node_modules. I don't claim to have a lot of familiarity with PNPM; does this option get you close to having something usable?
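To make the per-subdirectory suggestion concrete, here is a small sketch that builds one syft invocation per sub-package using the flag mentioned above. The packages/ layout, the helper names, and the .cdx.json file naming are assumptions for illustration; the -o cyclonedx-json=<path> output form should be checked against the installed syft version.

```python
from pathlib import Path


def syft_command(package_dir: Path, sbom_path: Path) -> list[str]:
    """Build a directory-scan command with the JavaScript package cataloger enabled."""
    return [
        "syft",
        str(package_dir),
        "--select-catalogers",
        "+javascript-package-cataloger",
        "-o",
        f"cyclonedx-json={sbom_path}",  # assumed output format/path syntax
    ]


def commands_for(project_root: Path, package_names: list[str], output_dir: Path) -> list[list[str]]:
    """One command per sub-package, assuming project_root/packages/<name>."""
    return [
        syft_command(project_root / "packages" / name, output_dir / f"{name}.cdx.json")
        for name in package_names
    ]
```

Each command list can be handed to subprocess.run(cmd, check=True). As noted above, the result only reflects what is actually present on disk under each sub-package.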

I could definitely see some sort of enhancements we could implement -- namely looking outside the requested directory to attempt to find some additional pnpm-lock.yaml, node_modules, or other pertinent files. But we haven't done a lot of this and it's a little unclear to me if this should be the default behavior -- in other words: if I scan a directory, did I mean to treat it as a directory or as part of a larger workspace? Another option to explore is to add some sort of --workspace or similar flag that can be used by catalogers that have knowledge of workspaces. PNPM certainly isn't the only one that does something like this and perhaps we can find some commonality across different package managers. The last thing I'd note is that we've also investigated shelling out to tools (such as mvn dependency:tree or a similar pnpm call), but would very much like Syft to avoid doing this as much as possible.
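As a rough illustration of "looking outside the requested directory", the sketch below walks upward from the scanned directory until it finds a workspace marker file. This is not how Syft behaves today; the function name and the choice of marker files are assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumed marker files; other package managers would need their own list.
WORKSPACE_MARKERS = ("pnpm-lock.yaml", "pnpm-workspace.yaml")


def find_workspace_root(scan_dir: Path, markers=WORKSPACE_MARKERS) -> Optional[Path]:
    """Return the nearest ancestor (or scan_dir itself) containing a marker file."""
    current = scan_dir.resolve()
    for candidate in (current, *current.parents):
        if any((candidate / marker).is_file() for marker in markers):
            return candidate
    return None  # no marker found: treat scan_dir as a plain directory
```

Whether a lookup like this should run by default or only behind an opt-in flag is exactly the open question raised above.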

That said, would you be able to provide some public repo(s) with a similar setup that we could have a look at?


wagoodman commented on September 22, 2024

@mc-alt glad you got it figured out with your script!

I think the interesting thing to take out of this is that there may be something missing in the syft ecosystem in terms of "scanning 1 thing and generating N SBOMs", which is outside of the scope of syft, but may be hinting at a separate tool that wraps syft. This is similar to (but not the same as) #562.

The new use case highlighted here is "what is the prescription for using syft in a monorepo setting?". This probably warrants some discussion.


kzantow commented on September 22, 2024

Another example where a user might want to perform a similar scan is a maven multi-module project, where a subdirectory contains something like a deployable web application and a user wants to include parent and sibling directories to properly resolve modules and parent poms with relative paths.

This seems to boil down to separating the set of files included in the source from the target directory to catalog. Today, for example, a user running a directory scan uses: syft my/dir and syft indexes and scans everything within that directory only. If there was a way to specify a different directory to scan while retaining the larger set of files for context, it could be possible to accomplish what's asked for here, with some work in the catalogers to follow relative links. For example: syft /some/root/path --only-catalog sub/dir or syft /some/root/path/sub/dir --root /some/root/path to select the subset of files that the cataloging functions operate on when an alternate root is provided.
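The separation described here can be sketched in a few lines: index every file under the root, catalog only the files under the target, and let relative references (such as a parent pom) resolve against the full index. The --only-catalog and --root flags above are proposals from this thread, not existing syft options, and the function names below are hypothetical.

```python
import posixpath
from typing import Optional


def select_for_cataloging(indexed: set[str], target: str) -> list[str]:
    """Files under `target` are cataloged; everything else stays available as context."""
    prefix = target.rstrip("/") + "/"
    return sorted(path for path in indexed if path.startswith(prefix))


def resolve_relative(indexed: set[str], from_file: str, relative: str) -> Optional[str]:
    """Resolve a relative reference from `from_file` against the full index."""
    base = posixpath.dirname(from_file)
    candidate = posixpath.normpath(posixpath.join(base, relative))
    return candidate if candidate in indexed else None


# Paths are relative to the root, e.g. /some/root/path.
indexed = {"pom.xml", "lib/pom.xml", "webapp/pom.xml", "webapp/src/App.java"}
to_catalog = select_for_cataloging(indexed, "webapp")
parent_pom = resolve_relative(indexed, "webapp/pom.xml", "../pom.xml")
```

With a plain syft my/dir scan, the index and the catalog target are the same set, so a reference like ../pom.xml would have nothing to resolve against.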

It seems there may be a path forward for this, but certainly more investigation is needed.

