spdx / ntia-conformance-checker
Check SPDX SBOM for NTIA minimum elements
License: Apache License 2.0
This is for the sake of consistency and to ease the mental load on current and future maintainers.
import ntia_conformance_checker as ntia
sbom = ntia.SbomChecker("SBOM_filepath")
print(sbom.ntia_minimum_elements_compliant)
Adding this code block as an example with a little additional text should be sufficient.
Potential improvement: Use SPDX-ID instead of name in machine-readable output when listing components.
The problem? The code cannot list the names of components that have no name.
The solution: Use SPDX ID instead of names to list nonconformant components.
This is low priority and can potentially await a re-architecture in which the messages
list is no longer the central data structure of the codebase.
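A minimal sketch of what listing by SPDX ID could look like; the Component shape and field names here are hypothetical illustrations, not the checker's actual data model:

```python
# Sketch: report nonconformant components by SPDX ID rather than by name.
# The Component dataclass below is a hypothetical stand-in for whatever
# structure the re-architected codebase ends up with.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Component:
    spdx_id: str
    name: Optional[str]
    version: Optional[str]

def components_without_names(components: List[Component]) -> List[str]:
    """Return the SPDX IDs of components that lack a name.

    Listing by SPDX ID works even when the name itself is missing,
    which is exactly the case a name-based report cannot handle.
    """
    return [c.spdx_id for c in components if not c.name]
```

The same pattern would apply to the other "missing field" reports (version, supplier), since every package is guaranteed an SPDXID even when other fields are absent.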
Add section that answers troubleshooting FAQs. For instance, see issue #49.
This would be a separate markdown document and linked from the main README.
@goneall, I noticed that SPDX Online Tools uses the default print functionality. IMO, it's a bit hard to grok as a user since it prints a long list of issues, rather than a more structured output.
Would you be interested if I created a print mode that was optimized for SPDX Online Tools? I could make some suggestions (or you could) in this issue and then I could try to implement it.
Note: Almost any suggestion will require some re-architecting of the codebase. But, TBH, that's on the docket anyway, so that's inevitable and doesn't need to be a constraint while we brainstorm. But once we decide on a new print mode (if we do), I'll also open a ticket about re-architecting and then I can kill two birds with one PR.
Should we add a "build passing" badge?
I noticed https://github.com/spdx/tools-golang has this badge and others.
Having the badge lets maintainers know if the build is broken without digging around in the Actions tab. Additionally, assuming the build is passing, the badge provides confidence to potential users.
NTIA component identifiers check passes for the attached file (please remove .txt from it before running).
Is this SBOM NTIA minimum element conformant? False
Individual elements | Status
-------------------------------------------------------
All component names provided? | True
All component versions provided? | False
All component identifiers provided? | True
All component suppliers provided? | False
SBOM author name provided? | True
SBOM creation timestamp provided? | True
Dependency relationships provided? | True
The script expects each package to have a truly unique SPDXID.
However, the NTIA's intent with "Other unique identifiers" appears to be checking for PURL/CPE/SWID (or equivalent). From the NTIA doc: "Other unique identifiers support automated efforts to map data across data uses and ecosystems and can reinforce certainty in instances of uncertainty. Examples of commonly used unique identifiers are Common Platform Enumeration (CPE), Software Identification (SWID) tags, and Package Uniform Resource Locators (PURL). These other identifiers may not be available for every piece of software, but should be used if they exist."
With the CPE/PURL/SWID interpretation, only 8 out of 15 components have a unique identifier, e.g.:
ExternalRef: PACKAGE-MANAGER purl pkg:oci/busybox@sha256:f4ed5f2163110c26d42741fdc92bd1710e118aed4edb19212548e8ca4e5fca22?mediaType=application%2Fvnd.docker.distribution.manifest.list.v2+json&repository_url=index.docker.io%2Flibrary
but it is completely missing from the following package:
PackageName: sha256:3d8a17fefa47b7be9e46147c5e670fb74d3de4a45889e307c5b7e85da5bee3d0
On this issue, the sbomqs implementation differs from ntia-conformance-checker, so I would like to get SPDX's interpretation for a consistent implementation.
PS: Thanks to @kestewart for pointing me to this tool
bom-alpine-3.15.spdx.txt
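For reference, a sketch of what an identifier check based on the CPE/PURL/SWID interpretation could look like. The dict shape mirrors SPDX JSON `externalRefs` entries, but this is an illustration of the interpretation, not the checker's actual logic:

```python
# Sketch: a component counts as having a "unique identifier" only if it
# carries a PURL, CPE, or SWID external reference, per the NTIA wording.
# These referenceType values come from the SPDX 2.x spec's external
# repository identifier types.
IDENTIFIER_REF_TYPES = {"purl", "cpe22Type", "cpe23Type", "swid"}

def has_unique_identifier(package: dict) -> bool:
    """Check an SPDX-JSON-style package dict for a PURL/CPE/SWID reference."""
    refs = package.get("externalRefs", [])
    return any(ref.get("referenceType") in IDENTIFIER_REF_TYPES for ref in refs)
```

Under this reading, a package that has only an SPDXID (like the sha256-named one above) would fail the identifier check, while the busybox package with its PURL would pass.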
When I run this tool on an SPDX document created by Tern, I get a False status for the SBOM author name provided field. My question is: what should this field be when a document is created by a tool? According to the spec (https://spdx.github.io/spdx-spec/v2.3/how-to-use/#k22-mapping-ntia-minimum-elements-to-spdx-fields), Author maps to the Creator field. In this case, the creator is a tool and the SBOM includes this information:
Creator: Tool: tern-2dd359916884b250e8b66d94c175506e387df07e
What is the tool looking for?
I don't think (but I could be wrong) that Click adds any functionality (for this app at this time) above and beyond what the Python standard library provides. To minimize dependencies, I propose removing Click.
This file seems to be a template file that does not contain functionality related to the project. I'd be glad to put in a PR that removes it.
The tool does not check for identifiers; instead, it checks for suppliers in check_component_indentifiers(). This can be corrected by checking the SPDX ID instead.
It could be nice to:
- add a --help flag
- accept --file as part of the command rather than interactively

In check_sbom_author, only a Person is valid.
I'm wondering if we should also allow Organizations.
@kestewart - do you know if the NTIA conformance guidance is specific on the creator being a Person?
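If Organizations were allowed, the check might look like the following sketch, which assumes creators arrive as SPDX tag-style strings (an assumption for illustration, not the checker's actual parsing):

```python
# Sketch: accept either a Person or an Organization creator as an SBOM
# "author". SPDX tag/value creator entries begin with "Person:",
# "Organization:", or "Tool:".
from typing import List

def has_author(creators: List[str]) -> bool:
    """True if at least one creator is a Person or an Organization."""
    return any(c.startswith(("Person:", "Organization:")) for c in creators)
```

Under this version, a document created only by a Tool (like the Tern example above) would still fail, but an Organization creator would pass.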
The current setup.py resulted in a recent hiccup for spdx-online-tools: spdx/spdx-online-tools#418
It could become a source of confusion and maintenance burden to have test documents for a particular test case that are divergent across formats. See PR #68 for an example of a PR that introduces this type of problem. The supplier test documents, per the PR, are a good example.
It could be helpful to treat one format (say, JSON) as the source of truth and to have tooling that auto-generates the other formats for each test case.
If there is interest and willingness to have automated testing on pull requests, I could put in a PR that adds this capability via GitHub Actions.
@goneall and @anthonyharrison, I'm going to try cutting a new release on Thursday. Sound good?
Fingers crossed I don't set anything on fire. If I do, I'll write a GitHub issue describing the problems I encountered and we can debug.
It could be helpful for machine consumers of this tool to have access to JSON output.
Is there any interest in this feature?
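A sketch of what that could look like, built on a plain results dict; the keys below are assumptions for illustration, not an agreed schema:

```python
# Sketch: machine-readable JSON output built from a structured results
# dict, so downstream tools can consume the checks without scraping
# terminal output. The key names here are hypothetical.
import json

def to_json(results: dict) -> str:
    """Serialize check results deterministically for machine consumers."""
    return json.dumps(results, indent=2, sort_keys=True)
```

With `sort_keys=True` the output is stable across runs, which makes it diff-friendly in CI logs.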
$ ntia-checker --help
zsh: command not found: ntia-checker
@anthonyharrison, I did the pip install route in the README and then got this. What am I missing? Thank you!!
See spdx/spdx-online-tools#428 (comment)
The current printing mode is optimized for a terminal, but there should be a print mode optimized for SPDX Online Tools.
This will require some investigation of SPDX-online-tools and its current UI for output.
Related to issue #28
In the course of examining a bug related to parsing, @goneall discovered that ntia-conformance-checker is using a fork, not the upstream, of tools-python.
This codebase should use the upstream to take advantage of ongoing improvements.
The only open question: Should this project switch to the upstream only after the upstream accepts the three commits from @linynjosh's fork? Or should the project switch to the upstream now and, in the meantime, try to merge those changes? (Assuming those changes haven't already been submitted and merged.)
If not, fix so it can.
I discovered that providing an input of a file that does not exist leads to a potentially confusing error message.
(ntia-conformance-checker) bash-3.2$ python3 checker.py
File name: nosuchfile.json
which returns:
['Document cannot be parsed.']
I would expect an error like "Document not found" rather than "Document cannot be parsed." The "cannot be parsed" phrasing could unintentionally imply to the user that the document does exist.
Add a field for total number of components to JSON output. One observer pointed out to me that without information about the total number of components it is harder to evaluate whether there are "many" or "few" components with missing values. For instance, 20 components missing version info might seem like a lot, but not if there are 2000 components overall.
This should be a simple PR.
When I was running the tool, I accidentally provided a file to it that did not exist (fat-finger typo). The tool gave me an UnboundLocalError traceback that might be confusing for users not familiar with reading Python tracebacks. Suggestion: the tool should exit gracefully with a clearer error message when a non-existent file is supplied.
Currently:
(ternenv) rose@rose-vm:~/ternenv/ntia-conformance-checker/ntia_conformance_checker$ python3 main.py -v --file dne.spdx
ERROR:root:Filename dne.spdx not found.
ERROR:root:Document cannot be parsed: [Errno 2] No such file or directory: 'dne.spdx'
Traceback (most recent call last):
File "/home/rose/ternenv/ntia-conformance-checker/ntia_conformance_checker/main.py", line 51, in <module>
main() # pylint: disable=no-value-for-parameter
File "/home/rose/ternenv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/rose/ternenv/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/rose/ternenv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/rose/ternenv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/rose/ternenv/ntia-conformance-checker/ntia_conformance_checker/main.py", line 36, in main
sbom = sbom_checker.SbomChecker(file)
File "/home/rose/ternenv/ntia-conformance-checker/ntia_conformance_checker/sbom_checker.py", line 17, in __init__
self.doc = self.parse_file()
File "/home/rose/ternenv/ntia-conformance-checker/ntia_conformance_checker/sbom_checker.py", line 42, in parse_file
return doc
UnboundLocalError: local variable 'doc' referenced before assignment
Could be improved to something like:
(ternenv) rose@rose-vm:~/ternenv/ntia-conformance-checker/ntia_conformance_checker$ python3 main.py -v --file dne.spdx
Warning: file 'dne.spdx' not found.
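One way to achieve that graceful failure is to check for the file before handing it to the parser. The sketch below uses a hypothetical helper name, not the tool's actual function:

```python
# Sketch: fail fast with a clear message when the input file does not
# exist, instead of letting a later UnboundLocalError escape from the
# parsing code. parse_file_or_exit is a hypothetical helper.
import os
import sys

def parse_file_or_exit(path: str) -> str:
    if not os.path.isfile(path):
        print(f"Warning: file '{path}' not found.", file=sys.stderr)
        sys.exit(1)
    # At this point the real implementation would hand off to the
    # SPDX parser; returning the path keeps this sketch self-contained.
    return path
```

Exiting with status 1 (rather than raising) also plays nicely with the exit-code conventions discussed elsewhere in this tracker.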
To help with code quality, it could be helpful to add a pylint GH action. Info on pylint is here: https://github.com/PyCQA/pylint
This would likely involve some sprucing up of the codebase too, but probably wouldn't be too bad.
I'm glad to put in this PR.
@puerco mentioned to me an aspect of the NTIA minimum requirements document of which I was unaware:
Depth. An SBOM should contain all primary (top level) components, with all their transitive
dependencies listed. At a minimum, all top-level dependencies must be listed with enough
detail to seek out the transitive dependencies recursively.
The question: Should ntia-conformance-checker attempt to account for this "depth" requirement? If so, how?
For some technical documentation on how this could be done, see: https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md
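As a starting point for discussion, here is a sketch of how the set of reachable transitive dependencies could be collected from relationship records. The (source, type, target) tuples are a simplification of SPDX relationships, not the checker's actual representation:

```python
# Sketch: walk DEPENDS_ON relationships from a root element to collect
# everything reachable, which a "depth" check could then compare against
# the packages actually present in the document.
def transitive_dependencies(relationships, root):
    """Collect all elements reachable from `root` via DEPENDS_ON edges."""
    graph = {}
    for src, rel_type, dst in relationships:
        if rel_type == "DEPENDS_ON":
            graph.setdefault(src, []).append(dst)
    seen, stack = set(), [root]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

A depth check might then flag top-level dependencies whose own dependencies are entirely absent, though whether absence means "not recorded" or "genuinely none" is exactly the ambiguity the question raises.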
Adding the Python Software Foundation's black formatter could be a way to simplify development.
Info here: https://github.com/psf/black
I'm glad to put in a PR.
On PR #41, no test coverage showed up in PR. Huh?
Add explanation to README
A package supplier can be defined as an Organisation or a Person. It can also be defined as NOASSERTION (see the SPDX Specification).
It appears that if a PackageSupplier tag exists, this is sufficient to pass the 'are all package details provided' even if the supplier is marked as NOASSERTION. This doesn't seem correct.
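A sketch of a stricter supplier check that would treat NOASSERTION (and empty values) as missing; the helper name is hypothetical:

```python
# Sketch: a PackageSupplier value only satisfies the check if it is
# present AND is not the explicit NOASSERTION placeholder.
from typing import Optional

def has_meaningful_supplier(supplier: Optional[str]) -> bool:
    """True only for a real supplier value, not NOASSERTION or empty."""
    return bool(supplier) and supplier.strip() not in ("", "NOASSERTION")
```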
Is pyproject.toml all that is needed?
Need to investigate.
See PR #74.
@anthonyharrison notes:
I don't think you need Pipfile and Pipfile.lock, as they are related to pipenv (the lock file should be autogenerated by pipenv lock).
I'll plan on doing releases every two weeks (or at least try that for a month or two) unless anyone objects.
I'm going to make a new release of the SPDX Online Tools in the next week or so - I'm going to include an updated NTIA Conformance Checker.
Let me know if there are any additional pull requests or issues we should resolve before updating the online tools.
Bandit is a static analysis security tool for Python: https://bandit.readthedocs.io/en/latest/
Adding it to CI can help us know, fix, and prevent some security issues at relatively low cost.
I'm glad to put in a PR.
Please use exit code 0 on success and 1 on error, not -1, which is system-dependent; many systems only support unsigned values.
Fix all uses of sys.exit(-1) to sys.exit(1).
NTIA version check passes for the attached file (please remove .txt from it before running).
However, a typical version information field is empty:
"versionInfo": "",
The root cause appears to be that the check here is missing an empty-string check (or an even stricter check for semver or a derivative).
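A sketch of the version check with the missing empty-string guard added; has_version is a hypothetical helper, not the tool's actual function:

```python
# Sketch: treat None, "" and whitespace-only values as a missing
# version, so that  "versionInfo": ""  no longer passes the check.
from typing import Optional

def has_version(version_info: Optional[str]) -> bool:
    return version_info is not None and version_info.strip() != ""
```

A stricter variant could additionally validate against semver, but the empty-string guard alone would catch the attached file.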
PS: Ignore these messages from the output:
'{'packageVerificationCodeValue': ''}' is not a valid value for PKG_VERIF_CODE_FIELD
This is a known issue with bom, filed here: kubernetes-sigs/bom#230. We are tracking that and other known issues with formats at interlynk-io/sbomqs#39, if you are curious to follow along.
See issue #52 for the relevant background.
(ntia-conformance-checker) (base) ricardo@MB cli_tools % python checker.py
File name: /Users/ricardo/_git/spdx_sboms/us-demo-org-2_react.spdx
['Document cannot be parsed.']
Is there a way to produce more details on why the document cannot be parsed?
I'm referencing a standard SPDX file format. I have provided the absolute path to the source file, and I have even moved the file inside the cli_tools/ directory where checker.py is located.
(ntia-conformance-checker) (base) ricardo@MC cli_tools % python checker.py
File name: us-demo-org-2_react.spdx
['Document cannot be parsed.']
Thanks.
I think signing PyPI releases with Sigstore is possible.
If possible with argparse, add a version flag and perhaps show the commit hash of the git commit associated with the source from which that version was built.
Will need to investigate.
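A sketch of how the flag could be wired up with argparse; the VERSION and COMMIT values are placeholders, since the real metadata plumbing (reading the package version, embedding the commit hash at build time) is exactly what needs investigating:

```python
# Sketch: a --version flag via argparse's built-in "version" action.
# VERSION and COMMIT are hypothetical placeholders.
import argparse

VERSION = "0.0.0"    # placeholder; would come from package metadata
COMMIT = "deadbeef"  # placeholder; would be embedded at build time

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ntia-checker")
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {VERSION} (commit {COMMIT})")
    parser.add_argument("--file", help="SPDX SBOM file to check")
    return parser
```

The `version` action prints the string and exits, so no extra handling code is needed in main().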
This would help submitters and reviewers know whether the PR provides test coverage for any new code.
In tandem with #37, it's time, IMO, for a re-architecture. Fortunately, this codebase is only ~250 lines, so I actually don't think it will be that painful. Let me explain the current architecture, the motivation for changing this architecture, and my proposed new architecture.
The Current Architecture: A Conveyor Belt
The codebase currently uses a messages list data structure that holds all messages to the user about the minimum elements checks. I compare it to a conveyor belt because all the messages are in a line, one after the other, and the codebase simply adds new messages to the messages conveyor belt. This is a simple architecture, which is an important point in its favor, but I think the codebase has outgrown this data structure.
Why Change?
Because a conveyor belt is great for picking up your luggage due to the simplicity of the operation (wait for your particular piece or pieces of luggage), but it's not great for presenting structure to a user. In particular, the conveyor belt approach is why it's hard to quickly re-architect the print functionality to make a print mode optimized for the online-tools web app. To make this work, one has to write parsing code that grabs lots of elements from the messages data structure and then re-arranges them. It's also why the JSON output depends on convoluted (and brittle) parsing code.
So, TL;DR: The current messages data structure requires after-the-fact parsing in order to present output to the user in any form other than a long list.
The Case for a Singleton Architecture
A little bit of object orientation could go a long way in this codebase. In particular, I propose an SBOM class that would be created each time the tool is invoked and that would hold all the data (in a structured way) that is now put on the messages conveyor belt. But instead of one long line of messages, there would be properties specifically for each check. This way, when a programmer wants to write a print functionality, the programmer simply needs that object, and not complicated parsing functionality that dissects the messages list.
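To make the proposal concrete, here is a sketch of what such a structured-results object could look like; all names are illustrative, not a final API:

```python
# Sketch of the proposed re-architecture: one object per invocation,
# with a field per check instead of a flat messages list. Printers
# (terminal, online-tools, JSON) consume the structured data directly,
# with no message parsing.
from dataclasses import dataclass

@dataclass
class SbomCheckResults:
    all_names_provided: bool = False
    all_versions_provided: bool = False
    all_identifiers_provided: bool = False
    all_suppliers_provided: bool = False
    author_provided: bool = False
    timestamp_provided: bool = False
    dependency_relationships_provided: bool = False

    @property
    def compliant(self) -> bool:
        # NTIA minimum elements require every individual check to pass.
        return all(vars(self).values())

    def as_dict(self) -> dict:
        # One structured view that any print mode can render.
        return dict(vars(self), compliant=self.compliant)
```

Each checker method would set its own field, and the terminal table, the online-tools view, and the JSON output would all just be different renderings of `as_dict()`.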
@goneall, sound good? @linynjosh, feel free to weigh in too!
Because ntia-conformance-checker no longer relies on a pre-release of tools-python, the --pre flag can be removed from the GitHub CI automation.
It took me a couple of minutes to find checker.py. It could speed up the time it takes for a user to understand how to use this cool tool as a command-line tool if there were some usage information in the README. I'm glad to put in a draft PR if anyone thinks this would be useful.
Initial thoughts:
Other thoughts, ideas, welcome.
I am trying to install the conformance checker tool according to the directions in the README but hit the following ModuleNotFound error:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/__init__.py", line 27, in <module>
from . import urllib3
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/urllib3/__init__.py", line 8, in <module>
from .connectionpool import (
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/urllib3/connectionpool.py", line 35, in <module>
from .connection import (
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/urllib3/connection.py", line 54, in <module>
from ._collections import HTTPHeaderDict
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/urllib3/_collections.py", line 2, in <module>
from collections import Mapping, MutableMapping
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/pipenv", line 33, in <module>
sys.exit(load_entry_point('pipenv==11.9.0', 'console_scripts', 'pipenv')())
File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pipenv/cli.py", line 347, in install
from .import core
File "/usr/lib/python3/dist-packages/pipenv/core.py", line 21, in <module>
import requests
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/__init__.py", line 62, in <module>
from .packages.urllib3.exceptions import DependencyWarning
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/__init__.py", line 29, in <module>
import urllib3
ModuleNotFoundError: No module named 'urllib3'
urllib3 was already installed, so I tried to upgrade it, but I still get the same error:
(ternenv) rose@rose-vm:~/ternenv/ntia-conformance-checker$ pip install urllib3 --upgrade
Requirement already satisfied: urllib3 in /home/rose/ternenv/lib/python3.10/site-packages (1.26.9)
Collecting urllib3
Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.6/140.6 kB 2.4 MB/s eta 0:00:00
Installing collected packages: urllib3
Attempting uninstall: urllib3
Found existing installation: urllib3 1.26.9
Uninstalling urllib3-1.26.9:
Successfully uninstalled urllib3-1.26.9
Successfully installed urllib3-1.26.14
It would be very useful if the tool operated with a quiet option and just returned an exit code: 0 (conformant) or -1 (non-conformant). This would then allow the tool to be easily added to a CI/CD pipeline.
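A sketch of how a quiet mode could work. Note that per the exit-code request elsewhere in this tracker, 1 (rather than -1) is the portable choice for non-conformance; the function names here are hypothetical:

```python
# Sketch: a quiet mode that prints nothing and signals conformance
# through the exit code alone, so CI/CD pipelines can gate on it.
# Using 1 rather than -1, since -1 is system-dependent.
def exit_code(conformant: bool) -> int:
    return 0 if conformant else 1

def run(conformant: bool, quiet: bool = False) -> int:
    if not quiet:
        print(f"Is this SBOM NTIA minimum element conformant? {conformant}")
    return exit_code(conformant)
```

A pipeline step would then be as simple as `ntia-checker --quiet --file sbom.spdx && deploy`.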