marctjones / bomsh

This project forked from omnibor/bomsh

bomsh is a collection of tools to explore the OmniBOR idea

License: Apache License 2.0

Shell 0.10% Python 77.63% Perl 2.35% C 19.74% Java 0.01% Makefile 0.06% Dockerfile 0.11%

bomsh

Overview

Bomsh is a collection of tools to explore the OmniBOR idea. It includes the following tools:

Bombash: a BASH-based shell to generate OmniBOR artifact trees for software.

Bomtrace/Bomtrace2/Bomtrace3: a STRACE-based tool to generate OmniBOR artifact trees for software.

Note: Bombash and Bomtrace are deprecated; only Bomtrace2 and Bomtrace3 are actively developed and supported.

Several Python scripts work together with these tools:

  • bomsh_hook.py script, invoked by Bombash and Bomtrace for each shell command during software build.
  • bomsh_hook2.py script, invoked by Bomtrace2 for each shell command during software build.
  • bomsh_create_bom.py script, which processes the bomsh_hook_raw_logfile.sha1 generated by bomsh_hook2.py script and creates OmniBOR docs.
  • bomsh_create_bom_java.py script, which scans a Java build workspace and creates OmniBOR docs for the generated .jar package files.
  • bomsh_create_cve.py script, which scans a git repo and creates a CVE database.
  • bomsh_search_cve.py script, which creates the OmniBOR artifact tree for a binary file, searches the CVE database, and attaches the CVE search result to the OmniBOR artifact tree.
  • bomsh_pstree.py script, which analyzes an strace logfile and creates various pstree files and an indented strace logfile.
  • debrebuild script, which is a slight modification of the Debian debrebuild.pl script, to fix a few issues and support a new --srctardir option.
  • bomsh_rebuild_deb.py script, which reproducibly rebuilds Debian packages from their buildinfo files and generates their OmniBOR documents.
  • bomsh_rebuild_rpm.py script, which rebuilds RPM packages from their source RPMs and generates their OmniBOR documents.
  • bomsh_index_debrepo.py script, which creates a blob index database for source packages of Debian/Ubuntu repositories.
  • bomsh_index_yocto.py script, which creates a blob index database for source packages of OpenEmbedded/Yocto.
  • bomsh_index_ws.py script, which creates a blob index database for a software build workspace.
  • bomsh_sbom.py script, which creates or updates SPDX SBOM documents with OmniBOR info.
  • bomsh_spdx_rpm.py script, which creates or updates SPDX SBOM documents for RPMs built from their source RPM.
  • bomsh_spdx_deb.py script, which creates or updates SPDX SBOM documents for DEBs built from their source.
  • bomsh_spdx_image.py script, which creates SPDX SBOM documents for software product images.
  • bomsh_spdx_image_docker.py script, which creates SPDX SBOM documents and CVE vulnerability reports for software product images.
  • bomsh_art_tree.py script, which grafts new subtrees or prunes existing subtrees of OmniBOR artifact trees.
  • bomsh_dynlib.py script, which creates raw_logfile of runtime-dependency fragments for ELF executables.
  • bomsh_pylib.py script, which creates raw_logfile of runtime-dependency fragments for Python scripts.

Quick Start

To get started quickly with the Bomsh tool, run the following commands:

$ git clone URL-of-this-git-repo bomsh
$ wget https://vault.centos.org/8-stream/AppStream/Source/SPackages/sysstat-11.7.3-7.el8.src.rpm
$ bomsh/scripts/bomsh_rebuild_rpm.py -c alma+epel-8-x86_64 --docker_image_base almalinux:8 -s sysstat-11.7.3-7.el8.src.rpm -d bomsh/scripts/sample_sysstat_cvedb.json -o outdir --syft_sbom --bomsh_spdx --mock_option="--no-bootstrap-image --define 'packager BOMSH user $(id -un) at $(hostname)'"
$ grep -B1 -A3 CVElist outdir/bomsher_out/bomsh_logfiles/bomsh_search_jsonfile-details.json
$
$ # if mock is older than version 5.0, the --no-bootstrap-image value passed via --mock_option above may not be needed
$ wget https://vault.centos.org/8-stream/AppStream/Source/SPackages/sysstat-11.7.3-9.el8.src.rpm
$ bomsh/scripts/bomsh_rebuild_rpm.py -c alma+epel-8-x86_64 --docker_image_base almalinux:8 -s sysstat-11.7.3-9.el8.src.rpm -d bomsh/scripts/sample_sysstat_cvedb.json -o outdir3 --syft_sbom --bomsh_spdx
$ grep -B1 -A3 CVElist outdir3/bomsher_out/bomsh_logfiles/bomsh_search_jsonfile-details.json
$
$ # the RPM rebuilds above should take only a few minutes; the Debian rebuild below may take tens of minutes
$ wget https://buildinfos.debian.net/buildinfo-pool/s/sysstat/sysstat_11.7.3-1_all-amd64-source.buildinfo
$ bomsh/scripts/bomsh_rebuild_deb.py -f sysstat_11.7.3-1_all-amd64-source.buildinfo -d bomsh/scripts/sample_sysstat_cvedb.json -o outdir2 --syft_sbom --bomsh_spdx --mmdebstrap_no_cleanup
$ grep -B1 -A3 CVElist outdir2/bomsher_out/bomsh_logfiles/bomsh_search_jsonfile-details.json

Then explore and inspect the output files in the outdir/bomsher_out directory, especially:

  • bomsh_logfiles/bomsh_hook_raw_logfile.sha1, which contains the list of build commands with details of the output/input files recorded by the Bomsh tool;
  • omnibor_dir/metadata/bomsh/*, which contain useful metadata collected by Bomsh;
  • bomsh_logfiles/bomsh_search_jsonfile*, which contain the constructed OmniBOR tree with relevant metadata for the built RPM/DEB packages;
  • bomsh_logfiles/bomsh-index-*, which contain the relevant package/blob database;
  • syft_sbom/omnibor*, which contain the syft-generated SPDX SBOM documents with the ExternalRef OmniBOR identifier;
  • bomsh_sbom/*, which contain the SPDX SBOM documents with the ExternalRef OmniBOR identifier generated by the bomsh_spdx_rpm.py or bomsh_spdx_deb.py script.
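The grep commands above pull CVElist entries out of bomsh_search_jsonfile-details.json. As a hedged alternative, the Python sketch below loads the file and reports every CVElist value together with its location in the JSON. It assumes only that the file is valid JSON containing keys named CVElist; the exact nesting of the real file is not documented here, so the code searches recursively instead of relying on fixed field names. The helper name find_key is hypothetical.

```python
import json
import sys
from typing import Any, Iterator


def find_key(node: Any, key: str, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Recursively yield (json_path, value) for every dict entry named `key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. outdir/bomsher_out/bomsh_logfiles/bomsh_search_jsonfile-details.json
    with open(sys.argv[1]) as f:
        details = json.load(f)
    for where, cves in find_key(details, "CVElist"):
        print(where, cves)
```

Searching recursively keeps the sketch independent of the file's schema, at the cost of also matching any unrelated key that happens to be named CVElist.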

Compile Bombash and Bomtrace from Source

The Bombash tool is based on BASH, and Bomtrace/Bomtrace2/Bomtrace3 are based on STRACE. The corresponding patch files are stored in the .devcontainer/patches directory. To compile Bombash/Bomtrace2/Bomtrace3 from source, follow these steps:

$ git clone URL-of-this-git-repo bomsh
$ git clone https://git.savannah.gnu.org/git/bash.git
$ # or use the GitHub mirror: git clone https://github.com/bminor/bash.git
$ cd bash ; patch -p1 < ../bomsh/.devcontainer/patches/bombash.patch
$ ./configure ; make ; cp ./bash ../bomsh/bin/bombash
$ cd ..
$ git clone https://github.com/strace/strace.git
$ cd strace ; patch -p1 < ../bomsh/.devcontainer/patches/bomtrace2.patch
$ ./bootstrap ; ./configure ; make
$ # if configure fails, try adding --disable-mpers or --enable-mpers=check
$ cp src/strace ../bomsh/bin/bomtrace2
$ cd ..
$ git clone https://github.com/strace/strace.git strace3
$ cd strace3 ; patch -p1 < ../bomsh/.devcontainer/patches/bomtrace3.patch
$ cp ../bomsh/.devcontainer/src/*.[hc] src/
$ ./bootstrap ; ./configure ; make
$ # if configure fails, try adding --disable-mpers or --enable-mpers=check
$ cp src/strace ../bomsh/bin/bomtrace3

To build the bomtrace2 and bomtrace3 binaries automatically in a Docker container, run:

$ git clone URL-of-this-git-repo bomsh
$ cd bomsh
$ docker run -it --rm -v ${PWD}:/out $(cd .devcontainer && docker build -q .)

The bomtrace2 and bomtrace3 binaries are then copied into the current directory ('.') on your host.

Generating OmniBOR Docs with Bomtrace3

Bomtrace3 is version 3 of Bomtrace. It improves performance over Bomtrace2 and has more command options/features. Run the following to generate OmniBOR docs for the HelloWorld program with Bomtrace3:

$ git clone URL-of-this-git-repo bomsh
$ cd bomsh/src
$ ../bin/bomtrace3 make
$ ../scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b /tmp/bomdir
$ ls -tl /tmp/bomdir/objects /tmp/bomdir/metadata/bomsh
$ cat /tmp/bomsh_hook_raw_logfile.sha1
$ cat /tmp/bomsh_createbom_jsonfile

Bomtrace3 does not need the bomsh_hook2.py script to record the necessary raw info, because it integrates the functionality of bomsh_hook2.py, rewritten in C. Apart from this difference, all other steps to generate OmniBOR documents are the same.

By running entirely in C code instead of invoking Python scripts, Bomtrace3 avoids much of the process context-switch overhead and therefore performs significantly better than Bomtrace2: Bomtrace2 runs 2x to 5x slower than the baseline build, while Bomtrace3 adds only about 20% runtime overhead.

Generating OmniBOR Docs with Bomtrace2

Bomtrace2 is version 2 of Bomtrace. Compared with Bomtrace, it performs better, has more command options/features, is more flexible and extensible, and is better tested, so it is preferred over Bomtrace. Run the following to generate OmniBOR docs for the HelloWorld program with Bomtrace2:

$ git clone URL-of-this-git-repo bomsh
$ cd bomsh
$ cp scripts/bomsh_hook2.py /tmp
$ cd src
$ ../bin/bomtrace2 make
$ ../scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b /tmp/bomdir
$ ls -tl /tmp/bomdir/objects /tmp/bomdir/metadata/bomsh
$ cat /tmp/bomsh_hook_raw_logfile.sha1
$ cat /tmp/bomsh_createbom_jsonfile

Bomtrace2 works together with the new bomsh_hook2.py script, which records only the necessary raw info. The raw info contains ADFs (Artifact Dependency Fragments) for generated artifacts. Each artifact dependency fragment contains an output file and a list of input files with their GITOIDs, plus some metadata such as build_cmd and build_tool. The raw info is recorded in /tmp/bomsh_hook_raw_logfile.sha1, which contains only the SHA1 checksums of the parsed input/output files of the shell commands.

After the software build is done, the new bomsh_create_bom.py script reads the raw_logfile, generates the hash-tree, creates the OmniBOR docs, and collects/aggregates metadata. The generated hash-tree database is saved in /tmp/bomsh_createbom_jsonfile. In effect, the original bomsh_hook.py script is divided into two scripts, bomsh_hook2.py and bomsh_create_bom.py, and the two new scripts should generate exactly the same OmniBOR artifact database (/tmp/bomsh_createbom_jsonfile) as the /tmp/bomsh_hook_jsonfile of the old bomsh_hook.py script.

Below is a sample run of the C HelloWorld program compilation, followed by the bomsh_hook_raw_logfile.sha1 file recorded by Bomtrace2:

[root@000b478b5d68 src]# pwd
/home/bomsh/src
[root@000b478b5d68 src]# ../bin/bomtrace2 make
gcc -c -o hello.o hello.c
gcc -o hello hello.o
[root@000b478b5d68 src]# more /tmp/bomsh_hook_raw_logfile.sha1

outfile: 6c7744ecf42790fb8073d0e822eb0a2b9b7c39e7 path: /home/bomsh/src/hello.o
infile: 29039dc7dd32210e38e949fcf483ec8ce6f7a054 path: /home/bomsh/src/hello.c
infile: c2ab78a2d4c20711295a501c61dd038bfa029934 path: /usr/include/stdc-predef.h
infile: 739e08610d54f341cf14247ec38f254e1520e5b1 path: /usr/include/stdio.h
infile: b4a429b83c345681b269bdee0785363f3d2c1f3c path: /usr/include/bits/libc-header-start.h
infile: 5bed0a499605a3a26d55443f3c8b7e67de152f74 path: /usr/include/features.h
infile: 3f6fe3cc8563b49311327647fad53eb18d94da2c path: /usr/include/sys/cdefs.h
infile: 70f652bca14d65c1de5a21669e7c0ffb8ecfe5ea path: /usr/include/bits/wordsize.h
infile: 28488e0b05954ccf87c779f5f9258987e4d68ac5 path: /usr/include/bits/long-double.h
infile: 70a1ba017357d3111cc510e73b269541ca2aaf09 path: /usr/include/gnu/stubs.h
infile: 477c8e4931c0d7191187acb42f0ed4255e3619aa path: /usr/include/gnu/stubs-64.h
infile: 31b96a7e5e17f8da4cb8e6262869f643eddbd477 path: /usr/lib/gcc/x86_64-redhat-linux/8/include/stddef.h
infile: e4c73fd23a271b0b452cece0212ff244d2b55d48 path: /usr/lib/gcc/x86_64-redhat-linux/8/include/stdarg.h
infile: 64f344c6e7897491c7c7430f52ad06c61fa85dad path: /usr/include/bits/types.h
infile: e6f7481a19cbc7857dbbfebef5adbeeaf80a70b8 path: /usr/include/bits/typesizes.h
infile: bb04576651b9097b3027e4299cc30c88f334535f path: /usr/include/bits/types/__fpos_t.h
infile: 1d8a4e28d1b62a2bfeba837fe18422cd106e6ddf path: /usr/include/bits/types/__mbstate_t.h
infile: 06a6891154fff74e1ddb6245f4a0467b09c617c5 path: /usr/include/bits/types/__fpos64_t.h
infile: 06dd79bc831bb06a6267a36ad2d62beccd7900b2 path: /usr/include/bits/types/__FILE.h
infile: f2682632090ba3e7f2caa1736394cbb235ceab0c path: /usr/include/bits/types/FILE.h
infile: 359f94945346c9eb4f92d1551e5e1a6d63a63dfb path: /usr/include/bits/types/struct_FILE.h
infile: 1be90e6fab4ab9b7dd3b27cea5bb1fe29acc0204 path: /usr/include/bits/stdio_lim.h
infile: 4f725e95ffa2663083b66a557b12751261cbcf05 path: /usr/include/bits/sys_errlist.h
build_cmd: gcc -c -o hello.o hello.c
==== End of raw info for this process


outfile: dfad3d1a11801f146a94b2ad50024945b82efef6 path: /home/bomsh/src/hello
infile: ff3b4838fba28e31dedd3703f4337107e6bc3ac0 path: /usr/libexec/gcc/x86_64-redhat-linux/8/liblto_plugin.so
infile: fc3bd83b45151f219d7efeac952c567ddb9f86d0 path: /lib64/ld-linux-x86-64.so.2
infile: 596b81d3834f6f7f3aa888e09885505539c2f5ad path: /usr/lib64/crt1.o
infile: 232fd2c41d204d23899069fc89e6516aab57421b path: /usr/lib64/crti.o
infile: df02dffda2dc9c8a306829c31b540348165a3b92 path: /usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o
infile: 6c7744ecf42790fb8073d0e822eb0a2b9b7c39e7 path: /home/bomsh/src/hello.o
infile: e4af8bf4f89bdb8bb6a890d8a9f07dce5c638138 path: /usr/lib/gcc/x86_64-redhat-linux/8/crtend.o
infile: 3d5810339f0b219eb80dfa7cbd8883c3ef944351 path: /usr/lib64/crtn.o
build_cmd: /usr/bin/ld -plugin /usr/libexec/gcc/x86_64-redhat-linux/8/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper -plugin-opt=-fresolution=/tmp/ccPRtOFM.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o -L/usr/lib/gcc/x86_64-redhat-linux/8 -L/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/8/../../.. hello.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-redhat-linux/8/crtend.o /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crtn.o
==== End of raw info for this process


outfile: dfad3d1a11801f146a94b2ad50024945b82efef6 path: /home/bomsh/src/hello
infile: 6c7744ecf42790fb8073d0e822eb0a2b9b7c39e7 path: /home/bomsh/src/hello.o
build_cmd: gcc -o hello hello.o
==== End of raw info for this process

[root@000b478b5d68 src]#

Bomtrace2 also supports a [-w watched_programs_file] option so that only commands of a limited set of programs are recorded. Make sure this program list covers all the watched programs in the bomsh_hook2.py script.

A list of pre-exec-mode-only programs can also be provided in the same watched_programs_file, after the list of watched programs and separated from it by a line containing exactly "---". Make sure this pre-exec program list also covers all the pre-exec watched programs in the bomsh_hook2.py script.

An additional list of programs can also be detached immediately upon the execve syscall, for performance benefits. This detach list is provided in the same watched_programs_file after the list of watched or pre-exec programs, separated by a line containing exactly "===". Good candidates for this detach list are the configure command and the test commands of a software build.

Another list of programs can be configured as umbrella programs to improve performance. When configured, only the child processes of the umbrella programs are recorded and have the hook script run on them. This can improve performance in many scenarios: for example, in the Debian reproducible build we are only interested in the dpkg-buildpackage process, and in the mock build environment we are only interested in the rpmbuild process. This umbrella list is provided in the same watched_programs_file after the list of watched, pre-exec, or detach programs, separated by a line containing exactly "+++".

An example watched_programs file is provided in bin/bomtrace_watched_programs. Empty lines and lines starting with the '#' character are ignored, so you can add comments to your watched_programs file. Without the -w option, bomtrace2 behaves as before and records all commands by default.
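The section rules of the watched_programs file can be summarized with a small parser. This Python sketch is illustrative only: it is not bomtrace2's own C implementation, it assumes surrounding whitespace on separator lines can be ignored, and the section names (watched, pre_exec, detach, umbrella) and the program names in the usage example are hypothetical.

```python
# Separator line -> section that the following lines belong to
SEPARATORS = {"---": "pre_exec", "===": "detach", "+++": "umbrella"}


def parse_watched_programs(text: str) -> dict[str, list[str]]:
    sections: dict[str, list[str]] = {
        "watched": [], "pre_exec": [], "detach": [], "umbrella": [],
    }
    current = "watched"  # lines before any separator are watched programs
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are ignored
        if line in SEPARATORS:
            current = SEPARATORS[line]
        else:
            sections[current].append(line)
    return sections
```

Because each separator only switches the current section, a file can omit any of the optional lists, for example going straight from the watched list to a "+++" umbrella list.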

The bomsh_pstree.py script analyzes the strace logfile and creates various pstree JSON files. It also generates a new strace logfile with appropriate indentation. This can help you decide which programs to detach and which programs to configure as umbrella programs.

Here are the commands to generate OmniBOR docs with the -w option:

$ git clone URL-of-this-git-repo bomsh
$ cd bomsh
$ cp scripts/bomsh_hook2.py /tmp
$ cd src
$ ../bin/bomtrace2 -w ../bin/bomtrace_watched_programs make
$ ../scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b /tmp/bomdir
$ ls -tl /tmp/bomdir
$ cat /tmp/bomsh_createbom_jsonfile

You can customize the bomtrace_watched_programs file for your own software to further improve performance. The generated /tmp/bomsh_createbom_jsonfile should be the same as the old /tmp/bomsh_hook_jsonfile, which is used by the scripts/bomsh_search_cve.py script.

Bomtrace2 also has a -c option to read configuration from a config file. Five options are currently supported: hook_script_file, hook_script_cmdopt, shell_cmd_file, logfile, and syscalls. The hook_script_cmdopt parameter in particular makes it more convenient to run the hook script with different options. A sample Bomtrace config file is provided in bin/bomtrace.conf.

The bomsh_create_bom.py script takes care of OmniBOR doc creation when reading and processing the bomsh_hook_raw_logfile.sha1 file. It creates OmniBOR docs with the following algorithm:

  • if there are multiple input files, then a new OmniBOR doc is generated, and the 20-byte OmniBOR ID is embedded in the output file if necessary (by default for cc/ld output only);
  • if there is only one input file (unary transformation), then the same OmniBOR id of the input file is reused by the output file.
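The two rules above can be sketched in Python as follows. This is a simplified model, not the code of bomsh_create_bom.py, and it does not model the embedding step. Artifacts are identified by their SHA1 gitoids, bom_ids is a hypothetical dict from artifact gitoid to OmniBOR ID, and the ADG text layout (a "gitoid:blob:sha1" header followed by sorted "blob <gitoid>" lines, with " bom <id>" appended when an input has its own ID) is an assumption based on the OmniBOR specification. The 20-byte ID is the SHA1 gitoid of that document, printed as 40 hex characters.

```python
import hashlib


def gitoid_sha1(data: bytes) -> str:
    # SHA1 gitoid = git blob hash: SHA-1 over b"blob <size>\0" followed by the content
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def make_adg_doc(infiles: list[str], bom_ids: dict[str, str]) -> bytes:
    # assumed layout: header line, then one line per input gitoid, sorted
    lines = ["gitoid:blob:sha1"]
    for gid in sorted(infiles):
        bom = bom_ids.get(gid)
        lines.append(f"blob {gid} bom {bom}" if bom else f"blob {gid}")
    return ("\n".join(lines) + "\n").encode()


def assign_bom_id(outfile: str, infiles: list[str], bom_ids: dict[str, str]):
    """Apply the two rules; returns the output's OmniBOR ID or None."""
    if not infiles:
        return None
    if len(infiles) == 1:
        # unary transformation: reuse the input's OmniBOR ID, if it has one
        inherited = bom_ids.get(infiles[0])
        if inherited:
            bom_ids[outfile] = inherited
        return inherited
    # multiple inputs: a new ADG doc is created and its gitoid becomes the ID
    bom_ids[outfile] = gitoid_sha1(make_adg_doc(infiles, bom_ids))
    return bom_ids[outfile]
```

Reusing the input's ID for unary steps (for example a strip or copy) keeps the graph from growing a new document for transformations that add no new inputs.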

The -b option of the bomsh_create_bom.py script specifies the OmniBOR repo directory that stores all the OmniBOR docs and metadata. If the -b option is not specified, the $PWD/.omnibor directory is used as the default location. The OmniBOR docs are stored in the .omnibor/objects/ directory, while the bomsh tool's metadata is stored in the .omnibor/metadata/bomsh/ directory. The bomsh tool's metadata includes:

  • bomsh_hook_raw_logfile, the raw info recorded by bomsh_hook2.py script for software build
  • bomsh_omnibor_treedb, the hash-tree JSON format file with metadata, created by bomsh_create_bom.py script
  • bomsh_omnibor_doc_mapping, the file-githash to its OmniBOR doc ID mapping file, for all output files generated during software build

The --embed_bom_after_commands option of the bomsh_hook2.py script lets users select a list of commands after which an embedded .note.omnibor ELF section is automatically inserted into the compiled binary files during the software build. The embedding is transparent to the software build, so you don't need to modify your build Makefiles at all, and you can choose where in the build process this .note.omnibor section insertion happens. To find the appropriate place, inspect the generated bomsh_hook_raw_logfile or bomsh_hook_trace_logfile to see the shell commands and their execution order in the build process, and decide where the insertion fits best.

Note that this automatic .note.omnibor section embedding impacts performance, since it needs to build the hash-tree and generate OmniBOR docs during the build. In our experiments, the performance impact is less than 10%, and the runtime increase is linear in the number of bom-id-embedding operations.

The --lseek_lines_file option of the bomsh_create_bom.py script avoids reading/processing the bomsh_hook_raw_logfile lines that a previous run has already read/processed. This optimization is specifically for the above scenario of automatic .note.omnibor section embedding during RPM/Debian packaging; users should not need the --lseek_lines_file option in regular use.
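The idea behind --lseek_lines_file, skipping log lines that an earlier run already processed, is the general technique of resuming from a saved file offset. The Python sketch below illustrates that technique only; the real option's file format is not described here, so the state file holding a single byte offset and the read_new_lines name are assumptions.

```python
import os


def read_new_lines(logfile: str, state_file: str) -> list[str]:
    """Return complete lines appended to `logfile` since the offset saved in `state_file`."""
    offset = 0
    if os.path.exists(state_file):
        with open(state_file) as f:
            offset = int(f.read().strip() or 0)
    with open(logfile, "rb") as f:
        f.seek(offset)  # skip everything a previous run already consumed
        data = f.read()
    # consume only complete lines; a partially written last line waits for the next run
    end = data.rfind(b"\n") + 1
    with open(state_file, "w") as f:
        f.write(str(offset + end))
    return data[:end].decode().splitlines()
```

Saving a byte offset rather than a line count lets each run seek directly to the unread part of a growing logfile instead of re-reading it from the start.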

The bomsh_create_bom.py script also supports creating OmniBOR docs for RPM/DEB packages via its -p option. For example, here is the workflow for the hostname RPM package:

$ git clone URL-of-this-git-repo bomsh
$ cp bomsh/scripts/bomsh_hook2.py bomsh/scripts/bomsh_create_bom.py /tmp
$ dnf download hostname --source
$ rm -rf /tmp/bomdir /tmp/bomsh_hook_* /tmp/bomsh_createbom_*
$ bomsh/bin/bomtrace2 -w bomsh/bin/bomtrace_watched_programs rpmbuild --rebuild hostname-3.20-6.el8.src.rpm
$ bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -p /root/rpmbuild/RPMS/x86_64/hostname-3.20-6.el8.x86_64.rpm
$ cat /tmp/bomsh_hook_raw_logfile.sha1 /tmp/bomsh_createbom_jsonfile
$ echo "{}" > hostname_cvedb.json
$ bomsh/scripts/bomsh_search_cve.py -vv -b .omnibor -d hostname_cvedb.json -f /root/rpmbuild/RPMS/x86_64/hostname-3.20-6.el8.x86_64.rpm
$ cat /tmp/bomsh_search_jsonfile-details.json

Note that rpm2cpio and cpio are used to unbundle RPM packages, and dpkg-deb is used to unbundle DEB packages, so make sure they are installed.

With the latest bomtrace2 and bomsh_hook2.py script, automatic .note.omnibor section embedding into ELF binary files is enabled by default for compilers/linkers (cc/gcc/clang/ld, etc.) and the eu-strip (elfutils strip) program. eu-strip is known to strip the .note.omnibor ELF section while GNU strip does not, so the bom-id must be re-inserted after eu-strip runs.

To disable this automatic bom-id embedding, pass the -n option to the bomsh_hook2.py script, which requires running bomtrace2 with the "-c bomtrace.conf" option. With "-c bomtrace.conf", users can still choose to embed bom-ids into binary files at whichever build steps they prefer. For example, to build the hostname RPM package with an embedded .note.omnibor ELF section in the hostname binary at the last build step only, here is the workflow:

$ git clone URL-of-this-git-repo bomsh
$ cp bomsh/scripts/bomsh_hook2.py bomsh/scripts/bomsh_create_bom.py /tmp
$ dnf download hostname --source
$ sed -i "s|hook_script_cmdopt=-vv > |hook_script_cmdopt=-vv -n --embed_bom_after_commands /usr/lib/rpm/sepdebugcrcfix > |" bomsh/bin/bomtrace.conf
$ rm -rf /tmp/bomdir /tmp/bomsh_hook_* /tmp/bomsh_createbom_*
$ bomsh/bin/bomtrace2 -c bomsh/bin/bomtrace.conf -w bomsh/bin/bomtrace_watched_programs rpmbuild --rebuild hostname-3.20-6.el8.src.rpm
$ bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -p /root/rpmbuild/RPMS/x86_64/hostname-3.20-6.el8.x86_64.rpm
$ cat /tmp/bomsh_hook_raw_logfile.sha1 /tmp/bomsh_createbom_jsonfile
$ echo "{}" > hostname_cvedb.json
$ bomsh/scripts/bomsh_search_cve.py -vv -b .omnibor -d hostname_cvedb.json -f /root/rpmbuild/RPMS/x86_64/hostname-3.20-6.el8.x86_64.rpm
$ cat /tmp/bomsh_search_jsonfile-details.json

If you just want to capture all the build commands of your software build, follow similar steps with the "bomtrace2 -c bomtrace.conf make" command, then check the generated /tmp/bomsh_hook_trace_logfile for the list of recorded shell commands.

Generating OmniBOR ADGs for Debian or RPM Packages with Bomtrace2

Bomsh implements a symlink farm to persist the mapping from artifact ID to OmniBOR ADG (Artifact Dependency Graph) doc. For the output file of each build step, a symlink is created in the .omnibor/symlinks/ directory, named after the gitoid of the output file. This symlink points to the associated OmniBOR ADG doc in the .omnibor/objects/ directory. With this symlink farm, the A2G (Artifact-ID to OmniBOR Graph) mappings are persisted in the file system. Given a binary file, a user can compute its gitoid and then look up its associated ADG document in the .omnibor/symlinks/ directory, which makes ADG document lookup for artifacts much easier.
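A minimal Python sketch of that lookup: compute the SHA1 gitoid of a file (the git blob hash, i.e. SHA-1 over "blob <size>\0" followed by the file content) and resolve the matching symlink under .omnibor/symlinks/. The function names are hypothetical, and the sketch handles only the SHA1 flavor of gitoid.

```python
import hashlib
import os
import sys


def gitoid_sha1(path: str) -> str:
    """SHA1 gitoid of a file: the git blob hash of its content."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def find_adg(artifact: str, omnibor_dir: str = ".omnibor"):
    """Return the ADG doc path that the symlink farm maps this artifact to, or None."""
    link = os.path.join(omnibor_dir, "symlinks", gitoid_sha1(artifact))
    return os.path.realpath(link) if os.path.islink(link) else None


if __name__ == "__main__" and len(sys.argv) > 1:
    print(find_adg(sys.argv[1]) or "no ADG recorded for this artifact")
```

The same gitoid can be cross-checked with `git hash-object <file>`, which computes the git blob hash of a file.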

For Debian or RPM package builds, bomtrace2 also automatically creates a hello.deb.omnibor_adg.sha1, hello.deb.omnibor_adg.sha256, hello.rpm.omnibor_adg.sha1, or hello.rpm.omnibor_adg.sha256 symlink for each built hello.deb or hello.rpm package file, for user convenience. It also creates a symlink in the default .omnibor/pkgs/ directory pointing to the associated ADG (Artifact Dependency Graph) document. This helps users quickly find the OmniBOR bom-id of the built packages and makes publishing OmniBOR documents easy.

Package maintainers can create tarballs of the associated OmniBOR ADG documents and put them next to the Debian/RPM package in the same repo for easy access. Alternatively, a new field like Omnibor-Bomid can be added to the generated hostname_3.23_amd64.buildinfo file to help users find the associated OmniBOR ADG documents elsewhere.

Here is an example hostname_3.23_amd64.buildinfo file with the new proposed Omnibor-Bomid-Sha1 and Omnibor-Bomid-Sha256 fields added:

Omnibor-Bomid-Sha1:
 8be5ef9c8d4db58cdd0404814b1233cf696b0200 hostname-dbgsym_3.23_amd64.deb
 361aa0bb79fe277af3c299304057acebda0a58f5 hostname_3.23_amd64.deb
Omnibor-Bomid-Sha256:
 f5c5526c06000b7e0826416df537df1dcd2a0a2ae3499d5192ab250c2eb49f14 hostname-dbgsym_3.23_amd64.deb
 c241440dc405a3dd0e943cae09f4b706ee9cd955010d568ab38f2ec5446a1565 hostname_3.23_amd64.deb

If you specify the "-n" option so that Bomsh does not embed bom-ids, and build a reproducible Debian package, then you get OmniBOR ADG documents that other people can verify by reproducing the build. This gives us REPRODUCIBLE OmniBOR documents!

Therefore, it is feasible to generate the OmniBOR ADG documents for all the officially released build-reproducible Debian packages. An official repo to host the standard OmniBOR documents for build-reproducible packages will become possible for the software industry. This will be a great advantage for build-reproducible Debian packages.

Here is an example of the hostname Debian package build, reproduced with the debrebuild.py script. Two different people should be able to reproduce the same build and generate bit-for-bit identical hostname Debian packages, as well as the same OmniBOR documents.

root@60cb7fac1537:/home/repro-deb/myreprodir# rm -rf .omnibor  /tmp/bomsh_hook_*
root@60cb7fac1537:/home/repro-deb/myreprodir# /tmp/bomtrace2 -c /tmp/bomtrace.conf -w /tmp/bomtrace_watched_programs /home/repro-deb/debrebuild/debrebuild.py --output ./art3 --verbose --builder=mmdebstrap ../buildinfo-dir/hostname_3.23_amd64.buildinfo

I: creating tarball...
I: done
I: removing tempdir /tmp/mmdebstrap.lH261h4YdA...
I: success in 262.4173 seconds
md5: hostname-dbgsym_3.23_amd64.deb: OK
md5: hostname_3.23_amd64.deb: OK
sha1: hostname-dbgsym_3.23_amd64.deb: OK
sha1: hostname_3.23_amd64.deb: OK
sha256: hostname-dbgsym_3.23_amd64.deb: OK
sha256: hostname_3.23_amd64.deb: OK
Checksums: OK

root@60cb7fac1537:/home/repro-deb/myreprodir# ls -tl art3
-rw-------. 1 root root  1216 Oct 17 05:12 summary.out
-rw-r--r--. 1 1000 1000  1483 Oct 17 05:12 hostname_3.23_amd64.changes
-rw-r--r--. 1 1000 1000  4445 Oct 17 05:12 hostname_3.23_amd64.buildinfo
lrwxrwxrwx. 1 root root    95 Oct 17 05:12 hostname-dbgsym_3.23_amd64.deb.omnibor_adg -> ../../../../home/repro-deb/myreprodir/.omnibor/objects/8b/e5ef9c8d4db58cdd0404814b1233cf696b0200
-rw-r--r--. 1 1000 1000 13312 Oct 17 05:12 hostname-dbgsym_3.23_amd64.deb
lrwxrwxrwx. 1 root root    95 Oct 17 05:12 hostname_3.23_amd64.deb.omnibor_adg -> ../../../../home/repro-deb/myreprodir/.omnibor/objects/36/1aa0bb79fe277af3c299304057acebda0a58f5
-rw-r--r--. 1 1000 1000 14948 Oct 17 05:12 hostname_3.23_amd64.deb
drwxr-xr-x. 3 1000 1000  4096 Oct 17 05:12 hostname-3.23
root@60cb7fac1537:/home/repro-deb/myreprodir# ls -tl .omnibor/pkgs
lrwxrwxrwx. 1 root root 52 Oct 17 05:12 hostname-dbgsym_3.23_amd64.deb.omnibor_adg -> ../objects/8b/e5ef9c8d4db58cdd0404814b1233cf696b0200
lrwxrwxrwx. 1 root root 52 Oct 17 05:12 hostname_3.23_amd64.deb.omnibor_adg -> ../objects/36/1aa0bb79fe277af3c299304057acebda0a58f5
root@60cb7fac1537:/home/repro-deb/myreprodir# ls -tl .omnibor/*/*
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/pkgs/hostname-dbgsym_3.23_amd64.deb.omnibor_adg -> ../objects/8b/e5ef9c8d4db58cdd0404814b1233cf696b0200
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/90d3e30024df8a52566b1edef657256b5fb5e49e -> ../objects/8b/e5ef9c8d4db58cdd0404814b1233cf696b0200
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/pkgs/hostname_3.23_amd64.deb.omnibor_adg -> ../objects/36/1aa0bb79fe277af3c299304057acebda0a58f5
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/cf2f6247f73a4b13c3363c31a003b7bcb610af86 -> ../objects/36/1aa0bb79fe277af3c299304057acebda0a58f5
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/cd6933dd663ca099e4c7758bdb22b5cbcd5478d4 -> ../objects/6b/d38ff5092af9f0cac209f6f1466c4376694de1
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/8cf556f1bc2b80d9b14324b6bd3a102b50761248 -> ../objects/6b/d38ff5092af9f0cac209f6f1466c4376694de1
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/6f41ea69396ab9a32077d42f0e0e6864b04922b2 -> ../objects/6b/d38ff5092af9f0cac209f6f1466c4376694de1
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/ef79f940d64dd1e9677d6cb8b01d09caa9e00e31 -> ../objects/6b/d38ff5092af9f0cac209f6f1466c4376694de1
lrwxrwxrwx. 1 root root   52 Oct 17 05:12 .omnibor/symlinks/6581e00f013b687e9a457f6375b65ff19c276b95 -> ../objects/2a/709f0c4f59298557b2ae4fbfb703aab5ccb1e2

.omnibor/objects/8b:
-rw-------. 1 root root 183 Oct 17 05:12 e5ef9c8d4db58cdd0404814b1233cf696b0200

.omnibor/objects/36:
-rw-------. 1 root root 321 Oct 17 05:12 1aa0bb79fe277af3c299304057acebda0a58f5

.omnibor/objects/6b:
-rw-------. 1 root root 321 Oct 17 05:12 d38ff5092af9f0cac209f6f1466c4376694de1

.omnibor/objects/2a:
-rw-------. 1 root root 5152 Oct 17 05:12 709f0c4f59298557b2ae4fbfb703aab5ccb1e2
root@60cb7fac1537:/home/repro-deb/myreprodir#

Here is an example of the sysstat RPM package build, with the mock build tool:

[root@e8281323a4d6 rpm-src-dir]# rm -rf .omnibor/  /tmp/bomsh_hook_*
[root@e8281323a4d6 rpm-src-dir]# /tmp/bomtrace2 -c /tmp/bomtrace.conf -w /tmp/bomtrace_watched_programs mock -r /etc/mock/almalinux-8-x86_64.cfg rebuild sysstat-11.7.3-6.el8.src.rpm

Wrote: /builddir/build/RPMS/sysstat-11.7.3-6.el8.x86_64.rpm
Wrote: /builddir/build/RPMS/sysstat-debugsource-11.7.3-6.el8.x86_64.rpm
Wrote: /builddir/build/RPMS/sysstat-debuginfo-11.7.3-6.el8.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.17qvvY
+ umask 022
+ cd /builddir/build/BUILD
+ cd sysstat-11.7.3
+ /usr/bin/rm -rf /builddir/build/BUILDROOT/sysstat-11.7.3-6.el8.x86_64
+ exit 0
Finish: rpmbuild sysstat-11.7.3-6.el8.src.rpm
Finish: build phase for sysstat-11.7.3-6.el8.src.rpm
INFO: Done(sysstat-11.7.3-6.el8.src.rpm) Config(almalinux-8-x86_64) 1 minutes 11 seconds
INFO: Results and/or logs in: /var/lib/mock/almalinux-8-x86_64/result
Finish: run

[root@e8281323a4d6 rpm-src-dir]# ls -tl .omnibor/pkgs/
lrwxrwxrwx. 1 root root 52 Oct 17 05:28 sysstat-debuginfo-11.7.3-6.el8.x86_64.rpm.omnibor_adg -> ../objects/da/5f9988fd02603ed0f24ca11f011f7601acb266
lrwxrwxrwx. 1 root root 52 Oct 17 05:28 sysstat-11.7.3-6.el8.x86_64.rpm.omnibor_adg -> ../objects/9e/9267a9d1902d7b34d5f3c56be592d92bb057e7
lrwxrwxrwx. 1 root root 52 Oct 17 05:28 sysstat-debugsource-11.7.3-6.el8.x86_64.rpm.omnibor_adg -> ../objects/7d/acbbfd2022c5322059fcca00c7e462ce38ba3c
lrwxrwxrwx. 1 root root 52 Oct 17 05:27 sysstat-11.7.3-6.el8.src.rpm.omnibor_adg -> ../objects/59/13a0db7e8d5ee18b00f24603f45844537f7d7a
[root@e8281323a4d6 rpm-src-dir]# ls -tl /var/lib/mock/almalinux-8-x86_64/root/builddir/build/RPMS/
lrwxrwxrwx. 1 root root     98 Oct 17 05:28 sysstat-debuginfo-11.7.3-6.el8.x86_64.rpm.omnibor_adg -> ../../../../../../../../home/rpm-src-dir/.omnibor/objects/da/5f9988fd02603ed0f24ca11f011f7601acb266
lrwxrwxrwx. 1 root root     98 Oct 17 05:28 sysstat-11.7.3-6.el8.x86_64.rpm.omnibor_adg -> ../../../../../../../../home/rpm-src-dir/.omnibor/objects/9e/9267a9d1902d7b34d5f3c56be592d92bb057e7
lrwxrwxrwx. 1 root root     98 Oct 17 05:28 sysstat-debugsource-11.7.3-6.el8.x86_64.rpm.omnibor_adg -> ../../../../../../../../home/rpm-src-dir/.omnibor/objects/7d/acbbfd2022c5322059fcca00c7e462ce38ba3c
-rw-r--r--. 1 root mock 529732 Oct 17 05:28 sysstat-debuginfo-11.7.3-6.el8.x86_64.rpm
-rw-r--r--. 1 root mock 217384 Oct 17 05:28 sysstat-debugsource-11.7.3-6.el8.x86_64.rpm
-rw-r--r--. 1 root mock 434208 Oct 17 05:28 sysstat-11.7.3-6.el8.x86_64.rpm
[root@e8281323a4d6 rpm-src-dir]#

Generating SPDX Docs

Bomsh can generate SPDX SBOM documents for built Debian/RPM packages. When you rebuild Debian/RPM packages with the bomsh_rebuild_deb.py or bomsh_rebuild_rpm.py script, add the --bomsh_spdx option to generate the SPDX documents.

For the bomsh_rebuild_deb.py script, you can also use the --deb_build_script option to specify a script file that builds the Debian packages, without the -f or --buildinfo_file option. This script must copy the built Debian packages and the source tarball files to the expected location, because the bomsh_spdx_deb.py script later reads them from there to generate the SPDX documents. An example bomsh-openosc-deb.sh script is provided to illustrate such a Debian build script.

$ git clone URL-of-this-git-repo bomsh
$ wget https://vault.centos.org/8-stream/AppStream/Source/SPackages/sysstat-11.7.3-7.el8.src.rpm
$ bomsh/scripts/bomsh_rebuild_rpm.py -c alma+epel-8-x86_64 --docker_image_base almalinux:8 -s sysstat-11.7.3-7.el8.src.rpm -d bomsh/scripts/sample_sysstat_cvedb.json -o outdir --syft_sbom --bomsh_spdx --mock_option="--no-bootstrap-image --define 'packager BOMSH user $(id -un) at $(hostname)'"
$ grep -B1 -A3 CVElist outdir/bomsher_out/bomsh_logfiles/bomsh_search_jsonfile-details.json
$
$ # the above should take only a few minutes, and the below may take tens of minutes
$ wget https://buildinfos.debian.net/buildinfo-pool/s/sysstat/sysstat_11.7.3-1_all-amd64-source.buildinfo
$ bomsh/scripts/bomsh_rebuild_deb.py -f sysstat_11.7.3-1_all-amd64-source.buildinfo -d bomsh/scripts/sample_sysstat_cvedb.json -o outdir2 --syft_sbom --bomsh_spdx --mmdebstrap_no_cleanup
$ grep -B1 -A3 CVElist outdir2/bomsher_out/bomsh_logfiles/bomsh_search_jsonfile-details.json
$
$ # specify a Debian build script to build Debian packages
$ bomsh/scripts/bomsh_rebuild_deb.py --deb_build_script bomsh/scripts/bomsh-openosc-deb.sh -o openosc-outdir --syft_sbom --bomsh_spdx
$ ls -tl openosc-outdir/bomsher_out/bomsh_sbom/

Currently, only SPDX v2.3 SBOM documents are supported.

Bomsh can also generate SPDX documents for generic images, such as ISO/OVA image files. This usually requires an image unbundler or unpacking tool to unpack the image into individual binary files. The bomsh_spdx_image.py script was created for this purpose. This feature is intended for advanced users only.
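
To sanity-check the generated SBOMs, the sketch below lists each package in an SPDX 2.3 document. It assumes the files in bomsh_sbom use the SPDX JSON serialization and reads only standard SPDX 2.3 fields (packages, name, versionInfo, externalRefs). SPDX 2.3 defines a gitoid external reference type in the PERSISTENT-ID category, but whether bomsh fills it in is an assumption, so that list may be empty.

```python
import json
from typing import List, Tuple


def summarize_spdx(doc: dict) -> List[Tuple[str, str, List[str]]]:
    """Return (name, versionInfo, gitoid locators) for each package in an SPDX 2.3 JSON document."""
    rows = []
    for pkg in doc.get("packages", []):
        # Keep only external references whose type is "gitoid" (assumed to be present).
        gitoids = [
            ref["referenceLocator"]
            for ref in pkg.get("externalRefs", [])
            if ref.get("referenceType") == "gitoid"
        ]
        rows.append((pkg.get("name", ""), pkg.get("versionInfo", ""), gitoids))
    return rows


def summarize_spdx_file(path: str) -> List[Tuple[str, str, List[str]]]:
    """Load an SPDX JSON file (the file name under bomsh_sbom/ is not given here) and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_spdx(json.load(f))
```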

Reducing Storage of Generated OmniBOR Docs

Bomsh usually generates more OmniBOR documents than the final delivered products need.

There are many reasons for the extra OmniBOR docs: the ./configure script, testing code, build tool preparation, etc. Neither the bomsh tool nor the gcc/clang compilers can distinguish these compilations from the normal software build compilations.

To solve this issue, the "--copyout_bomdir" option has been added to the bomsh_search_cve.py script. When this option is specified, the script builds the hash tree for your queried binary files and copies all the necessary/relevant OmniBOR documents to the new copyout_bomdir, leaving the unnecessary/irrelevant OmniBOR docs behind. This way, a truncated set of OmniBOR documents is created in the new copyout_bomdir. If you do not specify any binary file, all the OmniBOR files (of either sha1 or sha256, selected via the --hashtype option) are copied. Experiments with the Ubuntu kernel build found that this option can reduce the storage size of OmniBOR documents by a factor of 5 to 6. Therefore, it is suggested to run this command before saving the OmniBOR documents to your repository.

Another optimization is to remove the separator strings, such as the "blob " and "bom " strings and the first line ("gitoid:blob:sha1" or "gitoid:blob:sha256"), from all OmniBOR docs before compressing and storing them, and to re-insert them after uncompressing to recover the OmniBOR docs. The "--remove_sepstrs_in_doc" option of the bomsh_search_cve.py script does the removal, and the "--insert_sepstrs_in_doc" option re-inserts them.

Here is an example for Ubuntu kernel build:

$ # the below command creates more OmniBOR docs than necessary in the default .omnibor directory:
$ ./bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -p linux-image-unsigned-5.11.0-1028-aws_5.11.0-1028.31~20.04.1_amd64.deb
$ # the below command creates a truncated set of OmniBOR docs in the new truncate-bomdir:
$ ./bomsh_search_cve.py --bom_dir .omnibor -f linux-image-unsigned-5.11.0-1028-aws_5.11.0-1028.31~20.04.1_amd64.deb --copyout_bomdir truncate-bomdir
$ # the below command copies out and removes the separator strings of OmniBOR docs in the new remove-bomdir:
$ ./bomsh_search_cve.py --bom_dir truncate-bomdir --copyout_bomdir remove-bomdir --remove_sepstrs_in_doc
$ # the below command recovers the truncated OmniBOR docs in the new recover-bomdir:
$ ./bomsh_search_cve.py --bom_dir remove-bomdir --copyout_bomdir recover-bomdir --insert_sepstrs_in_doc
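
A minimal sketch of the remove/insert round trip follows. The compact format bomsh actually writes is not described here, so this assumes a document consists of a "gitoid:blob:<hashtype>" header followed by one "blob <id>" or "blob <id> bom <id>" line per input, and that the compact form keeps only the IDs. The header can be dropped safely because the hash type is recoverable from the ID length (40 hex characters for SHA-1, 64 for SHA-256).

```python
def remove_sepstrs(doc: str) -> str:
    """Drop the header line and the 'blob '/' bom ' separators, keeping only the IDs."""
    out = []
    for line in doc.splitlines()[1:]:  # line 0 is the assumed "gitoid:blob:<hashtype>" header
        out.append(line.removeprefix("blob ").replace(" bom ", " "))
    return "\n".join(out) + "\n"


def insert_sepstrs(compact: str) -> str:
    """Rebuild the full document from the compact form produced by remove_sepstrs()."""
    lines = compact.splitlines()
    # Infer the hash type from the first ID's length: 64 hex chars = SHA-256, otherwise SHA-1.
    hashtype = "sha256" if lines and len(lines[0].split()[0]) == 64 else "sha1"
    out = ["gitoid:blob:" + hashtype]
    for line in lines:
        parts = line.split()
        if len(parts) == 1:
            out.append(f"blob {parts[0]}")
        else:
            out.append(f"blob {parts[0]} bom {parts[1]}")
    return "\n".join(out) + "\n"
```

Because the separators are identical in every document, they add bytes without adding information, which is why dropping them helps before compression.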

To further reduce the storage size, we can sacrifice the OmniBOR tree structure by collapsing all subtrees into a single level and removing duplicate artifact IDs. Because many common .h header files are repeated in the OmniBOR docs of different .o files, this removes many duplicates and reduces the storage size significantly. Most applications, such as CVE search, don't need the full OmniBOR tree structure, so subtree collapsing still works for CVE search while reducing the storage size. And when we don't want to expose the OmniBOR tree structure, a flat list of artifact IDs can still satisfy the SBOM requirement while using much less storage.

A new option "--subtree_collapsed_bomdir" has been added to the bomsh_search_cve.py script to implement this feature. The "--remove_sepstrs_in_doc" and "--insert_sepstrs_in_doc" options can work well with this option.

Here is an example for Ubuntu kernel build:

$ # the below command creates more OmniBOR docs than necessary in the default .omnibor directory:
$ ./bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -p linux-image-unsigned-5.11.0-1028-aws_5.11.0-1028.31~20.04.1_amd64.deb
$ # the below command creates a subtree-collapsed set of OmniBOR docs, with separator strings removed, in the new collapse-bomdir:
$ ./bomsh_search_cve.py --bom_dir .omnibor -f linux-image-unsigned-5.11.0-1028-aws_5.11.0-1028.31~20.04.1_amd64.deb --subtree_collapsed_bomdir collapse-bomdir --remove_sepstrs_in_doc
$ # the below command recovers the newly created subtree-collapsed OmniBOR docs in the new recover-bomdir:
$ ./bomsh_search_cve.py --bom_dir collapse-bomdir --copyout_bomdir recover-bomdir --insert_sepstrs_in_doc
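
Conceptually, collapsing a tree is a graph traversal that gathers every artifact ID reachable from the root into a set. The sketch below models the tree as a hypothetical in-memory dict (bom-id to a list of (artifact ID, child bom-id or None) pairs). It is not how bomsh_search_cve.py stores or reads OmniBOR docs, and readable labels replace real hex IDs in the test.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory tree: bom_id -> [(artifact_id, child_bom_id or None), ...]
Tree = Dict[str, List[Tuple[str, Optional[str]]]]


def collapse_subtrees(root_bom_id: str, docs: Tree) -> List[str]:
    """Flatten every subtree under root_bom_id into one sorted, de-duplicated artifact-ID list."""
    seen_boms = set()
    artifacts = set()
    stack = [root_bom_id]
    while stack:
        bom_id = stack.pop()
        if bom_id in seen_boms:
            continue  # a shared subtree is visited only once
        seen_boms.add(bom_id)
        for artifact_id, child_bom_id in docs.get(bom_id, []):
            artifacts.add(artifact_id)  # the set drops repeated IDs such as common .h headers
            if child_bom_id is not None:
                stack.append(child_bom_id)
    return sorted(artifacts)
```

In the test tree, common.h appears under both .o files but only once in the collapsed list.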

Manipulating OmniBOR Artifact Tree with Grafting and Pruning

Sometimes it is necessary to manipulate OmniBOR manifest documents to create a new artifact tree. For example, we may want to hide the details of a proprietary library. Or the OmniBOR details of a third-party library may only become available after the software build, and we want to amend the OmniBOR artifact tree with them. This requires manipulating OmniBOR documents to prune existing subtrees or graft new subtrees. The bomsh_art_tree.py script was created for this purpose.

Here is an example of grafting and pruning OmniBOR artifact trees:

$ git clone URL-of-this-git-repo bomsh
$ wget http://vault.centos.org/8-stream/AppStream/Source/SPackages/sysstat-11.7.3-9.el8.src.rpm
$ bomsh/scripts/bomsh_rebuild_rpm.py -c alma+epel-8-x86_64 --docker_image_base almalinux:8 -s sysstat-11.7.3-9.el8.src.rpm -d bomsh/scripts/sample_sysstat_cvedb.json -o outdir --syft_sbom --mock_option="--no-bootstrap-image --define 'packager BOMSH user $(id -un) at $(hostname)'"
$ cp outdir/bomsher_out/bomsh_logfiles/bomtrace2 /tmp ; cp bomsh/scripts/*.py /tmp
$ cd bomsh/src ; make -f Makefile.libd1 clean ; make -f Makefile.libd1 libd1.so
$ rm -rf .omnibor /tmp/bomsh_hook_* omnibor_libs1 ; make -f Makefile.libs1 clean ; /tmp/bomtrace2 make -f Makefile.libs1 libs1.a ; /tmp/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b omnibor_libs1
$ rm -rf .omnibor /tmp/bomsh_hook_* omnibor_hello2 ; make clean ; /tmp/bomtrace2 make -f Makefile hello2 ; /tmp/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b omnibor_hello2
$ # You will see the embedded OmniBOR bom-id with the below readelf command
$ readelf -x .note.omnibor hello2
$ # Now you should NOT see the OmniBOR subtree for libs1.a library when searching for hello2 binary
$ /tmp/bomsh_search_cve.py -vvv -b omnibor_hello2 -f hello2 ; cat /tmp/bomsh_search_jsonfile-details.json
$ ### Grafting the artifact subtree of libs1.a library to the main tree of hello2
$ /tmp/bomsh_art_tree.py -B omnibor_libs1,omnibor_hello2 -O new_omnibor_hello2 -f hello2
$ # After grafting, you should see the OmniBOR subtree for libs1.a library when searching for hello2 binary
$ /tmp/bomsh_search_cve.py -vvv -b new_omnibor_hello2 -f hello2 ; cat /tmp/bomsh_search_jsonfile-details.json
$ # Also the embedded OmniBOR bom-id is changed accordingly. Note a new new_omnibor_hello2/artifacts/hello2 with changed bom-id is created by default
$ readelf -x .note.omnibor hello2 new_omnibor_hello2/artifacts/hello2
$ # If you want to change hello2 directly, then you can add --change_in_place option
$ rm -rf new_omnibor_hello2 ; /tmp/bomsh_art_tree.py -B omnibor_libs1,omnibor_hello2 -O new_omnibor_hello2 -f hello2 --change_in_place
$ readelf -x .note.omnibor hello2
$ ### Pruning the artifact subtree of libs1.a library from the main tree of hello2
$ /tmp/bomsh_art_tree.py -B new_omnibor_hello2 -O prune_omnibor_hello2 --prune_gitoids $(git hash-object libs1.a) -f hello2
$ # After pruning, you should NOT see the OmniBOR subtree for libs1.a library when searching for hello2 binary
$ /tmp/bomsh_search_cve.py -vvv -b prune_omnibor_hello2/new_omnibor_hello2 -f hello2 ; cat /tmp/bomsh_search_jsonfile-details.json
$ # Also the embedded OmniBOR bom-id is changed accordingly. Again note a new hello2 with changed bom-id is created by default
$ readelf -x .note.omnibor hello2 prune_omnibor_hello2/new_omnibor_hello2/artifacts/hello2
$ # If you want to change hello2 directly, then you can add --change_in_place option
$ rm -rf prune_omnibor_hello2 ; /tmp/bomsh_art_tree.py -B new_omnibor_hello2 -O prune_omnibor_hello2 --prune_gitoids $(git hash-object libs1.a) -f hello2 --change_in_place
$ readelf -x .note.omnibor hello2

The above is just a very simple example to demonstrate the use of the bomsh_art_tree.py script. It has other options that let users customize its behavior; run "bomsh_art_tree.py --help" to find out more.

As a result, multiple OmniBOR bom-ids can be associated with a single artifact from a single software build instance. For example, internal development engineers can have a bom-id that includes all the subtrees of your company's proprietary software, while external customers only get a bom-id without those subtrees.

Most importantly, with Bomsh scripts, users do not need to rebuild software to get a different OmniBOR artifact tree; they can easily connect multiple OmniBOR artifact trees to form a new one. These post-processing scripts significantly increase the flexibility for software vendors.
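
The embedded bom-id changes after grafting or pruning because a bom-id is itself the gitoid of an OmniBOR document: adding or removing one input line changes the document bytes and therefore its ID. The sketch below shows this for a single level only. It assumes a document layout of a "gitoid:blob:sha1" header plus sorted "blob <id>" / "blob <id> bom <id>" lines. A real tool such as bomsh_art_tree.py would also have to propagate the new ID to every parent document, which is not shown. The prune_ids parameter plays the role of the --prune_gitoids values computed with git hash-object above.

```python
import hashlib
from typing import Iterable, List, Optional, Set, Tuple

Entry = Tuple[str, Optional[str]]  # (artifact gitoid, bom-id of its own doc or None)


def gitoid_sha1(data: bytes) -> str:
    """SHA-1 git blob ID: the value `git hash-object` prints for a file with this content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def make_doc(entries: Iterable[Entry]) -> bytes:
    """Serialize entries in the assumed OmniBOR doc layout: header, then one sorted line per input."""
    lines = ["gitoid:blob:sha1"]
    for artifact_id, bom_id in sorted(entries, key=lambda e: e[0]):
        lines.append(f"blob {artifact_id}" + (f" bom {bom_id}" if bom_id else ""))
    return ("\n".join(lines) + "\n").encode()


def prune(entries: List[Entry], prune_ids: Set[str]) -> Tuple[List[Entry], str]:
    """Drop inputs whose gitoid is in prune_ids; return the kept inputs and the new bom-id."""
    kept = [e for e in entries if e[0] not in prune_ids]
    return kept, gitoid_sha1(make_doc(kept))
```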

Creating Index Database for Debian Source Packages

It is very useful to have a metadata database for all the source files. The bomsh_index_debrepo.py script was created for this purpose: it creates a blob-indexing database for Debian/Ubuntu source packages.

Here is an example of creating the database for some Debian/Ubuntu official releases:

$ git clone URL-of-this-git-repo bomsh
$ bomsh/scripts/bomsh_index_debrepo.py -d mydir -r bullseye,bookworm,focal,jammy --skip_download_if_exist -vv --first_n_packages 3
$ cat /tmp/bomsh-index-pkg-db.json /tmp/bomsh-index-db.json /tmp/bomsh-index-summary.json
$ bomsh/scripts/bomsh_index_debrepo.py -d mydir -r focal-updates,jammy-security,focal,jammy --first_n_packages 0 -m http://archive.ubuntu.com/ubuntu
$ cat /tmp/bomsh-index-summary.json

The bomsh-index-pkg-db.json file contains the { "package_name package_version" => list of blobs } mappings, the bomsh-index-db.json file contains the { blob => list of packages } mappings, and the bomsh-index-summary.json file contains summary information as { deb_release => summary stats of num_packages, total_size } mappings. The full index database would be huge, so the above example indexes only the first 3 packages in each release. To get only the bomsh-index-summary.json result, specify the "--first_n_packages 0" option, which does not download or process any source package.

A few similar scripts are also provided: bomsh_index_yocto.py creates an index database for Yocto projects, and bomsh_index_ws.py creates an index database for RPM/Debian build workspaces.
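
The two mapping files are inverses of each other, so either can be derived from the other. A minimal sketch follows. It assumes bomsh-index-pkg-db.json is a JSON object whose values are arrays of blob IDs, as described above, and it is not the script's own code.

```python
import json
from collections import defaultdict
from typing import Dict, List


def invert_pkg_db(pkg_db: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {"package_name package_version": [blob, ...]} into {blob: [package, ...]}."""
    blob_db = defaultdict(set)
    for package, blobs in pkg_db.items():
        for blob in blobs:
            blob_db[blob].add(package)  # a set avoids listing a package twice for one blob
    return {blob: sorted(packages) for blob, packages in blob_db.items()}


def load_and_invert(path: str = "/tmp/bomsh-index-pkg-db.json") -> Dict[str, List[str]]:
    """Load the package-to-blobs index and return the blob-to-packages view."""
    with open(path, encoding="utf-8") as f:
        return invert_pkg_db(json.load(f))
```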

Creating Runtime Dependency Tree for ELF Binaries

A new bomsh_dynlib.py script has been developed to create runtime dependency trees for Linux ELF executables and dynamic libraries. It uses the readelf program to read the dynamic section of the ELF binary files, creates artifact dependency fragments (ADFs) for all relevant libraries, and saves them in a raw_logfile. This raw_logfile is then fed to other Bomsh scripts to create the OmniBOR database and artifact dependency graph (ADG) trees.
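
The readelf step can be sketched as follows. This is an illustration, not bomsh_dynlib.py's implementation: it runs readelf -d and extracts the DT_NEEDED library names from lines such as "(NEEDED) Shared library: [libc.so.6]". Mapping those names to actual files (via RUNPATH/RPATH, the dynamic linker cache, and default directories) is left out.

```python
import re
import subprocess
from typing import List

# Matches readelf -d lines like: 0x...01 (NEEDED)   Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def parse_needed(readelf_output: str) -> List[str]:
    """Extract DT_NEEDED library names, in order, from `readelf -d` text output."""
    return NEEDED_RE.findall(readelf_output)


def needed_libs(path: str) -> List[str]:
    """Run `readelf -d` on an ELF file and return its DT_NEEDED entries."""
    result = subprocess.run(["readelf", "-d", path], capture_output=True, text=True, check=True)
    return parse_needed(result.stdout)
```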

Here is an example of creating OmniBOR artifact dependency trees for some Linux binaries:

$ git clone URL-of-this-git-repo bomsh
$ bomsh/scripts/bomsh_dynlib.py -f /usr/sbin/sshd,/usr/bin/sha1sum --hashtype sha1,sha256
$ bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_dynlib_raw_logfile.sha1 -b omnibor_dir
$ bomsh/scripts/bomsh_search_cve.py -vvv -b omnibor_dir -f /usr/sbin/sshd,/usr/bin/sha1sum
$ ls -tl /tmp/bomsh_search_jsonfile* ; cat /tmp/bomsh_search_jsonfile-details.json

The bomsh_dynlib.py script also has a -d option, which specifies a directory; the script then analyzes all ELF executables/libraries in that directory. It can generate a snapshot identifier for your Linux system/container, or for a subset of ELF executables, so that you can compare multiple runtime environments or provide customers with a reference snapshot ID. The generated OmniBOR bom-id works like the SHA1 checksum of a downloaded ISO image, except that it identifies a tree of relevant ELF binary files. For example, if you collect the sshd OmniBOR ID from a customer's system, you can check whether you have the same runtime environment for the SSH daemon as your customer.

Creating Runtime Dependency Tree for Python Scripts

A new bomsh_pylib.py script has been developed to create runtime dependency trees for Python scripts. It performs static analysis of the modules imported by each Python .py script, creates artifact dependency fragments (ADFs) for all the Python scripts, and saves them in a raw_logfile. This raw_logfile is then fed to other Bomsh scripts to create the OmniBOR database and artifact dependency graph (ADG) trees.
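
A minimal version of this kind of static analysis, using Python's standard ast module, is sketched below. It is not bomsh_pylib.py's actual code. It collects absolute module names from "import" and "from ... import" statements and skips relative imports. Resolving each name to a file (for example with importlib.util.find_spec) would be the next step before hashing.

```python
import ast
from typing import Set


def imported_modules(source: str) -> Set[str]:
    """Return the absolute module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 means a relative import such as "from . import x"; skipped here
            modules.add(node.module)
    return modules
```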

Here is an example of creating OmniBOR artifact dependency trees for some Python scripts:

$ git clone URL-of-this-git-repo bomsh
$ bomsh/scripts/bomsh_pylib.py -f bomsh/scripts/bomsh_hook2.py,bomsh/scripts/bomsh_pstree.py --hashtype sha1,sha256
$ bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_pylib_raw_logfile.sha256 -b omnibor_dir --hashtype sha256
$ bomsh/scripts/bomsh_search_cve.py -vvv -b omnibor_dir -f bomsh/scripts/bomsh_hook2.py,bomsh/scripts/bomsh_pstree.py --hashtype sha256
$ ls -tl /tmp/bomsh_search_jsonfile* ; cat /tmp/bomsh_search_jsonfile-details.json

The bomsh_pylib.py script also has a -d option, which specifies a directory; the script then analyzes all *.py files in that directory. As with ELF binaries, you can generate a snapshot identifier for your set of Python scripts, so that you can compare multiple Python runtime environments or provide customers with a reference snapshot ID.

Creating CVE Database for Software

The OmniBOR artifact tree created by bomsh lays the foundation for more useful applications, such as CVE search for software. It is very important to create an accurate CVE database for your software. We will take OpenSSL as an example, since it is critical security software on Linux.

To create an accurate CVE database for OpenSSL, we propose using YAML files to tag the git commits that introduce or fix each CVE. Here are some example YAML files for OpenSSL:

[yonhan@rtp-gpu-02 cveinfo_dir]$ more cveinfo.731f431.yaml
Fixed:
 CVE-2014-0160:
  src_files:
   - ssl/d1_both.c
   - ssl/t1_lib.c
[yonhan@rtp-gpu-02 cveinfo_dir]$ more cveinfo.4817504.yaml
Added:
 CVE-2014-0160:
  src_files:
   - ssl/d1_both.c
   - ssl/t1_lib.c
[yonhan@rtp-gpu-02 cveinfo_dir]$
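
To show how these tags fit together, the sketch below groups the commits that add or fix each CVE. It takes the YAML content already parsed into dicts (for example by a YAML library such as PyYAML, which is not in the standard library). The commit is inferred from the cveinfo.<commit>.yaml naming convention seen in the example file names. This is not what bomsh_create_cve.py does internally; the script goes further and walks git history to turn these commits into vulnerable blob ranges.

```python
import re
from collections import defaultdict
from typing import Dict

COMMIT_RE = re.compile(r"^cveinfo\.([0-9a-f]+)\.yaml$")


def collect_cve_commits(parsed: Dict[str, dict]) -> Dict[str, dict]:
    """Map each CVE to the commits that added/fixed it, from {file name: parsed YAML}."""
    cves = defaultdict(lambda: {"Added": [], "Fixed": [], "src_files": []})
    for filename, info in parsed.items():
        match = COMMIT_RE.match(filename)
        if not match:
            continue  # not a cveinfo.<commit>.yaml file
        commit = match.group(1)
        for action in ("Added", "Fixed"):
            for cve, details in (info.get(action) or {}).items():
                entry = cves[cve]
                entry[action].append(commit)
                for path in details.get("src_files", []):
                    if path not in entry["src_files"]:
                        entry["src_files"].append(path)
    return dict(cves)
```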

When you put all such cveinfo.*.yaml files into a directory cveinfo_dir, you can run the below command to generate the CVE database for your software:

../bomsh/scripts/bomsh_create_cve.py --use_git_tags --cveinfo_dir cveinfo_dir -j openssl_bomsh_created_cvedb.json

The created CVE database file is the openssl_bomsh_created_cvedb.json file, which is used by the bomsh_search_cve.py script to search CVEs for binaries.

To cover more blobs that are not covered by the CVE-add/CVE-fix commits in the git repo, we propose the following YAML format for CVE checking rules:

The following check goes in the cveadd file for CVE-add:

CVE-2020-1967:
 ssl/t1_lib.c:
  include:
   - "if (sig_nid == sigalg->sigandhash)"
   - "? tls1_lookup_sigalg(s->s3.tmp.peer_cert_sigalgs[i])"
  exclude:
   - "if (sigalg != NULL && sig_nid == sigalg->sigandhash)"

The following check goes in the cvefix file for CVE-fix:

CVE-2020-1967:
 ssl/t1_lib.c:
  include:
   - "if (sigalg != NULL && sig_nid == sigalg->sigandhash)"
  exclude:
   - "if (sig_nid == sigalg->sigandhash)"

The above CVE checking rules are checked against all the CVE-relevant source files in your git repo. If you put the cveadd and cvefix files in the cvecheck directory, run the below command to generate a more complete CVE database:

../bomsh/scripts/bomsh_create_cve.py --use_git_tags --cveinfo_dir cveinfo_dir --cve_check_dir cvecheck -j openssl_bomsh_created_cvedb.json

Note that the above command must be run from the git repo directory of your software.
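
One plausible reading of these rules, assumed here rather than taken from bomsh_create_cve.py, is that a rule matches a source blob when every include string appears in the file and no exclude string does. Under that assumption, classifying a blob of ssl/t1_lib.c against the CVE-2020-1967 rules above can be sketched as follows. The order of the checks (fix rule first) is also a choice made for this sketch.

```python
from typing import Dict, List


def rule_matches(source_text: str, rule: Dict[str, List[str]]) -> bool:
    """True if every 'include' string occurs in source_text and no 'exclude' string does."""
    includes = rule.get("include", [])
    excludes = rule.get("exclude", [])
    return all(s in source_text for s in includes) and not any(s in source_text for s in excludes)


def classify(source_text: str, cveadd_rule: dict, cvefix_rule: dict) -> str:
    """Label a source blob as 'fixed', 'vulnerable', or 'unknown' for one CVE."""
    if rule_matches(source_text, cvefix_rule):
        return "fixed"
    if rule_matches(source_text, cveadd_rule):
        return "vulnerable"
    return "unknown"
```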

Some Linux distros apply additional security patches (including backports of high-severity CVE fixes) on top of upstream software releases. This may produce new blobs that do not exist in the software's git repo. The bomsh_create_cve.py script has been enhanced to cover this use case. For a CentOS or Fedora RPM git repo, you can clone the RPM git repo and run the below command, which scans the RPM git repo, finds all CVE-relevant blobs, and checks them against the CVE rules. The openssl_bomsh_created_cvedb.json input parameter is the CVE database created from the official OpenSSL git repo with the above bomsh_create_cve.py command.

../bomsh/scripts/bomsh_create_cve.py --cvedbfile openssl_bomsh_created_cvedb.json -vv --cve_check_dir cvecheck --gen_extra_cvedb

Another use case is to run bomtrace2 with CVE checking during the software build, that is, during OmniBOR tree generation. Run bomtrace2 with the "-c bomtrace.conf" option, and modify the bomtrace.conf file to add the below "--cve_check_dir cvecheck" option when invoking the bomsh_hook2.py script.

hook_script_cmdopt= --cve_check_dir cvecheck

This generates additional CVE metadata during OmniBOR tree generation, which the bomsh_search_cve.py script uses later. It covers any new source file blobs that are not covered by the bomsh_create_cve.py script.

Please check the openssl-cve repo for some OpenSSL examples.

Software Vulnerability CVE Search

The generated hash tree database is /tmp/bomsh_hook_jsonfile, which can be fed to the scripts/bomsh_search_cve.py script for CVE vulnerability search.

To create the CVE database and search for CVEs for software like OpenSSL with Bombash, do the below:

$ git clone URL-of-this-git-repo bomsh
$ git clone https://github.com/openssl/openssl.git
$ cd openssl
$ ../bomsh/scripts/bomsh_create_cve.py -v -j openssl_cvedb.json
$ git checkout OpenSSL_1_1_1k
$ ./config
$ rm -rf /tmp/bomdir /tmp/bomsh_hook_*; cp ../bomsh/scripts/bomsh_hook.py /tmp
$ BOMSH= ../bomsh/bin/bombash
$ make
$ exit
$ ../bomsh/scripts/bomsh_search_cve.py -r /tmp/bomsh_hook_jsonfile -d openssl_cvedb.json -f libssl.so.1.1,libcrypto.so.1.1

To create the CVE database and search for CVEs for software like OpenSSL with Bomtrace2, do the below:

$ git clone URL-of-this-git-repo bomsh
$ git clone https://github.com/openssl/openssl.git
$ cd openssl
$ ../bomsh/scripts/bomsh_create_cve.py -v -j openssl_cvedb.json
$ git checkout OpenSSL_1_1_1k
$ ./config
$ rm -rf /tmp/bomdir /tmp/bomsh_hook_* /tmp/bomsh_createbom_*
$ cp ../bomsh/scripts/bomsh_hook2.py ../bomsh/scripts/bomsh_create_bom.py /tmp
$ ../bomsh/bin/bomtrace2 make
$ ../bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b /tmp/bomdir
$ ../bomsh/scripts/bomsh_search_cve.py -vv -r /tmp/bomsh_createbom_jsonfile -d openssl_cvedb.json -f libssl.so.1.1,libcrypto.so.1.1
$ cat /tmp/bomsh_search_jsonfile-details.json
$ ../bomsh/scripts/bomsh_search_cve.py -vv -b /tmp/bomdir -d openssl_cvedb.json -f libssl.so.1.1,libcrypto.so.1.1
$ # You can also directly provide checksums (blob_ids) with -c option, or OmniBOR bom_ids with -g option
$ cat /tmp/bomsh_search_jsonfile-details.json

To create the CVE database and search for CVEs for software like the Linux kernel, do the below:

$ git clone URL-of-this-git-repo bomsh
$ git clone https://github.com/torvalds/linux.git
$ cd linux
$ ../bomsh/scripts/bomsh_create_cve.py -v -j linux_cvedb.json
$ git checkout v4.18
$ make menuconfig
$ rm -rf /tmp/bomdir /tmp/bomsh_hook_* /tmp/bomsh_createbom_* ; cp ../bomsh/scripts/bomsh_*.py /tmp
$ ../bomsh/bin/bomtrace2 -w ../bomsh/bin/bomtrace_watched_programs make
$ ../bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -b /tmp/bomdir
$ ../bomsh/scripts/bomsh_search_cve.py -vv -r /tmp/bomsh_createbom_jsonfile -d linux_cvedb.json -f arch/x86/boot/bzImage
$ cat /tmp/bomsh_search_jsonfile-details.json
$ ../bomsh/scripts/bomsh_search_cve.py -vv -b /tmp/bomdir -d linux_cvedb.json -f vmlinux,arch/x86/boot/bzImage
$ cat /tmp/bomsh_search_jsonfile-details.json

To create an accurate CVE DB, identify all the vulnerable source files for each CVE, specify the blob ID ranges of the source files that are vulnerable to the CVE in a text file, and run the bomsh_create_cve.py script with the -r option. A sample text file is provided in the scripts/sample_vulnerable_ranges.txt file.

$ git clone URL-of-this-git-repo bomsh
$ git clone https://github.com/openssl/openssl.git
$ cd openssl
$ ../bomsh/scripts/bomsh_create_cve.py -v -j openssl_cvedb.json -r openssl_vulnerable_cve_ranges.txt

Please note that to create a more accurate CVE database, you should follow the instructions in the "Creating CVE Database for Software" section. It requires identifying the CVE-add and CVE-fix git commits (a one-time effort) in your software git repo. CVE checking rules are also useful when new source file blobs exist in Linux distros or private software builds/releases.

Software Vulnerability CVE Search for JAVA Packages

To create the OmniBOR database and the CVE database for Log4j2 CVE-2021-44228, and search for CVEs for the Log4j2 software, do the below:

$ git clone URL-of-this-git-repo bomsh
$ git clone --branch rel/2.17.0 https://gitbox.apache.org/repos/asf/logging-log4j2.git log4j-2.17.0
$ cd log4j-2.17.0
$ ../bomsh/scripts/bomsh_create_cve.py -v -r ../bomsh/scripts/log4j2_CVE_2021_44228_ranges.txt -j ../log4j2_cvedb.json
$ ./mvnw package -Dmaven.test.skip=true
$ ../bomsh/scripts/bomsh_create_bom_java.py -r . -f log4j-core/target/log4j-core-2.17.0.jar -j log4j-treedb.json
$ ../bomsh/scripts/bomsh_search_cve.py -vv -r log4j-treedb.json -d ../log4j2_cvedb.json -j result.json -f log4j-core/target/log4j-core-2.17.0.jar
$ grep -6 CVElist result.json-details.json
$
$ cd ..
$ git clone --branch rel/2.14.0 https://gitbox.apache.org/repos/asf/logging-log4j2.git log4j-2.14.0
$ cd log4j-2.14.0
$ ../bomsh/scripts/bomsh_create_cve.py -v -r ../bomsh/scripts/log4j2_CVE_2021_44228_ranges.txt -j ../log4j2_cvedb.json
$ ./mvnw package -Dmaven.test.skip=true
$ ../bomsh/scripts/bomsh_create_bom_java.py -r . -f log4j-core/target/log4j-core-2.14.0.jar -j log4j-treedb.json
$ ../bomsh/scripts/bomsh_search_cve.py -vv -r log4j-treedb.json -d ../log4j2_cvedb.json -j result.json -f log4j-core/target/log4j-core-2.14.0.jar
$ grep -6 CVElist result.json-details.json

Here are the CVE search results for two versions of Log4j2 software:

[root@000b478b5d68 log4j-2.17.0]# /tmp/bomsh_search_cve.py -r bomsh_createbom_jsonfile -d ../log4j2_cvedb.json -vv -j mysearchcve-result.json -f log4j-core/target/log4j-core-2.17.0.jar

Here is the CVE search results:
{
    "log4j-core/target/log4j-core-2.17.0.jar": {
        "CVElist": [],
        "FixedCVElist": [
            "CVE-2021-44228"
        ]
    }
}
[root@000b478b5d68 log4j-2.17.0]# grep -6 CVElist mysearchcve-result.json-details.json
                "file_path": "./log4j-core/src/main/java/org/apache/logging/log4j/core/config/plugins/convert/TypeConverterRegistry.java"
            },
            "file_path": "./log4j-core/target/classes/org/apache/logging/log4j/core/config/plugins/convert/TypeConverterRegistry.class"
        },
        "71e9c7daeb6f4e3819403a1e37f8171f548e50ed": {
            "a783ea43c171982723e87cc6afd29287c63c1b53": {
                "FixedCVElist": [
                    "CVE-2021-44228"
                ],
                "file_path": "./log4j-core/src/main/java/org/apache/logging/log4j/core/lookup/JndiLookup.java"
            },
            "file_path": "./log4j-core/target/classes/org/apache/logging/log4j/core/lookup/JndiLookup.class"
        },
[root@000b478b5d68 log4j-2.17.0]#

[root@000b478b5d68 log4j-2.14.0]# /tmp/bomsh_search_cve.py -r bomsh_createbom_jsonfile -d ../log4j2_cvedb.json -vv -j mysearchcve-result.json -f log4j-core/target/log4j-core-2.14.0.jar

Here is the CVE search results:
{
    "log4j-core/target/log4j-core-2.14.0.jar": {
        "CVElist": [
            "CVE-2021-44228"
        ],
        "FixedCVElist": []
    }
}
[root@000b478b5d68 log4j-2.14.0]# grep -6 CVElist mysearchcve-result.json-details.json
                "file_path": "./log4j-core/src/main/java/org/apache/logging/log4j/core/pattern/DatePatternConverter.java"
            },
            "file_path": "./log4j-core/target/classes/org/apache/logging/log4j/core/pattern/DatePatternConverter$CachedTime.class"
        },
        "605c82e7442a5693745e1e28736446a8ced01d3c": {
            "30e65ad24f4b4d799e52cfd70fcbebc0490b7343": {
                "CVElist": [
                    "CVE-2021-44228"
                ],
                "file_path": "./log4j-core/src/main/java/org/apache/logging/log4j/core/lookup/JndiLookup.java"
            },
            "file_path": "./log4j-core/target/classes/org/apache/logging/log4j/core/lookup/JndiLookup.class"
        },
[root@000b478b5d68 log4j-2.14.0]#

It shows that log4j-core-2.14.0.jar (version 2.14.0) is vulnerable to CVE-2021-44228, while log4j-core-2.17.0.jar (version 2.17.0) is not vulnerable (the CVE has been fixed). It also reports the root cause: the specific version of the JndiLookup.java file with the githash 30e65ad24f4b4d799e52cfd70fcbebc0490b7343. Note that in this example, the git commit logs of log4j2 were manually inspected, and the "bomsh_create_cve.py -r ranges.txt" command was run to create log4j2_cvedb.json for CVE-2021-44228.

The bomsh_create_bom_java.py script also automatically inserts a .bom entry into .jar files.
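
To extract the root cause from a details file programmatically, the sketch below walks the nested JSON and reports every node whose CVElist is non-empty. The layout (blob-ID keys nested under their parent artifact, each node carrying file_path and optionally CVElist/FixedCVElist) is inferred from the grep excerpts above, not from a documented schema.

```python
import json
from typing import Iterator, List, Optional, Tuple


def find_cve_sources(node, key: Optional[str] = None) -> Iterator[Tuple[Optional[str], Optional[str], List[str]]]:
    """Yield (blob_id, file_path, CVE list) for every nested node with a non-empty CVElist."""
    if not isinstance(node, dict):
        return
    if node.get("CVElist"):
        yield key, node.get("file_path"), node["CVElist"]
    for child_key, child in node.items():
        if isinstance(child, dict):  # only dict values are nested artifacts
            yield from find_cve_sources(child, child_key)


def cve_sources_in_file(path: str = "mysearchcve-result.json-details.json"):
    """Load a details file and list its vulnerable source blobs."""
    with open(path, encoding="utf-8") as f:
        return list(find_cve_sources(json.load(f)))
```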

[root@000b478b5d68 log4j-2.17.0]# ../bomsh/scripts/bomsh_create_bom_java.py -r . -f log4j-core/target/log4j-core-2.17.0.jar -b bomdir -j log4j-treedb.json

[root@000b478b5d68 log4j-2.17.0]# jar tvf bomdir/with_bom_files/d4f6bcc969db60298df329972b9b6e83f3aec2e2-with_bom-0dc986b732c75ba0050cdbc859cd9b97eb2cf325-log4j-core-2.17.0.jar | tail -3
   650 Sat Jan 22 18:22:14 UTC 2022 org/apache/logging/log4j/core/jmx/LoggerConfigAdminMBean.class
  5833 Sat Jan 22 18:22:16 UTC 2022 org/apache/logging/log4j/core/jmx/StatusLoggerAdmin.class
    20 Mon Jan 24 04:38:45 UTC 2022 .bom
[root@000b478b5d68 log4j-2.17.0]# jar -xvf bomdir/with_bom_files/d4f6bcc969db60298df329972b9b6e83f3aec2e2-with_bom-0dc986b732c75ba0050cdbc859cd9b97eb2cf325-log4j-core-2.17.0.jar .bom
extracted: .bom
[root@000b478b5d68 log4j-2.17.0]# hexdump -C .bom
00000000  0d c9 86 b7 32 c7 5b a0  05 0c db c8 59 cd 9b 97  |....2.[.....Y...|
00000010  eb 2c f3 25                                       |.,.%|
00000014
[root@000b478b5d68 log4j-2.17.0]#
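
The .bom entry holds the raw 20-byte bom-id: the hexdump above matches the second hash (0dc986b7...) in the with_bom file name. Reading it back with Python's standard zipfile module could look like this sketch, which relies only on the entry name .bom shown in the listing above.

```python
import zipfile


def read_jar_bom_id(jar_path) -> str:
    """Return the hex bom-id stored in a .jar's top-level .bom entry.

    jar_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return jar.read(".bom").hex()
```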

The bomsh_create_bom_java.py script can also work with strace to create the OmniBOR hash-tree database more accurately. Run strace first to collect the strace log, which bomsh_create_bom_java.py then reads with the "-s" option. This tracks the reads/writes of .java/.class files, so .class files should be associated with .java files more accurately. Below is an example of creating the hash-tree database for Maven with an strace logfile.

$ git clone URL-of-this-git-repo bomsh
$ git clone https://github.com/apache/maven.git ; cd maven
$ strace -f -s99999 --seccomp-bpf -e trace=openat -qqq -o strace_logfile mvn -Drat.numUnapprovedLicenses=1000 package
$ ../bomsh/scripts/bomsh_create_bom_java.py -r . -s strace_logfile -f maven-core/target/maven-core-4.0.0-alpha-1-SNAPSHOT.jar -j maven-treedb.json
$ cat maven-treedb.json
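
The idea behind the -s option can be illustrated with a small parser for the openat lines that the strace command above produces (strace -f prefixes each line with a PID). This is a sketch, not the script's parser. It assumes opens with O_WRONLY, O_RDWR or O_CREAT are writes, ignores failed opens, and skips lines split by "<unfinished ...>", which a complete parser would need to join with the matching "resumed" line.

```python
import re
from typing import Iterable, Set, Tuple

# e.g. 101 openat(AT_FDCWD, "/out/Foo.class", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 6
OPENAT_RE = re.compile(r'^\d+\s+openat\([^,]+, "([^"]+)", ([A-Z_|]+)[^)]*\)\s+=\s+(-?\d+)')


def java_class_io(strace_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split successfully opened .java/.class paths into (reads, writes)."""
    reads, writes = set(), set()
    for line in strace_lines:
        m = OPENAT_RE.match(line)
        if not m or int(m.group(3)) < 0:
            continue  # not a complete openat line, or the open failed
        path, flags = m.group(1), m.group(2)
        if not path.endswith((".java", ".class")):
            continue
        if any(flag in flags for flag in ("O_WRONLY", "O_RDWR", "O_CREAT")):
            writes.add(path)
        else:
            reads.add(path)
    return reads, writes
```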

Software Vulnerability CVE Search for Rust Packages

To create the OmniBOR database for a Rust package like kalker, do the below:

$ git clone URL-of-this-git-repo bomsh
$ git clone https://github.com/PaddiM8/kalker.git
$ cd kalker ; echo "{}" > kalker_cvedb.json
$ ../bomsh/bin/bomtrace2 cargo build --release
$ cat /tmp/bomsh_hook_raw_logfile.sha1
$ ../bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -vv -b /tmp/bomdir
$ cat /tmp/bomsh_createbom_jsonfile
$ ../bomsh/scripts/bomsh_search_cve.py -vv -r /tmp/bomsh_createbom_jsonfile -d kalker_cvedb.json -j result.json -f target/release/kalker
$ cat result.json-details.json

All the OmniBOR docs are created in /tmp/bomdir (the -b option of the bomsh_create_bom.py script), and a .note.omnibor ELF section is automatically inserted into all the ELF files.
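
The .note.omnibor section should follow the standard ELF note layout: a 4-byte name size, a 4-byte descriptor size and a 4-byte type, followed by the name and the descriptor, each padded to a 4-byte boundary. The parser below is a sketch under that assumption (little-endian, 4-byte alignment). The owner name "OMNIBOR" and type value 1 in the test are hypothetical, since bomsh's actual values are not shown here. The raw section bytes can be obtained, for example, with GNU objcopy's --dump-section option.

```python
import struct
from typing import List, Tuple


def parse_elf_notes(data: bytes, byteorder: str = "<") -> List[Tuple[str, int, bytes]]:
    """Parse a raw ELF note section into (owner name, type, descriptor bytes) tuples."""
    notes = []
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, offset)
        offset += 12
        name = data[offset:offset + namesz].rstrip(b"\x00").decode("ascii", "replace")
        offset += (namesz + 3) & ~3  # skip the name plus padding to a 4-byte boundary
        desc = data[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # skip the descriptor plus padding
        notes.append((name, ntype, desc))
    return notes
```

For a note carrying a SHA-1 bom-id, the descriptor would be 20 bytes, and desc.hex() gives the familiar hex form.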

Software Vulnerability CVE Search for GoLang Packages

To create the OmniBOR database for a golang package like outyet, do the below:

$ git clone URL-of-this-git-repo bomsh
$ # you need to find out the location of your go compiler and tell bomtrace.
$ # on Ubuntu 20.04, it is /usr/lib/go-1.13/pkg/tool/linux_amd64/compile
$ # the below is for RedHat/Centos/AlmaLinux
$ sed -i "s|hook_script_cmdopt=-vv > |hook_script_cmdopt=-vv -w /usr/lib/golang/pkg/tool/linux_amd64/compile,/usr/lib/golang/pkg/tool/linux_amd64/link > |" bomsh/bin/bomtrace.conf
$ sed -i "s|#syscalls=openat|syscalls=openat|" bomsh/bin/bomtrace.conf
$ git clone https://github.com/golang/example
$ cd example/outyet; echo "{}" > outyet_cvedb.json
$ rm -rf /tmp/bomdir /tmp/bomsh_hook_* /tmp/bomsh_createbom_*
$ ../../bomsh/bin/bomtrace2 -c ../../bomsh/bin/bomtrace.conf go build -a
$ cat /tmp/bomsh_hook_raw_logfile.sha1
$ ../../bomsh/scripts/bomsh_create_bom.py -r /tmp/bomsh_hook_raw_logfile.sha1 -vv -b /tmp/bomdir
$ cat /tmp/bomsh_createbom_jsonfile
$ ../../bomsh/scripts/bomsh_search_cve.py -vv -r /tmp/bomsh_createbom_jsonfile -d outyet_cvedb.json -j result.json -f outyet
$ cat result.json-details.json

Note that "go build" caches previously built packages by default. The -a option forces "go build" to rebuild all packages instead of using the cache, which is required for bomtrace to record all build steps. Also remember to compile bin/bomtrace2 with the latest patches/bomtrace2.patch file. A customized bomtrace.conf file must be used, because the bomtrace tool needs to know the location of the Go compile and link tools, and two more syscalls need to be traced. Again, all OmniBOR docs are created in /tmp/bomdir (set by the -b option of the bomsh_create_bom.py script), and a .note.omnibor ELF section is automatically inserted into every generated ELF file.
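The inserted .note.omnibor section can be inspected with `readelf -x .note.omnibor outyet`, which hex-dumps it. To work with it programmatically, the sketch below splits raw note-section bytes into generic ELF note records. It assumes the standard 4-byte-aligned note layout; the note name, type values, and payload that bomsh actually writes are not described here, so the code does not interpret them, and parse_elf_notes is a hypothetical helper.

```python
import struct


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def parse_elf_notes(data: bytes, byteorder: str = "<"):
    """Split raw ELF note-section bytes into (name, type, desc) tuples.

    Each record is an Elf_Nhdr (namesz, descsz, type as 32-bit words),
    followed by the name and the descriptor, each padded to 4 bytes.
    byteorder is "<" for little-endian (e.g. x86-64) or ">" for big-endian.
    """
    notes = []
    off = 0
    while off + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, off)
        off += 12
        name = data[off:off + namesz].rstrip(b"\0").decode("ascii", "replace")
        off += _align4(namesz)
        desc = bytes(data[off:off + descsz])
        off += _align4(descsz)
        notes.append((name, ntype, desc))
    return notes
```

The raw bytes could come, for example, from objcopy's `--dump-section` option written to a file such as a hypothetical note.bin; printing `desc.hex()` for each record shows the payload as hex.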

Reproducible Build and Bomsh

Many Linux packages are now build-reproducible: byte-for-byte identical binaries are rebuilt when the same build environment is reproduced. About 95% of Debian packages were already build-reproducible by the end of 2021, and future Debian releases may enforce reproducible builds. Bomsh can record the build steps of these build-reproducible packages and generate their OmniBOR docs without altering the generated binary files. The created bomsh_omnibor_doc_mapping file can be signed for trust and distributed offline (via packaging or website access). This makes OmniBOR immediately usable for more than 90% of official Debian Linux packages, not only for newly built packages. The same holds for other build-reproducible software, such as RPM packages.

For reproducible builds, the -n option must be passed to the bomsh_hook2.py script so that no .note.omnibor section is embedded into the generated binary files. This requires running bomtrace2 with the "-c bomtrace.conf" option.
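The -n option presumably reaches bomsh_hook2.py through the hook_script_cmdopt line of bomtrace.conf, the same line the Go example above edits with sed. The sketch below makes the equivalent edit in Python, assuming the `hook_script_cmdopt=-vv > ` line format visible in that sed command; the real bomtrace.conf may contain other options, and add_hook_option is a hypothetical helper.

```python
def add_hook_option(conf_text: str, option: str = "-n") -> str:
    """Return conf_text with `option` added to its hook_script_cmdopt= line.

    Assumes the options end at the first " > " redirection, as in the
    "hook_script_cmdopt=-vv > " pattern from the Go example. Idempotent.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.startswith("hook_script_cmdopt="):
            body = line.rstrip("\r\n")
            newline = line[len(body):]
            opts, sep, rest = body.partition(" > ")
            # Only add the option if it is not already among the hook options.
            if option not in opts.split("=", 1)[1].split():
                body = opts + " " + option + sep + rest
            line = body + newline
        out.append(line)
    return "".join(out)
```

Reading bomsh/bin/bomtrace.conf, passing its text through add_hook_option, and writing the result back leaves every other line unchanged.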

The bomsh_rebuild_deb.py script was created for this use case. It uses a Docker container to set up the build environment automatically. To rebuild an officially released Debian package, such as version 3.23 of hostname, find its buildinfo file on the buildinfos.debian.net website, and then do the following:

$ git clone URL-of-this-git-repo bomsh
$ wget https://buildinfos.debian.net/buildinfo-pool/h/hostname/hostname_3.23_amd64.buildinfo
$ bomsh/scripts/bomsh_rebuild_deb.py -f hostname_3.23_amd64.buildinfo
$ ls -tl bomsher_out/*
$ wget https://buildinfos.debian.net/buildinfo-pool/s/sysstat/sysstat_12.2.0-2_amd64.buildinfo
$ bomsh/scripts/bomsh_rebuild_deb.py -f sysstat_12.2.0-2_amd64.buildinfo
$ ls -tl bomsher_out/*
$ wget https://buildinfos.debian.net/buildinfo-pool/l/linux/linux_5.10.84-1_amd64.buildinfo
$ bomsh/scripts/bomsh_rebuild_deb.py -f linux_5.10.84-1_amd64.buildinfo
$ ls -tl bomsher_out/*

The rebuilt .deb files are then in the bomsher_out/debs directory, and the generated OmniBOR documents are in the bomsher_out/omnibor directory. For convenience, the bomsher_out/omnibor/pkgs/.omnibor_adg. and bomsher_out/debs/.omnibor_adg. files are symlinks to the top-level OmniBOR documents of your packages. The relevant bomsh logfiles are in the bomsher_out/bomsh_logfiles directory.
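Since a reproducible rebuild should produce byte-for-byte identical output, the rebuilt .deb files can be checked against the Checksums-Sha256 field of the .buildinfo file used for the rebuild. The sketch below is a hypothetical helper, not part of bomsh_rebuild_deb.py; it assumes the usual layout of that field (one ` <sha256> <size> <filename>` continuation line per file) and that the rebuilt files sit in bomsher_out/debs.

```python
import hashlib
import os


def buildinfo_sha256(text):
    """Map file name -> SHA-256 digest from a .buildinfo Checksums-Sha256 field."""
    sums = {}
    in_field = False
    for line in text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field:
            if not line.startswith(" "):
                break  # continuation lines start with a space; a new field ends this one
            digest, _size, name = line.split()
            sums[name] = digest
    return sums


def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_rebuild(buildinfo_path, debs_dir):
    """For each .deb listed in the buildinfo, report whether the rebuilt copy matches."""
    with open(buildinfo_path, encoding="utf-8") as f:
        expected = buildinfo_sha256(f.read())
    results = {}
    for name, digest in expected.items():
        if name.endswith(".deb"):
            path = os.path.join(debs_dir, name)
            results[name] = os.path.isfile(path) and sha256_file(path) == digest
    return results
```

For example, `check_rebuild("hostname_3.23_amd64.buildinfo", "bomsher_out/debs")` maps each listed .deb to True when the rebuilt file matches the official checksum.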

A package that is not officially released by Debian, such as openosc, can be rebuilt the same way as long as it is reproducible: provide its source tarball and .dsc file, and then do the following:

$ git clone URL-of-this-git-repo bomsh
$ cat >Dockerfile.openosc <<EOL
FROM debian:bullseye

RUN apt-get update && apt-get install -y git build-essential debhelper ; \\
    rm -rf /var/lib/apt/lists/* ;

RUN cd /root ; git clone https://github.com/cisco/OpenOSC.git ; \\
    cd /root/OpenOSC ; sed -i 's/dpkg-buildpackage -b -uc -us/dpkg-buildpackage -F -uc -us/' Makefile.am ; \\
    autoreconf -vfi && ./configure && make deb ;

CMD cp /root/openosc_* /out
EOL
$ docker run -it --rm -v ${PWD}:/out $(docker build -q -f Dockerfile.openosc .)
$ bomsh/scripts/bomsh_rebuild_deb.py -f ./openosc*.buildinfo -s .
$ ls -tl bomsher_out/*

Using bomsh, we have successfully reproduced the builds of several officially released Debian packages, such as hostname, linux (the Linux kernel), openssl, and sysstat. We also created a repo to store these OmniBOR docs; please check the omnibor-repo for some examples.

Such an OmniBOR repo makes it easy to distribute OmniBOR artifact trees for released binaries. This should motivate people to create various kinds of metadata and associate them with OmniBOR artifact trees: CVEs, bugs, features, licensing, security compliance, compatibility, build info, attestations, and declarations of mitigations can all be created as OmniBOR metadata. As more OmniBOR metadata becomes publicly available, people will be more motivated to use OmniBOR docs and artifact trees, forming a positive-feedback cycle that helps drive wide adoption of OmniBOR.

We believe more use cases will be found for the OmniBOR repo. For example, the checksums of known-vulnerable binary files, such as OpenSSL releases with the Heartbleed vulnerability (CVE-2014-0160), grub2 releases with the BootHole vulnerability (CVE-2020-10713), or Log4j2 releases with the Log4Shell vulnerability (CVE-2021-44228), can be put into a blacklist or alert-list in our repo. People can then download such a blacklist from our repo and use it to prevent the execution of these vulnerable binaries or to alert the user.
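A blacklist of this kind could be as simple as a text file of checksums. The sketch below assumes a hypothetical format (one SHA-256 hex digest per line, `#` for comments); it is not a format the omnibor-repo is known to publish, and load_blocklist and find_blocked are illustrative names.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def load_blocklist(text):
    """Parse one hex digest per line into a set; '#' starts a comment."""
    entries = set()
    for line in text.splitlines():
        digest = line.split("#", 1)[0].strip().lower()
        if digest:
            entries.add(digest)
    return entries


def find_blocked(paths, blocklist):
    """Return (path, digest) for each file whose SHA-256 is on the blocklist."""
    hits = []
    for path in paths:
        digest = sha256_file(path)
        if digest in blocklist:
            hits.append((path, digest))
    return hits
```

A launcher or scanner could refuse to run, or warn about, any binary that find_blocked reports.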

If you have any good ideas, please share them with us. The more people involved, the more useful OmniBOR will be!

Notes

  1. This has been tested on Ubuntu 22.04, Ubuntu 20.04, AlmaLinux 9, AlmaLinux 8, CentOS 8, and RedHat 8.

  2. By default, most files generated by the scripts are put in the /tmp directory, except the OmniBOR docs, which are put in the ${PWD}/.omnibor directory. Both locations are configurable: the tmp directory can be changed with the --tmpdir option, and the omnibor directory with the -b/--bom_dir option.

  3. Bomtrace3 has a performance overhead of about 20%.

  4. The bomsh_hook2.py and bomsh_create_bom.py scripts call git, head, ar, readelf, xxd, and objcopy; make sure these are installed.

  5. The bomsh_create_bom_java.py script calls git, head, diff, xxd, javap, jar, and zip; make sure these are installed.
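A quick way to check these prerequisites is to look each tool up on PATH. The sketch below uses Python's shutil.which; the tool lists are copied from notes 4 and 5, and missing_tools is an illustrative helper, not part of bomsh.

```python
import shutil

# External tools each script relies on, copied from notes 4 and 5.
REQUIRED_TOOLS = {
    "bomsh_hook2.py / bomsh_create_bom.py": ["git", "head", "ar", "readelf", "xxd", "objcopy"],
    "bomsh_create_bom_java.py": ["git", "head", "diff", "xxd", "javap", "jar", "zip"],
}


def missing_tools(required=REQUIRED_TOOLS):
    """Map each script group to the tools that shutil.which cannot find on PATH."""
    return {group: [tool for tool in tools if shutil.which(tool) is None]
            for group, tools in required.items()}


if __name__ == "__main__":
    for group, missing in missing_tools().items():
        print(f"{group}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```

An empty list for a group means every tool that group needs was found.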

bomsh's People

Contributors

yonhan3, edwarnicke, tisawesomeness, jeff-schutt
