
llnl / surfactant


Modular framework for SBOM generation that gathers file information and analyzes dependencies

License: MIT License

Python 89.76% C# 0.22% Java 0.09% C++ 0.15% CMake 0.22% HTML 9.55%
cyclonedx dependencies dependency-analysis dependency-graph python python3 sbom sbom-generator software-bill-of-materials software-composition-analysis

surfactant's Introduction


Surfactant

A modular framework to gather file information for SBOM generation and dependency analysis.


Documentation

Description

Surfactant can be used to gather information from a set of files to generate an SBOM, along with manipulating SBOMs and analyzing the information in them. It pulls information from recognized file types (such as PE, ELF, or MSI files) contained within a directory structure corresponding to an extracted software package. By default, the information is "surface-level" metadata contained in the files that does not require running the files or decompilation.

Installation

For Users:

  1. Create a virtual environment with Python >= 3.8 [Optional, but recommended]
python -m venv cytrics_venv
source cytrics_venv/bin/activate
  2. Install Surfactant with pip
pip install surfactant

For Developers:

  1. Create a virtual environment with Python >= 3.8 [Optional, but recommended]
python -m venv cytrics_venv
source cytrics_venv/bin/activate
  2. Clone the Surfactant repository
git clone git@github.com:LLNL/Surfactant.git
  3. From the cloned Surfactant directory, create an editable Surfactant install (changes to the code take effect immediately):
cd Surfactant
pip install -e .

To install optional dependencies required for running pytest and pre-commit:

pip install -e ".[test,dev]"

Usage

Identify sample file

In order to test out surfactant, you will need a sample file/folder. If you don't have one on hand, you can download and use the portable .zip file from https://github.com/ShareX/ShareX/releases or the Linux .tar.gz file from https://github.com/GMLC-TDC/HELICS/releases. Alternatively, you can pick a sample from https://lc.llnl.gov/gitlab/cir-software-assurance/unpacker-to-sbom-test-files

Build configuration file

A configuration file contains the information about the sample to gather information from. Example JSON configuration files can be found in the examples folder of this repository.

extractPaths: (required) a list of absolute paths, or paths relative to the current working directory that Surfactant is run from, pointing to the sample folders. Each entry must be a folder, not a file. (Even on Windows, use Unix-style / directory separators in paths.)
archive: (optional) the full path, including the file name, of the zip, exe installer, or other archive file that the folders in extractPaths were extracted from. It is used to collect metadata about the overall sample, and a "Contains" relationship is added from it to every software entry found in the extractPaths.
installPrefix: (optional) where the files in extractPaths would be if installed on an actual system, e.g. "C:/", "C:/Program Files/", etc. (Even on Windows, use Unix-style / directory separators in the path.) If not given, the extractPaths are used as the install paths.
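As a rough illustration of these fields, the sketch below builds a configuration list like the ones in the examples and writes it as JSON. The helpers make_config_entry and write_config are hypothetical (not part of Surfactant); they only mirror the rules above: each extractPaths entry must be a directory, and paths are written with / separators even on Windows.

```python
import json
from pathlib import Path
from typing import List, Optional


def make_config_entry(
    extract_paths: List[str],
    archive: Optional[str] = None,
    install_prefix: Optional[str] = None,
) -> dict:
    """Build one entry of a Surfactant-style config list (hypothetical helper)."""
    entry = {"extractPaths": []}
    for p in extract_paths:
        path = Path(p)
        # extractPaths must point at folders, never at individual files
        if not path.is_dir():
            raise ValueError(f"extractPaths entry is not a directory: {p}")
        # Use Unix-style separators even on Windows
        entry["extractPaths"].append(path.as_posix())
    if archive is not None:
        entry["archive"] = Path(archive).as_posix()
    if install_prefix is not None:
        entry["installPrefix"] = install_prefix.replace("\\", "/")
    return entry


def write_config(entries: List[dict], outfile: str) -> None:
    # A config file is a JSON list with one object per sample
    with open(outfile, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
```

For example, write_config([make_config_entry(["/home/samples/helics/bin", "/home/samples/helics/lib64"])], "helics.json") would produce a file shaped like the one in Example 1, provided those folders exist.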

Create config command

A basic configuration file can be built with the create-config command. It takes a path as a command line argument and saves a JSON file named after the last directory in that path; e.g., /home/user/Desktop/myfolder will create myfolder.json.

$  surfactant create-config [INPUT_PATH]

The --output flag can be used to specify the configuration file name, and --install-prefix can be used to specify the install prefix (default: '/').

$  surfactant create-config [INPUT_PATH] --output new_output.json --install-prefix 'C:/'
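The default naming rule (last directory component plus .json) can be sketched in a couple of lines. default_config_name is a hypothetical helper for illustration, not Surfactant's own code, and it assumes a POSIX-style input path.

```python
from pathlib import PurePosixPath


def default_config_name(input_path: str) -> str:
    # PurePosixPath drops a trailing "/", so "myfolder/" and "myfolder" give the same name
    return PurePosixPath(input_path).name + ".json"


print(default_config_name("/home/user/Desktop/myfolder"))  # myfolder.json
```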

Example configuration file

Let's say you have a .tar.gz file that you want to run Surfactant on. For this example, we will use the HELICS release .tar.gz. Its absolute path is /home/samples/helics.tar.gz, and extracting it gives a helics folder with 4 subfolders: bin, include, lib64, and share.

Example 1: Simple Configuration File

If we want to include only the folders that contain binary files to analyze, our most basic configuration would be:

[
  {
    "extractPaths": ["/home/samples/helics/bin", "/home/samples/helics/lib64"]
  }
]

The resulting SBOM would be structured like this:

{
  "software": [
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/home/samples/helics/bin/helics_binary"],
      "containerPath": null
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/home/samples/helics/lib64/lib1.so"],
      "containerPath": null
    }
  ],
  "relationships": [
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    }
  ]
}

Example 2: Detailed Configuration File

A more detailed configuration file might look like the example below. The resulting SBOM would have a software entry for helics.tar.gz with a "Contains" relationship to all binaries found in the extractPaths. Providing an installPrefix of / and an extractPaths entry of /home/samples/helics allows Surfactant to correctly assign install paths under /bin and /lib64 in the SBOM for binaries in those subfolders.

[
  {
    "archive": "/home/samples/helics.tar.gz",
    "extractPaths": ["/home/samples/helics"],
    "installPrefix": "/"
  }
]

The resulting SBOM would be structured like this:

{
  "software": [
    {
      "UUID": "abc0",
      "fileName": ["helics.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/bin/helics_binary"],
      "containerPath": ["abc0/bin/helics_binary"]
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/lib64/lib1.so"],
      "containerPath": ["abc0/lib64/lib1.so"]
    }
  ],
  "relationships": [
    {
      "xUUID": "abc0",
      "yUUID": "abc1",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc2",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    }
  ]
}
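The path mapping in this example can be approximated as "strip the extract path, then prepend installPrefix", with the container path built from the archive's UUID and the same relative path. The sketch below is a simplified model under that assumption, not Surfactant's actual implementation; to_install_path and to_container_path are hypothetical names. When no installPrefix is given it falls back to the real path, matching the default described for installPrefix.

```python
from pathlib import PurePosixPath
from typing import Optional


def to_install_path(real_path: str, extract_path: str, install_prefix: Optional[str]) -> str:
    """Map a file's location under an extract path to its installed location."""
    if install_prefix is None:
        # No prefix given: the extracted location doubles as the install path
        return real_path
    relative = PurePosixPath(real_path).relative_to(extract_path)
    return str(PurePosixPath(install_prefix) / relative)


def to_container_path(real_path: str, extract_path: str, archive_uuid: str) -> str:
    """Build a container path of the form <archive UUID>/<path inside the archive>."""
    relative = PurePosixPath(real_path).relative_to(extract_path)
    return f"{archive_uuid}/{relative}"


real = "/home/samples/helics/bin/helics_binary"
print(to_install_path(real, "/home/samples/helics", "/"))       # /bin/helics_binary
print(to_container_path(real, "/home/samples/helics", "abc0"))  # abc0/bin/helics_binary
```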
Example 3: Adding Related Binaries

If our sample helics tar.gz file came with a related tar.gz file to install a plugin extension module (extracted into a helics_plugin folder that contains bin and lib64 subfolders), we could add that into the configuration file as well:

[
  {
    "archive": "/home/samples/helics.tar.gz",
    "extractPaths": ["/home/samples/helics"],
    "installPrefix": "/"
  },
  {
    "archive": "/home/samples/helics_plugin.tar.gz",
    "extractPaths": ["/home/samples/helics_plugin"],
    "installPrefix": "/"
  }
]

The resulting SBOM would be structured like this:

{
  "software": [
    {
      "UUID": "abc0",
      "fileName": ["helics.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/bin/helics_binary"],
      "containerPath": ["abc0/bin/helics_binary"]
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/lib64/lib1.so"],
      "containerPath": ["abc0/lib64/lib1.so"]
    },
    {
      "UUID": "abc3",
      "fileName": ["helics_plugin.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc4",
      "fileName": ["helics_plugin"],
      "installPath": ["/bin/helics_plugin"],
      "containerPath": ["abc3/bin/helics_plugin"]
    },
    {
      "UUID": "abc5",
      "fileName": ["lib_plugin.so"],
      "installPath": ["/lib64/lib_plugin.so"],
      "containerPath": ["abc3/lib64/lib_plugin.so"]
    }
  ],
  "relationships": [
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc4",
      "yUUID": "abc5",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc5",
      "yUUID": "abc2",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc1",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc2",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc3",
      "yUUID": "abc4",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc3",
      "yUUID": "abc5",
      "relationship": "Contains"
    }
  ]
}

NOTE: These examples have been simplified to show differences in output based on configuration.

Run surfactant

$  surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM]

CONFIG_FILE: (required) the config file created earlier that contains the information on the sample
SBOM_OUTFILE: (required) the desired name of the output file
INPUT_SBOM: (optional) a base SBOM; use it with care, as relationships may be incorrect when files are installed on different systems
--skip_gather: (optional) skips gathering information on files and adding software entries
--skip_relationships: (optional) skips the adding of relationships based on metadata
--skip_install_path: (optional) skips including an install path for the files discovered. This may cause "Uses" relationships to also not be generated
--recorded_institution: (optional) the name of the institution collecting the SBOM data (default: LLNL)
--output_format: (optional) changes the output format for the SBOM (given as full module name of a surfactant plugin implementing the write_sbom hook)
--input_format: (optional) specifies the format of the input SBOM if one is being used (default: cytrics) (given as full module name of a surfactant plugin implementing the read_sbom hook)
--help: (optional) show the help message and exit

Understanding the SBOM Output

Software

This section contains an entry for each piece of software found in the sample. Metadata such as file size, vendor, and version is included for each entry, along with a UUID that uniquely identifies it.

Relationships

This section contains information on how each of the software entries in the previous section are linked.

Uses: this relationship type means that x software uses y software i.e. y is a helper module to x
Contains: this relationship type means that x software contains y software (often x software is an installer or archive such as a zip file)
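Because relationships refer to software entries only by UUID, reading them usually means joining the two sections. A minimal sketch, assuming the field names shown in the examples above (UUID, fileName, xUUID, yUUID, relationship); uses_pairs is a hypothetical helper, not part of Surfactant.

```python
from typing import Dict, List, Tuple


def uses_pairs(sbom: dict) -> List[Tuple[str, str]]:
    """Return (x, y) file-name pairs for every "Uses" relationship in an SBOM dict."""
    # Map each software entry's UUID to a readable label: its first fileName, else the UUID
    names: Dict[str, str] = {}
    for sw in sbom.get("software", []):
        file_names = sw.get("fileName") or [sw["UUID"]]
        names[sw["UUID"]] = file_names[0]
    pairs: List[Tuple[str, str]] = []
    for rel in sbom.get("relationships", []):
        if rel.get("relationship") == "Uses":
            x, y = rel["xUUID"], rel["yUUID"]
            pairs.append((names.get(x, x), names.get(y, y)))
    return pairs


# Usage with a generated SBOM file:
#   import json
#   with open("sbom.json", encoding="utf-8") as f:
#       for x, y in uses_pairs(json.load(f)):
#           print(f"{x} uses {y}")
```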

Observations

This section contains notable observations about individual software components, such as vulnerabilities or observed features.

Merging SBOMs

A folder containing multiple separate SBOM JSON files can be combined using the surfactant merge command. The command below uses ls to get a list of files and then uses xargs to pass that list to surfactant merge as arguments.

ls -d ~/Folder_With_SBOMs/Surfactant-* | xargs -d '\n' surfactant merge --config_file=merge_config.json --sbom_outfile combined_sbom.json
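The xargs -d option is GNU-specific, so on systems without GNU xargs the same merge can be driven from Python. A sketch, assuming only the surfactant merge options shown in the command above; build_merge_command is a hypothetical helper, and sorting the file list is an arbitrary choice for reproducible output.

```python
import glob
import os
from typing import List


def build_merge_command(folder: str, config_file: str, outfile: str) -> List[str]:
    """Build a `surfactant merge` argument list for every Surfactant-* file in folder."""
    # Equivalent of `ls -d <folder>/Surfactant-*`, without relying on GNU xargs -d
    pattern = os.path.join(os.path.expanduser(folder), "Surfactant-*")
    sbom_files = sorted(glob.glob(pattern))
    if not sbom_files:
        raise FileNotFoundError(f"no files match {pattern}")
    # An argument list (no shell) keeps file names containing spaces intact
    return [
        "surfactant", "merge",
        f"--config_file={config_file}",
        "--sbom_outfile", outfile,
        *sbom_files,
    ]


# Usage:
#   import subprocess
#   cmd = build_merge_command("~/Folder_With_SBOMs", "merge_config.json", "combined_sbom.json")
#   subprocess.run(cmd, check=True)
```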

If the config file option is given, a top-level system entry will be created that all other software entries are tied to (directly or indirectly, based on other relationships). If the system entry's UUID is left empty, a random UUID is generated for it; otherwise, the provided UUID is used.

Details on the merge command can be found in the documentation.

Plugins

Surfactant supports using plugins to add additional features. For users, installing and enabling a plugin usually just involves doing a pip install of the plugin.

Detailed information on configuration options for the plugin system and on how to develop new plugins can be found in the documentation.

Support

Full user guides for Surfactant are available online and in the docs directory.

For questions or support, please create a new discussion on GitHub Discussions, or open an issue for bug reports and feature requests.

Contributing

Contributions are welcome. Bug fixes or minor changes are preferred via a pull request to the Surfactant GitHub repository. For more information on contributing see the CONTRIBUTING file.

License

Surfactant is released under the MIT license. See the LICENSE and NOTICE files for details. All new contributions must be made under this license.

SPDX-License-Identifier: MIT

LLNL-CODE-850771

surfactant's People

Contributors

apochira, ccbromia, czatar, dependabot[bot], docjon09, douglasdennis, kendallharteratwork, learningcomputers777, levilloyd, mcutshaw, mgallegos4, mws180000, nightlark, pochiraju1, pre-commit-ci[bot], reestwick, shaynakapadia, slyles1001, thestache, wangmot


surfactant's Issues

UX: Generate SBOM info for single file?

If the generate command is given a binary file, an empty SBOM may be generated. It could make sense to detect that the input is not a JSON config file and generate a single-entry SBOM (similar to giving Surfactant a directory). The same applies to listing a file instead of a folder under extractPaths in a config file.
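One way to tell the inputs apart is sketched below. classify_input and its return labels are hypothetical; the assumption is that a config file is a JSON list of objects (as in the README's configuration examples) and that any other regular file should become a single-entry SBOM. It reads the whole file, so a real implementation might check only a prefix of large binaries.

```python
import json
import os


def classify_input(path: str) -> str:
    """Guess whether a generate input is a directory, a config file, or a lone file."""
    if os.path.isdir(path):
        return "directory"
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary or non-JSON content: treat it as a single file to analyze
        return "single_file"
    # A config file is a JSON list of objects; any other JSON is just another file
    if isinstance(data, list) and all(isinstance(e, dict) for e in data):
        return "config"
    return "single_file"
```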

installPath not replaced with installPrefix on Windows

Describe the bug
When using Surfactant on Windows the installPath is not updated with a config defined installPrefix.

To Reproduce
Steps to reproduce the behavior:

  1. Create config like the following:
[
    {
        "extractPaths": [
            "path/to/directory"
        ],
        "installPrefix": "new/install/path"
    }
]
  2. Run Surfactant in a Windows environment.
  3. Check the generated SBOM and notice the installPath has not been updated.

Expected behavior
installPath would be updated with the installPrefix.

Screenshots
Configuration file used:

[
  {
    "extractPaths": [ "C:/Program Files/Git/mingw64/bin" ],
    "installPrefix": "new/install/path"
  }
]

Example from the generated SBOM:

    {
      "UUID": "e3b55c39-af35-4041-a1ae-5b27750305a3",
      "name": null,
      "size": 92465,
      "fileName": [
        "acountry.exe"
      ],
      "installPath": [
        "C:/Program Files/Git/mingw64/bin\\acountry.exe"
      ],
      "containerPath": [],
      "captureTime": 1694456199,
      "version": "",
      "vendor": [],
      "description": "",
      "sha1": "ee31a2610f360b4e19066325b19a7deed6e0c8b3",
      "sha256": "cc304d2a3c13f6c1b587e484c53b8750fc740717723c4cc293fa26557f8586bc",
      "md5": "33eaa33f9632e9c84b2100ef9a1eec2c",
      "relationshipAssertion": "Unknown",
      "comments": "",
      "metadata": [
        {
          "collectedBy": "Surfactant",
          "collectionPlatform": "Windows-10-10.0.19045-SP0",
          "fileInfo": {
            "mode": "-rwxrwxrwx",
            "hidden": false
          }
        },
        {
          "OS": "Windows",
          "peMachine": "AMD",
          "peOperatingSystemVersion": "4.0",
          "peSubsystemVersion": "5.2",
          "peSubsystem": "WINDOWS_CUI",
          "peLinkerVersion": "2.40",
          "peImport": [
            "KERNEL32.dll",
            "msvcrt.dll",
            "WS2_32.dll",
            "libcares-2.dll"
          ],
          "peIsExe": true,
          "peIsDll": false,
          "peIsClr": false,
          "dllRedirectionLocal": false
        }
      ],
      "supplementaryFiles": [],
      "provenance": null,
      "recordedInstitution": "LLNL",
      "components": []
    },

System Information (please complete the following information):

  • OS: Windows 10

Additional context
This appears to be due to the regex used in real_path_to_install_path, which requires a Unix-style separator (/). However, os.path.join (which is used to generate the file paths) uses the OS's native path separator, which is \ on Windows.
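The mismatch can be reproduced on any OS with ntpath, the implementation os.path uses on Windows. The sketch below shows the mixed separator from the SBOM above and one possible normalization: converting to / before matching the prefix. replace_prefix is a hypothetical, simplified stand-in for real_path_to_install_path, not its actual code.

```python
import ntpath
import re
from pathlib import PureWindowsPath

extract_path = "C:/Program Files/Git/mingw64/bin"

# os.path.join on Windows inserts a backslash, producing a mixed-separator path
joined = ntpath.join(extract_path, "acountry.exe")
print(joined)  # C:/Program Files/Git/mingw64/bin\acountry.exe

# Normalizing to forward slashes first lets a "/"-based prefix match succeed
posix_style = PureWindowsPath(joined).as_posix()
print(posix_style)  # C:/Program Files/Git/mingw64/bin/acountry.exe


def replace_prefix(path: str, extract_path: str, install_prefix: str) -> str:
    # Hypothetical simplification: swap a leading extract path for the install prefix
    pattern = "^" + re.escape(extract_path.rstrip("/")) + "/"
    new_prefix = install_prefix.rstrip("/") + "/"
    # A callable replacement avoids treating backslashes in the prefix as escapes
    return re.sub(pattern, lambda _m: new_prefix, path, count=1)


print(replace_prefix(posix_style, extract_path, "new/install/path"))  # new/install/path/acountry.exe
```

Applied to the unnormalized joined string, the same pattern fails to match and the path is returned unchanged, which is the symptom described in this issue.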

Add support for files with zlib magic bytes

There are a few different magic byte patterns that zlib compressed files can start with. Being able to recognize them for inclusion in generated SBOMs would be good; we've come across a number of zlib compressed firmware files, and files such as z19, which is just a zlib compressed Motorola S-record file.
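RFC 1950 defines the zlib header as a rule rather than a fixed magic value: the low 4 bits of the first byte (CM) must be 8 for deflate, the high 4 bits (CINFO) must be at most 7, and the first two bytes read as a big-endian number must be a multiple of 31. Common starts such as 78 01, 78 5E, 78 9C, and 78 DA are all instances of that rule. A sketch of the check follows; the function names are hypothetical, and since only 2 bytes are tested, false positives on arbitrary data are likely, so a real implementation might also try decompressing a prefix.

```python
def looks_like_zlib(header: bytes) -> bool:
    """Heuristic check of a 2-byte zlib (RFC 1950) stream header."""
    if len(header) < 2:
        return False
    cmf, flg = header[0], header[1]
    # CM (low 4 bits of CMF) must be 8 (deflate); CINFO (high 4 bits) at most 7
    if cmf & 0x0F != 8 or cmf >> 4 > 7:
        return False
    # FCHECK: CMF*256 + FLG must be a multiple of 31
    return (cmf * 256 + flg) % 31 == 0


def file_looks_like_zlib(path: str) -> bool:
    with open(path, "rb") as f:
        return looks_like_zlib(f.read(2))
```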

Enable plugins to hint at important metadata fields

Currently, the bits of metadata that give a good indication of what an SBOM field such as version should be are hard-coded in surfactant/cmd/generate.py.

This can probably be done by adding a hookspec that allows a plugin to return hints about what metadata fields can be mapped to specific pieces of key SBOM information. Implementations of the hook for PE and OLE/MSI file metadata should be added. Alternative ideas for how to approach this can be considered.

Stickers!

Look into what it would take to get stickers, or small business card size things printed with some basic info to hand out.

Adding a web interface to interact with SBOM/JSON files

Currently, the only way to modify an initial SBOM is to edit the associated JSON file by hand. In the long run, it might be more user friendly to create a web interface where users can open SBOM files and modify them without introducing errors into the final JSON. The web interface could also expose the command line functionality through a GUI.

Support user config options for plugins

Is your feature request related to a problem? Please describe.
Users should be able to configure settings unique to each plugin, and ideally be able to control which plugins run beyond using pip virtual environments. This feature could also be used to let users configure default Surfactant settings, such as the recorded institution and potentially preferred behavior for various scenarios. A config or settings subcommand for editing a user-wide settings file similar to git config could be nice.

Describe the solution you'd like
From discussion in the working group, the following is what we'd like (copy+paste from PowerPoint slides gives an image):
[Image: PowerPoint slides from the working group discussion]

.NET cross-platform runtime behavior when supplying platform specific library names.

For loading native libraries, .NET specifies that it checks the following name combinations depending on what platform it's running on:

https://learn.microsoft.com/en-us/dotnet/standard/native-interop/native-library-loading

It does not mention what happens if a name like "libname.dll" is provided to a .NET assembly running on Linux/macOS. Does the runtime remove the platform-specific ".dll" extension before adding on ".so" or does it try "libname.dll.so"? Does the same happen if a Linux/macOS name is supplied to a Windows .NET runtime?

Pypi installer fails if wheel is not installed

When following the instructions in the original README, the installation fails in a virtual environment if wheel is not installed, so the user has to install wheel manually before the installation can finish.

Generate "Uses" Relationships without installPrefix

Note: I have framed this as a feature request as I'm unsure if the current behavior is a bug or expected.

Is your feature request related to a problem? Please describe.
When I run Surfactant using a config file that only contains an extractPaths entry then no "Uses" relationships are generated for imports. To get the "Uses" relationships to be generated I must include an installPrefix. This requirement is not currently documented, as far as I can see, so I'm not sure if this is expected.

As I initially read the documentation, I expected installPrefix to help in two situations. One, be a way of overriding the generated SBOM's installPath to better reflect an actual installation of the software. Two, it would also give the user an opportunity to remove any local information that may be revealed in the extract path. I didn't expect that I had to specify installPrefix in order to generate relationships.

Describe the solution you'd like
My ideal solution is that the extractPaths are used as the installPath if an installPrefix is not explicitly stated. This would ensure that "Uses" relationships are generated by default. I would include the extractPaths in the generated installPath of the SBOM unless overridden by the installPrefix.

This solution would break existing behavior in two ways:

  1. Generate relationships when they used to not.
  2. Generate an installPath when it used to be an empty list.

Change 1 can be addressed by the user using --skip_relationships. Change 2 cannot be easily addressed and I believe that is still acceptable. The current implementation generates no installPath information if an installPrefix is not provided.
That makes it so that file system hierarchical information is not collected (i.e. the generated SBOM appears to exist in a flat file system). I don't believe that this is especially useful by default and users would rather collect that information without additional configuration.

Describe alternatives you've considered
One alternative solution would be to use the extract path solely for the sake of calculating relationships but to not save it as the installPath. I believe this would be a more complicated solution as current relationship generating methods appear to rely on the installPath field in the software object.

Another solution is to simply update the documentation to better reflect that an installPrefix must be provided to ensure that "Uses" relationships are captured.

Additional context
If you run Surfactant by passing an extract path directly through the CLI (instead of using a config file), then the extract path is automatically used as the installPrefix and relationships are generated. This behavior matches my proposed solution.

I have found that my documentation of existing behavior is consistent on Windows and Linux for PE and ELF binaries, respectively.

The existing documentation does seem to imply the existing behavior is expected. Example 1 in the README demonstrates a bare config file with no installPrefix, and the generated SBOM has no relationships. Example 2 then shows a config file with an installPrefix, and the relationships are in fact generated. If this behavior is expected, though, I would suggest stating it more explicitly.

Explore recognizing JavaScript libraries

retire.js is a vulnerability scanner for JavaScript, and has a file with a collection of patterns for recognizing various JavaScript libraries and their versions.

  1. Look into what it would take to identify a file as JavaScript (or HTML/CSS)
  2. Add a method to recognize specific JavaScript libraries and their versions using a database, either from retire.js or another tool that has a similar collection of patterns that we can use as a starting point for pattern matching.
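Step 2 amounts to matching file contents against a table of version-capturing regexes. The sketch below uses an illustrative, hypothetical pattern table (not retire.js's actual data or format) in which each regex captures the version in group 1; identify_js_libraries is likewise a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# Illustrative pattern table: each regex must capture the library version in group 1
VERSION = r"([0-9][0-9.a-z_\-]+)"
PATTERNS: Dict[str, List[str]] = {
    "jquery": [r"/\*!? jQuery v" + VERSION],
    "bootstrap": [r"Bootstrap\s+v" + VERSION],
}


def identify_js_libraries(source: str) -> List[Tuple[str, str]]:
    """Return (library, version) pairs whose patterns match the file contents."""
    found: List[Tuple[str, str]] = []
    for library, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, source)
            if match:
                found.append((library, match.group(1)))
                break  # one match per library is enough
    return found
```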

Surfactant crash when processing files: can't decode byte 0xc4 in position 35

Describe the bug
I installed the latest rc2, ran generate over files that I know crash Surfactant, and observed a crash.

To Reproduce
Steps to reproduce the behavior:

  1. Install Surfactant 0.0.0rc2
  2. surfactant generate ./_modules.extracted.crashes.surfactant test.sbom.json

Expected behavior
An SBOM is generated.

Crash Output
└─$ surfactant generate ./_modules.extracted.crashes.surfactant test.sbom.json
2024-01-25 08:45:30.479 | WARNING | surfactant.cmd.generate:sbom:284 - Fixing install path
2024-01-25 08:45:30.479 | INFO | surfactant.cmd.generate:sbom:293 - Processing ./_modules.extracted.crashes.surfactant
Traceback (most recent call last):
  File "/home/user/surfactant/bin/surfactant", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/surfactant/cmd/generate.py", line 347, in sbom
    if ftype := pm.hook.identify_file_type(filepath=filepath):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/pluggy/_callers.py", line 138, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/home/user/surfactant/lib/python3.11/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/surfactant/lib/python3.11/site-packages/surfactant/filetypeid/id_hex.py", line 82, in identify_file_type
    curr = f.readline()
           ^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 35: invalid continuation byte

└─$ surfactant version
0.0.0rc2

System Information (please complete the following information):

  • OS: Debian 6.1.20-1

Additional context
I can't share the files, but this is the head of the first one in the directory. I'm not sure whether this file is the one causing the crash; if Surfactant reported which file it crashed on, I could examine that file further.
xxd 324C016 | head
00000000: 0000 0c48 4c69 6e6f 0210 0000 6d6e 7472  ...HLino....mntr
00000010: 5247 4220 5859 5a20 07ce 0002 0009 0006  RGB XYZ ........
00000020: 0031 0000 6163 7370 4d53 4654 0000 0000  .1..acspMSFT....
00000030: 4945 4320 7352 4742 0000 0000 0000 0000  IEC sRGB........
00000040: 0000 0001 0000 f6d6 0001 0000 0000 d32d  ...............-
00000050: 4850 2020 0000 0000 0000 0000 0000 0000  HP  ............
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0011 6370 7274 0000 0150 0000 0033  ....cprt...P...3
00000090: 6465 7363 0000 0184 0000 006c 7774 7074  desc.......lwtpt

I put that single file in its own directory and reran Surfactant. It ran, but produced an empty output:
└─$ cp ./_modules.extracted.crashes.surfactant/324C016 ./crash1

└─$ surfactant generate ./crash1 test.sbom.json
2024-01-25 08:55:14.539 | WARNING | surfactant.cmd.generate:sbom:284 - Fixing install path
2024-01-25 08:55:14.539 | INFO | surfactant.cmd.generate:sbom:293 - Processing ./crash1

└─$ cat test.sbom.json
{
  "systems": [],
  "hardware": [],
  "software": [],
  "relationships": [],
  "analysisData": [],
  "observations": [],
  "starRelationships": []
}

└─$ ls ./crash1
324C016

└─$ file ./crash1/324C016
./crash1/324C016: Microsoft color profile 2.1, type Lino, RGB/XYZ-mntr device, IEC/sRGB model by HP, 3144 bytes, 9-2-1998 6:49:00, relative colorimetric "sRGB IEC61966-2.1"

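The traceback shows id_hex.py calling f.readline() on a file opened in text mode, so UTF-8 decoding fails on binary input such as this ICC color profile. One way to avoid that is to read bytes and match with bytes regexes, as in the sketch below. The function name, return labels, and record patterns are assumptions for illustration (loosely, Intel HEX records start with ':' and Motorola S-records with 'S' plus a digit), not Surfactant's actual detection rules.

```python
import re
from typing import Optional

# Loose first-line shapes: Intel HEX ":<hex digits>", Motorola S-record "S<digit><hex digits>"
_INTEL_HEX = re.compile(rb"^:[0-9A-Fa-f]{10,}$")
_SREC = re.compile(rb"^S[0-9][0-9A-Fa-f]{6,}$")


def hex_record_format(path: str) -> Optional[str]:
    # Binary mode: there is no decoding step, so arbitrary bytes cannot raise UnicodeDecodeError
    with open(path, "rb") as f:
        first_line = f.readline(1024).rstrip(b"\r\n")
    if _INTEL_HEX.match(first_line):
        return "INTEL_HEX"
    if _SREC.match(first_line):
        return "MOTOROLA_SREC"
    return None
```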

Create diagrams showing how Surfactant works

Create one or more diagrams giving an overview of how Surfactant works, for overview presentations and documentation. An Asciinema recording showing example use of Surfactant could also be useful for the docs.

Implement cross-platform POSIX path normalization function for establishing ELF relationships

The ELF relationships code currently uses os.path.normpath to cleanup and normalize paths, however the behavior of this function is different on Windows than POSIX systems tested (including macOS). This leads to the unit tests for ELF relationships failing on Windows systems.

The description of the Clean function in Go describes a normalization function that would be suitable for this, except that a "//" at the start of a path should remain intact (as it does with os.path.normpath on POSIX systems). A few examples of this special behavior:

os.path.normpath("/") -> "/"
os.path.normpath("//") -> "//"
os.path.normpath("///") -> "/"
os.path.normpath("////") -> "/"
os.path.normpath("/..") -> "/"
os.path.normpath("/../") -> "/"
os.path.normpath("//../") -> "//"
os.path.normpath("//a//b/c///d") -> "//a/b/c/d"
os.path.normpath("///a//b/c///d") -> "/a/b/c/d"

The Python docs for the os.path.normpath function can be found at https://docs.python.org/3/library/os.path.html#os.path.normpath and they link to the IEEE Standard on pathname resolution

This fix will also make CI tests pass for #7.
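A pure-string sketch of such a function follows: Go's Clean rules (collapse repeated slashes, drop "." elements, let ".." cancel the preceding element, drop ".." at the root, return "." for an empty result) plus the leading "//" exception, which POSIX permits because a path starting with exactly two slashes may be interpreted in an implementation-defined way. normalize_posix_path is a hypothetical name. Because it never calls os.path, it behaves the same on Windows and POSIX; a simpler alternative may be calling posixpath.normpath directly, since the posixpath module can be imported on every platform.

```python
def normalize_posix_path(path: str) -> str:
    """Go-style Clean() for POSIX paths that keeps a leading "//" like posixpath.normpath."""
    if not path:
        return "."
    # Exactly two leading slashes are preserved; one, or three or more, collapse to "/"
    if path.startswith("//") and not path.startswith("///"):
        prefix = "//"
    elif path.startswith("/"):
        prefix = "/"
    else:
        prefix = ""
    parts = []
    for comp in path.split("/"):
        if comp in ("", "."):
            continue  # repeated slashes and "." elements disappear
        if comp == "..":
            if parts and parts[-1] != "..":
                parts.pop()  # ".." cancels the preceding real element
            elif not prefix:
                parts.append("..")  # leading ".." is kept in relative paths
            continue  # ".." at the root of an absolute path is dropped
        parts.append(comp)
    return (prefix + "/".join(parts)) or "."
```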

Error handling vmlinux files

Surfactant crashes with an IndexError when attempting to process linux kernel boot images (vmlinux) as it incorrectly assumes that they are Windows PE files and then fails when processing the non-existent 'optional' header.

To reproduce:

  • run surfactant generate on a container that has a linux kernel boot image in COFF format

The root cause is that filetypeid/id_magic.py assumes any file starting with the magic bytes MZ is a PE file, when MZ really denotes a DOS EXE file, which PE is built on. The COFF Linux boot image format starts with the same MZ, but notably lacks the standard DOS stub message 'This program cannot be run in DOS mode' and does not have the Windows-specific 'optional' header. A more robust check for PE files could look for that message, or for the optional header located after the DOS stub and standard COFF headers.

This is system-agnostic.
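Following the issue's suggestion, a stricter check can follow e_lfanew (the 32-bit little-endian value at offset 0x3C of the DOS header) to the "PE\0\0" signature, then require a nonzero SizeOfOptionalHeader and a PE32 (0x10B) or PE32+ (0x20B) optional header magic. This is a sketch; is_pe_file is a hypothetical helper, it skips the DOS stub message check, and kernels built with an EFI stub do carry a real PE header and would still pass.

```python
import struct


def is_pe_file(data: bytes) -> bool:
    """Stricter PE check than the "MZ" magic alone (hypothetical helper)."""
    # DOS header: starts with "MZ"; e_lfanew, the PE header offset, is a uint32 at 0x3C
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    coff_start = pe_offset + 4
    # Need the "PE\0\0" signature, the 20-byte COFF header, and a 2-byte optional-header magic
    if len(data) < coff_start + 22 or data[pe_offset:coff_start] != b"PE\x00\x00":
        return False
    # SizeOfOptionalHeader sits at offset 16 of the COFF header; 0 means no optional header
    (optional_size,) = struct.unpack_from("<H", data, coff_start + 16)
    if optional_size < 2:
        return False
    (magic,) = struct.unpack_from("<H", data, coff_start + 20)
    return magic in (0x10B, 0x20B)  # PE32 or PE32+
```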

Crash on no StringTable attribute

Describe the bug
When running Surfactant on a particular binary it crashed due to an object not having a StringTable attribute.
The binary was the "uninstall.exe" binary generated by IDA Freeware 8.3.

To Reproduce
Steps to reproduce the behavior:

  1. Run Surfactant on IDA Freeware 8.3 directory
  2. Surfactant crashes upon attempting to analyze "uninstall.exe"

Expected behavior
Surfactant to run without crashing on the specified binary.

Screenshots
Stack trace:

Traceback (most recent call last):
  File "/home/doug/projects/Surfactant/venv/bin/surfactant", line 8, in <module>
    sys.exit(main())
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/doug/projects/Surfactant/surfactant/cmd/generate.py", line 267, in sbom
    get_software_entry(
  File "/home/doug/projects/Surfactant/surfactant/cmd/generate.py", line 43, in get_software_entry
    extracted_info_results = pluginmanager.hook.extract_file_info(
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 433, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/pluggy/_manager.py", line 112, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 116, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/home/doug/projects/Surfactant/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 80, in _multicall
    res = hook_impl.function(*args)
  File "/home/doug/projects/Surfactant/surfactant/infoextractors/pe_file.py", line 30, in extract_file_info
    return extract_pe_info(filename)
  File "/home/doug/projects/Surfactant/surfactant/infoextractors/pe_file.py", line 143, in extract_pe_info
    for st in fi_entry.StringTable:
AttributeError: 'Structure' object has no attribute 'StringTable'

System Information (please complete the following information):

  • OS: Windows 10 / Ubuntu 20.04 using WSL2

Additional context
PR to follow.
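One way to avoid the crash is to skip any FileInfo structure that has no StringTable attribute. In pefile, for example, VarFileInfo structures typically carry a Var list rather than a StringTable. The sketch below assumes pefile's layout (pe.FileInfo as a list of lists of structures, each StringTable exposing an entries dict of bytes) and uses SimpleNamespace stand-ins so it runs without pefile. collect_version_strings is a hypothetical helper, not the code in pe_file.py.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def _to_str(value: Any) -> str:
    # pefile stores version-info keys/values as bytes; decode defensively.
    return value.decode("utf-8", errors="replace") if isinstance(value, bytes) else str(value)


def collect_version_strings(file_info: Iterable[Any]) -> Dict[str, str]:
    """Merge version-info strings, skipping structures without a StringTable.

    Assumes a pefile-like layout: items are structures or lists of structures,
    and only StringFileInfo structures have a `StringTable` attribute.
    """
    results: Dict[str, str] = {}
    for group in file_info:
        # Newer pefile releases nest FileInfo as a list of lists.
        entries = group if isinstance(group, list) else [group]
        for fi_entry in entries:
            # getattr with a default avoids the AttributeError from the traceback.
            for st in getattr(fi_entry, "StringTable", []):
                for key, value in st.entries.items():
                    results[_to_str(key)] = _to_str(value)
    return results


# Stand-ins for pefile structures: one with a StringTable, one without.
string_info = SimpleNamespace(StringTable=[SimpleNamespace(entries={b"ProductName": b"IDA"})])
var_info = SimpleNamespace(Var=[])

print(collect_version_strings([[string_info, var_info]]))  # {'ProductName': 'IDA'}
```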

Support dynamic list of files/folders to analyze

  • Rename config to context?
  • Create dataclass for context entries for files/folders to process
  • Propagate context to extract_file_info hook so analysis plugins can add new files/folders to the list of things to analyze
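A minimal sketch of the idea, assuming a work queue: a dataclass describes each set of files/folders to analyze, and the extraction callback receives the queue so it can append newly discovered entries (for example, the contents of an unpacked archive). ContextEntry, process_all, and fake_extract are hypothetical names; the fields mirror existing config keys (extractPaths, archive, installPrefix), but the real design may differ.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class ContextEntry:
    """Hypothetical description of one set of files/folders to analyze."""
    extractPaths: List[str]
    archive: Optional[str] = None
    installPrefix: Optional[str] = None


def process_all(
    initial: List[ContextEntry],
    extract: Callable[[ContextEntry, Deque[ContextEntry]], None],
) -> List[ContextEntry]:
    """Process entries until none remain; `extract` may append newly found entries."""
    queue: Deque[ContextEntry] = deque(initial)
    processed: List[ContextEntry] = []
    while queue:
        entry = queue.popleft()
        extract(entry, queue)  # stands in for the extract_file_info hook receiving the context
        processed.append(entry)
    return processed


def fake_extract(entry: ContextEntry, queue: Deque[ContextEntry]) -> None:
    # Pretend that unpacking A.zip produced a new directory to analyze.
    if entry.archive == "A.zip":
        queue.append(ContextEntry(extractPaths=["/tmp/A_extracted"]))


done = process_all([ContextEntry(extractPaths=["/tmp/A"], archive="A.zip")], fake_extract)
print([e.extractPaths for e in done])  # [['/tmp/A'], ['/tmp/A_extracted']]
```

Draining the queue until it is empty means entries added during analysis are handled in the same run, in breadth-first order.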

Container path may be missing path components

When creating an SBOM from pieces of a container, the files may appear to be in a flat file system within the container.

e.g. A.zip has a file under usr/bin/fileA

An SBOM is generated with a config file that lists:
archive = "A.zip"
extractPaths = "some-directory-with-only-usr/bin-files"

The resulting SBOM should give fileA a container path of <parent_uuid>/usr/bin/fileA, but it actually has just <parent_uuid>/fileA. This could potentially be inferred to some degree from the provided installPrefix, but MSI/EXE installers get complicated (are they internally just a bunch of blobs of file data?).
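The expected container path can be viewed as three parts: the parent UUID, the location of the extracted directory inside the archive, and the file's path relative to that directory. The middle part is what the current output drops. In this sketch, container_path is a hypothetical helper and prefix_in_archive is a hypothetical parameter that would have to be supplied in the config or inferred (e.g. from installPrefix); "1234" stands in for a real UUID.

```python
from pathlib import PurePosixPath


def container_path(parent_uuid: str, extract_root: str, file_path: str, prefix_in_archive: str = "") -> str:
    """Build '<parent_uuid>/<prefix_in_archive>/<path relative to extract_root>'.

    prefix_in_archive is a hypothetical input: where extract_root lives inside the archive.
    """
    rel = PurePosixPath(file_path).relative_to(extract_root)
    # pathlib ignores empty segments, so a missing prefix collapses cleanly.
    return str(PurePosixPath(parent_uuid, prefix_in_archive, rel))


print(container_path("1234", "/tmp/only-bin", "/tmp/only-bin/fileA"))             # 1234/fileA
print(container_path("1234", "/tmp/only-bin", "/tmp/only-bin/fileA", "usr/bin"))  # 1234/usr/bin/fileA
```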

Add output writer plugin for Sourcetrail DB format

Add an output writer plugin (optional, requiring an extra install step) that uses https://github.com/CoatiSoftware/Sourcetraildb (or targets the SQLite schema that the library writes) to output relationships in a file that Sourcetrail can load for visualization. See https://github.com/quarkslab/pyrrha?tab=readme-ov-file#visualization-with-sourcetrail for an idea of how relationships could be mapped to the Sourcetrail format.

Adding support for fuzzy dependency name matching with regex

Extracting library dependency names from binary metadata provides a "base" name to check. Variations of this name are used to find the dependency file, or the name may be a symlink to a file whose name is a variation of the base name.

For example, "libname" might end up linking with "libname.so.4" (or any other version number) when the program actually runs. In this case, one check could match file names against a pattern such as libname\.so\.[0-9]+. Name variations across different systems need other matching rules.
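A minimal sketch of the Linux shared-library case: the escaped base name is followed by .so and any number of numeric version components. The function name and the accepted patterns are assumptions; other platforms (e.g. macOS names like libname.4.dylib) would need their own rules.

```python
import re
from typing import Iterable, List


def fuzzy_lib_matches(base_name: str, candidates: Iterable[str]) -> List[str]:
    """Return candidate file names that are versioned variants of base_name.

    Matches e.g. 'libname.so', 'libname.so.4', 'libname.so.4.2.1'.
    Only the common ELF soname convention is covered.
    """
    stem = base_name[:-3] if base_name.endswith(".so") else base_name
    # re.escape keeps dots or '+' in library names from acting as regex syntax.
    pattern = re.compile(re.escape(stem) + r"\.so(\.[0-9]+)*")
    return [c for c in candidates if pattern.fullmatch(c)]


names = ["libname.so.4", "libname.so", "libname.so.4.2.1", "libnamex.so.4", "libname.so.abc"]
print(fuzzy_lib_matches("libname", names))  # ['libname.so.4', 'libname.so', 'libname.so.4.2.1']
```

Using fullmatch rather than search keeps near-misses such as libnamex.so.4 from matching.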

Plugin to get information from Linux Kernel images

As discussed in #28, it would be nice to implement a plugin that gets information from Linux kernel images. The filetype magic for this is already in place (though it does not detect 'old' boot images from before Linux 1.3.73*), so all that is needed is a plugin that extracts the relevant data from the header and outputs it in the appropriate format.

The documentation on the kernel image structure can be found here: https://www.kernel.org/doc/Documentation/x86/boot.txt

The **** THE REAL-MODE KERNEL HEADER section gives an overview of the layout, and more detailed descriptions of each field follow later in the document.

* Based on what the linux file utility does (https://github.com/file/file/blob/master/magic/Magdir/linux#L137), the check for older kernels could be implemented by checking for the string 'Loading' at offset 0x1e3 or 0x1e9. This is left as an exercise for the reader :)
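A minimal sketch of reading a few header fields, using offsets from the boot protocol document: setup_sects at 0x1F1, boot_flag at 0x1FE, the "HdrS" signature at 0x202, the protocol version at 0x206, and kernel_version at 0x20E (a pointer to the version string, less 0x200). The output key names are hypothetical, and images without "HdrS" (including the pre-1.3.73 ones above) are simply rejected.

```python
import struct
from typing import Any, Dict, Optional


def parse_kernel_header(data: bytes) -> Optional[Dict[str, Any]]:
    """Read a few real-mode header fields from an x86 kernel image."""
    if len(data) < 0x210 or data[0x202:0x206] != b"HdrS":
        return None  # no boot protocol >= 2.00 header
    (boot_flag,) = struct.unpack_from("<H", data, 0x1FE)
    (protocol,) = struct.unpack_from("<H", data, 0x206)
    (kver_ptr,) = struct.unpack_from("<H", data, 0x20E)
    info: Dict[str, Any] = {
        "setupSects": data[0x1F1],
        "bootFlag": hex(boot_flag),
        # e.g. 0x020F is written as protocol version 2.15
        "protocolVersion": f"{protocol >> 8}.{protocol & 0xFF:02d}",
    }
    if kver_ptr:
        # kernel_version holds the string's offset minus 0x200.
        start = kver_ptr + 0x200
        end = data.find(b"\x00", start)
        raw = data[start:end] if end != -1 else data[start:]
        info["kernelVersion"] = raw.decode("ascii", errors="replace")
    return info


# Build a tiny synthetic header just to exercise the parser.
buf = bytearray(0x400)
buf[0x1FE:0x200] = b"\x55\xaa"
buf[0x202:0x206] = b"HdrS"
struct.pack_into("<H", buf, 0x206, 0x020F)
struct.pack_into("<H", buf, 0x20E, 0x100)
buf[0x300:0x30D] = b"6.1.0 (test)\x00"
print(parse_kernel_header(bytes(buf)))
```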

CycloneDX output writer plugin is missing short name

The CycloneDX output writer plugin doesn't set its short name to cyclonedx.

While fixing this, the short-name handling code should also be checked to see whether different types of plugins can share the same short name (e.g. both a reader and a writer plugin).
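For the second point, a check over all registered plugins could flag short names claimed by more than one plugin. This sketch assumes the plugins have already been reduced to (plugin type, short name) pairs; gathering those from the plugin manager is left out, and find_duplicate_short_names is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_duplicate_short_names(plugins: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Map each short name used more than once to the plugin types that use it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for plugin_type, short_name in plugins:
        if short_name:  # plugins without a short name cannot collide
            owners[short_name].append(plugin_type)
    return {name: types for name, types in owners.items() if len(types) > 1}


registered = [("output", "cyclonedx"), ("input", "cyclonedx"), ("output", "spdx"), ("output", None)]
print(find_duplicate_short_names(registered))  # {'cyclonedx': ['output', 'input']}
```

Whether a reader and a writer sharing a name is an error, or should instead be resolved by plugin type, is the open question the check would surface.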

Generate API pages for documentation website

Once functions have docstrings, it should be possible to generate API pages for the documentation website -- this will be useful for users looking to use Surfactant as a library. Adding a page with a brief intro to library/programmatic use could also be useful.
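Assuming the documentation site is built with Sphinx, API pages can be generated from docstrings with the autodoc extension (plus napoleon if the docstrings use Google or NumPy style). A minimal conf.py fragment follows; the option values are illustrative choices, not the project's current settings.

```python
# docs/conf.py (fragment)
extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # parse Google/NumPy-style docstrings
]

# Document members by default and keep source order, so automodule directives stay short.
autodoc_default_options = {
    "members": True,
    "member-order": "bysource",
}
```

Stub pages for each module could then be generated with sphinx-apidoc -o docs/api surfactant (the docs/api output directory is an assumption).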

MSI file extraction with accurate install paths

Currently the only accurate way to get install paths for files installed by an MSI file is to run it in a VM. Third-party tools for extracting the files don't preserve accurate install path information, and they are written in compiled languages, so they may not be portable across all platforms when they rely on e.g. Windows-only APIs.

We should be able to port functionality from those tools to a pure Python module that can dump installed files with enough information on installation paths for creating an accurate SBOM.
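Install paths in an MSI are largely determined by its Directory table. Each row names a parent directory and a DefaultDir value, which may hold target:source and short|long forms, with "." meaning the parent directory itself. The following is a minimal sketch of resolving a directory key to a path by walking the parent chain. It assumes the table has already been read into a dict and that the caller maps standard folders such as ProgramFilesFolder to concrete paths. It ignores property overrides, SetDirectory custom actions, and other install-time behavior, and resolve_directory is a hypothetical name.

```python
import ntpath
from typing import Dict, List, Optional, Tuple

# Directory table row: (Directory_Parent, DefaultDir)
DirectoryTable = Dict[str, Tuple[Optional[str], str]]


def _target_long_name(default_dir: str) -> str:
    # DefaultDir may be "target:source"; only the target matters for install paths.
    target = default_dir.split(":", 1)[0]
    # Each name may be "short|long"; prefer the long file name.
    return target.split("|", 1)[-1]


def resolve_directory(table: DirectoryTable, key: str, known_folders: Dict[str, str]) -> str:
    """Walk Directory_Parent links from `key` up to a known folder or the root."""
    parts: List[str] = []
    current: Optional[str] = key
    while current is not None:
        if current in known_folders:
            parts.append(known_folders[current])  # caller-supplied concrete path
            break
        parent, default_dir = table[current]
        if parent is None or parent == current:
            break  # root row (usually TARGETDIR), resolved at install time
        name = _target_long_name(default_dir)
        if name != ".":  # "." means "same directory as the parent"
            parts.append(name)
        current = parent
    return ntpath.join(*reversed(parts)) if parts else ""


table: DirectoryTable = {
    "TARGETDIR": (None, "SourceDir"),
    "ProgramFilesFolder": ("TARGETDIR", "PFiles"),
    "INSTALLDIR": ("ProgramFilesFolder", "MYAPP~1|My App"),
    "BinDir": ("INSTALLDIR", "bin"),
}
print(resolve_directory(table, "BinDir", {"ProgramFilesFolder": r"C:\Program Files"}))
```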

Use logging library instead of prints

Is your feature request related to a problem? Please describe.
Using print doesn't give any control over how much gets printed out, or any indication of how important the information shown is (info/debug/warn/error/etc). Switching to a logging library could allow a user to decide how much information they want Surfactant to show when running, include timestamps, and also enable outputting information to a log file.

Describe the solution you'd like
Look into what Python logging libraries are available, and compare their features. Two that come to mind are logging (which comes with the Python standard library) and loguru, but there are likely several others that could be considered. After picking a library, replace all calls to print with calls to the logging library instead.
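As one option, here is a minimal sketch using the standard library logging module: a single setup function controls verbosity and an optional log file, and timestamps come from the formatter. The logger names are assumptions; in practice each module would typically call logging.getLogger(__name__).

```python
import logging
from typing import List, Optional


def setup_logging(level: str = "INFO", log_file: Optional[str] = None) -> logging.Logger:
    """Configure the package logger with a chosen verbosity and an optional log file."""
    logger = logging.getLogger("surfactant")
    logger.setLevel(level.upper())  # accepts names such as "DEBUG" or "WARNING"
    logger.handlers.clear()  # avoid duplicated output if called more than once
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers: List[logging.Handler] = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Each module would get a child logger instead of calling print().
log = logging.getLogger("surfactant.infoextractors.pe_file")

setup_logging("debug")
log.debug("shown because the level is DEBUG")
log.info("also shown; at WARNING level neither line would appear")
```

Child loggers propagate to the package logger, so the level and handlers are configured in one place.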

Add support for Mach-O files

This may be a bit of a larger task, but should be relatively straightforward. Look into the Mach-O file format, and extract whatever information could be interesting (see the ELF and PE file extraction code for ideas).

macOS/BSD might both benefit from support for Mach-O files.

In addition to the above, other things that could be looked into include:

  • macOS also uses directories ending in .app as a container for programs, which should have some additional metadata files (*.plist perhaps?), in addition to .pkg installers potentially having information of interest
  • BSD distributions might have their own packaging file format that contains metadata that could be extracted
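A possible starting point is decoding the fixed mach_header that begins every thin Mach-O file: a magic number that also reveals byte order and 32/64-bit layout, followed by cputype, cpusubtype, filetype, ncmds, sizeofcmds, and flags. The dependency information analogous to ELF DT_NEEDED entries would come from LC_LOAD_DYLIB load commands that follow the header, which this sketch does not parse. The function name and output keys are hypothetical.

```python
import struct
from typing import Any, Dict, Optional

# First four bytes of a thin Mach-O file -> (struct byte order, is 64-bit).
_MAGICS = {
    b"\xfe\xed\xfa\xce": (">", False),  # 0xfeedface stored big-endian
    b"\xce\xfa\xed\xfe": ("<", False),  # 0xfeedface stored little-endian
    b"\xfe\xed\xfa\xcf": (">", True),   # 0xfeedfacf stored big-endian
    b"\xcf\xfa\xed\xfe": ("<", True),   # 0xfeedfacf stored little-endian (x86_64, arm64)
}
_CPU_TYPES = {7: "x86", 0x01000007: "x86_64", 12: "arm", 0x0100000C: "arm64"}
_FILE_TYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE", 6: "MH_DYLIB", 8: "MH_BUNDLE"}


def parse_macho_header(data: bytes) -> Optional[Dict[str, Any]]:
    """Decode the fixed mach_header fields; returns None for non-Mach-O input."""
    if len(data) < 28 or data[:4] not in _MAGICS:
        # Fat/universal binaries (0xcafebabe) would need their own parser.
        return None
    endian, is_64 = _MAGICS[data[:4]]
    cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = struct.unpack_from(endian + "6I", data, 4)
    return {
        "is64Bit": is_64,
        "cpuType": _CPU_TYPES.get(cputype, hex(cputype)),
        "cpuSubtype": cpusubtype,
        "fileType": _FILE_TYPES.get(filetype, hex(filetype)),
        "numLoadCommands": ncmds,
        "sizeOfLoadCommands": sizeofcmds,
        "flags": hex(flags),
        # Load commands start right after the 28-byte (32-bit) or 32-byte (64-bit) header.
        "loadCommandsOffset": 32 if is_64 else 28,
    }


# Synthetic 64-bit little-endian header: arm64 executable with 15 load commands.
sample = b"\xcf\xfa\xed\xfe" + struct.pack("<6I", 0x0100000C, 0, 2, 15, 1000, 0) + bytes(4)
print(parse_macho_header(sample))
```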

Update CycloneDX library to v6

Version 6 of the CycloneDX Python library introduced a breaking change by removing a function. Update the CycloneDX writer/reader to work with the new version.

Add support for Docker Scout/Dockerfile information

Add support for adding information from Docker Scout output to an SBOM, and potentially also an attempt at parsing Dockerfile/layer information to get additional things that may have been built from source or installed without running a package manager.
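The Dockerfile half could start from a heuristic scan: join line continuations, then flag RUN/ADD/COPY instructions that appear to download or build software rather than use a package manager. The instruction handling below is deliberately simplistic (no heredocs, parser directives, or JSON-form arguments), and the keyword list is an assumption for illustration, not a complete rule set.

```python
import re
from typing import List, Tuple

# Hypothetical hints that a step fetches or builds software outside a package manager.
_SUSPECT = re.compile(r"\b(curl|wget|git clone|make|cmake|go build|cargo build)\b|https?://")


def dockerfile_instructions(text: str) -> List[Tuple[str, str]]:
    """Split a Dockerfile into (INSTRUCTION, arguments), joining backslash continuations."""
    logical: List[str] = []
    buf = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            buf += stripped[:-1] + " "
            continue
        logical.append(buf + stripped)
        buf = ""
    if buf:
        logical.append(buf)
    result = []
    for entry in logical:
        instr, _, args = entry.partition(" ")
        result.append((instr.upper(), args.strip()))
    return result


def suspect_steps(text: str) -> List[str]:
    """Return RUN/ADD/COPY arguments that may install software without a package manager."""
    return [
        args
        for instr, args in dockerfile_instructions(text)
        if instr in {"RUN", "ADD", "COPY"} and _SUSPECT.search(args)
    ]


example = r"""FROM ubuntu:22.04
RUN apt-get update && apt-get install -y gcc
RUN curl -L https://example.com/tool.tar.gz | tar xz \
    && cd tool && make install
COPY app /opt/app
"""
print(suspect_steps(example))
```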

Workflow UX enhancements for SBOMs from files in Windows installers

This past summer, work was done to explore automatically stepping through Windows installers in a VirtualBox VM and detecting which files the installer wrote: https://github.com/LLNL/Surfactant/tree/main/docs/windows_installer_tutorial

Currently it is a bit difficult to get set up and run, and due to trying to make it fully automatic there are aspects that were difficult to make reliable. Some ideas for improving it, and potentially making it more of an interactive tool are:

  • Make installation of the minifilter driver easier (installer/script?)
  • Add tool/script to simplify fresh VM setup
  • Write a GUI tool for manually running an installer in a VM. It would let the user pick the installer to run and handle starting/stopping the minifilter driver; show, in real time, the exact directories that files are written to; and flag temporary files that were deleted, with instructions for capturing those ephemeral files (consider an option to re-run the installer in the tool and either copy flagged files as soon as they are written, or intercept the delete event and copy the file before deletion/suppress the deletion)
  • Export the written file information as a valid Surfactant config file, maybe with all written files zipped up into a single archive
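For the export step, here is a minimal sketch. It assumes the captured files have been copied into one local directory that mirrors the layout under their common install location, and it emits a config entry using the extractPaths and installPrefix keys from Surfactant's config format. The helper name, the forward-slash form of the Windows prefix, and the capture-directory layout are assumptions.

```python
import json
import ntpath
from typing import Dict, List


def installer_config(written_files: List[str], capture_dir: str) -> List[Dict[str, object]]:
    """Build a one-entry config from Windows paths recorded during an install."""
    # Use the deepest directory shared by every written file as the install prefix.
    prefix = ntpath.commonpath([ntpath.dirname(p) for p in written_files])
    return [{
        "extractPaths": [capture_dir],
        "installPrefix": prefix.replace("\\", "/").rstrip("/") + "/",
    }]


files = [r"C:\Program Files\App\app.exe", r"C:\Program Files\App\lib\helper.dll"]
print(json.dumps(installer_config(files, "/data/capture/App"), indent=2))
```

Files written to several unrelated locations (for example ProgramData or another drive) would need one entry per location; ntpath.commonpath raises ValueError for paths on different drives.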
