iris-gmbh / meta-osselot

BitBake layer repository for integrating Osselot into the build process

License: MIT License

BitBake 96.42% Makefile 3.58%
bitbake copyright-analysis license-analysis license-compliance oss-compliance oss-package-analysis yocto yocto-layer bitbake-layer osselot

meta-osselot's Introduction

meta-osselot

This README file contains information on integrating Osselot into your openembedded build process using the meta-osselot layer.

Background: What is Osselot?

In recent years, open-source license compliance in commercial software has attracted growing interest. As a result, an increasing number of industry partners and customers are asking for an "open source clearance document" as a guarantee that the software in question does not violate any open-source (especially copyleft) license terms.

Creating such a clearance document is a multi-step process:

  1. You need to identify which packages are shipped within your product.
  2. You need to identify (curate) the various licenses and copyright notices within those packages. Depending on the requirements, this might have to happen on a per-file, rather than a per-package basis.
  3. You need to verify that all license obligations are adhered to.
  4. You need to provide this information to the customer in a suitable format.

If clearance is to be done on a per-file basis (a requirement that is becoming increasingly common), this poses a huge challenge, especially in the (open)embedded context, where entire custom-built firmware images need to be considered.

However, even though the firmware images are custom-built, the core components used in the underlying (Linux-based) operating systems are largely identical. Incidentally, these are the packages that require the most work during curation, due to their large codebase (it is an entire operating system after all). Therefore, instead of having every vendor do the same data curation work for these packages themselves, the idea of sharing and re-using curated license information within the community is only logical and fits the spirit of open source.

Enter Osselot, a relatively new project tackling this challenge. In a nutshell, Osselot is an open-source database of curated license information on various open source projects that is made available as a git repository. Additionally, Osselot provides documentation and tooling for re-using curation data when the package version and/or individual files diverge.

Osselot therefore helps cover step 2 (and to an extent step 4) of the open source clearance process.

Still, the question remains: How can we easily identify packages relevant for license compliance, make Osselot data available wherever possible, and identify divergences between source code cleared in Osselot and source code used in the openembedded build? This is where this layer, meta-osselot, comes into play.

How meta-osselot works

Meta-osselot integrates directly into the bitbake build process. It identifies every target-relevant package and attempts to find the package (in the best-matching version) as a JSON SPDX file within the Osselot database. If a suitable package is found, meta-osselot compares the file checksums for all source code within the "S" build directory against the available curation data. It is worth noting that since meta-osselot uses the Osselot git repository as its data source, you can easily replace the upstream repository with your own fork of the Osselot curation database, which lets you use your own curated data as well.

The results of this comparison, together with other meta-information and the available Osselot curation data, are then provided as build artefacts.
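
The comparison step described above can be illustrated with a minimal Python sketch. It is not the layer's actual implementation: the function names (file_checksum, spdx_checksums, compare) are hypothetical, and it assumes that the SPDX 2.x JSON document lists files under "files" with "fileName" and "checksums" entries, and that file names are relative to the source root.

```python
import hashlib
import os


def file_checksum(path, algorithm="md5"):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fb:
        for chunk in iter(lambda: fb.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spdx_checksums(spdx_doc, algorithm="md5"):
    """Map file name -> checksum for one algorithm in an SPDX 2.x JSON dict."""
    result = {}
    for entry in spdx_doc.get("files", []):
        name = entry["fileName"]
        # Assumption: a leading "./" is only a relative-path marker.
        if name.startswith("./"):
            name = name[2:]
        for checksum in entry.get("checksums", []):
            if checksum["algorithm"].lower() == algorithm:
                result[name] = checksum["checksumValue"].lower()
    return result


def compare(source_dir, spdx_doc, algorithm="md5"):
    """Sort files below source_dir into matched, mismatched and uncurated."""
    curated = spdx_checksums(spdx_doc, algorithm)
    result = {"matched": [], "mismatched": [], "uncurated": []}
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, source_dir).replace(os.sep, "/")
            if rel not in curated:
                result["uncurated"].append(rel)
            elif curated[rel] == file_checksum(path, algorithm):
                result["matched"].append(rel)
            else:
                result["mismatched"].append(rel)
    return {key: sorted(value) for key, value in result.items()}
```

Files that end up in "mismatched" or "uncurated" are the ones that would still need manual curation.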

Adding the meta-osselot layer to your build

Run bitbake-layers add-layer meta-osselot to add the core layer to your build.

If you are using kas, the configuration looks as follows:

repos:
  meta-osselot:
    url: "https://github.com/iris-GmbH/meta-osselot.git"
    branch: "<YOCTO_RELEASE>"
...

Enabling and configuring Osselot integration

To enable the Osselot integration, simply add INHERIT += "osselot" to your local.conf file. Now osselot tasks will automatically run for every package or image you build.

Alternatively, if you are only interested in the osselot output and not in building packages, add --runonly=populate_osselot to your bitbake command, e.g.:

bitbake core-image-minimal --runonly=populate_osselot

Meta-osselot can be configured via bitbake environment variables, either on a global or per recipe basis (or both).

Available configuration options are as follows.

Global configuration

Global configuration is done in your local.conf file or a distro configuration file.

Variable | Description | Default value
OSSELOT_HASH_ALGORITHM | The hash algorithm used when determining equivalence between source code and curation data | "md5"
OSSELOT_DEPLOY_DIR | The folder in which Osselot artifact data will be deployed | "${DEPLOY_DIR}/osselot"
OSSELOT_SRC_URI | The bitbake SRC_URI configuration for fetching curation data | "git://github.com/Open-Source-Compliance/package-analysis.git;protocol=https;branch=main"
OSSELOT_SRCREV | The revision of the curation data to use (default: latest) | "${AUTOREV}"
OSSELOT_PV | The package version of the curation data | "1.0+git${SRCPV}"
OSSELOT_IGNORE_LICENSES | Ignore packages with the listed licenses (whitespace separated) | "CLOSED"
OSSELOT_IGNORE_SOURCE_GLOBS | Globally ignore source code files in S whose paths match these globs (whitespace separated) | ".pc/**/* patches/series .git/**/*"
OSSELOT_IGNORE_PACKAGE_SUFFIXES | Ignore packages ending with one of the specified suffixes (whitespace separated) | "${SPECIAL_PKGSUFFIX}"

Per-recipe configuration

Per-recipe configuration is done in either the original *.bb recipe file or by appending to an existing (upstream) recipe using a *.bbappend file.

Variable | Description | Default value
OSSELOT_NAME | The name of this package within the Osselot database | "${BPN}"
OSSELOT_VERSION | The version of this package within the Osselot database | "${PV}"
OSSELOT_IGNORE | Set to "1" to ignore this recipe | "0"
OSSELOT_IGNORE_SOURCE_GLOBS | Within this recipe, ignore source code files in S whose paths match these globs (whitespace separated) | ".pc/**/* patches/series .git/**/*"
OSSELOT_HASH_EQUIVALENCE | Set source code hash equivalence of one or more hashes. Equal hashes are colon separated, statements are whitespace separated, e.g. "aaaa:bbbb:cccc dddd:ffff" | ""

Using meta-osselot

General recommendations

We recommend overwriting OSSELOT_SRC_URI with your own fork of the package-analysis repository, as this allows you to push and re-use your own curated data. However, remember to keep your fork up to date with the upstream repository to get the latest curation data.

When adding curation data to the git repository, be aware that meta-osselot makes the following assumptions. Curation data that does not meet them will not be identified:

  1. Available packages are subdirectories within the analysed-packages directory (depth is irrelevant). The package name equals the directory name.
  2. Available packages have one or more versions available as direct subdirectories within the package directory.
  3. These version directories start with the prefix "version-". The package version equals anything after this prefix.
  4. The version directory contains at least one valid SPDX file in JSON format.
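
The layout assumptions above can be sketched as a small lookup routine. This is only an illustration under stated assumptions: find_versions is a hypothetical helper, not a function of meta-osselot, and it treats any *.json file as an SPDX candidate without validating its content.

```python
import os

VERSION_PREFIX = "version-"


def find_versions(repo_root, package_name):
    """Return {version: [json paths]} for a package below analysed-packages."""
    base = os.path.join(repo_root, "analysed-packages")
    versions = {}
    # Rule 1: the package directory may sit at any depth.
    for root, dirs, _files in os.walk(base):
        if os.path.basename(root) != package_name:
            continue
        # Rules 2 and 3: versions are direct "version-*" subdirectories.
        for entry in sorted(dirs):
            if not entry.startswith(VERSION_PREFIX):
                continue
            version_dir = os.path.join(root, entry)
            # Rule 4 (simplified): require at least one JSON file.
            json_files = sorted(
                os.path.join(version_dir, name)
                for name in os.listdir(version_dir)
                if name.endswith(".json")
            )
            if json_files:
                versions[entry[len(VERSION_PREFIX):]] = json_files
    return versions
```

A version directory without any JSON file is skipped, which mirrors the statement that such curation data would not be identified.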

We also recommend upstreaming your curation data to the Osselot project. This adds an additional layer of quality control and, in the spirit of open source, improves the Osselot database by making your curation data available to others.

Overriding package names and versions for Osselot

There might be false negatives when matching packages against the Osselot data folder structure due to mismatches in name or version formatting between the recipe and the Osselot database.

For example, within the openembedded-core recipe expat the name of the package is "expat" and the version is "2.5.0". In Osselot however, the same package is named "libexpat" and the version is "R_2_5_0".

In these cases OSSELOT_NAME and/or OSSELOT_VERSION need to be overwritten, either within the recipe itself for your own custom layers, or in a matching .bbappend file if the mismatch occurs within an upstream layer. In the latter case, please open an issue in this project's issue tracker, or contribute a patch that adds the .bbappend file for the appropriate layer to the bbappend folder of this repository, so that we can fix this for everyone.

Ignoring openembedded packages

By default, meta-osselot already excludes packages where one of the following is true:

  1. The package name contains a non-target suffix, i.e. it will not end up in the target product (see SPECIAL_PKGSUFFIX which is the default value for OSSELOT_IGNORE_PACKAGE_SUFFIXES)
  2. The package does not have any source code (the S folder does not exist)
  3. The recipe LICENSE is set to "CLOSED" (see OSSELOT_IGNORE_LICENSES variable)
  4. The recipe has OSSELOT_IGNORE set to "1" (see preconfigured bbappends within meta-osselot)

You may globally append to the OSSELOT_IGNORE_LICENSES variable if you want to exclude recipes based on other licenses (e.g. your custom-defined company license).

Also, you may set OSSELOT_IGNORE = "1" within any recipe *.bb or *.bbappend file, if you wish to ignore this package.

Ignoring source files

Caution

Always be cautious when ignoring source file globs on a global level: you might end up ignoring files that are relevant!

There are valid use-cases for ignoring source files from a license compliance perspective.

One example is when you are confident that these files do not end up in the final product. Adding these files to the ignore list helps reduce the diff when versions do not match.

Another valid case is release tarballs. Openembedded recipes sometimes use software release tarballs (e.g. GitHub releases) rather than the corresponding git source code (which is used during curation in Osselot). These release tarballs might contain additional files (e.g. pre-generated configure files) that are not relevant from a license compliance perspective.

Append to the OSSELOT_IGNORE_SOURCE_GLOBS variable if you want to ignore a file (or glob), e.g.: OSSELOT_IGNORE_SOURCE_GLOBS += "tests/**/*".
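
To preview which files a set of globs would exclude before adding them, a sketch like the following may help. It assumes that the globs behave like Python's pathlib globbing relative to S, which is an assumption about the layer's matching semantics rather than a documented guarantee; ignored_files is a hypothetical helper name.

```python
from pathlib import Path


def ignored_files(source_dir, globs):
    """Return relative POSIX paths of files below source_dir matched by any glob.

    `globs` is a whitespace-separated string, like OSSELOT_IGNORE_SOURCE_GLOBS.
    """
    root = Path(source_dir)
    ignored = set()
    for pattern in globs.split():
        for match in root.glob(pattern):
            # Only files carry checksums, so directories are skipped.
            if match.is_file():
                ignored.add(match.relative_to(root).as_posix())
    return ignored
```

Running this against an unpacked source tree with the default globs plus your additions shows exactly what would be dropped from the comparison, which is useful given the caution above.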

Defining source code file hash equivalence

There might be cases of source code differences between two versions of the same file, e.g. when applying patches via bitbake recipes. In most of these cases, the changes to these files are not relevant for license compliance. You can then use the OSSELOT_HASH_EQUIVALENCE variable to define equivalences between two or more hashes, e.g.:

# first hash equivalence statement
OSSELOT_HASH_EQUIVALENCE += "bcb82bc370eb937e3b310e98d20aa906:d6b29fc355b6ab0f9f4bb9d4e03e9304:cb31a703b96c1ab2f80d164e9676fe7d"
# second hash equivalence statement
OSSELOT_HASH_EQUIVALENCE += "d3b07384d113edec49eaa6238ad5ff00:c157a79031e1c40f85931829bc5fc552"

If an openembedded package source code checksum mismatches the corresponding SPDX checksum entry, meta-osselot will evaluate all available hash equivalence statements. If both the source code checksum as well as the SPDX checksum are available within the same hash equivalence statement, the file will be marked accordingly in the output.
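
The evaluation described above can be sketched in a few lines of Python. The function names are hypothetical and the sketch only mirrors the documented format (colon-separated hashes, whitespace-separated statements); it is not taken from the layer's source.

```python
def parse_hash_equivalence(value):
    """Parse 'a:b:c d:e' into a list of sets of equivalent hashes."""
    return [set(statement.lower().split(":")) for statement in value.split()]


def are_equivalent(source_hash, spdx_hash, statements):
    """True if both hashes are equal or appear in the same statement."""
    source_hash, spdx_hash = source_hash.lower(), spdx_hash.lower()
    if source_hash == spdx_hash:
        return True
    return any(source_hash in group and spdx_hash in group for group in statements)


# Value as bitbake would see it after both "+=" statements were applied.
statements = parse_hash_equivalence(
    "bcb82bc370eb937e3b310e98d20aa906:d6b29fc355b6ab0f9f4bb9d4e03e9304:"
    "cb31a703b96c1ab2f80d164e9676fe7d "
    "d3b07384d113edec49eaa6238ad5ff00:c157a79031e1c40f85931829bc5fc552"
)
```

Note that hashes from different statements are never treated as equivalent, even if both statements belong to the same recipe.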

Working with meta-osselot output

After a successful build, all files relevant to meta-osselot are stored in ${OSSELOT_DEPLOY_DIR}.

The file at ${OSSELOT_DEPLOY_DIR}/${PN}/${PN}-${PV}-meta.json contains relevant meta- and checksum-information on the package.

Contributing

Please submit any patches against the meta-osselot layer via a GitHub Pull Request.

meta-osselot's People

Contributors

iris-ersc, jasper-ben, patrickvog

meta-osselot's Issues

Exception: AttributeError: module 'hashlib' has no attribute 'file_digest'

BB_VERSION = "2.7.2"
BUILD_SYS = "x86_64-linux"
NATIVELSBSTRING = "universal"
TARGET_SYS = "arm-resy-linux-gnueabi"
MACHINE = "multi-v7-ml"
DISTRO = "resy"
DISTRO_VERSION = "4.3.66"
TUNE_FEATURES = "arm armv7a vfp thumb neon callconvention-hard"
TARGET_FPU = "hard"
meta
meta-poky
meta-yocto-bsp = "master:3aff9b01e559529bf1ce2aa9b318a0157092add9"
meta-multi-v7-ml-bsp-master = "master:e1831dcf9a0d9ce998e3a7f0d2f42e3caa3a26a4"
meta-u-boot-wic-bsp-master = "master:850440d2a48150520d7d9b989f93e11d8d4fc562"
meta-resy-master = "master:ceacfc5dd8cd85679def4d942f40e370cb532a02"
meta-oe
meta-networking
meta-filesystems
meta-python = "master:dea8afa45ef5a3226c88b062c13284d68f380e18"
meta-my-yocto-layer
meta-my-yocto-images-layer
workspace = ":"
meta-osselot = "master:8608fffeeb1a2a41dc5283eea0601098a76e00a4"

Checking sstate mirror object availability: 100% |#############################################################################################################################| Time: 0:00:02
Sstate summary: Wanted 222 Local 0 Mirrors 0 Missed 222 Current 176 (0% match, 44% complete)
Removing 3 stale sstate objects for arch x86_64: 100% |########################################################################################################################| Time: 0:00:00
NOTE: Executing Tasks
ERROR: binutils-cross-arm-2.41-r0 do_osselot_create_s_checksums: Error executing a python function in exec_func_python() autogenerated:

The stack trace of python calls that resulted in this exception/failure was:
File: 'exec_func_python() autogenerated', lineno: 2, function:
0001:
*** 0002:do_osselot_create_s_checksums(d)
0003:
File: '/workdir/sources/meta-osselot/classes/osselot.bbclass', lineno: 152, function: do_osselot_create_s_checksums
0148: except:
0149: bb.warn(f"Could not open file {item.path}")
0150: else:
0151: with fb:
*** 0152: digest = hashlib.file_digest(fb, osselot_hash_algorithm).hexdigest()
0153: checksums[os.path.relpath(item.path, s)] = {}
0154: checksums[os.path.relpath(item.path, s)][osselot_hash_algorithm] = digest
0155: write_json(osselot_s_checksums_file, checksums)
0156:}
Exception: AttributeError: module 'hashlib' has no attribute 'file_digest'

ERROR: Logfile of failure stored in: /workdir/build/multi-v7-ml-debug-training-master/tmp/work/x86_64-linux/binutils-cross-arm/2.41/temp/log.do_osselot_create_s_checksums.3833
ERROR: Task (/workdir/sources/poky-training-master/meta/recipes-devtools/binutils/binutils-cross_2.41.bb:do_osselot_create_s_checksums) failed with exit code '1'
NOTE: Tasks Summary: Attempted 237 tasks of which 229 didn't need to be rerun and 1 failed.
NOTE: The errors for this build are stored in /workdir/build/multi-v7-ml-debug-training-master/tmp/log/error-report/error_report_20240208151920.txt
You can send the errors to a reports server by running:
send-error-report /workdir/build/multi-v7-ml-debug-training-master/tmp/log/error-report/error_report_20240208151920.txt [-s server]
NOTE: The contents of these logs will be posted in public if you use the above command with the default server. Please ensure you remove any identifying or proprietary information when prompted before sending.
NOTE: Writing buildhistory
NOTE: Writing buildhistory took: 2 seconds

Summary: 1 task failed:
/workdir/sources/poky-training-master/meta/recipes-devtools/binutils/binutils-cross_2.41.bb:do_osselot_create_s_checksums
Summary: There was 1 ERROR message, returning a non-zero exit code.
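
A note on the likely cause: hashlib.file_digest was added in Python 3.11, so the call in line 152 of the trace fails on older host interpreters. The following is a minimal compatibility sketch, not the fix that was applied upstream; file_digest_compat is a hypothetical helper name.

```python
import hashlib


def file_digest_compat(fileobj, algorithm, chunk_size=65536):
    """Return a hash object for a binary file object on any Python 3 version."""
    if hasattr(hashlib, "file_digest"):
        # Python 3.11 and newer: use the standard library helper.
        return hashlib.file_digest(fileobj, algorithm)
    # Older interpreters: feed the file to the hash in chunks.
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest
```

With such a helper, the failing line could read digest = file_digest_compat(fb, osselot_hash_algorithm).hexdigest(), provided fb is opened in binary mode.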

Fix bitbake QA issue

Investigate and fix:

WARNING: /home/jasper/playground/test-osselot/demo/kas/build/../poky/meta/recipes-core/meta/cve-update-db-native.bb: QA Issue: cve-update-db-native: native/nativesdk class is not inherited last, this can result in unexpected behaviour. Classes inherited after native/nativesdk: cve-check.bbclass [native-last]

Optimize IO heavy checksum tasks

Checksum tasks currently do a lot of read and write operations on disk, due to the way checksum information is stored (each file checksum is individually written to and read from disk). Use a per-package JSON file for data exchange instead.

fix Python bbappends

Currently, Python bbappends will falsely apply to any Python module, not just the main Python package.

Wrong source list & race condition in do_osselot_create_s_checksums when S == WORKDIR

Some layer "internal" recipes (e.g. https://git.openembedded.org/openembedded-core/tree/meta/recipes-core/initscripts/initscripts_1.0.bb?h=kirkstone) only add source code from within the layer repository. In these cases they set S = ${WORKDIR}. This causes do_osselot_create_s_checksums to create checksums for all of the WORKDIR, including the checksums output itself. This can also cause race conditions, when folders are deleted during the do_osselot_create_s_checksums process.

To fix this, do_osselot_create_s_checksums should only consider top-level files as source files if S is set to ${WORKDIR}.
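
The proposed behaviour could look roughly like the following sketch. It is an illustration of the idea only: list_source_files is a hypothetical helper and does not reflect how the task is actually implemented.

```python
import os


def list_source_files(s, workdir):
    """List source files: top-level only if S equals WORKDIR, else recursive."""
    if os.path.realpath(s) == os.path.realpath(workdir):
        # S == WORKDIR: subdirectories hold build output, logs and the
        # checksum data itself, so only regular top-level files count.
        with os.scandir(s) as entries:
            return sorted(
                entry.path
                for entry in entries
                if entry.is_file(follow_symlinks=False)
            )
    found = []
    for root, _dirs, files in os.walk(s):
        found.extend(os.path.join(root, name) for name in files)
    return sorted(found)
```

Because subdirectories are never entered in the S == WORKDIR case, folders that are deleted while the task runs can no longer cause the race described above.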

meta-osselot only works with single-project recipes

Currently, meta-osselot tries to map a recipe to a single curated project. This means that recipes that rely on more than one project will inevitably end up with uncurated data, even if each of those projects is available within the Osselot database.

I am not sure how much of a need there is for this use-case.

If this is something we want to support, we would probably have to add the ability to specify more than one OSSELOT_NAME and OSSELOT_VERSION for each recipe.

Evaluate options to identify build-relevant files

Currently, meta-osselot treats all source files as license compliance relevant by default. However, in reality only a subset of files will end up in the target binary. Automatically identifying these would drastically reduce the amount of work when curating a version diff.
