nexb / license-expression

Utility library to parse, normalize and compare License expressions for Python using a boolean logic engine. For expressions using SPDX or any other license id scheme.

Home Page: http://aboutcode.org

License: Other

Batchfile 2.02% Python 95.00% Shell 2.51% Makefile 0.46%

licensing boolean-expression license-expression spdx spdx-license python

license-expression's Introduction

license-expression

license-expression is a comprehensive utility library to parse, compare, simplify and normalize license expressions (such as SPDX license expressions) using boolean logic.

License: Apache-2.0
Python: 3.8+
Homepage: https://github.com/nexB/license-expression/
Install: pip install license-expression (also available in most Linux distros).

Software project licenses are often a combination of several free and open source software licenses. License expressions, as specified by SPDX, provide a concise and human-readable way to express these licenses without having to read long license texts, while still being machine-readable.

License expressions are used by key FOSS projects such as Linux; several package ecosystems, such as npm and RubyGems, use them to document package licensing metadata; and they are important when exchanging software data (such as with SPDX and SBOMs in general) as a way to express licensing precisely.

license-expression is a comprehensive utility library to parse, compare, simplify and normalize these license expressions (such as SPDX license expressions) using boolean logic, for example: GPL-2.0-or-later WITH Classpath-exception-2.0 AND MIT.

It includes the license keys from SPDX https://spdx.org/licenses/ (version 3.23) and the ScanCode license DB (version 32.0.8, last published on 2023-02-27). See https://scancode-licensedb.aboutcode.org/ to get started quickly.

license-expression is both powerful and simple to use, and is used as the license expression engine in several projects and products such as:

AboutCode-toolkit https://github.com/nexB/aboutcode-toolkit
AlekSIS (School Information System) https://edugit.org/AlekSIS/official/AlekSIS-Core
Barista https://github.com/Optum/barista
Conda forge tools https://github.com/conda-forge/conda-smithy
DejaCode https://dejacode.com
DeltaCode https://github.com/nexB/deltacode
FenixscanX https://github.com/SmartsYoung/FenixscanX
FetchCode https://github.com/nexB/fetchcode
Flict https://github.com/vinland-technology/flict and https://github.com/vinland-technology
license.sh https://github.com/webscopeio/license.sh
liferay_inbound_checker https://github.com/carmenbianca/liferay_inbound_checker
REUSE https://reuse.software/ and https://github.com/fsfe/reuse-tool
ScanCode-io https://github.com/nexB/scancode.io
ScanCode-toolkit https://github.com/nexB/scancode-toolkit

license-expression is also packaged for most Linux distributions. See below.

Alternatives:

There is no known alternative library for Python, but there are several similar libraries in other languages (though not as powerful, of course!):

JavaScript https://github.com/jslicense/spdx-expression-parse.js
Rust https://github.com/ehuss/license-exprs
Haskell https://github.com/phadej/spdx
Go https://github.com/kyoh86/go-spdx
Ada https://github.com/Fabien-Chouteau/spdx_ada
Java https://github.com/spdx/tools and https://github.com/aschet/spdx-license-expression-tools

Build and tests status

Linux & macOS (Travis)	Windows (AppVeyor)	Linux, Windows & macOS (Azure)

Source code and download

Also available in several Linux distros:

Support

Submit bugs and questions at: https://github.com/nexB/license-expression/issues
Join the chat at: https://gitter.im/aboutcode-org/discuss

Description

This module defines a mini language to parse, validate, simplify, normalize and compare license expressions using a boolean logic engine.
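To make the idea of a "mini language" concrete, here is a minimal sketch of a recursive-descent parser for such expressions. It is not the library's implementation; it assumes license keys are single tokens without spaces, operators are case-insensitive, and precedence runs from WITH (tightest) to AND to OR, as in the SPDX specification. `Parser` and the nested-tuple output are hypothetical illustrations.

```python
import re

# A token is "(" or ")" or any run of characters that is neither
# whitespace nor a parenthesis.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = {"and", "or", "with"}


def tokenize(text):
    """Split an expression into tokens, lowercasing the operator keywords."""
    return [t.lower() if t.lower() in OPERATORS else t for t in TOKEN_RE.findall(text)]


class Parser:
    """Recursive-descent parser; precedence from tightest: WITH, AND, OR."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def expect_key(self):
        """Consume a license or exception key (not an operator or parenthesis)."""
        tok = self.peek()
        if tok is None or tok in OPERATORS or tok in ("(", ")"):
            raise ValueError(f"expected a license key at token {self.pos}, got {tok!r}")
        return self.take()

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r} at token {self.pos}")
        return tree

    def parse_or(self):
        args = [self.parse_and()]
        while self.peek() == "or":
            self.take()
            args.append(self.parse_and())
        return args[0] if len(args) == 1 else ("OR", *args)

    def parse_and(self):
        args = [self.parse_with()]
        while self.peek() == "and":
            self.take()
            args.append(self.parse_with())
        return args[0] if len(args) == 1 else ("AND", *args)

    def parse_with(self):
        left = self.parse_atom()
        # WITH is only accepted directly after a plain license key.
        if isinstance(left, str) and self.peek() == "with":
            self.take()
            return ("WITH", left, self.expect_key())
        return left

    def parse_atom(self):
        if self.peek() == "(":
            self.take()
            tree = self.parse_or()
            if self.peek() != ")":
                raise ValueError(f"missing ')' at token {self.pos}")
            self.take()
            return tree
        return self.expect_key()
```

For example, `Parser(" GPL-2.0 or LGPL-2.1 and mit ").parse()` groups the AND before the OR, just like the library's own output shown in the usage examples.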

This supports SPDX license expressions and also accepts other license naming conventions and license identifier aliases to resolve and normalize any license expression.

Using boolean logic, license expressions can be tested for equality, containment, equivalence and can be normalized or simplified.
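The boolean comparison can be illustrated with truth tables: two expressions are equivalent when they evaluate the same under every true/false assignment of their license symbols. This is a hypothetical, brute-force sketch (exponential in the number of symbols), not how the library is implemented. It uses nested tuples such as `("OR", "GPL-2.0", ("AND", "LGPL-2.1", "mit"))`, treats a `("WITH", license, exception)` pair as one opaque symbol, and uses logical implication as one possible notion of containment, which may differ from the library's own `contains()`.

```python
from itertools import product


def leaf_name(expr):
    """Name a leaf: a plain key, or a WITH pair treated as one opaque symbol."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "WITH":
        return f"{expr[1]} WITH {expr[2]}"
    return None  # an AND/OR node, not a leaf


def symbols(expr):
    """Collect the leaf names used in an expression."""
    name = leaf_name(expr)
    if name is not None:
        return {name}
    return set().union(*(symbols(arg) for arg in expr[1:]))


def evaluate(expr, env):
    """Evaluate an expression under a {leaf name: bool} assignment."""
    name = leaf_name(expr)
    if name is not None:
        return env[name]
    values = [evaluate(arg, env) for arg in expr[1:]]
    if expr[0] == "AND":
        return all(values)
    if expr[0] == "OR":
        return any(values)
    raise ValueError(f"unknown operator: {expr[0]!r}")


def assignments(*exprs):
    """Yield every true/false assignment over the symbols of all exprs."""
    names = sorted(set().union(*(symbols(e) for e in exprs)))
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def is_equivalent(a, b):
    """True if a and b agree under every assignment."""
    return all(evaluate(a, env) == evaluate(b, env) for env in assignments(a, b))


def implies(premise, conclusion):
    """True if every assignment satisfying premise also satisfies conclusion."""
    return all(
        evaluate(conclusion, env)
        for env in assignments(premise, conclusion)
        if evaluate(premise, env)
    )
```

With these helpers, `GPL-2.0 OR (LGPL-2.1 AND mit)` and `(mit AND LGPL-2.1) OR GPL-2.0` come out equivalent, mirroring the `is_equivalent` example later in this README.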

It also bundles the SPDX License list (3.20 as of now) and the ScanCode license DB (based on latest ScanCode) to easily parse and validate expressions using the license symbols.

Usage examples

The main entry point is the Licensing object that you can use to parse, validate, compare, simplify and normalize license expressions.

Create an SPDX Licensing and parse expressions:

>>> from license_expression import get_spdx_licensing
>>> licensing = get_spdx_licensing()
>>> expression = ' GPL-2.0 or LGPL-2.1 and mit '
>>> parsed = licensing.parse(expression)
>>> print(parsed.pretty())
OR(
  LicenseSymbol('GPL-2.0-only'),
  AND(
    LicenseSymbol('LGPL-2.1-only'),
    LicenseSymbol('MIT')
  )
)

>>> str(parsed)
'GPL-2.0-only OR (LGPL-2.1-only AND MIT)'

>>> licensing.parse('unknwon with foo', validate=True, strict=True)
license_expression.ExpressionParseError: A plain license symbol cannot be used
as an exception in a "WITH symbol" statement. for token: "foo" at position: 13

>>> licensing.parse('unknwon with foo', validate=True)
license_expression.ExpressionError: Unknown license key(s): unknwon, foo

>>> licensing.validate('foo and MIT and GPL-2.0+')
ExpressionInfo(
    original_expression='foo and MIT and GPL-2.0+',
    normalized_expression=None,
    errors=['Unknown license key(s): foo'],
    invalid_symbols=['foo']
)

Create a simple Licensing and parse expressions:

>>> from license_expression import Licensing, LicenseSymbol
>>> licensing = Licensing()
>>> expression = ' GPL-2.0 or LGPL-2.1 and mit '
>>> parsed = licensing.parse(expression)
>>> expected = 'GPL-2.0 OR (LGPL-2.1 AND mit)'
>>> assert str(parsed) == expected
>>> assert parsed.render('{symbol.key}') == expected

List the license symbols used in an expression:

>>> expected = [
...   LicenseSymbol('GPL-2.0'),
...   LicenseSymbol('LGPL-2.1'),
...   LicenseSymbol('mit')
... ]
>>> assert licensing.license_symbols(expression) == expected
>>> assert licensing.license_symbols(parsed) == expected

Create a Licensing with your own license symbols:

>>> symbols = ['GPL-2.0+', 'Classpath', 'BSD']
>>> licensing = Licensing(symbols)
>>> expression = 'GPL-2.0+ with Classpath or (bsd)'
>>> parsed = licensing.parse(expression)
>>> expected = 'GPL-2.0+ WITH Classpath OR BSD'
>>> assert parsed.render('{symbol.key}') == expected

>>> expected = [
...   LicenseSymbol('GPL-2.0+'),
...   LicenseSymbol('Classpath'),
...   LicenseSymbol('BSD')
... ]
>>> assert licensing.license_symbols(parsed) == expected
>>> assert licensing.license_symbols(expression) == expected

An expression can also be deduplicated: duplicate license subexpressions are removed without changing the order, and license choices are not treated as simplifiable (unlike simplify(), shown next):

>>> expression2 = ' GPL-2.0 or (mit and LGPL 2.1) or bsd Or GPL-2.0  or (mit and LGPL 2.1)'
>>> parsed2 = licensing.parse(expression2)
>>> str(parsed2)
'GPL-2.0 OR (mit AND LGPL 2.1) OR BSD OR GPL-2.0 OR (mit AND LGPL 2.1)'
>>> deduped = licensing.dedup(parsed2)
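The deduplication idea (keep the first occurrence, preserve order, never collapse choices) can be sketched on plain nested tuples. This is a hypothetical illustration, not the library's `dedup()`; `canonical` and `dedup` are illustrative names. It makes one extra assumption: two AND/OR subexpressions with the same operator and the same arguments in a different order count as duplicates.

```python
def canonical(expr):
    """Order-insensitive key for comparing subexpressions."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op in ("AND", "OR"):
        return (op, frozenset(canonical(a) for a in args))
    return (op, *(canonical(a) for a in args))  # e.g. WITH keeps its order


def dedup(expr):
    """Drop repeated arguments of AND/OR nodes, keeping first occurrences."""
    if isinstance(expr, str) or expr[0] not in ("AND", "OR"):
        return expr
    op, *args = expr
    seen, kept = set(), []
    for arg in (dedup(a) for a in args):
        key = canonical(arg)
        if key not in seen:
            seen.add(key)
            kept.append(arg)
    # A node left with a single argument collapses to that argument.
    return kept[0] if len(kept) == 1 else (op, *kept)
```

Unlike boolean simplification, this never applies absorption, so a choice such as `mit OR (mit AND bsd-new)` is left untouched.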

Expressions can be simplified by treating them as boolean expressions:

>>> expression2 = ' GPL-2.0 or (mit and LGPL 2.1) or bsd Or GPL-2.0  or (mit and LGPL 2.1)'
>>> parsed2 = licensing.parse(expression2)
>>> str(parsed2)
'GPL-2.0 OR (mit AND LGPL 2.1) OR BSD OR GPL-2.0 OR (mit AND LGPL 2.1)'
>>> assert str(parsed2.simplify()) == 'BSD OR GPL-2.0 OR (LGPL 2.1 AND mit)'

Two expressions can be compared for equivalence and containment:

>>> expr1 = licensing.parse(' GPL-2.0 or (LGPL 2.1 and mit) ')
>>> expr2 = licensing.parse(' (mit and LGPL 2.1) or GPL-2.0 ')
>>> licensing.is_equivalent(expr1, expr2)
True
>>> licensing.is_equivalent(' GPL-2.0 or (LGPL 2.1 and mit) ',
...                         ' (mit and LGPL 2.1) or GPL-2.0 ')
True
>>> expr1.simplify() == expr2.simplify()
True
>>> expr3 = licensing.parse(' GPL-2.0 or mit or LGPL 2.1')
>>> licensing.is_equivalent(expr2, expr3)
False
>>> expr4 = licensing.parse('mit and LGPL 2.1')
>>> expr4.simplify() in expr2.simplify()
True
>>> licensing.contains(expr2, expr4)
True

Development

Checkout a clone from https://github.com/nexB/license-expression.git
Then run ./configure --dev and then source tmp/bin/activate on Linux and POSIX. This will install all dependencies in a local virtualenv, including development deps.
On Windows run configure.bat --dev and then Scripts\bin\activate instead.
To run the tests, run pytest -vvs

license-expression's People

Contributors

pombredanne saravananoffl yash-nisar chetanya-shrimali ndip007 techytushar carmenbianca pkolbus xavierfigueroav mxmehl hesa stephanlachnit natureshadow vargenau razerm brandonrwin ayansinhamahapatra felixonmars wolfi-chainguard-demo

license-expression's Issues

There should be a way to remove particular licenses or license subexpressions from a license expression

This would be useful in the case where we want to remove an invalid license key from a license expression without converting that license expression to a string, then removing the license key from the string, then re-parsing the resulting license expression string back into a license expression object.
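A minimal sketch of the requested operation, working on a parsed tree rather than on a string. It uses hypothetical nested tuples (`("AND", ...)`, `("OR", ...)`, with `("WITH", license, exception)` pairs kept as opaque units) instead of the library's classes, and `remove_symbol` is an illustrative name, not an existing API. Operators left with a single argument collapse to that argument. Note the semantics: removing a key from an OR removes a choice, while removing it from an AND removes an obligation.

```python
def remove_symbol(expr, key):
    """Return expr without any leaf equal to key, or None if nothing is left."""
    if isinstance(expr, str):
        return None if expr == key else expr
    op, *args = expr
    if op not in ("AND", "OR"):
        # Other nodes (such as WITH pairs) are treated as opaque units here.
        return expr
    kept = [r for r in (remove_symbol(a, key) for a in args) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else (op, *kept)
```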

Fix README reST rendering

@all3fox you broke the display of the README in https://github.com/nexB/license-expression/blob/9f16fa719a1a00e1028eb336977ba01bb4d7a5d6/README.rst
You should pip install restview and check what is not correct.

Advertise installation via AUR

On Arch Linux and its derivatives, users can also install license-expression using this AUR package. It would be nice if this were reflected in the README and other documentation :)

I had to package it there because it's a dependency of reuse.

If you have any feedback about the AUR package, please let me know!

Update README

https://github.com/nexB/license-expression#development

I think this is out of sync with the current configure script: we now use configure --dev to install the testing utilities, and the binaries live under tmp/bin/ instead of bin/ (the binaries' location also seems to differ between configure and configure.bat).

Add function to combine expressions

See https://github.com/nexB/scancode-toolkit/blob/483e47a69e24a0b9ac08f0861ce63dbb7be457f9/src/packagedcode/utils.py#L153

Support complex with expressions

The new EPL 2.0 may need support in the future for something like EPL-2.0 with (Classpath-exception-2.0 and Assembly-exception)
We may then need to support OR/AND on the WITH side.

Improve support for arbitrary license names

The current implementation has an arbitrary limitation: an expression in which the words and, or or with are part of a license name (rather than operators of the expression) does not parse or resolve correctly.

This limitation does not need to exist: when a list of LicenseRef is provided that is otherwise not ambiguous (e.g. no name resolves to more than one license key), the parsing is not ambiguous.

So I propose to update this to handle expressions with such licenses properly. This would work only when resolution on parsing is requested, or when resolution is requested AND license references are provided.

mit AND (mit OR bsd-new) incorrectly simplifies to mit

>>> from license_expression import Licensing
>>> Licensing().parse('mit AND (mit OR bsd-new)').simplify()
LicenseSymbol('mit', is_exception=False)

From the above output, one can see that mit AND (mit OR bsd-new) simplifies down to just mit. If these were plain boolean variables, this result would be correct. However, for a license expression we are losing data (in this case, the bsd-new key).

In a perfect world, I would like the result to simply be reduced to mit OR bsd-new.
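The reported result is exactly the absorption law of boolean algebra, and the requested result is not boolean-equivalent to the input, which is why a purely boolean simplifier cannot produce it. Writing $A$ for mit and $B$ for bsd-new:

```latex
\begin{aligned}
A \land (A \lor B) &\equiv A && \text{(absorption)}\\
A = 1:\quad 1 \land (1 \lor B) &= 1 = A\\
A = 0:\quad 0 \land (0 \lor B) &= 0 = A\\
A = 0,\ B = 1:\quad A \lor B &= 1 \neq 0 = A \land (A \lor B)
\end{aligned}
```

The last line shows that mit OR bsd-new differs from the input when only bsd-new holds, so keeping bsd-new needs a license-aware rule (for instance, not simplifying choices) rather than a different boolean identity.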

SPDX Failing to parse license for no obvious reason

Hi license-expression. Let me begin by saying that this is a great piece of software, and I'm grateful for your contributions.

I noticed a strange edge case when using the spdx license parser. The parser raises an exception when I try to parse Sleepycat License but is fine with Sleepydog License or even Sleepyca License.

Reproducible example:

import license_expression

SPDX_LICENSING = license_expression.get_spdx_licensing()

# ExpressionParseError: Invalid symbols sequence such as (A B) for token: "License" at position: 10
_ = SPDX_LICENSING.parse('Sleepycat License')

# Works
_ = SPDX_LICENSING.parse('Sleepydog License')
_ = SPDX_LICENSING.parse('Sleepyca License')

Relevant versions installed via conda.

python                    3.11.4          h47c9636_0_cpython    conda-forge
license-expression        30.1.1             pyhd8ed1ab_0    conda-forge

Thanks in advance!

Update ABOUT files to latest SPEC

See https://github.com/nexB/aboutcode-toolkit/blob/develop/SPEC

about-code inventory license-expression/ inventory.cvs
Running attributecode version 3.0.0.dev5
Collecting inventory from: license-expression-master and writing output to: inventory.cvs

INFO: license-expression.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: license-expression.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: etc/scripts/irc-notify.py.ABOUT: Field dje_license_key is not a supported field and is ignored.
INFO: etc/scripts/irc-notify.py.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: etc/scripts/irc-notify.py.ABOUT: Field notice_text is not a supported field and is ignored.
CRITICAL: etc/scripts/irc-notify.py.ABOUT: Field about_resource is required
INFO: src/license_expression/_pyahocorasick.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/certifi.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/certifi.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/pip.ABOUT: Field author_file is not a supported field and is ignored.
INFO: thirdparty/base/pip.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/pip.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/setuptools.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/setuptools.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/six.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/six.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/virtualenv.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/virtualenv.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/virtualenv.py.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/virtualenv.py.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/wheel.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/wheel.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/base/wincertstore.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/base/wincertstore.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/dev/apipkg.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/dev/apipkg.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/dev/colorama.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/dev/colorama.ABOUT: Field keywords is not a supported field and is ignored.
INFO: thirdparty/dev/colorama.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/dev/pluggy.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/dev/pluggy.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/dev/py.ABOUT: Field contatct is not a supported field and is ignored.
INFO: thirdparty/dev/py.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/dev/py.ABOUT: Field license_text_file is not a supported field and is ignored.
CRITICAL: thirdparty/dev/pytest.ABOUT: Cannot load invalid ABOUT file: '/Users/tomd/Downloads/license-expression-master/thirdparty/dev/pytest.ABOUT': ScannerError(None, None, 'mapping values are not allowed here', <yaml.error.Mark object at 0x103b48438>)
mapping values are not allowed here
  in "<unicode string>", line 6, column 20:
    description: pytest: simple powerful testing with P ...
                       ^
INFO: thirdparty/dev/tox.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/dev/tox.ABOUT: Field license_text_file is not a supported field and is ignored.
INFO: thirdparty/prod/boolean.py.ABOUT: Field dje_license is not a supported field and is ignored.
INFO: thirdparty/prod/boolean.py.ABOUT: Field license_text_file is not a supported field and is ignored.

Improve readability of WITH expressions

Expressions such as "Foo WITH Bar OR Baz" are not very readable because the operator precedence is not intuitive.
It would be great to have the option to render an expression this way: "(Foo WITH Bar) OR Baz"
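One possible approach is an opt-in rendering flag that wraps every nested WITH pair in parentheses. The sketch below uses hypothetical nested tuples and a hypothetical `render()` function with a `wrap_with` flag; it is not the library's renderer. Like the outputs shown in the README, it also parenthesizes an AND/OR subexpression nested under a different operator.

```python
def render(expr, wrap_with=False, _nested=False):
    """Render a nested-tuple expression as text.

    With wrap_with=True, every WITH pair nested inside AND/OR is parenthesized.
    """
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op == "WITH":
        license_key, exception_key = args
        text = f"{license_key} WITH {exception_key}"
        return f"({text})" if wrap_with and _nested else text
    parts = []
    for arg in args:
        part = render(arg, wrap_with, _nested=True)
        # Parenthesize an AND/OR subexpression nested under a different operator.
        if isinstance(arg, tuple) and arg[0] in ("AND", "OR") and arg[0] != op:
            part = f"({part})"
        parts.append(part)
    return f" {op} ".join(parts)
```

For example, `render(("OR", ("WITH", "Foo", "Bar"), "Baz"), wrap_with=True)` yields the requested `(Foo WITH Bar) OR Baz`, while the default keeps the current compact form.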

Issue with `is_equivalent` using different `Licensing` instances

While this first code works fine:

from license_expression import Licensing

expression = 'gpl-2.0 AND zlib'
licensing = Licensing()
parsed1 = licensing.parse(expression)
parsed2 = licensing.parse(expression)
assert Licensing().is_equivalent(parsed1, parsed2)

This second fails:

expression = 'gpl-2.0 AND zlib'
parsed1 = Licensing().parse(expression)
parsed2 = Licensing().parse(expression)
assert Licensing().is_equivalent(parsed1, parsed2)

It seems to be related to the operator since this last one works as well:

expression = 'gpl-2.0'
parsed1 = Licensing().parse(expression)
parsed2 = Licensing().parse(expression)
assert Licensing().is_equivalent(parsed1, parsed2)

Accept plain strings list as Licensing "symbols"

The latest code in #6 changes the API and drops the ugly "LicenseRef" object in favor of plain license symbol objects. We should also accept a list of plain strings as license keys and be smart in this case to avoid raising exceptions for "with" expressions: a plain list of strings can only be turned into plain LicenseSymbol objects, so there is no way to know which ones should be ExceptionSymbol. This would make the simple cases easier to handle, when you just have a list of license ids and only need proper parsing without full validation.

Track which version of the SPDX license list is used

Update SPDX license list version

The license-expression README says that the current SPDX license list being used in the library is 3.13 but the latest SPDX license list is Version: 3.17 (as of 2022-05-08).

This library should be updated to use the current SPDX license list.

The order of the expression shouldn't be changed

>>> from license_expression import Licensing, LicenseSymbol
>>> exp = 'mit or apache-2.0 or public-domain'
>>> licensing = Licensing().parse(exp)
>>> licensing.simplify().render()
'apache-2.0 OR mit OR public-domain'
>>>

Most of the time, the leftmost license key in a license expression is considered the primary license (or the primary license choice if both fall into the same license category). However, simplify() breaks the order.

In my opinion, the simplify should not modify/break the order of a license_expression.

Use `tox` to run unit-tests

Also, bundle tox and its requirements as a development dependency.

Please provide a proper documentation

Currently the docs folder only contains a placeholder documentation. It would be nice if license-expression would at least have an automatically created API documentation via sphinx.ext.autodoc.

Problem with exception symbols when using `get_spdx_licensing().validate()`

Hi, I am trying to validate a given LicenseExpression using get_spdx_licensing().validate(). This is very helpful in providing a list of unknown symbols not on the SPDX License and Exception Lists. I encountered the problem, though, that exception symbols are also compared against the SPDX License List and licenses against the Exception list:

Example:

licensing = get_spdx_licensing()
le = licensing.parse("389-exception with MIT")
get_spdx_licensing().validate(le)

yields:

ExpressionInfo(
    original_expression='389-exception WITH MIT',
    normalized_expression='389-exception WITH MIT',
    errors=[],
    invalid_symbols=[]
)

As 389-exception is not a license and MIT not an exception, I would expect an error here. Furthermore, I would find it helpful if there were two separate lists for invalid_symbols: For example, one invalid_license_symbols and one invalid_exception_symbols.
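A sketch of the proposed two-list validation: the key in a WITH's exception position is checked against the exception list, and every other leaf against the license list. The `KNOWN_LICENSES` and `KNOWN_EXCEPTIONS` sets are a tiny hypothetical subset standing in for the SPDX lists, the input is a hypothetical nested-tuple tree, and `check_roles` is an illustrative name, not part of the library.

```python
# Tiny hypothetical stand-ins for the SPDX License and Exception Lists.
KNOWN_LICENSES = {"MIT", "GPL-2.0-or-later", "LGPL-2.1-or-later"}
KNOWN_EXCEPTIONS = {"389-exception", "Classpath-exception-2.0"}


def check_roles(expr, licenses=KNOWN_LICENSES, exceptions=KNOWN_EXCEPTIONS):
    """Report license and exception symbols used in the wrong role or unknown."""
    invalid_licenses, invalid_exceptions = [], []

    def visit(node):
        if isinstance(node, str):
            # Any leaf outside an exception position must be a license.
            if node not in licenses:
                invalid_licenses.append(node)
            return
        op, *args = node
        if op == "WITH":
            license_key, exception_key = args
            visit(license_key)
            if exception_key not in exceptions:
                invalid_exceptions.append(exception_key)
        else:
            for arg in args:
                visit(arg)

    visit(expr)
    return {
        "invalid_license_symbols": invalid_licenses,
        "invalid_exception_symbols": invalid_exceptions,
    }
```

Applied to the reported `389-exception WITH MIT`, both symbols would be flagged, each in its own list.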

ParseError when symbol contained in exception string that is not in the Licensing

>>> from license_expression import Licensing
>>> l = Licensing(['lgpl-3.0-plus'])
>>> license_expression = 'lgpl-3.0-plus WITH openssl-exception-lgpl-3.0-plus'
>>> l.parse(license_expression)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lib/python3.6/site-packages/license_expression/__init__.py", line 386, in parse
    expression = super(Licensing, self).parse(tokens)
  File "lib/python3.6/site-packages/boolean/boolean.py", line 216, in parse
    raise ParseError(token, tokstr, position, PARSE_INVALID_SYMBOL_SEQUENCE)
boolean.boolean.ParseError: Invalid symbols sequence such as (A B) for token: "lgpl-3.0-plus" at position: 37

>>> l = Licensing(['lgpl-3.0-plus'])
>>> license_expression = 'lgpl-3.0-plus AND openssl-exception-lgpl-3.0-plus'
>>> l.parse(license_expression)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lib/python3.6/site-packages/license_expression/__init__.py", line 386, in parse
    expression = super(Licensing, self).parse(tokens)
  File "lib/python3.6/site-packages/boolean/boolean.py", line 216, in parse
    raise ParseError(token, tokstr, position, PARSE_INVALID_SYMBOL_SEQUENCE)
boolean.boolean.ParseError: Invalid symbols sequence such as (A B) for token: "lgpl-3.0-plus" at position: 36

>>> l = Licensing(['lgpl-3.0-plus', 'openssl-exception-lgpl-3.0-plus'])
>>> l.parse(license_expression)
LicenseWithExceptionSymbol(license_symbol=LicenseSymbol('lgpl-3.0-plus', is_exception=False), exception_symbol=LicenseSymbol('openssl-exception-lgpl-3.0-plus', is_exception=False))

>>> l = Licensing(['lgpl-3.0-plus'])
>>> license_expression = 'lgpl-3.0-plus WITH openssl-exception-lgpl-2.0-plus'
>>> l.parse(license_expression)
LicenseWithExceptionSymbol(license_symbol=LicenseSymbol('lgpl-3.0-plus', is_exception=False), exception_symbol=LicenseSymbol('openssl-exception-lgpl-2.0-plus', is_exception=False))

Show whether or not a license is deprecated in index.json

https://scancode-licensedb.aboutcode.org/index.json returns all licenses, including deprecated licenses. It would be useful to expose the is_deprecated attribute of the license, like what was done with the SPDX keys and exceptions in #56

Add a changelog

Would it be possible to introduce a changelog file or a verbose description for the Git tags/releases that lists new features and especially breaking changes? Ideally, it would follow the Keep a Changelog spec.

I am maintainer of this project's AUR (Arch Linux) package and of the REUSE tool's package which depends on license-expression. It's hard for me to tell whether a new version of license-expression changes a function that tools depending on it rely on.

refine setuptools licensing documentation details

See #14 (comment)

Rename master to main branch

Do not simplify OR

(mit OR gpl) and mit should not be simplified to mit
but should instead stay as is.

Validate "or later" licenses

The master branch implementation treats "or later" licenses as separate keys with optional aliases.

The alternate-or-later-handling branch implementation treats "or later" as a keyword and not as separate license keys.
If this latter implementation wins, it would make sense to add validation to license symbols to check whether a license supports an "or later" version, to avoid nonsense like "MIT or later".
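Assuming "or later" is expressed with the SPDX `+` suffix (as in GPL-2.0+), the proposed check could look like this sketch. `SUPPORTS_OR_LATER` is a hypothetical, hand-picked set, and `split_or_later` and `validate_or_later` are illustrative names; a real implementation would need per-license data saying whether an "or later" option exists.

```python
# Hypothetical set of license keys that offer an "or later" option.
SUPPORTS_OR_LATER = {"GPL-2.0", "GPL-3.0", "LGPL-2.1", "LGPL-3.0", "AGPL-3.0"}


def split_or_later(token):
    """Split 'GPL-2.0+' into ('GPL-2.0', True) and 'MIT' into ('MIT', False)."""
    if token.endswith("+"):
        return token[:-1], True
    return token, False


def validate_or_later(token, supports=SUPPORTS_OR_LATER):
    """Return (key, or_later), rejecting 'or later' on keys that lack it."""
    key, or_later = split_or_later(token)
    if or_later and key not in supports:
        raise ValueError(f"{key!r} has no 'or later' option, so {token!r} is invalid")
    return key, or_later
```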

Error thrown when Invalid license key character provided

Tern uses license-expression to validate SPDX licenses. When an invalid license key is provided (i.e. one that contains invalid characters such as / or ,), validate() fails with an AttributeError instead of reporting the key as invalid.

>>> import license_expression
>>> from license_expression import get_spdx_licensing
>>> licensing = get_spdx_licensing()
>>> license_data = "MIT/X11"
>>> licensing.validate(license_data).errors == []

Traceback (most recent call last):
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 777, in validate
    parsed_expression = self.parse(expression, strict=strict)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 539, in parse
    tokens = list(self.tokenize(
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 603, in tokenize
    for token in tokens:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 996, in replace_with_subexpression_by_license_symbol
    for token_group in token_groups:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 935, in build_token_groups_for_with_subexpression
    tokens = list(tokens)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 597, in <genexpr>
    tokens = (t for t in tokens if t.string and t.string.strip())
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 921, in build_symbols_from_unknown_tokens
    for symtok in build_token_with_symbol():
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 901, in build_token_with_symbol
    toksym = LicenseSymbol(string)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 1213, in __init__
    raise ExpressionError(
license_expression.ExpressionError: Invalid license key: the valid characters are: letters and numbers, underscore, dot, colon or hyphen signs and spaces: 'MIT/X11'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 780, in validate
    expression_info.invalid_symbols.append(e.token_string)
AttributeError: 'ExpressionError' object has no attribute 'token_string'
>>> license_data = "MIT,X11"
>>> licensing.validate(license_data).errors == []
Traceback (most recent call last):
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 777, in validate
    parsed_expression = self.parse(expression, strict=strict)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 539, in parse
    tokens = list(self.tokenize(
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 603, in tokenize
    for token in tokens:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 996, in replace_with_subexpression_by_license_symbol
    for token_group in token_groups:
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 935, in build_token_groups_for_with_subexpression
    tokens = list(tokens)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 597, in <genexpr>
    tokens = (t for t in tokens if t.string and t.string.strip())
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 921, in build_symbols_from_unknown_tokens
    for symtok in build_token_with_symbol():
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 901, in build_token_with_symbol
    toksym = LicenseSymbol(string)
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 1213, in __init__
    raise ExpressionError(
license_expression.ExpressionError: Invalid license key: the valid characters are: letters and numbers, underscore, dot, colon or hyphen signs and spaces: 'MIT,X11'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rose/ternenv/lib/python3.10/site-packages/license_expression/__init__.py", line 780, in validate
    expression_info.invalid_symbols.append(e.token_string)
AttributeError: 'ExpressionError' object has no attribute 'token_string'

When a valid license key is provided (i.e. no unexpected characters), the library returns as expected:

>>> license_data = "MIT-X11"
>>> licensing.validate(license_data).errors == []
False

I would expect the library to handle unexpected characters and mark expressions with unexpected characters as an invalid license.
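Until validate() handles this itself, a caller could pre-screen the input and report offending tokens as invalid instead of letting the exception escape. This sketch uses the allowed characters listed in the error message above (letters, numbers, underscore, dot, colon, hyphen; spaces act as separators here) and additionally accepts `+`, an assumption based on keys such as GPL-2.0+ in the README examples. `find_invalid_keys` is a hypothetical helper, not a library function.

```python
import re

# Any character outside: word characters (letters, digits, underscore),
# dot, colon, plus, hyphen.
INVALID_CHAR_RE = re.compile(r"[^\w.:+\-]")
OPERATORS = {"and", "or", "with"}


def find_invalid_keys(expression):
    """Return tokens that contain characters not allowed in a license key."""
    tokens = re.findall(r"[^\s()]+", expression)
    return [
        tok for tok in tokens
        if tok.lower() not in OPERATORS and INVALID_CHAR_RE.search(tok)
    ]
```

A non-empty result can then be reported as invalid symbols without calling validate() at all.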

Failing to validate GPL licenses

licensing.validate() has 'Unknown license key(s)' error for GPL licenses, e.g. 'LGPLv2.1', 'GPLv2', 'GPL2'.

Side note: some images also have several licenses, e.g. MIT, GPL2 and others.
When they are listed as 'MIT GPL2', for example, it's okay: validation just fails with errors ('Unknown license key(s)').
But when they are listed with commas instead ('MIT,GPL2'), an exception is raised for invalid characters.
In some cases, e.g. photon:3.0, the licenses come in this form.
The latter can be easily worked around, but I wonder whether it would be better to handle those use cases
within the validate method instead.

Licensing.parse() raises too many exceptions

When I do:

try:
    Licensing().parse("MIT AND OR 0BSD")
except ExpressionError:
    # Handle this
    pass

The exception is not handled because a ParseError was raised instead. This isn't unexpected per se, because the docstring says as much, but I cannot see the difference between the two errors as a consumer of the library. I've rewritten my code to put except (ExpressionError, ParseError): everywhere, but it seems a little unnecessary to me.

Is it possible to make ExpressionError a subclass of ParseError? Or is it possible to consistently raise ExpressionErrors instead?
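One way to satisfy both requests is a dedicated error class that inherits from both bases, so that `except ExpressionError` and `except ParseError` each catch it. The tracebacks elsewhere on this page show a `license_expression.ExpressionParseError` in newer versions, but its exact base classes are not shown here; the classes below are stand-ins and `parse` is a toy, not the library's parser.

```python
class ParseError(Exception):
    """Stand-in for boolean.py's ParseError."""


class ExpressionError(Exception):
    """Stand-in for license_expression's ExpressionError."""


class ExpressionParseError(ParseError, ExpressionError):
    """A syntax error that callers can catch as either base class."""


OPERATORS = {"and", "or"}


def parse(text):
    """Toy parser: reject two operators in a row, else return the tokens."""
    words = text.split()
    lowered = [w.lower() for w in words]
    for left, right in zip(lowered, lowered[1:]):
        if left in OPERATORS and right in OPERATORS:
            raise ExpressionParseError(f"unexpected {right!r} after {left!r}")
    return words


# A single except clause is now enough for the consumer.
try:
    parse("MIT AND OR 0BSD")
except ExpressionError as exc:
    print("invalid expression:", exc)
```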

Adopt calver for versions

Since we will soon bundle license data for easy bootstrapping, using calver for versions makes sense.

When simplifying, do not sort systematically

When I simplify an expression such as gpl and apache and mit and apache, the result is always sorted: apache AND gpl AND mit
I would like to have the option NOT to sort, and get gpl and apache and mit instead.

LLGPL appears to be treated as a plain license, not as a license exception

Hi! 👋

As background: I have added this project as a backend for namcap, a validation tool for packages and build scripts used on Arch Linux. From what I can tell after integrating it, it works pretty well for our use case and helps us a great deal in being more compliant with SPDX license identifiers (see https://rfc.archlinux.page/0016-spdx-license-identifiers/). Thanks for that! 🎉

However, there appear to be edge cases and maybe you are able to help me in figuring this particular one out.

When trying to package an upstream project that uses the LLGPL preamble, my assumption, after reading

Exceptions are added to a license using the License Expression operator, "WITH".

on https://spdx.org/licenses/exceptions-index.html, was that an expression such as LGPL-2.1-only WITH LLGPL would be valid. This does not seem to be the case though:

Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from license_expression import get_spdx_licensing
>>> licensing = get_spdx_licensing()
>>> licensing.parse("LGPL-2.1-or-later WITH LLGPL", strict=True)
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 540, in parse
    tokens = list(self.tokenize(
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 604, in tokenize
    for token in tokens:
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 1090, in replace_with_subexpression_by_license_symbol
    raise ParseError(
boolean.boolean.ParseError: A plain license symbol cannot be used as an exception in a "WITH symbol" statement. for token: "LLGPL" at position: 23

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 548, in parse
    raise ExpressionParseError(
license_expression.ExpressionParseError: A plain license symbol cannot be used as an exception in a "WITH symbol" statement. for token: "LLGPL" at position: 23
>>> licensing.parse("LGPL-2.1-or-later WITH LLGPL")
LicenseWithExceptionSymbol(license_symbol=LicenseSymbol('LGPL-2.1-or-later', aliases=('LGPL-2.1+',), is_exception=False), exception_symbol=LicenseSymbol('LLGPL', aliases=('LicenseRef-scancode-llgpl',), is_exception=False))
>>>

The output above makes it clear that parsing fails under strict rules because LLGPL is treated as a "plain license". This is confirmed by LLGPL being represented as a LicenseSymbol with is_exception=False when parsing without strict rules.

Relatedly, we apply strict parsing by default, because we want SPDX-compliant expressions that correctly distinguish between plain licenses and exceptions for packaging reasons. For LLGPL this does not work, and namcap fails to parse the license expression.

Furthermore (and to make things somewhat more complicated, but also more specific and useful for us as a distribution), when it comes to packaging, Arch Linux relies on a system-wide package (see https://gitlab.archlinux.org/archlinux/packaging/packages/licenses) that provides "common" license and exception files in well-known locations. These "common" files cover licenses and exceptions that are frequently used verbatim and neither contain nor require individually identifying information (e.g. a specific list of authors) or an ever-changing date. We also provide full lists of all known license and exception identifiers separately. The package allows us to share common license files centrally instead of repackaging them in every package.
Namcap in turn relies on this information to identify known identifiers and to determine which of them are common (and thus do not need to be packaged).

Unfortunately, this does not work with LLGPL, since license-expression treats it as a plain license, not an exception identifier. As a result, namcap fails when LLGPL is used in a WITH expression (e.g. LGPL-2.1-or-later WITH LLGPL), and if LLGPL is provided on its own (e.g. LLGPL), namcap would require the user to prefix it with LicenseRef-, because LLGPL is not in the list of "known" licenses (it is in the list of "known" license exceptions).
From my understanding, though, the expression LGPL-2.1-or-later WITH LLGPL should be valid, and LLGPL should be treated as a license exception, not a plain license.

This leads me to the question: Is there a specific reason why LLGPL is treated as a plain license and not as a license exception?
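Strict WITH checking hinges on the is_exception flag visible in the LicenseSymbol repr above. The sketch below models that check with a hypothetical key-to-flag table (SYMBOLS, check_with and StrictParseError are illustrative names, not library API): with LLGPL flagged as an exception, LGPL-2.1-or-later WITH LLGPL passes; with the flag set to False, as in the output above, it is rejected.

```python
# Hypothetical symbol table: key -> is_exception, mirroring the is_exception
# flag of LicenseSymbol. The real library builds its symbols from license data.
SYMBOLS = {
    "LGPL-2.1-or-later": False,
    "MIT": False,
    "LLGPL": True,  # flagged as an exception, as on the SPDX exceptions list
}


class StrictParseError(ValueError):
    """Raised when a WITH expression does not pair a license with an exception."""


def check_with(expression, symbols=SYMBOLS):
    """Strictly validate a single 'LICENSE WITH EXCEPTION' expression."""
    license_key, sep, exception_key = expression.partition(" WITH ")
    if not sep:
        raise StrictParseError(f"not a WITH expression: {expression!r}")
    license_key, exception_key = license_key.strip(), exception_key.strip()
    # The left side must be a known symbol that is NOT an exception.
    if symbols.get(license_key) is not False:
        raise StrictParseError(f"{license_key!r} is not a known plain license")
    # The right side must be a known symbol that IS an exception.
    if symbols.get(exception_key) is not True:
        raise StrictParseError(f"{exception_key!r} is not a known license exception")
    return license_key, exception_key
```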

Add Python 3.6 to Travis and Appveyor tests

Typo in boolean.py.ABOUT download_url

https://github.com/nexB/license-expression/blob/master/thirdparty/prod/boolean.py.ABOUT#L3

Yet another reason for an integration tool as described in https://github.com/nexB/attributecode/issues/281

Deprecated SPDX licenses are marked as unknown

The documentation for the SPDX License List states:

"When a license identifier is "deprecated" on the SPDX License List, it effectively means that there is an updated license identifier and the deprecated license identifier, while remaining valid, should no longer be used."

"wxWindows" is one of the deprecated licenses, but it is reported as an unknown license key when validated with the code below.

   from license_expression import get_spdx_licensing
   licensing = get_spdx_licensing()
   expression = licensing.parse("wxWindows")
   print(licensing.validate(expression))

Other deprecated identifiers containing an exception, like "GPL-2.0-with-autoconf-exception", are also reported as unknown. I am not sure if this is related to #82, but I think these should also be accepted.

Possible simplify error

This is likely a bug, possibly fixed by using dedup instead of simplify:

>>> from license_expression import *
>>> l=Licensing()
>>> l.parse('(mit OR gpl-2.0) AND mit AND bsd-new')
AND(OR(LicenseSymbol(u'mit', is_exception=False), LicenseSymbol(u'gpl-2.0', is_exception=False)), LicenseSymbol(u'mit', is_exception=False), LicenseSymbol(u'bsd-new', is_exception=False))
>>> x=l.parse('(mit OR gpl-2.0) AND mit AND bsd-new')
>>> x.simplify()
AND(LicenseSymbol(u'bsd-new', is_exception=False), LicenseSymbol(u'mit', is_exception=False))
>>> print(x.simplify())
bsd-new AND mit
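For what it is worth, the simplify() result is logically equivalent to the input under the boolean absorption law, A AND (A OR B) = A: the gpl-2.0 choice disappears because mit is required anyway. That may be why dedup, which is meant for license expressions rather than generic boolean simplification, is suggested instead. A brute-force truth-table comparison over all eight assignments of the three symbols shows the equivalence; equivalent, original and simplified are hypothetical illustration helpers, not library functions.

```python
from itertools import product


def equivalent(f, g, n):
    """Return True if boolean functions f and g of n variables always agree."""
    return all(f(*values) == g(*values) for values in product((False, True), repeat=n))


def original(mit, gpl, bsd):
    # (mit OR gpl-2.0) AND mit AND bsd-new
    return (mit or gpl) and mit and bsd


def simplified(mit, gpl, bsd):
    # bsd-new AND mit, the output of simplify()
    return bsd and mit
```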

`AND` statements not flattened in dedup()

For example licensing.dedup() should have simplified the following expression:

(gpl AND mit) AND mit AND (gpl OR mit) -> gpl AND mit AND (gpl OR mit)

Currently, however, it does not flatten AND statements in parentheses, but it should.

This can be done either by:

calling licensing.dedup(str(expression.flatten())) (?), or
replacing the OR statements with single symbols, simplifying, and then restoring the OR statements using substitution tables.
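The first approach can be sketched on a hypothetical tuple-based tree (not license-expression's own classes): nested nodes that share their parent's operator are hoisted into the parent, and repeated operands are then dropped while keeping first-seen order. For the example above this yields the expected gpl AND mit AND (gpl OR mit).

```python
# Hypothetical mini AST: a license key is a str; an operator node is a tuple
# (op, [children]) with op in {"AND", "OR"}.


def flatten(node):
    """Merge nested nodes that use the same operator as their parent."""
    if isinstance(node, str):
        return node
    op, children = node
    flat = []
    for child in map(flatten, children):
        if not isinstance(child, str) and child[0] == op:
            flat.extend(child[1])  # hoist same-operator grandchildren
        else:
            flat.append(child)
    return (op, flat)


def dedup(node):
    """Dedup children, hoist same-operator children, then drop repeats."""
    if isinstance(node, str):
        return node
    op, children = node
    op, children = flatten((op, [dedup(child) for child in children]))
    unique = []
    for child in children:
        if child not in unique:  # keep the first occurrence, preserve order
            unique.append(child)
    return unique[0] if len(unique) == 1 else (op, unique)


def render(node, top=True):
    """Render the tree as a license expression string."""
    if isinstance(node, str):
        return node
    op, children = node
    text = f" {op} ".join(render(child, top=False) for child in children)
    return text if top else f"({text})"
```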

GNU-All-permissive-Copying-License

Is this where we need to fill in the License tag?

Using detector: trivy
Traceback (most recent call last):
  File "/usr/bin/go_vendor_license", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/go_vendor_tools/cli/go_vendor_license.py", line 607, in main
    install_command(args)
  File "/usr/lib/python3.11/site-packages/go_vendor_tools/cli/go_vendor_license.py", line 529, in install_command
    license_data: LicenseData = detector.detect(directory)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/go_vendor_tools/license_detection/trivy.py", line 139, in detect
    return TrivyLicenseData(
           ^^^^^^^^^^^^^^^^^
  File "<string>", line 9, in __init__
  File "/usr/lib/python3.11/site-packages/go_vendor_tools/license_detection/base.py", line 149, in __post_init__
    combine_licenses(*self.license_set) if self.license_map else None
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/go_vendor_tools/licensing.py", line 28, in combine_licenses
    return simplify_license(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/go_vendor_tools/licensing.py", line 53, in simplify_license
    parsed = licensing.parse(str(expression), validate=validate, strict=strict)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 560, in parse
    self.validate_license_keys(expression)
  File "/usr/lib/python3.11/site-packages/license_expression/__init__.py", line 467, in validate_license_keys
    raise ExpressionError(msg)
license_expression.ExpressionError: Unknown license key(s): GNU-All-permissive-Copying-License
error: Bad exit status from /var/tmp/rpm-tmp.fs55Ke (%install)
Please fill in the License tag!
Bad exit status from /var/tmp/rpm-tmp.fs55Ke (%install)

Are DocumentRef licenses supported?

The SPDX specifications allow this syntax:
license-ref = ["DocumentRef-"1*(idstring)":"]"LicenseRef-"1*(idstring)
However, if I try to parse the license expression DocumentRef-James:LicenseRef-Dean, I get exceptions.
Am I doing something wrong, or are these license-ref constructs not supported?
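Independently of the library's own parser, the quoted ABNF can be checked with a regular expression. This sketch assumes the SPDX definition idstring = 1*(ALPHA / DIGIT / "-" / "."); parse_license_ref is a hypothetical helper that returns the document and license parts, or None when the string does not match.

```python
import re

# idstring = 1*(ALPHA / DIGIT / "-" / ".") in the SPDX license expression ABNF.
IDSTRING = r"[A-Za-z0-9.\-]+"
LICENSE_REF = re.compile(
    rf"(?:DocumentRef-(?P<document>{IDSTRING}):)?LicenseRef-(?P<license>{IDSTRING})"
)


def parse_license_ref(text):
    """Return (document_id, license_id) for a license-ref, or None if invalid."""
    match = LICENSE_REF.fullmatch(text)
    if match is None:
        return None
    return match.group("document"), match.group("license")
```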

Add replace method to LicenseExpression

I would like to replace an expression with another expression within a given expression.
The semantics should be the same as the string.replace stdlib function, e.g.:

string.replace(s, old, new[, maxreplace])

    Return a copy of string s with all occurrences of substring old replaced by new. 
    If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.

That is, the function should apply to all exact sub-expressions matching old in the expression tree and, as with string.replace, it should not be recursive:

>>> 'foobarbar'.replace('foobar', 'foo')
'foobar'

The maxreplace is not needed.
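A minimal sketch of these semantics on a hypothetical tuple-based tree (not the library's LicenseExpression classes): the tree is walked top-down, a subtree equal to old is swapped for new, and the inserted replacement is not searched again, mirroring the non-recursive string.replace behaviour. Matching here is purely structural, so gpl OR mit does not match mit OR gpl.

```python
# Hypothetical mini AST: a license key is a str; an operator node is a tuple
# (op, children) with op in {"AND", "OR"} and children a tuple of nodes.


def replace(node, old, new):
    """Return a copy of node with every subtree equal to old replaced by new.

    Matching is top-down and not recursive: once a subtree is replaced, the
    replacement itself is not searched again.
    """
    if node == old:
        return new
    if isinstance(node, str):
        return node
    op, children = node
    return (op, tuple(replace(child, old, new) for child in children))
```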

Consider azure pipelines for CI

Travis seems to be a lost cause for Python on macOS (it is not supported). @altendky recommended trying Azure Pipelines, which supports Windows, Linux and macOS.
This follows up on the issues in #35.

match license by legalcode url

Some public data providers give licenses as URLs to the legal code, e.g. this entry refers to http://creativecommons.org/licenses/by-nc/4.0/legalcode.

Is there a simple way to use license-expression to match these?
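One possible approach, sketched here with a hypothetical lookup table (URL_TO_KEY and match_license_url are illustrative names, not library API): normalize the URL by dropping the scheme, a www. prefix, any trailing slash and a /legalcode suffix, then look it up in a table of known license URLs. A real table would have to be built from the reference URLs recorded for each license in a license database.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table keyed by normalized URL (host + path).
URL_TO_KEY = {
    "creativecommons.org/licenses/by-nc/4.0": "CC-BY-NC-4.0",
    "creativecommons.org/licenses/by/4.0": "CC-BY-4.0",
}


def normalize_url(url):
    """Drop scheme, 'www.', query, fragment, trailing '/' and a '/legalcode' suffix."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    if path.endswith("/legalcode"):
        path = path[: -len("/legalcode")]
    return host + path


def match_license_url(url):
    """Return the license key for a known URL, or None."""
    return URL_TO_KEY.get(normalize_url(url))
```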

Consider a way to keep track of "primary" licenses

Say I start from these expressions:

primary: bsd-new
initial: bsd-new AND bsd-simplified AND mit AND mit AND bsd-new AND gpl-2.0

I would like a way to combine the two expressions above with AND and end up with:

transformed: bsd-new AND (bsd-simplified AND gpl-2.0 AND mit)
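For flat AND-only expressions like these, the transformation can be sketched as: drop duplicates, remove the primary key from the rest, sort the rest, and group it after the primary license. combine_with_primary is a hypothetical helper, not part of the library, and it does not handle OR or WITH.

```python
def combine_with_primary(primary, initial):
    """Combine a primary license with a flat AND expression, primary first.

    Hypothetical helper: duplicates are dropped, the primary key is removed
    from the remaining keys, and the remaining keys are sorted and grouped.
    """
    primary = primary.strip()
    keys = {key.strip() for key in initial.split(" AND ")}
    rest = sorted(keys - {primary})
    if not rest:
        return primary
    if len(rest) == 1:
        return f"{primary} AND {rest[0]}"
    return f"{primary} AND ({' AND '.join(rest)})"
```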

Deploy an online tool

This issue will capture progress on making license-expression and boolean.py into online tools.

Transpiling tools that did not quite work:

RapydScript https://github.com/atsepkov/RapydScript
Lacks support for some imports (such as import __future__); its list of importable modules is here
Batavia https://github.com/pybee/batavia
The idea here is to compile Python to its bytecode and then run it in a JavaScript VM. Because Python bytecode changes from version to version, Batavia currently supports 3.4.4 and possibly 3.5.x (their docs are not consistent on this). My problem was compiling those Python versions on Arch Linux, and when I finally did, a) their test suite did not quite pass and b) it choked on import __future__.
PyPyJS https://github.com/pypyjs/pypyjs
The whole thing is complicated and I failed to make it work. Main demotivator: the website offers Python 2.7.9, and a GitHub issue indicates that Python 3 support has stalled.

Transpiling tools that look promising:

Transcrypt https://github.com/qquick/Transcrypt and http://transcrypt.org
This is currently the only candidate that has a distinct transpiling step (you actually see the resulting .js files) and that could be made to work. However, there are problems, which are summarized on SO.
Brython https://github.com/brython-dev/brython
The most laid-back approach: it just works and delivers results, but sometimes the resulting boolean expressions come out wrong (runtime error).

Transpiling tools that look promising but not yet tried:

Skulpt http://www.skulpt.org/
It looks similar to Brython, which is why it has not been tried yet.
flexx https://github.com/zoofIO/flexx
https://github.com/alehander42/pseudo-python
https://github.com/alehander42/pseudo

Update the aboutcode-toolkit CI check

Currently, the aboutcode-toolkit version used for CI checking of ABOUT files is v3.1.1 while the upstream is at v4.0.0. .ABOUT files generated with the latest toolkit do not pass the CI checks at this time.

Provide built-in support for SPDX and scancode license expression validation

I would like to have a function that takes an expression string as an argument and validates it. It could be built on Licensing.parse(), but I would prefer it to return an object that tells me everything about the expression's validity:

if the syntax is valid or not and error messages if not
what are the valid and invalid license symbols
what are the valid and invalid exceptions
what are the obsolete license symbols

This function should take as input either the ScanCode license DB ( https://scancode-licensedb.aboutcode.org ) or some list of license symbols. It should bundle up-to-date license lists from ScanCode and SPDX for easy bootstrapping. For this we need nexB/scancode-licensedb#7
In addition it should also support and accept arbitrary LicenseRef- (and possibly DocumentRef- ) in SPDX mode.
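The requested report can be sketched as a dataclass plus a simple classifier. Everything here is hypothetical: the LICENSES, EXCEPTIONS and DEPRECATED sets stand in for the bundled ScanCode and SPDX data, a symbol directly after WITH is classified as an exception, deprecated keys are reported but still counted as valid, LicenseRef- keys are accepted only in SPDX mode, and only balanced parentheses are checked rather than the full expression grammar.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical reference data standing in for bundled ScanCode/SPDX lists.
LICENSES = {"MIT", "Apache-2.0", "GPL-2.0-or-later"}
EXCEPTIONS = {"Classpath-exception-2.0"}
DEPRECATED = {"wxWindows", "GPL-2.0+"}


@dataclass
class ValidationReport:
    syntax_valid: bool = True
    errors: List[str] = field(default_factory=list)
    valid_licenses: List[str] = field(default_factory=list)
    invalid_licenses: List[str] = field(default_factory=list)
    valid_exceptions: List[str] = field(default_factory=list)
    invalid_exceptions: List[str] = field(default_factory=list)
    deprecated: List[str] = field(default_factory=list)


def validate(expression, spdx_mode=True):
    """Classify every symbol in expression; only parentheses are syntax-checked."""
    report = ValidationReport()
    depth = 0
    previous = None
    for token in re.findall(r"[()]|[^\s()]+", expression):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth < 0:
                report.syntax_valid = False
                report.errors.append("unbalanced ')'")
                depth = 0
        elif token.upper() in ("AND", "OR", "WITH"):
            pass  # operators are not symbols
        elif previous is not None and previous.upper() == "WITH":
            target = report.valid_exceptions if token in EXCEPTIONS else report.invalid_exceptions
            target.append(token)
        else:
            if token in DEPRECATED:
                report.deprecated.append(token)
            known = token in LICENSES or token in DEPRECATED or (
                spdx_mode and token.startswith("LicenseRef-")
            )
            (report.valid_licenses if known else report.invalid_licenses).append(token)
        previous = token
    if depth > 0:
        report.syntax_valid = False
        report.errors.append("unbalanced '('")
    return report
```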

Support Yocto license expressions syntax

Based on https://github.com/openembedded/openembedded-core/blob/14241ed09f9ed317045cf75a6d08416d3579bb8d/meta/lib/oe/license.py#L217 and https://www.openembedded.org/wiki/Recipe_License_Fields, Yocto license expressions use & for AND and | for OR; these should most likely be supportable largely out of the box.
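Assuming only the operator spelling differs, converting a Yocto LICENSE value could be as simple as swapping & and | for AND and OR token by token. yocto_to_expression is a hypothetical helper; it does not map Yocto-specific license names to SPDX identifiers.

```python
import re

# Yocto/OpenEmbedded LICENSE fields use "&" for AND and "|" for OR.
YOCTO_OPERATORS = {"&": "AND", "|": "OR"}


def yocto_to_expression(license_field):
    """Rewrite a Yocto LICENSE value using AND/OR keywords.

    Sketch only: operators are swapped token by token; license names are
    left untouched.
    """
    tokens = re.findall(r"[&|()]|[^\s&|()]+", license_field)
    text = " ".join(YOCTO_OPERATORS.get(token, token) for token in tokens)
    # Remove the spaces that joining introduced just inside parentheses.
    return text.replace("( ", "(").replace(" )", ")")
```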
See also:

Add support for Python 3

File "lib/python3.5/site-packages/license_expression.py", line 324, in build
    if isinstance(expression, basestring) and expression.strip():
NameError: name 'basestring' is not defined

Could be solved with the following:

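try:
    unicode = unicode
except NameError:
    # Python 3: there is no unicode built-in
    str = str
    unicode = str
    bytes = bytes
    basestring = (str, bytes)
else:
    # Python 2: str is the bytes type and basestring is built in
    str = str
    unicode = unicode
    bytes = str
    basestring = basestring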

Add support for "with" statement

Also support validation against a list of given exceptions

Fix Travis CI for MacOSX and Python 3 support

Following up on @sschuberth's PR #5, there is something fishy in the Travis setup: the tests run on Mac and Linux, but only on Python 2.7; Python 3.4 and 3.5 are ignored.