spdx-java-tagvalue-store's Introduction

spdx-java-tagvalue-store

SPDX store that supports serializing and deserializing SPDX tag/value files.

This library implements the SPDX Java Library Storage Interface by extending ExtendedSpdxStore, so it can be layered on top of any underlying store that implements that interface.

Using the Library

This library is intended to be used in conjunction with the SPDX Java Library.

Create an instance of a store which implements the SPDX Java Library Storage Interface. For example, the InMemSpdxStore is a simple in-memory storage suitable for simple file serializations and deserializations.

Create an instance of TagValueStore(IModelStore baseStore), passing in the store instance created above.

Serializing and Deserializing

This library supports the ISerializableModelStore interface for serializing and deserializing SPDX tag/value files.

Development Status

Mostly stable - although it has not been widely used.

spdx-java-tagvalue-store's People

Contributors

goneall

spdx-java-tagvalue-store's Issues

Release 1.1.5

  • Review all PRs and Issues
  • Pass unit tests
  • Test tools-java dependent library
  • Test cdx-to-spdx dependent library
  • Run mvn org.owasp:dependency-check-maven:check
  • Update version
  • Run mvn deploy
  • Release to Maven on Sonatype
  • Create Github release

Validator does not check the correct case of properties

I have a tag-value file that contains the following line:

FilesAnalyzed: False

The validator tells me that the file is correct.

Since tags and format properties are case sensitive (SPDX 2.2, section 1.7.7), the correct value is false, and the validator should have reported the file as incorrect.
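The check being asked for can be sketched as follows. This is only an illustration: StrictBooleanTag and parse are hypothetical names, not part of this library or of the SPDX Java Library. Note that Java's Boolean.parseBoolean ignores case, so a strict check has to compare the literals explicitly.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of spdx-java-tagvalue-store. */
final class StrictBooleanTag {

    /**
     * Accepts only the exact lowercase literals "true" and "false".
     * Any other spelling (for example "False" or "TRUE") yields an empty Optional,
     * which a validator could turn into an error message.
     */
    static Optional<Boolean> parse(String raw) {
        String value = raw.trim();
        if (value.equals("true")) {
            return Optional.of(Boolean.TRUE);
        }
        if (value.equals("false")) {
            return Optional.of(Boolean.FALSE);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Demonstrates the difference between the two spellings from the report.
        System.out.println("false accepted: " + parse("false").isPresent());
        System.out.println("False accepted: " + parse("False").isPresent());
    }
}
```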

Files will be added to the last package when serializing

If there are any files which are not in the Document Describes nor in a hasFile / Contains relationship to a package, they will be added at the end of the SPDX Document. Since there is an implied contains relationship to any files which immediately follow a package, these files will be included in the contains relationship to the last package in the SPDX document before the files.

This can probably be fixed by adding any non-described, non contained files at the very beginning before any packages.

Note - this issue has been present since the implementation of the 2.0 spec - several years - and has not been reported, so this may not be a real issue in practice.
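The proposed ordering can be sketched like this. The class, the method, and the simplified "FileName:" / "PackageName:" lines are assumptions made for illustration and do not reflect the serializer's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed output ordering; not the library's serializer. */
final class FileOrdering {

    /**
     * Emits files that belong to no package first, then each package
     * followed by its own files. Because no package precedes the loose
     * files, they cannot be picked up by the implied "contains" rule.
     */
    static List<String> order(List<String> uncontainedFiles,
                              Map<String, List<String>> filesByPackage) {
        List<String> out = new ArrayList<>();
        for (String file : uncontainedFiles) {
            out.add("FileName: " + file);
        }
        for (Map.Entry<String, List<String>> entry : filesByPackage.entrySet()) {
            out.add("PackageName: " + entry.getKey());
            for (String file : entry.getValue()) {
                out.add("FileName: " + file);
            }
        }
        return out;
    }
}
```

Passing a LinkedHashMap for filesByPackage keeps the packages in insertion order.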

Sort output on serialization

With the recent performance improvements, the order of collections is no longer preserved.

In order to provide consistent output, each collection can be sorted when being serialized.
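A minimal sketch of the idea, assuming elements are identified by plain string IDs. SortedOutput and sortedIds are hypothetical names; the real serializer works on model objects rather than strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; not the library's serializer. */
final class SortedOutput {

    /**
     * Returns a new list ordered by ID, leaving the input collection untouched.
     * Sorting a copy just before writing gives the same output regardless of
     * the iteration order of the underlying collection.
     */
    static List<String> sortedIds(Collection<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }
}
```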

Validation differences in `extractedText`

During research for this issue I came across an inconsistency in the java-tools (and the online validator).
Converting this tag-value file (which is marked as valid by the java-tools):

SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: SAG-PM generated SBOM
DocumentNamespace: dns:softwareassuranceguardian.com
Creator: Organization: dns:reliableenergyanalytics.com
Creator: Tool: SAG-PM Version: 1.2
Created: 2022-11-26T18:45:28Z
PackageName: apache-tomcat-9.0.69.zip
PackageVersion: 9.0.69
SPDXID: SPDXRef-Package-fc4a1bf0-78a0-43ca-b4a9-78adfb42138c
PackageSupplier: Organization: Apache Foundation
PackageDownloadLocation: https://dlcdn.apache.org/tomcat/tomcat-9/v9.0.69/bin/apache-tomcat-9.0.69.zip/
FilesAnalyzed: false
LicenseID: LicenseRef-Unlicense
LicenseName: Unlicense

to json will include a new tag extractedText:

"hasExtractedLicensingInfos" : [ {
    "licenseId" : "LicenseRef-Unlicense",
    "extractedText" : "WARNING: TEXT IS REQUIRED",
    "name" : "Unlicense"
  } ]

As also mentioned in the issue linked above, I believe that the extracted text is mandatory and the above tag-value example should not be marked as valid.

verification of tag-File fails because of used ID

I tried to use tools-java to verify this file from tools-python. But when I execute the command

 java -jar target/tools-java-1.1.1-jar-with-dependencies.jar Verify ../tools-python/data/SPDXSimpleTag.tag

I get the following error:

14:27:47.000 [main] ERROR org.spdx.storage.simple.InMemSpdxStore - Can not delete ID __anon__gnrtd12.  It is in use
Analysis exception processing SPDX file: Can not delete ID __anon__gnrtd12.  It is in use.

Since the error comes from InMemSpdxStore I decided to open the issue in this repo.

External document reference without a space between the hash and hashvalue fails

External document references include a SHA1:[value]. Currently, the tag/value parser expects a space after the colon and before the value.

Per the spec, this space is not required.

Note that the regex pattern that needs to change is here:

private static Pattern EXTERNAL_DOC_REF_PATTERN = Pattern.compile("(\\S+)\\s+(\\S+)\\s+SHA1:\\s+(\\S+)");
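One possible change, shown as a sketch: replace the \\s+ after "SHA1:" with \\s* so the space becomes optional. The wrapper class and the checksumOf method are hypothetical and exist only to make the example self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical wrapper for illustration; only the pattern mirrors the one quoted above. */
final class ExternalDocRefPattern {

    // Same as the quoted pattern, except that whitespace after "SHA1:" is optional (\\s*).
    static final Pattern RELAXED =
            Pattern.compile("(\\S+)\\s+(\\S+)\\s+SHA1:\\s*(\\S+)");

    /** Returns the checksum value, or null if the reference does not match. */
    static String checksumOf(String externalDocRef) {
        Matcher matcher = RELAXED.matcher(externalDocRef.trim());
        return matcher.matches() ? matcher.group(3) : null;
    }
}
```

With this pattern, both "SHA1: d6a7..." and "SHA1:d6a7..." yield the same checksum value.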

Relationship comment deleted when converting json file to tag-value

When converting the following json file

{
  "SPDXID" : "SPDXRef-DOCUMENT",
  "spdxVersion" : "SPDX-2.3",
  "creationInfo" : {
    "created" : "2022-01-01T00:00:00Z",
    "creators" : [ "Tool: test-tool" ]
  },
  "name" : "document name",
  "dataLicense" : "CC0-1.0",
  "documentDescribes" : [ "SPDXRef-fileA"],
  "documentNamespace" : "https://some.namespace",
  "files" : [  {
    "SPDXID" : "SPDXRef-fileA",
    "checksums" : [ {
      "algorithm" : "SHA1",
      "checksumValue" : "d6a770ba38583ed4bb4525bd96e50461655d2758"
    } ],
    "fileName" : "./fileA.c"
  } ],
  "relationships" : [ {
    "spdxElementId" : "SPDXRef-fileA",
    "relationshipType" : "DESCRIBED_BY",
    "relatedSpdxElement" : "SPDXRef-DOCUMENT",
    "comment" : "comment on DESCRIBED_BY"
  } ]
}

to a tag-value file, the result is

DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z
## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT

where the comment on the relationship SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT is deleted.

SPDX Tag-Value validator: Unexpected Error: org.spdx.library.model.SpdxIdInUseException: Can not create Apache-2.0. It is already in use with type ListedLicense which is incompatible with type ExtractedLicensingInfo

While attempting to validate the attached SPDX Tag-Value file, it results in this error in the online validator:

Analysis exception processing SPDX file: Unexpected Error: org.spdx.library.model.SpdxIdInUseException: 
Can not create Apache-2.0. It is already in use with type ListedLicense which is incompatible with type 
ExtractedLicensingInfo

It is difficult to understand what the problem is -- is it a parser issue or something wrong with the SPDX file?

sample.spdx.txt

Performance issue with a large number of files

With a large number of files, processing takes 6 ms per file on a relatively high-performance machine. For a document with 145K files, this can add up to more than 1 minute 40 seconds.

From profiling, this is primarily due to the call to modelStore.delete(documentNamespace, lastFile.getId()); in BuildDocument.addLastFile().

The delete function is extremely slow - see issue spdx/Spdx-Java-Library#40

This is one of the causes for issue spdx/spdx-online-tools#289

Multiple LicenseID

The following SPDX file is valid according to the Java tools
phpwiki.spdx.txt

However, it contains many duplicate LicenseID:

grep LicenseID phpwiki.spdx.txt|uniq -c
      1 LicenseID: LicenseRef-scancode-bsd-unmodified
      1 LicenseID: LicenseRef-scancode-commercial-license
      1 LicenseID: LicenseRef-scancode-free-unknown
      1 LicenseID: LicenseRef-scancode-mysql-linking-exception-2018
      5 LicenseID: LicenseRef-scancode-other-permissive
     20 LicenseID: LicenseRef-scancode-php-2.0.2
     15 LicenseID: LicenseRef-scancode-proprietary-license
      3 LicenseID: LicenseRef-scancode-public-domain
     23 LicenseID: LicenseRef-scancode-unknown-license-reference
      3 LicenseID: LicenseRef-scancode-unknown-spdx
      1 LicenseID: LicenseRef-scancode-warranty-disclaimer

Shouldn't this be flagged as invalid SPDX?
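A check for this could look roughly like the sketch below. It scans raw tag-value text rather than the parsed model, and DuplicateLicenseIds is a hypothetical name, not an existing validator class.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a duplicate check; not part of the existing validator. */
final class DuplicateLicenseIds {

    private static final String TAG = "LicenseID:";

    /**
     * Counts every LicenseID value in the text and returns only those
     * that occur more than once, mapped to their number of occurrences.
     */
    static Map<String, Integer> duplicates(String tagValueText) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : tagValueText.split("\\R")) {
            if (line.startsWith(TAG)) {
                String id = line.substring(TAG.length()).trim();
                counts.merge(id, 1, Integer::sum);
            }
        }
        // Drop IDs that are defined exactly once.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }
}
```

Unlike uniq -c on unsorted input, this counts all occurrences of an ID, not only adjacent ones.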

Deserializing document does not populate documentNamespace in model store

I used the following minimal example for testing:

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
DocumentNamespace: some_namespace
DocumentName: SPDX-tool-test
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z
## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-somefile

FileName: ./foo.txt
SPDXID: SPDXRef-somefile
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: LGPL-3.0-only
FileCopyrightText: <text>Copyright 2022 some guy</text>

I deserialized via:

var document = SpdxToolsHelper.deserializeDocument(file);

(where the tools helper is from tools-java, but it boils down to the tagvalue store from this repo).

While document.getDocumentUri() correctly returns the namespace, no value is present in the model store for documentNamespace, as can be checked via

document.getModelStore().getPropertyValueNames("some_namespace", "SPDXRef-DOCUMENT")

or

document.getModelStore().getValue("some_namespace", "SPDXRef-DOCUMENT", "documentNamespace")

Note: The problem seems specific to this store; at least the xml store does not have the same issue. I didn't try the other stores yet.

Version 0.5 Release Checklist

  • Create release for any dependent libraries which have changed (Spdx-Java-Library)
  • Update dependency versions in the pom.xml file
  • Update the version in the pom.xml file
  • Publish using the command mvn deploy - this will deploy to the bintray SPDX tools
  • Sync the bintray repo with Maven Central
  • Tag the repo with the release version
  • Publish the release on github

Incorrect conversion for FileCopyrightText

I have a tag-value SPDX file that contains:

PackageCopyrightText: NOASSERTION

for one package and

PackageCopyrightText: NONE

for another.

After converting the file to JSON then to tag-value, I get the following:

PackageCopyrightText: <text>NOASSERTION</text>

and

PackageCopyrightText: <text>NONE</text>
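The expected behavior can be sketched as follows: the literals NONE and NOASSERTION are written bare, and only free-form text is wrapped in <text> tags. CopyrightTextFormatter is a hypothetical name and not the library's actual writer.

```java
/** Hypothetical sketch of the expected output rule; not the library's writer. */
final class CopyrightTextFormatter {

    /**
     * Returns the value unchanged for the special literals NONE and
     * NOASSERTION, and wraps any other text in <text>...</text>.
     */
    static String format(String value) {
        if ("NONE".equals(value) || "NOASSERTION".equals(value)) {
            return value;
        }
        return "<text>" + value + "</text>";
    }

    public static void main(String[] args) {
        System.out.println("PackageCopyrightText: " + format("NOASSERTION"));
        System.out.println("PackageCopyrightText: " + format("Copyright 2022 some guy"));
    }
}
```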

Support parsing snippets without the SnippetFromFileSPDXID

The SPDX spec is ambiguous about whether the SnippetFromFileSPDXID is required for snippets or whether a snippet immediately following a file is assumed to belong to that file (see SPDX Spec issue #651).

The tag/value parser could be enhanced to associate the snippet with the immediately preceding SPDX file if the SnippetFromFileSPDXID is not present.
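The fallback could work roughly as in the sketch below. It is a simplified line scanner written for illustration, with hypothetical names throughout; the real parser tracks considerably more state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the fallback rule; not the library's BuildDocument logic. */
final class SnippetFileResolver {

    /**
     * Maps each snippet ID to a file ID. An explicit SnippetFromFileSPDXID wins;
     * otherwise the snippet is assigned to the most recently seen file.
     * The mapped value is null when no file precedes the snippet.
     */
    static Map<String, String> resolve(String tagValueText) {
        Map<String, String> snippetToFile = new LinkedHashMap<>();
        String lastFileId = null;
        String currentSnippet = null;
        boolean expectFileId = false; // true between FileName and its SPDXID
        for (String line : tagValueText.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank lines and comments without a tag
            }
            String tag = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (tag) {
                case "FileName":
                    expectFileId = true;
                    currentSnippet = null;
                    break;
                case "PackageName":
                    expectFileId = false;
                    currentSnippet = null;
                    break;
                case "SPDXID":
                    if (expectFileId) {
                        lastFileId = value;
                        expectFileId = false;
                    }
                    break;
                case "SnippetSPDXID":
                    currentSnippet = value;
                    snippetToFile.put(value, lastFileId); // fallback assignment
                    break;
                case "SnippetFromFileSPDXID":
                    if (currentSnippet != null) {
                        snippetToFile.put(currentSnippet, value); // explicit value wins
                    }
                    break;
                default:
                    break;
            }
        }
        return snippetToFile;
    }
}
```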

ExternalDocumentRef with non sha1 checksums do not validate correctly

When validating an SPDX tag/value document with an ExternalDocumentRef that uses a checksum algorithm other than SHA1, Verify reports the reference as invalid even when it is well formed and the checksum algorithm is valid.

Example using ext reference
DocumentRef-kubernetes-v1.22.0-beta.1 https://k8s.io/sbom/source/v1.22.0-beta.1 SHA256: 9b01bf4e25f170cba6b8ad1db4297d2d4b7ddebbebf87cc440a1a6f537f48637

java -jar tmp/tools-java-1.0.1-jar-with-dependencies.jar Verify tmp/kubernetes-release.spdx 
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
Analysis exception processing SPDX file: Invalid external document reference: DocumentRef-kubernetes-v1.22.0-beta.1 https://k8s.io/sbom/source/v1.22.0-beta.1 SHA256: 9b01bf4e25f170cba6b8ad1db4297d2d4b7ddebbebf87cc440a1a6f537f48637 at line number 6

If I change the external reference to SHA1, using the following tag it works:
ExternalDocumentRef:DocumentRef-kubernetes-v1.22.0-beta.1 https://k8s.io/sbom/source/v1.22.0-beta.1 SHA1: 89b0b411c5381564689f5ce96a84b2987d25acfa

java -jar tmp/tools-java-1.0.1-jar-with-dependencies.jar Verify tmp/kubernetes-release.spdx 
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
This SPDX Document is valid.

I also tested with sha512 and it also did not work.
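One way to accept other algorithms is to capture the algorithm name instead of hard-coding SHA1, then check it against a list of allowed names. The sketch below is an assumption about how that could look: the class is hypothetical, and the three algorithms listed are an illustrative subset, not the full list from the spec.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the parser's actual pattern or algorithm list. */
final class ExternalDocRefAlgorithm {

    // Group 3 captures the algorithm name; group 4 the checksum value.
    static final Pattern PATTERN =
            Pattern.compile("(\\S+)\\s+(\\S+)\\s+([A-Za-z0-9-]+):\\s*(\\S+)");

    // Illustrative subset only; the spec defines more algorithms.
    static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("SHA1", "SHA256", "SHA512"));

    /**
     * Returns the algorithm name if the reference is well formed and the
     * algorithm is in ALLOWED; returns null otherwise.
     */
    static String algorithmOf(String externalDocRef) {
        Matcher matcher = PATTERN.matcher(externalDocRef.trim());
        if (!matcher.matches()) {
            return null;
        }
        String algorithm = matcher.group(3);
        return ALLOWED.contains(algorithm) ? algorithm : null;
    }
}
```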

tagvalue document with snippet convert error

If a snippet is defined in a tag-value document, an error occurs when converting it to another format (RDF, XML, etc.).

Snippet Information

SnippetSPDXID: SPDXRef-EA073AD1F072E19FD4AB65B3C1555974

error log
Caused by: org.spdx.library.InvalidSPDXAnalysisException: Error parsing snippet. Unrecognized tag: SnippetSPDXID: at line number 546
    at org.spdx.tag.BuildDocument.buildSnippet(BuildDocument.java:484)
    at org.spdx.tag.BuildDocument.buildDocument(BuildDocument.java:404)
    at org.spdx.tag.HandBuiltParser.data(HandBuiltParser.java:100)
    at org.spdx.tagvaluestore.TagValueStore.deSerialize(TagValueStore.java:88)
    at org.spdx.tools.SpdxConverter.convert(SpdxConverter.java:151)
