Thanks for the detailed report! We are working on a fix.
Facing the same problem. Rolled back to 1.0.1 for now.
OK, I think I found the bug in the regexp!
In syft's `syft/pkg/cataloger/binary/classifiers.go` (lines 91 to 95 at commit 1e31356):
First, it uses `\s` rather than `\x00` (in Go's RE2 syntax `\s` is only `[\t\n\f\r ]`, so `[^-\s]` also matches a NUL byte), and then it requires the presence of a `-` followed by something.
With the bytes I have in my binary (`[NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.22+7[NUL]-J-ms8m[NUL]`), the `[0-9]+[^-\s]+` part will actually match the whole version up to and including the following null byte, and then the `(-(?!jvmci)[^-\x00]+)+` portion will match the `-J-ms8m`.
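A minimal Go reproduction of that root cause, assuming the byte layout quoted above. The `leadingVersion` helper is made up for illustration and only exercises the leading `[0-9]+[^-\s]+` part of the classifier regexp, not the whole pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// sample mimics the bytes quoted above from an Eclipse Temurin `java` binary.
const sample = "\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"

// leadingVersion (hypothetical helper, not part of syft) returns what
// `[0-9]+<class>+` captures right after the release field, where class is
// a negated character class such as `[^-\s]`.
func leadingVersion(class string) string {
	re := regexp.MustCompile(`\x00openjdk\x00java\x00[0-9]+[.0-9]*\x00([0-9]+` + class + `+)`)
	m := re.FindStringSubmatch(sample)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// `\s` does not cover NUL in RE2, so the capture swallows the field terminator.
	fmt.Printf("%q\n", leadingVersion(`[^-\s]`)) // "11.0.22+7\x00"
	// Excluding NUL explicitly stops the capture at the end of the field.
	fmt.Printf("%q\n", leadingVersion(`[^-\x00]`)) // "11.0.22+7"
}
```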
@tbroyer thanks for looking at that regex! I agree that it's pulling in a null byte. My hope is to refactor these classifiers a bit to get some more logic into go and out of regexes, since I think a regex that long and complicated is pretty difficult to review and get right.
FWIW, I replicated the bug on regex101 (replacing null bytes with `§`): https://regex101.com/r/fiy5l3/3
Just replacing the two `\s` with `\x00` (`§` in the test) and the `+` at the end with `*` was enough to correctly match my Eclipse Temurin, match the OpenJDK examples from the code, and not match a GraalVM example recreated from the test added in the same commit: https://regex101.com/r/MoSWPq/3
This means it could be fixed quickly, pending the bigger refactor you mentioned to get rid of regexes.
```diff
- `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\s]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\s].?|[^-\x00]{6,}))+)\x00`),
+ `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\x00]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\x00].?|[^-\x00]{6,}))*)\x00`),
```
Let me know if you want me to do a PR.
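For a quick local check outside regex101, a sketch that runs both versions of the pattern from the diff through Go's `regexp` package. The sample strings reuse byte layouts quoted in this thread; the GraalVM one is a hypothetical stand-in, not the bytes from syft's test fixture, and `versionOf` is a made-up helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the diff: the current classifier (old) and the proposed fix (new).
var (
	oldOpenJDK = regexp.MustCompile(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\s]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\s].?|[^-\x00]{6,}))+)\x00`)
	newOpenJDK = regexp.MustCompile(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\x00]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\x00].?|[^-\x00]{6,}))*)\x00`)
)

// versionOf returns the "version" capture group and whether the pattern matched at all.
func versionOf(re *regexp.Regexp, data string) (string, bool) {
	m := re.FindStringSubmatch(data)
	if m == nil {
		return "", false
	}
	return m[re.SubexpIndex("version")], true
}

func main() {
	samples := []struct{ name, data string }{
		{"temurin-11", "\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"},
		{"jdk11-lts", "\x00openjdk\x00java\x000.0\x0011.0.17+8-LTS\x00"},
		{"jdk8", "\x00openjdk\x00java\x001.8\x001.8.0_352-b08\x00"},
		// hypothetical GraalVM-style bytes, not copied from a real binary or fixture
		{"graalvm", "\x00openjdk\x00java\x000.0\x0017.0.7+7-jvmci-23.0-b12\x00"},
	}
	for _, s := range samples {
		oldV, oldOK := versionOf(oldOpenJDK, s.data)
		newV, newOK := versionOf(newOpenJDK, s.data)
		fmt.Printf("%-10s old=%q (%v) new=%q (%v)\n", s.name, oldV, oldOK, newV, newOK)
	}
}
```

With the old pattern the Temurin sample yields the NUL-tainted `11.0.22+7\x00-J-ms8m` that also shows up in the CI log further down, while the new one yields `11.0.22+7`; neither pattern matches the `-jvmci-` sample.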
@willmurphyscode that regex is only complicated because of the lack of lookahead and lookbehind in Go :(
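To make the point concrete, a small sketch (the `compiles` helper and the anchored `segment` pattern are invented for this example): Go's `regexp` refuses both lookaround forms, and the alternation in the diff is a hand-expanded stand-in for `(?!jvmci)` applied to one dash-separated segment.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// segment is the lookahead-free alternation from the proposed fix, anchored to a
// single "-segment"; it approximates `-(?!jvmci)[^-\x00]+`.
var segment = regexp.MustCompile(`^-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\x00].?|[^-\x00]{6,})$`)

func main() {
	// RE2 has no lookaround, so neither of these compiles.
	fmt.Println(compiles(`-(?!jvmci)[^-\x00]+`))        // negative lookahead
	fmt.Println(compiles(`(?<!\x00openjdk)\x00[0-9]+`)) // negative lookbehind

	// The expanded alternation accepts ordinary suffixes and rejects "-jvmci".
	for _, s := range []string{"-J", "-ms8m", "-LTS", "-b12", "-jvmci"} {
		fmt.Println(s, segment.MatchString(s))
	}
}
```

The expansion is only an approximation of `(?!jvmci)`: it rejects a segment that is `jvmci` or a prefix of it, but accepts any segment of six or more characters, which is harmless here because `jvmci` is followed by another `-` in GraalVM version strings.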
Very strange; I don't get that behavior at all locally, even with the tags mentioned.
Ooh! It does this on AMD64 but not on ARM64.
I added the failing test for this issue!
@LaurentGoderre thanks for adding the failing test! Does @tbroyer 's fix for the regex make your test pass?
It looks like that example finds two different Java binaries; https://github.com/anchore/syft/actions/runs/8633703946/job/23667391245?pr=2766#step:11:1697 shows:
`"[Pkg(name="java/jre" version="11.0.22+7\x00-J-ms8m" type="binary" id="ea3e54cbbb41c9ac") Pkg(name="java/jre" version="11.0.22+7" type="binary" id="6e4db1ab636e47e6")]" should have 1 item(s), but has 2`
I think we might also wish we had a negative lookbehind for the Oracle JRE binary classifier, since the version info can look pretty similar in the binary.
> @LaurentGoderre thanks for adding the failing test! Does @tbroyer 's fix for the regex make your test pass?
> It looks like that example finds 2 different java binaries:
Yes, it does (see OP #2750 (comment)).
> I think we might also wish we had a negative look behind for the oracle JRE binary classifier, since the version info can look pretty similar in the binary.
I didn't look closely enough, but indeed the one with the "correct version" is matched by `java-binary-oracle` (see the CPE and PURL in the OP), while `java-binary-openjdk` matches "too much" and creates the second one.
It looks like the `java-binary-oracle` regexp is not specific enough (or possibly it should be removed?).
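A quick way to see why there are two packages, sketched in Go under the assumption that the `java-binary-oracle` pattern is the one quoted in the classifier proposal further down (`oracleVersion` is a hypothetical helper): that pattern alone already finds `11.0.22+7` in the Temurin bytes, so both classifiers fire on the same file.

```go
package main

import (
	"fmt"
	"regexp"
)

// oracle is the java-binary-oracle version pattern as quoted in this thread.
var oracle = regexp.MustCompile(`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`)

// oracleVersion returns the "version" capture of the Oracle pattern, or "" if it does not match.
func oracleVersion(data string) string {
	m := oracle.FindStringSubmatch(data)
	if m == nil {
		return ""
	}
	return m[oracle.SubexpIndex("version")]
}

func main() {
	// Eclipse Temurin bytes from the original report; nothing Oracle-specific is required to match.
	temurin := "\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"
	fmt.Printf("%q\n", oracleVersion(temurin)) // "11.0.22+7"
}
```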
I can't help but think there must be a better solution for this problem. As I understand it, the issue is that we have a binary file, `java`, which we want to classify as `graalvm` when it matches a specific pattern including `jvmci`; if it doesn't, we want to continue on and try to match the standard Java regex. Does this summarize the issue?
Could we simplify the regex back to what we had before, and then add some sort of inverted evidence matcher that checks that the `jvmci` part is not present?
I think this could work using something like:
```go
EvidenceMatcher: excludeVersionMatches(`-jvmci-`,
	// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
	// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
	FileContentsVersionMatcher(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`)),
```
with a function similar to:
```go
// excludeVersionMatches wraps another EvidenceMatcher and drops every package
// whose version matches pattern (e.g. `-jvmci-` for GraalVM builds).
func excludeVersionMatches(pattern string, matcher EvidenceMatcher) EvidenceMatcher {
	pat := regexp.MustCompile(pattern)
	return func(resolver file.Resolver, classifier Classifier, location file.Location) ([]pkg.Package, error) {
		var out []pkg.Package
		pkgs, err := matcher(resolver, classifier, location)
		if err != nil {
			return nil, err
		}
		for _, p := range pkgs {
			// keep only packages whose version the exclusion pattern does not match
			if !pat.MatchString(p.Version) {
				out = append(out, p)
			}
		}
		return out, nil
	}
}
```
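To sanity-check the filtering logic without pulling in syft, a self-contained sketch with simplified stand-in types. `Package`, `Matcher` and `versionMatcher` are hypothetical, not syft's `pkg.Package` or `EvidenceMatcher`, and the GraalVM bytes are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// Package and Matcher are simplified stand-ins invented for this sketch;
// they are not syft's real pkg.Package or EvidenceMatcher types.
type Package struct {
	Name    string
	Version string
}

type Matcher func(data string) []Package

// versionMatcher reports one package per match of the pattern's "version" group.
func versionMatcher(pattern string) Matcher {
	re := regexp.MustCompile(pattern)
	idx := re.SubexpIndex("version")
	return func(data string) []Package {
		var out []Package
		for _, m := range re.FindAllStringSubmatch(data, -1) {
			out = append(out, Package{Name: "java", Version: m[idx]})
		}
		return out
	}
}

// excludeVersionMatches drops any package whose version matches pattern,
// mirroring the wrapper proposed in the comment.
func excludeVersionMatches(pattern string, matcher Matcher) Matcher {
	pat := regexp.MustCompile(pattern)
	return func(data string) []Package {
		var out []Package
		for _, p := range matcher(data) {
			if !pat.MatchString(p.Version) {
				out = append(out, p)
			}
		}
		return out
	}
}

// openjdk combines the simpler pre-regression pattern with the -jvmci- exclusion.
var openjdk = excludeVersionMatches(`-jvmci-`,
	versionMatcher(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`))

func main() {
	fmt.Println(openjdk("\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"))
	// hypothetical GraalVM-style bytes
	fmt.Println(openjdk("\x00openjdk\x00java\x000.0\x0017.0.7+7-jvmci-23.0-b12\x00"))
}
```

With these inputs the Temurin bytes yield a single `11.0.22+7` package, and the `-jvmci-` version is filtered out entirely.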
> It looks like the java-binary-oracle regexp is not specific enough (or possibly should be removed?)
I just downloaded JDK 21 LTS and 22 from Oracle (https://www.oracle.com/java/technologies/downloads/), and the versions look very similar to the OpenJDK flavors, but without the `[NUL]openjdk` prefix, which means a `(?<!\x00openjdk)` lookbehind would discriminate between the two classifiers… on x64, because things are different on aarch64, and between Linux and Mac!
On aarch64, the pattern is more like `[NUL](release)[NUL][NUL][NUL][NUL][NUL](version)[NUL][NUL][NUL]openjdk[NUL]java[NUL]`, with the `openjdk` part only being present in Eclipse Temurin (https://adoptium.net/fr/temurin/releases/?arch=aarch64) and not in Oracle JDK (where there are four consecutive `[NUL]`)… on Linux. On Mac, in Eclipse Temurin, it's `[NUL](version)[NUL](release)[NUL]-Jms8m[NUL]java[NUL]openjdk[NUL]` (release and version swapped, openjdk and java swapped, and far fewer `[NUL]`).
So an inverted matcher could indeed work here, though it would look more like a more complex `FileContentsVersionMatcher` with an "inverted regexp" that, if it matches, disqualifies the classifier.
So, without adding more platforms to the regexps for now, `java-binary-oracle` and `java-binary-graalvm` could reuse the `java-binary-openjdk` regexp as the "inverted matcher", and the `java-binary-openjdk` regexp would go back to the one from v1.0.1.
```go
{
	Class:    "java-binary-openjdk",
	FileGlob: "**/java",
	EvidenceMatcher: FileContentsVersionMatcher(
		// x64
		// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
		// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
		`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
	Package: "java",
	PURL:    mustPURL("pkg:generic/java@version"),
	// TODO the updates might need to be part of the CPE Attributes, like: 1.8.0:update152
	CPEs: singleCPE("cpe:2.3:a:oracle:openjdk:*:*:*:*:*:*:*:*"),
},
// …
{
	Class:    "java-binary-oracle",
	FileGlob: "**/java",
	EvidenceMatcher: FileContentsVersionMatcher(
		// [NUL]19.0.1+10-21[NUL]
		`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`,
		// must not match (see java-binary-openjdk):
		`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
	Package: "java",
	PURL:    mustPURL("pkg:generic/java@version"),
	CPEs:    singleCPE("cpe:2.3:a:oracle:jre:*:*:*:*:*:*:*:*"),
},
{
	Class:    "java-binary-graalvm",
	FileGlob: "**/java",
	EvidenceMatcher: FileContentsVersionMatcher(
		`(?m)\x00(?P<version>[0-9]+[.0-9]+[.0-9]+\+[0-9]+-jvmci-[0-9]+[.0-9]+-b[0-9]+)\x00`,
		// must not match (see java-binary-openjdk):
		`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
	Package: "java",
	PURL:    mustPURL("pkg:generic/java@version"),
	CPEs:    singleCPE("cpe:2.3:a:oracle:graalvm:*:*:*:*:*:*:*:*"),
},
```
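The two-argument `FileContentsVersionMatcher` used in this proposal doesn't exist yet; here is a minimal sketch of the intended semantics, using a hypothetical `versionUnless` helper in its place. It runs the Oracle classifier against the Temurin bytes from this thread and the `[NUL]19.0.1+10-21[NUL]` Oracle example from the classifier comment.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	// java-binary-openjdk pattern from the proposal (back to the v1.0.1 style).
	openjdkPattern = `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`
	// java-binary-oracle pattern from the proposal.
	oraclePattern = `(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`
)

// versionUnless is a hypothetical stand-in for the proposed two-argument matcher:
// it returns the "version" capture of pattern, unless mustNotMatch also matches
// the same contents, in which case the classifier is disqualified.
func versionUnless(pattern, mustNotMatch string) func(data []byte) (string, bool) {
	re := regexp.MustCompile(pattern)
	exclude := regexp.MustCompile(mustNotMatch)
	idx := re.SubexpIndex("version")
	return func(data []byte) (string, bool) {
		if exclude.Match(data) {
			return "", false
		}
		m := re.FindSubmatch(data)
		if m == nil {
			return "", false
		}
		return string(m[idx]), true
	}
}

func main() {
	oracle := versionUnless(oraclePattern, openjdkPattern)
	temurin := []byte("\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00")
	oracleJDK := []byte("\x0019.0.1+10-21\x00")
	fmt.Println(oracle(temurin))
	fmt.Println(oracle(oracleJDK))
}
```

Here the Temurin file is disqualified for the Oracle classifier because the OpenJDK pattern matches it, while the Oracle-style bytes still report `19.0.1+10-21`. As noted in the thread, this only covers the x64 layout; the aarch64 and macOS layouts would need their own patterns.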