Comments (13)
Thanks for the detailed report! We are working on a fix.
from syft.
Facing the same problem. Rolled back to 1.0.1 for now.
from syft.
OK, I think I found the bug in the regexp!
syft/syft/pkg/cataloger/binary/classifiers.go
Lines 91 to 95 in 1e31356
First, it uses `\s` rather than `\x00`, then it requires the presence of a `-` followed by something.
With the bytes I have in my binary (`[NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.22+7[NUL]-J-ms8m[NUL]`), the `[0-9]+[^-\s]+` will actually match the whole version up to and including the following null byte, then the `(-(?!jvmci)[^-\x00]+)+` portion will match the `-J-ms8m`.
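The root of the over-match is that `\s` only covers whitespace, so a negated class built on it still accepts a NUL byte. A minimal sketch of just that point, using Go's standard `regexp` package; the helper `matchesNUL` is made up for illustration and is not part of syft:

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesNUL reports whether the given character class matches a single NUL byte.
func matchesNUL(class string) bool {
	return regexp.MustCompile("^" + class + "$").MatchString("\x00")
}

func main() {
	fmt.Println(matchesNUL(`[^-\s]`))   // true: NUL is not whitespace, so the negated class accepts it
	fmt.Println(matchesNUL(`[^-\x00]`)) // false: NUL is excluded explicitly
}
```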
from syft.
@tbroyer thanks for looking at that regex! I agree that it's pulling in a null byte. My hope is to refactor these classifiers a bit to get some more logic into go and out of regexes, since I think a regex that long and complicated is pretty difficult to review and get right.
from syft.
Fwiw, I replicated the bug on regex101 (replacing null bytes with §): https://regex101.com/r/fiy5l3/3
Just replacing the two `\s` with `\x00` (`§` in the test) and the `+` at the end with `*` was enough to correctly match my Eclipse Temurin, match the OpenJDK examples from the code, and not match a GraalVM example recreated from the test added in the same commit: https://regex101.com/r/MoSWPq/3
This means this could be fixed quickly, pending the bigger refactor you're talking about getting rid of regexes.
- `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\s]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\s].?|[^-\x00]{6,}))+)\x00`),
+ `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\x00]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\x00].?|[^-\x00]{6,}))*)\x00`),
Let me know if you want me to do a PR.
from syft.
@willmurphyscode that regex is only complicated because of the lack of lookahead and lookbehind in go :(
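To make that limitation concrete, here is a small sketch showing that Go's standard `regexp` package refuses to compile lookaround constructs, which is why the `jvmci` exclusion has to be spelled out character by character. The helper name `compiles` is made up for the example:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`-(?!jvmci)[^-\x00]+`))  // false: negative lookahead is rejected
	fmt.Println(compiles(`(?<!\x00openjdk)\x00`)) // false: negative lookbehind is rejected
	fmt.Println(compiles(`-[^-\x00]+`))           // true: the same pattern without lookaround
}
```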
from syft.
Very strange, I don't get that behavior at all locally even with the tags mentioned
from syft.
OOOh! it does this on AMD64 but not ARM64
from syft.
I added the failing test for this issue!
from syft.
@LaurentGoderre thanks for adding the failing test! Does @tbroyer's fix for the regex make your test pass?
It looks like that example finds 2 different java binaries: https://github.com/anchore/syft/actions/runs/8633703946/job/23667391245?pr=2766#step:11:1697 has:
"[Pkg(name="java/jre" version="11.0.22+7\x00-J-ms8m" type="binary" id="ea3e54cbbb41c9ac") Pkg(name="java/jre" version="11.0.22+7" type="binary" id="6e4db1ab636e47e6")]" should have 1 item(s), but has 2
I think we might also wish we had a negative lookbehind for the oracle JRE binary classifier, since the version info can look pretty similar in the binary.
from syft.
> @LaurentGoderre thanks for adding the failing test! Does @tbroyer's fix for the regex make your test pass?
> It looks like that example finds 2 different java binaries:

Yes it does (see OP #2750 (comment))

> I think we might also wish we had a negative lookbehind for the oracle JRE binary classifier, since the version info can look pretty similar in the binary.

I didn't look closely enough, but indeed the one with the "correct version" is matched by java-binary-oracle (see CPE and PURL in OP), and java-binary-openjdk matches "too much" and creates the second one.
It looks like the java-binary-oracle regexp is not specific enough (or possibly should be removed?)
from syft.
I can't help but think there must be a better solution for this problem. As I understand it, the issue is that we have a binary file, `java`, which in some cases (when it matches a specific thing including `jvmci`) we want to classify as `graalvm`, but if it's not that, we want it to continue on and try to match the standard java regex. Does this summarize the issue?
Could we simplify the regex back to what we had before, but then add some sort of inverted evidence matcher where we can add a test for the `jvmci` part of it to not be present?
I think this could work using something like:
EvidenceMatcher: excludeVersionMatches(`-jvmci-`,
	// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
	// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
	FileContentsVersionMatcher(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`))
with a function similar to:
// excludeVersionMatches wraps matcher and drops every package whose version matches pattern.
func excludeVersionMatches(pattern string, matcher EvidenceMatcher) EvidenceMatcher {
	pat := regexp.MustCompile(pattern)
	return func(resolver file.Resolver, classifier Classifier, location file.Location) ([]pkg.Package, error) {
		var out []pkg.Package
		pkgs, err := matcher(resolver, classifier, location)
		if err != nil {
			return nil, err
		}
		// keep only the packages whose version does not match the excluded pattern
		for _, p := range pkgs {
			if !pat.MatchString(p.Version) {
				out = append(out, p)
			}
		}
		return out, nil
	}
}
from syft.
> It looks like the java-binary-oracle regexp is not specific enough (or possibly should be removed?)

I just downloaded JDK 21 LTS and 22 from Oracle (https://www.oracle.com/java/technologies/downloads/) and it looks like versions are very similar to OpenJDK flavors, but it doesn't have that `[NUL]openjdk` prefix; which means a `(?<!\x00openjdk)` lookbehind would discriminate between the two classifiers …on x64, because things are different on aarch64, and between linux and mac!
On aarch64, the pattern is more `[NUL](release)[NUL][NUL][NUL][NUL][NUL](version)[NUL][NUL][NUL]openjdk[NUL]java[NUL]`, with the `openjdk` part only being present in Eclipse Temurin (https://adoptium.net/fr/temurin/releases/?arch=aarch64) and not in Oracle JDK (where there are four consecutive `[NUL]`) …on linux. On mac, in Eclipse Temurin, it's `[NUL](version)[NUL](release)[NUL]-Jms8m[NUL]java[NUL]openjdk[NUL]` (inversion of release and version, inversion of openjdk and java, and many fewer `[NUL]`)
So indeed maybe an inverted matcher could work here, though more like a more complex `FileContentsVersionMatcher` with an "inverted regexp" that, if it matched, would disqualify the classifier.
So without adding more platforms for now for the regexps, java-binary-oracle and java-binary-graalvm could reuse the java-binary-openjdk regexp as the "inverted matcher", and java-binary-openjdk regex would be back to the one from v1.0.1.
{
	Class:    "java-binary-openjdk",
	FileGlob: "**/java",
	EvidenceMatcher: FileContentsVersionMatcher(
		// x64
		// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
		// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
		`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
	Package: "java",
	PURL:    mustPURL("pkg:generic/java@version"),
	// TODO the updates might need to be part of the CPE Attributes, like: 1.8.0:update152
	CPEs: singleCPE("cpe:2.3:a:oracle:openjdk:*:*:*:*:*:*:*:*"),
},
// …
{
	Class:    "java-binary-oracle",
	FileGlob: "**/java",
	EvidenceMatcher: FileContentsVersionMatcher(
		// [NUL]19.0.1+10-21[NUL]
		`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`,
		// must not match (see java-binary-openjdk):
		`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
	Package: "java",
	PURL:    mustPURL("pkg:generic/java@version"),
	CPEs:    singleCPE("cpe:2.3:a:oracle:jre:*:*:*:*:*:*:*:*"),
},
{
	Class:    "java-binary-graalvm",
	FileGlob: "**/java",
	EvidenceMatcher: FileContentsVersionMatcher(
		`(?m)\x00(?P<version>[0-9]+[.0-9]+[.0-9]+\+[0-9]+-jvmci-[0-9]+[.0-9]+-b[0-9]+)\x00`,
		// must not match (see java-binary-openjdk):
		`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
	Package: "java",
	PURL:    mustPURL("pkg:generic/java@version"),
	CPEs:    singleCPE("cpe:2.3:a:oracle:graalvm:*:*:*:*:*:*:*:*"),
},
from syft.