willmurphyscode commented on May 29, 2024

Thanks for the detailed report! We are working on a fix.

from syft.

Hritik14 commented on May 29, 2024

Facing the same problem. Rolled back to 1.0.1 for now.

tbroyer commented on May 29, 2024

OK, I think I found the bug in the regexp!

// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
// Equivalent to the following regexp with lookahead support:
// (?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\x00]+(-(?!jvmci)[^-\x00]+)+)
`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\s]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\s].?|[^-\x00]{6,}))+)\x00`),

First, it uses \s rather than \x00; second, it requires at least one - followed by something (the final + on the suffix group).

With the bytes I have in my binary ([NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.22+7[NUL]-J-ms8m[NUL]), the [0-9]+[^-\s]+ part actually matches the whole version up to and including the following null byte (a NUL is not whitespace, so [^-\s] accepts it), and then the (-(?!jvmci)[^-\x00]+)+ portion matches the -J-ms8m.
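The first point can be seen in isolation with a small Go sketch. Go's regexp package follows RE2 syntax, where \s is documented as [\t\n\f\r ], so NUL is not excluded by [^-\s]. The sketch runs only the leading part of the version group over the bytes that follow the release field; the variable names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Bytes that follow "[NUL]openjdk[NUL]java[NUL]0.0[NUL]" in the reported binary.
const tail = "11.0.22+7\x00-J-ms8m\x00"

var (
	// \s is [\t\n\f\r ] in Go, so a NUL byte is NOT excluded and gets consumed.
	withSpace = regexp.MustCompile(`[0-9]+[^-\s]+`)
	// Excluding \x00 explicitly stops the match at the NUL terminator.
	withNUL = regexp.MustCompile(`[0-9]+[^-\x00]+`)
)

func main() {
	fmt.Printf("%q\n", withSpace.FindString(tail)) // "11.0.22+7\x00"
	fmt.Printf("%q\n", withNUL.FindString(tail))   // "11.0.22+7"
}
```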

willmurphyscode commented on May 29, 2024

@tbroyer thanks for looking at that regex! I agree that it's pulling in a null byte. My hope is to refactor these classifiers a bit to move more logic into Go and out of regexes, since I think a regex that long and complicated is pretty difficult to review and get right.

tbroyer commented on May 29, 2024

Fwiw, I replicated the bug on regex101 (replacing null bytes with §): https://regex101.com/r/fiy5l3/3

Just replacing the two \s with \x00 (§ in the test) and the trailing + on the (-(...)) suffix group with * was enough to correctly match my Eclipse Temurin binary, match the OpenJDK examples from the code, and not match a GraalVM example recreated from the test added in the same commit: https://regex101.com/r/MoSWPq/3

So this could be fixed quickly, pending the bigger refactor you mentioned to move logic out of regexes.

- `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\s]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\s].?|[^-\x00]{6,}))+)\x00`),
+ `(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\x00]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\x00].?|[^-\x00]{6,}))*)\x00`),

Let me know if you want me to do a PR.
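A sketch for checking the diff locally rather than on regex101: it compiles both regexps with Go's regexp package and runs them over byte sequences from this thread. The Temurin, 11.0.17+8-LTS and 1.8.0_352-b08 samples come from the report and the code comments; the GraalVM string is an assumed jvmci-style example, not taken from a real binary.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Regexp from the "-" side of the diff (uses \s, requires at least one suffix).
	oldRe = regexp.MustCompile(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\s]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\s].?|[^-\x00]{6,}))+)\x00`)
	// Regexp from the "+" side of the diff (uses \x00, suffix group optional).
	newRe = regexp.MustCompile(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^-\x00]+(-([^-j\x00][^-\x00]?|[^-\x00][^-v\x00][^-\x00]?|[^-\x00][^-\x00][^-m\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-c\x00][^-\x00]?|[^-\x00][^-\x00][^-\x00][^-\x00][^-i\x00].?|[^-\x00]{6,}))*)\x00`)
)

const (
	temurin = "\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"
	lts     = "\x00openjdk\x00java\x000.0\x0011.0.17+8-LTS\x00"
	jdk8    = "\x00openjdk\x00java\x001.8\x001.8.0_352-b08\x00"
	graal   = "\x00openjdk\x00java\x000.0\x0017.0.7+7-jvmci-23.0-b12\x00" // assumed sample
)

// version returns the "version" capture, or ok=false when re does not match.
func version(re *regexp.Regexp, data string) (string, bool) {
	m := re.FindStringSubmatch(data)
	if m == nil {
		return "", false
	}
	return m[re.SubexpIndex("version")], true
}

func main() {
	for _, s := range []string{temurin, lts, jdk8, graal} {
		o, okOld := version(oldRe, s)
		n, okNew := version(newRe, s)
		fmt.Printf("old=%q (%v)  new=%q (%v)\n", o, okOld, n, okNew)
	}
}
```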

LaurentGoderre commented on May 29, 2024

@willmurphyscode that regex is only complicated because of the lack of lookahead and lookbehind in Go :(

LaurentGoderre commented on May 29, 2024

Very strange, I don't get that behavior at all locally even with the tags mentioned

LaurentGoderre commented on May 29, 2024

Oooh! It does this on AMD64 but not on ARM64.

LaurentGoderre commented on May 29, 2024

I added the failing test for this issue!

willmurphyscode commented on May 29, 2024

@LaurentGoderre thanks for adding the failing test! Does @tbroyer 's fix for the regex make your test pass?

It looks like that example finds 2 different java binaries: the CI log at https://github.com/anchore/syft/actions/runs/8633703946/job/23667391245?pr=2766#step:11:1697 has:

"[Pkg(name="java/jre" version="11.0.22+7\x00-J-ms8m" type="binary" id="ea3e54cbbb41c9ac") Pkg(name="java/jre" version="11.0.22+7" type="binary" id="6e4db1ab636e47e6")]" should have 1 item(s), but has 2

I think we might also wish we had a negative look behind for the oracle JRE binary classifier, since the version info can look pretty similar in the binary.

tbroyer commented on May 29, 2024

> @LaurentGoderre thanks for adding the failing test! Does @tbroyer 's fix for the regex make your test pass?
>
> It looks like that example finds 2 different java binaries:

Yes it does (see OP #2750 (comment))

> I think we might also wish we had a negative look behind for the oracle JRE binary classifier, since the version info can look pretty similar in the binary.

I didn't look closely enough, but indeed the one with the "correct version" is matched by java-binary-oracle (see CPE and PURL in OP), and java-binary-openjdk matches "too much" and creates the second one.

It looks like the java-binary-oracle regexp is not specific enough (or possibly should be removed?)
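That diagnosis can be checked with a short Go sketch: the java-binary-oracle pattern (as it appears in the classifier snippet later in this thread) finds 11.0.22+7 in the Temurin bytes on its own, which is consistent with the package carrying the correct version coming from that classifier.

```go
package main

import (
	"fmt"
	"regexp"
)

// java-binary-oracle pattern, as quoted in the classifier snippet in this thread.
var oracleRe = regexp.MustCompile(`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`)

// Bytes from the Eclipse Temurin binary in the report.
const temurin = "\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"

func main() {
	m := oracleRe.FindStringSubmatch(temurin)
	if m != nil {
		fmt.Printf("%q\n", m[oracleRe.SubexpIndex("version")]) // "11.0.22+7"
	}
}
```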

kzantow commented on May 29, 2024

I can't help but think there must be a better solution for this problem. As I understand it, the issue is that we have a binary file, java, which we want to classify as GraalVM when it contains a specific version string including jvmci; if it doesn't, we want it to continue on and be matched against the standard Java regex. Does this summarize the issue?

Could we simplify the regex back to what we had before, and then add some sort of inverted evidence matcher that checks that the jvmci part is not present?

I think this could work using something like:

			EvidenceMatcher: excludeVersionMatches(`-jvmci-`,
				// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
				// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
				FileContentsVersionMatcher(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`))

with a function similar to:

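// excludeVersionMatches wraps matcher and drops every package whose version
// matches pattern (for example -jvmci-).
func excludeVersionMatches(pattern string, matcher EvidenceMatcher) EvidenceMatcher {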
	pat := regexp.MustCompile(pattern)
	return func(resolver file.Resolver, classifier Classifier, location file.Location) ([]pkg.Package, error) {
		var out []pkg.Package
		pkgs, err := matcher(resolver, classifier, location)
		if err != nil {
			return nil, err
		}
		for _, p := range pkgs {
			if !pat.MatchString(p.Version) {
				out = append(out, p)
			}
		}
		return out, nil
	}
}
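To try this idea outside of syft, here is a self-contained sketch. syft's real EvidenceMatcher works on a file.Resolver and returns pkg.Package values; here it is replaced by a hypothetical versionMatcher that returns version strings from raw bytes, so only the filtering logic carries over. The GraalVM sample is an assumed jvmci-style string.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionMatcher is a hypothetical, simplified stand-in for syft's
// EvidenceMatcher: it returns candidate versions found in a file's bytes.
type versionMatcher func(data []byte) []string

// regexVersionMatcher returns the "version" capture of every match of pattern.
func regexVersionMatcher(pattern string) versionMatcher {
	re := regexp.MustCompile(pattern)
	idx := re.SubexpIndex("version")
	return func(data []byte) []string {
		var out []string
		for _, m := range re.FindAllSubmatch(data, -1) {
			out = append(out, string(m[idx]))
		}
		return out
	}
}

// excludeVersionMatches drops every version that matches pattern,
// mirroring the filtering loop in the proposal above.
func excludeVersionMatches(pattern string, matcher versionMatcher) versionMatcher {
	pat := regexp.MustCompile(pattern)
	return func(data []byte) []string {
		var out []string
		for _, v := range matcher(data) {
			if !pat.MatchString(v) {
				out = append(out, v)
			}
		}
		return out
	}
}

// The simpler openjdk regexp from the proposal, with jvmci builds filtered out.
var openjdk = excludeVersionMatches(`-jvmci-`, regexVersionMatcher(
	`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`))

const (
	temurin = "\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00"
	graal   = "\x00openjdk\x00java\x000.0\x0017.0.7+7-jvmci-23.0-b12\x00" // assumed sample
)

func main() {
	fmt.Printf("%q\n", openjdk([]byte(temurin))) // ["11.0.22+7"]
	fmt.Printf("%q\n", openjdk([]byte(graal)))   // []
}
```

Filtering after matching keeps the regexp short and moves the "not jvmci" rule into plain Go, which is the direction suggested earlier in the thread.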

tbroyer commented on May 29, 2024

> It looks like the java-binary-oracle regexp is not specific enough (or possibly should be removed?)

I just downloaded JDK 21 LTS and 22 from Oracle (https://www.oracle.com/java/technologies/downloads/), and the versions look very similar to the OpenJDK flavors, but without that [NUL]openjdk prefix. That means a (?<!\x00openjdk) lookbehind would discriminate between the two classifiers… on x64, because things are different on aarch64, and also between Linux and macOS!

On aarch64, the layout is more like [NUL](release)[NUL][NUL][NUL][NUL][NUL](version)[NUL][NUL][NUL]openjdk[NUL]java[NUL], with the openjdk part only present in Eclipse Temurin (https://adoptium.net/fr/temurin/releases/?arch=aarch64) and not in Oracle JDK (where there are four consecutive [NUL])… on Linux. On macOS, in Eclipse Temurin, it's [NUL](version)[NUL](release)[NUL]-Jms8m[NUL]java[NUL]openjdk[NUL] (release and version are swapped, openjdk and java are swapped, and there are far fewer [NUL]).
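Since Go's regexp has no lookbehind, one way to get the x64 behavior described above is to emulate it in Go: find each Oracle-style match, then check the bytes immediately before it. This is a sketch for the x64 Linux layout only; the prefix regexp, the 64-byte window and the oracleVersions name are assumptions, and the aarch64 and macOS layouts would need their own checks.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// java-binary-oracle pattern, as quoted in the classifier snippet in this thread.
	oracleRe = regexp.MustCompile(`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`)
	// Assumed x64 OpenJDK prefix; without (?m), $ anchors at the end of the slice.
	openjdkPrefix = regexp.MustCompile(`\x00openjdk\x00java\x00[0-9]+[.0-9]*$`)
)

// oracleVersions returns Oracle-style versions, skipping any match that is
// immediately preceded by the OpenJDK prefix (an emulated negative lookbehind).
func oracleVersions(data []byte) []string {
	idx := oracleRe.SubexpIndex("version")
	var out []string
	for _, loc := range oracleRe.FindAllSubmatchIndex(data, -1) {
		start := loc[0]
		lo := start - 64 // assumed window, large enough to hold the prefix
		if lo < 0 {
			lo = 0
		}
		if openjdkPrefix.Match(data[lo:start]) {
			continue // "lookbehind" failed: this looks like an OpenJDK build
		}
		out = append(out, string(data[loc[2*idx]:loc[2*idx+1]]))
	}
	return out
}

func main() {
	temurin := []byte("\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00")
	oracle := []byte("\x0019.0.1+10-21\x00") // sample from the java-binary-oracle comment
	fmt.Printf("%q\n", oracleVersions(temurin)) // []
	fmt.Printf("%q\n", oracleVersions(oracle))  // ["19.0.1+10-21"]
}
```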

So indeed an inverted matcher could work here, though more as a FileContentsVersionMatcher extended with an "inverted regexp" that disqualifies the classifier if it matches.
So, without adding regexps for more platforms for now, java-binary-oracle and java-binary-graalvm could reuse the java-binary-openjdk regexp as the "inverted matcher", and the java-binary-openjdk regexp would go back to the one from v1.0.1.

		{
			Class:    "java-binary-openjdk",
			FileGlob: "**/java",
			EvidenceMatcher: FileContentsVersionMatcher(
				// x64
				// [NUL]openjdk[NUL]java[NUL]0.0[NUL]11.0.17+8-LTS[NUL]
				// [NUL]openjdk[NUL]java[NUL]1.8[NUL]1.8.0_352-b08[NUL]
				`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
			Package: "java",
			PURL:    mustPURL("pkg:generic/java@version"),
			// TODO the updates might need to be part of the CPE Attributes, like: 1.8.0:update152
			CPEs: singleCPE("cpe:2.3:a:oracle:openjdk:*:*:*:*:*:*:*:*"),
		},
// …
		{
			Class:    "java-binary-oracle",
			FileGlob: "**/java",
			EvidenceMatcher: FileContentsVersionMatcher(
				// [NUL]19.0.1+10-21[NUL]
				`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`,
				// must not match (see java-binary-openjdk):
				`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
			Package: "java",
			PURL:    mustPURL("pkg:generic/java@version"),
			CPEs:    singleCPE("cpe:2.3:a:oracle:jre:*:*:*:*:*:*:*:*"),
		},
		{
			Class:    "java-binary-graalvm",
			FileGlob: "**/java",
			EvidenceMatcher: FileContentsVersionMatcher(
				`(?m)\x00(?P<version>[0-9]+[.0-9]+[.0-9]+\+[0-9]+-jvmci-[0-9]+[.0-9]+-b[0-9]+)\x00`,
				// must not match (see java-binary-openjdk):
				`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`),
			Package: "java",
			PURL:    mustPURL("pkg:generic/java@version"),
			CPEs:    singleCPE("cpe:2.3:a:oracle:graalvm:*:*:*:*:*:*:*:*"),
		},
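The proposal passes a second, "must not match" pattern to FileContentsVersionMatcher; whether syft's helper accepts such an argument is not shown here, so the sketch below models the intended semantics with a hypothetical matchUnless function instead of syft's API. The Oracle sample is the one from the java-binary-oracle code comment.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Patterns as written in the proposal above.
	openjdkRe = regexp.MustCompile(`(?m)\x00openjdk\x00java\x00(?P<release>[0-9]+[.0-9]*)\x00(?P<version>[0-9]+[^\x00]+)\x00`)
	oracleRe  = regexp.MustCompile(`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`)

	temurin      = []byte("\x00openjdk\x00java\x000.0\x0011.0.22+7\x00-J-ms8m\x00")
	oracleSample = []byte("\x0019.0.1+10-21\x00")
)

// matchUnless is a hypothetical helper: it returns the "version" capture of
// include, but only when exclude matches nowhere in data.
func matchUnless(include, exclude *regexp.Regexp, data []byte) (string, bool) {
	if exclude.Match(data) {
		return "", false
	}
	m := include.FindSubmatch(data)
	if m == nil {
		return "", false
	}
	return string(m[include.SubexpIndex("version")]), true
}

func main() {
	// Disqualified: the OpenJDK pattern matches the Temurin bytes.
	fmt.Println(matchUnless(oracleRe, openjdkRe, temurin))
	// No OpenJDK prefix, so the Oracle pattern's version is reported.
	fmt.Println(matchUnless(oracleRe, openjdkRe, oracleSample))
}
```

Unlike a per-match lookbehind, this disqualifies the whole file as soon as the exclude pattern appears anywhere in it.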
