GithubHelp home page GithubHelp logo

Comments (9)

haraldk avatar haraldk commented on June 16, 2024

Always wanted to say this: It's not a bug, it's a feature. πŸ˜‰

We intentionally replace the algorithm you describe from the JDK JPEG plugin, with the standard way of determine the JPEG "color space", from libjpeg. The reason of this is of course, as you mention, this is what all other software does.

By the way, most standard viewers, unlike Java AWT, - for example, Windows viewer, Paint, GIMP - also show this file with magenta background.
[...]
Most software, based on libjpeg library (as I think), interpret such JPEG by default as YCbCr. Your TwelveMonkey library does the same!

The "real" problem is, as usual, that JPEG (or the JFIF format) doesn't have an explicit way of stating what colors it encodes, leaving decoders to do a best effort guess...

If you search sites like Stack Overflow for things like "java jpeg pink" or similar, you will find that there is a lot of issues with the way the JDK JPEG plugin interprets "color space". So, even if in your case, the JDK implementation produces the desired outcome, there are more times when it does not. We don't want to change this.


However, there are other ways to solve this. You could filter out the reader as you do now, but why do you include it at all if you don't intend to use it for reading JPEGs?

Another possibility in your TIFF code, you could change from using ImageReader.read to ImageReader.readRaster. From its API doc:

This method allows formats that normally apply a color conversion, such as JPEG, and formats that do not normally have an associated colorspace, such as remote sensing or medical imaging data, to provide access to raw pixel data.

Then you can easily use your own algorithm to determine color space, or just use the PhotometricInterpretation field from the TIFF. My experience is that this field is usually correct for TIFFs with JPEG compression (7), but often incorrect for "old style" JPEG compression (6).

PS: Our TIFFImageReader decodes the SVS-file you mention correctly, both with and without the TwelveMonkeys JPEGImageReader. Feel free to look at the implementation to determine color space and conversion here.

from twelvemonkeys.

Daniel-Alievsky avatar Daniel-Alievsky commented on June 16, 2024

Always wanted to say this: It's not a bug, it's a feature. πŸ˜‰

Π•verything in the world is relative πŸ˜‰
From a programmer's point of view, this can be called a feature. But from the point of view of the end user of libraries it looks different. It is possible, that for a long time I used some library, SCIFIO, for reading large pyramid images like SVS (it is really so). All worked well. At some moment, in my large project, I needed another library, TwelveMonkeys. And, oops... all stopped working for a lot of SVS files.

What should do the end user? He probably cannot stop usage of SCIFIO, because this requires a lot of reworking - all modules for viewing, reading, analysis of large images. The only possible solution is to decline your wonderful library with great regret... because of this "feature".

I believe that this behavior could actually become a good feature if you add a clear, documented way to manage it. For example, you may add a static method (and/or a corresponding system property) like "setYCbCrRecognitionStrategy" with enum variants LIBJPEG and JAVA_IMAGEIO, with default value JAVA_IMAGEIO. If the end user is interested in better recognition strategy, complying libjpeg tradition instead of little strange ImageIO behavior, he will call this method at the very beginning.

Fortunately, I'm not only an end user, I'm also a developer, working over improving SCIFIO TIFF support. Currently I simply added a detection of the original Oracle reader, as I described. I saw an analogous code in TwelveMonkey, when you try to find necessary delegateProvider:
lookupProviderByName(registry, "com.sun.imageio.plugins.jpeg.JPEGImageReaderSpi", ImageReaderSpi.class);
Do you think that this is a stable and safe solution? Are you not afraid that in future Java versions the original package of JPEG support will be renamed, for example, to "com.oracle.imageio..."? Maybe you know a better way to be sure to find the required Oracle class?

Another possibility in your TIFF code, you could change from using ImageReader.read to ImageReader.readRaster

You are right, it is a possible way, which is used by TwelveMonkeys itself. But there are not only "pro", there are also "contra". Maybe, JPEG is encoded in JPEG absolutely safe, for example, with help of JFIF marker, which means YCbCr. What is better - ignore this and try to read data "as-is" via readRaster with my own processing color space, or to use standard and well-tested native code of Oracle?

Feel free to look at the implementation to determine color space and conversion here.

Of course I already looked at this code. Thank you very much, it is really better than the corresponding code of SCIFIO and helps to understand all possible situations.

from twelvemonkeys.

haraldk avatar haraldk commented on June 16, 2024

I don't think you understand. πŸ˜€

The difference in "color space" (or, color encoding, if you like) "guessing" from the JDK JPEG plugin is intentional. It's one of several reasons users switch to the TwelveMonkeys library over the standard JDK plugin. It's a feature. We implemented it this way, because we found it better detected the encoded color space of "real world" JFIF files.

I agree it could have been better documented, feel free to provide a PR or suggest a change in the documentation about that. But I think it's less important because it's the same as what libjpeg does, and thus the de facto "standard" way to guess encoded color space for JFIF, and what everybody would expect in the first place.

Now, for TIFF files with embedded JPEG streams, things are different, because the TIFF format does describe the color space encoding inside the JPEG stream (TIFF tag 262/PhotometricInterpretation). There is no need to guess in the same way as you have to for JFIF. So delegating to a different plugin, to second-guess information you already know is not a good thing to do (when you extracted the JPEG stream from the TIFF, you basically threw this information away; there is no longer a single, "correct" way to decode it).


Anyway...

If your code relies on the implementation details of the JDK JPEG plugin, you need to either make sure you use that plugin, or change your code. What is the best for you, is of course something you need to decide. πŸ˜€

But to try to answer your questions:

Do you think that this is a stable and safe solution? Are you not afraid that in future Java versions the original package of JPEG support will be renamed, for example, to "com.oracle.imageio..."? Maybe you know a better way to be sure to find the required Oracle class?

It's obviously not 100% safe, but fairly stable (ie. it hasn't changed for roughly 20 years). I'm not worried about a rename, as we run tests with the latest available Java version + all LTS versions, so we will be alerted. And no, I do it this way because I don't know a better way.

Maybe, JPEG is encoded in JPEG absolutely safe, for example, with help of JFIF marker, which means YCbCr. What is better - ignore this and try to read data "as-is" via readRaster with my own processing color space, or to use standard and well-tested native code of Oracle?

JFIF marker (or other application markers) really isn't applicable for a JPEG stream inside a TIFF. The spec says:

"APPn (shall be ignored by TIFF readers)".

(Of course, if you see a JFIF marker, you might also assume the encoder didn't follow the spec anyway, but that is another discussion). So, yes, for JPEG (compression 7), ignoring the JFIF marker or whatever "color space" the JPEG decoder suggests, and just trust the PhotometricIntrepretation and do the conversion yourself is "better", in the sense that it's more correct.

PS: In my mind, "end users" use tested, packaged applications. Not libraries. So I think your question about what "end users" should do is moot.

from twelvemonkeys.

Daniel-Alievsky avatar Daniel-Alievsky commented on June 16, 2024

Regardless of the various arguments for and against, for now the simple fact is true: TwelveMonkeys library is currently principally incompatible with SCIFIO library and the similar Loci/BioFormat library, used by popular ImageJ/Fiji systems. I'll be glad if my efforts to create an alternative for SCIFIO TiffParser - my class TiffReader, which is compatible with TwelveMonkeys - will be useful and if authors of SCIFIO will import my classes to their library in future versions. But I'm afraid it will no be done quickly and will not help to users of BioFormats. If someone wants to use SCIFIO or BioFormats in his code for reading TIFF (like our company, which used SCIFIO for access to SVS files during last 10 years), he cannot simultaneously add dependence on TwelveMonkeys.

Obviously, the same incompatibility will appear with any library, which depends on ImageIO algorithms of decoding JPEG. It is the root of the problem: TwelveMonkeys changes behavior of very popular, basic operation ImageIO.read! I don't say that this algorithm, implemented in ImageIO.read, is good or that such dependence on this algorithm is good, but this is an existing fact.

I simply offered to resolve this problem, because it seems you could be do this easily from your side.

from twelvemonkeys.

haraldk avatar haraldk commented on June 16, 2024

I understand you want a quick fix, but unfortunately it's not that simple. πŸ˜•

Yes, we have a different algorithm for detecting colorspace encoding in JPEG/JFIF files. And we do it that way on purpose, because it is more compatible with the rest of the world.

However, we do not change the behavior of ImageIO.read. The methods in ImageIO always delegate to plugins discovered runtime, while the behavior you describe is plugin specific. Unfortunately, SCIFIO or other libraries that depend on ImageIO.read to behave in certain plugin-specific ways will always be fragile in this regard. And this is not something we can fix on our side.

Still, if you want a simpler fix than changing the code of the mentioned libraries, there are some more possible options:

  • Don't depend on the TwelveMonkeys JPEG plugin (you can still use the rest of the library)
  • Change/reset the order of JPEG plugins to make ImageIO.read pick the standard JDK JPEG plugin (you can still explicitly search for the TwelveMonkeys plugin in your own code)

Both of these will of course have the downside of not getting the full benefit of more robust JPEG reading, JPEG lossless support etc. This may be okay for you though, as I still haven't fully understood why you want to use the TwelveMonkeys JPEG plugin in the first place.

Or, if you really, really want to use TwelveMonkeys with SCIFIO or other library that uses ImageIO.read without fixing those libraries (which I still think would be the better choice), you are of course free to create PR for TwelveMonkeys to work around it. Here are some possibilities:

  • A system property that only disables the ordering before the JDK plugin, could work. It would be trivial to implement in the JPEGImageReaderSpi. But you would have to test if this always gives you the JDK plugin, or if the order becomes random. It would work basically the same as the last option above.
  • A variation of the above to avoid random order, a system property that specifies what class we should register after. It would be easy to implement in the JPEGImageReaderSpi, but needs some more thoughts on what to do when/if that class does not exist etc. It would also work basically the same as the last option above.
  • A system property that you can opt in to, to get the JDK JPEG plugin color space detection back in the TwelveMonkeys JPEG plugin. The current behavior must still be the default, and test on the current test dataset should run without it. You should probably also include "interop" modules with tests that verify that you actually get the desired behavior in the various libraries you want to support. More work than the two options above, but with the added benefit of more robust JPEG reading and JPEG lossless support. But still a lot of incorrectly colored "plain" JPEGs.

What do you think?

from twelvemonkeys.

Daniel-Alievsky avatar Daniel-Alievsky commented on June 16, 2024

Probably you don't understand me well. I don't need any changes in TwelveMonkeys for myself, because I'm working over fixing problems in SCIFIO and already resolved this problem in my TiffReader class. However, I'm worrying about a lot of users of already deployed and popular libraries SCIFIO/BioFormats - and probably other "fragile" libraries. It seems that there could be a lot of analogous modules, which simply call ImageIO.read or use 1st from ImageIO.getImageReadersByFormatName("jpeg") for reading JPEG stream, believing that the results will be stable (let's say I see the same code in JAI ImageIO TIFFJPEGDecompressor, but have not enough time to test this thoroughly). I cannot agree that it is a good idea from your side to change default behavior of JPEG color space recognition, implemented by standard JPEGImageReader by Sun company and described in official Oracle documentation. At least, you could create a corresponding bug issue in Oracle and to ask them to change this behavior and documentation from their side, if you are sure that it should be changed.

As I said, I already fixed this problem in my version of SCIFIO reader and hope that they will include my fix into their future versions. Maybe, you also could perform some efforts to minimize problems for possible users of TwelveMonkeys, which also needs another libraries that depend on ImageReader behavior? System property seems possible, but not enough solution, because not all applications have access to system properties - for example, WAR applications should work in already customized JRE. It seems mostly safe to restore standard behavior (standard for Java API, not for libjpeg) and provide some public interface, allowing to change it to your variant - for rare users, which really need this form of JPEG decoding.

By the way, while analysis of your code, I see that you correctly process a lot of rare cases, not checked in SCIFIO, like 16-bit YCbCr LZW TIFF etc. But the set of examples, which you provide inside this project, is not too large. In particular, now I'm searching for some example of YCbCr picture with sub-sampling and with "old" loss-less compressions like LZW/Deflate/Uncompressed, which has odd sizes, for example, 511x501 (not aligned by sub-sampling block sizes 2x2, 2x1, 4x2 etc.) I saw the corresponding processing in your YCbCrUpsamplerStream class ("padding" field). You included ycbcr-cat.tif example (a known example to libtiff library), but it has even width 250. Could you help me to find necessary "problematic" picture with non-even width or, maybe, could you advise me a tool that could create such a picture (I still didn't find such utility)? Sorry that I write this question in this issue.

from twelvemonkeys.

Daniel-Alievsky avatar Daniel-Alievsky commented on June 16, 2024

...Stop, update! I've tested JAI ImageIO together with TwelveMonkeys. Usage is simple:

   TIFFImageReader r = new TIFFImageReader(new TIFFImageReaderSpi());
        r.setInput(stream);
         . . . .
        bi = r.read(0, param);

Full source code: https://github.com/scichains/scichains-maps/blob/main/src/test/java/net/algart/matrices/io/formats/tests/JAIReadTiffTest.java (maybe I'll simplify it in future).

If we use this call for reading CMU-1-Small-Region.svs from OpenSlide demo examples, all works fine. But if I simultaneously add dependence on TwelveMonkeys, I see invalid "magenta" (or "pink") colors!

This example used dependence on JAI ImageIO clone, used by SCIFIO:

                <dependency>
			<groupId>io.scif</groupId>
			<artifactId>scifio-jai-imageio</artifactId>
		</dependency>

But the same behavior will be if we use (together with TwelveMonkeys) the "official" JAI ImageIO via the following dependence:

        <dependency>
            <groupId>com.github.jai-imageio</groupId>
            <artifactId>jai-imageio-core</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>com.twelvemonkeys.imageio</groupId>
            <artifactId>imageio-jpeg</artifactId>
            <version>3.9.4</version>
        </dependency>
        <dependency>
            <groupId>com.twelvemonkeys.imageio</groupId>
            <artifactId>imageio-tiff</artifactId>
            <version>3.9.4</version>
        </dependency>

I tested this in a separate project, you may do the same. Without TwelveMonkeys JAI TIFFImageReader class works well, together with TwelveMonkeys it stops working normally.

It seems it is not a problem to find a lot of other examples of libraries, incompatible (in principle!) with your library due to direct usage of Java AWT ImageReader - https://github.com/geotools/ ? Something else? It seems to be not a good idea to simply ignore users of all such libraries

from twelvemonkeys.

Daniel-Alievsky avatar Daniel-Alievsky commented on June 16, 2024

By the way, what is the reason to create YCbCr JPEG without sub-sampling?.. The only aspect where YCbCr format is better than usual RGB is possibility to reduce amount of Cb and Cr samples, because they are not so important for human as intensity (Y). If there is no actual sub-sampling, there is no actual sense to convert a picture to YUV format before encoding by JPEG compression algorithm. And, really, some microscopes like Aperio store JPEG tiles without sub-sampling in RGB format, without conversion to YCbCr.

Probably the authors of Java API simply wanted to improve well-known detection algorithm of libjpeg, due to the fact that usually nobody write YCbCr JPEG without sub-sampling.

from twelvemonkeys.

haraldk avatar haraldk commented on June 16, 2024

Okay...

I simply offered to resolve this problem, because it seems you could be do this easily from your side.

I actually thought this was an offer to help, but I guess I was wrong then. 🀷🏻

I don't need any changes in TwelveMonkeys for myself [...]

Well, in that case I guess we can just close this issue, and stop wasting more of each others time.

I don't claim any compatibility with any of the mentioned libraries. They will work fine in most cases, except a few odd edge cases, like the one you have found.

(As a side note; we do test for JAI TIFF interoperability, but it does not cover the mentioned edge case. As I have tried to explain, it isn't possible to fix this fully on our side (as it's the JAI TIFF reader that controls the TM JPEG reader and not the other way around, in this scenario). It has to be fixed on the JAI side, if anyone cares. It's possible to work around this for TIFF tiles, as we have both suggested, but at the same time you then opt out of displaying stand-alone JPEG files in predictable colors. These are also edge cases. But if this really is a concern, why not just use the TM TIFF reader? JAI doesn't support any format that TM doesn't also support, as far as I know.)

By the way, what is the reason to create YCbCr JPEG without sub-sampling?

I agree it's an odd thing to do. Using different quantization tables, perhaps? But I suggest you read up the subject, as you don't seem to put much weight into what I say anyway. πŸ˜€

Good luck with your work fixing SCIFIO, I hope you get your changes accepted there!

from twelvemonkeys.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.