auburnsounds / gamut

Image encoding and decoding library for D. Detailed layout control. Experimental codec QOIX.

License: Boost Software License 1.0


image jpeg png qoi bc7 dds qoix tga gif


Gamut

Gamut (DUB package: gamut) is an image decoding/encoding library for D.

Inspired by the FreeImage design, the Image concept is monomorphic and can do it all.

Gamut tries to have the fastest and most memory-conscious image decoders available in pure D code. It is nothrow @nogc @safe for usage in -betterC and in disabled-runtime D.

Decoding

PNG: 8-bit and 16-bit, L/LA/RGB/RGBA
JPEG: 8-bit, L/RGB/RGBA, baseline and progressive
JPEG XL: 8-bit, RGB (no alpha support), encoded with cjxl -e 4 or lower
TGA: 8-bit, indexed, L/LA/RGB/RGBA
GIF: indexed, animation support
BMP: indexed 1/4/8-bit no-RLE, 8-bit RGB/RGBA
QOI: 8-bit, RGB/RGBA
QOIX: 8-bit, 10-bit, L/LA/RGB/RGBA. Improvement upon QOI. This format may change between major Gamut tags, so it is not suitable as a storage format.

Encoding

PNG: 8-bit, 16-bit, L/LA/RGB/RGBA
JPEG: 8-bit, greyscale/RGB, baseline
TGA: 8-bit, RGB/RGBA
GIF: 8-bit, RGBA, animation support
BMP: 8-bit, RGB/RGBA
QOI: 8-bit, RGB/RGBA
QOIX: 8-bit, 10-bit, L/LA/RGB/RGBA, premultiplied alpha
DDS: BC7 encoded, 8-bit, RGB/RGBA

Changelog

v3 Added premultiplied alpha pixel types. BREAKING.
- Decoders are now allowed to return any type if you do not specify LOAD_PREMUL or LOAD_NO_PREMUL. Update your loading code.
- Introduce image.premultiply() and image.unpremultiply().
- QOIX supports encoding premultiplied. Saves space and decoding times for transparent overlays.
v2.6 Added JPEG XL input. 8-bit, no alpha, cjxl --effort 4 or lower, raw streams not ISO BMFF.
v2.5 Added BMP input.
v2.4 Added BMP output.
v2.3 Added GIF input and GIF output. Added multilayer images.
v2.2 Added 16-bit PNG output.
v2.1 Added TGA format support.
v2 QOIX bitstream changed. Ways to disown and deallocate image allocation pointer. It's safe to update to latest tag in the same major version. Do keep a 16-bit source in case the bitstream changes.
v1 Initial release.

Why QOIX?

Our benchmark results for 8-bit color images:

Codec       Decode (Mpixels/s)   Encode (Mpixels/s)   Bits per pixel
PNG (stb)   89.73                14.34                10.29693
QOI         201.9                150.8                10.35162
QOIX        179.0                125.0                7.93607

QOIX and QOI generally outperform PNG in both decoding speed and encoding speed.
QOIX outperforms QOI in compression efficiency, at the cost of some speed:
- because it is based upon better intra predictors
- because it is followed by LZ4, which removes some of the QOI worst cases.
QOIX adds support for 8-bit greyscale and greyscale + alpha images, with a "QOI-plane" custom codec.
QOIX adds support for 10-bit images, with a "QOI-10b" custom codec. It drops the last 6 bits of precision (lossy) to outperform 16-bit PNG in every way for some use cases.
QOIX support for premultiplied alpha brings even more speed and compression for transparent images.

Use the convert tool to encode QOIX.

Gamut API documentation

1. `Image` basics

Key concept: The Image struct is where most of the public API resides.

1.1 Get the dimensions of an image:

Image image = Image(800, 600);
int w = image.width();
int h = image.height();
assert(w == 800 && h == 600);

1.2 Get the pixel format of an image:

Image image = Image(800, 600);
PixelType type = image.type();
assert(type == PixelType.rgba8); // rgba8 is default if not provided

Key concept: PixelType completely describes the pixel format, for example PixelType.rgb8 is a 24-bit format with one byte for red, green and blue components each (in that order). Nothing is specified about the color space though.

Here are the possible PixelType values:

enum PixelType
{
    l8,
    l16,
    lf32,
    
    la8,
    la16,
    laf32,
    lap8,
    lap16,
    lapf32,

    rgb8, 
    rgb16,
    rgbf32,

    rgba8,
    rgba16,
    rgbaf32,
    rgbap8,
    rgbap16,
    rgbapf32
}

For now, all pixel formats have one to four components:

1 component is implicitly Greyscale
2 components are implicitly Greyscale + Alpha
3 components are implicitly Red + Green + Blue
4 components are implicitly Red + Green + Blue + Alpha

Bit-depth: Each of these components can be represented in 8-bit, 16-bit, or 32-bit floating-point (0.0f to 1.0f range).

Alpha premultiplication: When an alpha channel exists, both premultiplied and non-premultiplied variants exist.
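To make the premultiplied variants concrete, here is a minimal sketch of what premultiplying an interleaved 8-bit RGBA buffer could look like. It is not Gamut's implementation: the helper names `premulComponent` and `premultiplyRGBA8` are hypothetical, and the round-to-nearest formula is an assumption. In Gamut itself, the documented entry points are image.premultiply() and image.unpremultiply().

```d
/// Hypothetical helper: multiply one 8-bit color component by an 8-bit
/// alpha, i.e. round(c * a / 255). Operands are promoted to int first.
ubyte premulComponent(ubyte c, ubyte a) pure nothrow @nogc @safe
{
    return cast(ubyte)((c * a + 127) / 255);
}

/// Hypothetical helper: premultiply interleaved R, G, B, A bytes in place.
/// The alpha byte itself is left untouched.
void premultiplyRGBA8(ubyte[] pixels) pure nothrow @nogc @safe
{
    assert(pixels.length % 4 == 0);
    for (size_t i = 0; i < pixels.length; i += 4)
    {
        ubyte a = pixels[i + 3];
        pixels[i + 0] = premulComponent(pixels[i + 0], a);
        pixels[i + 1] = premulComponent(pixels[i + 1], a);
        pixels[i + 2] = premulComponent(pixels[i + 2], a);
    }
}

unittest
{
    // One half-transparent pixel: the color components are roughly halved.
    ubyte[] px = [200, 100, 50, 128];
    premultiplyRGBA8(px);
    assert(px[0] == 100 && px[1] == 50 && px[2] == 25 && px[3] == 128);
}
```

Fully transparent pixels all collapse to zero color, which is one plausible reason premultiplied overlays compress better.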

1.3 Create an image:

Different ways to create an Image:

create() or regular constructor this() creates a new owned image filled with zeros.

createNoInit() or setSize() creates a new owned uninitialized image.

createViewFromData() creates a view into existing data.

createNoData() creates a new image with no data pointed to (still has a type, size...).

// Create with zero initialization.
Image image = Image(640, 480, PixelType.rgba8); 
image.create(640, 480, PixelType.rgba8);

// Create with no initialization.
image.setSize(640, 480, PixelType.rgba8);
image.createNoInit(640, 480, PixelType.rgba8);

// Create view into existing data.
image.createViewFromData(data.ptr, w, h, PixelType.rgb8, pitchbytes);

At creation time, the Image forgets about its former life, and leaves any isError() state or former data/type behind.
Image.init is in the isError() state.
isValid() can be used instead of !isError().
Being valid == not being in error == having a PixelType.

2. Loading and saving an image

2.1 Load an `Image` from a file:

Another way to create an Image is to load an encoded image.

Image image;
image.loadFromFile("logo.png");
if (image.isError)
    throw new Exception(image.errorMessage);

You can then read width(), height(), type(), etc...

There are no exceptions in Gamut. Instead, the Image itself has an error API:

bool isError() returns true if the Image is in an error state. In an error state, the image can't be used anymore until recreated (for example, by loading another file).

const(char)[] errorMessage() is then available, and is guaranteed to be zero-terminated with an extra byte.

2.2 Load an image from memory:

auto pngBytes = cast(const(ubyte)[]) import("logo.png"); 
Image image;
image.loadFromMemory(pngBytes);
if (!image.isValid) 
    throw new Exception(image.errorMessage());

Key concept: You can force the loaded image to be a certain type using LoadFlags.

Here are the possible LoadFlags:

LOAD_NORMAL      // Default: preserve type from original.

LOAD_ALPHA       // Force one alpha channel.
LOAD_NO_ALPHA    // Force zero alpha channel.

LOAD_GREYSCALE   // Force greyscale.
LOAD_RGB         // Force RGB values.

LOAD_8BIT        // Force 8-bit `ubyte` per component.
LOAD_16BIT       // Force 16-bit `ushort` per component.
LOAD_FP32        // Force 32-bit `float` per component.

LOAD_PREMUL      // Force premultiplied alpha representation (if alpha exists)
LOAD_NO_PREMUL   // Force non-premultiplied alpha representation (if alpha exists)

Example:

Image image;  
image.loadFromMemory(pngBytes, LOAD_RGB | LOAD_ALPHA | LOAD_8BIT | LOAD_NO_PREMUL);  // force PixelType.rgba8

Not all load flags are compatible, for example LOAD_8BIT and LOAD_16BIT cannot be used together.

2.3 Save an image to a file:

Image image;
if (!image.saveToFile("output.png"))
    throw new Exception("Writing output.png failed");

Key concept: ImageFormat is simply the list of codecs/container formats that Gamut encodes to and decodes from.

enum ImageFormat
{
    unknown,
    JPEG,
    PNG,
    QOI,
    QOIX,
    DDS,
    TGA,
    GIF,
    JXL
}

This can be used to avoid inferring the output format from the filename:

Image image;
if (!image.saveToFile(ImageFormat.PNG, "output.png"))
    throw new Exception("Writing output.png failed");

2.4 Save an image to memory:

Image image;
ubyte[] qoixEncoded = image.saveToMemory(ImageFormat.QOIX);
scope(exit) freeEncodedImage(qoixEncoded);

The returned slice must be freed up with freeEncodedImage.

3. Accessing image pixels

3.1 Get the row pitch, in bytes:

int pitch = image.pitchInBytes();

Key concept: The image pitch is the distance between the start of two consecutive scanlines, in bytes. IMPORTANT: This pitch can be negative.
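A negative pitch usually means the rows are stored bottom-up in memory. The sketch below illustrates the addressing rule only. It is self-contained and does not use Gamut, and `rowPointer` is a hypothetical helper, not a Gamut function.

```d
/// Hypothetical helper: pointer to the start of row `y`, given a pointer
/// to row 0 and a signed pitch in bytes. The multiplication is done in
/// ptrdiff_t so that a negative pitch walks backwards in memory.
ubyte* rowPointer(ubyte* firstRow, int pitchInBytes, int y) pure nothrow @nogc
{
    return firstRow + cast(ptrdiff_t) pitchInBytes * y;
}

unittest
{
    // A 2x3 image with one byte per pixel, stored upside-down:
    // memory holds row 2, then row 1, then row 0.
    ubyte[6] buffer = [20, 21, 10, 11, 0, 1];
    int pitch = -2;
    ubyte* firstRow = buffer.ptr + 4; // row 0 is the last row in memory

    assert(rowPointer(firstRow, pitch, 0)[0] == 0);
    assert(rowPointer(firstRow, pitch, 1)[0] == 10);
    assert(rowPointer(firstRow, pitch, 2)[0] == 20);
}
```

Code that steps from one row to the next by adding the signed pitch works for both vertical orientations, which is why the pitch should never be assumed to equal the row size.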

3.2 Access a row of pixels:

void* scan = image.scanptr(y);  // get pointer to start of pixel row
void[] row = image.scanline(y); // get slice of pixel row

Key concept: The scanline is void* because the type it points to depends upon the PixelType. In a given scanline, the bytes scan[0..abs(pitchInBytes())] are all accessible, even if they may be outside of the image (trailing pixels, gap bytes for alignment, border pixels).

3.3 Iterate on pixels:

assert(image.type == PixelType.rgba16);
assert(image.hasData());
for (int y = 0; y < image.height(); ++y)
{
    ushort* scan = cast(ushort*) image.scanptr(y);
    for (int x = 0; x < image.width(); ++x)
    {
        ushort r = scan[4*x + 0];
        ushort g = scan[4*x + 1];
        ushort b = scan[4*x + 2];
        ushort a = scan[4*x + 3];
    }
}

Key concept: The default is that you do not access pixels in a contiguous manner. See 4. for layout constraints that allow you to get all pixels at once.

4. Layout constraints

One of the most interesting features of Gamut! Images in Gamut can follow given constraints over the data layout.

Key concept: LayoutConstraint values are carried by images for their whole lifetime.

Example:

// Do nothing in particular.
LayoutConstraint constraints = LAYOUT_DEFAULT; // 0 = default

// Layout can be given directly at image creation or afterwards.
Image image;  
image.loadFromMemory(pngBytes, constraints); 

// Now the image has a 1 pixel border (at least).
// Changing the layout only reallocates if needed.
image.setLayout(LAYOUT_BORDER_1);

// Those layout constraints are preserved.
// (but: not the excess bytes content, if reallocated)
image.convertToGreyscale();
assert(image.layoutConstraints() == LAYOUT_BORDER_1);

Important: Layout constraints are about the minimum guarantee you want. Your image may be more constrained than that in practice, but you can't rely on that.

If you don't specify LAYOUT_VERT_STRAIGHT, you should expect your image to be possibly stored upside-down, and account for that possibility.
If you don't specify LAYOUT_SCANLINE_ALIGNED_16, you should not expect your scanlines to be aligned on 16-byte boundaries, even though that can happen accidentally.

Beware not to accidentally reset constraints when resizing:

// If you do not provide layout constraints,
// the one chosen is 0, the most permissive.
image.setSize(640, 480, PixelType.rgba8, LAYOUT_TRAILING_3);

4.1 Scanline alignment

Scanline alignment guarantees minimum alignment of each scanline.

LAYOUT_SCANLINE_ALIGNED_1 = 0
LAYOUT_SCANLINE_ALIGNED_2
LAYOUT_SCANLINE_ALIGNED_4
LAYOUT_SCANLINE_ALIGNED_8
LAYOUT_SCANLINE_ALIGNED_16
LAYOUT_SCANLINE_ALIGNED_32
LAYOUT_SCANLINE_ALIGNED_64
LAYOUT_SCANLINE_ALIGNED_128

4.2 Layout multiplicity

Multiplicity guarantees access to pixels 1, 2, 4 or 8 at a time. It does this with excess pixels at the end of the scanline, but they need not exist if the scanline has the right width.

LAYOUT_MULTIPLICITY_1 = 0
LAYOUT_MULTIPLICITY_2
LAYOUT_MULTIPLICITY_4
LAYOUT_MULTIPLICITY_8

Together with scanline alignment, this allows processing a scanline using aligned SIMD without processing the last few pixels differently.
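As a rough illustration of how multiplicity and scanline alignment interact, the sketch below computes the smallest row size in bytes that satisfies both. This is an assumption about how such a computation could be done, not Gamut's actual allocation code. `roundUp` and `minimumPitch` are hypothetical names, and the sketch ignores borders and trailing pixels.

```d
/// Round `value` up to the next multiple of `multiple` (multiple > 0).
int roundUp(int value, int multiple) pure nothrow @nogc @safe
{
    return (value + multiple - 1) / multiple * multiple;
}

/// Hypothetical helper: smallest pitch, in bytes, such that
/// - the row holds a whole number of `multiplicity`-pixel groups, and
/// - consecutive rows stay `alignment`-byte aligned, provided the
///   first row itself starts on an aligned address.
int minimumPitch(int width, int bytesPerPixel,
                 int multiplicity, int alignment) pure nothrow @nogc @safe
{
    int paddedWidth = roundUp(width, multiplicity);
    return roundUp(paddedWidth * bytesPerPixel, alignment);
}

unittest
{
    // 13 rgba8 pixels, processed 4 at a time, rows aligned on 16 bytes:
    // 13 pixels are padded to 16 pixels, i.e. 64 bytes.
    assert(minimumPitch(13, 4, 4, 16) == 64);
}
```

With such a pitch, a loop that handles 4 pixels per iteration can run over the whole row without a scalar tail.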

4.3 Trailing pixels

Trailing pixels give you up to 7 excess pixels after each scanline.

LAYOUT_TRAILING_0 = 0
LAYOUT_TRAILING_1
LAYOUT_TRAILING_3
LAYOUT_TRAILING_7

Allows unaligned SIMD access by itself.

4.4 Pixel border

Border gives you up to 3 excess pixels around an image, e.g. for filtering.

LAYOUT_BORDER_0 = 0
LAYOUT_BORDER_1
LAYOUT_BORDER_2
LAYOUT_BORDER_3

4.5 Forcing pixels to be upside down or straight

Vertical constraint forces the image to be stored in a certain vertical direction (by default: any).

LAYOUT_VERT_FLIPPED
LAYOUT_VERT_STRAIGHT

4.6 Gapless pixel access

The Gapless constraint forces the image to have contiguous scanlines without excess bytes.

LAYOUT_GAPLESS

If you have both LAYOUT_GAPLESS and LAYOUT_VERT_STRAIGHT, then you can access a slice of all pixels at once, with the ubyte[] allPixelsAtOnce() method.

image.setSize(640, 480, PixelType.rgba8, LAYOUT_GAPLESS | LAYOUT_VERT_STRAIGHT);
ubyte[] allpixels = image.allPixelsAtOnce();

LAYOUT_GAPLESS is incompatible with constraints that need excess bytes, like borders, scanline alignment, trailing pixels...
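When scanlines are contiguous and stored top-down, indexing into one flat slice reduces to simple arithmetic. The following sketch shows that arithmetic on a plain ubyte[] buffer. `pixelByteOffset` is a hypothetical helper, not part of Gamut.

```d
/// Hypothetical helper: byte offset of pixel (x, y) inside a gapless,
/// top-down buffer whose rows are exactly width * bytesPerPixel bytes.
size_t pixelByteOffset(int x, int y, int width, int bytesPerPixel)
    pure nothrow @nogc @safe
{
    return (cast(size_t) y * width + x) * bytesPerPixel;
}

unittest
{
    // A 4x2 rgba8 image stored as one contiguous slice of 32 bytes.
    enum width = 4, height = 2, bytesPerPixel = 4;
    ubyte[] all = new ubyte[width * height * bytesPerPixel];

    // Set the red component of pixel (1, 1).
    size_t offset = pixelByteOffset(1, 1, width, bytesPerPixel);
    all[offset] = 255;
    assert(offset == 20);
}
```

Without LAYOUT_GAPLESS this formula is not valid, because rows may be separated by padding or stored in reverse order.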

5. Geometric transforms

Gamut provides a few geometric transforms.

Image image;
image.flipHorizontal(); // Flip image pixels horizontally.
image.flipVertical();   // Flip image vertically (pixels or logically, depending on layout)

6. Multi-layer images

6.1 Create multi-layer images

Every Image has a number of layers.

Image image;
image.create(640, 480);
assert(image.layers == 1); // typical image has one layer
assert(image.hasOneLayer);

Create a multi-layer image, cleared with zeroes:

// This single image has 24 black layers.
image.createLayered(800, 600, 24); 
assert(image.layers == 24);

Create a multi-layer uninitialized image:

// Make space for 24 800x600 rgba8 different images.
image.createLayeredNoInit(800, 600, 24);
assert(image.layers == 24);

Create a multi-layer as a view into existing data:

// Create view into existing data.
// layerOffsetBytes is byte offset between first scanlines 
// of two consecutive layers.
image.createLayeredViewFromData(data.ptr, 
                                w, h, numLayers,
                                PixelType.rgb8, 
                                pitchbytes,
                                layerOffsetBytes);

Gamut Image is secretly similar to a 2D Array Texture in OpenGL. Layers are stored consecutively in memory.
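The relationship between the pitch and the layer offset can be sketched as plain offset arithmetic. This is an illustration under stated assumptions, not Gamut's internals: `layerScanlineOffset` is a hypothetical helper, and offsets are expressed relative to the first scanline of layer 0.

```d
/// Hypothetical helper: byte offset of scanline `y` of layer `layer`,
/// relative to the first scanline of layer 0.
/// `pitchBytes` is the signed distance between consecutive scanlines;
/// `layerOffsetBytes` is the distance between the first scanlines of
/// two consecutive layers.
ptrdiff_t layerScanlineOffset(int layer, int y,
                              ptrdiff_t pitchBytes,
                              ptrdiff_t layerOffsetBytes)
    pure nothrow @nogc @safe
{
    return layer * layerOffsetBytes + y * pitchBytes;
}

unittest
{
    // 100x50 rgb8 layers with no padding: pitch is 300 bytes and each
    // layer spans 50 * 300 = 15000 bytes.
    assert(layerScanlineOffset(0, 0, 300, 15_000) == 0);
    assert(layerScanlineOffset(2, 10, 300, 15_000) == 33_000);
}
```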

6.2 Get individual layer

image.layer(int index) returns a non-owning view of a single layer.

Image image;
image.createLayered(640, 480, 5);
assert(image.layer(4).width  == 640);
assert(image.layer(4).height == 480);
assert(image.layer(4).layers ==   1);

Key concept: All image operations work on all layers by default.

Regarding layout: Each layer has its own border, trailing bytes... and follows the same layout constraints. Moreover, LAYOUT_GAPLESS also constrains the layers to be immediately next to each other in memory, without any byte in between (like it constrains the scanlines). The layers cannot be stored in reverse order.

6.3 Get sub-range of layers

image.layerRange(int start, int stop) returns a non-owning view of several layers.

6.4 Access layer pixels

Get a pointer to a scanline:

// Get the 160th scanline of layer 2.
void* scan = image.layerptr(2, 160);

Get a slice of a whole scanline:

// Get the 160th scanline of layer 2.
void[] line = image.layerline(2, 160);

Actually, scanptr(y) and scanline(y) only access the layer index 0.

// Get the 160th scanline of layer 0.
void* scan = image.scanptr(160);
void[] line = image.scanline(160);

Key concept: First layer has index 0.

Consequently, there are two ways to access pixel data in Image:

// Two different ways to access layer pixels.
assert(image.layer(2).scanline(160) == image.layerline(2, 160));

The calls:

image.layerptr(layer, y)

image.layerline(layer, y)

are like:

image.scanptr(y)

image.scanline(y)

but take a layer index.


gamut's Issues

Crash qoix with DMD -O

crash inside convertScanlines to la8
also with rgba8
hard to reduce...

scanline() should return void* not ubyte*

This makes it clearer that the type is monomorphic, else it's accidentally correct for ubyte format and too easy to mess up

More work on QOI-10b

write bits 2 by 2
read bits 2 by 2
decode lines in a double-buffered qoi10_rgba_t[] row, so that the stream channels and decoded channels can be different. Allocate that double-buffer with the malloc again. Only then we can support the load flags internally.
speed-up opcodes in decoding loop
quality: some opcode can be tweaked for a better compromise

JPEG loading speedup possible?

Graillon diffuse takes 34ms to decode, isn't it a bit much for a 1332 x 1276?

JPGD_SUPPORT_FREQ_DOMAIN_UPSAMPLING adds 7ms by itself

QOIX: LZSSE?

https://github.com/ConorStokes/LZSSE

Possibly better than current, but the current is plenty fast. So it's useful if better ratio.

Color representation

a struct that represent a color, in various color spaces.

When we write image.fillWithColor(col);, col is converted to the right space.

Demand pixel access

Suppose that before using pixels, we call a lock/unlock sequence similar to https://wiki.libsdl.org/SDL_LockSurface
Accessing pixels would only be allowed within these calls.

Then with a callback and the user pointer method, we can support any kind of lazy on-demand image.

Ideally a subimage access can be granted.
Ideally this match GPU buffer access.

Add View

Creating a subimage / view shouldn't allocate, unless you manually copy metadata etc.

BC7 decoder?

There is one public domain here:
https://github.com/iOrange/bcdec/blob/main/bcdec.h

More usability issues

Emulate the imageformats functions as public API.

Main problem is just that I could not be ensured that the order is what I needed
so I needed to add a lot more code to make sure, eg. that all data was interleaved RGB(A)
maybe a toInterleaved8 and toInterleaved16 function pair that does the heavy lifting would make it easier...
together with a format parameter to specify whether conversion to RGB/RGBA should happen

you get:
channel count
bits per channel (8 or 16)
functions for auto-converting to the format you need if need be
Of course, for GPU-side compressed formats I'd rather need the raw binary blob together with width and height

imgformats makes it really easy to get the pixels out

All the extra flags you need to pass does make the API slightly more hard to wrap ones head around I'll be honest

Implement premultiplied alpha in PixelType

A. If we add 6 new pixel types (lap, rgbap for 8-bit, 16-bit and 32-bit FP), then we can support premultiplied with quite a bit of changes. The interesting thing is that PNG, with the unspecified iPhone PNG variant, does support premultiplied.

B. Decode unpremultiplied always, do not keep track of status, just provide ways to do it and undo it at the discretion of the user. This is simpler, but people will not get the advantages of premultiplied or less often.
Does premultiplied alpha encodes better, and how much better in terms of space? Check with PNG and QOIX. => YES, 2x better for some QOIX overlays!
Implement function to premultiply, or unpremultiply. This is the easy part, already exist in Dplug ImageKnob.

Big question is (above) whether to keep that information in the type system (PixelType) or not. This is NOT a LayoutOptions, and I don't think it should be a separate status either. Premultiplied alpha is better for compositing, but not for editing. It also loses a bit of data, but it is repeatable.

TGA decoder doesn't check overflow and allocate blindly.

QOIX with LZ4 is buggy

Idea: try with squiz-box at first (doesn't build)
LZ4 currently disabled

LayoutOptions to control allocations.

The LayoutOptions would be an optional struct, allowing to control sitting of the pixel data.

In Dplug, OwnedImage has different constraints that can be combined.

- rowAlignment 
- xMultiplicity
- trailingSamples

~~This wouldn't be kept after allocation~~. ~~Cloning would not preserve those constraints....~~ ~~unless it is kept through a pointer?~~
~~I don't think borders should be in.~~

kept in a bit field, in the struct
preserve through layout compatibility detection
borders are in

Image decoders should return an allocation made from a given LayoutOptions.

method to allocate and reallocate an image with a Layout options
Image keeps the original allocation address somewhere (because of rowAlignment).
QOI decoder takes a LayoutOptions and follows it
QOIX decoder takes a LayoutOptions and follows it
PNG decoder takes a LayoutOptions and follows it
JPEG decoder takes a LayoutOptions and follows it

The need is that some image processing need to access by 4, on aligned boundaries.
dplug:canvas also need 3 trailing bytes after each line.

Please provide a way to gracefully fail for corrupt jpegs

First of all, thanks for this excellent library. I was able to replace imagefmt quickly with your library (which supports more image formats and variants).

While testing I encountered assert(false) calls in stop_decoding (e.g. trying to load a jpeg from an empty file).
It would be great if the error handling could be done on application level and the load in this case would return an invalid image.

I created a local patch for that (by removing the assert and replacing all calls to stop_decoding with stop_decoding and a return), but I am not sure that this is the best way.

qoix example => bad image in arm64

commit 2bc035c4b4384e460ffad7e9fa32b24918648e0d seems to have introduced a regression in jpeg decoding

I found a jpeg in a jpeg test suite that works with version 2.1.1, but not with 2.1.2 or 2.1.3.
I uploaded the image to https://github.com/gizmomogwai/gamut/blob/main/examples/test-suite/test-images/316be81dfdeeb942e904feb3a77f4f83.jpg
When I run the decoder on it with newer versions I get just grey output, with 2.1.1 all is as expected.

Support negative pitch

It is a layout options.

Improve image typing

no-type is similar to... errored() state. Perhaps should merge the two types. An Image is created in errored() state, since it has no type. Should do that and also make sure going into error state kills any former use of the image.
hasPlainPixels should maybe be isPlainPixels(). => done
isPlainPixels, isPlanar and isCompressed should maybe not depend upon hasData() => yes, preferably

Status of some layout constraints when not owned

can LAYOUT_BORDER pixel be changed? yes
- when not owned too? no
- what does the subrect operation does to the constraint? => it keeps the constraint
can LAYOUT_TRAILING pixels be changed? yes
- when not owned too? no
- what does the subrect operation does to the constraint? => it keeps the constraint
can LAYOUT_MULTIPLICITY pixel be changed? yes
- when not owned too? => no
- what does the subrect operation does to the constraint? => it removes the constraint because subrect
can bytes between scanlines in LAYOUT_SCANLINE_ALIGNED be changed? yes
- when not owned too? => no
- what does the subrect operation does to the constraint? => it removes the constraint because subrect
is LAYOUT_GAPLESS preserved by subrect => it removes the constraint

QOIX in uint8 or la8 mode is not compressing at all

We must adapt the opcode to provide some compression in case of uint8 or la8 format.

At the moment our opcode are:

enum int QOI_OP_LUMA   = 0x00; /* 0xxxxxxx */
enum int QOI_OP_INDEX  = 0x80; /* 10xxxxxx */
enum int QOI_OP_LUMA2  = 0xc0; /* 110xxxxx */
enum int QOI_OP_LUMA3  = 0xe0; /* 11100xxx */
enum int QOI_OP_ADIFF  = 0xe8; /* 11101xxx */
enum int QOI_OP_RUN    = 0xf0; /* 11110xxx */
enum int QOI_OP_RUN2   = 0xf8; /* 111110xx */
enum int QOI_OP_GRAY   = 0xfc; /* 11111100 */
enum int QOI_OP_RGB    = 0xfd; /* 11111101 */
enum int QOI_OP_RGBA   = 0xfe; /* 11111110 */
enum int QOI_OP_END    = 0xff; /* 11111111 */

but nothing here really helps with a uint8 map or la8 map.
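For reference, the tag patterns listed above can be told apart with a chain of mask tests, from the shortest prefix to the longest. The sketch below only classifies the first byte of an opcode, using the bit patterns from the comments above. It does not decode payloads, and `opcodeName` is a hypothetical helper, not a function of the QOIX codec.

```d
/// Hypothetical helper: name of the opcode whose tag matches byte `b`,
/// following the bit patterns quoted in the issue.
string opcodeName(ubyte b) pure nothrow @nogc @safe
{
    if ((b & 0x80) == 0x00) return "QOI_OP_LUMA";  // 0xxxxxxx
    if ((b & 0xc0) == 0x80) return "QOI_OP_INDEX"; // 10xxxxxx
    if ((b & 0xe0) == 0xc0) return "QOI_OP_LUMA2"; // 110xxxxx
    if ((b & 0xf8) == 0xe0) return "QOI_OP_LUMA3"; // 11100xxx
    if ((b & 0xf8) == 0xe8) return "QOI_OP_ADIFF"; // 11101xxx
    if ((b & 0xf8) == 0xf0) return "QOI_OP_RUN";   // 11110xxx
    if ((b & 0xfc) == 0xf8) return "QOI_OP_RUN2";  // 111110xx
    if (b == 0xfc) return "QOI_OP_GRAY";
    if (b == 0xfd) return "QOI_OP_RGB";
    if (b == 0xfe) return "QOI_OP_RGBA";
    return "QOI_OP_END";                           // 0xff
}

unittest
{
    assert(opcodeName(0x12) == "QOI_OP_LUMA");
    assert(opcodeName(0xf3) == "QOI_OP_RUN");
    assert(opcodeName(0xff) == "QOI_OP_END");
}
```

Half of the one-byte code space goes to QOI_OP_LUMA, which fits the color-oriented design the issue is describing.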

Finalize QOIX format for v1.0

Compression with LZ4 should be optional. If the compression isn't better, leave it out. The QOIX header should have a field for compression type.
Worth it to remove Big Endian fields? => no

Bump version to include 8 bit support?

Hey, I stumbled over this awesome library, thanks a lot for writing it!
Would you mind tagging a new version so that dub users like me can have 8 bit support for png's? I've seen #21, so I understand if the answer is no for now...

Per-component in-place shuffle

A way to free encoded image without importing core.stdc.stdlib: free

Using an API shall not mandate to import C stdlib.

BC7

GAMUT_MAX_IMAGE_WIDTH_x_HEIGHT seems to be too restrictive

I was playing with rather big images, and stumbled over this constant.
I set it now to 0x7fff_ffff / 16, although I am not sure if gamut supports more than 32 bits per pixel.
GAMUT_MAX_PIXEL_SIZE seems to indicate that one pixel could even be worth 16 bytes?

Reusing allocation when?

try to reuse former allocation when chaining create functions, often the former allocation is forgotten for no reason... But reusing the allocation is perhaps a prime reason to reuse the Image?
allocating storage should return a slice, and take a slice for former allocation

Is realloc worth it vs free+malloc? Sometimes yes.

 // PERF: on Windows, reusing previous allocation is much faster for same alloc size
    //       314x faster             vs free+malloc for same size
    //       10x faster              vs free+malloc for decreasing size 1 by 1
    //       424x slower (quadratic) vs free+malloc for increasing size 1 by 1
    //       0.5x slower             vs free+malloc for random size

If we store the allocation as slice, we could second guess realloc().

QOIX format

For now it's a copy of the QOI format. Many ideas of thing to implement.

Based on "kodak" photos, analysis of OP used in vanilla QOI:

est-images\kodak\kodim01.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         39.2       267.9         10.02          1.47    948.83    1.0
qoi          7.3        10.9         53.95         35.98    728.66    0.8
 * OP_INDEX =     18.2% of pixels,      8.6% of size
 * OP_DIFF  =      5.7% of pixels,      2.7% of size
 * OP_LUMA  =     69.4% of pixels,     65.8% of size
 * OP_RUN   =      7.0% of pixels,      2.4% of size
 * OP_RGB   =     10.8% of pixels,     20.5% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.3        10.9         53.95         35.98    728.66    0.8

test-images\kodak\kodim02.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         31.4       257.9         12.52          1.52    861.82    1.0
qoi          7.1        10.9         55.10         35.97    642.50    0.7
 * OP_INDEX =     35.4% of pixels,     17.1% of size
 * OP_DIFF  =      8.4% of pixels,      4.1% of size
 * OP_LUMA  =     59.5% of pixels,     57.4% of size
 * OP_RUN   =     11.6% of pixels,      4.1% of size
 * OP_RGB   =      9.0% of pixels,     17.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.1        10.9         55.10         35.97    642.50    0.7

test-images\kodak\kodim03.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         27.6       242.2         14.25          1.62    752.92    1.0
qoi          6.7        10.4         58.55         37.70    546.71    0.7
 * OP_INDEX =     31.5% of pixels,     20.2% of size
 * OP_DIFF  =     10.2% of pixels,      6.5% of size
 * OP_LUMA  =     41.5% of pixels,     53.3% of size
 * OP_RUN   =     22.1% of pixels,      9.4% of size
 * OP_RGB   =      4.1% of pixels,     10.5% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         6.7        10.4         58.55         37.70    546.71    0.7

test-images\kodak\kodim04.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         32.9       259.6         11.95          1.51    881.04    1.0
qoi          7.2        11.1         54.43         35.43    700.72    0.8
 * OP_INDEX =     16.5% of pixels,      8.4% of size
 * OP_DIFF  =      9.2% of pixels,      4.7% of size
 * OP_LUMA  =     63.3% of pixels,     64.7% of size
 * OP_RUN   =      9.2% of pixels,      3.6% of size
 * OP_RGB   =      9.1% of pixels,     18.6% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.2        11.1         54.43         35.43    700.72    0.8

test-images\kodak\kodim05.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         30.3       263.2         12.96          1.49    989.17    1.0
qoi          7.5        11.2         52.28         35.13    811.29    0.8
 * OP_INDEX =      7.7% of pixels,      3.6% of size
 * OP_DIFF  =      6.4% of pixels,      2.9% of size
 * OP_LUMA  =     66.0% of pixels,     61.3% of size
 * OP_RUN   =      5.6% of pixels,      1.9% of size
 * OP_RGB   =     16.4% of pixels,     30.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.5        11.2         52.28         35.13    811.29    0.8

test-images\kodak\kodim06.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         29.3       245.1         13.43          1.60    855.12    1.0
qoi          7.0        10.7         56.02         36.79    638.04    0.7
 * OP_INDEX =     26.2% of pixels,     13.7% of size
 * OP_DIFF  =      8.4% of pixels,      4.4% of size
 * OP_LUMA  =     64.5% of pixels,     67.5% of size
 * OP_RUN   =     11.1% of pixels,      4.3% of size
 * OP_RGB   =      4.8% of pixels,     10.1% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.0        10.7         56.02         36.79    638.04    0.7

test-images\kodak\kodim07.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         41.4       260.7          9.49          1.51    782.38    1.0
qoi          6.9        10.5         57.32         37.43    598.44    0.8
 * OP_INDEX =     28.2% of pixels,     16.5% of size
 * OP_DIFF  =      9.5% of pixels,      5.6% of size
 * OP_LUMA  =     47.2% of pixels,     55.1% of size
 * OP_RUN   =     18.4% of pixels,      7.4% of size
 * OP_RGB   =      6.6% of pixels,     15.5% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         6.9        10.5         57.32         37.43    598.44    0.8

test-images\kodak\kodim08.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         44.7       275.9          8.79          1.43    976.58    1.0
qoi          7.6        11.2         51.57         35.08    846.51    0.9
 * OP_INDEX =      9.9% of pixels,      4.3% of size
 * OP_DIFF  =      4.8% of pixels,      2.1% of size
 * OP_LUMA  =     63.8% of pixels,     55.2% of size
 * OP_RUN   =      5.0% of pixels,      1.4% of size
 * OP_RGB   =     21.4% of pixels,     37.0% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.6        11.2         51.57         35.08    846.51    0.9

test-images\kodak\kodim09.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         33.8       255.4         11.62          1.54    823.43    1.0
qoi          6.9        10.9         56.81         36.05    597.10    0.7
 * OP_INDEX =     43.8% of pixels,     21.6% of size
 * OP_DIFF  =      9.4% of pixels,      4.6% of size
 * OP_LUMA  =     58.8% of pixels,     57.9% of size
 * OP_RUN   =     13.2% of pixels,      5.2% of size
 * OP_RGB   =      5.5% of pixels,     10.7% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         6.9        10.9         56.81         36.05    597.10    0.7

test-images\kodak\kodim10.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         36.8       259.5         10.69          1.52    830.27    1.0
qoi          7.2        11.0         54.73         35.79    637.09    0.8
 * OP_INDEX =     24.9% of pixels,     13.2% of size
 * OP_DIFF  =     10.5% of pixels,      5.5% of size
 * OP_LUMA  =     62.9% of pixels,     66.5% of size
 * OP_RUN   =     10.9% of pixels,      4.5% of size
 * OP_RGB   =      4.8% of pixels,     10.2% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.2        11.0         54.73         35.79    637.09    0.8

test-images\kodak\kodim11.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         33.4       252.7         11.77          1.56    836.13    1.0
qoi          6.8        10.5         57.74         37.34    640.25    0.8
 * OP_INDEX =     27.4% of pixels,     14.4% of size
 * OP_DIFF  =      7.0% of pixels,      3.7% of size
 * OP_LUMA  =     58.7% of pixels,     61.8% of size
 * OP_RUN   =     13.5% of pixels,      4.8% of size
 * OP_RGB   =      7.3% of pixels,     15.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         6.8        10.5         57.74         37.34    640.25    0.8

test-images\kodak\kodim12.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         34.1       252.8         11.52          1.56    797.40    1.0
qoi          7.2        10.6         54.53         37.23    563.70    0.7
 * OP_INDEX =     41.4% of pixels,     22.9% of size
 * OP_DIFF  =     10.3% of pixels,      5.7% of size
 * OP_LUMA  =     49.5% of pixels,     54.6% of size
 * OP_RUN   =     17.9% of pixels,      7.3% of size
 * OP_RGB   =      4.3% of pixels,      9.5% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.2        10.6         54.53         37.23    563.70    0.7

test-images\kodak\kodim13.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         27.8       259.2         14.15          1.52   1051.85    1.0
qoi          7.6        11.1         51.93         35.30    824.83    0.8
 * OP_INDEX =      6.6% of pixels,      3.0% of size
 * OP_DIFF  =      3.6% of pixels,      1.6% of size
 * OP_LUMA  =     74.3% of pixels,     67.1% of size
 * OP_RUN   =      3.5% of pixels,      1.0% of size
 * OP_RGB   =     15.1% of pixels,     27.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.6        11.1         51.93         35.30    824.83    0.8

test-images\kodak\kodim14.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         33.2       260.7         11.85          1.51    946.69    1.0
qoi          7.5        10.9         52.30         36.11    722.09    0.8
 * OP_INDEX =     17.1% of pixels,      8.3% of size
 * OP_DIFF  =      7.1% of pixels,      3.5% of size
 * OP_LUMA  =     66.5% of pixels,     64.7% of size
 * OP_RUN   =      7.9% of pixels,      2.8% of size
 * OP_RGB   =     10.6% of pixels,     20.6% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.5        10.9         52.30         36.11    722.09    0.8

test-images\kodak\kodim15.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         37.2       263.6         10.56          1.49    833.25    1.0
qoi          7.1        11.0         55.03         35.85    680.70    0.8
 * OP_INDEX =     20.3% of pixels,     10.8% of size
 * OP_DIFF  =      9.7% of pixels,      5.2% of size
 * OP_LUMA  =     49.6% of pixels,     52.7% of size
 * OP_RUN   =     14.2% of pixels,      5.1% of size
 * OP_RGB   =     12.3% of pixels,     26.2% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.1        11.0         55.03         35.85    680.70    0.8

test-images\kodak\kodim16.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         26.5       239.2         14.83          1.64    786.52    1.0
qoi          6.8        10.5         57.94         37.55    558.80    0.7
 * OP_INDEX =     36.8% of pixels,     21.2% of size
 * OP_DIFF  =      8.9% of pixels,      5.1% of size
 * OP_LUMA  =     55.2% of pixels,     63.4% of size
 * OP_RUN   =     17.3% of pixels,      7.0% of size
 * OP_RGB   =      1.4% of pixels,      3.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         6.8        10.5         57.94         37.55    558.80    0.7

test-images\kodak\kodim17.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         32.5       255.0         12.10          1.54    849.63    1.0
qoi          7.2        10.8         54.54         36.48    627.20    0.7
 * OP_INDEX =     24.7% of pixels,     13.3% of size
 * OP_DIFF  =      9.8% of pixels,      5.3% of size
 * OP_LUMA  =     64.1% of pixels,     69.3% of size
 * OP_RUN   =     11.3% of pixels,      4.7% of size
 * OP_RGB   =      3.4% of pixels,      7.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.2        10.8         54.54         36.48    627.20    0.7

test-images\kodak\kodim18.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         29.1       265.6         13.51          1.48   1011.32    1.0
qoi          7.5        11.3         52.37         34.86    793.62    0.8
 * OP_INDEX =      9.7% of pixels,      4.4% of size
 * OP_DIFF  =      5.3% of pixels,      2.4% of size
 * OP_LUMA  =     75.5% of pixels,     68.6% of size
 * OP_RUN   =      3.2% of pixels,      1.3% of size
 * OP_RGB   =     12.8% of pixels,     23.3% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.5        11.3         52.37         34.86    793.62    0.8

test-images\kodak\kodim19.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         32.7       258.3         12.04          1.52    903.88    1.0
qoi          7.4        11.1         53.39         35.35    693.89    0.8
 * OP_INDEX =     22.4% of pixels,     10.8% of size
 * OP_DIFF  =      8.6% of pixels,      4.2% of size
 * OP_LUMA  =     67.5% of pixels,     65.0% of size
 * OP_RUN   =      7.6% of pixels,      3.2% of size
 * OP_RGB   =      8.7% of pixels,     16.8% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.4        11.1         53.39         35.35    693.89    0.8

test-images\kodak\kodim20.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         27.5       222.2         14.32          1.77    691.27    1.0
qoi          5.9         9.1         66.22         43.03    514.17    0.7
 * OP_INDEX =     45.6% of pixels,     27.8% of size
 * OP_DIFF  =      5.6% of pixels,      3.4% of size
 * OP_LUMA  =     43.6% of pixels,     53.2% of size
 * OP_RUN   =     23.4% of pixels,      5.3% of size
 * OP_RGB   =      4.2% of pixels,     10.2% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         5.9         9.1         66.22         43.03    514.17    0.7

test-images\kodak\kodim21.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         31.8       262.9         12.38          1.50    898.56    1.0
qoi          6.9        10.7         56.72         36.70    638.75    0.7
 * OP_INDEX =     40.8% of pixels,     18.7% of size
 * OP_DIFF  =      7.6% of pixels,      3.5% of size
 * OP_LUMA  =     66.2% of pixels,     60.6% of size
 * OP_RUN   =      9.4% of pixels,      3.7% of size
 * OP_RGB   =      7.4% of pixels,     13.6% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         6.9        10.7         56.72         36.70    638.75    0.7

test-images\kodak\kodim22.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         36.7       270.6         10.72          1.45    956.46    1.0
qoi          7.5        11.2         52.43         34.98    739.29    0.8
 * OP_INDEX =     16.0% of pixels,      7.5% of size
 * OP_DIFF  =      6.6% of pixels,      3.1% of size
 * OP_LUMA  =     73.7% of pixels,     68.8% of size
 * OP_RUN   =      4.9% of pixels,      2.0% of size
 * OP_RGB   =     10.0% of pixels,     18.6% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.5        11.2         52.43         34.98    739.29    0.8

test-images\kodak\kodim23.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         28.1       254.7         13.99          1.54    804.75    1.0
qoi          7.3        11.0         54.22         35.61    659.42    0.8
 * OP_INDEX =     14.9% of pixels,      8.4% of size
 * OP_DIFF  =     12.4% of pixels,      7.0% of size
 * OP_LUMA  =     57.0% of pixels,     64.6% of size
 * OP_RUN   =     12.0% of pixels,      5.2% of size
 * OP_RGB   =      6.5% of pixels,     14.8% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.3        11.0         54.22         35.61    659.42    0.8

test-images\kodak\kodim24.png
       decode ms   encode ms   decode mpps   encode mpps   size kb   rate
png         34.8       261.1         11.29          1.51    887.05    1.0
qoi          7.4        10.8         53.36         36.54    710.98    0.8
 * OP_INDEX =     12.7% of pixels,      6.8% of size
 * OP_DIFF  =      7.2% of pixels,      3.9% of size
 * OP_LUMA  =     56.9% of pixels,     61.1% of size
 * OP_RUN   =     12.2% of pixels,      3.4% of size
 * OP_RGB   =     11.5% of pixels,     24.7% of size
 * OP_RGBA  =      0.0% of pixels,      0.0% of size
qoix         7.4        10.8         53.36         36.54    710.98    0.8

Possibly better predictor than qoi2avg

That would change the QOIX bitstream; how many bits does it win?

Alpha premul and LOAD_PREMULTIPLY_ALPHA

Ability to go alpha premultiplied (without going back).

Image public methods must deal with no-data, no-plainpixels, zero-height, etc.

Else correctness is not ensured.

Better constructors

I think the image initializing functions should be more readable and explicit:

    //  image.create() or the regular constructor creates a new image and fills it with zeros.
    //  image.createUninitialized() or `setSize` creates a new uninitialized image.
    //  image.createWithNoData() creates an image with no data.
    //  image.createViewFromData() creates a view from existing data (would need a clone to own).

TGA loader/encoder

To properly become a Dplug OwnedImage, Image should expose...

a way to disown the Image data
a way to return separately the allocation pointer (free), and the first scanline position
a way to call the right free() function from Gamut for the pixel data
better semantics for null images, zero-size images, etc
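
The difference between the allocation pointer and the first scanline can be sketched as follows. This is a minimal illustration assuming a bottom-up buffer, where row 0 sits at the end of the allocation; `DisownedPixels` and `allocateBottomUp` are hypothetical names, not Gamut API.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical result of disowning an image (not Gamut's actual API).
struct DisownedPixels
{
    void* allocation;     // what must eventually be passed to free()
    ubyte* firstScanline; // address of row 0, may differ from `allocation`
    int pitchInBytes;     // signed byte offset from one row to the next
}

// Allocates a gapless buffer stored bottom-up: row 0 is last in memory.
DisownedPixels allocateBottomUp(int width, int height, int bytesPerPixel) nothrow @nogc
{
    DisownedPixels d;
    if (width <= 0 || height <= 0 || bytesPerPixel <= 0)
        return d; // nothing to allocate, all fields stay null/zero
    size_t rowBytes = cast(size_t) width * bytesPerPixel;
    d.allocation = malloc(rowBytes * height);
    if (d.allocation is null)
        return d; // allocation failed
    d.firstScanline = cast(ubyte*) d.allocation + rowBytes * (height - 1);
    d.pitchInBytes = -cast(int) rowBytes;
    return d;
}

unittest
{
    DisownedPixels d = allocateBottomUp(4, 3, 4);
    assert(d.allocation !is null);
    // Row 0 starts two rows (2 * 16 bytes) after the allocation start.
    assert(d.firstScanline - cast(ubyte*) d.allocation == 32);
    free(d.allocation); // freeing d.firstScanline instead would be a bug
}
```

Whoever takes ownership needs both pointers: one to read pixels, the other to release memory.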

Needed or handy for the API

Actually using Gamut brings the annoyance of some names to light.

Rename changeLayout to setLayout
image.is8Bit, image.is16Bit, image.isFP32

LZ4 1.9 vs LZ4 1.5

Current state of affairs: older LZ4 better? But both are translated differently to D.

LZ4 1.9.4
TOTAL  decode mpps   encode mpps      bit-per-pixel
            190.72        106.50            8.19183

LZ4 1.5.0
TOTAL  decode mpps   encode mpps      bit-per-pixel
            208.40        109.73            8.14308

10-bit QOIX

New "QOI-10b" custom codec supports 1/2/3/4 channels and breaks byte alignment down to bits.

Implement all opcodes of qoi2avg but with 10 bits per component
proper test suite with photos and alpha and stuff
optimize number of bits for each opcode, that is possible

GAPLESS and FLIPPED load flag

The GAPLESS constraint says that there is no space between scanlines.
It is incompatible with whatever wants gaps.
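
As a rough sketch of why the two conflict, assume a buffer described by a width, a pixel size and a pitch. `BufferLayout`, `isGapless` and `alignedPitch` are illustrative names, not Gamut's actual members.

```d
// Hypothetical description of a pixel buffer; field names are illustrative
// and are not Gamut's actual Image members.
struct BufferLayout
{
    int width;         // pixels per scanline
    int bytesPerPixel; // e.g. 3 for 8-bit RGB
    int pitchInBytes;  // byte distance between the starts of consecutive scanlines
}

/// True if each scanline starts right where the previous one ends.
bool isGapless(BufferLayout b) pure nothrow @nogc @safe
{
    return b.pitchInBytes == b.width * b.bytesPerPixel;
}

/// Smallest pitch >= rowBytes that is a multiple of `alignment` bytes.
int alignedPitch(int rowBytes, int alignment) pure nothrow @nogc @safe
{
    return (rowBytes + alignment - 1) / alignment * alignment;
}

unittest
{
    // 3 RGB pixels take 9 bytes per row.
    assert(isGapless(BufferLayout(3, 3, 9)));
    // Asking for 16-byte aligned rows forces a 7-byte gap, which GAPLESS forbids.
    int pitch = alignedPitch(9, 16);
    assert(pitch == 16);
    assert(!isGapless(BufferLayout(3, 3, pitch)));
}
```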

Support allocators

Do people want allocators just for image data, or also for all the various other things, like metadata?
Is the allocator a global? If not, images need to retain their allocation method, and then we need a user pointer to give to the allocator.
The problem with a global is that the library then needs initialization, which it currently doesn't.
At this point it would be handy to store something like that as metadata (for example: the initial allocation method).
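
One possible shape for the non-global option is sketched below, assuming a realloc-style callback plus a user pointer that the image would store alongside its data. All names here (`ImageAllocator`, `countingRealloc`, `Stats`) are hypothetical and only illustrate the idea.

```d
import core.stdc.stdlib : realloc, free;

// Hypothetical per-image allocator (not part of Gamut).
struct ImageAllocator
{
    // realloc-style callback: p == null allocates, size == 0 frees.
    void* function(void* p, size_t size, void* userData) nothrow @nogc reallocFn;
    void* userData; // opaque pointer handed back to the callback

    void* allocate(size_t size) nothrow @nogc
    {
        return reallocFn(null, size, userData);
    }

    void deallocate(void* p) nothrow @nogc
    {
        if (p !is null)
            reallocFn(p, 0, userData);
    }
}

// Example user state reached through the user pointer.
struct Stats { size_t liveBlocks; }

// Example callback: forwards to the C heap and counts live blocks.
void* countingRealloc(void* p, size_t size, void* userData) nothrow @nogc
{
    auto stats = cast(Stats*) userData;
    if (size == 0)
    {
        if (p !is null) { free(p); stats.liveBlocks--; }
        return null;
    }
    void* q = realloc(p, size);
    if (p is null && q !is null)
        stats.liveBlocks++;
    return q;
}

unittest
{
    Stats stats;
    auto alloc = ImageAllocator(&countingRealloc, &stats);
    void* p = alloc.allocate(64);
    assert(p !is null && stats.liveBlocks == 1);
    alloc.deallocate(p);
    assert(stats.liveBlocks == 0);
}
```

Because the image would carry its own `ImageAllocator`, no library-wide initialization is needed, at the cost of two extra pointers per image.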

LZ4_read_ARCH() is wrong on arm64, LZ4_64bits() too

LOAD flags don't work properly with QOI formats

They should get a type from the header, then convert it using the load flags functions or similar. Then, if the codec supports that type, decode this number of channels directly.

Use fpnge PNG encoder instead of stb_image

png.cpp compression compared to stb_image_write.h: 12-19x faster with roughly 5-11% avg. smaller files.

also: opens avenue for fast PNG loading, if generated by fpng, but then we would need two PNG decoders
can it do 16-bit encode? seems like it's only rgb8 and rgba8. 16-bit encode is more important than speed, since right now only QOIX supports 16-bit encodes

QOIX average predictor including alpha

For now, average prediction leaves out the alpha channel. Should we do that?
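
To make the question concrete, here is a small sketch of both variants. It assumes the predictor is the rounded average of the left and upper neighbours, and that when alpha is left out the prediction simply reuses the left pixel's alpha. Neither assumption is confirmed by the QOIX bitstream, and `RGBA8`, `average` and `predictAverage` are made-up names.

```d
struct RGBA8 { ubyte r, g, b, a; }

/// Rounded average of two 8-bit values.
ubyte average(ubyte x, ubyte y) pure nothrow @nogc @safe
{
    return cast(ubyte)((x + y + 1) >> 1);
}

/// Predicts a pixel from its left and upper neighbours.
/// With includeAlpha == false, alpha is copied from the left pixel (assumption).
RGBA8 predictAverage(RGBA8 left, RGBA8 up, bool includeAlpha) pure nothrow @nogc @safe
{
    RGBA8 p;
    p.r = average(left.r, up.r);
    p.g = average(left.g, up.g);
    p.b = average(left.b, up.b);
    p.a = includeAlpha ? average(left.a, up.a) : left.a;
    return p;
}

unittest
{
    auto left = RGBA8(10, 20, 30, 255);
    auto up   = RGBA8(20, 40, 50, 0);
    assert(predictAverage(left, up, false) == RGBA8(15, 30, 40, 255));
    assert(predictAverage(left, up, true)  == RGBA8(15, 30, 40, 128));
}
```

On a hard edge between opaque and transparent pixels, the averaged alpha (128 here) matches neither neighbour, which is one reason the answer is not obvious.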

Load flags issue

Trying to load a Dplug image with:

    Image rgb;
    rgb.loadFromMemory(cast(const(ubyte[])) imageDataRGB, LOAD_RGB | LOAD_8BIT | LOAD_ALPHA | LAYOUT_VERT_STRAIGHT | LAYOUT_GAPLESS);

yields a wrong image, but

    rgb.loadFromMemory(cast(const(ubyte[])) imageDataRGB, 0);
    rgb.convertTo(PixelType.rgba8, LAYOUT_VERT_STRAIGHT | LAYOUT_GAPLESS);

works.

The image in question is the Distort diffuse image.

Scanline conversion functions

All scanline conversion loops should be their own function
They are also common to a number of codecs; try to remove them from stb_image
also in: DDS encoder
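
As an illustration of what one such standalone function could look like, here is a sketch for 8-bit RGB to RGBA. The name `convertScanlineRGB8toRGBA8` and its signature are assumptions, not Gamut's existing code.

```d
/// Expands one scanline of 8-bit RGB to 8-bit RGBA with opaque alpha.
/// `src` must hold at least 3 * width bytes, `dst` at least 4 * width bytes.
void convertScanlineRGB8toRGBA8(const(ubyte)[] src, ubyte[] dst, size_t width)
    pure nothrow @nogc @safe
{
    assert(src.length >= 3 * width && dst.length >= 4 * width);
    foreach (x; 0 .. width)
    {
        dst[4 * x + 0] = src[3 * x + 0];
        dst[4 * x + 1] = src[3 * x + 1];
        dst[4 * x + 2] = src[3 * x + 2];
        dst[4 * x + 3] = 255; // source has no alpha, so output is fully opaque
    }
}

unittest
{
    ubyte[6] rgb = [1, 2, 3, 4, 5, 6];
    ubyte[8] rgba;
    ubyte[8] expected = [1, 2, 3, 255, 4, 5, 6, 255];
    convertScanlineRGB8toRGBA8(rgb[], rgba[], 2);
    assert(rgba == expected);
}
```

A decoder or encoder would then call this once per row instead of carrying its own copy of the loop.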

Version out more codecs

It seems we compile codecs even when not using them

Investigate miniz for PNG decoding/writing

miniz zlib decoder translated
we may lose Apple PNG support (screenshots), since those are invalid PNGs that the stb_image decoder nevertheless accepts. The fix to get it back is to skip the zlib header on decode when a CgBI chunk is encountered (i.e. what stb_image.h does)
- it doesn't seem we had it in the first place, quite buggy in stb_image.h => fixed by backporting one stb_image fix

Planar ability

Some use cases are better served by planar images (arguably they can increase cache locality of neighbouring rows of pixels).
Some video formats necessitate those (YUV 420, 444, 422, etc).
How can we support that? Sounds like a lot of work.
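
Part of the work is that each plane gets its own dimensions. The sketch below computes chroma plane sizes for the three subsamplings mentioned, assuming 8-bit samples and that odd dimensions round up. `ChromaSubsampling`, `PlaneSize` and both functions are hypothetical names.

```d
enum ChromaSubsampling { yuv444, yuv422, yuv420 }

struct PlaneSize { int width; int height; }

/// Size of each of the two chroma planes for a given luma plane size.
/// Odd dimensions round up (assumption) so every luma sample is covered.
PlaneSize chromaPlaneSize(int lumaWidth, int lumaHeight, ChromaSubsampling s)
    pure nothrow @nogc @safe
{
    int divX = (s == ChromaSubsampling.yuv444) ? 1 : 2; // 4:2:2 and 4:2:0 halve the width
    int divY = (s == ChromaSubsampling.yuv420) ? 2 : 1; // only 4:2:0 halves the height
    return PlaneSize((lumaWidth + divX - 1) / divX, (lumaHeight + divY - 1) / divY);
}

/// Total bytes for an 8-bit planar image: one luma plane plus two chroma planes.
size_t planarSizeInBytes(int width, int height, ChromaSubsampling s)
    pure nothrow @nogc @safe
{
    PlaneSize c = chromaPlaneSize(width, height, s);
    return cast(size_t) width * height + 2 * cast(size_t) c.width * c.height;
}

unittest
{
    assert(chromaPlaneSize(768, 512, ChromaSubsampling.yuv420) == PlaneSize(384, 256));
    assert(planarSizeInBytes(768, 512, ChromaSubsampling.yuv420) == 589_824);
    assert(planarSizeInBytes(768, 512, ChromaSubsampling.yuv444) == 3 * 768 * 512);
}
```

An image type supporting this would need one pointer and one pitch per plane rather than a single pair, which is where most of the API impact would come from.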

Replace LZ4 with zstd?

Comparison from https://github.com/inikep/lzbench

Compressor            encode      decode      out size    ratio (lower = better)
lz4 1.9.2             737 MB/s    4448 MB/s   100880800   47.60
zstd 1.4.3 -5         104 MB/s    932 MB/s    63993747    30.19
xz 5.2.4 -0           24 MB/s     70 MB/s     62579435    29.53
xpack 2016-06-02 -6   43 MB/s     1086 MB/s   62213845    29.35
ucl_nrv2e 1.03 -9     2.13 MB/s   429 MB/s    69645134    32.86
tornado 0.6a -5       51 MB/s     195 MB/s    64129604    30.26
lzma 19.00 -0         34 MB/s     80 MB/s     64013917    30.20
lzlib 1.11 -0         36 MB/s     61 MB/s     63847386    30.12
lzham 1.0 -d26 -0     11 MB/s     271 MB/s    64089870    30.24
lzfse 2017-03-08      90 MB/s     934 MB/s    67624281    31.91
lizard 1.0 -45        17 MB/s     1810 MB/s   67317588    31.76
lizard 1.0 -29        2.07 MB/s   2697 MB/s   68694227    32.41
fastlzma2 1.0.1 -10   3.99 MB/s   105 MB/s    48666065    22.96
brieflz 1.2.0 -8      0.46 MB/s   473 MB/s    64912139    30.63

Probably worth it to go a bit slower at decompression, much slower at encode, and to compress more.