Comments (9)

Cyan4973 avatar Cyan4973 commented on April 27, 2024

If the test on this line triggers the error code, that is pretty bad: it means the normalized distribution is not correctly normalized.
In that case, the algorithm is right to stop processing there.
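
As a rough illustration of what "correctly normalized" means here, the sketch below checks that the normalized counts fill a table of `1 << tableLog` slots exactly. The helper name `normalized_counts_fill_table` is hypothetical and not part of the FSE API, and the sketch assumes that a count of -1 marks a low-probability symbol that still takes one slot.

```c
#include <assert.h>

/* Hypothetical helper, not part of the FSE API.
 * Returns 1 when the normalized counts fill a table of (1 << tableLog)
 * slots exactly, 0 otherwise.
 * Assumption: a count of -1 marks a low-probability symbol that still
 * occupies one slot; any value below -1 is treated as invalid. */
static int normalized_counts_fill_table(const short* normalizedCounter,
                                        unsigned maxSymbolValue,
                                        unsigned tableLog)
{
    long total = 0;
    unsigned s;

    assert(tableLog < 16);
    for (s = 0; s <= maxSymbolValue; s++) {
        int const c = normalizedCounter[s];
        if (c < -1) return 0;           /* not a valid normalized count */
        total += (c == -1) ? 1 : c;     /* -1 still uses one slot */
    }
    return total == (1L << tableLog);
}
```

Running a check of this kind on the `normalizedCounter` array just before the table is built would show whether the counts already fail to add up at that point.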

Now, if that is the right scenario, it means the next step is to understand why the normalization would fail. It could be a very specific corner case that the fuzzer is unable to produce.

It may be a long shot, but would it be possible to access the faulty file for debugging?

If the problem is the one described above, it means it's not related to the size of the file.
It might be possible to capture just the place where the problem occurs.

from zstd.

dpayne avatar dpayne commented on April 27, 2024

Sorry, I can't share the faulty file since it contains sensitive information. One thing to note: after removing that line, compressing, and then decompressing the file, the result is a perfect match for the original.

Cyan4973 avatar Cyan4973 commented on April 27, 2024

OK.
It's a pity not to be able to investigate the problem, but I'm glad it works correctly for you.
To be fair, I'm very surprised: this test was supposed to be an important sanity check, and I took for granted that compression would necessarily behave badly if it failed.
Apparently that's not always the case.

Also, a secondary question:
do you have any idea why "there are more symbols than the max symbol limit"?
This situation is not supposed to happen, so it would be interesting to understand why it does.

dpayne avatar dpayne commented on April 27, 2024

The comment "there are more symbols than the max symbol limit" was purely a guess based on what I saw of the code. It's very likely I misread the code and the real issue is something else entirely.

If it helps, I can give you some gdb output. For example, here are some variables from gdb when it stops at that line:

Breakpoint 1, FSE_buildCTable (CTable=0x7fffffff1d10, normalizedCounter=0x7fffffff3920, maxSymbolValue=252, tableLog=8) at ../lib/fse.c:1460
1460 return (size_t)-FSE_ERROR_GENERIC; /* Must have gone through all positions */
(gdb) p position
$1 = 49
(gdb) p maxSymbolValue
$2 = 252
(gdb) p symbol
$3 = 253

Cyan4973 avatar Cyan4973 commented on April 27, 2024

OK.
I see that compression efficiency is quite affected, since it tries to fit up to 253 symbols into a table of 256 elements. It can work, but the compression ratio will suffer considerably.

symbol necessarily exits the loop at maxSymbolValue+1, so this part is correct.

What is not correct is "position", which is supposed to end at "0".
Here it ends at "49".
It should be possible to know how many symbols are missing from this value. Not sure if it is very useful though.
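
A minimal sketch of how that count could be recovered from the final value of "position". It assumes the spreading loop advances by `step = (tableSize>>1) + (tableSize>>3) + 3` modulo the table size, starting from 0, and it ignores any low-probability area at the top of the table, so it is only an approximation of what `FSE_buildCTable` does. The helper name `steps_to_reach` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not part of the FSE API.
 * Walks the spreading sequence from position 0 and returns how many
 * steps are needed to land on `target`.
 * Assumptions: step = (size>>1) + (size>>3) + 3, and no positions are
 * skipped for low-probability symbols. */
static unsigned steps_to_reach(unsigned target, unsigned tableLog)
{
    unsigned const tableSize = 1u << tableLog;
    unsigned const tableMask = tableSize - 1;
    unsigned const step = (tableSize >> 1) + (tableSize >> 3) + 3;
    unsigned position = 0;
    unsigned n = 0;

    /* For tableLog >= 4 the step is odd, so the walk visits every
     * position exactly once per cycle and the loop terminates. */
    assert(tableLog >= 4 && tableLog <= 15);
    assert(target < tableSize);

    while (position != target) {
        position = (position + step) & tableMask;
        n++;
    }
    return n;
}
```

Under these assumptions, `steps_to_reach(49, 8)` gives 27, since 27 * 163 = 4401 and 4401 mod 256 = 49. The walk therefore stopped either 229 slots short of a full 256-slot cycle or 27 slots past one; the value alone cannot tell which.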

I've made a small update to FSE in its "dev" branch:
https://github.com/Cyan4973/FiniteStateEntropy/tree/dev
Maybe it can help to solve this situation.

dpayne avatar dpayne commented on April 27, 2024

When using the dev branch of FSE, compression works.

Cyan4973 avatar Cyan4973 commented on April 27, 2024

Thanks for the feedback.

Cyan4973 avatar Cyan4973 commented on April 27, 2024

Fix integrated into the zstd "dev" branch.

Cyan4973 avatar Cyan4973 commented on April 27, 2024

Merged into master.
