Comments (9)
If execution reaches this line, i.e. if the test triggers the error code,
that is pretty bad: it means the normalized distribution is not correctly normalized.
In such a case, the algorithm is right to stop processing there.
Now, if that is the right scenario, it means the next step is to understand why the normalization would fail. It could be a very specific corner case that the fuzzer is unable to produce.
It may sound like a long stretch, but would it be possible to access the faulty file for debugging?
If the problem is the one described above, it is not related to the size of the file.
It might be possible to capture just the place where the problem occurs.
from zstd.
Sorry, I can't share the faulty file since it contains sensitive information. One thing to note: after removing that line, compressing and then decompressing the file yields a perfect match with the original.
OK.
It's a pity not to be able to investigate the problem, but I'm glad it works correctly for you.
To be fair, I'm very surprised: this test was supposed to be an important sanity check, and I took it for granted that compression would necessarily misbehave if it failed.
Apparently that's not always the case.
Also, as a secondary question:
Do you have any idea why "there are more symbols than the max symbol limit" ?
This situation is not supposed to happen, so it would be interesting to understand why it does.
The comment "there are more symbols than the max symbol limit" was purely a guess from what I saw of the code. It's very likely I misread the code and the real issue is something else entirely.
If it helps, I can provide some gdb output. For example, here are some variables from gdb when it stops at that line:
Breakpoint 1, FSE_buildCTable (CTable=0x7fffffff1d10, normalizedCounter=0x7fffffff3920, maxSymbolValue=252, tableLog=8) at ../lib/fse.c:1460
1460 return (size_t)-FSE_ERROR_GENERIC; /* Must have gone through all positions */
(gdb) p position
$1 = 49
(gdb) p maxSymbolValue
$2 = 252
(gdb) p symbol
$3 = 253
OK.
I see that compression efficiency is strongly affected, since it tries to fit up to 253 symbols into a table of 256 elements. It can work, but the compression ratio will suffer considerably.
"symbol" necessarily exits the loop at maxSymbolValue+1, so this part is correct.
What is not correct is "position", which is supposed to end at "0".
Here it ends at "49".
It should be possible to infer from this value how many symbols are missing, though I'm not sure it is very useful.
I've made a small update to FSE in its "dev" branch:
https://github.com/Cyan4973/FiniteStateEntropy/tree/dev
Maybe it can help to solve this situation.
When using the dev branch of FSE, compression works.
Thanks for the feedback
Fix integrated into zstd "dev" branch
merged into master