Comments (4)
Is it possible that the input buffer is being modified during the compression operation? That would explain this symptom, I think. This could happen either because some other thread touches the input buffer, or because the CCtx's internal buffers partially overlap the input buffer, so that Zstd accidentally mutates the input while writing into its CCtx.
Is the code snippet you provided literally the reproduction case you've discovered? Can you describe the environment a little more? Are there other threads running in your program?
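One way to test the input-mutation hypothesis is to hash the source buffer before and after the compression call, and to check whether the source range overlaps any memory region handed to the CCtx. The helpers below are a minimal, hypothetical sketch: ranges_overlap and fnv1a64 are not zstd functions. FNV-1a is used only because it is short, and any checksum would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [a, a+aLen) and [b, b+bLen) share at least one byte.
 * Converting to uintptr_t avoids relational comparison of pointers into
 * different objects (undefined behavior in ISO C); the integer mapping is
 * implementation-defined but flat on mainstream platforms. */
static int ranges_overlap(const void *a, size_t aLen, const void *b, size_t bLen)
{
    uintptr_t const a0 = (uintptr_t)a;
    uintptr_t const b0 = (uintptr_t)b;
    if (aLen == 0 || bLen == 0) return 0;
    return a0 < b0 + bLen && b0 < a0 + aLen;
}

/* 64-bit FNV-1a hash: snapshot the input before compressing and compare
 * afterwards to detect any write into the source buffer. */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}
```

For example, compute fnv1a64(src, srcSize) before ZSTD_compressCCtx() and again after it returns. A different value means something wrote into the input. If the CCtx was created over a caller-supplied buffer with ZSTD_initStaticCCtx(), ranges_overlap() can also be applied to that buffer and the input.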
from zstd.
Thanks for the detailed debugging.
Similar scenarios are supposed to be abundantly tested and fuzzed, so I'm surprised such an obvious bug would be able to slip through. For example, the fact that memory is allocated with malloc, and therefore is not initialized, is not a bug: the initialization process is supposed to be aware of this fact and adjust accordingly, zeroing some memory segments only when necessary.
In the case of the FSE tables, it should not be necessary: we expect these tables to be written to first, during the block statistics stage. So even if this memory contains garbage data, it should not matter.
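To illustrate the write-before-read argument above, here is a small hypothetical sketch (not zstd code). A table builder that assigns every entry unconditionally produces identical output whether its buffer starts zeroed or full of junk. This is why uninitialized malloc memory is harmless as long as no entry is read before it is written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NB_SYMBOLS 4

/* Hypothetical builder: table[s] = sum of counts before s.
 * Every entry is written, including symbols with count 0, so the
 * previous content of `table` never influences the result. */
static void build_table(unsigned *table, const unsigned *counts, size_t n)
{
    unsigned running = 0;
    assert(table != NULL && counts != NULL);
    for (size_t s = 0; s < n; s++) {
        table[s] = running;
        running += counts[s];
    }
}

/* Builds the table into a freshly malloc'd buffer that is deliberately
 * filled with `junk`, mimicking uninitialized or reused workspace memory. */
static unsigned *build_in_dirty_buffer(const unsigned *counts, size_t n, int junk)
{
    unsigned *table = malloc(n * sizeof *table);
    if (table == NULL) return NULL;
    memset(table, junk, n * sizeof *table);   /* simulate garbage */
    build_table(table, counts, n);
    return table;
}
```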
Anyway, another important detail is that v1.5.2 is more than 2 years old, and our code base has evolved since.
Would you be able to reproduce the issue using the current source code in the dev branch?
I don't know how, but it does happen in my environment, and I can't reproduce it on the dev branch (not even in my other dev environment). It's strange.
A few hours ago I was able to narrow the problem down more precisely: it is the zc->blockState.nextCBlock block. If I use memset to clear just this memory, rather than the whole workspace, the program also completes the compression normally.
// zstd_compress.c : 1910
zc->blockState.nextCBlock = (ZSTD_compressedBlockState_t*) ZSTD_cwksp_reserve_object(ws, sizeof(ZSTD_compressedBlockState_t));
RETURN_ERROR_IF(zc->blockState.nextCBlock == NULL, memory_allocation, "couldn't allocate nextCBlock");
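// Workaround: uncommenting this memset (zeroing nextCBlock right after it is reserved) lets compression complete normally.
//memset(zc->blockState.nextCBlock, 0, ZSTD_cwksp_align(sizeof(ZSTD_compressedBlockState_t), sizeof(void*)));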
So I suspect this junk data plays some harmful role in the ZSTD_buildSequencesStatistics -> ZSTD_buildCTable -> FSE_buildCTable_wksp call chain. But this code is so complicated that I can't tell what it does.
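For reference, the quoted workaround reserves an object from a bump-style workspace and then clears its size rounded up to pointer alignment. The sketch below is a hypothetical, much-simplified workspace. workspace_t, ws_reserve_object and align_up are invented for illustration and are not ZSTD_cwksp. It shows that, like malloc, a reserved object carries whatever bytes were previously in the buffer unless it is explicitly zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rounds size up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical bump-pointer workspace; `base` must itself be
 * pointer-aligned for the returned objects to be aligned. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} workspace_t;

/* Reserves `bytes` bytes rounded up to sizeof(void*). The memory keeps its
 * previous content unless `zero` is non-zero (the memset workaround). */
static void *ws_reserve_object(workspace_t *ws, size_t bytes, int zero)
{
    size_t const rounded = align_up(bytes, sizeof(void *));
    unsigned char *p;
    if (rounded > ws->size - ws->used) return NULL;   /* out of space */
    p = ws->base + ws->used;
    ws->used += rounded;
    if (zero) memset(p, 0, rounded);
    return p;
}
```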
/UPDATE/
I have found the real problem now. In FSE_buildCTable_wksp, the last loop computes deltaNbBits and deltaFindState for each element of the symbolTT array. But when normalizedCounter[s] == 0, that branch only computes deltaNbBits, so deltaFindState may keep a garbage value. Then, when the program reaches FSE_encodeSymbol (in fse.h), it crashes with a segmentation fault on this line:
statePtr->value = stateTable[ (statePtr->value >> nbBitsOut) + symbolTT.deltaFindState];
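The loop described above can be sketched as follows. This is a simplified, hypothetical reconstruction: symbol_tt_t, build_symbol_tt and next_state_index are illustrative names, and the deltaNbBits formulas are my reading of fse_compress.c, so check them against the actual source. The difference from the reported branch is that the count-0 case also writes deltaFindState, so a zero-weight symbol can no longer feed a garbage offset into the stateTable lookup. next_state_index asserts that the resulting index stays in range.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for one symbolTT element. */
typedef struct {
    int32_t  deltaFindState;
    uint32_t deltaNbBits;
} symbol_tt_t;

/* Index of the highest set bit (v > 0). */
static unsigned highbit32(uint32_t v)
{
    unsigned r = 0;
    while (v >>= 1) r++;
    return r;
}

/* norm[] sums to 1 << tableLog; -1 means "low probability", counted as 1. */
static void build_symbol_tt(symbol_tt_t *tt, const short *norm,
                            unsigned maxSymbol, unsigned tableLog)
{
    int32_t total = 0;
    for (unsigned s = 0; s <= maxSymbol; s++) {
        switch (norm[s]) {
        case 0:
            tt[s].deltaNbBits = ((tableLog + 1) << 16) - (1u << tableLog);
            tt[s].deltaFindState = 0;   /* defensive: no garbage left behind */
            break;
        case -1:
        case 1:
            tt[s].deltaNbBits = (tableLog << 16) - (1u << tableLog);
            tt[s].deltaFindState = total - 1;
            total += 1;
            break;
        default: {
            uint32_t const maxBitsOut = tableLog - highbit32((uint32_t)norm[s] - 1);
            uint32_t const minStatePlus = (uint32_t)norm[s] << maxBitsOut;
            tt[s].deltaNbBits = (maxBitsOut << 16) - minStatePlus;
            tt[s].deltaFindState = total - norm[s];
            total += norm[s];
        }
        }
    }
}

/* Index used by the encoder step
 *   value = stateTable[(value >> nbBitsOut) + deltaFindState];
 * for a state in [tableSize, 2*tableSize). */
static int32_t next_state_index(uint32_t state, symbol_tt_t t, unsigned tableLog)
{
    uint32_t const nbBitsOut = (state + t.deltaNbBits) >> 16;
    int32_t const idx = (int32_t)(state >> nbBitsOut) + t.deltaFindState;
    assert(idx >= 0 && idx < (int32_t)(1u << tableLog));
    return idx;
}
```

With tableLog = 2 and normalized counts {2, 0, 1, 1}, symbol 0 gets deltaFindState = -2, symbol 1 gets 0, and symbols 2 and 3 get 1 and 2. As a result, every encoder state in [4, 8) maps to an index in [0, 4).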
When normalizedCounter[s] == 0, it means that symbol s should not be present at all, not even once.
If s is nonetheless found in the input, then indeed there will be a pretty big problem: it's a non-codable event.
But this should not happen, because before calculating the tables, the process starts by histogramming the whole input, so no symbol should be missing. Only symbols which are confirmed absent receive a weight of 0.
This is a very blatant issue, and the issue board would have seen mountains of segfaults and support requests if it were present in an earlier release such as v1.5.2.
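The invariant described in this reply can be turned into a cheap debug check. Histogram the input, derive the weights, and verify that no symbol actually present in the input ended up with weight 0. The sketch below is hypothetical: histogram and weights_cover_input are not zstd functions, and the test uses a trivial 0/1 weight as a stand-in for real normalization.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SYMBOL 255

/* Counts how often each byte value occurs in src. */
static void histogram(unsigned count[MAX_SYMBOL + 1], const unsigned char *src, size_t len)
{
    for (unsigned s = 0; s <= MAX_SYMBOL; s++) count[s] = 0;
    for (size_t i = 0; i < len; i++) count[src[i]]++;
}

/* Returns 1 if every symbol occurring in src has a non-zero weight,
 * 0 if some present symbol was given weight 0 (a non-codable event). */
static int weights_cover_input(const short norm[MAX_SYMBOL + 1],
                               const unsigned char *src, size_t len)
{
    assert(src != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (norm[src[i]] == 0) return 0;
    return 1;
}
```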