Comments (8)
@mtxrym, so far I haven't experienced any major hash collisions with my test data files. Please let me know if you discover a problem.
from cpp-fstlib.
It's not about hash collisions: the transitions size is larger than 1024, so `buff` is not large enough, and it causes a segmentation fault.
@mtxrym, thanks for the more detailed information. Fortunately, the number of transitions never exceeds 256, because `arc` is a `char` and a state cannot have the same arc entry more than once in the FST structure. But `buff` also includes `id` (`uint32`), `output` (whose size depends on `output_t`), and `state_output` (same as `output`). It might exceed 1024 bytes when the transitions contain many arcs with long string outputs, though I think that's a rare case with typical word dictionaries. Just in case it happens, I'll change the code to throw an exception when the data exceeds the buffer size. Thanks for pointing this out.
One question: did you actually run into this situation? If so, what kind of data did you use?
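The plan to throw an exception instead of overflowing can be sketched as a bounds-checked append. This is a minimal sketch, not cpp-fstlib's code: `FixedBuffer`, `write_transition`, and the byte layout (a 1-byte `char` arc, a 4-byte `uint32` id, then the raw output bytes) are hypothetical, chosen only to mirror the fields named in the comment above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical fixed-capacity buffer modeled on the 1024-byte `buff`
// discussed in this thread; NOT the library's actual implementation.
class FixedBuffer {
public:
  static constexpr std::size_t kCapacity = 1024;

  // Copies `n` bytes into the buffer, throwing instead of writing past the end.
  void append(const void* data, std::size_t n) {
    if (n > kCapacity - size_) {  // size_ <= kCapacity, so no underflow
      throw std::length_error("state data exceeds buffer size: " +
                              std::to_string(size_ + n) + " > " +
                              std::to_string(kCapacity));
    }
    std::memcpy(buf_.data() + size_, data, n);
    size_ += n;
    assert(size_ <= kCapacity);
  }

  std::size_t size() const { return size_; }

private:
  std::array<char, kCapacity> buf_{};
  std::size_t size_ = 0;
};

// Hypothetical serialization of one transition: arc label, target id,
// and a string output (illustrative layout only).
void write_transition(FixedBuffer& buf, char arc, std::uint32_t id,
                      const std::string& output) {
  buf.append(&arc, sizeof(arc));
  buf.append(&id, sizeof(id));
  buf.append(output.data(), output.size());
}
```

With 128-byte outputs each transition takes 1 + 4 + 128 = 133 bytes, so seven transitions fit (931 bytes) and the eighth throws `std::length_error` instead of corrupting memory. This version checks each field separately, so a failed transition leaves its arc and id in the buffer; checking the total size up front would avoid that partial write.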
Just terms averaging 30 to 40 bytes, the largest about 100 bytes (with Chinese characters). Compiling 400w of them together, it crashes with a core dump when `buf_len` is 1058, which exceeds 1024.
What do you mean by "400w together to compile"? Also, in your case, how much buffer is needed for your data: 2048, 4096?
It seems 2048 is enough.
`fst::compile<uint32_t>(items, m_term_str, true)`
`items` is a vector of about 400w (4 million) entries.
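The thread does not say how master fixed the bug. One common alternative to a fixed-size `buff` is a growable buffer that cannot overflow. The sketch below assumes a hypothetical layout (1-byte `char` arc, 4-byte `uint32` id, raw output bytes); `append_bytes` and `write_transition` are illustrative names, not cpp-fstlib APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical growable replacement for a fixed 1024-byte buffer:
// std::vector reallocates as needed, so oversized states still fit.
void append_bytes(std::vector<char>& buf, const void* data, std::size_t n) {
  const char* p = static_cast<const char*>(data);
  buf.insert(buf.end(), p, p + n);
}

// Hypothetical serialization of one transition (illustrative layout only).
void write_transition(std::vector<char>& buf, char arc, std::uint32_t id,
                      const std::string& output) {
  const std::size_t before = buf.size();
  append_bytes(buf, &arc, sizeof(arc));
  append_bytes(buf, &id, sizeof(id));
  append_bytes(buf, output.data(), output.size());
  // Sanity check: the buffer grew by exactly the bytes written.
  assert(buf.size() == before + sizeof(arc) + sizeof(id) + output.size());
}
```

Eight transitions with 128-byte outputs produce 1064 bytes, more than the 1058 bytes reported above and beyond a 1024-byte array, yet nothing is overwritten. The trade-off is heap allocation, which can be amortized by reusing one vector across states and calling `clear()` between them, since that typically keeps the allocated capacity.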
@mtxrym, thanks for the additional information. I'll make the necessary change to fix it soon. Thanks!
@mtxrym, the current master doesn't have this bug anymore. Thanks for your contribution!
Related Issues (10)
- How to support prefix search in cpp-fstlib?
- How to dynamically add new items after fst::compile is called?
- Conditional logic bug when calculating output
- Problem making this lib support uint64
- Support 'set'
- Revised unit test
- Support 'uint64_t'
- `decompile` command support
- Is it possible to use mmap lazy initialization for search to reduce memory usage?