petar-dambovaliev / aho-corasick
Efficient string matching in Golang via the aho-corasick algorithm.
License: MIT License
Hi,
I really like the implementation, but I would love to have a FindAll(haystack []byte) and also a Build(patterns [][]byte). I saw that all patterns and haystacks are converted to []byte internally. What do you think about accepting []byte by default and having the string methods call these? E.g. FindAllByte(haystack []byte) and then FindAll(haystack string) { FindAllByte([]byte(haystack)) }.
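A minimal sketch of the delegation pattern proposed above may make the request more concrete. Everything here is hypothetical: the `Match` and `Matcher` types and the `BuildByte` name are invented for illustration, and a naive scan stands in for the real automaton. Only the shape of the API (byte-slice core, thin string wrappers) is the point.

```go
package main

import (
	"bytes"
	"fmt"
)

// Match is a hypothetical result type for this sketch, not the library's type.
type Match struct {
	Pattern int // index of the pattern in the slice passed to Build
	Start   int // byte offset where the match begins
	End     int // byte offset just past the match
}

// Matcher is a toy stand-in that stores its patterns as raw bytes.
type Matcher struct {
	patterns [][]byte
}

// BuildByte (hypothetical name) accepts byte-slice patterns directly.
func BuildByte(patterns [][]byte) *Matcher {
	return &Matcher{patterns: patterns}
}

// Build converts string patterns once and delegates to BuildByte.
func Build(patterns []string) *Matcher {
	bs := make([][]byte, len(patterns))
	for i, p := range patterns {
		bs[i] = []byte(p)
	}
	return BuildByte(bs)
}

// FindAllByte is the core search. A naive scan stands in for the automaton.
func (m *Matcher) FindAllByte(haystack []byte) []Match {
	var out []Match
	for i, p := range m.patterns {
		if len(p) == 0 {
			continue
		}
		for from := 0; from+len(p) <= len(haystack); {
			idx := bytes.Index(haystack[from:], p)
			if idx < 0 {
				break
			}
			start := from + idx
			out = append(out, Match{Pattern: i, Start: start, End: start + len(p)})
			from = start + 1 // allow overlapping occurrences
		}
	}
	return out
}

// FindAll is the string convenience wrapper around FindAllByte.
func (m *Matcher) FindAll(haystack string) []Match {
	return m.FindAllByte([]byte(haystack))
}

func main() {
	m := Build([]string{"he", "she"})
	fmt.Println(m.FindAll("ushers"))
	fmt.Println(m.FindAllByte([]byte("ushers")))
}
```

With this layout, callers that already hold a []byte skip the conversion entirely, while string callers pay for one copy in the wrapper.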
Could you please add a go.mod file so that I can download and manage it efficiently?
Hi, I found this while looking for a lower memory usage alternative to anknown/ahocorasick.
I have a dataset of around 6 million strings. The total memory usage, as shown by pprof, after building the automaton is just over 30GB, compared to 6.5GB for the anknown version.
Do you have any tips for working out why it's using so much more RAM?
Thanks in advance.
Hi, thanks for the work on this. I'm having a few issues with the Opts struct (I'm relatively new to Go). I seem to get an error about exported fields with it, so I'm wondering whether the field names should be capitalized?
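For background on the capitalization question: in Go, a struct field is visible outside its package only if its name starts with an upper-case letter. The sketch below uses a hypothetical `exampleOpts` struct (not the library's actual Opts definition) and the `reflect` package to list which fields count as exported.

```go
package main

import (
	"fmt"
	"reflect"
)

// exampleOpts is a hypothetical options struct used only for this sketch.
type exampleOpts struct {
	MatchOnlyWholeWords bool // upper-case first letter: exported
	dfa                 bool // lower-case first letter: unexported
}

// exportedFields returns the names of the exported fields of a struct value.
func exportedFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if f := t.Field(i); f.IsExported() {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(exportedFields(exampleOpts{}))
}
```

Code in another package could set `MatchOnlyWholeWords` in a struct literal but would get a compile error when naming `dfa`.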
I get incorrect results when some of the patterns overlap in characters.
It only happens when the MatchOnlyWholeWords option is true. (If it is set to true combined with matchKind standard, it results in only one match.)
I expect the test below to pass.
func TestOverlappingPatterns(t *testing.T) {
	trieBuilder := aho_corasick.NewAhoCorasickBuilder(aho_corasick.Opts{
		MatchOnlyWholeWords: true,
		MatchKind:           aho_corasick.LeftMostLongestMatch,
		DFA:                 false,
	})
	patterns := []string{"phonebook", "the phone"}
	trie := trieBuilder.Build(patterns)
	result := trie.FindAll("I'll look into the phonebook")
	if len(result) == 0 {
		t.Error("Did not find match in string")
		t.FailNow()
	}
}
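To spell out why a match is expected here, the following naive reference check applies one plausible reading of "whole word": a match counts only if the bytes immediately before and after it are not ASCII letters, digits, or underscores. This is an assumption made for illustration, and the helper names are hypothetical. The library's own definition of a word boundary may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// isWordByte reports whether b is an ASCII letter, digit, or underscore.
// This definition of a "word" character is an assumption of the sketch.
func isWordByte(b byte) bool {
	return b == '_' ||
		(b >= '0' && b <= '9') ||
		(b >= 'a' && b <= 'z') ||
		(b >= 'A' && b <= 'Z')
}

// wholeWordMatches returns every pattern occurrence that is bounded on both
// sides by a non-word byte or by the start or end of the haystack.
func wholeWordMatches(haystack string, patterns []string) []string {
	var found []string
	for _, p := range patterns {
		if p == "" {
			continue
		}
		for from := 0; from+len(p) <= len(haystack); {
			idx := strings.Index(haystack[from:], p)
			if idx < 0 {
				break
			}
			start := from + idx
			end := start + len(p)
			leftOK := start == 0 || !isWordByte(haystack[start-1])
			rightOK := end == len(haystack) || !isWordByte(haystack[end])
			if leftOK && rightOK {
				found = append(found, p)
			}
			from = start + 1
		}
	}
	return found
}

func main() {
	patterns := []string{"phonebook", "the phone"}
	fmt.Println(wholeWordMatches("I'll look into the phonebook", patterns))
}
```

Under this reading, "the phone" is rejected because it is followed by the "b" of "phonebook", while "phonebook" itself is preceded by a space and ends the string, so it qualifies as a whole-word match.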