alexandrnikitin / ahocorasick.net
Implementation of Aho-Corasick string matching algorithm for .NET
License: MIT License
Consider this input:
var sut = new AhoCorasickTree(new[] { "abcd", "bc" });
var x = sut.Contains("abc"); // => false
This should yield true because "abc" contains "bc". The failure transition from "c" of the "abcd" subtree to the "c" of "bc" seems to be missing.
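To make the expected behaviour concrete, here is a minimal sketch of a trie with failure links, written for this report and not taken from AhoCorasick.Net. The class name `MiniAhoCorasick` and its members are hypothetical. It shows only how the failure link from the "abc" node to the "bc" node lets `Contains("abc")` report a match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration class; NOT the AhoCorasick.Net implementation.
public sealed class MiniAhoCorasick
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;    // node for the longest proper suffix that is also a trie path
        public bool IsMatch; // a keyword ends here or somewhere along the failure chain
    }

    private readonly Node _root = new Node();

    public MiniAhoCorasick(IEnumerable<string> keywords)
    {
        // 1. Build the trie.
        foreach (var word in keywords)
        {
            var node = _root;
            foreach (var c in word)
            {
                if (!node.Next.TryGetValue(c, out var child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.IsMatch = true;
        }

        // 2. Breadth-first pass: shallower nodes get their failure links first.
        _root.Fail = _root;
        var queue = new Queue<Node>();
        foreach (var child in _root.Next.Values)
        {
            child.Fail = _root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var pair in node.Next)
            {
                var child = pair.Value;
                var f = node.Fail;
                while (f != _root && !f.Next.ContainsKey(pair.Key)) f = f.Fail;
                child.Fail = f.Next.TryGetValue(pair.Key, out var target) ? target : _root;
                // Inherit matches from the failure target ("abc" inherits from "bc").
                child.IsMatch |= child.Fail.IsMatch;
                queue.Enqueue(child);
            }
        }
    }

    public bool Contains(string text)
    {
        var node = _root;
        foreach (var c in text)
        {
            while (node != _root && !node.Next.ContainsKey(c)) node = node.Fail;
            if (node.Next.TryGetValue(c, out var next)) node = next;
            if (node.IsMatch) return true;
        }
        return false;
    }
}
```

With this sketch, `new MiniAhoCorasick(new[] { "abcd", "bc" }).Contains("abc")` returns true, which is the result the report expects from the library.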
Using the following code:
List<string> uniqueWords = new();
static string? parsedTermsComplete;
public void AttemptOne()
{
    var wordArray = uniqueWords.ToArray();
    var keyWords = new AhoCorasickTree(wordArray);
    var keywordsPositions = keyWords.Search(parsedTermsComplete).ToList();
    // var result = keyWords.Contains(parsedTermsComplete!); - alternative still fails.
}
I get the following error:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> System.ArgumentException: should contain keywords
at AhoCorasick.Net.AhoCorasickTree..ctor(String[] keywords)
at RegexTesting.Program.Attempts.AttemptOne() in C:\Demo\RegexTesting\RegexTesting\Program.cs:line 1058
at BenchmarkDotNet.Autogenerated.Runnable_0.WorkloadActionNoUnroll(Int64 invokeCount) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 318
at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data)
at BenchmarkDotNet.Engines.EngineFactory.Jit(Engine engine, Int32 jitIndex, Int32 invokeCount, Int32 unrollFactor)
at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters)
at BenchmarkDotNet.Autogenerated.Runnable_0.Run(IHost host, String benchmarkName) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 175
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 58
The following code works in the same benchmark suite - so it isn't that the list is empty....
public void AttemptTwo()
{
    var wordArray = uniqueWords.ToArray();
    int i = uniqueWords.Count - 1;
    foreach (var item in wordArray)
    {
        var keyWords = new AhoCorasickTree(new[] { item });
        if (keyWords.Contains(parsedTermsComplete))
        {
            uniqueWords.RemoveAt(i);
        }
        i--;
    }
}
Not sure if I have misunderstood the implementation, but wanted to raise it as I thought that this would work.
For context, parsedTermsComplete should contain several copies of every word in the uniqueWords list, because that is how the parsed terms were created. I was looking for a way to QC that every word in uniqueWords does in fact exist in parsedTermsComplete.
Don't get me wrong, the implementation that works is super fast, 3 times faster than any other implementation in the test. I just wondered why I can't pass a List converted to an array.
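One possibility worth ruling out is an empty list. This is an assumption, since the report does not show how uniqueWords is populated. The message "should contain keywords" comes from the AhoCorasickTree constructor in the stack trace, which would be consistent with an empty array being passed in. AttemptTwo would not reveal this: with an empty list its foreach body never runs, so no tree is ever constructed. The sketch below uses a hypothetical helper class, `KeywordGuard`, which is not part of the library, to check both points.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for diagnosing the report above; not part of AhoCorasick.Net.
public static class KeywordGuard
{
    // Fail fast with a clearer message before calling a constructor
    // that is assumed to reject an empty keyword array.
    public static string[] ToNonEmptyArray(List<string> words)
    {
        if (words == null || words.Count == 0)
            throw new InvalidOperationException("uniqueWords is empty when the benchmark method runs.");
        return words.ToArray();
    }

    // Counts how many times the body of an AttemptTwo-style loop would execute.
    public static int LoopIterations(List<string> words)
    {
        int iterations = 0;
        foreach (var item in words.ToArray())
        {
            iterations++; // AttemptTwo would construct a one-keyword tree here
        }
        return iterations;
    }
}
```

If `KeywordGuard.LoopIterations(uniqueWords)` is 0 inside the benchmark, the list is empty at that point. In that case, filling it in a method marked with BenchmarkDotNet's `[GlobalSetup]` attribute may be worth trying.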
Something like this:
public IEnumerable<KeyValuePair<string, int>> Search(string text)
{ ... }
where the key is the matched pattern and the value is the start index into the searched string.
I tried adding this method assuming that IsFinished means a node is in the dictionary ("blue node" as in the description on Wikipedia). But that doesn't seem to be the case so I gave up.
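As a rough sketch of the requested shape, independent of the library's internals, the following hypothetical `PositionalMatcher` class returns each matched pattern with its start index. It stores the keyword on each node where one ends and adds an output link to the nearest such node on the failure chain, which is the role the "blue" dictionary-suffix arcs play in the Wikipedia description. All names here are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration class; NOT the AhoCorasick.Net implementation.
public sealed class PositionalMatcher
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;      // longest proper suffix that is also a trie path
        public Node Output;    // nearest node on the failure chain where a keyword ends, or null
        public string Keyword; // non-null only where a keyword ends
    }

    private readonly Node _root = new Node();

    public PositionalMatcher(IEnumerable<string> keywords)
    {
        foreach (var word in keywords)
        {
            var node = _root;
            foreach (var c in word)
            {
                if (!node.Next.TryGetValue(c, out var child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.Keyword = word;
        }

        // Breadth-first pass sets failure and output links, shallow nodes first.
        _root.Fail = _root;
        var queue = new Queue<Node>();
        foreach (var child in _root.Next.Values)
        {
            child.Fail = _root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var pair in node.Next)
            {
                var child = pair.Value;
                var f = node.Fail;
                while (f != _root && !f.Next.ContainsKey(pair.Key)) f = f.Fail;
                child.Fail = f.Next.TryGetValue(pair.Key, out var target) ? target : _root;
                child.Output = child.Fail.Keyword != null ? child.Fail : child.Fail.Output;
                queue.Enqueue(child);
            }
        }
    }

    // Key: matched pattern. Value: start index of the match in the searched text.
    public IEnumerable<KeyValuePair<string, int>> Search(string text)
    {
        var node = _root;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            while (node != _root && !node.Next.ContainsKey(c)) node = node.Fail;
            if (node.Next.TryGetValue(c, out var next)) node = next;
            // Report the keyword ending at this node, then every shorter one via output links.
            for (var hit = node.Keyword != null ? node : node.Output; hit != null; hit = hit.Output)
                yield return new KeyValuePair<string, int>(hit.Keyword, i - hit.Keyword.Length + 1);
        }
    }
}
```

For the keywords "abcd" and "bc", searching "abcd" yields ("bc", 1) followed by ("abcd", 0), because matches are reported in order of their end position.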