
alexandrnikitin / ahocorasick.net


Implementation of Aho-Corasick string matching algorithm for .NET

License: MIT License

C# 88.48% F# 10.85% Batchfile 0.67%
aho-corasick c-sharp search-algorithm string-search

ahocorasick.net's People

Contributors

alexandrnikitin

ahocorasick.net's Issues

Some failure transitions missing?

Consider this input:

var sut = new AhoCorasickTree(new[] { "abcd", "bc" });
var x = sut.Contains("abc"); // => false

This should yield true because "abc" contains "bc". The failure transition from "c" of the "abcd" subtree to the "c" of "bc" seems to be missing.
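For reference, the expected behaviour can be sketched with a small standalone automaton. This is not the AhoCorasick.Net code; `SketchAutomaton`, `Node`, `Fail`, and `EndsKeyword` are names made up for illustration. The sketch builds the trie, sets failure links breadth-first, and lets each state inherit the "keyword ends here" flag from its failure target, which is the step that makes the "abc" state report the match for "bc".

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; not the AhoCorasick.Net implementation.
public sealed class SketchAutomaton
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;         // state for the longest proper suffix that is also a trie path
        public bool EndsKeyword;  // true if some keyword ends at this state
    }

    private readonly Node _root = new Node();

    public SketchAutomaton(IEnumerable<string> keywords)
    {
        // 1. Build the trie.
        foreach (var keyword in keywords)
        {
            var node = _root;
            foreach (var c in keyword)
            {
                if (!node.Next.TryGetValue(c, out var child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.EndsKeyword = true;
        }

        // 2. Set failure links breadth-first, so shallower states are final
        //    before deeper states read them.
        var queue = new Queue<Node>();
        foreach (var child in _root.Next.Values)
        {
            child.Fail = _root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var pair in node.Next)
            {
                var f = node.Fail;
                while (f != _root && !f.Next.ContainsKey(pair.Key)) f = f.Fail;
                var child = pair.Value;
                child.Fail = f.Next.TryGetValue(pair.Key, out var target) ? target : _root;
                // If a suffix of this state's path is a keyword, reaching this state is a match.
                child.EndsKeyword |= child.Fail.EndsKeyword;
                queue.Enqueue(child);
            }
        }
    }

    public bool Contains(string text)
    {
        var node = _root;
        foreach (var c in text)
        {
            // Follow failure links until a transition on c exists or the root is reached.
            while (node != _root && !node.Next.ContainsKey(c)) node = node.Fail;
            if (node.Next.TryGetValue(c, out var next)) node = next;
            if (node.EndsKeyword) return true;
        }
        return false;
    }
}
```

With this sketch, `new SketchAutomaton(new[] { "abcd", "bc" }).Contains("abc")` follows a, ab, abc, and the "abc" state is flagged because its failure target is the end of "bc".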

List to Array Failing in Benchmark Tests

Using the following code:

List<string> uniqueWords = new();
static string? parsedTermsComplete;

public void AttemptOne()
{
    var wordArray = uniqueWords.ToArray();
    var keyWords = new AhoCorasickTree(wordArray);
    var keywordsPositions = keyWords.Search(parsedTermsComplete).ToList();
    // var result = keyWords.Contains(parsedTermsComplete!); - alternative still fails.
}

I get the following error:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> System.ArgumentException: should contain keywords
at AhoCorasick.Net.AhoCorasickTree..ctor(String[] keywords)
at RegexTesting.Program.Attempts.AttemptOne() in C:\Demo\RegexTesting\RegexTesting\Program.cs:line 1058
at BenchmarkDotNet.Autogenerated.Runnable_0.WorkloadActionNoUnroll(Int64 invokeCount) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 318
at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data)
at BenchmarkDotNet.Engines.EngineFactory.Jit(Engine engine, Int32 jitIndex, Int32 invokeCount, Int32 unrollFactor)
at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters)
at BenchmarkDotNet.Autogenerated.Runnable_0.Run(IHost host, String benchmarkName) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 175
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 58

The following code works in the same benchmark suite, so the problem does not appear to be an empty list.

public void AttemptTwo()
{
    // Walk the list backwards so RemoveAt(i) removes the word just tested
    // and does not shift the indices still to be visited.
    for (int i = uniqueWords.Count - 1; i >= 0; i--)
    {
        var keyWords = new AhoCorasickTree(new[] { uniqueWords[i] });
        if (keyWords.Contains(parsedTermsComplete!))
        {
            uniqueWords.RemoveAt(i);
        }
    }
}

I may have misunderstood the implementation, but I expected this to work, so I wanted to raise it.

For context, parsedTermsComplete should contain several copies of every word in the uniqueWords list, because that is how the parsed terms were created. I was looking for a way to QC that every word in uniqueWords does in fact exist in parsedTermsComplete.

Don't get me wrong, the implementation that works is super fast, 3 times faster than any other implementation in the test. I just wondered why I can't pass the result of List.ToArray() to the constructor.
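One thing worth checking, offered as a guess and not a diagnosis: the second attempt succeeding does not show that the list is non-empty, because a per-word loop over an empty list never calls the constructor at all. The sketch below uses a hypothetical stand-in class (`StandInTree`); its empty-array check is an assumption inferred from the "should contain keywords" message in the stack trace, not taken from the library source. BenchmarkDotNet normally runs benchmarks in a separate generated process, so a list filled in `Main` may be empty inside the benchmark unless it is populated in a `[GlobalSetup]` method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for AhoCorasickTree. The empty-array check is an
// assumption inferred from the exception message, not the library's code.
public sealed class StandInTree
{
    private readonly string[] _keywords;

    public StandInTree(string[] keywords)
    {
        if (keywords.Length == 0) throw new ArgumentException("should contain keywords");
        _keywords = keywords;
    }

    // Naive containment check; enough for this demonstration.
    public bool Contains(string text)
    {
        foreach (var keyword in _keywords)
        {
            if (text.Contains(keyword)) return true;
        }
        return false;
    }
}

public static class EmptyListDemo
{
    // Mirrors the first attempt: one tree over the whole array.
    public static bool WholeArrayThrows(List<string> words)
    {
        try
        {
            _ = new StandInTree(words.ToArray());
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Mirrors the second attempt: one tree per word.
    // Returns how many constructors were actually called.
    public static int PerWordConstructions(List<string> words, string text)
    {
        int built = 0;
        foreach (var item in words.ToArray())
        {
            var tree = new StandInTree(new[] { item });
            tree.Contains(text);
            built++;
        }
        return built;
    }
}
```

For an empty list, `WholeArrayThrows` returns true while `PerWordConstructions` returns 0 without throwing, so logging `uniqueWords.Count` at the start of each benchmark method would confirm or rule out this explanation.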

Add method to output all matched substrings

Something like this:

public IEnumerable<KeyValuePair<string, int>> Search(string text)
{ ... }

where the key is the matched pattern and the value is the start index into the searched string.

I tried adding this method assuming that IsFinished means a node is in the dictionary ("blue node" as in the description on Wikipedia). But that doesn't seem to be the case, so I gave up 😢
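As a rough illustration of the requested method, here is a standalone sketch that does not touch the library's internals. `SearchSketch`, `Node`, `DictLink`, and `Keyword` are hypothetical names, and the static signature takes the keywords as a parameter only to keep the example self-contained. Each state stores the keyword that ends there, if any, plus a dictionary link to the nearest state on its failure chain that ends a keyword, so every match ending at a position can be reported.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; not the AhoCorasick.Net implementation.
public static class SearchSketch
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;       // longest proper suffix that is also a trie path
        public Node DictLink;   // nearest state on the failure chain that ends a keyword
        public string Keyword;  // non-null only where a keyword ends ("blue node")
    }

    private static Node Build(IEnumerable<string> keywords)
    {
        var root = new Node();
        foreach (var keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword)) continue; // skip empty patterns
            var node = root;
            foreach (var c in keyword)
            {
                if (!node.Next.TryGetValue(c, out var child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.Keyword = keyword;
        }

        // Breadth-first pass: failure links first, then dictionary links.
        var queue = new Queue<Node>();
        foreach (var child in root.Next.Values)
        {
            child.Fail = root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var pair in node.Next)
            {
                var f = node.Fail;
                while (f != root && !f.Next.ContainsKey(pair.Key)) f = f.Fail;
                var child = pair.Value;
                child.Fail = f.Next.TryGetValue(pair.Key, out var target) ? target : root;
                child.DictLink = child.Fail.Keyword != null ? child.Fail : child.Fail.DictLink;
                queue.Enqueue(child);
            }
        }
        return root;
    }

    // Key: matched pattern. Value: start index of the match in text.
    public static IEnumerable<KeyValuePair<string, int>> Search(IEnumerable<string> keywords, string text)
    {
        var root = Build(keywords);
        var node = root;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            while (node != root && !node.Next.ContainsKey(c)) node = node.Fail;
            if (node.Next.TryGetValue(c, out var next)) node = next;

            // Report the keyword ending at this state, then every shorter one via dictionary links.
            for (var hit = node.Keyword != null ? node : node.DictLink; hit != null; hit = hit.DictLink)
            {
                yield return new KeyValuePair<string, int>(hit.Keyword, i - hit.Keyword.Length + 1);
            }
        }
    }
}
```

Matches come out in order of their end position, so searching "abcd" for the keywords "abcd" and "bc" yields "bc" at index 1 before "abcd" at index 0.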
