GithubHelp home page GithubHelp logo

bmatch's Introduction


bmatch.go


Build Status

bmatch is faster fixed pattern search for Go.

Go provides different search mechanisms to find the indices of a fixed string or byte pattern, that are backed by different search algorithms:

  • strings.Index switches by the pattern length
    • 1: generic assembler function strings•IndexByte(..) (i.e. for OSX/darwin see /usr/local/go/src/runtime/asm_amd64.s)
    • 1-31: generic assembler function strings•indexShortString(..) (i.e. for OSX/darwin see /usr/local/go/src/runtime/asm_amd64.s)
    • 31 calling a Rabin-Karp search algorithm using a rolling addition hash and use string comparison to prove the result

  • strings.Replace for single string replacing invokes a Boyer-Moore search over a string in /usr/local/go/src/strings/search.go implementing a stringFinder type
  • bytes.Index using generic assembler code (bytes•IndexByte(..)) to find the index of the first element of the pattern (i.e. for OSX/darwin see /usr/local/go/src/runtime/asm_amd64.s) and then compares the following sequence using another assembler function (Equal(..)).

The before mentioned assembler routines compare each byte of the haystack one by one. See unsafeMEMCHR.go for a faster approach. All but the Boyer-Moore search are relatively slow - even in comparison to python str.find function (implemented in C). bmatch's underlying algorithms outperform all of Go's search functions.


If you installed bmatch before 2016-03-20 it is recommended that you reinstall bmatch as there was a bugfix for "needles at haystack start"


Usage

Install bmatch by the usual

go get github.com/AndreasBriese/bmatch

In your go code use import "github.com/AndreasBriese/bmatch" and apply it on []byte types of the haystack (byte sequence to search in) and needle (pattern to search for).

index, err := bmatch.Index(&haystack, &needle) gives the first (left) index or -1 if not present,

indices, err := bmatch.FindAll(&haystack, &needle) to get an []int array with indices of (overlapping!) occurrences, or

count, err := bmatch.Count(&haystack, &needle) to get the number of (overlapping!) occurences of needle in haystack.

Benchmarks (go test -bench . cpu=1)

 ###############
 bmatch.go
 Haystack: ./theciaworldfactb00571.zip loaded (3013205 bytes)
 Alphabet size: 93
 
 PASS
 BenchmarkM_Bmatch_10_C          	       1	1002289595 ns/op 
 BenchmarkM_BoyerMoore_10_C       	       1	2683892806 ns/op
 BenchmarkM_BytesIndex_10_C       	       1	2630128009 ns/op
 BenchmarkM_Bmatch_30_C           	       2	 528011588 ns/op
 BenchmarkM_BoyerMoore_30_C       	       1	1402345246 ns/op
 BenchmarkM_BytesIndex_30_C       	       1	2654263863 ns/op
 BenchmarkM_Bmatch_1024_C         	      20	 105107250 ns/op
 BenchmarkM_BoyerMoore_1024_C     	       5	 214846485 ns/op
 BenchmarkM_BytesIndex_1024_C     	       1	2592599288 ns/op
 BenchmarkM_Bmatch_30_FI        	      10	 104080809 ns/op
 BenchmarkM_BoyerMoore_30_FI    	       5	 245957698 ns/op
 BenchmarkM_StringsIndex_30_FI  	       3	 392142229 ns/op
 BenchmarkM_BytesIndex_30_FI    	       2	 776670641 ns/op
 BenchmarkM_Bmatch_1024_FI      	      30	  43820654 ns/op
 BenchmarkM_BoyerMoore_1024_FI  	      10	 103821315 ns/op
 BenchmarkM_StringsIndex_1024_FI	       1	1306085958 ns/op
 BenchmarkM_BytesIndex_1024_FI  	       1	1276277951 ns/op

on a MacBookPro 2013 with i7 and 8GB Ram searching for 500 random patterns plus 20 patterns that are not present in the "1995 CIA World Factbook" (~3MB english natural text). Benchmark naming: .._searchFunction_patternMaximumLength_C=count|FI=first left Index

go test will try to download the test corpus from http://archive.org if it is not present in the folder.

License
bmatch.go (C)opyright 2016 Andreas Briese, eduToolbox@Bri-C GmbH, Sarstedt, with MIT license - see the headers in the code in the subfolders of the various search algorithms for details and reference to their predecessors (C-code mostly taken from the SMART tool http://www.dmi.unict.it/~faro/smart/ v.13.02).

bmatch's People

Contributors

andreasbriese avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

gernest mantyr

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.