dlclark / regexp2
A full-featured regex engine in pure Go based on the .NET engine
License: MIT License
r := MustCompile(`(?:){40}`, RE2)
m, err := r.FindStringMatch("12")
will panic with
panic: runtime error: index out of range [-1] [recovered]
panic: runtime error: index out of range [-1]
goroutine 6 [running]:
testing.tRunner.func1.1(0x590420, 0xc000016320)
testing/testing.go:988 +0x30d
testing.tRunner.func1(0xc000134120)
testing/testing.go:991 +0x3f9
panic(0x590420, 0xc000016320)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*runner).trackPush1(...)
github.com/dlclark/regexp2/runner.go:992
github.com/dlclark/regexp2.(*runner).execute(0xc000146000, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:387 +0x4511
github.com/dlclark/regexp2.(*runner).scan(0xc000146000, 0xc000014230, 0x2, 0x2, 0x0, 0x0, 0x7fffffffffffffff, 0x0, 0x8, 0x8)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc000132100, 0x5a3d00, 0x0, 0xc000014230, 0x2, 0x2, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0x21a
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
github.com/dlclark/regexp2.TestRE2ECMA(0xc000134120)
github.com/dlclark/regexp2/regexp_re2_test.go:125 +0x8b
testing.tRunner(0xc000134120, 0x5b45d8)
testing/testing.go:1039 +0xdc
created by testing.(*T).Run
testing/testing.go:1090 +0x372
exit status 2
FAIL github.com/dlclark/regexp2 0.006s
Things that I know don't matter:
40 needs to be 17 or bigger.
This was found through fuzzing goja with the go-fuzz corpus for regexp, which is why the example is such :). I may rewrite it to fuzz regexp2 as well and post it if there is interest.
Test case:
func TestReplaceRef(t *testing.T) {
re := MustCompile("(123)hello(789)", None)
res, err := re.Replace("123hello789", "$1456$2", -1, -1)
if err != nil {
t.Fatal(err)
}
if res != "123456789" {
t.Fatalf("Wrong result: %s", res)
}
}
Result:
--- FAIL: TestReplaceRef (0.00s)
regexp_test.go:775: Wrong result: $1456789
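The result above is consistent with the replacement parser reading $1456 as a reference to group 1456 rather than group 1 followed by the literal 456. The standard library's regexp package has the same ambiguity and resolves it with braces. The sketch below shows that behavior with the standard library only; whether regexp2 accepts the same ${1} form is an assumption here (it is the .NET substitution syntax), so treat it as something to try rather than a confirmed fix.

```go
package main

import (
	"fmt"
	"regexp"
)

// expandRef applies a replacement template with the standard library engine.
func expandRef(template string) string {
	re := regexp.MustCompile(`(123)hello(789)`)
	return re.ReplaceAllString("123hello789", template)
}

func main() {
	// $1456 is read as group "1456", which does not exist, so it expands to "".
	fmt.Println(expandRef("$1456$2"))
	// Braces end the group reference explicitly.
	fmt.Println(expandRef("${1}456${2}"))
}
```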
s1 := `^Google\nApple$`
s2 := `^Google\nApple\Z`
data := "Google\nApple\n"
// will get result
re, err := regexp2.Compile(s, regexp2.Singleline)
// will not get result
re, err := regexp2.Compile(s, regexp2.Singleline|regexp2.RE2)
Why?
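One likely explanation, assuming the RE2 option makes $ follow the semantics of Go's own regexp package: there, without the multiline flag, $ matches only at the very end of the text and not before a trailing newline. The sketch below demonstrates that behavior with the standard library only; it does not exercise regexp2 itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches data using the standard library engine.
func matches(pattern, data string) bool {
	return regexp.MustCompile(pattern).MatchString(data)
}

func main() {
	data := "Google\nApple\n"
	// $ must sit at the very end of the text, after the final newline.
	fmt.Println(matches(`^Google\nApple$`, data))
	// Allowing one optional trailing newline makes the pattern match.
	fmt.Println(matches(`^Google\nApple\n?$`, data))
	// In multiline mode $ also matches just before a newline.
	fmt.Println(matches(`(?m)^Google\nApple$`, data))
}
```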
Thank you very much for the porting!
I checked your library and found that most unicode character classes have not been implemented yet.
Reference: http://www.fileformat.info/info/unicode/category/index.htm
It looks like fundamental character categories, such as [\p{P}] (any punctuation), are available:
package main
import (
"fmt"
"github.com/dlclark/regexp2"
)
func main() {
re := regexp2.MustCompile(`(?<=[カキケコ\p{Po}])ん+`, 0) // works
isMatch, err := re.FindStringMatch(`ブック。んん`)
if err == nil {
fmt.Println(isMatch)
}
}
But most advanced character classes (blocks) such as [\p{Katakana}] have not been implemented:
package main
import (
"fmt"
"github.com/dlclark/regexp2"
)
func main() {
re := regexp2.MustCompile(`(?<=[カキケコ\p{Katakana}])ん+`, 0) // panic with [\p{Katakana}]
isMatch, err := re.FindStringMatch(`ブック。んん`)
if err == nil {
fmt.Println(isMatch)
}
}
The sample code above causes panic: not impelemented.
I hope you'd implement them in the future.
s := `[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`
regexp2.MustCompile(s, regexp2.None)
panic: regexp2: Compile(`[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`): error parsing regexp: unrecognized escape sequence \_ in `[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;\/\*]+`
This panics, but the same pattern compiles successfully in Python.
In [47]: s = r"""[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*['"][^\n'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))['"][\s\)]*[\r\n;
...: \/\*]+"""
In [48]: re.compile(s)
Out[48]:
re.compile(r'[\r\n;\/\*]+\s*\b(include|require)(_once)?\b[\s\(]*[\'"][^\n\'"]{1,100}((\.(jpg|png|txt|jpeg|log|tmp|db|cache)|\_(tmp|log))|((http|https|file|php|data|ftp)\:\/\/\[.{0,25}))[\'"][\s\)]*[\r\n;\/\*]+',
re.UNICODE)
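Since an underscore has no special meaning in a regular expression, the backslash in front of it can simply be dropped. A minimal sketch of a pre-processing step that does this is shown below; the helper name dropUnderscoreEscapes is hypothetical, and the function only understands simple backslash escapes, so it is a workaround to adapt rather than a general pattern translator.

```go
package main

import (
	"fmt"
	"strings"
)

// dropUnderscoreEscapes rewrites \_ to _ while leaving every other escape,
// including an escaped backslash (\\), untouched.
func dropUnderscoreEscapes(pattern string) string {
	var b strings.Builder
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		if c == '\\' && i+1 < len(pattern) {
			next := pattern[i+1]
			if next != '_' {
				b.WriteByte(c) // keep the backslash of any other escape
			}
			b.WriteByte(next)
			i++ // the escaped byte has been consumed
			continue
		}
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	fmt.Println(dropUnderscoreEscapes(`(\.(jpg|png)|\_(tmp|log))`))
	fmt.Println(dropUnderscoreEscapes(`a\\_b`)) // escaped backslash stays as is
}
```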
Since the introduction of fastclock, regexp2 spawns a goroutine with a given timeout.
https://github.com/dlclark/regexp2/blob/master/fastclock.go#L75
This timeout defaults to "forever".
https://github.com/dlclark/regexp2/blob/master/regexp.go#L22-L32
If your unit tests use uber-go/goleak, this goroutine can be reported as a leak.
I am using Chroma, which sets the timeout to 250ms. That is better than never, but it still leaks a goroutine in my quicker tests.
I do not know the solution, but could a way be implemented to make sure this goroutine is stopped when it is no longer needed? Could we store the number of matches that are using the clock and, when they have all gone away, stop the goroutine as soon as possible?
As someone who is new to this repo, I am not 100% sure. It is just a problem we are hitting now in our unit tests.
I am tokenizing some text by matching a set of regexes against the beginning of a string holding the contents of a file. I noticed that regexp2 was extremely slow for this use case, and after running the profiler found that the time was dominated by getRunes().
This is occurring because, before every match, regexp2 converts the entire 22 KB string to a slice of runes. I've worked around the issue by pre-converting the string to a slice of runes myself, then using FindRunesMatch(), but it was quite surprising and non-obvious.
A solution would be to convert runes on the fly (as most matches are under 10 characters, converting the whole string each time is redundant). Looking at the code, it doesn't seem like it would be super painful to achieve. The runner would need to be modified to use DecodeRuneInString to advance the index into the string, rather than a direct index into a slice of runes.
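The on-the-fly decoding described above can be sketched with the standard library's utf8.DecodeRuneInString. The helper firstRunes is hypothetical and stands in for a runner that only needs a short prefix of a long input; it is not taken from regexp2.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes decodes at most n runes from the start of s and returns them
// together with the number of bytes consumed. The rest of s is never touched.
func firstRunes(s string, n int) ([]rune, int) {
	if n < 0 {
		n = 0
	}
	out := make([]rune, 0, n)
	pos := 0
	for len(out) < n && pos < len(s) {
		r, size := utf8.DecodeRuneInString(s[pos:])
		out = append(out, r)
		pos += size
	}
	return out, pos
}

func main() {
	runes, consumed := firstRunes("héllo, wörld", 3)
	fmt.Println(string(runes), consumed)
}
```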
It looks like \d matches ߀ (\u07c0) with regexp2, but not with the standard library regexp.
See the following example:
package main
import (
"fmt"
"regexp"
"github.com/dlclark/regexp2"
)
func main() {
re := regexp.MustCompile(`^\d$`)
re2 := regexp2.MustCompile(`^\d$`, regexp2.RE2)
notZero := "߀" // \u07c0
match := re.MatchString(notZero)
fmt.Printf("regexp: %v\n", match)
match2, _ := re2.MatchString(notZero)
fmt.Printf("regexp2: %v\n", match2)
}
Perhaps this is a known issue, but I'm wondering if there is a way to get additional compatibility with the standard library.
Getting a panic when trying to load a 64bit int on a 32bit machine at line:
Line 103 in af93f4c
The godocs say to instead use Int64.Load.
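A minimal sketch of the suggested approach, assuming Go 1.19 or later: the atomic.Int64 type is guaranteed to be 64-bit aligned even on 32-bit platforms, whereas a plain int64 struct field passed to atomic.LoadInt64 may not be. The clock type and its field names are hypothetical and do not mirror the library's actual struct.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// clock holds a 64-bit value that is read and written atomically.
type clock struct {
	running bool         // placed first on purpose: it would misalign a plain int64 on 32-bit
	current atomic.Int64 // always 64-bit aligned, wherever it sits in the struct
}

func main() {
	var c clock
	c.current.Store(1 << 40)
	fmt.Println(c.current.Load())
}
```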
When compiling using regexp2.ECMAScript, the regexp [a-\s] fails with the following, but it should pass:
error parsing regexp: cannot include class \115 in character range in `[a-\s]`
regexp101 shows how it should be interpreted.
The following test results in an infinite loop.
func TestOverlappingMatch(t *testing.T) {
re := MustCompile(`((?:0*)+?(?:.*)+?)?`, 0)
match, err := re.FindStringMatch("0\xfd")
if err != nil {
t.Fatal(err)
}
for match != nil {
t.Logf("start: %d, length: %d", match.Index, match.Length)
match, err = re.FindNextMatch(match)
if err != nil {
t.Fatal(err)
}
}
}
$ go test -v -run TestOverlappingMatch
=== RUN TestOverlappingMatch
TestOverlappingMatch: regexp_test.go:802: start: 0, length: 2
TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1
TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1
TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1
....
Test case:
func TestReplaceRef(t *testing.T) {
re := MustCompile("(123)hello(789)", None)
res, err := re.Replace("123hello789", "\\1456\\2", -1, -1)
if err != nil {
t.Fatal(err)
}
if res != "123456789" {
t.Fatalf("Wrong result: %s", res)
}
}
Result:
--- FAIL: TestReplaceRef (0.00s)
regexp_test.go:775: Wrong result: \1456\2
This is NOT an issue; I just want to let you know that a new "absent operator" has been implemented in Onigmo, Ruby's regexp library. Sorry if this disturbs you.
Note that the implementation of the operator has a rigorous theoretical background: https://staff.aist.go.jp/tanaka-akira/pub/prosym49-akr-paper.pdf
I recognize that your regexp2 is based upon the .NET Framework and that extending your library like that might not be good in some cases.
Note that I don't mean I need the operator right now.
I just wrote this in case you have any interest in the new operator.
Cheers,
Hi,
I want to find repeated digits in a string.
string: 3331112233
regex: (\d)\1{3}
The result is nil.
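Read literally, (\d)\1{3} asks for one digit followed by three more copies of it, that is, four identical digits in a row, and the sample string has runs of at most three. A pattern with \1{2} would be the three-in-a-row form. The sketch below spells out the same requirement without a regex engine, using only the standard library; firstRun is a hypothetical helper.

```go
package main

import "fmt"

// firstRun returns the first run of n identical consecutive ASCII digits in s,
// or "" if there is none. (\d)\1{3} corresponds to n = 4.
func firstRun(s string, n int) string {
	count := 0
	for i := 0; i < len(s); i++ {
		switch {
		case s[i] < '0' || s[i] > '9':
			count = 0 // not a digit: no run in progress
		case count > 0 && s[i] == s[i-1]:
			count++
		default:
			count = 1
		}
		if n > 0 && count == n {
			return s[i-n+1 : i+1]
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", firstRun("3331112233", 4)) // like (\d)\1{3}
	fmt.Printf("%q\n", firstRun("3331112233", 3)) // like (\d)\1{2}
}
```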
I came across something called categories in regex which I need to use for one of my requirements (https://www.regular-expressions.info/unicode.html). Does this library support categories?
My regex is ^(ac|bb)$\n and I am not using the Multiline option. I expected MustCompile to report an error, but it does not, and the pattern matches the string "ac\n". How can I make it return an error?
package main
import (
"fmt"
"github.com/dlclark/regexp2"
)
func main() {
re, _ := regexp2.Compile(`Deployment`, 0)
fmt.Println(re.MatchString(`D.*`)) // ExpectedOutput: true , ActualOutput: false
fmt.Println(re.MatchString(`D*`)) // ExpectedOutput: true , ActualOutput: false
fmt.Println(re.MatchString(`Dep`)) // ExpectedOutput: true , ActualOutput: false
fmt.Println(re.MatchString(`Deployment`)) // ExpectedOutput: true , ActualOutput: true
}
package main
import (
"fmt"
"github.com/dlclark/regexp2"
"regexp"
)
func main() {
str := `我的邮箱是[email protected]和[email protected]`
reg := `\b(((([*+\-=?^_{|}~\w])|([*+\-=?^_{|}~\w][*+\-=?^_{|}~\.\w]{0,}[*+\-=?^_{|}~\w]))[@]\w+([-.]\w+)*\.[A-Za-z]{2,8}))\b`
re, _ := regexp.Compile(reg)
re2, _ := regexp2.Compile(reg, 0)
fmt.Println(re.FindAllString(str, -1))
result, _ := re2.FindStringMatch(str)
fmt.Println(result.String())
}
The result of regexp is [email protected] [email protected], but the regexp2 result is 163.com和[email protected].
Hello, I was checking it out and it seems to fail a regular expression. For a given text like this one, the expression ((Art\.\s\d+)[\S\s]*?(?=Art\.\s\d+)) fails to match every Art. block in the text. I've tested the expression on this website and there it gives me the correct count of 12 matches.
Am I missing something? Maybe a multiline flag?
Compile method takes in regular expression and RegexOptions. https://godoc.org/github.com/dlclark/regexp2#Compile
I have checked the documentation and I could not find valid values for RegexOptions.
Could someone please provide the values for RegexOptions or point me to the documentation where I can find them?
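Elsewhere on this page the options appear as named constants (regexp2.None, regexp2.Singleline, regexp2.Multiline, regexp2.RE2, regexp2.ECMAScript, regexp2.Compiled) that are combined with the | operator, which suggests a bit-flag type. The sketch below shows how such flags combine and how one can test for a flag. The options type, the opt-prefixed constants and their numeric values are stand-ins invented for illustration; the real values should be read from the library source.

```go
package main

import "fmt"

// options is a hypothetical stand-in for a bit-flag option type.
type options int32

const (
	optNone       options = 0
	optMultiline  options = 1 << 0 // illustrative value, not the library's
	optSingleline options = 1 << 1 // illustrative value, not the library's
	optRE2        options = 1 << 2 // illustrative value, not the library's
)

// has reports whether flag is set in o.
func (o options) has(flag options) bool {
	return o&flag != 0
}

func main() {
	combined := optSingleline | optRE2
	fmt.Println(combined.has(optRE2), combined.has(optMultiline))
}
```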
Hi!
If a pattern contains \_, regexp2 fails to compile it. Example:
_, err := regexp2.Compile(`^/legacy/([\w|\d|\-\_]+)/([\w|\d|\-\_]+)/.*`, 0)
if err != nil {
fmt.Println(err)
}
The error is: error parsing regexp: unrecognized escape sequence \_ in ^/legacy/([\w|\d|\-\_]+)/([\w|\d|\-\_]+)/.*
This pattern works in the regexp package.
thanks!
One more that was fuzzed during the night ;)
package main
import (
"fmt"
"runtime/debug"
"github.com/dlclark/regexp2"
)
var testCases = []struct {
r, s []byte
}{
{
r: []byte{0x30, 0x28, 0x3f, 0x3e, 0x28, 0x29, 0x2b, 0x3f, 0x30, 0x29, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x77},
s: []byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
},
{
r: []byte{0x28, 0x3f, 0x3e, 0x28, 0x3f, 0x3e, 0x29, 0x2b, 0x3f, 0x3e, 0x29, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
s: []byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x3e, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
},
}
func test(r, s []byte) (b bool) {
defer func() {
if r := recover(); r != nil {
fmt.Println(r)
debug.PrintStack()
b = true
}
}()
re, err := regexp2.Compile(string(r), regexp2.ECMAScript)
if err != nil {
return false
}
_, _ = re.FindStringMatch(string(s))
return false
}
func main() {
for _, c := range testCases {
fmt.Printf("Test case regex='%#v', string='%#v' panics\nstring values '%s', '%s'\n",
c.r, c.s, string(c.r), string(c.s),
)
fmt.Println("#############################################################################")
if !test(c.r, c.s) {
fmt.Printf("Test case regex='%#v', string='%#v' DOES NOT panic\nstring values '%s', '%s'\n",
c.r, c.s, string(c.r), string(c.s),
)
}
}
}
panics with
Test case regex='[]byte{0x30, 0x28, 0x3f, 0x3e, 0x28, 0x29, 0x2b, 0x3f, 0x30, 0x29, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x77}', string='[]byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}' panics
string values '0(?>()+?0)00000000w', '0000000000000000000'
#############################################################################
runtime error: index out of range [72] with length 72
goroutine 1 [running]:
runtime/debug.Stack(0x36, 0x0, 0x0)
runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x22
main.test.func1(0xc00015be38)
command-line-arguments/test.go:27 +0x97
panic(0x4f0b40, 0xc0001320e0)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*runner).backtrack(0xc000162000)
github.com/dlclark/regexp2/runner.go:1033 +0x246
github.com/dlclark/regexp2.(*runner).execute(0xc000162000, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:904 +0x9b
github.com/dlclark/regexp2.(*runner).scan(0xc000162000, 0xc0001340a0, 0x13, 0x14, 0x0, 0x0, 0x7fffffffffffffff, 0x13, 0x14, 0x4490be)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc000160080, 0xc00015bd00, 0xffffffffffffffff, 0xc0001340a0, 0x13, 0x14, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
main.test(0x5b9710, 0x13, 0x13, 0x5b9730, 0x13, 0x13, 0x0)
command-line-arguments/test.go:36 +0x168
main.main()
command-line-arguments/test.go:46 +0x355
Test case regex='[]byte{0x28, 0x3f, 0x3e, 0x28, 0x3f, 0x3e, 0x29, 0x2b, 0x3f, 0x3e, 0x29, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}', string='[]byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x3e, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}' panics
string values '(?>(?>)+?>)0000000000', '00000000000000>000000'
#############################################################################
runtime error: index out of range [32] with length 32
goroutine 1 [running]:
runtime/debug.Stack(0x36, 0x0, 0x0)
runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x22
main.test.func1(0xc00015be38)
command-line-arguments/test.go:27 +0x97
panic(0x4f0b40, 0xc000132160)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*runner).popcrawl(...)
github.com/dlclark/regexp2/runner.go:938
github.com/dlclark/regexp2.(*runner).uncapture(...)
github.com/dlclark/regexp2/runner.go:1467
github.com/dlclark/regexp2.(*runner).execute(0xc000162100, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:507 +0x408c
github.com/dlclark/regexp2.(*runner).scan(0xc000162100, 0xc000100120, 0x15, 0x18, 0x0, 0x0, 0x7fffffffffffffff, 0x15, 0x18, 0x4490be)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc000160180, 0xc00015bd00, 0xffffffffffffffff, 0xc000100120, 0x15, 0x18, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
main.test(0x5b9750, 0x15, 0x15, 0x5b9770, 0x15, 0x15, 0x0)
command-line-arguments/test.go:36 +0x168
main.main()
command-line-arguments/test.go:46 +0x355
package parse
import (
"fmt"
"github.com/dlclark/regexp2"
"testing"
)
func TestJsonRe2(t *testing.T) {
text := `{
"code" : "0",
"message" : "success",
"responseTime" : 2,
"traceId" : "a469b12c7d7aaca5",
"returnCode" : null,
"result" : {
"total" : 0,
"list" : [ ]
}
}`
reg := `/(\{(?:(?>[^{}"'\/]+)|(?>"(?:(?>[^\\"]+)|\\.)*")|(?>'(?:(?>[^\\']+)|\\.)*')|(?>\/\/.*\n)|(?>\/\*.*?\*\/)|(?-1))*\})/`
r, err := regexp2.Compile(reg, regexp2.RE2|regexp2.Multiline|regexp2.ECMAScript)
if err != nil {
fmt.Println(err)
return
}
matchedStrings, err := r.FindStringMatch(text)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(matchedStrings)
}
output:
error parsing regexp: unrecognized grouping construct: (?-1 in `/(\{(?:(?>[^{}"'\/]+)|(?>"(?:(?>[^\\"]+)|\\.)*")|(?>'(?:(?>[^\\']+)|\\.)*')|(?>\/\/.*\n)|(?>\/\*.*?\*\/)|(?-1))*\})/`
However, the same pattern works on https://regex101.com/.
Hello,
I'd just like to ask whether you have any plans to add bulk replace functions to regexp2, like those in the Go standard regexp package?
https://golang.org/pkg/regexp/#Regexp.ReplaceAll
func (re *Regexp) ReplaceAll(src, repl []byte) []byte
func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte
func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte
func (re *Regexp) ReplaceAllLiteralString(src, repl string) string
func (re *Regexp) ReplaceAllString(src, repl string) string
func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string
Thank you,
I want to get all the matched strings, for example returned as a []string. How could I do that?
In RE2 compatibility mode, regexp2 supports Python-style named capture groups (e.g. (?P<name>re)). But there doesn't appear to be support for Python-style named backreferences (e.g. (?P=name)).
Do you have any plans to support those? More info here. Thanks!
Extract from https://pkg.go.dev/regexp/syntax
\pN Unicode character class (one-letter name)
\p{Greek} Unicode character class
\PN negated Unicode character class (one-letter name)
\P{Greek} negated Unicode character class
I imagine this could be implemented in RE2 compat mode.
As part of an effort that includes packaging your library for Debian, I'm wondering if it would be possible to have more details about which particular files are covered by each original license?
In particular, could you provide some more details regarding these comments in ATTRIB:
Some of this code is ported from dotnet/corefx, which was released under this license:
...
Small pieces of code are copied from the Go framework under this license:
...
I am aware it might be a bit difficult to retrieve that history, but any insight would be much appreciated in the hopes of making sure licenses and copyright are attributed as faithfully as possible. Thanks in advance!
re := regexp2.MustCompile(`(?m)^.*(?!/bin/bash)$`,0)
match,_ := re.FindStringMatch(string(passwd))
I'm trying to match all the lines except the ones containing /bin/bash, but the result is just the first line of /etc/passwd, which contains /bin/bash.
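In the pattern as written, .* first runs to the end of the line and (?!/bin/bash) is only tested there, where nothing follows, so the lookahead always succeeds. FindStringMatch also returns only the first match. The usual lookahead form would be something like (?m)^(?!.*/bin/bash).*$, which is an untested suggestion here. If a regex is not required, the same filtering can be done with the standard library alone, as in the sketch below; linesWithout and the sample passwd data are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// linesWithout returns the lines of text that do not contain needle.
func linesWithout(text, needle string) []string {
	var kept []string
	for _, line := range strings.Split(strings.TrimRight(text, "\n"), "\n") {
		if !strings.Contains(line, needle) {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	// Made-up sample in /etc/passwd format.
	passwd := "root:x:0:0:root:/root:/bin/bash\n" +
		"daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n" +
		"alice:x:1000:1000::/home/alice:/bin/bash\n"
	for _, line := range linesWithout(passwd, "/bin/bash") {
		fmt.Println(line)
	}
}
```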
This is again from fuzzing:
package main
import (
"fmt"
"runtime/debug"
"github.com/dlclark/regexp2"
)
var testCases = []struct {
r, s []byte
}{
{
r: []byte{0x30, 0xbf, 0x30, 0x2a, 0x30, 0x30},
s: []byte{0xf0, 0xb0, 0x80, 0x91, 0xf7},
},
{
s: []byte{0xf3, 0x80, 0x80, 0x87, 0x80, 0x89},
r: []byte{0x30, 0xaf, 0xf3, 0x30, 0x2a},
},
}
func test(r, s []byte) (b bool) {
defer func() {
if r := recover(); r != nil {
fmt.Println(r)
debug.PrintStack()
b = true
}
}()
re, err := regexp2.Compile(string(r), regexp2.ECMAScript)
if err != nil {
return false
}
_, _ = re.FindStringMatch(string(s))
return false
}
func main() {
for _, c := range testCases {
fmt.Printf("Test case regex='%#v', string='%#v' panics\nstring values '%s', '%s'\n",
c.r, c.s, string(c.r), string(c.s),
)
fmt.Println("#############################################################################")
if !test(c.r, c.s) {
fmt.Printf("Test case regex='%#v', string='%#v' DOES NOT panic\nstring values '%s', '%s'\n",
c.r, c.s, string(c.r), string(c.s),
)
}
}
}
will get you
Test case regex='[]byte{0x30, 0xbf, 0x30, 0x2a, 0x30, 0x30}', string='[]byte{0xf0, 0xb0, 0x80, 0x91, 0xf7}' panics
string values '00*00', '𰀑'
#############################################################################
runtime error: index out of range [196625] with length 128
goroutine 1 [running]:
runtime/debug.Stack(0x3b, 0x0, 0x0)
/home/mstoykov/.gvm/gos/go1.14.9/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
/home/mstoykov/.gvm/gos/go1.14.9/src/runtime/debug/stack.go:16 +0x22
main.test.func1(0xc000113e38)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/mstoykov/goja-regexp2-fuzzing/crashers/test.go:27 +0x97
panic(0x4f0ac0, 0xc0001420a0)
/home/mstoykov/.gvm/gos/go1.14.9/src/runtime/panic.go:969 +0x166
github.com/dlclark/regexp2/syntax.(*BmPrefix).Scan(0xc0001602a0, 0xc000136078, 0x2, 0x2, 0x0, 0x0, 0x2, 0x7f9a4a2befb8)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/syntax/prefix.go:716 +0x3be
github.com/dlclark/regexp2.(*runner).findFirstChar(0xc000170000, 0xc000170000)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/runner.go:1305 +0x4d3
github.com/dlclark/regexp2.(*runner).scan(0xc000170000, 0xc000136078, 0x2, 0x2, 0x0, 0xc000113d00, 0x7fffffffffffffff, 0x4, 0xfffd, 0x5)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/runner.go:130 +0x128
github.com/dlclark/regexp2.(*Regexp).run(0xc00016e080, 0xc000113d00, 0xffffffffffffffff, 0xc000136078, 0x2, 0x2, 0x0, 0x0, 0x0)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/regexp.go:159
main.test(0x5ba04c, 0x6, 0x6, 0x5ba034, 0x5, 0x5, 0x0)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/mstoykov/goja-regexp2-fuzzing/crashers/test.go:36 +0x168
main.main()
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/mstoykov/goja-regexp2-fuzzing/crashers/test.go:46 +0x355
Test case regex='[]byte{0x30, 0xaf, 0xf3, 0x30, 0x2a}', string='[]byte{0xf3, 0x80, 0x80, 0x87, 0x80, 0x89}' panics
string values '00*', ''
#############################################################################
runtime error: index out of range [786439] with length 128
goroutine 1 [running]:
runtime/debug.Stack(0x3b, 0x0, 0x0)
/home/mstoykov/.gvm/gos/go1.14.9/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
/home/mstoykov/.gvm/gos/go1.14.9/src/runtime/debug/stack.go:16 +0x22
main.test.func1(0xc000113e38)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/mstoykov/goja-regexp2-fuzzing/crashers/test.go:27 +0x97
panic(0x4f0ac0, 0xc000142100)
/home/mstoykov/.gvm/gos/go1.14.9/src/runtime/panic.go:969 +0x166
github.com/dlclark/regexp2/syntax.(*BmPrefix).Scan(0xc000160540, 0xc0001360d0, 0x3, 0x4, 0x0, 0x0, 0x3, 0xc00016e100)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/syntax/prefix.go:716 +0x3be
github.com/dlclark/regexp2.(*runner).findFirstChar(0xc000170100, 0xc000170100)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/runner.go:1305 +0x4d3
github.com/dlclark/regexp2.(*runner).scan(0xc000170100, 0xc0001360d0, 0x3, 0x4, 0x0, 0xc000113d00, 0x7fffffffffffffff, 0x5, 0xfffd, 0x6)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/runner.go:130 +0x128
github.com/dlclark/regexp2.(*Regexp).run(0xc00016e180, 0xc000113d00, 0xffffffffffffffff, 0xc0001360d0, 0x3, 0x4, 0x0, 0x0, 0x0)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/dlclark/regexp2/regexp.go:159
main.test(0x5ba03c, 0x5, 0x5, 0x5ba054, 0x6, 0x6, 0x0)
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/mstoykov/goja-regexp2-fuzzing/crashers/test.go:36 +0x168
main.main()
/home/mstoykov/.gvm/pkgsets/go1.14.9/global/src/github.com/mstoykov/goja-regexp2-fuzzing/crashers/test.go:46 +0x355
I have more test cases but these ones were the shortest and just as readable :(
The code that caused the error:
Why nil ?
This should be the right result:
Sample code:
package main
import (
"fmt"
"github.com/dlclark/regexp2"
)
func main() {
r, err := regexp2.Compile(`(?<=1234\.\*56).*(?=890)`, regexp2.Compiled)
if err != nil {
panic(err)
}
m, err := r.FindStringMatch(`1234.*567890`)
if err != nil {
panic(err)
}
fmt.Println(m)
}
Hello dlclark,
I found one more issue, which is relevant to #5 .
I checked this with the brand-new Go 1.8, but I guess the version of the Go does not affect the issue.
Condition:
package main
import (
"github.com/dlclark/regexp2"
"github.com/k0kubun/pp"
)
func main() {
str := "猟な" // factor 1: the kanji + the Hiragana
re := regexp2.MustCompile(str, 0)
result, _ := re.ReplaceFunc(
"🍺な" + // factor 2: 4byte emoji + the same Hiragana as above
"なあ🍺な", // factor 3: the same Hiragana does not surround the 4byte emoji; if you remove "あ" from this, it works fine
func(m regexp2.Match) string {
return "࿗" + "1" + "࿘" + string(m.Capture.Runes()) + "࿌"
}, -1, -1)
pp.Println(result)
}
package main
import (
"github.com/dlclark/regexp2"
"github.com/k0kubun/pp"
)
func main() {
str := "猟な" // works fine with the kanji + trailing Japanese char ("な" in the case)
re := regexp2.MustCompile(str, 0)
result, _ := re.ReplaceFunc(
"な📍な"+ // works fine if the same "な" surrounds the 4byte emoji
"な✔️な"+
"な😏な"+
"な⚾️な"+
"な📣な"+
"な🍣な"+
"な🍺な"+
"な📍✔️😏⚾️📣🍣🍺な", func(m regexp2.Match) string {
return "࿗" + "1" + "࿘" + string(m.Capture.Runes()) + "࿌"
}, -1, -1)
pp.Println(result)
}
Best regards, 🙇
I'm trying to use this library to get all the named captured groups to a map[string]string.
This is my code:
caps := make(map[string]string)
re, err := regexp2.Compile(pattern, regexp2.RE2)
if err != nil {
panic(err)
}
names := re.GetGroupNames()
mat, err := re.FindStringMatch(text)
if err != nil {
panic(err)
}
if mat != nil {
gps := mat.Groups()
for i, value := range names {
if value != strconv.Itoa(i) {
if len(gps[i].Captures) > 0 {
caps[value] = gps[i].Captures[0].String()
}
}
}
fmt.Println(caps)
}
Is this the best way to do it in terms of performance?
First it calls FindStringMatch(), then it calls Groups(), and finally it runs a for loop. That seems like a few too many steps. :D
Does this library support the xeger functionality? For example, I have the following regex, which is not supported by the standard regexp library.
(?!((?!a(b|c)z)|(?!a(c|d)z)))
I need to do something like
r, _ := regexp2.Compile(`(?!((?!a(b|c)z)|(?!a(c|d)z)))`, 0)
s, _ := r.GenerateMatchingString()
I need something like this that gives me a string that matches the regex, if any exists, for example:
acz
Is this functionality already implemented? I believe Fare has this feature. (https://github.com/moodmosaic/Fare/blob/master/Src/Fare/Xeger.cs)
Could that code be used to add this feature? I am willing to contribute and add this feature if it is welcome.
Hi,
Thank you for the library. I needed negative lookbehinds and was disappointed to find them not supported in the standard Go regexp package.
In the course of converting some code over to use your package, I had to modify some of the regexes to use Perl character classes instead of the ASCII classes defined here: https://github.com/google/re2/wiki/Syntax
Example: https://play.golang.org/p/MlCaJtyvQ7q
Copied below as well:
re := regexp.MustCompile(`^[[:digit:]]+$`)
if isMatch := re.MatchString(`12345667890`); isMatch {
fmt.Println("Matched regexp")
} else {
fmt.Println("No Match regexp")
}
re2 := regexp2.MustCompile(`^[[:digit:]]+$`, 0)
if isMatch, _ := re2.MatchString(`12345667890`); isMatch {
fmt.Println("Matched regexp2")
} else {
fmt.Println("No Match regexp2")
}
Output:
Matched regexp
No Match regexp2
It'd be nice to support these larger character classes as well to keep compatibility with the standard library's regexp package.
Hello! First, thanks for this great library - this is an impressive feat!
I needed an equivalent of https://golang.org/pkg/regexp/#Regexp.FindAllString, which ideally would be a part of this library but unfortunately doesn't exist today. I took a stab at implementing it (without the n parameter):
func regexp2FindAllString(re *regexp2.Regexp, s string) []string {
var matches []string
for {
match, _ := re.FindStringMatch(s)
if match == nil {
break
} else {
matches = append(matches, match.String())
s = s[match.Index+match.Length:]
}
}
return matches
}
At first glance, this seemed correct and appeared to work. However, I realized that it is in fact incompatible with Unicode, because match.Length appears to report length in runes, not bytes. I'm not sure whether Capture.Index reports bytes or runes either, and the docs don't define this:
// the position in the original string where the first character of
// captured substring was found.
Index int
// the length of the captured substring.
Length int
From testing, it appears that Capture.Index oddly is in bytes and not runes. A corrected implementation is:
func regexp2FindAllString(re *regexp2.Regexp, s string) []string {
var matches []string
for {
match, _ := re.FindStringMatch(s)
if match == nil {
break
} else {
matches = append(matches, match.String())
- s = s[match.Index+match.Length:]
+ s = s[match.Index+len(match.String()):]
}
}
return matches
}
This brings me to my points of feedback:
Index in bytes and Length in runes is an odd inconsistency; I imagine they should be the same.
FindAllString implementation
Thanks again for the great library!
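The report above is unsure which unit Index and Length use. Whichever it turns out to be, a Go string can only be sliced by byte offsets, so any position counted in runes has to be converted first. The sketch below shows that conversion with the standard library only; byteOffset is a hypothetical helper and the start and length values are illustrative, not taken from a real match.

```go
package main

import "fmt"

// byteOffset converts a position counted in runes into a byte offset into s.
// It returns len(s) when runeIndex is at or past the end of the string.
func byteOffset(s string, runeIndex int) int {
	seen := 0
	for i := range s { // ranging over a string yields the byte index of each rune
		if seen == runeIndex {
			return i
		}
		seen++
	}
	return len(s)
}

func main() {
	s := "héllo wörld"
	start, length := 1, 4 // rune-based position and length (illustrative values)
	from := byteOffset(s, start)
	to := byteOffset(s, start+length)
	fmt.Println(s[from:to])
}
```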
I wonder if regexp2 could provide the same API as the standard regexp package, so that I could switch my dependency between regexp2 and regexp easily, changing only the expression text.
This is not an issue, I hope that's OK.
Someone pointed out regexp2 to me and I had to test it with one of my test cases, and it worked beautifully!
https://github.com/kobi/RecreationalRegex/blob/master/Go/RegexMaze.go
Thanks!
1.31mins 54.98% 54.98% 1.31mins 55.00% github.com/dlclark/regexp2/syntax.CharSet.CharIn
0.25mins 10.34% 65.32% 0.25mins 10.36% github.com/dlclark/regexp2.(*runner).forwardcharnext
0.19mins 8.09% 73.41% 1.75mins 73.45% github.com/dlclark/regexp2.(*runner).findFirstChar
Have you done any basic benchmarks?
I'm trying to force a timeout as part of my unit testing. Unfortunately, the expression gets evaluated too quickly and never times out. Roughly, my code looks like:
https://play.golang.com/p/fuXQh3RdyuO
package main
import (
"github.com/dlclark/regexp2"
"testing"
"time"
)
var regex = regexp2.MustCompile(`\d{4}-\d{2}-\d{2}`, regexp2.None)
func init() {
regex.MatchTimeout = 1 * time.Second
}
func StringMatches(input string) (bool, error) {
return regex.MatchString(input)
}
func TestLastIndex(t *testing.T) {
originalTimeout := regex.MatchTimeout
regex.MatchTimeout = -1 * time.Nanosecond
result, err := StringMatches("2023-03-28")
if result == true {
t.Error("expected match false due to timeout")
}
if err == nil {
t.Error("expected timeout error")
}
regex.MatchTimeout = originalTimeout
}
The only major difference being that my regular expression is a more complicated date time string matcher with named groups.
Is there a way to force the evaluator to time out for testing? Otherwise, I'm not sure how I can cover the error case of MatchString/FindStringMatch.
Hello, it's been a long time.
Today I found an issue regarding some special "4byte" emojis in ReplaceFunc().
You can inspect the above with http://r12a.github.io/apps/conversion/ like the following:
Please take a look at the following: You can reproduce the issue by uncommenting the str
assignment lines one by one.
As far as I have checked, ReplaceFunc() panics under the following condition:
package main
import (
"github.com/dlclark/regexp2"
"github.com/k0kubun/pp"
)
func main() {
str := "高" // panic: Japanese Kanji
// str := "は" // panic: Japanese Hiragana
// str := "パ" // panic: Japanese Katakana
// str := "[a-zA-Z0-9]{,2}" // works fine: ASCII character class
// str := "峰起|烽起" // works fine: Kanji alternation (I wonder why)
// str := "フトレス" // panic: longer Japanese Katakana
// str := "ALLWAYS|Allways|allways|AllWays" // works fine: Alphabet
// str := "📍" // works fine: 4byte emoji
// str := "📍📍" // works fine: continuous 4byte emoji
// str := "✔️" // panic: 3byte emoji
// str := "✔️✔️" // panic: continuous 3byte emoji
// str := "📍️✔️" // works fine: 4 and 3byte emoji
// str := "️✔📍️" // works fine: 3 and 4byte emoji
// str := "📍️は️" // works fine: 4byte emoji and Hiragana
// str := "️は📍️" // works fine: Hiragana and 4byte emoji
re := regexp2.MustCompile(str, 0)
result, _ := re.ReplaceFunc("📍✔️😏⚾️📣🍣🍺🍺 <- continuous 4byte emoji 寿司ビール文字あり", func(m regexp2.Match) string {
return "࿗" + "࿘" + string(m.Capture.Runes()) + "࿌"
}, -1, -1)
pp.Println(result)
}
The following is a kind of control group that works fine. The key is that the target contains no "continuous 4byte emojis".
package main
import (
"github.com/dlclark/regexp2"
"github.com/k0kubun/pp"
)
func main() {
// All of the following patterns work fine, perhaps because "✔✔⚾⚾️ <- 3byte emoji 寿司ビール文字なし" contains no continuous 4byte emojis. You can check them by uncommenting them one by one.
str := "高"
// str := "は"
// str := "パ"
// str := "[a-zA-Z0-9]{,2}"
// str := "峰起|烽起"
// str := "フトレス"
// str := "ALLWAYS|Allways|allways|AllWays"
// str := "📍"
// str := "📍📍"
// str := "✔️"
// str := "✔️✔️"
// str := "📍️✔️"
// str := "️✔📍️"
// str := "📍️は️"
// str := "️は📍️"
re := regexp2.MustCompile(str, 0)
// The following target works fine: there's no continuous 4byte emojis
result, _ := re.ReplaceFunc("✔✔⚾⚾️ <- 3byte emoji 寿司ビール文字なし", func(m regexp2.Match) string {
return "࿗" + "࿘" + string(m.Capture.Runes()) + "࿌"
}, -1, -1)
pp.Println(result)
}
The issue looks a little bit similar to "sushi-beer" issue: https://gist.github.com/kamipo/37576ce436c564d8cc28
I hope you'd check and fix it.
Best regards, 🙇
Run the following example (https://go.dev/play/p/BDU6yN5NvEZ):
package main
import (
"log"
"regexp"
"time"
"github.com/dlclark/regexp2"
)
func main() {
url := "https://www.dhgate.com/product/magnetic-liquid-eyeliner-magnetic-false-eyelashes/481362313.html"
reg1 := regexp.MustCompile(`dhgate(?:.[a-z]+)+\/product\/`)
log.Println("start regexp match string...")
begin := time.Now()
reg1.MatchString(url)
log.Println("time taken:", time.Since(begin))
reg2 := regexp2.MustCompile(`dhgate(?:.[a-z]+)+\/product\/`, regexp2.IgnoreCase)
log.Println("start regexp2 match string...")
begin = time.Now()
reg2.MatchString(url)
log.Println("time taken:", time.Since(begin))
}
output:
2021/12/08 14:16:30 start regexp match string...
2021/12/08 14:16:30 time taken: 21.583µs
2021/12/08 14:16:30 start regexp2 match string...
regexp2 version is v1.4.0
Hope it helps to improve performance.
When trying to match (phrase.MatchString(X)) messages like gg (notice that these are not the regular spaces) against a phrase like regexp2.MustCompile("\\bcool (house)\\b", 0), the following error will be thrown:
panic: runtime error: index out of range [917504] with length 128
goroutine 1 [running]:
github.com/dlclark/regexp2/syntax.(*BmPrefix).Scan(0xc000180540, {0xc000b70948, 0x6, 0x0?}, 0x0?, 0x0, 0x6)
C:/Users/X/go/pkg/mod/github.com/dlclark/[email protected]/syntax/prefix.go:716 +0x3bb
github.com/dlclark/regexp2.(*runner).findFirstChar(0xc000623a00)
C:/Users/X/go/pkg/mod/github.com/dlclark/[email protected]/runner.go:1305 +0x366
github.com/dlclark/regexp2.(*runner).scan(0xc000623a00, {0xc000b70948?, 0x6, 0xc000b70948?}, 0x6?, 0x1, 0xc00008f8e8?)
C:/Users/X/go/pkg/mod/github.com/dlclark/[email protected]/runner.go:130 +0x1e5
github.com/dlclark/regexp2.(*Regexp).run(0xc0000f6200, 0xf4?, 0xffffffffffffffff, {0xc000b70948, 0x6, 0x6})
C:/Users/X/go/pkg/mod/github.com/dlclark/[email protected]/runner.go:91 +0xfa
github.com/dlclark/regexp2.(*Regexp).MatchString(0x10f9c40?, {0x108f0f4?, 0xc00008fb48?})
C:/Users/X/go/pkg/mod/github.com/dlclark/[email protected]/regexp.go:213 +0x45
main.main()
C:/Users/X/Desktop/GoRegExTests/test.go:127 +0xbdc
The error is only being thrown when:
a. The message contains those unicode characters
b. The RegExp contains a space and a group like (house)
The RegExp above is just a very basic example to demonstrate this problem.
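To make invisible characters in a failing message visible, it can help to dump each rune's code point. The sketch below uses only the standard library, and `describeRunes` is a hypothetical helper. As an observation rather than a diagnosis: the index in the panic above, 917504, equals 0xE0000, the start of the Unicode Tags block, so the input plausibly contains a character from that range. The sample input in `main` is an assumption made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// describeRunes lists every rune of s in "U+XXXX" form, separated by spaces,
// so that invisible or look-alike characters become visible.
func describeRunes(s string) string {
	parts := make([]string, 0, len(s))
	for _, r := range s {
		parts = append(parts, fmt.Sprintf("%U", r))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Hypothetical input: two letters separated by the code point U+E0000.
	msg := "g\U000E0000g"
	fmt.Println(describeRunes(msg))
	// Decimal value of that code point, for comparison with the panic message.
	fmt.Println(0xE0000)
}
```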
Say I have a regex to tokenize some language..
# in python.
regex = re.compile(
"(?P<comment>#.*?$)|"
"(?P<newline>\n)|" # has to go ahead of the whitespace
"(?P<comma>,)|"
"(?P<double_quote_string>\".*?\")|"
"(?P<single_quote_string>'.*?')|"
"(?P<whitespace>[ \t\r\f\v]+)|" ... etc
Here you expect to get multiple matches for each group name when tokenizing a file and you want to keep the ordering of the tokens.
If I use the same approach using regexp2 can I go from match to group name? E.g. how do I get the last matched group name for a match? Is that possible?
Could you add more examples to the README? There's not a single runnable example of FindStringMatch, or FindNextMatch. I'm trying to use FindStringMatch to capture two capture groups in the below regexp, but the second one doesn't exist. Some more complex examples (find all matches for regexp, extract several capture groups from a match, regexps with lookaheads) on the README would be helpful for debugging. It looks like a really useful library (since it has support for lookahead expressions!) but I'm having a lot of trouble using it due to the lack of documentation.
package main
import (
"fmt"
"github.com/dlclark/regexp2"
)
func main() {
re := regexp2.MustCompile(`(\b\w+)=(.*?(?=\s\w+=|$))`, 0)
s := `timestamp=05/Dec/2018:14:39:41 -0500 foo=bar`
if matches, _ := re.FindStringMatch(s); matches != nil {
fmt.Printf("Group 0: %v\n", matches.String())
gps := matches.Groups()
fmt.Println(gps[1].Captures[0].String())
fmt.Println(gps[0].Captures[1].String()) //why is this capture group nil?
}
}
Here is a small script reproducing a panic that I found while fuzzing:
Notes: []byte is used mostly because it makes the copying between output and program easier.
package main
import (
"fmt"
"runtime/debug"
"github.com/dlclark/regexp2"
)
var testCases = []struct {
r, s []byte
}{
{
s: []byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
r: []byte{0x28, 0x28, 0x29, 0x5c, 0x37, 0x28, 0x3f, 0x28, 0x29, 0x29},
},
{
r: []byte{0x28, 0x5c, 0x32, 0x28, 0x3f, 0x28, 0x30, 0x29, 0x29},
s: []byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
},
{
r: []byte{0x28, 0x3f, 0x28, 0x29, 0x29, 0x5c, 0x31, 0x30, 0x28, 0x3f, 0x28, 0x30, 0x29},
s: []byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
},
{
r: []byte{0x28, 0x29, 0x28, 0x28, 0x29, 0x5c, 0x37, 0x28, 0x3f, 0x28, 0x29, 0x29},
s: []byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30},
},
}
func test(r, s []byte) (b bool) {
defer func() {
if r := recover(); r != nil {
fmt.Println(r)
debug.PrintStack()
b = true
}
}()
re, err := regexp2.Compile(string(r), regexp2.ECMAScript|regexp2.Multiline)
if err != nil {
return false
}
_, _ = re.FindStringMatch(string(s))
return false
}
func main() {
for _, c := range testCases {
fmt.Println("#############################################################################")
if test(c.r, c.s) {
fmt.Printf("Test case regex='%#v', string='%#v' panics\nstring values '%s', '%s'\n",
c.r, c.s, string(c.r), string(c.s),
)
} else {
fmt.Printf("Test case regex='%#v', string='%#v' DOES NOT panic\nstring values '%s', '%s'\n",
c.r, c.s, string(c.r), string(c.s),
)
}
}
}
Output is
#############################################################################
runtime error: index out of range [3] with length 3
goroutine 1 [running]:
runtime/debug.Stack(0x34, 0x0, 0x0)
runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x22
main.test.func1(0xc000083e38)
command-line-arguments/test.go:36 +0x97
panic(0x4f0c20, 0xc0000162a0)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*Match).addMatch(0xc0000d6000, 0x3, 0x1, 0x0)
github.com/dlclark/regexp2/match.go:170 +0x31c
github.com/dlclark/regexp2.(*runner).capture(0xc0000d4000, 0x3, 0x1, 0x1)
github.com/dlclark/regexp2/runner.go:1420 +0x9e
github.com/dlclark/regexp2.(*runner).execute(0xc0000d4000, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:254 +0x276e
github.com/dlclark/regexp2.(*runner).scan(0xc0000d4000, 0xc000018150, 0x9, 0xc, 0x0, 0x0, 0x7fffffffffffffff, 0x9, 0xc, 0x4490be)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc0000d2080, 0xc000083d00, 0xffffffffffffffff, 0xc000018150, 0x9, 0xc, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
main.test(0x5b91b8, 0xa, 0xa, 0x5b9188, 0x9, 0x9, 0x0)
command-line-arguments/test.go:45 +0x168
main.main()
command-line-arguments/test.go:52 +0x174
Test case regex='[]byte{0x28, 0x28, 0x29, 0x5c, 0x37, 0x28, 0x3f, 0x28, 0x29, 0x29}', string='[]byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}' panics
string values '(()\7(?())', '000000000'
#############################################################################
runtime error: index out of range [2] with length 2
goroutine 1 [running]:
runtime/debug.Stack(0x34, 0x0, 0x0)
runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x22
main.test.func1(0xc000083e38)
command-line-arguments/test.go:36 +0x97
panic(0x4f0c20, 0xc0000162c0)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*Match).addMatch(0xc0000d60e0, 0x2, 0x0, 0x1)
github.com/dlclark/regexp2/match.go:170 +0x31c
github.com/dlclark/regexp2.(*runner).capture(0xc0000d4100, 0x2, 0x0, 0x1)
github.com/dlclark/regexp2/runner.go:1420 +0x9e
github.com/dlclark/regexp2.(*runner).execute(0xc0000d4100, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:254 +0x276e
github.com/dlclark/regexp2.(*runner).scan(0xc0000d4100, 0xc0000181b0, 0x9, 0xc, 0x0, 0x0, 0x7fffffffffffffff, 0x9, 0xc, 0x4490be)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc0000d2180, 0xc000083d00, 0xffffffffffffffff, 0xc0000181b0, 0x9, 0xc, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
main.test(0x5b9198, 0x9, 0x9, 0x5b91a8, 0x9, 0x9, 0x0)
command-line-arguments/test.go:45 +0x168
main.main()
command-line-arguments/test.go:52 +0x174
Test case regex='[]byte{0x28, 0x5c, 0x32, 0x28, 0x3f, 0x28, 0x30, 0x29, 0x29}', string='[]byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}' panics
string values '(\2(?(0))', '000000000'
#############################################################################
runtime error: index out of range [1] with length 1
goroutine 1 [running]:
runtime/debug.Stack(0x34, 0x0, 0x0)
runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x22
main.test.func1(0xc000083e38)
command-line-arguments/test.go:36 +0x97
panic(0x4f0c20, 0xc0000162e0)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*Match).addMatch(0xc0000d61c0, 0x1, 0x0, 0x1)
github.com/dlclark/regexp2/match.go:170 +0x31c
github.com/dlclark/regexp2.(*runner).capture(0xc0000d4200, 0x1, 0x0, 0x1)
github.com/dlclark/regexp2/runner.go:1420 +0x9e
github.com/dlclark/regexp2.(*runner).execute(0xc0000d4200, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:254 +0x276e
github.com/dlclark/regexp2.(*runner).scan(0xc0000d4200, 0xc0000281c0, 0xd, 0x10, 0x0, 0x0, 0x7fffffffffffffff, 0xd, 0x10, 0x4490be)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc0000d2280, 0xc000083d00, 0xffffffffffffffff, 0xc0000281c0, 0xd, 0x10, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
main.test(0x5b9588, 0xd, 0xd, 0x5b9598, 0xd, 0xd, 0x0)
command-line-arguments/test.go:45 +0x168
main.main()
command-line-arguments/test.go:52 +0x174
Test case regex='[]byte{0x28, 0x3f, 0x28, 0x29, 0x29, 0x5c, 0x31, 0x30, 0x28, 0x3f, 0x28, 0x30, 0x29}', string='[]byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}' panics
string values '(?())\10(?(0)', '0000000000000'
#############################################################################
runtime error: index out of range [4] with length 4
goroutine 1 [running]:
runtime/debug.Stack(0x34, 0x0, 0x0)
runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x22
main.test.func1(0xc000083e38)
command-line-arguments/test.go:36 +0x97
panic(0x4f0c20, 0xc000016320)
runtime/panic.go:969 +0x166
github.com/dlclark/regexp2.(*Match).addMatch(0xc0000d62a0, 0x4, 0x1, 0x0)
github.com/dlclark/regexp2/match.go:170 +0x31c
github.com/dlclark/regexp2.(*runner).capture(0xc0000d4300, 0x4, 0x1, 0x1)
github.com/dlclark/regexp2/runner.go:1420 +0x9e
github.com/dlclark/regexp2.(*runner).execute(0xc0000d4300, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:254 +0x276e
github.com/dlclark/regexp2.(*runner).scan(0xc0000d4300, 0xc000018210, 0xc, 0xc, 0x0, 0x0, 0x7fffffffffffffff, 0xc, 0xc, 0x4490be)
github.com/dlclark/regexp2/runner.go:144 +0x1c3
github.com/dlclark/regexp2.(*Regexp).run(0xc0000d2380, 0xc000083d00, 0xffffffffffffffff, 0xc000018210, 0xc, 0xc, 0x0, 0x0, 0x0)
github.com/dlclark/regexp2/runner.go:91 +0xf0
github.com/dlclark/regexp2.(*Regexp).FindStringMatch(...)
github.com/dlclark/regexp2/regexp.go:159
main.test(0x5b91c8, 0xc, 0xc, 0x5b91d8, 0xc, 0xc, 0x0)
command-line-arguments/test.go:45 +0x168
main.main()
command-line-arguments/test.go:52 +0x174
Test case regex='[]byte{0x28, 0x29, 0x28, 0x28, 0x29, 0x5c, 0x37, 0x28, 0x3f, 0x28, 0x29, 0x29}', string='[]byte{0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30}' panics
string values '()(()\7(?())', '000000000000'
The following code fails:
package main
import (
"fmt"
"github.com/dlclark/regexp2"
)
func main() {
regex := regexp2.MustCompile("<style", regexp2.IgnoreCase|regexp2.Singleline)
match, err := regex.FindStringMatch(sample)
if err != nil {
panic(err)
}
if match != nil {
t, err := regex.Replace(sample, "xxx", match.Index, -1)
if err != nil {
panic(err)
}
fmt.Printf("%s", t)
}
}
var sample = "<title>错<style"
If I search for some words/regex successfully and then replace starting from match.Index instead of -1, the code fails.
However, if I remove the Chinese character 错, the code succeeds.
So, in such a scenario, what should the beginning index be if I want to replace all but don't want to replace from -1 (the beginning)?
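If, as appears to be the case here, match.Index counts runes while the startAt argument is interpreted as a byte offset into the string, the two values diverge as soon as a multi-byte character precedes the match. That is an assumption about the library's behaviour, not a confirmed fact. A minimal standard-library sketch of the conversion, where `runeIndexToByteOffset` is a hypothetical helper and not part of regexp2:

```go
package main

import "fmt"

// runeIndexToByteOffset returns the byte offset in s of the rune at position
// runeIdx. A runeIdx equal to the number of runes yields len(s); anything
// out of range yields -1.
func runeIndexToByteOffset(s string, runeIdx int) int {
	n := 0
	// Ranging over a string yields the byte offset at which each rune starts.
	for byteOff := range s {
		if n == runeIdx {
			return byteOff
		}
		n++
	}
	if n == runeIdx {
		return len(s)
	}
	return -1
}

func main() {
	sample := "<title>错<style"
	// "<style" starts at rune 8; 错 occupies three bytes in UTF-8.
	fmt.Println(runeIndexToByteOffset(sample, 8))
}
```

Under that assumption, passing `runeIndexToByteOffset(sample, match.Index)` as the starting index instead of `match.Index` would keep the offset aligned to the start of a rune.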
exp, err := regexp2.Compile(`[\u{00061}-\u{0007A}]`, regexp2.ECMAScript)
if err == nil {
_, err = exp.MatchString(val)
}
fmt.Println(err) // the parse error is returned by Compile
will end up with this error message
error parsing regexp: [}-u] range in reverse order in [\\u{0007A}-\\u{00061}]