Comments (2)
Thanks for the request!
I've attempted a basic feature for this: check out tag v1.0.1 (519dd65), which includes a basic smoke test. Apparently the longer names (like "Katakana") aren't technically Unicode categories, but are instead "Scripts." Because of this, the Go standard library stores them in a separate table, so they were unavailable to the \p{} syntax. I've updated the regexp2 code to join the unicode.Categories and unicode.Scripts maps, so now all the known Unicode scripts should work with \p as expected.
Unfortunately I don't know enough about your example pattern or the characters involved to know the proper behavior you're expecting. If this new version doesn't work for you, please give me a simple test similar to TestUnicodeScriptSets that doesn't pass and I'll fix it up.
Once again, thanks for the heads up. My day-to-day is all ASCII, so Unicode matching is an area where I need as much help as I can get!
from regexp2.
Tried v1.0.1 and it worked fine for my small test. Thank you very much for your kindness and the quick update!
Now I can use most of the .NET regexp syntax I need. (Yes, the names are actually 'Scripts', as you pointed out :-)
FYI, I've been looking for a regex engine for Go that meets the following requirements:
- 'Scripts' such as 'Katakana' or 'Hiragana' are available as character classes
- Look-ahead/behind (both positive and negative) with quantifiers, such as (?<=[a-zA-Z])blahblar(?=[a-zA-Z]) or (?<![a-zA-Z])blahblar(?![a-zA-Z]), is fully supported (I recognize that such expressions might be inefficient in some cases, but they have been very convenient for me in the .NET environment)
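As a point of comparison for the two requirements above, the following sketch checks which of these patterns Go's standard `regexp` package will compile. It uses only the standard library; the helper name `stdlibCompiles` is hypothetical. The standard package accepts script names in `\p{}` but rejects look-ahead and look-behind, which is the gap an engine like regexp2 is meant to fill.

```go
package main

import (
	"fmt"
	"regexp"
)

// stdlibCompiles is a hypothetical helper that reports whether Go's standard
// regexp package (RE2 syntax) accepts the given pattern.
func stdlibCompiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`\p{Katakana}+`,                      // script class
		`(?<=[a-zA-Z])blahblar(?=[a-zA-Z])`,  // positive look-behind/ahead
		`(?<![a-zA-Z])blahblar(?![a-zA-Z])`,  // negative look-behind/ahead
	}
	for _, p := range patterns {
		fmt.Printf("%-40s compiles: %v\n", p, stdlibCompiles(p))
	}
}
```

Only the first pattern compiles with the standard package; the two lookaround patterns return a syntax error.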
I'm used to .NET regex and its advanced features, which are missing in almost all other regex engines except 'onigmo'. While onigmo is very fast and almost equivalent to the .NET Framework engine, so far it is available only for Ruby.
Thus I adopted the rubex library for my Go app for now, but the engine behind rubex is oniguruma, the predecessor of onigmo. Full look-ahead/behind with quantifiers is missing in oniguruma.
I once tried to port onigmo to rubex, but rubex is cgo-based (bridging C and Go) and the port was too hacky for me to implement. In contrast, your regexp2 is based on the C# .NET code and is much cleaner than rubex.
Your port of regexp2 from the .NET source is good news from heaven for me. I really appreciate your work. :-)