Comments (6)
PCRE2 already has support for different "newlines", but this does not change when PCRE2_MULTILINE is set. You can choose between CR (only), LF (only), CR+LF (i.e. two characters), any of the previous, any Unicode newline sequence, or NUL. A default can be set when PCRE2 is built, but this can be overridden by a function call, and this in turn can be overridden within the pattern string. If you were to set ANYCRLF as the newline, it would almost agree with your "not s" mode, except that a CR followed by a LF would count as just one newline, not two. It sounds as if you have full control over the regex. In that case, when you are going to set the "m" option, you could also set LF as the only newline. So my suggestion is:
Default: start the pattern with (*ANYCRLF) which will give you correct "." behaviour, that is, "." will not match CR or LF.
If the "s" option is wanted, start the pattern with (?s) and "." will match any character.
If the "m" option is wanted, start the pattern with (*LF)(?m) and "." will match any character except LF.
If both options are wanted, start with (*LF)(?ms).
That seems to me to give you the wanted behaviour, except that in the default case CRLF counts as just one newline. Making PCRE2 recognize either CR or LF as a newline, but treat CRLF as two newlines would require a new newline mode.
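The four cases above can be sketched as a small helper that picks the prefix from the two flags. This is only an illustration of the suggestion, not part of PCRE2: the function names `pcre2_prefix` and `with_prefix` are made up here, and the caller is assumed to pass the resulting string to PCRE2 unchanged, with the prefix at the very start of the pattern.

```python
def pcre2_prefix(dotall: bool = False, multiline: bool = False) -> str:
    """Return the pattern prefix suggested above for a flag combination.

    dotall    -- the "s" option is wanted
    multiline -- the "m" option is wanted
    (Hypothetical helper; the names are not PCRE2 API.)
    """
    if multiline:
        # With "m", LF becomes the only newline so ^ and $ anchor at LF only.
        return "(*LF)(?ms)" if dotall else "(*LF)(?m)"
    if dotall:
        # With "s" alone, "." matches any character.
        return "(?s)"
    # Default: "." matches neither CR nor LF.
    return "(*ANYCRLF)"


def with_prefix(pattern: str, dotall: bool = False, multiline: bool = False) -> str:
    """Prepend the prefix; newline verbs must come first in the pattern."""
    return pcre2_prefix(dotall, multiline) + pattern
```

For example, `with_prefix("^a.b$", multiline=True)` yields `(*LF)(?m)^a.b$`.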
from pcre2.
The problem with your approach is that in "m" mode without "s" you want to set (*LF)(?m), so a single dot will match CR, which it shouldn't according to the standard. The CRLF problem you mentioned at the end also remains.
My colleague suggested:
Always set (*LF)
Transform the regex so that "." is never generated; instead:
no "s" mode: dot is transformed to [^\r\n]
in "s" mode: dot is transformed to [\s\S]
Would that work?
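A minimal sketch of that dot rewrite, under some assumptions: the function name `rewrite_dots` is hypothetical, a backslash always escapes the next character, and a "." inside a bracketed character class (including nested class subtraction as in XSD regexes) is a literal and is left alone. It does not handle other XPath flags such as "x" or "q".

```python
def rewrite_dots(pattern: str, dotall: bool = False) -> str:
    """Replace every unescaped "." outside character classes.

    no "s" mode: "." becomes [^\\r\\n]
    "s" mode:    "." becomes [\\s\\S]
    (Hypothetical helper illustrating the suggestion above.)
    """
    replacement = r"[\s\S]" if dotall else r"[^\r\n]"
    out = []
    depth = 0  # nesting depth of [...] classes
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an escape sequence such as \. or \] verbatim.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch == "." and depth == 0:
            out.append(replacement)
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

The rewritten pattern would then be compiled with (*LF) in front, as proposed.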
from pcre2.
It's somewhat inconsistent to have "." not match CR or LF while at the same time only recognizing LF as newline. However, I think your approach would work, though for "s" mode you could just set PCRE2_DOTALL (or (?s)) which would be more efficient.
from pcre2.
Yeah, thanks!
I really want to know why W3C decided to do it that way. The XML people should be very clever, shouldn't they? Do you have a clue?
The replacement syntax (i.e. substitute) is even more difficult to get used to. They don't accept ${num}, only $num, and if there are only 22 groups, then $223 will be equivalent to PCRE's ${22}3. Replacing by ${2}23 is impossible in this case in XPath.
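To make the $num rule concrete, here is a hedged sketch that translates an XPath-style replacement string into the braced ${num} form. Several things are assumptions rather than facts from this thread: the function name `xpath_to_pcre2_replacement`, the rule that trailing digits are dropped until the number refers to an existing group (inferred from the $223 example), the treatment of a single-digit reference to a missing group as an empty string, and the mapping of the escape `\$` to `$$` as a literal dollar in the target syntax.

```python
DIGITS = "0123456789"


def xpath_to_pcre2_replacement(repl: str, group_count: int) -> str:
    """Translate $num references into ${num} (hypothetical helper).

    group_count is the number of capture groups in the pattern.
    """
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\" and i + 1 < len(repl):
            # Assumed escapes: \$ is a literal dollar, \\ a literal backslash.
            nxt = repl[i + 1]
            out.append("$$" if nxt == "$" else nxt)
            i += 2
            continue
        if ch == "$":
            j = i + 1
            while j < len(repl) and repl[j] in DIGITS:
                j += 1
            digits = repl[i + 1:j]
            if not digits:
                raise ValueError("'$' must be followed by a digit")
            # Drop trailing digits until the number names an existing group.
            k = len(digits)
            while k > 1 and int(digits[:k]) > group_count:
                k -= 1
            n = int(digits[:k])
            if n <= group_count:
                out.append("${" + str(n) + "}")
            # The dropped digits become literal text.
            out.append(digits[k:])
            i = j
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

With 22 groups, `$223` becomes `${22}3`; with only 2 groups the same input becomes `${2}23`, which is the form that cannot be written directly when 22 groups exist.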
from pcre2.
Who knows? Reading the doc suggests to me that they thought about "^" and "$" completely separately from "." whereas PCRE ties them all to the concept of a logical "newline". The replacement rules seem totally weird.
from pcre2.
Thanks!
from pcre2.