pcre2project / pcre2
PCRE2 development is now based here.
License: Other
This is #2625 in the old Bugzilla, submitted by Rich Siegel.
RS: Given this text represented as UTF-16:
this is a test
Search using this pattern, which as written should not match any ASCII characters:
[\x{00FF}-\x{FFEE}]
If the pattern was compiled with PCRE2_CASELESS turned on, pcre2_match() will return a match at the first "s" in the subject text, even though that is outside the explicit range of characters. (The uppercase "S" would match as well.)
Further testing shows that "k" and "K" are matching as well, presumably with the same underlying cause.
ZH: This is expected. Please check:
http://www.unicode.org/Public/12.1.0/ucd/CaseFolding.txt
Examples:
017F; C; 0073; # LATIN SMALL LETTER LONG S
212A; C; 006B; # KELVIN SIGN
In other words, a caseless 0x212A codepoint matches to K.
RS: Thank you! I'm a little ashamed to admit that Unicode case folding didn't even occur to me. This may pose more of a UI and/or documentation challenge, since I'm quite certain that it won't occur to end users either. :-)
Does it make sense to consider a flag in the PCRE2_EXTRA* space which would limit case folding to the ASCII range when PCRE2_CASELESS is specified? (I'm not yet advocating for it; I can see some clear limitations and disadvantages, and trying to express all of the possible variations could rapidly turn into a snake pit.)
PH: I think the best way of doing this would be to add PCRE2_ASCII_CASELESS to the main options, because having two separate flags seems very untidy. There are only two bits left in the main options, so I am slightly reluctant, but on the other hand leaving them unused just in case something more important comes along could last for ever. Zoltan, what do you think? Implementing this would need changes to JIT as well as the interpreters.
On further reflection, I've changed my mind and think that a PCRE2_EXTRA option would be better, as you suggested, partly because there may be a number of variations needed. And indeed, some additions to the documentation.
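The minimal C sketch below makes the case-folding explanation concrete. It models a caseless character-class test under simple case folding and is not PCRE2's implementation. The fold table holds only the two CaseFolding.txt entries quoted above plus ASCII A-Z. The ascii_only parameter is a hypothetical model of the restricted mode being discussed, not an existing PCRE2 option. The names fold and class_match_caseless are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Two entries from CaseFolding.txt (status C). A real engine uses the full
   Unicode tables; this subset is for illustration only. */
typedef struct { uint32_t from, to; } fold_entry;

static const fold_entry FOLDS[] = {
    { 0x017F, 0x0073 },  /* LATIN SMALL LETTER LONG S -> s */
    { 0x212A, 0x006B },  /* KELVIN SIGN -> k */
};

/* Simple case fold: ASCII A-Z to a-z, plus the two entries above. */
static uint32_t fold(uint32_t c) {
    if (c >= 'A' && c <= 'Z') return c + ('a' - 'A');
    for (size_t i = 0; i < sizeof FOLDS / sizeof FOLDS[0]; i++)
        if (FOLDS[i].from == c) return FOLDS[i].to;
    return c;
}

/* Caseless test of c against the class [lo-hi]: c matches if some code
   point x in the range folds to the same value as c. If ascii_only is true
   (a hypothetical restricted mode), two different code points count as case
   variants only when both are ASCII. */
static bool class_match_caseless(uint32_t c, uint32_t lo, uint32_t hi,
                                 bool ascii_only) {
    assert(hi < UINT32_MAX);  /* keeps the loop below finite */
    for (uint32_t x = lo; x <= hi; x++) {
        if (x == c) return true;
        if (ascii_only && (x > 0x7F || c > 0x7F)) continue;
        if (fold(x) == fold(c)) return true;
    }
    return false;
}
```

With ascii_only false, class_match_caseless('s', 0x00FF, 0xFFEE, false) is true because U+017F lies in the range and folds to "s". With ascii_only true, the same call is false. The linear scan of the range keeps the sketch easy to check. A real engine would typically add the other-case code points to the class when the pattern is compiled.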
ZH:
I feel this is a non-trivial change, and it can easily be done at the pattern level.
The issue here is caseless, and you can temporarily disable it:
(?-i:[\x{00FF}-\x{FFEE}])
Or use an assertion that separates ASCII from the rest.
I am trying to cross-compile the shared library on macOS. My system uses an Intel processor and I want to compile for ARM.
When I pass the --host argument to configure, shared library creation is disabled:
$ ./configure --enable-jit --enable-pcre2-32 --disable-pcre2-8 --prefix=/Users/sbarex/Downloads/pcre2-32 --host arm64-apple-macos11 --enable-shared --disable-static
checking for a BSD-compatible install... /usr/local/bin/ginstall -c
checking whether build environment is sane... yes
checking for arm64-apple-macos11-strip... no
checking for strip... strip
checking for a race-free mkdir -p... /usr/local/bin/gmkdir -p
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for arm64-apple-macos11-gcc... no
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... none needed
checking whether gcc understands -c and -o together... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
checking for stdio.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for strings.h... yes
checking for sys/stat.h... yes
checking for sys/types.h... yes
checking for unistd.h... yes
checking for wchar.h... yes
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking whether _XOPEN_SOURCE should be defined... no
checking for arm64-apple-macos11-ar... no
checking for arm64-apple-macos11-lib... no
checking for arm64-apple-macos11-link... no
checking for ar... ar
checking the archiver (ar) interface... ar
checking for int64_t... yes
checking build system type... x86_64-apple-darwin21.2.0
checking host system type... aarch64-apple-macos11
checking how to print strings... printf
checking for a sed that does not truncate output... /usr/local/bin/gsed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
checking if the linker (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) is GNU ld... no
checking for BSD- or MS-compatible name lister (nm)... no
checking for arm64-apple-macos11-dumpbin... no
checking for arm64-apple-macos11-link... no
checking for dumpbin... no
checking for link... link -dump
checking the name lister (nm) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 786432
checking how to convert x86_64-apple-darwin21.2.0 file names to aarch64-apple-macos11 format... func_convert_file_noop
checking how to convert x86_64-apple-darwin21.2.0 file names to toolchain format... func_convert_file_noop
checking for /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld option to reload object files... -r
checking for arm64-apple-macos11-objdump... no
checking for objdump... objdump
checking how to recognize dependent libraries... unknown
checking for arm64-apple-macos11-dlltool... no
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for arm64-apple-macos11-ar... ar
checking for archiver @FILE support... no
checking for arm64-apple-macos11-strip... strip
checking for arm64-apple-macos11-ranlib... no
checking for ranlib... ranlib
checking command to parse nm output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for arm64-apple-macos11-mt... no
checking for mt... no
checking if : is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... yes
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) supports shared libraries... no
checking dynamic linker characteristics... no
checking how to hardcode library paths into programs... unsupported
checking whether stripping libraries is possible... no
checking if libtool supports shared libraries... no
checking whether to build shared libraries... no
checking whether to build static libraries... yes
checking whether ln -s works... yes
checking whether the -Werror option is usable... yes
checking for simple visibility declarations... yes
checking for __attribute__((uninitialized))... yes
checking for limits.h... yes
checking for sys/types.h... (cached) yes
checking for sys/stat.h... (cached) yes
checking for dirent.h... yes
checking for windows.h... no
checking for sys/wait.h... yes
checking for an ANSI C-conforming const... yes
checking for size_t... yes
checking for bcopy... yes
checking for memfd_create... no
checking for memmove... yes
checking for mkostemp... yes
checking for realpath... yes
checking for secure_getenv... no
checking for strerror... yes
checking for zlib.h... yes
checking for gzopen in -lz... yes
checking for bzlib.h... yes
checking for libbz2... yes
checking for the pthreads library -lpthreads... no
checking whether pthreads work without any flags... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking if more special flags are required for pthreads... no
checking for PTHREAD_PRIO_INHERIT... yes
checking whether Intel CET is enabled... no
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating libpcre2-8.pc
config.status: creating libpcre2-16.pc
config.status: creating libpcre2-32.pc
config.status: creating libpcre2-posix.pc
config.status: creating pcre2-config
config.status: creating src/pcre2.h
config.status: creating src/config.h
config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing script-chmod commands
config.status: executing delete-old-chartables commands
pcre2-10.39 configuration summary:
Install prefix ..................... : /Users/sbarex/Downloads/pcre2-32
C preprocessor ..................... :
C compiler ......................... : gcc
Linker ............................. : /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
C preprocessor flags ............... :
C compiler flags ................... : -O2 -fvisibility=hidden
Linker flags ....................... :
Extra libraries .................... :
Build 8-bit pcre2 library .......... : no
Build 16-bit pcre2 library ......... : no
Build 32-bit pcre2 library ......... : yes
Include debugging code ............. : no
Enable JIT compiling support ....... : yes
Use SELinux allocator in JIT ....... : unsupported
Enable Unicode support ............. : yes
Newline char/sequence .............. : lf
\R matches only ANYCRLF ............ : no
\C is disabled ..................... : no
EBCDIC coding ...................... : no
EBCDIC code for NL ................. : n/a
Rebuild char tables ................ : no
Internal link size ................. : 2
Nested parentheses limit ........... : 250
Heap limit ......................... : 20000000 kibibytes
Match limit ........................ : 10000000
Match depth limit .................. : MATCH_LIMIT
Build shared libs .................. : no
Build static libs .................. : yes
Use JIT in pcre2grep ............... : yes
Enable callouts in pcre2grep ....... : yes
Enable fork in pcre2grep callouts .. : yes
Initial buffer size for pcre2grep .. : 20480
Maximum buffer size for pcre2grep .. : 1048576
Link pcre2grep with libz ........... : no
Link pcre2grep with libbz2 ......... : no
Link pcre2test with libedit ........ : no
Link pcre2test with libreadline .... : no
Valgrind support ................... : no
Code coverage ...................... : no
Fuzzer support ..................... : no
Use %zu and %td .................... : auto
Even if I configure a host with the same CPU architecture (such as x86_64-apple-macos10.15), the shared library cannot be created.
If I remove the --host argument, the shared library is built (for the current architecture):
$ ./configure --enable-jit --enable-pcre2-32 --disable-pcre2-8 --prefix=/Users/sbarex/Downloads/pcre2-32 -enable-shared --disable-static
checking for a BSD-compatible install... /usr/local/bin/ginstall -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/local/bin/gmkdir -p
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... none needed
checking whether gcc understands -c and -o together... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
checking for stdio.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for strings.h... yes
checking for sys/stat.h... yes
checking for sys/types.h... yes
checking for unistd.h... yes
checking for wchar.h... yes
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking whether _XOPEN_SOURCE should be defined... no
checking for ar... ar
checking the archiver (ar) interface... ar
checking for int64_t... yes
checking build system type... x86_64-apple-darwin21.2.0
checking host system type... x86_64-apple-darwin21.2.0
checking how to print strings... printf
checking for a sed that does not truncate output... /usr/local/bin/gsed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
checking if the linker (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) is GNU ld... no
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 786432
checking how to convert x86_64-apple-darwin21.2.0 file names to x86_64-apple-darwin21.2.0 format... func_convert_file_noop
checking how to convert x86_64-apple-darwin21.2.0 file names to toolchain format... func_convert_file_noop
checking for /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for archiver @FILE support... no
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... no
checking if : is a manifest tool... no
checking for dsymutil... dsymutil
checking for nmedit... nmedit
checking for lipo... lipo
checking for otool... otool
checking for otool64... no
checking for -single_module linker flag... yes
checking for -exported_symbols_list linker flag... yes
checking for -force_load linker flag... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... yes
checking for gcc option to produce PIC... -fno-common -DPIC
checking if gcc PIC flag -fno-common -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) supports shared libraries... yes
checking dynamic linker characteristics... darwin21.2.0 dyld
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no
checking whether ln -s works... yes
checking whether the -Werror option is usable... yes
checking for simple visibility declarations... yes
checking for __attribute__((uninitialized))... yes
checking for limits.h... yes
checking for sys/types.h... (cached) yes
checking for sys/stat.h... (cached) yes
checking for dirent.h... yes
checking for windows.h... no
checking for sys/wait.h... yes
checking for an ANSI C-conforming const... yes
checking for size_t... yes
checking for bcopy... yes
checking for memfd_create... no
checking for memmove... yes
checking for mkostemp... yes
checking for realpath... yes
checking for secure_getenv... no
checking for strerror... yes
checking for zlib.h... yes
checking for gzopen in -lz... yes
checking for bzlib.h... yes
checking for libbz2... yes
checking whether pthreads work with -pthread... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking if more special flags are required for pthreads... -D_THREAD_SAFE
checking for PTHREAD_PRIO_INHERIT... yes
checking whether Intel CET is enabled... no
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating libpcre2-8.pc
config.status: creating libpcre2-16.pc
config.status: creating libpcre2-32.pc
config.status: creating libpcre2-posix.pc
config.status: creating pcre2-config
config.status: creating src/pcre2.h
config.status: creating src/config.h
config.status: src/config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing script-chmod commands
config.status: executing delete-old-chartables commands
pcre2-10.39 configuration summary:
Install prefix ..................... : /Users/sbarex/Downloads/pcre2-32
C preprocessor ..................... :
C compiler ......................... : gcc
Linker ............................. : /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
C preprocessor flags ............... :
C compiler flags ................... : -D_THREAD_SAFE -pthread -O2 -fvisibility=hidden
Linker flags ....................... :
Extra libraries .................... :
Build 8-bit pcre2 library .......... : no
Build 16-bit pcre2 library ......... : no
Build 32-bit pcre2 library ......... : yes
Include debugging code ............. : no
Enable JIT compiling support ....... : yes
Use SELinux allocator in JIT ....... : unsupported
Enable Unicode support ............. : yes
Newline char/sequence .............. : lf
\R matches only ANYCRLF ............ : no
\C is disabled ..................... : no
EBCDIC coding ...................... : no
EBCDIC code for NL ................. : n/a
Rebuild char tables ................ : no
Internal link size ................. : 2
Nested parentheses limit ........... : 250
Heap limit ......................... : 20000000 kibibytes
Match limit ........................ : 10000000
Match depth limit .................. : MATCH_LIMIT
Build shared libs .................. : yes
Build static libs .................. : no
Use JIT in pcre2grep ............... : yes
Enable callouts in pcre2grep ....... : yes
Enable fork in pcre2grep callouts .. : yes
Initial buffer size for pcre2grep .. : 20480
Maximum buffer size for pcre2grep .. : 1048576
Link pcre2grep with libz ........... : no
Link pcre2grep with libbz2 ......... : no
Link pcre2test with libedit ........ : no
Link pcre2test with libreadline .... : no
Valgrind support ................... : no
Code coverage ...................... : no
Fuzzer support ..................... : no
Use %zu and %td .................... : auto
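The two logs diverge at "checking dynamic linker characteristics". Without --host it reports "darwin21.2.0 dyld". With --host, config.sub first canonicalizes the host to aarch64-apple-macos11, the check reports "no", and libtool then says it does not support shared libraries. One plausible reading is that libtool selects the Darwin dynamic linker only when the OS field of the host triplet matches darwin*, so macos11 falls through to "no". This should be confirmed against the libtool.m4 that generated this configure script. The C sketch below models only that pattern match; triplet_os and libtool_selects_dyld are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return the OS field of a canonical cpu-vendor-os host triplet (everything
   after the second '-'), or NULL if there are fewer than two dashes. */
static const char *triplet_os(const char *triplet) {
    assert(triplet != NULL);
    const char *p = strchr(triplet, '-');
    if (p == NULL) return NULL;
    p = strchr(p + 1, '-');
    return p == NULL ? NULL : p + 1;
}

/* Simplified model of libtool's dynamic-linker case statement: only an OS
   field starting with "darwin" (or "rhapsody") selects dyld; anything else
   leaves shared libraries unsupported. */
static bool libtool_selects_dyld(const char *triplet) {
    const char *os = triplet_os(triplet);
    if (os == NULL) return false;
    return strncmp(os, "darwin", 6) == 0 || strncmp(os, "rhapsody", 8) == 0;
}
```

This reading would also explain why x86_64-apple-macos10.15 fails even though the CPU is unchanged. If it is the cause, a darwin-style host such as aarch64-apple-darwin may be worth trying. It should be combined with compiler flags that really target arm64 (for Apple's clang, -arch arm64). The first log also reports "whether we are cross compiling... no", which suggests gcc was still producing native x86_64 binaries.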
From pcre2test and the docs, I don't see a way to name the whole-string match, i.e. to begin the pattern with a ?<name> variant, which would be useful. I realise it's possible to wrap the entire expression in a named subpattern, but that creates an unnecessary extra capture group.
It seems like it would be a backward-compatible change, as beginning a pattern with ? currently produces an error like the following, so there can't be any valid patterns that begin with those sequences:
Failed: error 109 at offset 0: quantifier does not follow a repeatable item
Previously when I installed PCRE from ftp.pcre.org I would verify the signature with https://ftp.pcre.org/pub/pcre/Public-Key . Now I am migrating to PCRE2 -- where can I find the key to verify the signature for the PCRE2 github releases?
Please could you republish the public key for PCRE? Without it, it is impossible to verify PCRE releases and guard against supply-chain vulnerabilities.
It was previously on ftp.pcre.org at https://ftp.pcre.org/pub/pcre/Public-Key. I see other failures in the wild: http://exim.mirror.iphh.net/ftp/pcre/Public-Key is not a valid key; the one we need is 45F68D54BBE23FB3039B46E59766E084FB0F43D8.
e.g. I imported that key, and it
C02DC0SHMD6W:web-agents alex.levin$ gpg --import ~/Downloads/Public-Key
gpg: key 9766E084FB0F43D8: public key "Philip Hazel [email protected]" imported
gpg: Total number processed: 1
gpg: imported: 1
C02DC0SHMD6W:web-agents alex.levin$ gpg --verify libs/pcre2-10.39.tar.gz.sig ph.gpg
gpg: Signature made Fri 29 Oct 17:07:03 2021 BST
gpg: using RSA key 45F68D54BBE23FB3039B46E59766E084FB0F43D8
gpg: BAD signature from "Philip Hazel [email protected]" [unknown]
This is bug 2793 from the old Bugzilla, posted by Thomas Tempelmann, who, after some discussion, provided a proposed patch. Here is some relevant discussion and the patch:
Here's what probably happens:
At this point in the loop, instead of going on with step 2, it goes back to step 1, where it again searches ahead for 5 MB until it runs into the first "E".
I can think of several remedies:
Change the fast scan to search for all possible options. In my example, it has to scan for both "e" and "E". I assume the scan benefits from specialized CPU instructions that can search for a byte (if not, you'd simply loop over the bytes and check each one against both "e" and "E"). So you'd need to use that search operation over small ranges, e.g. 1000 bytes, looking for "e", and then over the same 1000 bytes looking for "E". If neither hits, move forward. If one hits, the nearer one is processed (and the farther one's position can be cached so that you won't need to search for it again until you've moved there).
But currently, instead, I suspect it's searching for "E" and eventually gives up when it reaches the cut-off point you mentioned. But then repeats the same long search again and again.
So, the next possible optimization, which may be much easier to implement than the first suggestion, is to simply cache the point at which the "E" was found or not, and then not repeat looking for "E" before that point.
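Here is a minimal sketch of the first remedy, which scans fixed-size windows and takes the nearer hit. It is an illustration, not PCRE2 code. The name find_either_chunked and the chunk parameter are hypothetical, and the sketch leaves out the position caching described as the second remedy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of c1 or c2 in [p, end), or NULL.
   Each memchr() covers at most `chunk` bytes, so finding one character
   nearby never costs a scan to the end of a large subject for the other. */
static const unsigned char *find_either_chunked(const unsigned char *p,
                                                const unsigned char *end,
                                                unsigned char c1,
                                                unsigned char c2,
                                                size_t chunk) {
    assert(chunk > 0 && p <= end);
    while (p < end) {
        size_t n = (size_t)(end - p);
        if (n > chunk) n = chunk;
        const unsigned char *a = memchr(p, c1, n);
        /* Search for c2 only up to the c1 hit, if there was one. */
        const unsigned char *b = memchr(p, c2, a != NULL ? (size_t)(a - p) : n);
        if (a != NULL || b != NULL)
            return b != NULL ? b : a;  /* a found b always lies before a */
        p += n;                        /* no hit in this window: move on */
    }
    return NULL;
}
```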
Actually, can you tell me where this happens (or, if you don't have time to look now, give me some pointers on where to look)? I'd like to try the caching myself; it shouldn't be too hard, I hope.
Using the macro option that suppresses the fast scan, I located the relevant code areas.
Around line 6800 in pcre2_match.c the comment explains that, in caseless mode, it does indeed consider looking for both cases.
Alright, it's as I suspected:
First off, the code already does what I suggested to do: Scan for both "e" and "E" and then process the nearer one.
The "bug" is that the found locations are not cached, so the next time both chars are searched again from the current position, even if it has already been determined that there's no such char for a while.
Adding some caching for both found locations should fix this.
1.
Change
BOOL memchr_not_found_first_cu;
BOOL memchr_not_found_first_cu2;
into
PCRE2_SPTR memchr_found_first_cu;
PCRE2_SPTR memchr_found_first_cu2;
Change
memchr_not_found_first_cu = FALSE;
memchr_not_found_first_cu2 = FALSE;
into
memchr_found_first_cu = NULL;
memchr_found_first_cu2 = NULL;
Change
  if (!memchr_not_found_first_cu)
    {
    pp1 = memchr(start_match, first_cu, end_subject - start_match);
    if (pp1 == NULL) memchr_not_found_first_cu = TRUE;
      else cu2size = pp1 - start_match;
    }
  /* If pp1 is not NULL, we have arranged to search only as far as pp1,
  to see if the other case is earlier, so we can set "not found" only
  when both searches have returned NULL. */
  if (!memchr_not_found_first_cu2)
    {
    pp2 = memchr(start_match, first_cu2, cu2size);
    memchr_not_found_first_cu2 = (pp2 == NULL && pp1 == NULL);
    }
into
  /* Reuse a cached position while start_match has not passed it; a cached
  end_subject means "not present in the rest of the subject". The NULL test
  avoids a relational comparison with a null pointer before the first scan. */
  if (memchr_found_first_cu != NULL && start_match <= memchr_found_first_cu)
    {
    pp1 = memchr_found_first_cu;
    if (pp1 == end_subject) pp1 = NULL;
    }
  else
    {
    pp1 = memchr(start_match, first_cu, cu2size);
    memchr_found_first_cu = (pp1 == NULL)? end_subject : pp1;
    }
  /* Neither search is shortened any more, so each cached pointer is the
  first occurrence of its character at or after the current start_match. */
  if (memchr_found_first_cu2 != NULL && start_match <= memchr_found_first_cu2)
    {
    pp2 = memchr_found_first_cu2;
    if (pp2 == end_subject) pp2 = NULL;
    }
  else
    {
    pp2 = memchr(start_match, first_cu2, cu2size);
    memchr_found_first_cu2 = (pp2 == NULL)? end_subject : pp2;
    }
This means two changes to the algorithm:
Instead of using a flag to tell whether memchr() found something, it'll now store the last found position and re-use that as long as the start_match pointer is still behind.
The size parameter for the second memchr() is not getting reduced any more (formerly, if the first found a location, the second one would be limited to search until there). Since we now cache each location, there's no need to shorten the searches any more.
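The caching idea behind the patch can be tested outside pcre2_match.c with a small self-contained model. Like the patch, it assumes the scan only moves forward through the subject. The names scan_cache and next_either are hypothetical, and the memchr_calls counter exists only to show how many scans the cache saves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Last positions found for c1 and c2. `end` means "not present before the
   end of the subject"; NULL means "nothing cached yet". */
typedef struct {
    const unsigned char *found1, *found2;
} scan_cache;

/* Return the first position >= p holding c1 or c2 (NULL if none). memchr()
   is called only when p has moved past a cached result. */
static const unsigned char *next_either(scan_cache *cache,
                                        const unsigned char *p,
                                        const unsigned char *end,
                                        unsigned char c1, unsigned char c2,
                                        int *memchr_calls) {
    assert(p <= end);
    if (cache->found1 == NULL || p > cache->found1) {
        const unsigned char *q = memchr(p, c1, (size_t)(end - p));
        cache->found1 = q != NULL ? q : end;
        (*memchr_calls)++;
    }
    if (cache->found2 == NULL || p > cache->found2) {
        const unsigned char *q = memchr(p, c2, (size_t)(end - p));
        cache->found2 = q != NULL ? q : end;
        (*memchr_calls)++;
    }
    /* The nearer cached hit wins; `end` in both slots means no match. */
    const unsigned char *hit =
        cache->found1 < cache->found2 ? cache->found1 : cache->found2;
    return hit == end ? NULL : hit;
}
```

Take a subject such as "aeaeaeaE". The scan for "E" runs once, and its result is reused until the start position passes it. An uncached loop would rescan for "E" at every step.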
The exim mailer daemon switched to pcre2 in version 4.95. Ever since, the exim daemon has crashed with a Bus Error on SPARC on Linux:
glaubitz@gcc202:~/exim$ ./src/build-Linux-sparc64/exim
Bus error
glaubitz@gcc202:~/exim$
Bisecting the issue led to the change that switched exim from pcre to pcre2 (Exim/exim@22ed7a5).
Running the exim binary in gdb led to the following backtrace:
(gdb) bt
#0 pcre2_general_context_create_8 (private_malloc=0x1000004f680 <function_store_malloc>,
private_free=0x1000004f650 <function_store_free>, memory_data=0x0) at src/pcre2_context.c:123
#1 0x00000100000517a8 in main ()
(gdb)
Access to a SPARC machine running Linux or Solaris can be obtained through the GCC Compile Farm, see: https://gcc.gnu.org/wiki/CompileFarm
Yocto dunfell, an LTS version of Yocto, is complaining about a missing release:
WARNING: libpcre2-native-10.34-r0 do_fetch: Failed to fetch URL https://github.com/PhilipHazel/pcre2/releases/download/pcre2-10.34/pcre2-10.34.tar.bz2, attempting MIRRORS if available
I noticed that this release has been removed from GitHub.
I don't know what the cause of this might be. I have a third-party port of pcre2 to the Meson build system (for embedding into other projects that use Meson), and as part of the port we try to run the test suite.
For reference, here is the meson build description: https://github.com/mesonbuild/wrapdb/blob/pcre2/subprojects/packagefiles/pcre2/meson.build
In our GitHub CI, everything works fine on Ubuntu. On Windows using MSVC, I get this error instead:
https://github.com/mesonbuild/wrapdb/runs/4637121959?check_suite_focus=true#step:6:261
Test 8: "Internal offsets and code size tests"
failed comparison: fc /n D:\a\wrapdb\wrapdb\subprojects\pcre2-10.39\testdata\testoutput8-8-2 testout8\testoutput8-8-2
Do you have any idea what the problem might be? Suggestions for figuring out the problem?
Is this an expected failure? There isn't currently any publicly visible CI for pcre2 (and an open PR only adds some for Linux), so it is difficult to know for sure...
This would make it easy to expand the core team in the future and would reduce the "bus factor".
This is a first, test issue to initialize the issue tracker for the PCRE2 repo.
For me, installing the debugger PDB files with MSVC does not work. I do not know whether this is caused by some build flags I'm using or whether it is a general issue.
I looked a bit at CMakeLists.txt and got the impression that the library names in the PDB install command are incorrect. Here is a patch that makes the install work for me. If it works and is sensible, I would be more than happy for it to be included (no copyright attached):
IF(MSVC AND INSTALL_MSVC_PDB)
- INSTALL(FILES ${PROJECT_BINARY_DIR}/pcre2.pdb
- ${PROJECT_BINARY_DIR}/pcre2posix.pdb
+ INSTALL(FILES ${PROJECT_BINARY_DIR}/pcre2-8.pdb
+ ${PROJECT_BINARY_DIR}/pcre2-16.pdb
+ ${PROJECT_BINARY_DIR}/pcre2-32.pdb
+ ${PROJECT_BINARY_DIR}/pcre2-posix.pdb
DESTINATION bin
CONFIGURATIONS RelWithDebInfo)
- INSTALL(FILES ${PROJECT_BINARY_DIR}/pcre2d.pdb
- ${PROJECT_BINARY_DIR}/pcre2posixd.pdb
+ INSTALL(FILES ${PROJECT_BINARY_DIR}/pcre2-8d.pdb
+ ${PROJECT_BINARY_DIR}/pcre2-16d.pdb
+ ${PROJECT_BINARY_DIR}/pcre2-32d.pdb
+ ${PROJECT_BINARY_DIR}/pcre2-posixd.pdb
DESTINATION bin
CONFIGURATIONS Debug)
ENDIF(MSVC AND INSTALL_MSVC_PDB)
Disclaimer: I know this affects legacy PCRE, but the bug is severe and PCRE is still used in many applications (e.g. exim or nginx). It is not possible to report a PCRE bug in the exim bug tracker, and even pcre.org suggests reporting bugs here...
Non-JIT pcre_exec can cause a segmentation fault or stack overflow (SF/SO) when called with a greedy RE containing a group of two character tokens under one of the most commonly used quantifiers, if that group matches a large block of input (about 12K on x86 and about 34K on x64). It seems to enter endless recursion, so it can cause a stack overflow or a segmentation fault depending on conditions.
Simplest PoC using pcregrep:
For a 32-bit build of the engine, a 12K input that matches the RE is enough:
$ printf '%10000s' | pcregrep -c --no-jit '(?:\s\s)+'
1
$ printf '%12000s' | pcregrep -c --no-jit '(?:\s\s)+'
Segmentation fault
$ printf '%10000s' | tr ' ' x | pcregrep -c --no-jit '(?:\w\w)+'
1
$ printf '%12000s' | tr ' ' x | pcregrep -c --no-jit '(?:\w\w)+'
Segmentation fault
For a 64-bit build, a 34K input that matches the RE is needed:
$ printf '%30000s' | pcregrep -c --no-jit '(?:\s\s)+'
1
$ printf '%34000s' | pcregrep -c --no-jit '(?:\s\s)+'
Segmentation fault
$ printf '%30000s' | tr ' ' x | pcregrep -c --no-jit '(?:\w\w)+'
1
$ printf '%34000s' | tr ' ' x | pcregrep -c --no-jit '(?:\w\w)+'
Segmentation fault
To trigger this, the RE must contain two tokens in a repeated group, such as \s. or .\w and similar; it does not segfault with 1, 3 or 4 tokens (in this simplest form). The quantifier can also be *, but the group must still match a long input. For example, this also segfaults:
printf '%34000s' x | pcregrep -c --no-jit '(?:\s.)*x'
It only affects REs that are not JIT-compiled (studied).
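The size-dependent thresholds above are consistent with a matcher whose recursion depth grows with the length of the matched input. Legacy PCRE's pcrestack documentation describes the non-JIT pcre_exec() as using recursion. The toy model below is not PCRE's code and does not explain why only two-token groups are affected. It only contrasts a backtracking matcher for (?:\s\s)+ that recurses once per iteration with a loop that uses constant stack. The names match_pairs_rec, match_pairs_iter and max_depth are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Deepest recursion level reached by match_pairs_rec(). */
static size_t max_depth;

/* Toy greedy backtracking matcher for (?:\s\s)+ anchored at s. It recurses
   once per group iteration, so stack depth grows with the input length. */
static int match_pairs_rec(const char *s, size_t depth) {
    assert(s != NULL);
    if (depth > max_depth) max_depth = depth;
    if (isspace((unsigned char)s[0]) && isspace((unsigned char)s[1]) &&
        match_pairs_rec(s + 2, depth + 1))
        return 1;
    /* No further iteration: succeed if at least one was taken. */
    return depth > 0;
}

/* The same greedy repetition as a loop, using constant stack. Returns the
   number of iterations matched (0 means no match). */
static size_t match_pairs_iter(const char *s) {
    size_t n = 0;
    while (isspace((unsigned char)s[0]) && isspace((unsigned char)s[1])) {
        s += 2;
        n++;
    }
    return n;
}
```

For 100 spaces the recursive version reaches depth 50. At tens of thousands of characters, the per-level stack frames add up to the kind of overflow reported above.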
This is #2680 in the old Bugzilla, submitted by Nam Nguyen.
RunGrepTest fails on Test 132 for me on OpenBSD.
./testdata/grepoutput expects an 'a' at the end.
---------------------------- Test 132 -----------------------------
match 1:
a
match 2:
b
---
a
RC=0
I actually get no 'a' when I run RunGrepTest:
---------------------------- Test 132 -----------------------------
match 1:
a
match 2:
b
---
RC=0
Running this test manually, I can't see how it produces an 'a' after the '---':
$ (pcre2grep -m1 -A3 '^match'; echo '---'; head -1) < testdata/grepinput
match 1:
a
match 2:
b
---
PH: The idea of this test is to check that the standard input is left in the right place when pcre2grep stops because it has reached the -m limit. The "a" line is generated by the "head -1" command when run under Linux (which is all I have):
$ (./pcre2grep -m1 -A3 '^match'; echo '---'; head -1) < testdata/grepinput
match 1:
a
match 2:
b
---
a
Looks like this is yet another Linux/BSD difference. Sigh. I think the relevant code is around line 2589 in pcre2grep.c:
  /* If the -m option set a limit for the number of matched or non-matched
  lines, check it here. A limit of zero means that no matching is ever done.
  For stdin from a file, set the file position. */
  if (count_limit >= 0 && count_matched_lines >= count_limit)
    {
    if (frtype == FR_PLAIN && filename == stdin_name && !is_file_tty(handle))
      (void)fseek(handle, (long int)filepos, SEEK_SET);
    rc = (count_limit == 0)? 1 : 0;
    break;
    }
This is #2756 in the old Bugzilla, submitted by S. Shuck.
The DFA example in the docs demonstrating finding every match does not work as expected (details omitted).
PH: This is not a bug, but a misunderstanding. You used pcre2_match_data_create_from_pattern() to set up a match data block. As your pattern contains no capturing parentheses, this will create a block with a very small ovector (enough to hold just the whole match, no captured groups). However, when you use the DFA matcher, the ovector is used in a different way, as explained in the pcre2api page:
"On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for
\fBpcre2_match()\fP, but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support capturing."
As your example should yield 3 matches, the ovector is not big enough, and therefore the yield is zero. If you change the match data creation to create a match data block with at least 3 ovector pairs, your example should return 3.
SS: Thanks for the insight. I'm unblocked for the moment.
The docs for pcre2_match_data_create_from_pattern() says "The ovector is created to be exactly the right size to hold all the substrings a pattern might capture." I guess I could have figured out that this number is not computable in the general case for DFA matching. Nevertheless, this sentence is false without a disclaimer about this case.
PH: Yes, I've noted that the documentation needs clarification, but it's too late for 10.37, which has been released today. I'll update the doc in due course - I suspect that DFA matching is in practice not used very much.
This is bug 2792 from the old Bugzilla, posted by firas. Perl used to allow \K in lookarounds, but it now throws an error. PCRE2 currently supports \K in positive lookarounds, and ignores it in negative ones. However, naive implementations can cause loops. After some discussion on the old list, the following was my (PH) conclusion:
I should have looked more closely at the code in pcre2demo. It has special code to deal with this case. Here is the comment:
/* If the previous match was not an empty string, there is one tricky case to
consider. If a pattern contains \K within a lookbehind assertion at the
start, the end of the matched string can be at the offset where the match
started. Without special action, this leads to a loop that keeps on matching
the same substring. We must detect this case and arrange to move the start on
by one character. The pcre2_get_startchar() function returns the starting
offset that was passed to pcre2_match(). */
OK, so now all is understood (pcre2test no doubt does the same). Perhaps the best thing to do here is to forbid \K in assertions, but to implement a new option in the PCRE2_EXTRA series to allow the current implementation. Then anyone who really needs the current behaviour can get it. We can put lots of warnings in the docs.
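The logic that the pcre2demo comment describes can be distilled into a helper that decides where the next match attempt should start. This is a self-contained sketch modelled on that description, not pcre2demo's code: the types and the function name plan_next_match() are hypothetical, and the fields match_start, match_end, and startchar stand in for ovector[0], ovector[1], and the value returned by pcre2_get_startchar().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one successful match attempt. */
typedef struct {
  size_t match_start;  /* stands in for ovector[0] */
  size_t match_end;    /* stands in for ovector[1] */
  size_t startchar;    /* stands in for pcre2_get_startchar() */
} match_result;

/* Hypothetical description of the next attempt. */
typedef struct {
  bool done;               /* no further attempts are needed */
  size_t start_offset;     /* where the next attempt should start */
  bool notempty_anchored;  /* retry insisting on a non-empty anchored match */
} next_step;

/* True for UTF-8 continuation bytes (10xxxxxx). */
static bool is_utf8_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

next_step plan_next_match(const unsigned char *subject, size_t subject_length,
                          match_result r, bool utf8)
{
  next_step s = { false, r.match_end, false };

  if (r.match_start == r.match_end) {
    /* Empty match: finished at the end of the subject, otherwise retry at
       the same offset looking for a non-empty match. */
    if (r.match_start == subject_length) s.done = true;
    else s.notempty_anchored = true;
    return s;
  }

  /* Non-empty match that ends at or before the attempt's starting offset
     (\K in a lookbehind): advance by one character to avoid looping. */
  if (s.start_offset <= r.startchar) {
    if (r.startchar >= subject_length) { s.done = true; return s; }
    s.start_offset = r.startchar + 1;
    if (utf8) {
      /* Skip the remaining code units of a multi-byte character. */
      while (s.start_offset < subject_length &&
             is_utf8_continuation(subject[s.start_offset]))
        s.start_offset++;
    }
  }
  return s;
}
```

In a real loop, notempty_anchored would correspond to passing PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED to the next pcre2_match() call, which is what pcre2demo does after an empty match.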
This was #2770 in the old Bugzilla. The poster said:
You'll notice that pcre2test gives an error when I attempt to use the delimiter within \Q..\E, but it accepted the pattern when I escaped it as: [-+*\/]
My (PH) response was this:
This is perhaps a lack of detail in the documentation, no more. You are using pcre2test, which is a program for testing the PCRE2 library, and running small regex tests. The way it works is to identify a pattern by delimiters, before passing it to the library for interpretation. It makes no interpretation of the pattern itself, except that, if '\' is encountered, the next character is not checked for being the delimiter. This is an easy fudge for simple cases. I see no reason why pcre2test should implement sophisticated regex parsing such as \Q...\E interpretation itself. Note also that pcre2test is not intended for use in any kind of production situation. Sometime before the next release I will take a look at the documentation to see if it can be made more clear.
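The delimiter rule described above can be modelled in a few lines, which also shows why \Q...\E cannot hide the delimiter while \/ can. This is a simplified sketch with a hypothetical function name, pattern_length(); it is not pcre2test's actual parsing code.

```c
#include <stddef.h>

/* Hypothetical helper: `line` starts with the delimiter character.
   Returns the length of the pattern between the delimiters, or -1 if no
   closing delimiter is found. After a backslash, the next character is
   never treated as the delimiter; \Q...\E gets no special treatment. */
long pattern_length(const char *line)
{
  char delim = line[0];
  const char *p = line + 1;

  for (; *p != '\0'; p++) {
    if (*p == '\\') {
      if (p[1] == '\0') break;  /* trailing backslash: no closing delimiter */
      p++;                      /* skip the escaped character */
      continue;
    }
    if (*p == delim) return (long)(p - (line + 1));
  }
  return -1;
}
```

For example, pattern_length("/[-+*\\/]/") covers the whole class, whereas for "/\\Q/\\E/" the scan stops after just \Q, because only the backslash escape is recognised.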
Hi, sorry for a stupid non-PCRE2 question here ;-)
In the releases, there's a .sig file where I can find a verification signature for the tarballs. So far, verifying it has failed because I'm unable to find an importable key for it. GPG itself says
gpg: key 4AEE18F83AFDEB23: new key but contains no user ID - skipped
for a random keyserver I tried. Loading the key from GitHub tells me:
$ curl https://github.com/PhilipHazel.gpg
-----BEGIN PGP PUBLIC KEY BLOCK-----
Note: This user hasn't uploaded any GPG keys.
=twTO
-----END PGP PUBLIC KEY BLOCK-----%
which seems odd, because you've verified commits and signed tarballs. Any hints?
Hi,
I'm a developer in the Qt project, which uses pcre2. I think I found an issue. To demonstrate this, I'll use your oss-fuzz image with a changed fuzz target that mimics how Qt uses your library. I did not change anything in pcre2 itself. Should any of my steps be incorrect, please let me know.
python infra/helper.py build_image pcre2
python infra/helper.py build_fuzzers --engine libfuzzer --sanitizer address --architecture x86_64 pcre2
python infra/helper.py reproduce pcre2 pcre2_fuzzer <input_file>
Running: /testcase
=================================================================
==18==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6030000001a1 at pc 0x00000058a03f bp 0x7ffd1c26fab0 sp 0x7ffd1c26faa8
READ of size 1 at 0x6030000001a1 thread T0
SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
#0 0x58a03e in get_ucp /src/pcre2/src/pcre2_compile.c
#1 0x56f0a6 in parse_regex /src/pcre2/src/pcre2_compile.c:3152:14
#2 0x56452e in pcre2_compile_8 /src/pcre2/src/pcre2_compile.c:10147:13
#3 0x55e3df in LLVMFuzzerTestOneInput /src/pcre2/src/pcre2_fuzzsupport.c:68:23
#4 0x455283 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) cxa_noexception.cpp
#5 0x440ec2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
#6 0x44671c in fuzzer::FuzzerDriver(int*, char***, int ()(unsigned char const, unsigned long)) cxa_noexception.cpp
#7 0x46f522 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
#8 0x7fce9188c0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#9 0x41f67d in _start (/out/pcre2_fuzzer+0x41f67d)DEDUP_TOKEN: get_ucp--parse_regex--pcre2_compile_8
0x6030000001a1 is located 0 bytes to the right of 17-byte region [0x603000000190,0x6030000001a1)
allocated by thread T0 here:
#0 0x52510d in __interceptor_malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
#1 0x436e97 in operator new(unsigned long) cxa_noexception.cpp
#2 0x440ec2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
#3 0x44671c in fuzzer::FuzzerDriver(int*, char***, int ()(unsigned char const, unsigned long)) cxa_noexception.cpp
#4 0x46f522 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
#5 0x7fce9188c0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)DEDUP_TOKEN: __interceptor_malloc--operator new(unsigned long)--fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long)
SUMMARY: AddressSanitizer: heap-buffer-overflow /src/pcre2/src/pcre2_compile.c in get_ucp
Shadow bytes around the buggy address:
0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff8000: fa fa 00 00 00 fa fa fa 00 00 00 00 fa fa 00 00
0x0c067fff8010: 00 fa fa fa 00 00 00 fa fa fa 00 00 00 fa fa fa
0x0c067fff8020: fd fd fd fa fa fa 00 00 00 00 fa fa 00 00 01 fa
=>0x0c067fff8030: fa fa 00 00[01]fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==18==ABORTING
I'd appreciate if you could have a look into this. If I call your code incorrectly, please tell me so I can correct this in Qt.
Hello,
the docs say matching has the following option flag:
PCRE2_ENDANCHORED Pattern can match only at end of subject
but it seems to be undefined. PCRE2_ANCHORED on the other hand works as expected, so I wanted to check if this is a known issue.
I recently installed PHP 8.1.0RC6 and Laravel 9 to try out some of the new features. I was unable to load any pages using Laravel, as Apache would crash and I would get a connection reset message.
I've posted a bug report on Laravel's repository here. We've tracked the issue down to a regex. It was suggested that I post the bug on php.net, which I did. After further investigation, it appears that the issue is related to a recent patch to this repository.
The detailed bug description can be found in the links below, but as a short explanation, the following piece of code crashes Apache (using 2.4.51) when pcre.jit = 1, but does not cause a problem when it's executed via CLI:
var_dump(
preg_match(
'(([\\r\\n]{1,1000})|([^\\S\\r\\n]{1,1000})|(\\\\)|(\')|(")|(\\#)|(\\$)|(([^(\\s\\\\\'"\\#\\$)]|\\(|\\)){1,1000}))A',
'Laravel',
$matches
)
);
Bug report on Laravel repository:
laravel/framework#39716
Bug report on bugs.php.net:
https://bugs.php.net/bug.php?id=81647
When PHP is compiled against pcre2-10.39 (the currently packaged version) but run with an older pcre2 such as 10.36 (the previously packaged version), there's a BC break:
=== pcre2-10.35 ===
PHP Warning: preg_match(): Compilation failed: unrecognised compile-time option bit(s) at offset 0 in Command line code on line 1
=== pcre2-10.36 ===
PHP Warning: preg_match(): Compilation failed: unrecognised compile-time option bit(s) at offset 0 in Command line code on line 1
=== pcre2-10.37 ===
PHP Warning: preg_match(): Compilation failed: unrecognised compile-time option bit(s) at offset 0 in Command line code on line 1
=== pcre2-10.38 ===
=== pcre2-10.39 ===
This breaks too many things, so perhaps either the change needs to be reverted/fixed or the SONAME bumped.
I manually bisected the problem: it was introduced in commit 21c2669:
ondrej@calcifer:~/Projects/tmp/pcre2 ((eea410b...))$
PHP Warning: preg_match(): Compilation failed: unrecognised compile-time option bit(s) at offset 0 in Command line code on line 1
vs nothing printed here:
ondrej@calcifer:~/Projects/tmp/pcre2 ((21c2669...))$
That is, PHP compiled against a pcre2 that includes 21c2669 doesn't run without this warning when linked with a pcre2 that doesn't include 21c2669.
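The warning is what a library produces when it checks the caller's option word against the set of bits it knows about. The sketch below illustrates that general technique with hypothetical option names and values (they are not PCRE2's real constants): a caller built against a newer header can pass a bit that an older library has never heard of, and the older library then rejects the whole call.

```c
#include <stdint.h>

/* Hypothetical option bits known to an "old" library build. */
#define OPT_CASELESS      0x00000001u
#define OPT_MULTILINE     0x00000002u
#define OLD_KNOWN_OPTIONS (OPT_CASELESS | OPT_MULTILINE)

/* Hypothetical bit that only appears in a newer release's header. */
#define OPT_NEW_FEATURE   0x00000004u

/* Returns 0 if every set bit is recognised, or -1 (the equivalent of
   "unrecognised compile-time option bit(s)") if any bit outside
   known_mask is set. */
int check_options(uint32_t options, uint32_t known_mask)
{
  return (options & ~known_mask) != 0 ? -1 : 0;
}
```

A caller that must work with several library versions could, for instance, check the runtime version reported by pcre2_config(PCRE2_CONFIG_VERSION, ...) before setting newer bits; the alternatives raised in this report are reverting the change or bumping the SONAME.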
Hi guys, as the title says: is there a guide for installing PCRE2 with GCC in Code::Blocks (MinGW) on Windows? Thanks, all.
This is #2725 from the old Bugzilla.
PH: It is documented at the end of pcre2pattern.3 that COMMIT, PRUNE, and SKIP are confined within a subroutine call in PCRE2, and just cause it to fail to match. I cannot remember why this is so. Subroutine calls appeared in PCRE before they did in Perl, so it might be that this behaviour dates from then, but it might also be because Perl has exhibited some conflicting behaviour in the past.
PH: Experiments show certain inconsistencies in Perl, which documents that (*ACCEPT) stays within a subroutine call, but is not explicit about the others, though it does state that a subroutine is processed as an independent subpattern. For the moment, we are not going to change anything in PCRE, partly because though this is an easy change in the interpreter, it is a substantial upgrade for the JIT.
OnlineCop wrote: The language of pcre2pattern.3 states:
(*SKIP)
This verb, when given without a name, is like (*PRUNE), except that if the pattern is unanchored, the "bumpalong" advance is not to the next character, but to the position in the subject where (*SKIP) was encountered. (*SKIP) signifies that whatever text was matched leading up to it cannot be part of a successful match if there is a later mismatch.
(*FAIL) in a group called as a subroutine has its normal effect: it forces an immediate backtrack.
(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when triggered by being backtracked to in a group called as a subroutine. There is then a backtrack at the outer level.
There is no mention of (*SKIP) or (*PRUNE) being unable to modify the bumpalong of the outer level.
Perl appears to modify the bumpalong before the subroutine match fails, which (like PCRE2) then causes a backtrack at the outer level.
I believe that all other verbs, including (*ACCEPT), are fine as they are; the only change needed here is that (*SKIP) should be able to modify the outer level's bumpalong advance.
If I run the following:
tar -xf pcre2-10.39.tar.bz2
cd pcre2-10.39
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/tmp/pcre2test -G Ninja ..
ninja
ninja install
The ninja install step prints the following:
-- Installing: /tmp/pcre2test/man/man3/pcre2syntax.3
-- Installing: /tmp/pcre2test/man/man3/pcre2unicode.3
-- Up-to-date: /tmp/pcre2test/man/man3/pcre2unicode.3
This is because pcre2unicode.3 is listed twice in cmake_install.cmake. It's also listed twice in install_manifest.txt, which means that if you uninstall using xargs rm < install_manifest.txt, as recommended by CMake, you get an error.
Hi!
There is an error when matching the following pattern:
a(.|\s)*?asdf
against:
a b b bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbf
Specifically: a, followed by newline, followed by 16 spaces, followed by b, space, b, space, 35 b's, then an f.
pcregrep 'a(.|\s)*?asdf' returns:
pcregrep: pcre_exec() gave error -8 while matching this text:
a b b bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbf
pcregrep: Error -8, -21 or -27 means that a resource limit was exceeded.
pcregrep: Check your regex for nested unlimited loops.
Using pcre2 10.37_1, as installed on macOS 11.6 via Homebrew.
Am I missing something obvious? This should work, right?
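Error -8 from pcre_exec() is PCRE_ERROR_MATCHLIMIT, and a likely reason it is hit here is that the alternatives in (.|\s) overlap: a space can be matched by either . or \s, so each space doubles the number of ways the group can consume a stretch of subject, and the matcher may try all of them before concluding that "asdf" never appears. The sketch below only counts those ways for a given string, assuming the default LF newline; it is a back-of-the-envelope illustration with hypothetical function names, not a model of PCRE's actual backtracking.

```c
#include <stddef.h>

/* How many alternatives of (.|\s) can match character c, assuming
   '.' matches anything except '\n' and \s matches space, \t, \n, \v,
   \f, and \r. */
static unsigned alternatives_for(char c)
{
  unsigned n = 0;
  if (c != '\n') n++;                                /* '.'  */
  if (c == ' ' || c == '\t' || c == '\n' ||
      c == '\v' || c == '\f' || c == '\r') n++;      /* '\s' */
  return n;
}

/* Distinct ways (.|\s)* can consume all of s[0..len): the product of the
   per-character counts. A double is used so long inputs do not overflow. */
double ways_to_consume(const char *s, size_t len)
{
  double ways = 1.0;
  for (size_t i = 0; i < len; i++) ways *= alternatives_for(s[i]);
  return ways;
}
```

Eighteen spaces already give 2^18 = 262144 ways. Rewriting the group as [\s\S] or (?s:.) leaves exactly one way to match each character, so a pattern such as a(?s:.)*?asdf should not suffer from this blow-up.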
Perl doesn't match '\n' with '.' unless the "s" modifier is provided, regardless of what the input record separator is, as shown by:
$ printf '\n\na\0' | perl -ne 'BEGIN { $/="\0" } /(?<=\n)(.*)$/ and print $1' | od -c | head -1
0000000 a \0
$ printf '\n\na\0' | perl -ne 'BEGIN { $/="\0" } /(?<=\n)(.*)$/s and print $1' | od -c | head -1
0000000 \n a \0
GNU grep (which uses the old PCRE) shows similar behaviour when using NUL as the line delimiter (-z is for NUL-separated input, and also sets the output line terminator, just like Perl's "-0 -l"):
$ printf '\n\na\0' | ggrep -Pzo '(?<=\n).*$' | od -c | head -1
0000000 a \0
but PCRE2 does not, if the newline is not LF (or a compatible ANY or ANYCRLF). That behaviour is actually validated by the test suite (set 2) and IMHO makes more sense, but it will prevent grep (which has recently been updated to PCRE2 in its unreleased version) from using PCRE2_NEWLINE_NUL, as this change of behaviour might be considered a regression.
$ printf '\n\na\0' | pcre2grep -o -NNUL '(?<=\n).*$' | od -c | head -1
0000000 \n a \n
The documentation explicitly says that no changes in matching are expected when the newline definition is changed, and when the newline is '\n', an equivalent of the "s" mode is provided through PCRE2_DOTALL, making the result the same as Perl's for the modes that have '\n' as a valid newline delimiter, but not if CR or NUL is used. So is there at least a possibility that this might be a "bug"?
FWIW, I confirmed that at least it is not a regression: 8.x, while not supporting NUL, behaves the same when using CR, which is consistent with the behaviour observed in 10.x.
I'm trying to install an agent on my servers, but I have two servers with the same error and I just can't seem to figure it out :S Could you help?
I've installed packages: wget git vim unzip make gcc build-essential php php-cli php-common libapache2-mod-php apache2-utils inotify-tools libpcre2-dev zlib1g-dev libz-dev libssl-dev libevent-dev libssl-dev
I also have: pcre2-10.32 and pcre2-10.39 within /ossec-hids/scr/external
I've tried 'make clean' within /src
I downloaded ossec-hids with git clone https://github.com/ossec/ossec-hids.git and I'm using version 3.6.0 C:
I'm still pretty new to networking, so I'm sure I'm overlooking something C:
5- Installing the system
- Running the Makefile
cc -I./external/compat -DMAX_AGENTS=2048 -DOSSECHIDS -DDEFAULTDIR=\"/var/ossec\" -DUSER=\"ossec\" -DREMUSER=\"ossecr\" -DGROUPGLOBAL=\"ossec\" -DMAILUSER=\"ossecm\" -DLinux -DINOTIFY_ENABLED -DHAVE_SYSTEMD -DZLIB_SYSTEM -DUSE_PCRE2_JIT -DLIBOPENSSL_ENABLED -DCLIENT -Wall -Wextra -I./ -I./headers/ client-agent/agentd.o client-agent/config.o client-agent/event-forward.o client-agent/intcheck_op.o client-agent/main.o client-agent/notify.o client-agent/receiver.o client-agent/receiver-win.o client-agent/sendmsg.o client-agent/start_agent.o os_crypto.a config.a shared.a os_net.a os_regex.a os_xml.a os_zlib.a -lm -lpthread -lsystemd -lpcre2-8 -lssl -lcrypto -lz ./external/compat/imsg.c ./external/compat/imsg-buffer.c -o ossec-agentd
/usr/bin/ld: cannot find -lsystemd
collect2: error: ld returned 1 exit status
make: *** [Makefile:1018: ossec-agentd] Error 1
I usually prefer to support ptr:NULL, length:0 for string manipulations (all string.h functions support it), but it looks like this is not the case for pcre2. I wouldn't mind supporting it though.
Originally posted by @zherczeg in #53 (comment)
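Until such support exists, a caller can normalise the (NULL, 0) pair itself before calling the library. This is a minimal sketch under the assumption that the callee rejects a NULL pointer even when the length is zero; the helper name normalize_empty() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: map (NULL, 0) to a valid pointer to an empty
   string, so an API that rejects NULL still sees an empty subject.
   A NULL pointer with a non-zero length is a caller bug and is passed
   through unchanged for the callee to reject. */
const char *normalize_empty(const char *ptr, size_t length)
{
  static const char empty[1] = "";
  if (ptr == NULL && length == 0) return empty;
  return ptr;
}
```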
Hi,
I am not sure if this is the right place to raise this issue.
On CentOS 7, when we install pcre2 with the below command.
yum install pcre2
It doesn't install the pcre2.h file along with the other files, like the .so.
Am I doing something wrong, or is it a bug in CentOS packaging?
Also, I tried installing using rpm; same issue, no luck.
https://centos.pkgs.org/7/centos-x86_64/pcre2-10.23-2.el7.x86_64.rpm.html
As this page shows under the Files header, it also doesn't install the pcre2.h file.
Thanks in advance for help.
This is issue 2794 from the old Bugzilla, posted by Thomas Tempelmann. This is the proposed patch:
Assuming this is line 3330:
char buffer[FNBUFSIZ];
Then please rename "buffer" into "childpath" for better readability.
Then insert right after 3352 ("sprintf(..."):
#if 1 // <-- replace with test for Linux and BSD (macOS/Darwin)
// prevent endless recursion due to a symlink pointing to a parent dir (Bug 2794)
char resolvedpath[PATH_MAX];
if (realpath(childpath, resolvedpath) == NULL)
continue; // this path is invalid - we can skip processing this
BOOL isSame = strcmp(pathname, resolvedpath) == 0;
if (isSame)
continue; // we have a recursion
strlcat(resolvedpath, "/", sizeof(resolvedpath));
size_t rlen = strlen(resolvedpath);
BOOL contained = strncmp(pathname, resolvedpath, rlen) == 0;
if (contained)
continue; // we have a recursion
resolvedpath[rlen-1] = 0; // removes the added "/"
strlcpy(childpath, resolvedpath, sizeof(childpath));
#endif
I've tested this to work successfully on macOS with my screwed-up symlink.
The tricky part is to tell whether the resolved path is pointing back to where we already were.
With the first strcmp() I check whether it equals the current (parent) directory path. This assumes, though, that "pathname" is also already resolved. If the user passed an unresolved path that points to itself, this recursion detection will fail the first time, but in the recursion the paths will be the same (because I replace childpath with the resolved path further down), and thus the recursion will be stopped. You could also resolve the path at the top of the function, but that's wasteful, IMO.
The second comparison (strncmp()) then checks whether the resolved path is the current path's parent or one of its ancestors. I do this by adding a "/" to the resolved path so that I do not mismatch the case where the resolved path is "/a/b" and the parent is "/a/b2".
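For experimenting outside pcre2grep, the two checks described above can be packaged as a standalone POSIX function. This is a sketch, not the proposed patch: the function name would_recurse() is hypothetical, it uses snprintf() instead of strlcat()/strlcpy() (which are not available in every C library), it adds an explicit case for a link that resolves to "/", and like the patch it assumes `parent` is already resolved.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns true if descending into `child` would
   revisit `parent` or one of its ancestors (e.g. via a symlink to "..").
   `parent` must already be a resolved (canonical) path. Paths that
   cannot be resolved are also reported as "skip" (true). */
bool would_recurse(const char *parent, const char *child)
{
  char resolved[PATH_MAX];
  char with_slash[PATH_MAX + 1];

  if (realpath(child, resolved) == NULL) return true;  /* invalid: skip */

  /* Same directory as the one currently being scanned. */
  if (strcmp(parent, resolved) == 0) return true;

  /* The root directory is an ancestor of every absolute path. */
  if (strcmp(resolved, "/") == 0) return true;

  /* Ancestor check: "/a/b/" must be a prefix of parent. The trailing
     slash stops "/a/b" from being treated as an ancestor of "/a/b2". */
  int n = snprintf(with_slash, sizeof with_slash, "%s/", resolved);
  if (n < 0 || (size_t)n >= sizeof with_slash) return true;  /* too long */
  return strncmp(parent, with_slash, (size_t)n) == 0;
}
```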
Hi,
I don't suppose there's any chance of a porting guide from pcre1 to pcre2, is there, please?
I know you want to be shot of pcre1; I've recently filed bugs against the outstanding packages in Debian which still build against pcre1, and there are a lot of responses of the form "is there any guidance on porting to pcre2?" I don't feel I have deep enough knowledge of the two libraries (especially the older one) to do so myself, but I think having something to point folk at might help in getting more of the remaining ~200(!) packages that still need old-pcre ported, which in turn will make it plausible for me to drop old-pcre from Debian...
Thanks :)
Hi!
This regex:
(((?>[^()]+)|(?R))*)
Causes pcregrep/pcre2grep to bail immediately with error -46:
> pcre2grep '(((?>[^()]+)|(?R))*)' test.txt
pcre2grep: pcre2_match() gave error -46 while matching this text:
...
I have a function that searches for the nearest named group with a given name by measuring the distance between the name table and the capture_last field of the callout structure.
It seems to be buggy.
In other words, how do I get the most recent (normally the only one in the current subroutine) named group when duplicate names are allowed?
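For reference, pcre2api documents the 8-bit name table layout: each entry starts with the group number in two bytes, most significant first, followed by the zero-terminated name, and all entries are padded to the same size (PCRE2_INFO_NAMEENTRYSIZE). The sketch below walks such a table and, for a duplicated name, returns the lowest-numbered group that is actually set in the ovector. It is a self-contained illustration with a hypothetical function name and a hand-built table; in real code pcre2_substring_nametable_scan() performs the name lookup, and whether "lowest set" is the policy you want is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Same value as PCRE2_UNSET, i.e. ~(PCRE2_SIZE)0, for this sketch. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: scan an 8-bit name table of entry_count entries,
   each entry_size bytes, for `name`. Returns the lowest-numbered group
   with that name whose start offset is set, or -1 if none is set.
   ovector holds ovector_pairs (start, end) pairs. */
int first_set_group_by_name(const unsigned char *table, size_t entry_count,
                            size_t entry_size, const char *name,
                            const size_t *ovector, size_t ovector_pairs)
{
  int best = -1;
  for (size_t i = 0; i < entry_count; i++) {
    const unsigned char *entry = table + i * entry_size;
    int number = (entry[0] << 8) | entry[1];  /* big-endian group number */
    if (strcmp((const char *)(entry + 2), name) != 0) continue;
    if ((size_t)number >= ovector_pairs) continue;      /* not in ovector */
    if (ovector[2 * number] == SKETCH_UNSET) continue;  /* group unset */
    if (best < 0 || number < best) best = number;
  }
  return best;
}
```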
With JIT enabled, dc5f966 triggers a bug:
re /(?P<size>\d+)m|M/
with data 4M
will not match, while it should return 'M' (which it does without JIT)
I tried to install some Lua modules for OpenResty; during the installation process it downloads something from ftp.pcre.org, and the connection could not be established.
I tried many times, and the error is:
#5 594.7 curl: (7) Failed to connect to ftp.pcre.org port 443 after 76416 ms: Connection refused
Please see more detail in openresty/docker-openresty#193
The fix in OpenResty: openresty/docker-openresty@a3a1a3a
Hello!
I am trying to compile on Windows with Conan.
Library: pcre/8.45
Operating System: Windows 10 (x86)
Compiler version: VS 15
Conan version: conan 1.43.0
Python version: Python 3.8.0
Conan profile:
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools>conan profile show default
Configuration for profile default:
[settings]
os=Windows
os_build=Windows
arch=x86
arch_build=x86
compiler=Visual Studio
compiler.version=15
build_type=Debug
Error: Full log
pcregrep.obj : error LNK2019: unresolved external symbol _BZ2_bzopen@8 referenced in function _grep_or_recurse [C:\Users\mhanu\.conan\data\pcre\8.45\_\_\build\752948ce548bd345ffbb13d47bc67547d791cc18\build_subfolder\source_subfolder\pcregrep.vcxproj]
pcregrep.obj : error LNK2019: unresolved external symbol _BZ2_bzread@12 referenced in function _pcregrep [C:\Users\mhanu\.conan\data\pcre\8.45\_\_\build\752948ce548bd345ffbb13d47bc67547d791cc18\build_subfolder\source_subfolder\pcregrep.vcxproj]
pcregrep.obj : error LNK2019: unresolved external symbol _BZ2_bzclose@4 referenced in function _grep_or_recurse [C:\Users\mhanu\.conan\data\pcre\8.45\_\_\build\752948ce548bd345ffbb13d47bc67547d791cc18\build_subfolder\source_subfolder\pcregrep.vcxproj]
pcregrep.obj : error LNK2019: unresolved external symbol _BZ2_bzerror@8 referenced in function _grep_or_recurse [C:\Users\mhanu\.conan\data\pcre\8.45\_\_\build\752948ce548bd345ffbb13d47bc67547d791cc18\build_subfolder\source_subfolder\pcregrep.vcxproj]
C:\Users\mhanu\.conan\data\pcre\8.45\_\_\build\752948ce548bd345ffbb13d47bc67547d791cc18\build_subfolder\bin\pcregrep.exe : fatal error LNK1120: 4 unresolved externals [C:\Users\mhanu\.conan\data\pcre\8.45\_\_\build\752948ce548bd345ffbb13d47bc67547d791cc18\build_subfolder\source_subfolder\pcregrep.vcxproj]
pcre/8.45:
pcre/8.45: ERROR: Package '752948ce548bd345ffbb13d47bc67547d791cc18' build failed
I'm forwarding this from https://bugs.php.net/81424, but it seems that this is actually a PCRE2 issue. Consider the following (bad) regex:
/[^{};\/\n]+\{\}/
When run on a large string (e.g. https://pastebin.com/WVBR4f9T), this was fast with PCRE2 10.34 JIT; with PCRE2 10.35 and later it is more than a hundred times slower.
If the regex is rewritten to use a lookbehind assertion (/(?<![{};\/\n]+)\{\}/), performance with the different PCRE2 versions is on par, so you may not consider this worth fixing. :)
There is no performance regression without JIT, so I wonder whether this regex isn't jitted anymore as of PCRE2 10.35.
This issue records several potential upgrades to the handling of character classes in PCRE2. This could be a lot of work in both the interpreters and the JIT.
The current code in the compiler has been hacked into an untidy mess and the compiled code is also messy. A revised implementation is needed that is more uniform and can better handle Unicode characters so as to make matching more efficient. For example, bitmaps could be used for runs of characters other than just 0-0xFF. Or some better coding scheme could be devised.
Perl has an experimental extended class feature as in this example:
/(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])/
Any new compiled format should be able to handle such things.
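As an illustration of the "bitmaps for runs of characters other than just 0-0xFF" idea, here is a sketch of a class stored as a 256-bit bitmap over an arbitrary 256-aligned window of code points, with the union and intersection operations that Perl's extended classes need. The type and function names are hypothetical and this is not PCRE2's compiled format; a real design would also need multiple windows, ranges, and property lookups.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical class: one bitmap covering [base, base + 256). */
typedef struct {
  uint32_t base;     /* multiple of 256 */
  uint8_t bits[32];  /* 256 bits, one per code point in the window */
} window_class;

void wc_init(window_class *c, uint32_t base)
{
  c->base = base & ~(uint32_t)0xFF;
  memset(c->bits, 0, sizeof c->bits);
}

/* Add the inclusive range [lo, hi], clipped to the window. */
void wc_add_range(window_class *c, uint32_t lo, uint32_t hi)
{
  uint32_t wlo = c->base, whi = c->base + 255u;
  if (lo > hi || hi < wlo || lo > whi) return;  /* empty or outside */
  if (lo < wlo) lo = wlo;
  if (hi > whi) hi = whi;
  for (uint32_t off = lo - wlo; off <= hi - wlo; off++)
    c->bits[off >> 3] |= (uint8_t)(1u << (off & 7));
}

bool wc_contains(const window_class *c, uint32_t cp)
{
  if (cp < c->base || cp - c->base >= 256) return false;
  uint32_t off = cp - c->base;
  return (c->bits[off >> 3] >> (off & 7)) & 1u;
}

/* Set operations for classes sharing a window; out may alias a or b. */
void wc_union(window_class *out, const window_class *a, const window_class *b)
{
  out->base = a->base;
  for (int i = 0; i < 32; i++) out->bits[i] = a->bits[i] | b->bits[i];
}

void wc_intersect(window_class *out, const window_class *a,
                  const window_class *b)
{
  out->base = a->base;
  for (int i = 0; i < 32; i++) out->bits[i] = a->bits[i] & b->bits[i];
}
```

Using the Thai (U+0E00..U+0E7F) and Lao (U+0E80..U+0EFF) blocks as rough stand-ins for the script properties, both fall in the window starting at U+0E00, so (Thai + Lao) & Digit reduces to two bitwise passes over 32 bytes.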
Named capture groups have been supported for a while.
What do you think about a software extension that would make it possible to bind regular expressions to identifiers, so that those identifiers can be reused elsewhere?
This is #2767 from the old Bugzilla.
I found that the source code has no exported functions for making these settings:
pcre2_set_max_name_count
pcre2_set_max_name_length
I think these may be needed in some scenarios, and I hope you can add these two exported functions,
as well as the corresponding pcre2_config() options.
Hi,
I wanted to ask whether there is a rough estimate of when 10.39, with the fix, is going to be released.
We are currently using PHP 7.4 packaged from Sury (https://deb.sury.org/) which includes libpcre2-8-0 in version 10.38.
Because of the bug, some of our regexes no longer work, like this one:
<?php
$matches = [];
preg_match('/(^.*phv-0*([1-9]\d{4,})(\D|$).*$)|((\D|^)([1-9]\d{7})(\D|$))/is', 'Mein Vertrag 12345678', $matches);
var_dump($matches);
It would be super cool if you could release the fix (which seems to be already implemented/merged).
This came up in another project, and it turned out to be an issue with the library itself, as it also fails with pcre2test. For a full discussion see this bug:
firasdib/Regex101#1704 (comment)
php/php-src#7994
Basically, given "aa", [a]{2} fails to match, while [a]{1,3} works.
Sorry, I took a weird path to end up here, as I didn't realize this was just a library everybody uses, lol.
Hello.
I have a problem downloading from conan-center.
pcre/8.45: Configuring sources in /root/.conan/data/pcre/8.45/_/_/source
ERROR: Error downloading file https://ftp.pcre.org/pub/pcre/pcre-8.45.tar.gz: 'HTTPSConnectionPool(host='ftp.pcre.org', port=443): Max retries exceeded with url: /pub/pcre/pcre-8.45.tar.gz (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f2dd13fa640>, 'Connection to ftp.pcre.org timed out. (connect timeout=60.0)'))'
Target: https://ftp.pcre.org/pub/pcre
This is a consequence of old Bugzilla #2785, by Jan-Willem Blokland. The patch in that issue was applied, but the minimum CMake version in CMakeLists.txt needs updating (it is very old). These are the final comments:
JWB: Interesting that you see this deprecation warning. It turns out that CMake has deprecated versions older than 2.8.12:
Compatibility with versions of CMake older than 2.8.12 is now deprecated and will be removed from a future version. Calls to cmake_minimum_required() or cmake_policy() that set the policy version to an older value now issue a deprecation diagnostic.
For more details see https://cmake.org/cmake/help/v3.20/release/3.19.html. We could decide to increase the minimum required version to something more recent, like version 3.0 to avoid this warning. If we do so, I am willing to make another update to the CMake build configuration.
PH: Yes, I think we can usefully up the number to 3.0.0, which was released in 2014, so it's unlikely to catch anybody. I will do it sometime.
Regarding which Unicode properties are supported, the manual says:
The property names represented by xx above are limited to the Unicode script names, the general category properties, "Any", which matches any character (including newline), and some special PCRE properties (described in the next section). Other Perl properties such as "InMusicalSymbols" are not currently supported by PCRE. Note that \P{Any} does not match any characters, so always causes a match failure.
We have users who want support for the Bidi_Control property (semgrep/semgrep#3974), which is supported by Perl (and also by Go's regexp library). I'm not familiar with any of these implementations, and I'm wondering why PCRE doesn't support all Unicode properties. Is it because they were added late and PCRE needs to catch up, or is there a technical reason?
Note that we're using PCRE from OCaml for which there hasn't been an effort to migrate to pcre2. So if we extend PCRE2 with support for more Unicode properties, we'll be unable to use it from OCaml unless we also port these changes to the old PCRE or we change the OCaml bindings to support the new API. It's really a separate issue but I thought I should mention it.
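To give a sense of what supporting one more binary property involves at the lookup level, here is a sketch of a Bidi_Control test. The ranges are taken from the Unicode PropList.txt data file (U+061C, U+200E..U+200F, U+202A..U+202E, U+2066..U+2069); the function name is hypothetical, and a real implementation would generate such tables from the UCD rather than hard-code them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } cp_range;

/* Bidi_Control ranges from Unicode PropList.txt, sorted ascending. */
static const cp_range bidi_control[] = {
  { 0x061C, 0x061C },  /* ARABIC LETTER MARK */
  { 0x200E, 0x200F },  /* LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK */
  { 0x202A, 0x202E },  /* LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE */
  { 0x2066, 0x2069 },  /* LEFT-TO-RIGHT ISOLATE..POP DIRECTIONAL ISOLATE */
};

/* Binary search over sorted, non-overlapping ranges. */
bool has_bidi_control(uint32_t cp)
{
  size_t lo = 0, hi = sizeof bidi_control / sizeof bidi_control[0];
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (cp < bidi_control[mid].lo) hi = mid;
    else if (cp > bidi_control[mid].hi) lo = mid + 1;
    else return true;
  }
  return false;
}
```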
Hi
https://ftp.pcre.org/ always times out, and my server cannot complete the installation (CWP7).
Please help
This is #2762 in the old Bugzilla, submitted by Milian Wolff.
This is probably related to BUG 2621 except that I'm running with PCRE2 version 10.37 2021-05-26 and the specific issue from that bug doesn't reproduce anymore.
Instead, I'm running into the following reduced issue:
works:
printf '%s\n%s\n' '/\/([^\/]+)\/\d+/' '/A/B/0' | pcre2test
PCRE2 version 10.37 2021-05-26
/\/([^\/]+)\/\d+/
/A/B/0
0: /B/0
1: B
does not work:
printf '%s\n%s\n' '/\/([^\/]+)\/\d+/' '/A/B/0' | pcre2test -jit
PCRE2 version 10.37 2021-05-26
/\/([^\/]+)\/\d+/
/A/B/0
No match
Slight changes to the pattern make the issue go away.
Also please note that bugzilla is missing an entry for version 10.37, as such I selected N/A for now.
This is a low-priority feature request, but one that would make sense to be done eventually.
https://github.com/PhilipHazel/pcre2/tree/master/src/sljit
Based on the files in this folder, it seems that currently PCRE supports many architectures, including x86, ARM, PPC, MIPS, SPARC, and S390X. RISC-V is a new open source ISA supported by several Linux distros, and it would be nice if PCRE supported RISC-V.
Specifically, I am only interested in 64-bit RISC-V, aka RV64 (32-bit and 128-bit variants exist too), and as for extensions, I recommend just supporting the general-purpose extensions, so -march=rv64g (or -march=rv64gc). The abbreviation of RISC-V to just RV is very common and is the preferred option in some situations, so feel free to use that name.