mattn / mruby-onig-regexp
mrbgem of 鬼雲's Regular Expression
On a clean mruby build from #958d5b7, with mruby-onig-regexp as the only gem added to the default gemset, executing the sample code given in the README segfaults.
I'm working on OS X 10.8.4 with the latest devtools.
Am I the only one?
Segmentation fault occurs in the following test when MRB_UTF8_STRING is defined.
assert_raise(ArgumentError) { "\xf0".gsub(/[^a]/,"X") }
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x107ffffe0)
* frame #0: 0x000000019a034328 libsystem_platform.dylib`_platform_memmove + 536
frame #1: 0x0000000100044ac8 mrbtest`str_init_embed(s=0x00000001200080f0, p="", len=-3) at string.c:53:10
frame #2: 0x0000000100044d68 mrbtest`str_new(mrb=0x0000000102008200, p="", len=-3) at string.c:121:12
frame #3: 0x0000000100044d0c mrbtest`mrb_str_new(mrb=0x0000000102008200, p="", len=-3) at string.c:160:24
frame #4: 0x00000001000938c8 mrbtest`str_substr(mrb=0x0000000102008200, str=(w = 4831871456), beg=4, len=-3) at mruby_onig_regexp.c:105:10
frame #5: 0x000000010008ff7c mrbtest`match_data_post_match(mrb=0x0000000102008200, self=(w = 4831871936)) at mruby_onig_regexp.c:646:10
frame #6: 0x000000010005510c mrbtest`mrb_funcall_with_block(mrb=0x0000000102008200, self=(w = 4831871936), mid=1118, argc=0, argv=0x000000016fdfd958, blk=(w = 0)) at vm.c:561:13
frame #7: 0x00000001000549c4 mrbtest`mrb_funcall_argv(mrb=0x0000000102008200, self=(w = 4831871936), mid=1118, argc=0, argv=0x000000016fdfd958) at vm.c:577:10
frame #8: 0x0000000100054924 mrbtest`mrb_funcall(mrb=0x0000000102008200, self=(w = 4831871936), name="post_match", argc=0) at vm.c:374:10
frame #9: 0x0000000100092f38 mrbtest`onig_match_common(mrb=0x0000000102008200, reg=0x000000010180a7e0, match_value=(w = 4831871936), str=(w = 4831873760), pos=0) at mruby_onig_regexp.c:199:16
frame #10: 0x000000010009056c mrbtest`string_gsub(mrb=0x0000000102008200, self=(w = 4831873760)) at mruby_onig_regexp.c:794:8
frame #11: 0x00000001000594f8 mrbtest`mrb_vm_exec(mrb=0x0000000102008200, proc=0x0000000120008b40, pc="8\U00000002") at vm.c:1636:18
frame #12: 0x0000000100056cf8 mrbtest`mrb_vm_run(mrb=0x0000000102008200, proc=0x000000012000fa10, self=(w = 4831919648), stack_keep=0) at vm.c:1131:12
frame #13: 0x0000000100055d30 mrbtest`mrb_top_run(mrb=0x0000000102008200, proc=0x000000012000fa10, self=(w = 4831919648), stack_keep=0) at vm.c:3040:12
frame #14: 0x0000000100037310 mrbtest`load_irep(mrb=0x0000000102008200, proc=0x000000012000fa10, c=0x0000000000000000) at load.c:681:10
frame #15: 0x0000000100037224 mrbtest`mrb_load_irep_cxt(mrb=0x0000000102008200, bin="RITE0300", c=0x0000000000000000) at load.c:689:10
frame #16: 0x00000001000373a0 mrbtest`mrb_load_irep(mrb=0x0000000102008200, bin="RITE0300") at load.c:701:10
frame #17: 0x00000001000058a0 mrbtest`GENERATED_TMP_mrb_mruby_enum_ext_gem_test(mrb=0x0000000100809800) at gem_test.c:588:3
frame #18: 0x0000000100004ef8 mrbtest`mrbgemtest_init(mrb=0x0000000100809800) at mrbtest.c:54:5
frame #19: 0x0000000100003d40 mrbtest`main(argc=1, argv=0x000000016fdff718) at driver.c:304:3
frame #20: 0x00000001002b10f4 dyld`start + 520
Hi,
I'm having some issues splitting strings with Regexps in mruby-onig-regexp.
The following:
"<%= 1 + 1 %>".split(/(<%=)|(%>)/)
doesn't split and returns ["<%= 1 + 1 %>"]. The same expression splits fine in MRI. I checked that mruby-onig-regexp substitutes String#split with a Regexp-aware implementation, so either I'm doing something seriously wrong or there is a bug in mruby-onig-regexp.
thanks,
Ricardo
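For comparison, the MRI behavior the report refers to can be checked with a short script. This is a sketch run under CRuby, not mruby, and the constant names are illustrative. When the pattern contains capture groups, CRuby's String#split also returns the captured delimiters and skips groups that did not take part in the match.

```ruby
# CRuby reference behavior for split with capture groups (not mruby output).
TEMPLATE = "<%= 1 + 1 %>"
PARTS = TEMPLATE.split(/(<%=)|(%>)/)

# The delimiters are kept because they are captured, so joining the
# pieces reproduces the original string.
p PARTS
puts PARTS.join == TEMPLATE
```

A result with a single element, as reported for mruby, would mean the pattern was never applied as a delimiter.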
I found a bug (strange error raised):
$ ./bin/mruby -e 'p "\xf0".gsub(/[^a]/,"X")'
trace:
[0] -e:1
-e:1: string size too big (ArgumentError)
When using onig_regexp_gsub and OnigRegexp#new, the same error is raised.
$ ./bin/mruby -e 'p "\xf0".onig_regexp_gsub(OnigRegexp.new("[^a]"),"X")'
trace:
[0] -e:1
-e:1: string size too big (ArgumentError)
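The test quoted earlier expects an ArgumentError for this input, so the open question is mainly which message is raised and where. A guard along the following lines illustrates rejecting the string before it reaches the matcher. It is a sketch written against CRuby; safe_gsub is a hypothetical helper, and whether String#valid_encoding? is available depends on the mruby build.

```ruby
# Hypothetical helper: validate the encoding before running the regexp,
# so broken input fails early with a descriptive ArgumentError.
def safe_gsub(str, pattern, replacement)
  unless str.valid_encoding?
    raise ArgumentError, "invalid byte sequence in #{str.encoding}"
  end
  str.gsub(pattern, replacement)
end

# Valid input is processed normally.
puts safe_gsub("abc", /[^a]/, "X")

# A lone 0xF0 byte is not valid UTF-8 and is rejected.
begin
  safe_gsub("\xf0", /[^a]/, "X")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```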
I used 'mingw-w64-x86_64-onigmo-6.2.0-1-any.pkg.tar'.
The build fails partway through with this error:
./libtool: line 1727: lib: command not found
make[2]: *** [libonigmo.la] Error 127
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
rake aborted!
make -j1 failed
./mruby/build/mrbgems/mruby-onig-regexp/mrbgem.rake:47:in `run_command'
./mruby/build/mrbgems/mruby-onig-regexp/mrbgem.rake:70:in `block (2 levels) in bundle_onigmo'
./mruby/build/mrbgems/mruby-onig-regexp/mrbgem.rake:56:in `chdir'
./mruby/build/mrbgems/mruby-onig-regexp/mrbgem.rake:56:in `block in bundle_onigmo'
Here is my environment.
Hardware Overview:
Model Name: Mac mini
Model Identifier: Macmini8,1
Processor Name: 6-Core Intel Core i7
Processor Speed: 3.2 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 12 MB
Hyper-Threading Technology: Enabled
Memory: 64 GB
System Firmware Version: 1715.60.5.0.0 (iBridge: 19.16.10647.0.0,0)
OS Loader Version: 540.60.2~89
clang -v
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: x86_64-apple-darwin21.2.0
Thread model: posix
I could not compile for 64-bit Windows on an Intel Mac.
When I put conf.enable_cxx_api into my build_config.rb, building mruby-onig-regexp fails because it tries to build onigmo with a C++ compiler, but onigmo cannot be built with a C++ compiler (e.g., it has a struct field named not, which is a keyword in C++).
I see two ways to fix this: (1) build this mrbgem with CC even if the C++ API is enabled or (2) try to change Onigmo to build with a C++ compiler.
Is there a way to freeze a regex in a variable that should not change throughout its lifetime? I couldn't get the match working.
Working:
/^[A]{1}[0-9]{6}[A-Z]{1}$/.match?("A123456A".freeze) ? "" : res << {"IC" => "err1"}
Wanted:
ic = Regex.new("/^[A]{1}[0-9]{6}[A-Z]{1}$").freeze
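One possible cause, assuming the snippet above is what was actually run: the class is named Regexp rather than Regex, and Regexp.new takes the pattern source without the surrounding slashes. A minimal sketch of a frozen, reusable pattern follows. IC_PATTERN and valid_ic? are illustrative names, and the code was written against CRuby semantics, so Regexp#match? and freeze on a Regexp are assumed to behave the same way under the gem.

```ruby
# Pattern source is passed without the leading and trailing slashes.
IC_PATTERN = Regexp.new("^[A]{1}[0-9]{6}[A-Z]{1}$").freeze

# Hypothetical helper that reuses the frozen pattern.
def valid_ic?(value)
  IC_PATTERN.match?(value)
end

puts valid_ic?("A123456A")
puts valid_ic?("B123456A")
```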
$ cat t.rb
"Text\nFoo".gsub(/^/, ' ')
$ ruby t.rb
" Text\n Foo"
$ mruby t.rb
" Text\n ext\n xt\n t\n \n Foo"
[koji@macbookpro:~/work/mruby/mruby]$ mirb
mirb - Embeddable Interactive Ruby Shell
This is a very early version, please test and report errors.
Thanks :)
> Regexp
=> OnigRegexp
> "foobar" =~ /bar/
Segmentation fault: 11
I'm using Mac OS X Mountain Lion (10.8); clang is:
[koji@macbookpro:~/work/mruby/mruby]$ clang --version
Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
Hi @mattn,
I'm getting at the moment the following error while building:
build/mrbgems/mruby-onig-regexp/src/mruby_orig_regexp.c:10:23: fatal error: onigposix.h: No such file or directory
Does it maybe make sense to integrate oniguruma directly into the GEM? What is your plan about that?
Regards
Daniel
"hello".gsub(//, ".")
# Expect => ".h.e.l.l.o."
# Actual => infinite loop
https://github.com/ruby/ruby/blob/745f4dd5b834a00f1cc201adb72ea3c8c8d4decb/string.c#L5005-L5006
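A common way to avoid this kind of loop is to step one character past any zero-width match and to keep searching the original string from an offset, so that anchors still see the surrounding context. The following is a pure-Ruby sketch of that idea, not the gem's C implementation; gsub_sketch is a hypothetical name, and the replacement is treated as a literal string without backreference expansion.

```ruby
# Sketch of a gsub loop that terminates on zero-width matches.
def gsub_sketch(str, pattern, replacement)
  result = "".dup
  pos = 0
  while pos <= str.length
    # Search the full string from pos, so anchors like ^ are evaluated
    # against the original context rather than a substring.
    m = pattern.match(str, pos)
    break unless m
    result << str[pos...m.begin(0)] << replacement
    if m.end(0) == m.begin(0)
      # Zero-width match: copy one character and step past it.
      result << str[m.begin(0), 1]
      pos = m.end(0) + 1
    else
      pos = m.end(0)
    end
  end
  # Copy whatever follows the last match.
  result << str[pos..-1] if pos < str.length
  result
end

p gsub_sketch("hello", //, ".")
p gsub_sketch("Text\nFoo", /^/, " ")
```

Under CRuby this prints the values the reports list as expected, ".h.e.l.l.o." and " Text\n Foo".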
Hi, @mattn
rake all test failed with mruby.
Please see travis test.
https://travis-ci.org/matsumoto-r/mrbgem_test_ci/builds/22588120#L2034-L2040
Fail: $1 to $9
- Assertion[1] Failed: Expected to be equal
Expected: "aaab"
Actual: nil
- Assertion[2] Failed: Expected to be equal
Expected: "b"
Actual: nil
Ever since #52, I've been having problems building this library in my mruby-cli project. It can't seem to find "oniguruma.h". I'm using mruby 1.2.0. Reverting to the previous commit fixes it for me.
CC build/mrbgems/mruby-onig-regexp/src/mruby_onig_regexp.c -> build/host/mrbgems/mruby-onig-regexp/src/mruby_onig_regexp.o
/home/mruby/code/mruby/build/mrbgems/mruby-onig-regexp/src/mruby_onig_regexp.c:44:23: fatal error: oniguruma.h: No such file or directory
#include "oniguruma.h"
Hi.
I'm not very familiar with oniguruma, so please excuse any possible misunderstandings.
Your gem always causes segfaults/aborts for me (even the example) in mruby's gc heap cleanup. Valgrind reports multiple invalid writes in "regcomp" and "regexec". From what I have seen, oniguruma defines the type regex_t and the above functions, but those same types/symbols are also defined in the POSIX header "regex.h", and when I trace my program, I see that indeed at e.g. "regcomp" the CPU does not jump into libonig but into libc.
Now, I'm sure that the regex_t type defined in the POSIX header is bigger than the one in "onigposix.h", so what I think happens is that the gem allocates a too small regex_t and regcomp/regexec defined in libc access and write to unallocated memory.
This is how regex_t looks in onigposix.h:
typedef struct {
void* onig; /* Oniguruma regex_t* */
size_t re_nsub;
int comp_options;
} regex_t;
And this is how it looks in the posix header regex.h:
struct re_pattern_buffer
{
unsigned char *__REPB_PREFIX(buffer);
unsigned long int __REPB_PREFIX(allocated);
unsigned long int __REPB_PREFIX(used);
reg_syntax_t __REPB_PREFIX(syntax);
char *__REPB_PREFIX(fastmap);
size_t re_nsub;
unsigned __REPB_PREFIX(can_be_null) : 1;
unsigned __REPB_PREFIX(regs_allocated) : 2;
unsigned __REPB_PREFIX(fastmap_accurate) : 1;
unsigned __REPB_PREFIX(no_sub) : 1;
unsigned __REPB_PREFIX(not_bol) : 1;
unsigned __REPB_PREFIX(not_eol) : 1;
unsigned __REPB_PREFIX(newline_anchor) : 1;
};
typedef struct re_pattern_buffer regex_t;
I am using oniguruma 5.9.4
Edit: I just tried it out. In onig_regexp_init:
reg = malloc(sizeof(struct mrb_onig_regexp));
this allocates 32 bytes.
But if I test sizeof(regex_t) with "regex.h" posix header, I get 64 bytes size.
(I am on a 64 bit Linux system btw.)
I just discovered a bug when testing a string against a regex with match?.
Does not match, but should:
if /jpg/.match?(env["HTTP_ACCEPT"])
Works, matches as expected:
if /jpg/.match(env["HTTP_ACCEPT"])
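In CRuby the two calls agree on whether the pattern matched; match? returns a boolean without building MatchData. A small check of that expectation follows, with an illustrative header value standing in for env["HTTP_ACCEPT"], since the real value is not given in the report. ACCEPT and agrees? are hypothetical names.

```ruby
ACCEPT = "image/jpg,image/png" # illustrative value, not from the report

# True when match? and match give the same answer for this input.
def agrees?(pattern, value)
  pattern.match?(value) == !pattern.match(value).nil?
end

puts agrees?(/jpg/, ACCEPT)
puts agrees?(/gif/, ACCEPT)
```

Running the same check under mruby with the gem loaded should show whether the two methods diverge for a given input.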
Some methods in mruby, like String#slice, should be able to use OnigRegexp. For example,
# in MRI
>> "123hello456".slice(/[a-z]+/) #=> "hello"
# in mruby
> "123hello456".slice(/[a-z]+/)
(mirb):1: Regexp class not implemented (NotImplementedError)
> "123hello456".slice(OnigRegexp.new("/[a-z]+/"))
(mirb):2: Regexp class not implemented (NotImplementedError)
It seems they are using mrb_regexp_check.
Any ideas?
Would it be possible to support ranges?
For example, in URI we can split the parts with destructuring assignment:
scheme, host, port, path, query = $~[1..-1]
However, this will result in invalid MatchData index type: 1..-1 in mruby.
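Until range indexes are supported, one workaround is to convert the match to an array first and slice that. The sketch below assumes MatchData#to_a is available, as it is in CRuby; whether the gem's MatchData implements it should be checked. URI_PATTERN, split_uri and the example URL are illustrative.

```ruby
# Illustrative pattern with five capture groups.
URI_PATTERN = %r{\A(\w+)://([^:/]+):(\d+)(/[^?]*)\?(.*)\z}

# Returns the captures as an array, or nil when the URL does not match.
def split_uri(url)
  m = URI_PATTERN.match(url)
  return nil unless m
  # Slice the array form instead of indexing MatchData with a range.
  m.to_a[1..-1]
end

scheme, host, port, path, query = split_uri("http://example.com:8080/index.html?q=1")
p [scheme, host, port, path, query]
```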
According to the ISO spec, if the second argument of Regexp#initialize is anything other than an Integer, nil, or false, the flag should be REG_ICASE.
Therefore, Regexp.new("abc", "im") should be equal to Regexp.new("abc", Regexp::IGNORECASE), not to Regexp.new("abc", Regexp::IGNORECASE | Regexp::MULTILINE).
CRuby's Example
a = Regexp.new("ab.*END", "im")
p a.match("ab end\n end")[0]
# => "ab end" ('.' does not match "\n").
Enable MRB_UTF8_STRING in mrbconf.h:
"あいうえお".split("")
#=> ["あ", "い", "う", "え", "お"]
"あいうえお".onig_regexp_split(OnigRegexp.new(""))
#=> Assertion failed: (((((const mrb_value*)((struct RArray*)((result).value.p))->ptr)[i - 1]).tt == MRB_TT_STRING)), function string_split, file /Users/ksss/src/github.com/ksss/mruby-onig-regexp/src/mruby_onig_regexp.c, line 857.
# Expect => ["あ", "い", "う", "え", "お"]
See #64
I'm trying to cross-compile this gem. Unfortunately, it does not succeed. Are there documented steps for doing that? Thanks
ref #90
I want to discuss group names.
mruby-onig-regexp can use non-word characters in group names when linked against oniguruma,
but it cannot when linked against onigmo.
| link with | oniguruma.so | onigmo.so | bundled onigmo |
|---|---|---|---|
| non-word | OK | NG | NG |
So mgems that depend on mruby-onig-regexp (e.g. mruby-uri) can't use non-word characters in group names.
CRuby can use non-word characters in group names.
$ ruby -e 'p Regexp.new("bad(?<aa-bb>.*)").match("badboy")["aa-bb"]'
"boy"
I investigated the background.
'-') in group name.
I think we have some solutions.
ONIG_OPTION_ALLOW_NON_WORD_CHAR_IN_CAPTURE_GROUP.
diff --git a/mrbgem.rake b/mrbgem.rake
index 0d37b12..1404353 100644
--- a/mrbgem.rake
+++ b/mrbgem.rake
@@ -66,6 +66,7 @@ MRuby::Gem::Specification.new('mruby-onig-regexp') do |spec|
_pp 'autotools', oniguruma_dir
run_command e, './autogen.sh' if File.exists? 'autogen.sh'
run_command e, "./configure --disable-shared --enable-static #{host}"
+ run_command e, "patch -p1 < #{dir}/onigmo-#{version}.patch"
run_command e, "make -j#{$rake_jobs || 1}"
else
run_command e, 'cmd /c "copy /Y win32 > NUL"'
diff --git a/onigmo-6.1.3.patch b/onigmo-6.1.3.patch
new file mode 100644
index 0000000..4163913
--- /dev/null
+++ b/onigmo-6.1.3.patch
@@ -0,0 +1,13 @@
+diff --git a/regparse.c b/regparse.c
+index 431aad9..54563ac 100644
+--- a/regparse.c
++++ b/regparse.c
+@@ -2509,7 +2509,7 @@ get_name_end_code_point(OnigCodePoint start)
+ # ifdef RUBY
+ # define ONIGENC_IS_CODE_NAME(enc, c) TRUE
+ # else
+-# define ONIGENC_IS_CODE_NAME(enc, c) ONIGENC_IS_CODE_WORD(enc, c)
++# define ONIGENC_IS_CODE_NAME(enc, c) TRUE
+ # endif
+
+ # ifdef USE_BACKREF_WITH_LEVEL
I think mruby's regexp should support non-word characters in group names,
because the specification of mruby should stay as close as possible to CRuby.
But I would have to fork mruby-onig-regexp to resolve the problem completely.
What do you think?
Bug report received by email
■ Test code
'abc'.slice!(/./m)
■ Error output (mruby 1.4.0)
trace (most recent call last):
[0] test.rb:1
[1] c:\mruby\mrbgems\mruby-string-ext\mrblib\string.rb:207:in slice!
c:\mruby\mrbgems\mruby-string-ext\mrblib\string.rb:207: undefined method '<' (NoMethodError)
■ Caller: onig_regexp.rb
def slice!(*args)
if args.size < 2
result = slice(*args) ### <- the error occurs beyond this call
■ Relevant code: string.rb (mruby 1.4.0)
def slice!(arg1, arg2=nil)
raise FrozenError, "can't modify frozen String" if frozen?
raise "wrong number of arguments (for 1..2)" if arg1.nil? && arg2.nil?
if !arg1.nil? && !arg2.nil?
idx = arg1
idx += self.size if arg1 < 0 #### <- the error occurs here
■ Relevant code: string.rb (mruby 2.0.1)
def slice!(arg1, arg2=nil)
(omitted)
if arg1.kind_of?(Range)
(omitted)
elsif arg1.kind_of?(String)
validated = true
else
idx = arg1
idx += self.size if arg1 < 0 #### <- the error occurs here
I have only recently started with both mruby and Ruby and am a complete beginner,
but the upstream String.slice! only treats arg1 as a string or a number,
so when arg1 is a Regexp, wouldn't it need separate handling instead of calling that method?
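The separate handling suggested in the report could look roughly like the following. This is a sketch of the dispatch idea only, written against CRuby semantics; slice_bang_sketch is a hypothetical free-standing helper, not the gem's or mruby-string-ext's actual code.

```ruby
# Handle a Regexp argument before falling back to the built-in slice!,
# which only understands integers, ranges and strings in the quoted code.
def slice_bang_sketch(str, arg)
  return str.slice!(arg) unless arg.is_a?(Regexp)
  m = arg.match(str)
  return nil unless m
  matched = m[0]
  # Remove the matched portion in place and return it.
  str[m.begin(0), matched.length] = ""
  matched
end

s = "abc".dup
p slice_bang_sketch(s, /./m)
p s
```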
Compiling with Visual Studio 16.0 produces the following unresolved external symbol errors:
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_error_code_to_str referenced in function onig_match_common
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_new referenced in function onig_regexp_initialize
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_free referenced in function onig_regexp_free
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_search referenced in function onig_match_common
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_region_new referenced in function create_onig_region
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_region_free referenced in function match_data_free
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_region_copy referenced in function match_data_copy
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_name_to_backref_number referenced in function append_replace_str
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_get_encoding referenced in function onig_regexp_inspect
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_get_options referenced in function onig_regexp_casefold_p
libmruby.lib(mruby_onig_regexp.obj) : error LNK2019: unresolved external symbol onig_version referenced in function onig_regexp_version
libmruby.lib(mruby_onig_regexp.obj) : error LNK2001: unresolved external symbol OnigEncodingASCII
libmruby.lib(mruby_onig_regexp.obj) : error LNK2001: unresolved external symbol OnigEncodingUTF_8
libmruby.lib(mruby_onig_regexp.obj) : error LNK2001: unresolved external symbol OnigSyntaxRuby
libmruby.lib(mruby_onig_regexp.obj) : error LNK2001: unresolved external symbol OnigDefaultSyntax
mrbtest.exe : fatal error LNK1120: 15 unresolved externals
Visual studio version: 16 (VS 2019)
Arch: x86_64
Mruby version: 2.1.2
Build config:
MRuby::Build.new do |conf|
  if ENV['VisualStudioVersion'] || ENV['VSINSTALLDIR']
    conf.toolchain :visualcpp
  else
    conf.toolchain :gcc
  end
  conf.gembox 'default'
  conf.gem github: 'mattn/mruby-onig-regexp'
  conf.enable_test
  if ENV['DEBUG'] == 'true'
    conf.enable_debug
    conf.cc.defines = %w[MRB_ENABLE_DEBUG_HOOK]
    conf.gem core: 'mruby-bin-debugger'
  end
end
"".slice(/^.*$/)
In Ruby this returns an empty string (""), but in mruby nil is returned.
oniguruma name
https://github.com/kkos/oniguruma/blob/master/README.ja
My build machine has libonig installed for reasons unrelated to mruby. When I build my project, it links against that version instead of the one that comes with the gem. Since the installed libonig is a shared library and the project is intended to produce a distributable binary, this is a problem.
I would like a way to force the gem to always use its own version of the library even if there's a version present on the system.
I've coded the change and will submit a pull request.
How can I integrate this mrbgem to run on iOS and Android devices? I can compile all the code without errors, but when I run the test code on my MacBook, the error "ld: symbol(s) not found for architecture x86_64" occurs. Thanks!!