ohler55 / ox
Ruby Optimized XML Parser
Home Page: http://www.ohler.com/ox
License: MIT License
This is just an idea/suggestion. I was thinking more about what I wrote in #40, and the following suggestion somewhat conflicts with the API change suggested there.
What if Ox::Element had extra methods that filter its nodes? There is already one filter method, #text. It would be nice to also have #elements, #comments, etc., each of which just provides something like:
def elements
  nodes.select { |node| node.is_a? Ox::Element }
end
We could then have:
doc = Ox.load("<foo><!-- nice comment --><bar/>some text</foo>")
#=> #<Ox::Element:0x00000001975aa8 @value="foo", @nodes=[#<Ox::Comment:0x00000001975a30 @value="nice comment">, #<Ox::Element:0x00000001975828 @value="bar">, "some text"]>
doc.text
#=> "some text"
doc.elements
#=> [#<Ox::Element:0x00000001975828 @value="bar">]
doc.comments
#=> [#<Ox::Comment:0x00000001975a30 @value="nice comment">]
etcetera.
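The proposed filters are one-liners over the node list. Below is a minimal, self-contained sketch of the idea. It uses hypothetical stand-in classes (SketchElement, SketchComment) so it runs without the gem, and #texts is likewise a hypothetical name, chosen so it does not clash with the existing #text, which returns a single string. Enumerable#grep matches with ===, which for a class is an is_a? test, so nodes_of behaves like the select in the proposal:

```ruby
# Shared filter helper: pick out the child nodes of a given class.
module NodeFilters
  def nodes_of(klass)
    nodes.grep(klass) # grep tests klass === node, i.e. node.is_a?(klass)
  end
end

# Hypothetical stand-ins for Ox::Comment and Ox::Element.
SketchComment = Struct.new(:value)

SketchElement = Struct.new(:value, :nodes) do
  include NodeFilters

  def elements
    nodes_of(SketchElement)
  end

  def comments
    nodes_of(SketchComment)
  end

  # All text children (plain Strings), not just the first one.
  def texts
    nodes_of(String)
  end
end

doc = SketchElement.new('foo', [SketchComment.new('nice comment'),
                                SketchElement.new('bar', []),
                                'some text'])
doc.elements.map(&:value) # => ["bar"]
doc.comments.map(&:value) # => ["nice comment"]
doc.texts                 # => ["some text"]
```

With the gem loaded, the same one-liners could be added by reopening Ox::Element and calling nodes.grep(Ox::Element), nodes.grep(Ox::Comment), and so on.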
Clearly this isn't valid XML (a document may have only one root element), but Ox shouldn't segfault either.
mcarpenter@ubuntu:/tmp$ uname -a
Linux ubuntu 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
mcarpenter@ubuntu:/tmp$ ruby --version
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]
mcarpenter@ubuntu:/tmp$ gem list ox
*** LOCAL GEMS ***
ox (1.8.0)
mcarpenter@ubuntu:/tmp$ irb
ruby-1.9.2-p180 :001 > require 'ox'
=> true
ruby-1.9.2-p180 :002 > Ox.parse('<foo></foo>')
=> #<Ox::Element:0x000000019184e8 @value="foo", @nodes=[]>
ruby-1.9.2-p180 :003 > Ox.parse('<foo></foo><foo></foo>')
(irb):3: [BUG] Segmentation fault
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]
-- control frame ----------
c:0024 p:---- s:0086 b:0086 l:000085 d:000085 CFUNC :parse
c:0023 p:0017 s:0082 b:0082 l:000a98 d:000081 EVAL (irb):3
c:0022 p:---- s:0080 b:0080 l:000079 d:000079 FINISH
c:0021 p:---- s:0078 b:0078 l:000077 d:000077 CFUNC :eval
c:0020 p:0028 s:0071 b:0071 l:000070 d:000070 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/workspace.rb:80
c:0019 p:0033 s:0064 b:0063 l:000062 d:000062 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/context.rb:254
c:0018 p:0031 s:0058 b:0058 l:000eb8 d:000057 BLOCK /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:159
c:0017 p:0042 s:0050 b:0050 l:000049 d:000049 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:273
c:0016 p:0011 s:0045 b:0045 l:000eb8 d:000044 BLOCK /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:156
c:0015 p:0144 s:0041 b:0041 l:000024 d:000040 BLOCK /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:243
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:228
c:0008 p:0046 s:0022 b:0022 l:000eb8 d:000eb8 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:155
c:0007 p:0011 s:0019 b:0019 l:000af8 d:000018 BLOCK /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:70
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC :catch
c:0004 p:0183 s:0011 b:0011 l:000af8 d:000af8 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:69
c:0003 p:0142 s:0006 b:0006 l:000ec8 d:000318 EVAL /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/bin/irb:16
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000ec8 d:000ec8 TOP
---------------------------
-- Ruby level backtrace information ----------------------------------------
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/bin/irb:16:in `<main>'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:69:in `start'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:69:in `catch'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:70:in `block in start'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:155:in `eval_input'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `each_top_level_statement'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `catch'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `block in each_top_level_statement'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `loop'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in `block (2 levels) in each_top_level_statement'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:156:in `block in eval_input'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:273:in `signal_status'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:159:in `block (2 levels) in eval_input'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/context.rb:254:in `evaluate'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/workspace.rb:80:in `evaluate'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/workspace.rb:80:in `eval'
(irb):3:in `irb_binding'
(irb):3:in `parse'
-- C level backtrace information -------------------------------------------
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_vm_bugreport+0x61) [0x7f9452562101]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x5f24e) [0x7f945244c24e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_bug+0xa5) [0x7f945244d075]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x10b874) [0x7f94524f8874]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f94520644a0]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x300c6) [0x7f945241d0c6]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(+0x14041) [0x7f9450176041]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(+0x8f88) [0x7f945016af88]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(ox_parse+0x133) [0x7f945016bd33]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(+0xee23) [0x7f9450170e23]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16ac53) [0x7f9452557c53]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_f_eval+0xbf) [0x7f94525580ff]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16eebf) [0x7f945255bebf]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_rescue2+0x16b) [0x7f94524535bb]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16134e) [0x7f945254e34e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16f82e) [0x7f945255c82e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_catch_obj+0xc6) [0x7f945254fa16]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x162ace) [0x7f945254face]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16f82e) [0x7f945255c82e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_catch_obj+0xc6) [0x7f945254fa16]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x162ace) [0x7f945254face]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_iseq_eval_main+0xb1) [0x7f945255d631]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x65292) [0x7f9452452292]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(ruby_exec_node+0x1d) [0x7f945245314d]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(ruby_run_node+0x1e) [0x7f945245540e]
irb(main+0x4b) [0x40082b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f945204f76d]
irb() [0x400859]
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Aborted (core dumped)
mcarpenter@ubuntu:/tmp$
mcarpenter@ubuntu:/tmp$ gdb `which ruby` core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /var/cache/ruby-rvm/rubies/ruby-1.9.2-p180/bin/ruby...done.
[New LWP 3688]
[New LWP 3689]
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `irb '.
Program terminated with signal 6, Aborted.
#0 0x00007f9452064425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) where
#0 0x00007f9452064425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f9452067b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f945244d07a in rb_bug (fmt=0x7f945258bc68 "Segmentation fault") at error.c:253
#3 0x00007f94524f8874 in sigsegv (sig=<optimized out>, info=<optimized out>, ctx=<optimized out>) at signal.c:613
#4 <signal handler called>
#5 rb_ary_push_1 (ary=1, item=26341040) at array.c:728
#6 0x00007f9450176041 in add_element (pi=0x7ffff0e83400, ename=<optimized out>, attrs=<optimized out>, hasChildren=1) at gen_load.c:329
#7 0x00007f945016af88 in read_element (pi=0x7ffff0e83400) at parse.c:388
#8 0x00007f945016bd33 in ox_parse (xml=<optimized out>, pcb=<optimized out>, endp=0x0, options=<optimized out>) at parse.c:160
#9 0x00007f9450170e23 in to_gen (self=<optimized out>, ruby_xml=26341240) at ox.c:413
#10 0x00007f945255b5a6 in vm_call_cfunc (me=0x15cc910, blockptr=0x0, recv=19428600, num=1, reg_cfp=0x7f9452a13828, th=<optimized out>)
at vm_insnhelper.c:402
#11 vm_call_method (th=<optimized out>, cfp=0x7f9452a13828, num=<optimized out>, blockptr=0x0, flag=<optimized out>, id=<optimized out>, me=0x15cc910,
recv=19428600) at vm_insnhelper.c:524
#12 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#13 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#14 0x00007f9452557c53 in eval_string_with_cref (self=19615680, src=26313040, scope=19613560, cref=0x0, file=0x12b3ce8 "(irb)", line=3) at vm_eval.c:1028
#15 0x00007f94525580ff in eval_string (line=<optimized out>, file=<optimized out>, scope=<optimized out>, src=<optimized out>, self=19615680)
at vm_eval.c:1070
#16 rb_f_eval (argc=4, argv=<optimized out>, self=19615680) at vm_eval.c:1118
#17 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f5cc0, blockptr=0x0, recv=19615680, num=4, reg_cfp=0x7f9452a13930, th=<optimized out>)
at vm_insnhelper.c:402
#18 vm_call_method (th=<optimized out>, cfp=0x7f9452a13930, num=<optimized out>, blockptr=0x0, flag=<optimized out>, id=<optimized out>, me=0x12f5cc0,
recv=19615680) at vm_insnhelper.c:524
#19 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#20 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#21 0x00007f945255bebf in invoke_block_from_c (cref=0x0, blockptr=0x0, argv=0x0, argc=0, self=<optimized out>, block=<optimized out>, th=<optimized out>)
at vm.c:558
#22 vm_yield (th=<optimized out>, argv=0x0, argc=0) at vm.c:588
#23 rb_yield_0 (argv=0x0, argc=0) at vm_eval.c:740
#24 loop_i () at vm_eval.c:798
#25 0x00007f94524535bb in rb_rescue2 (b_proc=0x7f945255bbe0 <loop_i>, data1=0, r_proc=0, data2=0) at eval.c:646
#26 0x00007f945254e34e in rb_f_loop (self=19593640) at vm_eval.c:826
#27 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f6740, blockptr=0x7f9452a13c18, recv=19593640, num=0, reg_cfp=0x7f9452a13bf0, th=<optimized out>)
at vm_insnhelper.c:402
#28 vm_call_method (th=<optimized out>, cfp=0x7f9452a13bf0, num=<optimized out>, blockptr=0x7f9452a13c18, flag=<optimized out>, id=<optimized out>,
me=0x12f6740, recv=19593640) at vm_insnhelper.c:524
#29 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#30 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#31 0x00007f945255c82e in invoke_block_from_c (cref=0x0, blockptr=0x0, argv=0x7ffff0e8a3c8, argc=1, self=19593640, block=<optimized out>, th=0x126d190)
at vm.c:558
#32 vm_yield (th=0x126d190, argv=0x7ffff0e8a3c8, argc=1) at vm.c:588
#33 rb_yield_0 (argv=0x7ffff0e8a3c8, argc=1) at vm_eval.c:740
#34 catch_i (tag=4052238, data=<optimized out>) at vm_eval.c:1458
#35 0x00007f945254fa16 in rb_catch_obj (tag=4052238, func=0x7f945255c570 <catch_i>, data=0) at vm_eval.c:1533
#36 0x00007f945254face in rb_f_catch (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1509
#37 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f63c0, blockptr=0x7f9452a13d20, recv=19593640, num=1, reg_cfp=0x7f9452a13cf8, th=<optimized out>)
at vm_insnhelper.c:402
#38 vm_call_method (th=<optimized out>, cfp=0x7f9452a13cf8, num=<optimized out>, blockptr=0x7f9452a13d20, flag=<optimized out>, id=<optimized out>,
me=0x12f63c0, recv=19593640) at vm_insnhelper.c:524
#39 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#40 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#41 0x00007f945255c82e in invoke_block_from_c (cref=0x0, blockptr=0x0, argv=0x7ffff0e8a888, argc=1, self=19546480, block=<optimized out>, th=0x126d190)
at vm.c:558
#42 vm_yield (th=0x126d190, argv=0x7ffff0e8a888, argc=1) at vm.c:588
#43 rb_yield_0 (argv=0x7ffff0e8a888, argc=1) at vm_eval.c:740
#44 catch_i (tag=3218702, data=<optimized out>) at vm_eval.c:1458
#45 0x00007f945254fa16 in rb_catch_obj (tag=3218702, func=0x7f945255c570 <catch_i>, data=0) at vm_eval.c:1533
#46 0x00007f945254face in rb_f_catch (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1509
#47 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f63c0, blockptr=0x7f9452a13ed8, recv=19546480, num=1, reg_cfp=0x7f9452a13eb0, th=<optimized out>)
at vm_insnhelper.c:402
#48 vm_call_method (th=<optimized out>, cfp=0x7f9452a13eb0, num=<optimized out>, blockptr=0x7f9452a13ed8, flag=<optimized out>, id=<optimized out>,
me=0x12f63c0, recv=19546480) at vm_insnhelper.c:524
#49 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#50 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#51 0x00007f945255d631 in rb_iseq_eval_main (iseqval=19535800) at vm.c:1388
#52 0x00007f9452452292 in ruby_exec_internal (n=0x12a17b8) at eval.c:214
#53 0x00007f945245314d in ruby_exec_node (n=0x12a17b8) at eval.c:261
#54 0x00007f945245540e in ruby_run_node (n=0x12a17b8) at eval.c:254
#55 0x000000000040082b in main (argc=2, argv=0x7ffff0e8afb8) at main.c:35
(gdb) quit
mcarpenter@ubuntu:/tmp$
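For inputs that really are fragments with several top-level elements, one possible workaround is to wrap the text in a synthetic root before parsing and then work with that root's child nodes. This is a minimal sketch. The helper wrap_fragment and its default root name fragment are hypothetical, and it assumes the fragment carries no XML declaration or DOCTYPE of its own, since those must come before the root element:

```ruby
# Wrap an XML fragment that may contain several top-level elements in a
# single synthetic root, so the parser sees exactly one root element.
# Assumes the fragment has no XML declaration or DOCTYPE of its own.
def wrap_fragment(xml, root = 'fragment')
  "<#{root}>#{xml}</#{root}>"
end

wrap_fragment('<foo></foo><foo></foo>')
# => "<fragment><foo></foo><foo></foo></fragment>"

# Hypothetical usage with the gem loaded:
#   Ox.parse(wrap_fragment('<foo></foo><foo></foo>')).nodes
```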
Not critical, but just noting that Ox fails to install as a MacRuby gem, for example:
$ macgem install ox
Fetching: ox-1.9.4.gem (100%)
Building native extensions. This could take a while...
ERROR: Error installing ox:
ERROR: Failed to build gem native extension.
/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/bin/macruby extconf.rb
>>>>> Creating Makefile for MacRuby version 1.9.2 on universal-darwin10.0 <<<<<
creating Makefile
make
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I. -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o base64.o -c base64.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I. -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache.o -c cache.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I. -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache8.o -c cache8.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I. -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache8_test.o -c cache8_test.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I. -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache_test.o -c cache_test.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I. -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o dump.o -c dump.c
In file included from dump.c:39:
ox.h:190: error: expected specifier-qualifier-list before ‘rb_encoding’
dump.c: In function ‘dump_obj’:
dump.c:598: warning: initialization discards qualifiers from pointer target type
dump.c:838: warning: initialization discards qualifiers from pointer target type
dump.c: In function ‘dump_gen_nodes’:
dump.c:1108: warning: initialization discards qualifiers from pointer target type
make: *** [dump.o] Error 1
Gem files will remain installed in /Library/Ruby/Gems/MacRuby/0.12/gems/ox-1.9.4 for inspection.
Results logged to /Library/Ruby/Gems/MacRuby/0.12/gems/ox-1.9.4/ext/ox/gem_make.out
I posted this in another issue but realized it was closed, and since it may not be related, I figured I'd open a clean one.
I've been seeing sporadic segfaults ever since I switched from the Nokogiri SAX parser to the Ox SAX parser. I'm running Ruby 1.9.3-p125 and have seen the same issue with Ox 1.9.2, 1.9.3, and 2.0.0.
I'm parsing XML within a Rails app, and when I switched over to Ox these segfaults started occurring, but never in the same place: sometimes in ActiveSupport, other times elsewhere in the app. However, when I dug into the .crash file for each segfault, I found that in every instance the thread ends with the following trace:
0 libsystem_kernel.dylib 0x00007fff92ab0ce2 __pthread_kill + 10
1 libsystem_c.dylib 0x00007fff972447d2 pthread_kill + 95
2 libsystem_c.dylib 0x00007fff97235a7a abort + 143
3 ruby 0x000000010bec6ed4 rb_bug + 212
4 ruby 0x000000010bf8f62f sigsegv + 127
5 libsystem_c.dylib 0x00007fff97296cfa _sigtramp + 26
6 ruby 0x000000010bee3bd9 gc_marks + 345
7 ruby 0x000000010bee40bd garbage_collect + 253
8 ruby 0x000000010bee4796 vm_xmalloc + 150
So I guess Ruby's GC eventually hits a bad spot in memory when Ox is used? I'm wondering whether there is some memory-management conflict between the native Ox extension and Ruby.
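One common way to make GC-related crashes in a C extension reproducible is Ruby's GC stress mode (GC.stress = true), which runs the garbage collector at nearly every allocation. If an extension holds an object without marking it, that object is then likely to be freed, and the crash to surface, close to the offending call rather than somewhere unrelated later. A minimal helper is shown below. MyHandler and sample.xml in the usage comment are placeholders for your own handler and input, and stress mode is very slow:

```ruby
# Run a block with GC stress mode enabled, restoring the previous setting
# afterwards even if the block raises. The block's value is returned.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Hypothetical usage: push the same input through the SAX parser.
#   with_gc_stress do
#     File.open('sample.xml') { |io| Ox.sax_parse(MyHandler.new, io) }
#   end
```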
Hi Pete,
Thanks for your constant support with Ox. We have found a potential issue on 32-bit Intel machines. Here is the environment and test case:
Environment:
Test case:
$ irb -r ox
ruby-1.9.2-p180 :001 > Ox
=> Ox
ruby-1.9.2-p180 :002 > t = Time.now
=> 2011-10-03 16:10:20 +0900
ruby-1.9.2-p180 :003 > x = Ox.dump t
=> "1317625820.584365\n"
ruby-1.9.2-p180 :004 > Ox.parse_obj x
=> 1943-09-15 12:56:12 +0900
Martin & Eric
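The report does not show where the value gets truncated, but the dumped string itself still holds the correct instant. For comparison, here is a minimal sketch of decoding it in pure Ruby, where Integer has no fixed width. It assumes the seconds.microseconds layout seen above and a non-negative timestamp; time_from_dump is a hypothetical helper, not part of Ox:

```ruby
# Decode a "seconds.fraction" string such as "1317625820.584365\n" into a
# Time using arbitrary-precision Integers (no fixed-width C types).
# Assumes a non-negative timestamp.
def time_from_dump(str)
  sec_str, frac_str = str.strip.split('.', 2)
  # Pad or truncate the fraction to exactly six digits of microseconds.
  usec = (frac_str || '').ljust(6, '0')[0, 6].to_i
  Time.at(Integer(sec_str, 10), usec)
end

t = time_from_dump("1317625820.584365\n")
t.to_i # => 1317625820
t.usec # => 584365
```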
I'm running ruby 1.9.3p194 (2012-04-20) [i386-mingw32] from rubyinstaller.org with the DevKit on Windows 7 64 bit.
When I try to install the ox gem (gem install ox), the build of the native extension fails:
C:\Users\Thomas>gem install ox
Temporarily enhancing PATH to include DevKit...
Building native extensions. This could take a while...
ERROR: Error installing ox:
ERROR: Failed to build gem native extension.
D:/Ruby/Ruby193/bin/ruby.exe extconf.rb
>>>>> Creating Makefile for ruby version 1.9.3 <<<<<
creating Makefile
make
generating ox-i386-mingw32.def
compiling base64.c
compiling cache.c
cache.c: In function 'ox_cache_new':
cache.c:62:5: warning: implicit declaration of function 'bzero'
cache.c:62:5: warning: incompatible implicit declaration of built-in function 'bzero'
compiling cache8.c
compiling cache8_test.c
cache8_test.c:35:5: warning: large integer implicitly truncated to unsigned type
cache8_test.c:37:5: warning: large integer implicitly truncated to unsigned type
compiling cache_test.c
compiling dump.c
dump.c: In function 'dump_time_xsd':
dump.c:507:15: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:509:26: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:510:25: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:512:26: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:513:25: error: 'struct tm' has no member named 'tm_gmtoff'
make: *** [dump.o] Error 1
Gem files will remain installed in D:/Ruby/Ruby193/lib/ruby/gems/1.9.1/gems/ox-1.5.9 for inspection.
Results logged to D:/Ruby/Ruby193/lib/ruby/gems/1.9.1/gems/ox-1.5.9/ext/ox/gem_make.out
Is there something wrong with my build environment? Other gems' extensions build fine.
Full details here: https://gist.github.com/1421798
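The compiler errors say that struct tm on this mingw toolchain has no tm_gmtoff member, which dump_time_xsd uses at lines 507-513. That field holds the local offset from UTC in seconds, which Ruby exposes portably as Time#utc_offset. The sketch below builds an xsd:dateTime-style string from it, to show what the missing value contributes. xsd_datetime is a hypothetical helper, not part of Ox:

```ruby
# Format a Time as an xsd:dateTime-style string with an explicit UTC offset.
# Time#utc_offset supplies the seconds-east-of-UTC value that struct tm's
# tm_gmtoff field provides in C on platforms that have it.
def xsd_datetime(time)
  offset = time.utc_offset
  sign = offset < 0 ? '-' : '+'
  hours, rest = offset.abs.divmod(3600)
  minutes = rest / 60
  time.strftime('%Y-%m-%dT%H:%M:%S') + format('%s%02d:%02d', sign, hours, minutes)
end

xsd_datetime(Time.new(2011, 10, 3, 16, 10, 20, '+09:00'))
# => "2011-10-03T16:10:20+09:00"
```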
Hi Peter!
It seems that the SAX parser has had some bugs since version 1.8.7.
Here is the test script:
require 'ox'

class Sample < ::Ox::Sax
  def start_element(name); puts "start: #{name}"; end
  def end_element(name); puts "end: #{name}"; end
  def attr(name, value); puts " #{name} => #{value}"; end
  def text(value); puts "text #{value}"; end
end

handler = Sample.new
Ox.sax_parse(handler, ARGF)
Here is how you can reproduce the bug:
$ wget "http://www.benzocenter.ru/yam/market.xml"
$ ruby -Ilib test.rb < market.xml
With ox-1.8.7 I get:
test.rb:11:in `sax_parse': invalid format, element start and end names do not match at line 4221, column 10 (Ox::ParseError)
from test.rb:11:in `<main>'
With ox-1.8.8 I get:
ruby 1.9.3p392 (2013-02-22 revision 39386) [i686-linux]
-- Control frame information -----------------------------------------------
c:0004 p:---- s:0012 b:0012 l:000011 d:000011 CFUNC :sax_parse
c:0003 p:0076 s:0007 b:0007 l:001a04 d:002588 EVAL test.rb:11
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001a04 d:001a04 TOP
-- Ruby level backtrace information ----------------------------------------
test.rb:11:in `<main>'
test.rb:11:in `sax_parse'
-- C level backtrace information -------------------------------------------
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x1814da) [0xb765e4da] vm_dump.c:796
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x52ae3) [0xb752fae3] error.c:258
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(rb_bug+0x44) [0xb75307d4] error.c:277
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x11064c) [0xb75ed64c] signal.c:609
[0xb771740c]
/lib/i386-linux-gnu/libc.so.6(+0x13eafb) [0xb745bafb] time.c:198
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe053) [0xb6d9b053] sax.c:796
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe02d) [0xb6d9b02d] sax.c:793
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe02d) [0xb6d9b02d] sax.c:793
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe02d) [0xb6d9b02d] sax.c:793
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(ox_sax_parse+0x2f4) [0xb6d9c4c4] sax.c:257
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0x9a53) [0xb6d96a53] ox.c:640
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x16c0d5) [0xb76490d5] vm_insnhelper.c:317
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x17ad97) [0xb7657d97] vm_insnhelper.c:404
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x1714ef) [0xb764e4ef] insns.def:1018
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x176b1c) [0xb7653b1c] vm.c:1236
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(rb_iseq_eval_main+0xb5) [0xb7659755] vm.c:1478
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x567d4) [0xb75337d4] eval.c:204
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(ruby_exec_node+0x24) [0xb7534644] eval.c:251
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(ruby_run_node+0x36) [0xb7536616] eval.c:244
ruby() [0x8048658]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb73364d3] enumerator.c:162
ruby() [0x8048681]
With ox-1.8.9 I get:
test.rb:7: [BUG] Segmentation fault
ruby 1.9.3p392 (2013-02-22 revision 39386) [i686-linux]
-- Control frame information -----------------------------------------------
c:0006 p:0012 s:0021 b:0018 l:000017 d:000017 METHOD test.rb:7
c:0005 p:---- s:0014 b:0014 l:000013 d:000013 FINISH
c:0004 p:---- s:0012 b:0012 l:000011 d:000011 CFUNC :sax_parse
c:0003 p:0076 s:0007 b:0007 l:001504 d:0016b0 EVAL test.rb:11
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001504 d:001504 TOP
-- Ruby level backtrace information ----------------------------------------
test.rb:11:in `<main>'
test.rb:11:in `sax_parse'
test.rb:7:in `text'
I'm on Ubuntu 12.04 with ruby-1.9.3p392.
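When chasing a "start and end names do not match" failure on a large file, it can help to know which elements are open at the point of failure. A handler can keep its own stack of open element names. The sketch below assumes only the start_element/end_element callbacks used in the test script. NestingTracker and path are hypothetical names, and the class is a plain Ruby class so it runs without the gem. In real use it would subclass Ox::Sax:

```ruby
# Tracks open elements from SAX-style callbacks and reports the open path
# when an end tag does not match the innermost open element.
class NestingTracker
  def initialize
    @stack = []
  end

  def start_element(name)
    @stack.push(name)
  end

  def end_element(name)
    if @stack.last == name
      @stack.pop
    else
      raise ArgumentError, "</#{name}> does not match open path #{path}"
    end
  end

  # Slash-separated names of the currently open elements.
  def path
    @stack.join('/')
  end
end
```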
I'm frequently getting errors like the one below when trying to use locate with a newly added Element. For example:
require 'ox'
include Ox
doc = Document.new
elem = Element.new('Element')
doc.locate('Element')
# =>
/Users/josh/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/ox-1.8.0/lib/ox/element.rb:181:in `alocate': private method `select' called for nil:NilClass (NoMethodError)
from /Users/josh/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/ox-1.8.0/lib/ox/element.rb:124:in `locate'
from /Users/josh/Desktop/temp.rb:8:in `<main>'
If Elements were initialized with @nodes = [] instead of @nodes = nil, this wouldn't be a problem. Is there a reason they're initialized this way?
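Besides initializing @nodes = [] in the constructor, another option is a lazy accessor. It can keep nil as the stored default, in case a nil @nodes matters elsewhere, but never hands nil to callers. Below is a minimal sketch of that variant. LazyElement and children_named are hypothetical stand-ins, not Ox's API:

```ruby
# Hypothetical element whose child list is created on first access, so a
# freshly built element can be searched without a nil check.
class LazyElement
  attr_reader :value

  def initialize(value)
    @value = value
    @nodes = nil # mirrors the nil default described in the report
  end

  # Return the child list, creating an empty one if none exists yet.
  def nodes
    @nodes ||= []
  end

  def <<(node)
    nodes << node
    self
  end

  # Direct children whose value equals name.
  def children_named(name)
    nodes.select { |n| n.respond_to?(:value) && n.value == name }
  end
end

root = LazyElement.new('Document')
root.children_named('Element') # => [] instead of NoMethodError on nil
```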
Is there a way for Ox.dump() to output the XML as a one-liner, without any indentation or newlines?
I could not find anything, so I ended up relying on the following, but I feel dirty doing things like that xD
Ox.dump(xml, indent: 0).gsub("\n", "")
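Note that gsub("\n", "") also deletes newlines inside text content, not only the ones added for formatting. A slightly narrower sketch removes only whitespace between a closing > and the next <. It still drops whitespace-only text nodes, so it is only suitable when those are not significant. compact_xml is a hypothetical helper name:

```ruby
# Collapse whitespace that appears only between tags, leaving newlines
# inside text content alone. Whitespace-only text nodes are removed too.
def compact_xml(xml)
  xml.gsub(/>\s+</, '><').strip
end

pretty = "<a>\n<b>x\ny</b>\n</a>\n"
compact_xml(pretty)   # => "<a><b>x\ny</b></a>"
pretty.gsub("\n", "") # => "<a><b>xy</b></a>"  (text newline lost)
```

With the gem loaded, this would be applied to the dump output, e.g. compact_xml(Ox.dump(doc)).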
I am using Ruby 1.9.3p194 on x86_64 and I have an XML file that crashes Ruby when using SAX parsing:
% ruby -rox -e 'File.open("broken.xml", "r") { |io| Ox.sax_parse(Ox::Sax.new, io) }'
-e:1: [BUG] Segmentation fault
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0007 p:---- s:0021 b:0021 l:000020 d:000020 CFUNC :sax_parse
c:0006 p:0033 s:0016 b:0016 l:002558 d:000015 BLOCK -e:1
c:0005 p:---- s:0013 b:0013 l:000012 d:000012 FINISH
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC :open
c:0003 p:0019 s:0006 b:0006 l:002558 d:002318 EVAL -e:1
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:002558 d:002558 TOP
-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `open'
-e:1:in `block in <main>'
-e:1:in `sax_parse'
-- C level backtrace information -------------------------------------------
/usr/lib/libruby-1.9.1.so.1.9(+0x158379) [0x2b29b4b32379]
/usr/lib/libruby-1.9.1.so.1.9(+0x5a4d9) [0x2b29b4a344d9]
/usr/lib/libruby-1.9.1.so.1.9(rb_bug+0xb3) [0x2b29b4a34cc3]
/usr/lib/libruby-1.9.1.so.1.9(+0xf922f) [0x2b29b4ad322f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030) [0x2b29b4dff030]
/lib/x86_64-linux-gnu/libc.so.6(+0x1120e6) [0x2b29b59e50e6]
/usr/lib/ruby/vendor_ruby/1.9.1/x86_64-linux/ox.so(+0x98a5) [0x2b29b61748a5]
/usr/lib/ruby/vendor_ruby/1.9.1/x86_64-linux/ox.so(+0x9887) [0x2b29b6174887]
-- Other runtime information -----------------------------------------------
* Loaded script: -e
* Loaded features:
0 enumerator.so
1 /usr/lib/ruby/1.9.1/x86_64-linux/enc/encdb.so
2 /usr/lib/ruby/1.9.1/x86_64-linux/enc/trans/transdb.so
3 /usr/lib/ruby/1.9.1/rubygems/defaults.rb
4 /usr/lib/ruby/1.9.1/x86_64-linux/rbconfig.rb
5 /usr/lib/ruby/1.9.1/rubygems/deprecate.rb
6 /usr/lib/ruby/1.9.1/rubygems/exceptions.rb
7 /usr/lib/ruby/vendor_ruby/rubygems/defaults/operating_system.rb
8 /usr/lib/ruby/1.9.1/rubygems/custom_require.rb
9 /usr/lib/ruby/1.9.1/rubygems.rb
10 /usr/lib/ruby/vendor_ruby/ox/version.rb
11 /usr/lib/ruby/vendor_ruby/ox/error.rb
12 /usr/lib/ruby/vendor_ruby/ox/hasattrs.rb
13 /usr/lib/ruby/vendor_ruby/ox/node.rb
14 /usr/lib/ruby/vendor_ruby/ox/comment.rb
15 /usr/lib/ruby/vendor_ruby/ox/instruct.rb
16 /usr/lib/ruby/vendor_ruby/ox/cdata.rb
17 /usr/lib/ruby/vendor_ruby/ox/doctype.rb
18 /usr/lib/ruby/vendor_ruby/ox/element.rb
19 /usr/lib/ruby/vendor_ruby/ox/document.rb
20 /usr/lib/ruby/vendor_ruby/ox/bag.rb
21 /usr/lib/ruby/vendor_ruby/ox/sax.rb
22 /usr/lib/ruby/1.9.1/x86_64-linux/date_core.so
23 /usr/lib/ruby/1.9.1/date/format.rb
24 /usr/lib/ruby/1.9.1/date.rb
25 /usr/lib/ruby/1.9.1/time.rb
26 /usr/lib/ruby/1.9.1/x86_64-linux/stringio.so
27 /usr/lib/ruby/vendor_ruby/1.9.1/x86_64-linux/ox.so
28 /usr/lib/ruby/vendor_ruby/ox.rb
29 /usr/lib/ruby/1.9.1/x86_64-linux/enc/iso_8859_1.so
The XML file broken.xml contains the following XML:
<?xml version="1.0" encoding="Windows-1252" standalone="yes" ?>
<AVXML>
<SIGNONMSGRS>
<DTSERVER>2013-02-21T12:13:21</DTSERVER>
<APPID>ACCOUNTVIEW</APPID>
<APPVER>0901-</APPVER>
</SIGNONMSGRS>
<ERRORS>
<ERROR>
<NUMBER>10000</NUMBER>
<DATE>2013-02-21T12:13:21</DATE>
<MESSAGE>Bericht mag maximaal 15.000.000 tekens bevatten. </MESSAGE>
</ERROR>
</ERRORS>
</AVXML>
Other XML files that are produced by this application seem to be handled fine, so there is something specific going on here that I cannot see.
P.S. I am so sorry for filing 4 tickets in one day :)
Ruby (as of 1.9/2.0) has no way to garbage-collect unused symbols. This means that input with a huge number of distinct attribute names (from a user, say) causes a big memory leak.
JSON parsers usually have an option to keep the keys as Strings, which can be GC'd, instead of Symbols.
Ox should have that, or some other way to prevent the memory leak on untrusted input.
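For comparison, Ruby's standard json library works this way. JSON.parse returns String keys unless symbolize_names: true is passed. The snippet does not use Ox; it only shows the kind of switch being requested. Ruby 2.2 and later can also garbage-collect dynamically created symbols, but that post-dates this report.

```ruby
require 'json'

doc = '{"name": "ox", "kind": "parser"}'

# Default: keys stay Strings, which the GC can reclaim.
string_keys = JSON.parse(doc)

# Opt-in: keys become Symbols.
symbol_keys = JSON.parse(doc, symbolize_names: true)

p string_keys.keys # => ["name", "kind"]
p symbol_keys.keys # => [:name, :kind]
```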
I am getting the following errors after installing Ox 1.5.5 on Mac OS X (ruby 1.9.2)
dyld: lazy symbol binding failed: Symbol not found: _stpncpy
Referenced from: /Users/tom/.rvm/gems/ruby-1.9.2-p290@multi_xml/gems/ox-1.5.5/ext/ox/ox.bundle
Expected in: flat namespace
dyld: Symbol not found: _stpncpy
Referenced from: /Users/tom/.rvm/gems/ruby-1.9.2-p290@multi_xml/gems/ox-1.5.5/ext/ox/ox.bundle
Expected in: flat namespace
Not sure what is going on here; the gem installed without errors. Any ideas?
I can't upgrade to the newest version of Ox (1.5.5); I get this build error:
[...]
Installing ox (1.5.5) with native extensions Unfortunately, a fatal error has occurred. Please report this error to the Bundler issue tracker at https://github.com/carlhuda/bundler/issues so that we can fix it. Thanks!
/home/cjk/.rbenv/versions/1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/installer.rb:552:in `rescue in block in build_extensions': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError)
/home/cjk/.rbenv/versions/1.9.3-p0/bin/ruby extconf.rb
Creating Makefile for ruby version 1.9.3 <<<<<
creating Makefile
make
compiling cache8.c
compiling obj_load.c
compiling dump.c
compiling parse.c
compiling ox.c
compiling cache8_test.c
compiling cache.c
compiling gen_load.c
compiling sax.c
compiling base64.c
compiling cache_test.c
linking shared-object ox.so
make install
/bin/install -c -m 0755 ox.so /home/cjk/proj/daimler/hotrails/vendor/ruby/1.9.1/gems/ox-1.5.5/lib
make: /bin/install: Command not found
make: *** [/home/cjk/proj/daimler/hotrails/vendor/ruby/1.9.1/gems/ox-1.5.5/lib/ox.so] Error 127
Gem files will remain installed in /home/cjk/proj/daimler/hotrails/vendor/ruby/1.9.1/gems/ox-1.5.5 for inspection.
[...]
Hi Peter!
Today I tried to run Ox under JRuby and ran into encoding issues. Then I ran test/sax_test.rb and found it broken: 5 tests fail.
My ruby is: jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_26) [linux-amd64-java]
Is it commented out because it does not work?
I was curious because it seems like a nice feature :)
Sorry, two issues in a row. Ox doesn't seem to support processing instructions. I believe they're supposed to be parsed as processing instruction nodes. I was attempting to parse an .icml file and was getting errors related to these. They are used in .icml files to denote special characters that aren't supported by XML.
Here's an example:
require 'ox'
xml = <<-END
<root>
<element>Here some text with a <?PITarget PIContent?> processing instruction.</element>
</root>
END
p Ox.parse(xml)
#=> /Users/josh/Desktop/temp.rb:9:in `parse': invalid format, document not
#=> terminated at line 2, column 47 [parse.c:583] (SyntaxError)
#=> from /Users/josh/Desktop/temp.rb:9:in `<main>'
I'm running into a bug with this strange scenario. This is badly written Ruby, but I figure it shouldn't be throwing an error anyway. The problem appears to be related to declaring the para variable twice. It seems to affect the dump() function as well.
I was previously having a memory leak related to this same problem: the system would thrash on memory over a period of just a few seconds.
require 'ox'
include Ox
def make_table
para = Element.new('Paragraph')
char = Element.new('Character')
table = Element.new('Table')
para = Element.new('Paragraph')
table << para
char << table
para << char
para
end
puts dump(make_table)
Here's the error it throws:
/Users/jvoigts1/Desktop/temp2.rb:17: [BUG] Bus Error
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin12.2.0]
-- Control frame information -----------------------------------------------
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC :dump
c:0003 p:0063 s:0007 b:0006 l:000168 d:0022a8 EVAL /Users/jvoigts1/Desktop/temp2.rb:17
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000168 d:000168 TOP
-- Ruby level backtrace information ----------------------------------------
/Users/jvoigts1/Desktop/temp2.rb:17:in `<main>'
/Users/jvoigts1/Desktop/temp2.rb:17:in `dump'
-- C level backtrace information -------------------------------------------
See Crash Report log file under ~/Library/Logs/CrashReporter or
/Library/Logs/CrashReporter, for the more detail of.
-- Other runtime information -----------------------------------------------
* Loaded script: /Users/jvoigts1/Desktop/temp2.rb
* Loaded features:
0 enumerator.so
1 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/enc/encdb.bundle
2 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/enc/trans/transdb.bundle
3 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/rbconfig.rb
4 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/compatibility.rb
5 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb
6 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/deprecate.rb
7 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/errors.rb
8 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/version.rb
9 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/requirement.rb
10 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/platform.rb
11 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb
12 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/exceptions.rb
13 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_gem.rb
14 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_require.rb
15 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems.rb
16 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/path_support.rb
17 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb
18 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/version.rb
19 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/error.rb
20 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/hasattrs.rb
21 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/node.rb
22 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/comment.rb
23 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/instruct.rb
24 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/cdata.rb
25 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/doctype.rb
26 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/element.rb
27 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/document.rb
28 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/bag.rb
29 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/sax.rb
30 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/date_core.bundle
31 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/date/format.rb
32 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/date.rb
33 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/time.rb
34 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/stringio.bundle
35 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/ext/ox/ox.bundle
36 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox.rb
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Abort trap: 6
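As written, para is reassigned before table << para. The second Paragraph therefore ends up containing itself (para → char → table → para). A recursive serializer never reaches a leaf on such a structure, which is consistent with the Bus Error and the memory thrashing described above.

The sketch below shows how a serializer can detect the cycle instead of recursing forever. It uses a hypothetical SketchNode class and dump_node function, not Ox's Element or Ox.dump. It tracks the current path by object identity.

```ruby
# Hypothetical stand-in for an XML element; not part of Ox.
class SketchNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  def <<(child)
    @children << child
    self
  end
end

# Serializes the tree. Raises if a node reappears on the current path,
# i.e. the "tree" actually contains a cycle.
def dump_node(node, path = [])
  if path.any? { |n| n.equal?(node) }
    raise ArgumentError, "cycle detected at <#{node.name}>"
  end

  path.push(node)
  inner = node.children.map { |c| dump_node(c, path) }.join
  path.pop
  "<#{node.name}>#{inner}</#{node.name}>"
end
```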
It would be fantastic to be able to pass options when calling .parse, instead of having to set them before parsing the document.
For example, supporting the following way of passing options:
Ox.parse('<?xml version="1.0"?><foo>bar</foo>', {:symbolize_keys => false})
in addition to the current way options are passed to Ox:
Ox.default_options = Ox.default_options.merge(:symbolize_keys => false)
Thoughts on this? I'll provide code if this seems desirable.
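Here is a sketch of the proposed semantics in plain Ruby, using a hypothetical SketchParser module rather than Ox itself. Options given to one call are merged over the defaults for that call only, so the global defaults are left untouched. The option names are illustrative.

```ruby
# Hypothetical module illustrating per-call options; not Ox's real API.
module SketchParser
  @default_options = { symbolize_keys: true, indent: 2 }

  class << self
    attr_accessor :default_options

    # Per-call options win over the defaults without mutating them.
    def effective_options(opts = {})
      default_options.merge(opts)
    end
  end
end

p SketchParser.effective_options(symbolize_keys: false)
```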
Ox currently doesn't seem to care about (Ruby) encodings. I'm not sure where to start with this, so here is a use case.
Let's create 2 XML documents:
x1 = %(<?xml version="1.0" encoding="ISO-8859-1" ?><tag key="value">Français</tag>).encode("ISO-8859-1")
# => "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?><tag key=\"value\">Fran\xE7ais</tag>"
x1.encoding
# => #<Encoding:ISO-8859-1>
x2 = %(<?xml version="1.0" encoding="UTF-8" ?><tag key="value">Français</tag>)
# => "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><tag key=\"value\">Fran\xC3\xA7ais</tag>"
x2.encoding
# => #<Encoding:UTF-8>
With Ox:
class OH < ::Ox::Sax
def start_element(name)
puts "EL: #{name} (#{name.encoding})"
end
def end_element(name)
end
def attr(key, value)
puts "AT: #{key} => #{value} (#{key.encoding} => #{value.encoding})"
end
def text(value)
puts "TX: #{value} (#{value.encoding})"
end
end
::Ox.sax_parse OH.new, StringIO.new(x1)
# => AT: version => 1.0 (US-ASCII => ASCII-8BIT)
# AT: encoding => ISO-8859-1 (US-ASCII => ISO-8859-1)
# EL: tag (US-ASCII)
# AT: key => value (US-ASCII => ISO-8859-1)
# TX: Fran�ais (ISO-8859-1)
::Ox.sax_parse OH.new, StringIO.new(x2)
# => AT: version => 1.0 (US-ASCII => ASCII-8BIT)
# AT: encoding => UTF-8 (US-ASCII => UTF-8)
# EL: tag (US-ASCII)
# AT: key => value (US-ASCII => UTF-8)
# TX: Français (UTF-8)
Now the same with Nokogiri:
class NH
def self.parse(io)
root = Nokogiri::XML(io).root
puts "EL: #{root.name} (#{root.name.encoding})"
root.attributes.each do |key, value|
puts "AT: #{key} => #{value.value} (#{key.encoding} => #{value.value.encoding})"
end
puts "TX: #{root.text} (#{root.text.encoding})"
end
end
NH.parse StringIO.new(x1)
# => EL: tag (UTF-8)
# AT: key => value (UTF-8 => UTF-8)
# TX: Français (UTF-8)
NH.parse StringIO.new(x2)
# => EL: tag (UTF-8)
# AT: key => value (UTF-8 => UTF-8)
# TX: Français (UTF-8)
As you can see, Nokogiri converts everything to UTF-8, while Ox's encodings are a little "random".
It gets a lot worse with non-ASCII attributes:
x1 = %(<?xml version="1.0" encoding="ISO-8859-1" ?><tag Português="Español">Français</tag>).encode("ISO-8859-1")
x2 = %(<?xml version="1.0" encoding="UTF-8" ?><tag Português="Español">Français</tag>)
NH.parse StringIO.new(x1)
# => EL: tag (UTF-8)
# AT: Português => Español (UTF-8 => UTF-8)
# TX: Français (UTF-8)
NH.parse StringIO.new(x2)
# Same as above
::Ox.sax_parse OH.new, StringIO.new(x1)
# => AT: version => 1.0 (US-ASCII => ASCII-8BIT)
# AT: encoding => ISO-8859-1 (US-ASCII => ISO-8859-1)
# EL: tag (US-ASCII)
# EncodingError: invalid encoding symbol
Any ideas? Cheers, dim
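Until the parser does this itself, a handler could normalize each string it receives. The sketch below uses a hypothetical to_utf8 helper. It assumes the callback's bytes are in the encoding named in the XML declaration even when Ruby labels them ASCII-8BIT; the x1/x2 output above suggests this, but the helper does not verify it. It cannot help with the EncodingError for non-ASCII attribute names, which appears to be raised before the attr callback is reached.

```ruby
# Re-labels a callback string with the declared encoding, then converts it
# to UTF-8. Assumes the bytes really are in the declared encoding.
def to_utf8(str, declared)
  source = Encoding.find(declared)
  str.dup.force_encoding(source).encode(Encoding::UTF_8)
end

latin1 = "Fran\xE7ais".b # raw bytes as the ISO-8859-1 document would supply them
p to_utf8(latin1, 'ISO-8859-1') # => "Français"
```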
Ox 1.9.2, ruby 1.9.3 p374 on Cygwin
irb(main):001:0> require 'ox'
=> true
irb(main):002:0> Ox.parse('<?xml version="1.0" encoding="UTF-8"?><a><b><c><d></d></c></b></a>')
SystemStackError: stack level too deep
from /usr/lib/ruby/1.9.1/irb/workspace.rb:80
Maybe IRB bug!
irb(main):003:0>
Looks fine on other platforms, and also fine on Cygwin with only 3-deep tags:
irb(main):001:0> require 'ox'
=> true
irb(main):002:0> Ox.parse('<?xml version="1.0" encoding="UTF-8"?><a><b><c></c></b></a>')
=> #<Ox::Document:0x802f080c @attributes={:version=>"1.0", :encoding=>"UTF-8"}, @nodes=[#<Ox::Element:0x802f071c @value="a", @nodes=[#<Ox::Element:0x802f06e0 @value="b", @nodes=[#<Ox::Element:0x802f06a4 @value="c", @nodes=[]>]>]>]>
irb(main):003:0>
I don't think this is environmental, but it might be nice to get confirmation from another Cygwin user before going berserk on this one.
I had a lot of trouble with XML documents (returned by a 3rd party) that had node names with dashes in them.
Here's a sample IRB session that demonstrates the bug:
irb(main):001:0> require 'ox'
true
irb(main):002:0> xml = <<-EOS
irb(main):003:0" <?xml version="1.0"?>
irb(main):004:0" <xml-response>
irb(main):005:0" <nodashesnode>hihi</nodashesnode>
irb(main):006:0" <clear-tradeline>
irb(main):007:0" <supplier-tradeline>
irb(main):008:0" <clear-tradeline-reason-code-description>hi</clear-tradeline-reason-code-description>
irb(main):009:0" <some-dashed-node></some-dashed-node>
irb(main):010:0" <nodashesnode></nodashesnode>
irb(main):011:0" </supplier-tradeline>
irb(main):012:0" </clear-tradeline>
irb(main):013:0" </xml-response>
irb(main):014:0" EOS
"<?xml version=\"1.0\"?>\n<xml-response>\n <nodashesnode>hihi</nodashesnode>\n <clear-tradeline>\n <supplier-tradeline>\n <clear-tradeline-reason-code-description>hi</clear-tradeline-reason-code-description>\n <some-dashed-node></some-dashed-node>\n <nodashesnode></nodashesnode>\n </supplier-tradeline>\n </clear-tradeline>\n</xml-response>\n"
irb(main):015:0> ox_doc = Ox.parse xml
#<Ox::Document:0x101e2fd68
attr_reader :attributes = {
:version => "1.0"
},
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fca0
attr_accessor :value = "xml-response",
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fc28
attr_accessor :value = "nodashesnode",
attr_reader :nodes = [
[0] "hihi"
]
>,
[1] #<Ox::Element:0x101e2fb88
attr_accessor :value = "clear-tradeline",
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fb10
attr_accessor :value = "supplier-tradeline",
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fa98
attr_accessor :value = "clear-tradeline-reason-code-description",
attr_reader :nodes = [
[0] "hi"
]
>,
[1] #<Ox::Element:0x101e2f9f8
attr_accessor :value = "some-dashed-node",
attr_reader :nodes = []
>,
[2] #<Ox::Element:0x101e2f980
attr_accessor :value = "nodashesnode",
attr_reader :nodes = []
>
]
>
]
>
]
>
]
>
irb(main):016:0> ox_doc.locate 'clear-tradeline-reason-code-description'
[]
irb(main):017:0> ox_doc.locate 'nodashesnode'
[]
irb(main):018:0> ox_doc.locate 'xml-response'
[
[0] #<Ox::Element:0x101e2fca0
attr_accessor :value = "xml-response",
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fc28
attr_accessor :value = "nodashesnode",
attr_reader :nodes = [
[0] "hihi"
]
>,
[1] #<Ox::Element:0x101e2fb88
attr_accessor :value = "clear-tradeline",
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fb10
attr_accessor :value = "supplier-tradeline",
attr_reader :nodes = [
[0] #<Ox::Element:0x101e2fa98
attr_accessor :value = "clear-tradeline-reason-code-description",
attr_reader :nodes = [
[0] "hi"
]
>,
[1] #<Ox::Element:0x101e2f9f8
attr_accessor :value = "some-dashed-node",
attr_reader :nodes = []
>,
[2] #<Ox::Element:0x101e2f980
attr_accessor :value = "nodashesnode",
attr_reader :nodes = []
>
]
>
]
>
]
>
]
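Note that in the session above, ox_doc.locate 'nodashesnode' also returns [] even though that name contains no dashes. The empty results may therefore come from how locate resolves a bare name (apparently against the receiver's direct children) rather than from the dashes. That reading is an inference from the output, not from Ox's documentation.

For a search at any depth, a minimal recursive sketch follows. It works over a hypothetical Struct-based tree and a hypothetical find_all function, not Ox's classes.

```ruby
# Hypothetical element type: a name plus child nodes (elements or Strings).
SketchEl = Struct.new(:value, :nodes)

# Depth-first search returning every element with a matching name, at any depth.
def find_all(node, name, found = [])
  return found unless node.is_a?(SketchEl)

  found << node if node.value == name
  node.nodes.each { |child| find_all(child, name, found) }
  found
end
```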
I'm having some difficulties with Ox: it isn't picking up whitespace in elements that contain nothing but whitespace.
For example, in the sample document below, the <element> containing only whitespace returns nil for its text value, when you would expect it to return a space character.
require 'ox'
xml = <<-END
<?xml?>
<root>
<element>Hello, this is</element>
<element> </element>
<element>a sentence. </element>
</root>
END
doc = Ox.parse(xml)
p doc.root.element(0).text #=> "Hello, this is"
p doc.root.element(1).text #=> nil
p doc.root.element(2).text #=> "a sentence. "
The following fails for me using Ox 2.0.0 (tolerant and normal mode) under both Ruby 1.9.3 and 2.0.0 (with 32-bit and 64-bit kernels, if that is useful; tested under Ubuntu).
ruby -rox -ropen-uri -e 'Ox.sax_parse(Ox::Sax.new, open("http://go.alphashare.com/external/external.php?__method=external_xmlfeed&__feed=kyr&__company_serial=376"))'
Please see [https://gist.github.com/pgeraghty/5431830] for more information.
I have checked with a handler and it does actually appear to get all the way through to the final end_element call.
I have stacks of real-estate-related XML feeds to test this with, if that can help improve your fantastically fast parser.
Currently, Node#inspect is way too verbose. When playing/testing with Ox documents in IRB, or using p to debug a program, Ox produces pages and pages of text.
This is mainly due to the expansion of @nodes. Maybe #inspect could be changed to print something like:
doc = Ox.load('<foo bar="baz"><quux>meh</quux><garply/></foo>')
#=> #<Ox::Element:0x00000001bbcaa0 2 nodes, value: "foo", attributes: {:bar=>"baz"}>
IMO #inspect doesn't have to dump the entire object, since Ox.dump already exists, and it is easy enough to call doc.nodes.inspect if you really want to know.
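A minimal sketch of the proposed output follows, using a hypothetical CompactElement class rather than Ox::Element. Children are summarized by count instead of being expanded recursively. It leaves out the object address shown in the proposal, because a Ruby-level inspect has no direct equivalent of the default address formatting.

```ruby
# Hypothetical element class with a summarizing #inspect; not Ox::Element.
class CompactElement
  attr_reader :value, :attributes, :nodes

  def initialize(value, attributes = {}, nodes = [])
    @value = value
    @attributes = attributes
    @nodes = nodes
  end

  # Report the number of children instead of inspecting each one.
  def inspect
    "#<#{self.class} #{nodes.size} nodes, value: #{value.inspect}, " \
      "attributes: #{attributes.inspect}>"
  end
end

foo = CompactElement.new('foo', { bar: 'baz' },
                         [CompactElement.new('quux', {}, ['meh']), CompactElement.new('garply')])
p foo
```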
We use the Ox parser; its performance is outstanding.
But we found a failing test case:
<?xml version="1.0"?><abcdefghijklmnop></abcdefghijklmnop>
We suspect the code doesn't handle a 16-character element name correctly.
Could you try testing this? Waiting for your reply...
Code to reproduce:
https://gist.github.com/glebtv/5754313
Backtrace:
https://gist.github.com/glebtv/5754318
File is a discogs.com data dump:
http://www.discogs.com/data/discogs_20130601_artists.xml.gz
Versions:
https://gist.github.com/glebtv/5754327
Is parsing from a Zlib stream supported at all?
It seems to work for part of the file, then segfaults about 10% of the way in.
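This does not answer whether Ox's SAX reader supports a Zlib::GzipReader as input. One workaround to try is to decompress to a temporary file first, then hand the parser a plain File. This assumes the crash is tied to reading from the gzip stream, which the report does not confirm. It trades disk space for a simpler input. gunzip_to_tempfile is a hypothetical helper.

```ruby
require 'zlib'
require 'tempfile'

# Decompresses a .gz file into a Tempfile and returns it rewound, so a
# parser can read from a regular file instead of a Zlib::GzipReader.
def gunzip_to_tempfile(gz_path)
  tmp = Tempfile.new(['decompressed', '.xml'])
  tmp.binmode
  Zlib::GzipReader.open(gz_path) do |gz|
    IO.copy_stream(gz, tmp) # streams in chunks; no full in-memory copy
  end
  tmp.flush
  tmp.rewind
  tmp
end
```

The returned Tempfile would then be passed to Ox.sax_parse in place of the gzip stream.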
I'm unable to bundle this gem. Using OS X 10.8.2 and Ruby 1.9.3-p194.
Any ideas?
~/Sites/current/tabeso$ gem install ox -v '1.8.1'
Building native extensions. This could take a while...
ERROR: Error installing ox:
ERROR: Failed to build gem native extension.
/Users/jeremy/.rvm/rubies/ruby-1.9.3-p194/bin/ruby extconf.rb
>>>>> Creating Makefile for ruby version 1.9.3 on x86_64-darwin10.8.0 <<<<<
creating Makefile
make
compiling base64.c
compiling cache.c
compiling cache8.c
compiling cache8_test.c
compiling cache_test.c
compiling dump.c
compiling gen_load.c
compiling obj_load.c
compiling ox.c
compiling parse.c
compiling sax.c
sax.c:111:7: error: expected parameter declarator
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
/usr/include/secure/_common.h:38:63: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
^
sax.c:111:7: error: expected ')'
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
/usr/include/secure/_common.h:38:63: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
^
sax.c:111:7: note: to match this '('
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
/usr/include/secure/_common.h:38:54: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
^
sax.c:111:7: error: expected ')'
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:27: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
sax.c:111:7: note: to match this '('
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:4: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
sax.c:111:7: error: expected ')'
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:111:4: note: expanded from macro 'stpncpy'
? __builtin___stpncpy_chk (dest, src, len, __darwin_obsz (dest)) \
^
sax.c:111:7: note: to match this '('
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:3: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
sax.c:111:7: error: conflicting types for '__builtin_object_size'
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
/usr/include/secure/_common.h:38:32: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
^
/usr/include/secure/_string.h:61:56: note: '__builtin_object_size' is a builtin with type 'unsigned long (const void *, int)'
return __builtin___memcpy_chk (__dest, __src, __len, __darwin_obsz0(__dest));
^
/usr/include/secure/_common.h:38:32: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
^
sax.c:111:7: error: definition of builtin function '__builtin_object_size'
char *stpncpy(char *dest, const char *src, size_t n) {
^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
((__darwin_obsz0 (dest) != (size_t) -1) \
^
/usr/include/secure/_common.h:38:32: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
^
sax.c:112:25: error: use of undeclared identifier 'src'
size_t cnt = strlen(src) + 1;
^
sax.c:114:9: error: use of undeclared identifier 'n'
if (n < cnt) {
^
sax.c:115:8: error: use of undeclared identifier 'n'
cnt = n;
^
sax.c:117:19: error: use of undeclared identifier 'src'
strncpy(dest, src, cnt);
^
/usr/include/secure/_string.h:124:37: note: expanded from macro 'strncpy'
? __builtin___strncpy_chk (dest, src, len, __darwin_obsz (dest)) \
^
sax.c:117:19: error: use of undeclared identifier 'src'
strncpy(dest, src, cnt);
^
/usr/include/secure/_string.h:125:34: note: expanded from macro 'strncpy'
: __inline_strncpy_chk (dest, src, len))
^
11 errors generated.
make: *** [sax.o] Error 1
Gem files will remain installed in /Users/jeremy/.rvm/gems/ruby-1.9.3-p194@tabeso/gems/ox-1.8.1 for inspection.
Results logged to /Users/jeremy/.rvm/gems/ruby-1.9.3-p194@tabeso/gems/ox-1.8.1/ext/ox/gem_make.out
I like the new API. I played with it a bit and it looks good!
In the other issue you mentioned that you could do doc.foo[]. I think that would make a lot of sense. I had some elements that all had the same name, at the same level, and I was a little surprised when it only returned the first one. If it could return the list of matching elements, that would be even cooler.
Thanks for the rad library!
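Here is a rough sketch of the suggested behaviour in plain Ruby, using a hypothetical NamedChildren class. It does not reflect how Ox implements its accessor API. An unknown method name returns every direct child element with that name, not just the first. One design caveat is that child names that collide with existing methods (name, nodes, class, ...) cannot be reached this way.

```ruby
# Hypothetical element with name-based child access; not Ox's API.
class NamedChildren
  attr_reader :name, :nodes

  def initialize(name, nodes = [])
    @name = name
    @nodes = nodes
  end

  # doc.foo returns all direct children named "foo";
  # raises NoMethodError when there are none.
  def method_missing(meth, *args, &block)
    return super unless args.empty? && block.nil?

    matches = nodes.select { |n| n.is_a?(NamedChildren) && n.name == meth.to_s }
    matches.empty? ? super : matches
  end

  def respond_to_missing?(meth, include_private = false)
    nodes.any? { |n| n.is_a?(NamedChildren) && n.name == meth.to_s } || super
  end
end
```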
The following (Turkish) text...
"G 270 CDI Aç1\u0001k araç"
... is converted by Ox into the following text inside an XML tag:
"G 270 CDI Aç1&�k araç"
which the XML parser reports as a bad encoding.
Using v1.5.4 of the ox gem.
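U+0001 is not a legal character in an XML 1.0 document, not even as a character reference. A conforming parser will therefore reject the output however Ox chooses to write it. The sketch below sanitizes text before handing it to Ox. It assumes that dropping such characters is acceptable for this data. strip_invalid_xml_chars is a hypothetical helper.

```ruby
# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML10_CHARS = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Removes characters (such as U+0001) that may not appear in XML 1.0.
# Expects a UTF-8 string.
def strip_invalid_xml_chars(text)
  text.gsub(INVALID_XML10_CHARS, '')
end

p strip_invalid_xml_chars("G 270 CDI Aç1\u0001k araç") # => "G 270 CDI Aç1k araç"
```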
Hi!
I think the SAX parser should use the encoding from the XML declaration; it would be useful for parsing user-generated files with unknown encoding.
Here is the failing test for test/sax_test.rb:
def test_sax_non_utf8_encoding
if RUBY_VERSION.start_with?('1.8')
assert(true)
else
xml = %{<?xml version="1.0" encoding="Windows-1251"?>
<top>тест</top>
}
handler = AllSax.new()
input = StringIO.new(xml)
Ox.sax_parse(handler, input)
content = handler.calls.assoc(:text)[1]
assert_equal('Windows-1251', content.encoding.to_s)
assert_equal('тест', content.encode('UTF-8'))
end
end
What do you think about it?
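Until sax_parse honours the declaration itself, one workaround is to decode the input before parsing. The sketch below assumes the document starts with an XML declaration and introduces two hypothetical helpers, declared_encoding and decode_xml. The regex tolerates other pseudo-attributes such as standalone. After conversion, decode_xml rewrites the declared encoding to UTF-8 so the declaration matches the converted text.

```ruby
DECL_ENCODING = /\A(\s*<\?xml[^>]*?\bencoding\s*=\s*["'])([^"']+)["']/

# Returns the encoding named in the XML declaration, or nil if absent.
def declared_encoding(xml)
  head = xml.byteslice(0, 200).b # the declaration is near the start
  head[DECL_ENCODING, 2]
end

# Re-labels raw bytes with the declared encoding, converts them to UTF-8,
# and updates the declaration to say UTF-8.
def decode_xml(bytes)
  name = declared_encoding(bytes) || 'UTF-8'
  utf8 = bytes.dup.force_encoding(name).encode(Encoding::UTF_8)
  utf8.sub(DECL_ENCODING, '\1UTF-8"')
end
```

The result could then be wrapped in StringIO.new and passed to Ox.sax_parse as before.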
Normally, if sax_parse is given an ASCII-8BIT string (containing UTF-8 encoded data) and the XML declaration specifies UTF-8, it correctly interprets the contents as UTF-8 and yields UTF-8 encoded nodes. But when the standalone attribute (or other extra attributes) is present, it seems to fail to parse the encoding and yields ASCII-8BIT nodes.
The test code below prints ASCII-8BIT. If you remove the standalone attribute, it prints UTF-8.
# encoding: utf-8
require 'ox'
class Handler
attr_accessor :stack
def initialize()
@stack = []
end
def doc
@stack[0]
end
def attr(name, value)
unless @stack.empty?
append(name, value)
end
end
def text(value)
append('__content__', value)
end
def cdata(value)
append('__content__', value)
end
def start_element(name)
if @stack.empty?
@stack.push(Hash.new)
end
h = Hash.new
append(name, h)
@stack.push(h)
end
def end_element(name)
@stack.pop()
end
def error(message, line, column)
raise Exception.new("#{message} at #{line}:#{column}")
end
def append(key, value)
key = key.to_s
h = @stack.last
if h.has_key?(key)
v = h[key]
if v.is_a?(Array)
v << value
else
h[key] = [v, value]
end
else
h[key] = value
end
end
end
str = %{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<label>©</label>}
str.force_encoding 'ascii-8bit'
handler = Handler.new
Ox.sax_parse(handler, StringIO.new(str), :convert_special => true)
p handler.doc['label']['__content__'].encoding
Hello, Peter!
I have a problem with JRuby 1.7.x.
Please see the gist for more details...
On a 32-bit architecture, deserializing Integers that were serialized on a 64-bit architecture returns invalid data if they are big enough. See the following example.
On a 64-bit architecture:
$ irb -r Ox
ruby-1.9.2-p180 :001 > Ox.dump 1234567890 # Fixnum
=> "<i>1234567890</i>\n"
ruby-1.9.2-p180 :002 > Ox.parse_obj "<i>1234567890</i>\n" # OK
=> 1234567890
ruby-1.9.2-p180 :003 > Ox.parse_obj "<j>1234567890</j>\n" # OK
=> 1234567890
On a 32-bit architecture:
$ irb -r Ox
ruby-1.9.2-p180 :001 > Ox.dump 1234567890 # Bignum
=> "<j>1234567890</j>\n"
ruby-1.9.2-p180 :002 > Ox.parse_obj "<j>1234567890</j>\n" # OK
=> 1234567890
ruby-1.9.2-p180 :003 > Ox.parse_obj "<i>1234567890</i>\n" # Fail
=> -912915758
Because Ox serializes an Integer as either a Fixnum (<i>) or a Bignum (<j>), and the range a Fixnum can hold depends on the machine, data exchanged between 64-bit and 32-bit architectures is not portable.
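The bad value is not random. On 32-bit MRI a Fixnum holds a 31-bit signed value, at most 2**30 - 1 = 1073741823. Also, 1234567890 - 2**31 = -912915758, which is exactly the result above. The sketch below reproduces that wraparound in plain Ruby; wrap_to_fixnum31 is a hypothetical helper. This suggests, but does not prove, that the <i> value is stored into a native Fixnum without a range check.

```ruby
FIXNUM_BITS_32 = 31 # a Fixnum on 32-bit MRI holds a 31-bit signed value

# Wraps an integer into the signed range [-2**30, 2**30 - 1], the way a
# silent native overflow would.
def wrap_to_fixnum31(n)
  half = 2**(FIXNUM_BITS_32 - 1)
  ((n + half) % (2 * half)) - half
end

puts wrap_to_fixnum31(1_234_567_890) # => -912915758
puts 2**30 - 1                       # => 1073741823, the largest 32-bit Fixnum
```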
I get this when calling Ox.parse(str) on a really small XML document with no more than 2 nested levels. The really annoying thing is that I can only reproduce this on our test server; it runs perfectly fine on my machine.
It also runs perfectly on the same test server with Ruby 2.0.0, but unfortunately we are not ready yet to deploy this application with that Ruby version.
The only thing I am sure about is that the error happens at the C level: the Ruby error only shows the "Ox.parse" line as the last executed line. I tried patching the interpreter to increase the fiber stack size, since we use fibers, but it changed nothing.
I also tried writing a minimal reproduction case but, of course, it works... The problem only occurs inside our application.
Do you have any idea what could cause this, or what I could try to solve it?
Edit: Here is what my xml looks like:
<root>
<sub />
</root>
If I remove the inner sub element it works, so it does look like a stack overflow. But I don't see how it could possibly overflow with this: the application is a simple Rack server, and at the line before the crash there are only 15 entries in the caller array.
Is it possible ?
When dumping processing instructions, Ox prints a line break after the instruction. This should probably not be the case, since the "content" of a processing instruction is within its angle brackets.
Here's some sample printed output from a project I'm working on:
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/FootnoteReference">
<Content><?ACE 4?>
</Content>
</CharacterStyleRange>
With the following XML: <test>é</test>, Ox.parse parses "é" correctly, but Ox.sax_parse with :convert_special => true returns "\351".
For example :
require "ox"
# With Ox.parse
Ox.parse("<test>StringWithAccenté</test>").nodes.first
=> "StringWithAccenté"
# With Ox.sax_parse
class Handler < ::Ox::Sax
def text(value); puts value.inspect; end
end
Ox.sax_parse(
Handler.new,
StringIO.new("<test>StringWithAccenté</test>"),
:convert_special => true
)
=> "StringWithAccent\351" (Ruby 1.8)
=> "StringWithAccent\xE9" (Ruby 1.9)
In our case we don't call Ox directly but use MultiXML, which relies on Ox.sax_parse. The escaped 'é' comes from an external API.
Is it a bug or the expected behaviour ?
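"\351" is Ruby's octal escape for the single byte 0xE9 (233). 233 is the Unicode code point of "é", but in UTF-8 that character takes two bytes, C3 A9. The result is therefore consistent with the entity being converted to its code point and written out as one byte instead of being UTF-8 encoded. This is an inference from the bytes, not something confirmed in Ox's source. The relationship in plain Ruby:

```ruby
# "\351" is the single byte 0xE9 (decimal 233).
raw = "\351".b
p raw.bytes             # => [233]

# 233 is also the code point of "é" ...
p 'é'.ord               # => 233
# ... but UTF-8 encodes that code point as two bytes.
p [233].pack('U').bytes # => [195, 169]

# Reading the single byte as ISO-8859-1 recovers the character.
p raw.force_encoding('ISO-8859-1').encode('UTF-8') == 'é' # => true
```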
Installing the gem fails with a NoMethodError on my system (Ubuntu 12.04).
extconf.rb:9:in
[]' for nil:NilClass (NoMethodError)
It seems like my RUBY_DESCRIPTION constant causes the error: it has only four space-separated elements, so setting the platform variable on line 9 fails.
My RUBY_DESCRIPTION output looks like this:
ruby 1.9.3p194 (2012-04-20) [x86_64-linux]
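Ruby also exposes the platform directly as RUBY_PLATFORM (for example "x86_64-linux"), which avoids parsing RUBY_DESCRIPTION at all. If the description must be parsed, the sketch below reads the bracketed suffix instead of counting space-separated fields. platform_from_description is a hypothetical helper, and the report does not show the actual line 9 of extconf.rb.

```ruby
# Extracts the platform from the trailing "[...]" of a RUBY_DESCRIPTION-style
# string, regardless of how many fields come before it.
def platform_from_description(desc)
  desc[/\[([^\]]+)\]\s*\z/, 1]
end

p platform_from_description('ruby 1.9.3p194 (2012-04-20) [x86_64-linux]') # => "x86_64-linux"
```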
Specifically, this spec, which verifies that an invalid XML document raises an error, is failing: https://github.com/sferik/multi_xml/blob/20fe5f8cf5bff610035d40c63e14c59de4a1b562/spec/parser_shared_example.rb#L32-L40
I believe it was caused by f0a2dfe, since this was the only commit between 1.8.5 and 1.8.6 and I've verified that specs pass on 1.8.5.
IMHO, parsing invalid XML (e.g. <open></close>) should not raise a SyntaxError, for the same reason articulated in ohler55/oj#39.
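One concrete consequence, whether or not it is exactly the argument made in ohler55/oj#39: SyntaxError is a ScriptError, not a StandardError. A bare rescue in application code therefore does not catch it. A minimal demonstration in plain Ruby, with caught_by_bare_rescue? as a hypothetical helper:

```ruby
# Returns true if a bare `rescue` (which only catches StandardError)
# handles an exception of the given class.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, 'invalid XML'
  rescue
    true
  end
rescue Exception
  false
end

p caught_by_bare_rescue?(RuntimeError) # => true
p caught_by_bare_rescue?(SyntaxError)  # => false
```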
I'm trying to write a harvester for the arxiv.org OAI-PMH data. There's quite a lot of it, so it seemed sensible to use Ox for efficiency reasons. However, I soon noticed I was missing about 150k papers, and after investigating, Ox (2.0.1) seems to be the culprit:
[2] pry(main)> require 'arxivsync'; parser=ArxivSync::Parser.new; Ox.sax_parse(parser, File.open("/home/mispy/arxiv/2013-05-25T12:45:48+10:00_436115|406001")); parser.models.count
=> 499
[3] pry(main)> require 'nokogiri'; Nokogiri(File.open("/home/mispy/arxiv/2013-05-25T12:45:48+10:00_436115|406001")).css('metadata').count
=> 1000
It seems the SAX parser will abruptly close the outer tag after certain metadata elements, discarding those remaining. The element in question looks like this:
<metadata>
<arXiv xsi:schemaLocation='http://arxiv.org/OAI/arXiv/ http://arxiv.org/OAI/arXiv.xsd' xmlns='http://arxiv.org/OAI/arXiv/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<id>1302.1147</id><created>2013-02-05</created><authors><author><keyname>Lin</keyname><forenames>Chang-shou</forenames></author><author><keyname>Zhang</keyname><forenames>Lei</forenames></author></authors><title>On Liouville systems at critical parameters, Part 1: one bubble</title><categories>math.AP</categories><msc-class>35J60, 35J55</msc-class><license>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</license><abstract> In this paper we consider bubbling solutions to the general Liouville system:
\label{abeq1} \Delta_g u_i^k+\sum_{j=1}^n a_{ij}\rho_j^k(\frac{h_j
e^{u_j^k}}{\int h_j e^{u_j^k}}-1)=0\quad\text{in}M, i=1,...,n (n\ge 2) where
$(M,g)$ is a Riemann surface, and $A=(a_{ij})_{n\times n}$ is a constant
non-negative matrix and $\rho_j^k\to \rho_j$ as $k\to \infty$. Among other
things we prove the following sharp estimates. The location of the blowup
point. The convergence rate of $\rho_j^k-\rho_j$, $j=1,..,n$. These results are
of fundamental importance for constructing bubbling solutions. It is
interesting to compare the difference between the general Liouville system and
the SU(3) Toda system on estimates (1) and (2).
</abstract></arXiv>
</metadata>
Bisect debugging led us to conclude that the effect is conditional on the inclusion of two spaces after the opening tag. This is necessary but not sufficient to reproduce the bug; other metadata elements defiantly flaunt their two spaces with no such disastrous repercussions.
The XML file in question can be found here, and a stripped-down version of the SAX parser which reproduces the bug follows:
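require 'ox'

# Counts the <metadata> elements reported by Ox's SAX parser.
class Parser < ::Ox::Sax
  attr_accessor :count

  def initialize
    @count = 0
  end

  # By default Ox passes element names as Symbols.
  def start_element(name)
    @count += 1 if name == :metadata
  end
end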
It may just be that I missed it when I was browsing the code base, but it would be nice to implement a convenience method on Ox::Element that provides easy access to the element's content.
For example, with XML like the following:
<?xml version="1.0"?>
<foo>bar</foo>
It would be fantastic if one could retrieve the content of foo without having to do something like:
element.nodes.first
#=> "bar"
Would there be any chance to implement something like the following:
element.content
#=> "bar"
Thoughts?
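One possible shape for such a method, as a sketch only: `content` returns the concatenation of an element's direct String children, or nil if there are none. `FakeElement` is a hypothetical stand-in for Ox::Element (which exposes `nodes`), used so the example runs without the gem. Whether the method should concatenate or return only the first string is a design choice this sketch does not settle.

```ruby
# Hypothetical stand-in for Ox::Element: a name plus a list of child nodes.
FakeElement = Struct.new(:value, :nodes) do
  # Sketch of the proposed convenience method: join the direct String
  # children, ignoring child elements, comments, etc.; nil if there are none.
  def content
    strings = nodes.select { |node| node.is_a?(String) }
    strings.empty? ? nil : strings.join
  end
end

foo = FakeElement.new("foo", ["bar"])
p foo.content # "bar"

mixed = FakeElement.new("foo", ["a", FakeElement.new("b", []), "c"])
p mixed.content # "ac"
```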
CDATA is usually used just as a way to escape text, but there is no really nice API for reaching it in Ox. For example, compare <foo>bar</foo> and <quux><![CDATA[<garply>nice</garply>]]></quux> when trying to reach the text/literal data:
foo_xml.text
#=> "bar"
quux_xml.nodes.first.value
#=> "<garply>nice</garply>"
I am aware that <quux> could have had multiple CDATA nodes, but the same holds for a <foo> containing mixed strings and elements, in which case #text also returns only the first string node. Additionally, I should also have checked that quux_xml.nodes.first is actually a CData node.
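A sketch of a helper that treats CDATA sections as literal text, so both examples above could be read the same way. `FakeCData` and `literal_text` are hypothetical stand-ins (the real Ox::CData keeps its contents in `value`, as `quux_xml.nodes.first.value` shows). The helper also performs the type check mentioned above, skipping any node that is neither a String nor CDATA.

```ruby
# Hypothetical stand-in for Ox::CData, which stores its text in #value.
FakeCData = Struct.new(:value)

# Returns the first piece of literal data among the given child nodes:
# plain strings as-is, CDATA sections unwrapped. Other nodes are skipped.
def literal_text(nodes)
  nodes.each do |node|
    return node if node.is_a?(String)
    return node.value if node.is_a?(FakeCData)
  end
  nil
end

p literal_text(["bar"])                                  # "bar"
p literal_text([FakeCData.new("<garply>nice</garply>")]) # "<garply>nice</garply>"
```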
On input like this: <p>“</p>
Ox.parse changes it to: <Ox:Element ... @nodes=[##x201c;], @value="p">
And then Ox.dump outputs it the same way: <p>##x201c;</p>
I've tried it on Ubuntu 12.10, 12.04 and 10.04, with both Ruby 1.9.3 and 1.8.7, and always got the same result.
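For reference, the character in the example is U+201C (LEFT DOUBLE QUOTATION MARK), which XML allows to be written as the hexadecimal character reference &#x201c; or the decimal &#8220;. A minimal sketch of decoding such references in plain Ruby, independent of Ox, could look like this. `decode_char_refs` is a hypothetical helper, and it deliberately leaves named entities such as &amp; alone.

```ruby
# Hypothetical helper: replace numeric character references (&#x201c; or
# &#8220;) with the UTF-8 characters they denote. Named entities are untouched.
def decode_char_refs(text)
  text.gsub(/&#(x[0-9a-fA-F]+|[0-9]+);/) do
    ref = Regexp.last_match(1)
    code = ref.start_with?("x") ? ref[1..-1].to_i(16) : ref.to_i
    [code].pack("U") # encode the code point as UTF-8
  end
end

puts decode_char_refs("<p>&#x201c;</p>")
puts decode_char_refs("&#8220;quoted&#8221;")
```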
Hello.
I'm having some trouble:
If the XML file starts with a BOM (<U+FEFF>), the parser fails with:
ERROR -- : invalid format, expected < at line 1, column 1 (SyntaxError)
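Until the parser skips it, one workaround is to strip the byte order mark before handing the document to Ox. A minimal sketch, assuming the XML is UTF-8: Ruby can drop the BOM while reading via the "r:BOM|UTF-8" open mode, or it can be removed from a string already in memory, as the hypothetical `strip_bom` helper does.

```ruby
require "tempfile"

# Hypothetical helper: remove a leading UTF-8 byte order mark (U+FEFF).
# Assumes the string is UTF-8 encoded.
def strip_bom(xml)
  xml.sub(/\A\uFEFF/, "")
end

p strip_bom("\uFEFF<foo/>") # "<foo/>"

# Alternatively, let Ruby drop the BOM while reading a file.
Tempfile.create(["bom", ".xml"]) do |file|
  file.write("\uFEFF<foo/>")
  file.flush
  p File.read(file.path, mode: "r:BOM|UTF-8") # "<foo/>"
end
```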
In the process of evaluating Ox, I plugged it into my existing tests/benchmarks. I'm running into an issue with a particular large file (~8.4MB) that always raises the following error:
`readpartial': end of file reached (EOFError)
The test passes with flying colors if I pass Ox.sax_parse a File object (e.g. File.open('...', 'r')). If I pass Ox.sax_parse the same file, already read into memory as a string, wrapped in a StringIO object, then it fails with the error.
The exact same tests pass in Ruby 1.8.7 (but fail in 1.9.2 and 1.9.3).
Thoughts?
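The error itself follows Ruby's IO contract: `readpartial` raises EOFError once the stream is exhausted, and StringIO behaves like File here, so any reader that pulls chunks this way has to treat EOFError as the normal end of input. A sketch of such a loop follows; `read_in_chunks` is hypothetical and does not claim to show where Ox's own reader goes wrong.

```ruby
require "stringio"

# Hypothetical reader: pull fixed-size chunks with readpartial and treat
# EOFError as the ordinary end-of-input signal rather than a failure.
def read_in_chunks(io, chunk_size = 4096)
  chunks = []
  begin
    loop { chunks << io.readpartial(chunk_size) }
  rescue EOFError
    # End of input reached; everything read so far is complete.
  end
  chunks
end

source = "<root>" + "<item/>" * 2000 + "</root>"
data = read_in_chunks(StringIO.new(source)).join
puts data.bytesize # 14013
```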
Hi Peter!
I use Ox to parse large user-generated XML files containing a lot of domain logic. Sometimes these files are syntactically valid but not valid in terms of the domain model, and I need to show verbose error messages that include the line and column numbers where the error occurs.
Is it possible to add some trace methods to Ox::Sax? Something like this (just a suggestion):
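class Sax < ::Ox::Sax
  def start_element(name)
    # Ox passes element names as Symbols by default, so compare against :node.
    # __line__ and __column__ are the proposed trace methods; they do not exist yet.
    if name != :node
      puts "Unknown element #{name} at line #{__line__}, column #{__column__}"
    end
  end
end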
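If the parser could report a character offset rather than a line and column, converting between the two is straightforward. The sketch below assumes such an offset is available; `line_and_column` is a hypothetical helper and not part of Ox.

```ruby
# Hypothetical helper: convert a 0-based character offset into 1-based
# line and column numbers, counting "\n" as the line separator.
def line_and_column(text, offset)
  before = text[0, offset]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

xml = "<root>\n  <node/>\n  <bogus/>\n</root>"
p line_and_column(xml, xml.index("<bogus")) # [3, 3]
```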
Hi,
We are trying to implement an Ox parser for Nori, and we've run into a limitation caused by the way Ox implements start_element in the Sax parser.
In Nokogiri, start_element has a second argument, attrs, that contains all the attributes of the given element. In Ox, however, attributes are reported separately and individually through attr_value. This is a problem because there is no place to perform aggregated actions on all of an element's attributes. end_element wouldn't work, as it's executed in reverse order.
Any suggestions on a workaround?
Thanks!
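One workaround is to buffer attributes as they arrive and flush them as a group at the next non-attribute event, which is the point where the element's attribute list is known to be complete. The handler below is a hypothetical sketch: it defines start_element, attr, text, and end_element as plain Ruby methods and is driven by hand so it runs without Ox. Wiring it to Ox::Sax, and the `on_element` hook name, are assumptions.

```ruby
# Hypothetical SAX-style handler that regroups per-attribute callbacks into
# one (name, attrs) event, similar to Nokogiri's start_element(name, attrs).
class AttributeBuffer
  attr_reader :events

  def initialize
    @events = []
    @pending = nil # [name, attrs] for the element whose attributes are open
  end

  def start_element(name)
    flush
    @pending = [name, {}]
  end

  def attr(name, value)
    @pending[1][name] = value if @pending
  end

  def text(_value)
    flush
  end

  def end_element(_name)
    flush
  end

  private

  # Emit the buffered element once its attribute list is complete.
  def flush
    return unless @pending
    on_element(*@pending)
    @pending = nil
  end

  # Aggregated hook: all attributes of one element arrive together here.
  def on_element(name, attrs)
    @events << [name, attrs]
  end
end

handler = AttributeBuffer.new
handler.start_element(:user)
handler.attr(:id, "1")
handler.attr(:role, "admin")
handler.start_element(:name)
handler.text("Ann")
handler.end_element(:name)
handler.end_element(:user)
p handler.events
```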