ox's Introduction

Ox gem

A fast XML parser and Object marshaller as a Ruby gem.

Installation

gem install ox

Documentation

Documentation: http://www.ohler.com/ox

Source

GitHub repo: https://github.com/ohler55/ox

RubyGems repo: https://rubygems.org/gems/ox

Support

Get supported Ox with a Tidelift Subscription. Security updates are supported.

Links of Interest

Ruby XML Gem Comparison for a performance comparison between Ox, Nokogiri, and LibXML.

Fast Ruby XML Serialization to see how Ox can be used as a faster replacement for Marshal.

Fast JSON parser and marshaller on RubyGems: https://rubygems.org/gems/oj

Fast JSON parser and marshaller on GitHub: https://github.com/ohler55/oj

Release Notes

See CHANGELOG.md

Description

Optimized XML (Ox), as the name implies, was written to provide speed-optimized XML, and now HTML, handling. It was designed to be an alternative to Nokogiri and other Ruby XML parsers in generic XML parsing and as an alternative to Marshal for Object serialization.

Unlike some other Ruby XML parsers, Ox is self-contained. Ox uses nothing other than standard C libraries, so version conflicts with libXml are not a concern.

Marshal uses a binary format for serializing Objects. That binary format changes between releases, making Marshal-dumped Objects incompatible between some versions. The use of a binary format makes debugging message streams or file contents next to impossible unless the same version of Ruby, and only Ruby, is used for inspecting the serialized Object. Ox, on the other hand, uses human-readable XML. Ox also includes options for strict or tolerant loading, and a mode that automatically defines missing classes.

It is possible to write an XML serialization gem with Nokogiri or other XML parsers, but writing such a package in Ruby results in a module significantly slower than Marshal. This is what triggered the start of Ox development.

Ox handles XML documents in three ways. It is a generic XML parser and writer, a fast Object / XML marshaller, and a stream SAX parser. Ox was written for speed as a replacement for Nokogiri, Ruby LibXML, and for Marshal.

As an XML parser it is 2 or more times faster than Nokogiri and as a generic XML writer it is as much as 20 times faster than Nokogiri. Of course different files may result in slightly different times.

As an Object serializer Ox is up to 6 times faster than the standard Ruby Marshal.dump() and up to 3 times faster than Marshal.load().

The SAX-like stream parser is 40 times faster than Nokogiri and more than 13 times faster than LibXML when validating a file with minimal Ruby callbacks. Unlike Nokogiri and LibXML, Ox can be tuned to use only the SAX callbacks that are of interest to the caller. (See the perf_sax.rb file for an example.)

Ox is compatible with Ruby 2.3, 2.4, 2.5, 2.6, 2.7, 3.0.

Object Dump Sample:

require 'ox'

class Sample
  attr_accessor :a, :b, :c

  def initialize(a, b, c)
    @a = a
    @b = b
    @c = c
  end
end

# Create Object
obj = Sample.new(1, "bee", ['x', :y, 7.0])
# Now dump the Object to an XML String.
xml = Ox.dump(obj)
# Convert the object back into a Sample Object.
obj2 = Ox.parse_obj(xml)

Generic XML Writing and Parsing:

require 'ox'

doc = Ox::Document.new

instruct = Ox::Instruct.new(:xml)
instruct[:version] = '1.0'
instruct[:encoding] = 'UTF-8'
instruct[:standalone] = 'yes'
doc << instruct

top = Ox::Element.new('top')
top[:name] = 'sample'
doc << top

mid = Ox::Element.new('middle')
mid[:name] = 'second'
top << mid

bot = Ox::Element.new('bottom')
bot[:name] = 'third'
bot << 'text at bottom'
mid << bot

other_elements = Ox::Element.new('otherElements')
other_elements << Ox::CData.new('<sender>John Smith</sender>')
other_elements << Ox::Comment.new('Director\'s commentary')
# other_elements << Ox::DocType.new('content')
other_elements << Ox::Raw.new('<warning>Be careful with this! Direct inject into XML!</warning>')
top << other_elements


xml = Ox.dump(doc)

# xml =
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
# <top name="sample">
#   <middle name="second">
#     <bottom name="third">text at bottom</bottom>
#   </middle>
#   <otherElements>
#     <![CDATA[<sender>John Smith</sender>]]>
#     <!-- Director's commentary -->
#     <warning>Be careful with this! Direct inject into XML!</warning>
#   </otherElements>
# </top>

HTML Parsing:

Ox can be used to parse HTML with a few option changes. HTML is often loose in its conformance to the standard. For HTML parsing, try these options.

Ox.default_options = {
    mode:   :generic,
    effort: :tolerant,
    smart:  true
}

SAX XML Parsing:

require 'stringio'
require 'ox'

class Sample < ::Ox::Sax
  def start_element(name); puts "start: #{name}";        end
  def end_element(name);   puts "end: #{name}";          end
  def attr(name, value);   puts "  #{name} => #{value}"; end
  def text(value);         puts "text #{value}";         end
end

io = StringIO.new(%{
<top name="sample">
  <middle name="second">
    <bottom name="third"/>
  </middle>
</top>
})

handler = Sample.new()
Ox.sax_parse(handler, io)
# outputs
# start: top
#   name => sample
# start: middle
#   name => second
# start: bottom
#   name => third
# end: bottom
# end: middle
# end: top

Yielding results immediately while SAX XML Parsing:

require 'stringio'
require 'ox'

class Yielder < ::Ox::Sax
  def initialize(block); @yield_to = block; end
  def start_element(name); @yield_to.call(name); end
end

io = StringIO.new(%{
<top name="sample">
  <middle name="second">
    <bottom name="third"/>
  </middle>
</top>
})

proc = Proc.new { |name| puts name }
handler = Yielder.new(proc)
puts "before parse"
Ox.sax_parse(handler, io)
puts "after parse"
# outputs
# before parse
# top
# middle
# bottom
# after parse

Parsing XML into a Hash (fast)

require 'ox'

xml = %{
<top name="sample">
  <middle name="second">
    <bottom name="third">Rock bottom</bottom>
  </middle>
</top>
}

puts Ox.load(xml, mode: :hash)
puts Ox.load(xml, mode: :hash_no_attrs)

#{:top=>[{:name=>"sample"}, {:middle=>[{:name=>"second"}, {:bottom=>[{:name=>"third"}, "Rock bottom"]}]}]}
#{:top=>{:middle=>{:bottom=>"Rock bottom"}}}

Object XML format

The XML format used for Object encoding follows the structure of the Object. Each XML element is encoded so that the XML element name is a type indicator. Attributes of the element provide additional information such as the Class if relevant, the Object attribute name, and Object ID if necessary.

The type indicator map is:

  • a => Array
  • b => Base64 - only for legacy loads
  • c => Class
  • f => Float
  • g => Regexp
  • h => Hash
  • i => Fixnum
  • j => Bignum
  • l => Rational
  • m => Symbol
  • n => FalseClass
  • o => Object
  • p => Ref
  • r => Range
  • s => String
  • t => Time
  • u => Struct
  • v => Complex
  • x => Raw
  • y => TrueClass
  • z => NilClass

If the type is an Object (type 'o'), then an attribute named 'c' should be set with the full Class name including the Module names. If the XML element represents an Object, then a sub-element is included for each attribute of the Object. An XML element attribute 'a' is set with a value that is the name of the Ruby Object attribute. In all cases, except for the Exception attribute hack, the attribute names begin with an @ character. (Exceptions are strange in that the attributes of the Exception Class are not named with an @ prefix. It is a hack since it has to be done in C and cannot be done through the interpreter.)

Values are encoded as the text portion of an element or in the sub-elements of the principal element. For example, a Fixnum is encoded as:

<i>123</i>

An Array has sub-elements and is encoded similar to this example.

<a>
  <i>1</i>
  <s>abc</s>
</a>

A Hash is encoded with an even number of elements where the first element is the key and the second is the value. This is repeated for each entry in the Hash. For example, { 1 => 'one', 2 => 'two' } is encoded as:

<h>
  <i>1</i>
  <s>one</s>
  <i>2</i>
  <s>two</s>
</h>

Ox supports circular references where attributes of one Object can refer to an Object that refers back to the first Object. When this option is used an Object ID is added to each XML Object element as the value of the 'a' attribute.
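
The type indicator scheme described in this section can be illustrated with a small stand-alone sketch. It is not Ox's implementation: sketch_encode and escape_text are hypothetical helper names, only Integer, String, Array, and Hash are covered, and every Integer is written as 'i' (the Fixnum / Bignum split is ignored).

```ruby
# Illustrative encoder only: mimics the type-indicator layout described above.
# It is NOT Ox's implementation and covers just Integer, String, Array and Hash.
def sketch_encode(value, indent = 0)
  pad = '  ' * indent
  case value
  when Integer
    "#{pad}<i>#{value}</i>\n"
  when String
    "#{pad}<s>#{escape_text(value)}</s>\n"
  when Array
    body = value.map { |v| sketch_encode(v, indent + 1) }.join
    "#{pad}<a>\n#{body}#{pad}</a>\n"
  when Hash
    # Key element first, value element second, repeated for each entry.
    body = value.map { |k, v| sketch_encode(k, indent + 1) + sketch_encode(v, indent + 1) }.join
    "#{pad}<h>\n#{body}#{pad}</h>\n"
  else
    raise ArgumentError, "type not covered by this sketch: #{value.class}"
  end
end

# Escape the characters that would otherwise be read as markup.
def escape_text(s)
  s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

puts sketch_encode({ 1 => 'one', 2 => 'two' })
```

Run on { 1 => 'one', 2 => 'two' }, the sketch prints the same alternating key / value layout as the Hash example above.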

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]

ox's People

Contributors

bobofraggins, cade, cbliard, choosen, dependabot[bot], dersascha, ezekg, foton, gui, joshvoigts, mberlanda, mcarpenter, monkeywithacupcake, nschonni, ohler55, okeeblow, olleolleolle, pgeraghty, rubymaniac, rudylee, saulius, seamusabshere, sharpyfox, slotos, smithtim, sriedel, takkanm, tim-vandecasteele, uelb, watson1978

ox's Issues

Sax Parsing Segfault by Ruby GC

I posted this in another issue but realized it was closed, and because it may not be related figured I'd open a clean one up.

I've found sporadic seg faults occurring ever since I switched from a Nokogiri to Ox Sax Parser. I'm running Ruby 1.9.3-p125 and have found the same issue with Ox 1.9.2, 1.9.3, and 2.0.0.

I'm parsing XML within a Rails app and when I made the switch over to Ox these segfaults started occurring, but never in the same place. Sometimes they would happen in ActiveSupport, other times elsewhere in the app. I dug into the .crash file for each seg fault however and found that in every instance the thread ends with the following trace:

0   libsystem_kernel.dylib                  0x00007fff92ab0ce2 __pthread_kill + 10
1   libsystem_c.dylib                       0x00007fff972447d2 pthread_kill + 95
2   libsystem_c.dylib                       0x00007fff97235a7a abort + 143
3   ruby                                    0x000000010bec6ed4 rb_bug + 212
4   ruby                                    0x000000010bf8f62f sigsegv + 127
5   libsystem_c.dylib                       0x00007fff97296cfa _sigtramp + 26
6   ruby                                    0x000000010bee3bd9 gc_marks + 345
7   ruby                                    0x000000010bee40bd garbage_collect + 253
8   ruby                                    0x000000010bee4796 vm_xmalloc + 150

So I guess Ruby's GC eventually hits a bad spot in memory if Ox is used? I'm wondering if there are some memory management conflicts between the native Ox extension and Ruby.
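
A general technique that may help narrow down sporadic GC-related crashes like this is to run the suspect code with GC.stress enabled, which makes Ruby collect garbage at every opportunity so that a missing GC mark in a native extension tends to fail quickly and repeatably. The sketch below is only an assumption about how one might set that up: with_gc_stress is a hypothetical helper and the block body is a placeholder for the real SAX parse.

```ruby
# Run the given block with GC.stress enabled, restoring the old setting after.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

result = with_gc_stress do
  # Placeholder workload; in practice this would be the SAX parse that crashes.
  Array.new(50) { |i| "node-#{i}" }.map(&:upcase)
end
puts result.size
```

GC.stress makes everything much slower, so it is best wrapped around the smallest input that still triggers the problem.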

JRuby compatibility broken

Hi Peter!

Today I've tried to run OX under JRuby and got issues with encoding. Then I've run test/sax_test.rb and found it broken - 5 failing tests.

My ruby is: jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_26) [linux-amd64-java]

Error installing ox

I'm unable to bundle this gem. Using OS X 10.8.2 and ruby 1.9.3-p194. Any ideas?

~/Sites/current/tabeso$ gem install ox -v '1.8.1'
Building native extensions.  This could take a while...
ERROR:  Error installing ox:
    ERROR: Failed to build gem native extension.

        /Users/jeremy/.rvm/rubies/ruby-1.9.3-p194/bin/ruby extconf.rb
>>>>> Creating Makefile for ruby version 1.9.3 on x86_64-darwin10.8.0 <<<<<
creating Makefile

make
compiling base64.c
compiling cache.c
compiling cache8.c
compiling cache8_test.c
compiling cache_test.c
compiling dump.c
compiling gen_load.c
compiling obj_load.c
compiling ox.c
compiling parse.c
compiling sax.c
sax.c:111:7: error: expected parameter declarator
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
    ^
/usr/include/secure/_common.h:38:63: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                                                              ^
sax.c:111:7: error: expected ')'
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
    ^
/usr/include/secure/_common.h:38:63: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                                                              ^
sax.c:111:7: note: to match this '('
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
    ^
/usr/include/secure/_common.h:38:54: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                                                     ^
sax.c:111:7: error: expected ')'
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:27: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
                          ^
sax.c:111:7: note: to match this '('
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:4: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
   ^
sax.c:111:7: error: expected ')'
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:111:4: note: expanded from macro 'stpncpy'
   ? __builtin___stpncpy_chk (dest, src, len, __darwin_obsz (dest))     \
   ^
sax.c:111:7: note: to match this '('
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:3: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
  ^
sax.c:111:7: error: conflicting types for '__builtin_object_size'
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
    ^
/usr/include/secure/_common.h:38:32: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                               ^
/usr/include/secure/_string.h:61:56: note: '__builtin_object_size' is a builtin with type 'unsigned long (const void *, int)'
  return __builtin___memcpy_chk (__dest, __src, __len, __darwin_obsz0(__dest));
                                                       ^
/usr/include/secure/_common.h:38:32: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                               ^
sax.c:111:7: error: definition of builtin function '__builtin_object_size'
char *stpncpy(char *dest, const char *src, size_t n) {
      ^
/usr/include/secure/_string.h:110:5: note: expanded from macro 'stpncpy'
  ((__darwin_obsz0 (dest) != (size_t) -1)                               \
    ^
/usr/include/secure/_common.h:38:32: note: expanded from macro '__darwin_obsz0'
#define __darwin_obsz0(object) __builtin_object_size (object, 0)
                               ^
sax.c:112:25: error: use of undeclared identifier 'src'
    size_t      cnt = strlen(src) + 1;
                             ^
sax.c:114:9: error: use of undeclared identifier 'n'
    if (n < cnt) {
        ^
sax.c:115:8: error: use of undeclared identifier 'n'
        cnt = n;
              ^
sax.c:117:19: error: use of undeclared identifier 'src'
    strncpy(dest, src, cnt);
                  ^
/usr/include/secure/_string.h:124:37: note: expanded from macro 'strncpy'
   ? __builtin___strncpy_chk (dest, src, len, __darwin_obsz (dest))     \
                                    ^
sax.c:117:19: error: use of undeclared identifier 'src'
    strncpy(dest, src, cnt);
                  ^
/usr/include/secure/_string.h:125:34: note: expanded from macro 'strncpy'
   : __inline_strncpy_chk (dest, src, len))
                                 ^
11 errors generated.
make: *** [sax.o] Error 1


Gem files will remain installed in /Users/jeremy/.rvm/gems/ruby-1.9.3-p194@tabeso/gems/ox-1.8.1 for inspection.
Results logged to /Users/jeremy/.rvm/gems/ruby-1.9.3-p194@tabeso/gems/ox-1.8.1/ext/ox/gem_make.out

Encoding issues (Ruby 1.9.x)

Ox currently doesn't seem to care about (Ruby) encodings. I am not sure where to start with this, so here is a use case.

Let's create 2 XML documents:

x1 = %(<?xml version="1.0" encoding="ISO-8859-1" ?><tag key="value">Français</tag>).encode("ISO-8859-1")
# => "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?><tag key=\"value\">Fran\xE7ais</tag>" 
x1.encoding
# => #<Encoding:ISO-8859-1> 

x2 = %(<?xml version="1.0" encoding="UTF-8" ?><tag key="value">Français</tag>)
# => "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><tag key=\"value\">Fran\xC3\xA7ais</tag>"
x2.encoding
# => #<Encoding:UTF-8> 

With Ox:

class OH < ::Ox::Sax
  def start_element(name)
    puts "EL: #{name} (#{name.encoding})"
  end

  def end_element(name)
  end

  def attr(key, value)
    puts "AT: #{key} => #{value} (#{key.encoding} => #{value.encoding})"
  end

  def text(value)
    puts "TX: #{value} (#{value.encoding})"
  end
end

::Ox.sax_parse OH.new, StringIO.new(x1)
# => AT: version => 1.0 (US-ASCII => ASCII-8BIT)
#    AT: encoding => ISO-8859-1 (US-ASCII => ISO-8859-1)
#    EL: tag (US-ASCII)
#    AT: key => value (US-ASCII => ISO-8859-1)
#    TX: Fran�ais (ISO-8859-1)

::Ox.sax_parse OH.new, StringIO.new(x2)
# => AT: version => 1.0 (US-ASCII => ASCII-8BIT)
#    AT: encoding => UTF-8 (US-ASCII => UTF-8)
#    EL: tag (US-ASCII)
#    AT: key => value (US-ASCII => UTF-8)
#    TX: Français (UTF-8)

Now the same with Nokogiri:

class NH

  def self.parse(io)
    root = Nokogiri::XML(io).root
    puts "EL: #{root.name} (#{root.name.encoding})"
    root.attributes.each do |key, value|
      puts "AT: #{key} => #{value.value} (#{key.encoding} => #{value.value.encoding})"
    end
    puts "TX: #{root.text} (#{root.text.encoding})"    
  end

end

NH.parse StringIO.new(x1)
# => EL: tag (UTF-8)
#    AT: key => value (UTF-8 => UTF-8)
#    TX: Français (UTF-8)

NH.parse StringIO.new(x2)
# => EL: tag (UTF-8)
#    AT: key => value (UTF-8 => UTF-8)
#    TX: Français (UTF-8)

As you can see, Nokogiri encodes everything correctly to Encoding.default_external while Ox's encodings are a little "random".

It gets a lot worse with non-ASCII attributes:

x1 = %(<?xml version="1.0" encoding="ISO-8859-1" ?><tag Português="Español">Français</tag>).encode("ISO-8859-1")
x2 = %(<?xml version="1.0" encoding="UTF-8" ?><tag Português="Español">Français</tag>)

NH.parse StringIO.new(x1)
# => EL: tag (UTF-8)
#    AT: Português => Español (UTF-8 => UTF-8)
#    TX: Français (UTF-8)

NH.parse StringIO.new(x2)
# Same as above

::Ox.sax_parse OH.new, StringIO.new(x1)
# => AT: version => 1.0 (US-ASCII => ASCII-8BIT)
#    AT: encoding => ISO-8859-1 (US-ASCII => ISO-8859-1)
#    EL: tag (US-ASCII)
#    EncodingError: invalid encoding symbol

Any ideas? Cheers, dim
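
Until the parser normalizes encodings itself, one possible workaround is to transcode each value inside the handler callbacks. This is a minimal sketch, assuming the declared encoding has already been captured from the XML declaration; to_utf8 is a hypothetical helper and not part of Ox.

```ruby
# Re-label a parser-supplied string with the encoding declared by the
# document, then transcode it to UTF-8.
def to_utf8(str, declared_encoding)
  str.dup.force_encoding(declared_encoding).encode(Encoding::UTF_8)
end

raw = "Fran\xE7ais".b                      # bytes as they appear in the ISO-8859-1 document
puts to_utf8(raw, 'ISO-8859-1')            # Français
puts to_utf8(raw, 'ISO-8859-1').encoding   # UTF-8
```

A handler could call this from start_element, attr, and text so that everything it stores is UTF-8, which is the behaviour shown for Nokogiri above.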

Ox.parse stack overflow on 4-deep nested tags under Cygwin

Ox 1.9.2, ruby 1.9.3 p374 on Cygwin

irb(main):001:0> require 'ox'
=> true
irb(main):002:0> Ox.parse('<?xml version="1.0" encoding="UTF-8"?><a><b><c><d></d></c></b></a>')
SystemStackError: stack level too deep
        from /usr/lib/ruby/1.9.1/irb/workspace.rb:80
Maybe IRB bug!
irb(main):003:0> 

Looks fine on other platforms, and also fine on Cygwin with only 3-deep tags:

irb(main):001:0> require 'ox'
=> true
irb(main):002:0> Ox.parse('<?xml version="1.0" encoding="UTF-8"?><a><b><c></c></b></a>')
=> #<Ox::Document:0x802f080c @attributes={:version=>"1.0", :encoding=>"UTF-8"}, @nodes=[#<Ox::Element:0x802f071c @value="a", @nodes=[#<Ox::Element:0x802f06e0 @value="b", @nodes=[#<Ox::Element:0x802f06a4 @value="c", @nodes=[]>]>]>]>
irb(main):003:0> 

I don't think this is environmental, but it might be nice to get confirmation from another Cygwin user before going berserk on this one.

double free or corruption crash

The following fails for me using ox 2.0.0 (tolerant and normal mode) under both Ruby 1.9.3 and 2.0.0 (with 32-bit and 64-bit kernels if that is useful, tested under Ubuntu).

ruby -rox -ropen-uri -e 'Ox.sax_parse(Ox::Sax.new, open("http://go.alphashare.com/external/external.php?__method=external_xmlfeed&__feed=kyr&__company_serial=376"))'

Please see [https://gist.github.com/pgeraghty/5431830] for more information.

I have checked with a handler and it does actually appear to get all the way through to the final end_element call.

I have stacks of real estate related XML feeds to test this with if it can help improve your fantastically fast parser.

Serialization of Integer not portable between 32 and 64 bits

On a 32-bit architecture, deserializing an Integer that was serialized on a 64-bit architecture returns invalid data if the value is big enough. Check the following example.

On a 64-bit architecture:

$ irb -r Ox
ruby-1.9.2-p180 :001 > Ox.dump 1234567890 # Fixnum
 => "<i>1234567890</i>\n" 
ruby-1.9.2-p180 :002 > Ox.parse_obj "<i>1234567890</i>\n" # OK
 => 1234567890 
ruby-1.9.2-p180 :003 > Ox.parse_obj "<j>1234567890</j>\n" # OK
 => 1234567890 

On a 32-bit architecture:

$ irb -r Ox
ruby-1.9.2-p180 :001 > Ox.dump 1234567890 # Bignum
 => "<j>1234567890</j>\n" 
ruby-1.9.2-p180 :002 > Ox.parse_obj "<j>1234567890</j>\n" # OK
 => 1234567890 
ruby-1.9.2-p180 :003 > Ox.parse_obj "<i>1234567890</i>\n" # Fail
 => -912915758 

Because Ox serializes Integers as either Fixnum or Bignum, and because the largest Integer a Fixnum can hold depends on the machine, data exchanged between 64-bit and 32-bit architectures is not portable.
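
The bad value in the 32-bit session is consistent with 1234567890 being squeezed into a 31-bit signed Fixnum. That is an inference from the numbers, not something confirmed from the Ox source. The arithmetic can be reproduced on any machine with the sketch below; wrap_to_signed and fits_signed? are hypothetical helpers.

```ruby
FIXNUM_BITS_32 = 31 # assumption: 32-bit MRI keeps one bit of the word as a tag

# Wrap an Integer into the signed range representable with the given bit count.
def wrap_to_signed(value, bits)
  modulus = 1 << bits
  wrapped = value % modulus
  wrapped >= (modulus >> 1) ? wrapped - modulus : wrapped
end

# True when the value survives a round trip through that many signed bits.
def fits_signed?(value, bits)
  half = 1 << (bits - 1)
  value >= -half && value < half
end

puts wrap_to_signed(1234567890, FIXNUM_BITS_32)  # -912915758
puts fits_signed?(1234567890, FIXNUM_BITS_32)    # false
```

A check like fits_signed? on the loading side would be one way to decide whether an <i> element has to be read as a Bignum instead.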

sax parser ignores encoding="UTF-8" if the standalone attribute is also present

Normally, if sax_parse is given an ascii-8bit string containing utf-8 encoded data and the xml declaration specifies utf-8, it correctly interprets the contents as utf-8 and yields utf-8 encoded nodes. But in the presence of the standalone attribute (or other garbage attributes) it seems to fail to parse the encoding and yields ascii-8bit nodes.

The test code below will print ascii-8bit. If you remove the standalone attribute, it will print utf-8.

# encoding: utf-8

require 'ox'
require 'stringio'

class Handler
  attr_accessor :stack

  def initialize()
    @stack = []
  end

  def doc
    @stack[0]
  end

  def attr(name, value)
    unless @stack.empty?
      append(name, value)
    end
  end

  def text(value)
    append('__content__', value)
  end

  def cdata(value)
    append('__content__', value)
  end

  def start_element(name)
    if @stack.empty?
      @stack.push(Hash.new)
    end
    h = Hash.new
    append(name, h)
    @stack.push(h)
  end

  def end_element(name)
    @stack.pop()
  end

  def error(message, line, column)
    raise Exception.new("#{message} at #{line}:#{column}")
  end

  def append(key, value)
    key = key.to_s
    h = @stack.last
    if h.has_key?(key)
      v = h[key]
      if v.is_a?(Array)
        v << value
      else
        h[key] = [v, value]
      end
    else
      h[key] = value
    end
  end

end

str = %{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<label>©</label>}
str.force_encoding 'ascii-8bit'
handler = Handler.new
Ox.sax_parse(handler, StringIO.new(str), :convert_special => true)
p handler.doc['label']['__content__'].encoding
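
As a possible workaround on the caller's side, the declared encoding can be read from the XML declaration without depending on attribute order and applied to the string before it is handed to the parser. This is a sketch under that assumption only; declared_encoding is a hypothetical helper, and Encoding.find raises ArgumentError for names Ruby does not know.

```ruby
# Pull the encoding pseudo-attribute out of an XML declaration, whatever
# other pseudo-attributes (version, standalone, ...) surround it.
def declared_encoding(xml)
  decl = xml.b[/\A\s*<\?xml\b[^>]*\?>/n]
  return nil unless decl
  name = decl[/\bencoding\s*=\s*["']([^"']+)["']/n, 1]
  name && Encoding.find(name)
end

str = %{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<label>\u00A9</label>}.b
enc = declared_encoding(str)
# Re-label the bytes only when a declaration with an encoding was found.
labelled = enc ? str.dup.force_encoding(enc) : str
puts labelled.encoding  # UTF-8
```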

SAX parser crash on xml with BOM

Hello.
I have some trouble:

If the XML file starts with a BOM character (<U+FEFF>), the parser crashes:

ERROR -- : invalid format, expected < at line 1, column 1
(SyntaxError)
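
One way to sidestep this on the caller's side is to drop a leading UTF-8 byte order mark before the data reaches the parser. The sketch below assumes the input is already in memory as a String; strip_utf8_bom is a hypothetical helper, and whether a given Ox version needs it is not stated in this report.

```ruby
require 'stringio'

UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Returns +xml+ without a leading UTF-8 BOM (the input itself if there is none).
def strip_utf8_bom(xml)
  bytes = xml.b
  return xml unless bytes.start_with?(UTF8_BOM)
  bytes.byteslice(UTF8_BOM.bytesize..-1).force_encoding(xml.encoding)
end

with_bom = "\uFEFF<top/>"
clean = strip_utf8_bom(with_bom)
puts clean                 # <top/>
io = StringIO.new(clean)   # hand this to the parser instead of the raw input
```

When reading from a file, opening it with the mode 'r:BOM|UTF-8' lets Ruby skip the mark while reading.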

Ox.parse changes numeric entities from &#xHHHH; to ##xHHHH;

On input like this: <p>&#x201c;</p>
Ox.parse changes it to: <Ox:Element ... @nodes=[##x201c;], @value="p">

And then Ox.dump outputs it the same way: <p>##x201c;</p>

I've tried it on Ubuntu 12.10, 12.04 and 10.04, with both Ruby 1.9.3 and 1.8.7, and always gotten the same result.
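
For reference, this is the result I would expect for a numeric character reference, written as a small stand-alone sketch rather than anything taken from Ox. decode_numeric_entities is a hypothetical helper; it handles only the &#NNNN; and &#xHHHH; forms and does not guard against code points that are out of range.

```ruby
# Decode numeric character references such as &#x201c; or &#8220;.
def decode_numeric_entities(text)
  text.gsub(/&#(x[0-9a-fA-F]+|[0-9]+);/) do
    ref = Regexp.last_match(1)
    code = ref.start_with?('x') ? ref[1..-1].to_i(16) : ref.to_i
    [code].pack('U')
  end
end

puts decode_numeric_entities('<p>&#x201c;</p>')  # <p>“</p>
```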

SIGSEGV on repeated top-level element

Clearly this isn't valid XML... but Ox shouldn't segfault either.

mcarpenter@ubuntu:/tmp$ uname -a 
Linux ubuntu 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
mcarpenter@ubuntu:/tmp$ ruby --version 
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]
mcarpenter@ubuntu:/tmp$ gem list ox 

*** LOCAL GEMS ***

ox (1.8.0)
mcarpenter@ubuntu:/tmp$ irb
ruby-1.9.2-p180 :001 > require 'ox'
 => true 
ruby-1.9.2-p180 :002 > Ox.parse('<foo></foo>')
 => #<Ox::Element:0x000000019184e8 @value="foo", @nodes=[]> 
ruby-1.9.2-p180 :003 > Ox.parse('<foo></foo><foo></foo>')
(irb):3: [BUG] Segmentation fault
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]

-- control frame ----------
c:0024 p:---- s:0086 b:0086 l:000085 d:000085 CFUNC  :parse
c:0023 p:0017 s:0082 b:0082 l:000a98 d:000081 EVAL   (irb):3
c:0022 p:---- s:0080 b:0080 l:000079 d:000079 FINISH
c:0021 p:---- s:0078 b:0078 l:000077 d:000077 CFUNC  :eval
c:0020 p:0028 s:0071 b:0071 l:000070 d:000070 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/workspace.rb:80
c:0019 p:0033 s:0064 b:0063 l:000062 d:000062 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/context.rb:254
c:0018 p:0031 s:0058 b:0058 l:000eb8 d:000057 BLOCK  /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:159
c:0017 p:0042 s:0050 b:0050 l:000049 d:000049 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:273
c:0016 p:0011 s:0045 b:0045 l:000eb8 d:000044 BLOCK  /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:156
c:0015 p:0144 s:0041 b:0041 l:000024 d:000040 BLOCK  /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:243
c:0014 p:---- s:0038 b:0038 l:000037 d:000037 FINISH
c:0013 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC  :loop
c:0012 p:0009 s:0033 b:0033 l:000024 d:000032 BLOCK  /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:229
c:0011 p:---- s:0031 b:0031 l:000030 d:000030 FINISH
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 CFUNC  :catch
c:0009 p:0023 s:0025 b:0025 l:000024 d:000024 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:228
c:0008 p:0046 s:0022 b:0022 l:000eb8 d:000eb8 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:155
c:0007 p:0011 s:0019 b:0019 l:000af8 d:000018 BLOCK  /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:70
c:0006 p:---- s:0017 b:0017 l:000016 d:000016 FINISH
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 CFUNC  :catch
c:0004 p:0183 s:0011 b:0011 l:000af8 d:000af8 METHOD /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:69
c:0003 p:0142 s:0006 b:0006 l:000ec8 d:000318 EVAL   /usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/bin/irb:16
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000ec8 d:000ec8 TOP   
---------------------------
-- Ruby level backtrace information ----------------------------------------
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/bin/irb:16:in `<main>'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:69:in `start'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:69:in `catch'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:70:in `block in start'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:155:in `eval_input'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `each_top_level_statement'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:228:in `catch'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `block in each_top_level_statement'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `loop'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/ruby-lex.rb:243:in `block (2 levels) in each_top_level_statement'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:156:in `block in eval_input'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:273:in `signal_status'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb.rb:159:in `block (2 levels) in eval_input'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/context.rb:254:in `evaluate'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/workspace.rb:80:in `evaluate'
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/irb/workspace.rb:80:in `eval'
(irb):3:in `irb_binding'
(irb):3:in `parse'

-- C level backtrace information -------------------------------------------
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_vm_bugreport+0x61) [0x7f9452562101]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x5f24e) [0x7f945244c24e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_bug+0xa5) [0x7f945244d075]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x10b874) [0x7f94524f8874]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f94520644a0]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x300c6) [0x7f945241d0c6]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(+0x14041) [0x7f9450176041]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(+0x8f88) [0x7f945016af88]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(ox_parse+0x133) [0x7f945016bd33]
/usr/share/ruby-rvm/gems/ruby-1.9.2-p180/gems/ox-1.8.0/ext/ox/ox.so(+0xee23) [0x7f9450170e23]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16ac53) [0x7f9452557c53]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_f_eval+0xbf) [0x7f94525580ff]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16eebf) [0x7f945255bebf]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_rescue2+0x16b) [0x7f94524535bb]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16134e) [0x7f945254e34e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16f82e) [0x7f945255c82e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_catch_obj+0xc6) [0x7f945254fa16]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x162ace) [0x7f945254face]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16f82e) [0x7f945255c82e]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_catch_obj+0xc6) [0x7f945254fa16]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x162ace) [0x7f945254face]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16e5a6) [0x7f945255b5a6]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x164978) [0x7f9452551978]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x16a80b) [0x7f945255780b]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(rb_iseq_eval_main+0xb1) [0x7f945255d631]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(+0x65292) [0x7f9452452292]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(ruby_exec_node+0x1d) [0x7f945245314d]
/usr/share/ruby-rvm/rubies/ruby-1.9.2-p180/lib/libruby.so.1.9(ruby_run_node+0x1e) [0x7f945245540e]
irb(main+0x4b) [0x40082b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f945204f76d]
irb() [0x400859]

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

Aborted (core dumped)
mcarpenter@ubuntu:/tmp$ 
mcarpenter@ubuntu:/tmp$ gdb `which ruby` core 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /var/cache/ruby-rvm/rubies/ruby-1.9.2-p180/bin/ruby...done.
[New LWP 3688]
[New LWP 3689]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `irb                                                    '.
Program terminated with signal 6, Aborted.
#0  0x00007f9452064425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) where
#0  0x00007f9452064425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f9452067b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f945244d07a in rb_bug (fmt=0x7f945258bc68 "Segmentation fault") at error.c:253
#3  0x00007f94524f8874 in sigsegv (sig=<optimized out>, info=<optimized out>, ctx=<optimized out>) at signal.c:613
#4  <signal handler called>
#5  rb_ary_push_1 (ary=1, item=26341040) at array.c:728
#6  0x00007f9450176041 in add_element (pi=0x7ffff0e83400, ename=<optimized out>, attrs=<optimized out>, hasChildren=1) at gen_load.c:329
#7  0x00007f945016af88 in read_element (pi=0x7ffff0e83400) at parse.c:388
#8  0x00007f945016bd33 in ox_parse (xml=<optimized out>, pcb=<optimized out>, endp=0x0, options=<optimized out>) at parse.c:160
#9  0x00007f9450170e23 in to_gen (self=<optimized out>, ruby_xml=26341240) at ox.c:413
#10 0x00007f945255b5a6 in vm_call_cfunc (me=0x15cc910, blockptr=0x0, recv=19428600, num=1, reg_cfp=0x7f9452a13828, th=<optimized out>)
    at vm_insnhelper.c:402
#11 vm_call_method (th=<optimized out>, cfp=0x7f9452a13828, num=<optimized out>, blockptr=0x0, flag=<optimized out>, id=<optimized out>, me=0x15cc910, 
    recv=19428600) at vm_insnhelper.c:524
#12 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#13 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#14 0x00007f9452557c53 in eval_string_with_cref (self=19615680, src=26313040, scope=19613560, cref=0x0, file=0x12b3ce8 "(irb)", line=3) at vm_eval.c:1028
#15 0x00007f94525580ff in eval_string (line=<optimized out>, file=<optimized out>, scope=<optimized out>, src=<optimized out>, self=19615680)
    at vm_eval.c:1070
#16 rb_f_eval (argc=4, argv=<optimized out>, self=19615680) at vm_eval.c:1118
#17 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f5cc0, blockptr=0x0, recv=19615680, num=4, reg_cfp=0x7f9452a13930, th=<optimized out>)
    at vm_insnhelper.c:402
#18 vm_call_method (th=<optimized out>, cfp=0x7f9452a13930, num=<optimized out>, blockptr=0x0, flag=<optimized out>, id=<optimized out>, me=0x12f5cc0, 
    recv=19615680) at vm_insnhelper.c:524
#19 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#20 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#21 0x00007f945255bebf in invoke_block_from_c (cref=0x0, blockptr=0x0, argv=0x0, argc=0, self=<optimized out>, block=<optimized out>, th=<optimized out>)
    at vm.c:558
#22 vm_yield (th=<optimized out>, argv=0x0, argc=0) at vm.c:588
#23 rb_yield_0 (argv=0x0, argc=0) at vm_eval.c:740
#24 loop_i () at vm_eval.c:798
#25 0x00007f94524535bb in rb_rescue2 (b_proc=0x7f945255bbe0 <loop_i>, data1=0, r_proc=0, data2=0) at eval.c:646
#26 0x00007f945254e34e in rb_f_loop (self=19593640) at vm_eval.c:826
#27 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f6740, blockptr=0x7f9452a13c18, recv=19593640, num=0, reg_cfp=0x7f9452a13bf0, th=<optimized out>)
    at vm_insnhelper.c:402
#28 vm_call_method (th=<optimized out>, cfp=0x7f9452a13bf0, num=<optimized out>, blockptr=0x7f9452a13c18, flag=<optimized out>, id=<optimized out>, 
    me=0x12f6740, recv=19593640) at vm_insnhelper.c:524
#29 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#30 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#31 0x00007f945255c82e in invoke_block_from_c (cref=0x0, blockptr=0x0, argv=0x7ffff0e8a3c8, argc=1, self=19593640, block=<optimized out>, th=0x126d190)
    at vm.c:558
#32 vm_yield (th=0x126d190, argv=0x7ffff0e8a3c8, argc=1) at vm.c:588
#33 rb_yield_0 (argv=0x7ffff0e8a3c8, argc=1) at vm_eval.c:740
#34 catch_i (tag=4052238, data=<optimized out>) at vm_eval.c:1458
#35 0x00007f945254fa16 in rb_catch_obj (tag=4052238, func=0x7f945255c570 <catch_i>, data=0) at vm_eval.c:1533
#36 0x00007f945254face in rb_f_catch (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1509
#37 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f63c0, blockptr=0x7f9452a13d20, recv=19593640, num=1, reg_cfp=0x7f9452a13cf8, th=<optimized out>)
    at vm_insnhelper.c:402
#38 vm_call_method (th=<optimized out>, cfp=0x7f9452a13cf8, num=<optimized out>, blockptr=0x7f9452a13d20, flag=<optimized out>, id=<optimized out>, 
    me=0x12f63c0, recv=19593640) at vm_insnhelper.c:524
#39 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#40 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#41 0x00007f945255c82e in invoke_block_from_c (cref=0x0, blockptr=0x0, argv=0x7ffff0e8a888, argc=1, self=19546480, block=<optimized out>, th=0x126d190)
    at vm.c:558
#42 vm_yield (th=0x126d190, argv=0x7ffff0e8a888, argc=1) at vm.c:588
#43 rb_yield_0 (argv=0x7ffff0e8a888, argc=1) at vm_eval.c:740
#44 catch_i (tag=3218702, data=<optimized out>) at vm_eval.c:1458
#45 0x00007f945254fa16 in rb_catch_obj (tag=3218702, func=0x7f945255c570 <catch_i>, data=0) at vm_eval.c:1533
#46 0x00007f945254face in rb_f_catch (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1509
#47 0x00007f945255b5a6 in vm_call_cfunc (me=0x12f63c0, blockptr=0x7f9452a13ed8, recv=19546480, num=1, reg_cfp=0x7f9452a13eb0, th=<optimized out>)
    at vm_insnhelper.c:402
#48 vm_call_method (th=<optimized out>, cfp=0x7f9452a13eb0, num=<optimized out>, blockptr=0x7f9452a13ed8, flag=<optimized out>, id=<optimized out>, 
    me=0x12f63c0, recv=19546480) at vm_insnhelper.c:524
#49 0x00007f9452551978 in vm_exec_core (th=<optimized out>, initial=<optimized out>) at insns.def:1006
#50 0x00007f945255780b in vm_exec (th=0x126d190) at vm.c:1147
#51 0x00007f945255d631 in rb_iseq_eval_main (iseqval=19535800) at vm.c:1388
#52 0x00007f9452452292 in ruby_exec_internal (n=0x12a17b8) at eval.c:214
#53 0x00007f945245314d in ruby_exec_node (n=0x12a17b8) at eval.c:261
#54 0x00007f945245540e in ruby_run_node (n=0x12a17b8) at eval.c:254
#55 0x000000000040082b in main (argc=2, argv=0x7ffff0e8afb8) at main.c:35
(gdb) quit
mcarpenter@ubuntu:/tmp$ 

compact output

Is there a way for Ox.dump() to output the XML as a one-liner, without any indentation or newlines?

I could not find anything, so I ended up relying on the following, but I feel dirty doing things like that xD

Ox.dump(xml, indent: 0).gsub("\n", "")

Ox.parse converts numeric character reference &#233; but Ox.sax_parse doesn't.

With the following XML: <test>&#233;</test>, Ox.parse will parse "é" correctly, but Ox.sax_parse with :convert_special => true will parse "\351".

For example:

require "ox"
require "stringio"

# With Ox.parse
Ox.parse("<test>StringWithAccent&#233;</test>").nodes.first
# => "StringWithAccenté"

# With Ox.sax_parse
class Handler < ::Ox::Sax
  # Print each text node as the parser reports it
  def text(value); puts value.inspect; end
end

Ox.sax_parse(
    Handler.new,
    StringIO.new("<test>StringWithAccent&#233;</test>"),
    :convert_special => true
)
# prints "StringWithAccent\351"  (Ruby 1.8)
# prints "StringWithAccent\xE9"  (Ruby 1.9)

In our case we don't call Ox directly but use MultiXML, which relies on Ox.sax_parse. The escaped 'é' is coming from an external API.

Is this a bug or the expected behaviour?
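
The two results above differ only in how code point 233 is written out as bytes. The sketch below uses only the Ruby standard library (no Ox calls) to show both forms. latin1_to_utf8 is a hypothetical stopgap helper, and it assumes the text reaching the handler really does carry one ISO-8859-1 byte per character:

```ruby
# Code point 233 (é) written out two ways.
utf8_char   = [233].pack("U")  # UTF-8 encoded: two bytes, C3 A9
single_byte = 233.chr          # one raw byte, E9 (displayed as "\351" or "\xE9")

# Hypothetical stopgap (not part of Ox): reinterpret raw single-byte text
# as ISO-8859-1 and transcode it to UTF-8.
def latin1_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts utf8_char.bytes.map { |b| format("%02X", b) }.join(" ")
puts single_byte.bytes.map { |b| format("%02X", b) }.join(" ")
puts latin1_to_utf8("StringWithAccent".b + single_byte)
```

This only papers over the symptom in the handler; it does not change what the SAX parser emits.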

Native extensions fail to build

Installing the gem fails with a NoMethodError on my system (Ubuntu 12.04).

extconf.rb:9:in `<main>': undefined method `[]' for nil:NilClass (NoMethodError)

It seems like my RUBY_DESCRIPTION constant causes the error, since it has only four elements separated by spaces and therefore setting the platform variable in line 9 fails.

My RUBY_DESCRIPTION output looks like this:

ruby 1.9.3p194 (2012-04-20) [x86_64-linux]
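
A minimal sketch of the failure and one defensive alternative. It assumes, without having checked extconf.rb, that line 9 splits RUBY_DESCRIPTION on spaces and reads a fixed position. platform_from_description is a hypothetical helper, not something in the gem:

```ruby
require "rbconfig"

# Two description formats: without and with the "revision NNNNN" part.
SHORT_DESC = "ruby 1.9.3p194 (2012-04-20) [x86_64-linux]"
LONG_DESC  = "ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]"

# Positional lookup (assumed style of the failing line): the bracketed
# platform is the sixth field only when the revision is present.
p SHORT_DESC.split(" ")[5]
p LONG_DESC.split(" ")[5]

# Hypothetical helper: read whatever sits in the trailing [...] instead,
# falling back to RbConfig when the description has no bracketed part.
def platform_from_description(description)
  match = description.match(/\[([^\]]+)\]\s*\z/)
  match ? match[1] : RbConfig::CONFIG["arch"]
end

p platform_from_description(SHORT_DESC)
p platform_from_description(LONG_DESC)
```

Matching on the brackets rather than on a field position keeps the lookup independent of how many space-separated parts the description has.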

USE_B64

Is it commented out because it does not work?
I was curious because it seems like a nice feature :)

Dumping of processing instructions

When dumping processing instructions, they are printed with a carriage return after the instruction. This should probably not be the case since the "content" of a processing instruction is within its angle brackets.

Here's some sample printed output from a project I'm working on:

<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/FootnoteReference">
  <Content><?ACE 4?>
  </Content>
</CharacterStyleRange>

`readpartial': end of file reached (EOFError) issue in 1.9

In the process of evaluating Ox, I plugged it into my existing tests/benchmarks. I'm running into an issue with a particular large file (~8.4MB) that always raises the following error:

`readpartial': end of file reached (EOFError)

The test passes with flying colors if I pass Ox.sax_parse a File object (e.g. File.open('...', 'r')). If I pass Ox.sax_parse the same file, already read into memory as a string, wrapped in a StringIO object, then it fails with the error.

The exact same tests pass in ruby 1.8.7 (but fail in 1.9.2 and 1.9.3).

Thoughts?
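
One detail that may be relevant, offered as a guess rather than a diagnosis: readpartial reports end of input by raising EOFError instead of returning nil, so any loop built on it has to rescue that exception. The sketch below uses only the standard library. read_all_chunks is a hypothetical helper and says nothing about how Ox reads its input internally:

```ruby
require "stringio"

# Hypothetical helper: drain an IO-like object with readpartial,
# treating EOFError as the normal end-of-input signal.
def read_all_chunks(io, chunk_size = 4)
  chunks = []
  begin
    loop { chunks << io.readpartial(chunk_size) }
  rescue EOFError
    # readpartial raises at end of input instead of returning nil
  end
  chunks
end

p read_all_chunks(StringIO.new("<a>text</a>"))
```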

SAX parser segfaults Ruby

I am using Ruby 1.9.3p194 on x86_64 and I have an XML file that crashes Ruby when using SAX parsing:

% ruby -rox -e 'File.open("broken.xml", "r") { |io| Ox.sax_parse(Ox::Sax.new, io) }'
-e:1: [BUG] Segmentation fault
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0007 p:---- s:0021 b:0021 l:000020 d:000020 CFUNC  :sax_parse
c:0006 p:0033 s:0016 b:0016 l:002558 d:000015 BLOCK  -e:1
c:0005 p:---- s:0013 b:0013 l:000012 d:000012 FINISH
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC  :open
c:0003 p:0019 s:0006 b:0006 l:002558 d:002318 EVAL   -e:1
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:002558 d:002558 TOP   

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
-e:1:in `open'
-e:1:in `block in <main>'
-e:1:in `sax_parse'

-- C level backtrace information -------------------------------------------
/usr/lib/libruby-1.9.1.so.1.9(+0x158379) [0x2b29b4b32379]
/usr/lib/libruby-1.9.1.so.1.9(+0x5a4d9) [0x2b29b4a344d9]
/usr/lib/libruby-1.9.1.so.1.9(rb_bug+0xb3) [0x2b29b4a34cc3]
/usr/lib/libruby-1.9.1.so.1.9(+0xf922f) [0x2b29b4ad322f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030) [0x2b29b4dff030]
/lib/x86_64-linux-gnu/libc.so.6(+0x1120e6) [0x2b29b59e50e6]
/usr/lib/ruby/vendor_ruby/1.9.1/x86_64-linux/ox.so(+0x98a5) [0x2b29b61748a5]
/usr/lib/ruby/vendor_ruby/1.9.1/x86_64-linux/ox.so(+0x9887) [0x2b29b6174887]

-- Other runtime information -----------------------------------------------

* Loaded script: -e

* Loaded features:

    0 enumerator.so
    1 /usr/lib/ruby/1.9.1/x86_64-linux/enc/encdb.so
    2 /usr/lib/ruby/1.9.1/x86_64-linux/enc/trans/transdb.so
    3 /usr/lib/ruby/1.9.1/rubygems/defaults.rb
    4 /usr/lib/ruby/1.9.1/x86_64-linux/rbconfig.rb
    5 /usr/lib/ruby/1.9.1/rubygems/deprecate.rb
    6 /usr/lib/ruby/1.9.1/rubygems/exceptions.rb
    7 /usr/lib/ruby/vendor_ruby/rubygems/defaults/operating_system.rb
    8 /usr/lib/ruby/1.9.1/rubygems/custom_require.rb
    9 /usr/lib/ruby/1.9.1/rubygems.rb
   10 /usr/lib/ruby/vendor_ruby/ox/version.rb
   11 /usr/lib/ruby/vendor_ruby/ox/error.rb
   12 /usr/lib/ruby/vendor_ruby/ox/hasattrs.rb
   13 /usr/lib/ruby/vendor_ruby/ox/node.rb
   14 /usr/lib/ruby/vendor_ruby/ox/comment.rb
   15 /usr/lib/ruby/vendor_ruby/ox/instruct.rb
   16 /usr/lib/ruby/vendor_ruby/ox/cdata.rb
   17 /usr/lib/ruby/vendor_ruby/ox/doctype.rb
   18 /usr/lib/ruby/vendor_ruby/ox/element.rb
   19 /usr/lib/ruby/vendor_ruby/ox/document.rb
   20 /usr/lib/ruby/vendor_ruby/ox/bag.rb
   21 /usr/lib/ruby/vendor_ruby/ox/sax.rb
   22 /usr/lib/ruby/1.9.1/x86_64-linux/date_core.so
   23 /usr/lib/ruby/1.9.1/date/format.rb
   24 /usr/lib/ruby/1.9.1/date.rb
   25 /usr/lib/ruby/1.9.1/time.rb
   26 /usr/lib/ruby/1.9.1/x86_64-linux/stringio.so
   27 /usr/lib/ruby/vendor_ruby/1.9.1/x86_64-linux/ox.so
   28 /usr/lib/ruby/vendor_ruby/ox.rb
   29 /usr/lib/ruby/1.9.1/x86_64-linux/enc/iso_8859_1.so

The XML file broken.xml contains the following XML:

<?xml version="1.0" encoding="Windows-1252" standalone="yes" ?>
<AVXML>
    <SIGNONMSGRS>
        <DTSERVER>2013-02-21T12:13:21</DTSERVER>
        <APPID>ACCOUNTVIEW</APPID>
        <APPVER>0901-</APPVER>
    </SIGNONMSGRS>
<ERRORS>
    <ERROR>
        <NUMBER>10000</NUMBER>
        <DATE>2013-02-21T12:13:21</DATE>
        <MESSAGE>Bericht mag maximaal 15.000.000 tekens bevatten. </MESSAGE>
    </ERROR>
</ERRORS>
</AVXML>

Other XML files that are produced by this application seem to be handled fine, so there is something specific going on here that I cannot see.

P.S. I am so sorry for filing 4 tickets in one day :)

Build failure when upgrading to ox-gem 1.5.5

I can't upgrade to the newest version of ox (1.5.5). Getting this build-error:

[...]
Installing ox (1.5.5) with native extensions Unfortunately, a fatal error has occurred. Please report this error to the Bundler issue tracker at https://github.com/carlhuda/bundler/issues so that we can fix it. Thanks!
/home/cjk/.rbenv/versions/1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/installer.rb:552:in `rescue in block in build_extensions': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError)

    /home/cjk/.rbenv/versions/1.9.3-p0/bin/ruby extconf.rb 

>>>>> Creating Makefile for ruby version 1.9.3 <<<<<
creating Makefile

make
compiling cache8.c
compiling obj_load.c
compiling dump.c
compiling parse.c
compiling ox.c
compiling cache8_test.c
compiling cache.c
compiling gen_load.c
compiling sax.c
compiling base64.c
compiling cache_test.c
linking shared-object ox.so

make install
/bin/install -c -m 0755 ox.so /home/cjk/proj/daimler/hotrails/vendor/ruby/1.9.1/gems/ox-1.5.5/lib
make: /bin/install: Command not found
make: *** [/home/cjk/proj/daimler/hotrails/vendor/ruby/1.9.1/gems/ox-1.5.5/lib/ox.so] Error 127

Gem files will remain installed in /home/cjk/proj/daimler/hotrails/vendor/ruby/1.9.1/gems/ox-1.5.5 for inspection.
[...]

[BUG] Bus Error

I'm running into a bug with this strange scenario. This is badly written Ruby, but I figure it shouldn't be throwing an error anyway. The problem appears to be with declaring the para variable twice. It seems to affect the dump() function as well.

I was previously having a memory leak related to this same problem; the system would thrash on memory over a period of just a few seconds.

require 'ox'
include Ox

def make_table
   para = Element.new('Paragraph')
   char = Element.new('Character')
   table = Element.new('Table')

   para = Element.new('Paragraph')
   table << para

   char << table
   para << char
   para
end

puts dump(make_table)

Here's the error it throws:

/Users/jvoigts1/Desktop/temp2.rb:17: [BUG] Bus Error
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin12.2.0]

-- Control frame information -----------------------------------------------
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC  :dump
c:0003 p:0063 s:0007 b:0006 l:000168 d:0022a8 EVAL   /Users/jvoigts1/Desktop/temp2.rb:17
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000168 d:000168 TOP   

-- Ruby level backtrace information ----------------------------------------
/Users/jvoigts1/Desktop/temp2.rb:17:in `<main>'
/Users/jvoigts1/Desktop/temp2.rb:17:in `dump'

-- C level backtrace information -------------------------------------------

   See Crash Report log file under ~/Library/Logs/CrashReporter or
   /Library/Logs/CrashReporter, for the more detail of.

-- Other runtime information -----------------------------------------------

* Loaded script: /Users/jvoigts1/Desktop/temp2.rb

* Loaded features:

    0 enumerator.so
    1 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/enc/encdb.bundle
    2 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/enc/trans/transdb.bundle
    3 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/rbconfig.rb
    4 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/compatibility.rb
    5 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb
    6 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/deprecate.rb
    7 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/errors.rb
    8 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/version.rb
    9 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/requirement.rb
   10 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/platform.rb
   11 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb
   12 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/exceptions.rb
   13 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_gem.rb
   14 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_require.rb
   15 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems.rb
   16 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/path_support.rb
   17 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb
   18 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/version.rb
   19 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/error.rb
   20 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/hasattrs.rb
   21 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/node.rb
   22 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/comment.rb
   23 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/instruct.rb
   24 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/cdata.rb
   25 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/doctype.rb
   26 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/element.rb
   27 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/document.rb
   28 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/bag.rb
   29 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox/sax.rb
   30 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/date_core.bundle
   31 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/date/format.rb
   32 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/date.rb
   33 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/time.rb
   34 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/1.9.1/x86_64-darwin12.2.0/stringio.bundle
   35 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/ext/ox/ox.bundle
   36 /Users/jvoigts1/.rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/ox-2.0.4/lib/ox.rb

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

Abort trap: 6
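
Looking at make_table, the second para is appended to table, table to char, and char back to para, so the returned element contains itself. A dump that walks children recursively would then never terminate, which would fit both the earlier memory growth and the crash, though that is an inference from the snippet and not a confirmed diagnosis. Below is a small sketch of checking for such a loop before dumping. Node is a hypothetical stand-in for Ox::Element, and cyclic? is not an Ox method:

```ruby
# Hypothetical stand-in for Ox::Element: a name plus a list of child nodes.
Node = Struct.new(:name, :nodes) do
  def <<(child)
    nodes << child
    self
  end
end

# Depth-first walk that returns true if a node is reachable from itself.
# Identity (equal?) is used so that comparing nodes never recurses.
def cyclic?(node, path = [])
  return true if path.any? { |seen| seen.equal?(node) }
  path.push(node)
  found = node.nodes.any? { |child| child.is_a?(Node) && cyclic?(child, path) }
  path.pop
  found
end

# Same wiring as make_table in the report.
para  = Node.new("Paragraph", [])
char  = Node.new("Character", [])
table = Node.new("Table", [])

table << para
char << table
para << char   # closes the loop: para -> char -> table -> para

puts cyclic?(para)
```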

Need option to *not* symbolize attributes

Ruby has no way to garbage-collect unused symbols. That means that if input is provided (from a user, say) that has a huge number of attributes, it's a big memory leak.

For JSON parsers, there is usually an option to have the keys be strings (which can be GC'd) instead.

Ox should have that, or some other way to prevent the memory leak on suspect input.
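
A sketch of what the requested behaviour could look like, written as plain Ruby rather than as Ox code. It assumes attribute names are available as strings at some point; the allowlist and both helper names are hypothetical:

```ruby
# Hypothetical allowlist of attribute names the application expects.
KNOWN_ATTRIBUTES = %w[id name version].freeze

# Symbolize only expected names; anything else stays a String,
# so unexpected input cannot create new symbols.
def safe_key(name)
  KNOWN_ATTRIBUTES.include?(name) ? name.to_sym : name
end

# Build an attribute hash from [name, value] pairs using safe_key.
def safe_attributes(pairs)
  pairs.each_with_object({}) { |(name, value), attrs| attrs[safe_key(name)] = value }
end

p safe_attributes([["id", "1"], ["x-unexpected-42", "y"]])
```

With an allowlist, the number of symbols created is bounded by the size of the list, no matter how many distinct attribute names the input contains.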

Adding [] to the new api

I like the new API. I played with it a bit and it looks good!

The other issue mentioned that you could do doc.foo[]. I think that would make a lot of sense. I had some elements that all had the same name, at the same level, and I was a little surprised when it only returned the first one. If it could return the list of elements that match, that would be even cooler.

Thanks for the rad library!

Bad encoding for Unicode characters

The following (Turkish) text...

"G 270 CDI Aç1\u0001k araç"

... is converted by Ox into the following text inside an XML-tag:

"G 270 CDI Aç1&�k araç"

The XML parser then reports this as bad encoding.

Using v1.5.4 of the ox-gem.
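
For context: U+0001 is a control character outside the XML 1.0 Char production, so a conforming parser rejects it even when it is written as a character reference. Assuming the text really does contain U+0001 as shown, one workaround is to drop such characters before building the document. strip_invalid_xml_chars is a hypothetical helper, not part of Ox:

```ruby
# Anything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: remove characters that XML 1.0 cannot represent.
# Expects UTF-8 text.
def strip_invalid_xml_chars(text)
  text.gsub(INVALID_XML_CHARS, "")
end

puts strip_invalid_xml_chars("G 270 CDI A\u00E71\u0001k ara\u00E7")
```

Whether silently dropping the character is acceptable depends on the data; replacing it with a visible marker is an equally simple variation.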

Ox::Sax ignores encoding declared in XML declaration

Hi!

I think SAX parser should use encoding from XML declaration - it would be useful for parsing user-generated files with unknown encoding.

Here is the failing test for test/sax_test.rb:

  def test_sax_non_utf8_encoding
    if RUBY_VERSION.start_with?('1.8')
      assert(true)
    else
      xml = %{<?xml version="1.0" encoding="Windows-1251"?>
<top>тест</top>
}

      handler = AllSax.new()
      input = StringIO.new(xml)
      Ox.sax_parse(handler, input)

      content = handler.calls.assoc(:text)[1]
      assert_equal('Windows-1251', content.encoding.to_s)
      assert_equal('тест', content.encode('UTF-8'))
    end
  end

What do you think about it?
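
Until the parser does this itself, the same effect can be approximated in calling code. The sketch below reads the encoding name from the XML declaration and tags raw text with it. Both helper names are hypothetical, and it assumes the handler receives the text bytes unchanged:

```ruby
# Hypothetical helper: return the Encoding named in the XML declaration,
# or nil if there is no declaration or the name is unknown to Ruby.
def declared_encoding(xml)
  match = xml.b.match(/\A<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/)
  return nil unless match
  Encoding.find(match[1])
rescue ArgumentError
  nil
end

# Hypothetical helper: tag raw text bytes with the declared encoding,
# defaulting to UTF-8 when nothing usable is declared.
def decode_text(raw_text, xml)
  raw_text.dup.force_encoding(declared_encoding(xml) || Encoding::UTF_8)
end

raw = "\u0442\u0435\u0441\u0442".encode("Windows-1251").b  # bytes as read from the file
xml = %(<?xml version="1.0" encoding="Windows-1251"?>\n<top>).b + raw + "</top>\n".b

text = decode_text(raw, xml)
puts text.encoding
puts text.encode("UTF-8")
```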

Problems locating nodes when dashes are in their names

I had a lot of trouble with XML documents (returned by a 3rd party) that had node names with dashes in them.

Here's a sample IRB session that demonstrates the bug:

irb(main):001:0> require 'ox'
true
irb(main):002:0> xml = <<-EOS
irb(main):003:0" <?xml version="1.0"?>
irb(main):004:0" <xml-response>
irb(main):005:0"   <nodashesnode>hihi</nodashesnode>
irb(main):006:0"   <clear-tradeline>
irb(main):007:0"     <supplier-tradeline>
irb(main):008:0"       <clear-tradeline-reason-code-description>hi</clear-tradeline-reason-code-description>
irb(main):009:0"       <some-dashed-node></some-dashed-node>
irb(main):010:0"       <nodashesnode></nodashesnode>
irb(main):011:0"     </supplier-tradeline>
irb(main):012:0"   </clear-tradeline>
irb(main):013:0" </xml-response>
irb(main):014:0" EOS
"<?xml version=\"1.0\"?>\n<xml-response>\n  <nodashesnode>hihi</nodashesnode>\n  <clear-tradeline>\n    <supplier-tradeline>\n      <clear-tradeline-reason-code-description>hi</clear-tradeline-reason-code-description>\n      <some-dashed-node></some-dashed-node>\n      <nodashesnode></nodashesnode>\n    </supplier-tradeline>\n  </clear-tradeline>\n</xml-response>\n"
irb(main):015:0> ox_doc = Ox.parse xml
#<Ox::Document:0x101e2fd68
attr_reader :attributes = {
    :version => "1.0"
},
attr_reader :nodes = [
    [0] #<Ox::Element:0x101e2fca0
        attr_accessor :value = "xml-response",
        attr_reader :nodes = [
            [0] #<Ox::Element:0x101e2fc28
                attr_accessor :value = "nodashesnode",
                attr_reader :nodes = [
                    [0] "hihi"
                ]
            >,
            [1] #<Ox::Element:0x101e2fb88
                attr_accessor :value = "clear-tradeline",
                attr_reader :nodes = [
                    [0] #<Ox::Element:0x101e2fb10
                        attr_accessor :value = "supplier-tradeline",
                        attr_reader :nodes = [
                            [0] #<Ox::Element:0x101e2fa98
                                attr_accessor :value = "clear-tradeline-reason-code-description",
                                attr_reader :nodes = [
                                    [0] "hi"
                                ]
                            >,
                            [1] #<Ox::Element:0x101e2f9f8
                                attr_accessor :value = "some-dashed-node",
                                attr_reader :nodes = []
                            >,
                            [2] #<Ox::Element:0x101e2f980
                                attr_accessor :value = "nodashesnode",
                                attr_reader :nodes = []
                            >
                        ]
                    >
                ]
            >
        ]
    >
]
>
irb(main):016:0> ox_doc.locate 'clear-tradeline-reason-code-description'
[]
irb(main):017:0> ox_doc.locate 'nodashesnode'
[]
irb(main):018:0> ox_doc.locate 'xml-response'
[
[0] #<Ox::Element:0x101e2fca0
    attr_accessor :value = "xml-response",
    attr_reader :nodes = [
        [0] #<Ox::Element:0x101e2fc28
            attr_accessor :value = "nodashesnode",
            attr_reader :nodes = [
                [0] "hihi"
            ]
        >,
        [1] #<Ox::Element:0x101e2fb88
            attr_accessor :value = "clear-tradeline",
            attr_reader :nodes = [
                [0] #<Ox::Element:0x101e2fb10
                    attr_accessor :value = "supplier-tradeline",
                    attr_reader :nodes = [
                        [0] #<Ox::Element:0x101e2fa98
                            attr_accessor :value = "clear-tradeline-reason-code-description",
                            attr_reader :nodes = [
                                [0] "hi"
                            ]
                        >,
                        [1] #<Ox::Element:0x101e2f9f8
                            attr_accessor :value = "some-dashed-node",
                            attr_reader :nodes = []
                        >,
                        [2] #<Ox::Element:0x101e2f980
                            attr_accessor :value = "nodashesnode",
                            attr_reader :nodes = []
                        >
                    ]
                >
            ]
        >
    ]
>
]
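
Note that in the session above 'nodashesnode' also comes back empty while 'xml-response' is found. That suggests the path is matched from the document's top level rather than searched at any depth, and that the dashes may not be the cause. The sketch below illustrates that kind of level-by-level matching with a hypothetical Elem struct and a simplified locate_path. It is an illustration of the idea, not Ox's implementation:

```ruby
# Hypothetical stand-in for an element: a name plus child nodes.
Elem = Struct.new(:name, :nodes)

# Resolve a slash-separated path one level at a time, starting from the
# given nodes. Only exact element names are handled in this sketch.
def locate_path(nodes, path)
  first, rest = path.split("/", 2)
  hits = nodes.select { |node| node.is_a?(Elem) && node.name == first }
  return hits if rest.nil?
  hits.flat_map { |node| locate_path(node.nodes, rest) }
end

doc_nodes = [
  Elem.new("xml-response", [
    Elem.new("nodashesnode", ["hihi"]),
    Elem.new("clear-tradeline", [])
  ])
]

p locate_path(doc_nodes, "nodashesnode").size
p locate_path(doc_nodes, "xml-response/nodashesnode").size
```

If Ox behaves the same way, spelling out the full path from the root element would be the thing to try first.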

Installation under macruby

Not critical but just noting that Ox fails to install as a macruby gem, for example:

$ macgem install ox
Fetching: ox-1.9.4.gem (100%)
Building native extensions.  This could take a while...
ERROR:  Error installing ox:
    ERROR: Failed to build gem native extension.

        /Library/Frameworks/MacRuby.framework/Versions/0.12/usr/bin/macruby extconf.rb
>>>>> Creating Makefile for MacRuby version 1.9.2 on universal-darwin10.0 <<<<<
creating Makefile

make
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I.  -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o base64.o -c base64.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I.  -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache.o -c cache.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I.  -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache8.o -c cache8.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I.  -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache8_test.o -c cache8_test.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I.  -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o cache_test.o -c cache_test.c
/usr/bin/gcc -I. -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/universal-darwin10.0 -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2/ruby/backward -I/Library/Frameworks/MacRuby.framework/Versions/0.12/usr/include/ruby-1.9.2 -I.  -DRUBY_TYPE=MacRuby -DMACRUBY_RUBY -DRUBY_VERSION=1.9.2 -DRUBY_VERSION_MAJOR=1 -DRUBY_VERSION_MINOR=9 -DRUBY_VERSION_MICRO=2 -DHAS_RB_TIME_TIMESPEC=0 -DHAS_TM_GMTOFF=0 -DHAS_ENCODING_SUPPORT=0 -DHAS_PRIVATE_ENCODING=0 -DHAS_NANO_TIME=0 -DHAS_RSTRUCT=0 -DHAS_IVAR_HELPERS=0 -DHAS_PROC_WITH_BLOCK=0 -DHAS_TOP_LEVEL_ST_H=0 -DNEEDS_UIO=1 -Wall -fno-common -arch x86_64 -fexceptions -fno-common -pipe -O3 -g -Wall -arch x86_64 -o dump.o -c dump.c
In file included from dump.c:39:
ox.h:190: error: expected specifier-qualifier-list before ‘rb_encoding’
dump.c: In function ‘dump_obj’:
dump.c:598: warning: initialization discards qualifiers from pointer target type
dump.c:838: warning: initialization discards qualifiers from pointer target type
dump.c: In function ‘dump_gen_nodes’:
dump.c:1108: warning: initialization discards qualifiers from pointer target type
make: *** [dump.o] Error 1


Gem files will remain installed in /Library/Ruby/Gems/MacRuby/0.12/gems/ox-1.9.4 for inspection.
Results logged to /Library/Ruby/Gems/MacRuby/0.12/gems/ox-1.9.4/ext/ox/gem_make.out

Getting symbol binding failed

I am getting the following errors after installing Ox 1.5.5 on Mac OS X (Ruby 1.9.2):

dyld: lazy symbol binding failed: Symbol not found: _stpncpy
Referenced from: /Users/tom/.rvm/gems/ruby-1.9.2-p290@multi_xml/gems/ox-1.5.5/ext/ox/ox.bundle
Expected in: flat namespace

dyld: Symbol not found: _stpncpy
Referenced from: /Users/tom/.rvm/gems/ruby-1.9.2-p290@multi_xml/gems/ox-1.5.5/ext/ox/ox.bundle
Expected in: flat namespace

I'm not sure what is going on here; the gem installed without errors. Any ideas?

Preserving whitespace in whitespace only elements

I'm having some difficulties with Ox in that it isn't picking up whitespace in elements with nothing but whitespace in them.

For example, in the sample document below, the <element> with only whitespace is returning nil for its text value, when you would expect it to return a space character.

require 'ox'

xml = <<-END
<?xml?>
<root>
   <element>Hello, this is</element>
   <element> </element>
   <element>a sentence. </element>
</root>
END

doc = Ox.parse(xml)

p doc.root.element(0).text   #=> "Hello, this is"
p doc.root.element(1).text   #=> nil
p doc.root.element(2).text   #=> "a sentence. "

SAX parser failed test case.

We use the Ox parser, and its performance is outstanding.
However, we found a failing test case:

<?xml version="1.0"?><abcdefghijklmnop></abcdefghijklmnop>

Our guess is that the code doesn't correctly handle element names that are exactly 16 characters long.

Could you try to reproduce this? We look forward to your reply.
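One way to narrow this down is to generate the same document for a range of name lengths around 16 and see which lengths fail. A minimal sketch in plain Ruby; the helper name `doc_for_name_length` is made up for illustration, and the commented-out parse call assumes a handler object of your own.

```ruby
# Build the one-element document from the report with an element name of
# the requested length (hypothetical helper, not part of Ox).
def doc_for_name_length(length)
  name = ('a'..'z').cycle.first(length).join
  %(<?xml version="1.0"?><#{name}></#{name}>)
end

# Documents with names of 14 to 18 characters, keyed by name length.
docs = (14..18).to_h { |len| [len, doc_for_name_length(len)] }

docs.each do |len, xml|
  puts "#{len}: #{xml}"
  # To run each case through the parser under test, something like:
  #   Ox.sax_parse(handler, StringIO.new(xml))
end
```

If only the 16-character case fails, that would support the guess about the name length.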

Sax parser's `start_element` limitation

Hi,

We are trying to implement an Ox parser for Nori, and we've run into a limitation caused by the way Ox implements start_element in the SAX parser.

In Nokogiri, start_element has a second argument attrs that contains all the attributes for the given element. However, in Ox, attributes are parsed separately and individually in attr_value. This creates a problem as there is no place to perform aggregated actions for all the attributes. end_element wouldn't work as it's executed in the reversed order.

Any suggestions on a workaround?

Thanks!
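One possible workaround, sketched below, is to buffer the attributes in the handler and hand them over in a single call as soon as the next non-attribute callback arrives. This is plain Ruby and assumes the callback order described above (start_element, then one attr call per attribute, then text, child, or end callbacks). The class name and the `element_with_attrs` hook are hypothetical, not part of Ox.

```ruby
# Buffers attributes per element and delivers them in one aggregated call.
class BufferedAttrsHandler
  attr_reader :events

  def initialize
    @events = []     # collected [name, attrs] pairs, for illustration
    @pending = nil   # element whose attributes may still be arriving
  end

  def start_element(name)
    flush
    @pending = [name, {}]
  end

  def attr(name, value)
    @pending[1][name] = value if @pending
  end

  def text(_value)
    flush
  end

  def end_element(_name)
    flush
  end

  private

  # Runs once all attributes of the pending element are known.
  def flush
    return unless @pending

    name, attrs = @pending
    @pending = nil
    element_with_attrs(name, attrs)
  end

  # Hypothetical aggregated hook, similar in spirit to Nokogiri's
  # start_element(name, attrs).
  def element_with_attrs(name, attrs)
    @events << [name, attrs]
  end
end

# Drive the callbacks by hand to show the effect.
handler = BufferedAttrsHandler.new
handler.start_element(:a)
handler.attr(:x, '1')
handler.attr(:y, '2')
handler.start_element(:b)
handler.end_element(:b)
handler.end_element(:a)
p handler.events
```

The cost is that the aggregated call fires slightly later than a true start_element would, which may or may not matter for Nori.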

SystemStackError on 1.9.3

I get this when calling Ox.parse(str) on a really small XML document with no more than two levels of nesting. The really annoying thing is that I can only reproduce it on our test server; it runs perfectly fine on my machine.

It also runs perfectly on the same test server with Ruby 2.0.0, but unfortunately we are not yet ready to deploy this application on that Ruby version.

The only thing I am sure about is that the error happens at the C level: the Ruby error shows only the Ox.parse line as the last executed line. I tried patching the interpreter to increase the fiber stack size, since we use fibers, but it changed nothing.

I also tried writing a minimal reproduction case, but of course it works. The problem only occurs inside our application.

Do you have any idea what could cause this, or what I could try to solve it?

Edit: Here is what my XML looks like:

<root>
<sub />
</root>

If I remove the inner sub element it works, so it does look like a stack overflow, but I don't see how it can possibly overflow with this input. The application is a simple Rack server, and on the line before the crash the caller array has only 15 entries.

Method to access element content

It may just be that I missed it when browsing the code base, but it would be nice to implement a convenience method on Ox::Element that gives easy access to the element's content.

For example, with XML like the following:

<?xml version="1.0"?>
<foo>bar</foo>

It would be fantastic if one could retrieve the content for foo without having to do something like:

element.nodes.first
#=> "bar"

Would there be any chance to implement something like the following:

element.content
#=> "bar"

Thoughts?
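Until something like that exists, a small helper can do the same job from the outside. A minimal sketch, assuming only that the element responds to `nodes`. The helper name `first_text` and the `FakeElement` stand-in are made up so the example runs without Ox.

```ruby
# Return the first String child of a node, or nil if it has none.
# Array() also covers the case where nodes is nil.
def first_text(node)
  Array(node.nodes).find { |child| child.is_a?(String) }
end

# Stand-in for a parsed element, used only to exercise the helper.
FakeElement = Struct.new(:value, :nodes)

foo = FakeElement.new('foo', ['bar'])
puts first_text(foo)   # prints "bar"
```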

ox-1.8.9: sax-parser produces segfault

Hi Peter!

It seems that the SAX parser has had some bugs since version 1.8.7.

Here is the test script:

require 'ox'

class Sample < ::Ox::Sax
  def start_element(name); puts "start: #{name}";        end
  def end_element(name);   puts "end: #{name}";          end
  def attr(name, value);   puts "  #{name} => #{value}"; end
  def text(value);         puts "text #{value}";         end
end

handler = Sample.new()
Ox.sax_parse(handler, ARGF)

And that's the way you can reproduce the bug:

$ wget "http://www.benzocenter.ru/yam/market.xml"
$ ruby -Ilib test.rb < market.xml

With ox-1.8.7 I get:

test.rb:11:in `sax_parse': invalid format, element start and end names do not match at line 4221, column 10 (Ox::ParseError)
    from test.rb:11:in `<main>'

With ox-1.8.8 I get:

ruby 1.9.3p392 (2013-02-22 revision 39386) [i686-linux]

-- Control frame information -----------------------------------------------
c:0004 p:---- s:0012 b:0012 l:000011 d:000011 CFUNC  :sax_parse
c:0003 p:0076 s:0007 b:0007 l:001a04 d:002588 EVAL   test.rb:11
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001a04 d:001a04 TOP   

-- Ruby level backtrace information ----------------------------------------
test.rb:11:in `<main>'
test.rb:11:in `sax_parse'

-- C level backtrace information -------------------------------------------
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x1814da) [0xb765e4da] vm_dump.c:796
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x52ae3) [0xb752fae3] error.c:258
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(rb_bug+0x44) [0xb75307d4] error.c:277
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x11064c) [0xb75ed64c] signal.c:609
[0xb771740c]
/lib/i386-linux-gnu/libc.so.6(+0x13eafb) [0xb745bafb] time.c:198
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe053) [0xb6d9b053] sax.c:796
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe02d) [0xb6d9b02d] sax.c:793
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe02d) [0xb6d9b02d] sax.c:793
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0xe02d) [0xb6d9b02d] sax.c:793
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(ox_sax_parse+0x2f4) [0xb6d9c4c4] sax.c:257
/home/amikhailov/.rvm/gems/ruby-1.9.3-p392/gems/ox-1.8.8/ext/ox/ox.so(+0x9a53) [0xb6d96a53] ox.c:640
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x16c0d5) [0xb76490d5] vm_insnhelper.c:317
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x17ad97) [0xb7657d97] vm_insnhelper.c:404
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x1714ef) [0xb764e4ef] insns.def:1018
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x176b1c) [0xb7653b1c] vm.c:1236
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(rb_iseq_eval_main+0xb5) [0xb7659755] vm.c:1478
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(+0x567d4) [0xb75337d4] eval.c:204
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(ruby_exec_node+0x24) [0xb7534644] eval.c:251
/home/amikhailov/.rvm/rubies/ruby-1.9.3-p392/lib/libruby.so.1.9(ruby_run_node+0x36) [0xb7536616] eval.c:244
ruby() [0x8048658]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb73364d3] enumerator.c:162
ruby() [0x8048681]

With ox-1.8.9 I get:

test.rb:7: [BUG] Segmentation fault
ruby 1.9.3p392 (2013-02-22 revision 39386) [i686-linux]

-- Control frame information -----------------------------------------------
c:0006 p:0012 s:0021 b:0018 l:000017 d:000017 METHOD test.rb:7
c:0005 p:---- s:0014 b:0014 l:000013 d:000013 FINISH
c:0004 p:---- s:0012 b:0012 l:000011 d:000011 CFUNC  :sax_parse
c:0003 p:0076 s:0007 b:0007 l:001504 d:0016b0 EVAL   test.rb:11
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001504 d:001504 TOP   

-- Ruby level backtrace information ----------------------------------------
test.rb:11:in `<main>'
test.rb:11:in `sax_parse'
test.rb:7:in `text'

I'm on Ubuntu-12.04, ruby-1.9.3p392

Initializing an Element's @nodes attribute with an empty array

I'm frequently getting errors as listed below when trying to use locate with a newly added Element. For example:

require 'ox'
include Ox

doc = Document.new

elem = Element.new('Element')

doc.locate('Element')

# => 
/Users/josh/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/ox-1.8.0/lib/ox/element.rb:181:in `alocate': private method `select' called for nil:NilClass (NoMethodError)
   from /Users/josh/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/ox-1.8.0/lib/ox/element.rb:124:in `locate'
   from /Users/josh/Desktop/temp.rb:8:in `<main>'

If Elements were initialized with @nodes = [] instead of @nodes = nil, this wouldn't be a problem. Is there a reason they're initialized this way?

MultiXML spec failing with Ox 1.8.6

Specifically, this spec, which verifies that an invalid XML document raises an error, is failing: https://github.com/sferik/multi_xml/blob/20fe5f8cf5bff610035d40c63e14c59de4a1b562/spec/parser_shared_example.rb#L32-L40

I believe it was caused by f0a2dfe, since this was the only commit between 1.8.5 and 1.8.6 and I've verified that specs pass on 1.8.5.

IMHO, parsing invalid XML (e.g. <open></close>) should not raise a SyntaxError for the same reason articulated in ohler55/oj#39.

Premature tag closing in strange edge case involving two spaces after an opening tag

I'm trying to write a harvester for the arxiv.org OAI-PMH data. There's quite a bit of this, so it seemed sensible to use Ox for efficiency reasons. However, I soon noticed I was missing about 150k papers, and after investigating, Ox (2.0.1) seems to be the culprit:

[2] pry(main)> require 'arxivsync'; parser=ArxivSync::Parser.new; Ox.sax_parse(parser, File.open("/home/mispy/arxiv/2013-05-25T12:45:48+10:00_436115|406001")); parser.models.count
=> 499
[3] pry(main)> require 'nokogiri'; Nokogiri(File.open("/home/mispy/arxiv/2013-05-25T12:45:48+10:00_436115|406001")).css('metadata').count
=> 1000

It seems the SAX parser will abruptly close the outer tag after certain metadata elements, discarding those remaining. The element in question looks like this:

<metadata>
 <arXiv xsi:schemaLocation='http://arxiv.org/OAI/arXiv/ http://arxiv.org/OAI/arXiv.xsd' xmlns='http://arxiv.org/OAI/arXiv/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
 <id>1302.1147</id><created>2013-02-05</created><authors><author><keyname>Lin</keyname><forenames>Chang-shou</forenames></author><author><keyname>Zhang</keyname><forenames>Lei</forenames></author></authors><title>On Liouville systems at critical parameters, Part 1: one bubble</title><categories>math.AP</categories><msc-class>35J60, 35J55</msc-class><license>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</license><abstract>  In this paper we consider bubbling solutions to the general Liouville system:
\label{abeq1} \Delta_g u_i^k+\sum_{j=1}^n a_{ij}\rho_j^k(\frac{h_j
e^{u_j^k}}{\int h_j e^{u_j^k}}-1)=0\quad\text{in}M, i=1,...,n (n\ge 2) where
$(M,g)$ is a Riemann surface, and $A=(a_{ij})_{n\times n}$ is a constant
non-negative matrix and $\rho_j^k\to \rho_j$ as $k\to \infty$. Among other
things we prove the following sharp estimates. The location of the blowup
point. The convergence rate of $\rho_j^k-\rho_j$, $j=1,..,n$. These results are
of fundamental importance for constructing bubbling solutions. It is
interesting to compare the difference between the general Liouville system and
the SU(3) Toda system on estimates (1) and (2).
</abstract></arXiv>
</metadata>

Bisect debugging led us to conclude that the effect is conditional on the inclusion of two spaces after the opening tag. This is necessary but not sufficient to reproduce the bug; other metadata elements defiantly flaunt their two spaces with no such disastrous repercussions.

The XML file in question can be found here, and a stripped-down version of the SAX parser which reproduces the bug follows:

class Parser < ::Ox::Sax
  attr_accessor :count

  def initialize
    @count = 0
  end

  def start_element(name)
    @count += 1 if name == :metadata
  end
end

Serialization issue on 32-bit Intel architectures

Hi Pete,

Thanks for your constant support with Ox. We have found a potential issue on 32-bit Intel machines. Here is the environment and test case:

Environment:

  • Linux vagrantup 2.6.32-24-generic-pae #39-Ubuntu SMP Wed Jul 28 07:39:26 UTC 2010 i686 GNU/Linux
  • ruby-1.9.2-p180

Test case:
$ irb -r ox
ruby-1.9.2-p180 :001 > Ox
=> Ox
ruby-1.9.2-p180 :002 > t = Time.now
=> 2011-10-03 16:10:20 +0900
ruby-1.9.2-p180 :003 > x = Ox.dump t
=> "1317625820.584365\n"
ruby-1.9.2-p180 :004 > Ox.parse_obj x
=> 1943-09-15 12:56:12 +0900

Martin & Eric

Processing instructions

Sorry, two issues in a row. Ox doesn't seem to support processing instructions. I believe they're supposed to be parsed as processing instruction nodes. I was attempting to parse an .icml file and was getting errors related to these. They are used in .icml files to denote special characters that aren't supported by xml.

Here's an example:

require 'ox'

xml = <<-END
<root>
   <element>Here some text with a <?PITarget PIContent?> processing instruction.</element>
</root>
END

p Ox.parse(xml)

#=> /Users/josh/Desktop/temp.rb:9:in `parse': invalid format, document not 
#=> terminated at line 2, column 47 [parse.c:583] (SyntaxError)
#=>     from /Users/josh/Desktop/temp.rb:9:in `<main>'

Trace line and column numbers in SAX-parser

Hi Peter!

I use Ox to parse large user-generated XML files containing a lot of domain logic. Sometimes these files are syntactically valid but not valid in terms of the domain model, and I need to show verbose error messages that contain the line and column number where the error occurs.

Is it possible to add some trace methods to Ox::Sax? Something like this (just a suggestion):

class Sax < ::Ox::Sax
  def start_element(name)
    if name != 'node'
      puts "Unknown element #{name} at line #{__line__}, column #{__column__}"
    end
  end
end
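For illustration, here is what such a handler could look like if the parser filled in `@line` and `@column` on the handler before each callback whenever the handler defines them. That behaviour is an assumption here and should be checked against the Ox version in use. The handler is a plain Ruby class so the sketch runs without Ox; in real use it would subclass ::Ox::Sax as in the suggestion above.

```ruby
# Assumption: the parser sets @line and @column before each callback
# when the handler defines them. The class name is illustrative.
class PositionAwareHandler
  attr_reader :errors

  def initialize
    @line = nil
    @column = nil
    @errors = []
  end

  def start_element(name)
    # to_s covers both Symbol and String element names.
    return if name.to_s == 'node'

    @errors << "Unknown element #{name} at line #{@line}, column #{@column}"
  end
end

handler = PositionAwareHandler.new
# Simulate what the parser is assumed to do before a callback.
handler.instance_variable_set(:@line, 12)
handler.instance_variable_set(:@column, 5)
handler.start_element(:widget)
puts handler.errors.first
```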

Error building the native extension (no tm_gmtoff struct member)

I'm running ruby 1.9.3p194 (2012-04-20) [i386-mingw32] from rubyinstaller.org with the DevKit on Windows 7 64 bit.

When I try to install the ox gem (gem install ox), the build of the native extension fails:

C:\Users\Thomas>gem install ox
Temporarily enhancing PATH to include DevKit...
Building native extensions. This could take a while...
ERROR: Error installing ox:
ERROR: Failed to build gem native extension.

    D:/Ruby/Ruby193/bin/ruby.exe extconf.rb

Creating Makefile for ruby version 1.9.3 <<<<<
creating Makefile

make
generating ox-i386-mingw32.def
compiling base64.c
compiling cache.c
cache.c: In function 'ox_cache_new':
cache.c:62:5: warning: implicit declaration of function 'bzero'
cache.c:62:5: warning: incompatible implicit declaration of built-in function 'bzero'
compiling cache8.c
compiling cache8_test.c
cache8_test.c:35:5: warning: large integer implicitly truncated to unsigned type

cache8_test.c:37:5: warning: large integer implicitly truncated to unsigned type

compiling cache_test.c
compiling dump.c
dump.c: In function 'dump_time_xsd':
dump.c:507:15: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:509:26: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:510:25: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:512:26: error: 'struct tm' has no member named 'tm_gmtoff'
dump.c:513:25: error: 'struct tm' has no member named 'tm_gmtoff'
make: *** [dump.o] Error 1

Gem files will remain installed in D:/Ruby/Ruby193/lib/ruby/gems/1.9.1/gems/ox-1.5.9 for inspection.
Results logged to D:/Ruby/Ruby193/lib/ruby/gems/1.9.1/gems/ox-1.5.9/ext/ox/gem_make.out

Is there something wrong with my build environment? Other gems' extensions build fine.

Add node filtering methods to the API

This is just an idea/suggestion. I was thinking more about what I wrote in #40, and the following suggestion conflicts somewhat with the API change suggested there.

What if Ox::Element had extra methods that filter its nodes? There is already one filter method: #text. It would be nice to also have #elements, #comments, etc., each of which would just do something like:

def elements
  nodes.select { |node| node.is_a? Ox::Element }
end

We could then have:

doc = Ox.load("<foo><!-- nice comment --><bar/>some text</foo>")
#=> #<Ox::Element:0x00000001975aa8 @value="foo", @nodes=[#<Ox::Comment:0x00000001975a30 @value="nice comment">, #<Ox::Element:0x00000001975828 @value="bar">, "some text"]>
doc.text
#=> "some text"
doc.elements
#=> [#<Ox::Element:0x00000001975828 @value="bar">]
doc.comments
#=> [#<Ox::Comment:0x00000001975a30 @value="nice comment">]

etcetera.
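To make the idea concrete, the filters could all share one generic helper. This is a sketch only: `NodeFilters`, `FakeNode`, and `FakeComment` are illustrative stand-ins so it runs without Ox, and nothing here is Ox's actual API.

```ruby
# Filter helpers in the spirit of the proposal; a mixin for any class
# that exposes #nodes.
module NodeFilters
  # Select child nodes of the given class; tolerates nodes being nil.
  def nodes_of(klass)
    Array(nodes).select { |node| node.is_a?(klass) }
  end

  def texts
    nodes_of(String)
  end
end

# Stand-ins for Ox::Comment and Ox::Element.
FakeComment = Struct.new(:value)
FakeNode = Struct.new(:value, :nodes) do
  include NodeFilters

  def elements
    nodes_of(FakeNode)
  end

  def comments
    nodes_of(FakeComment)
  end
end

foo = FakeNode.new('foo', [FakeComment.new('nice comment'),
                           FakeNode.new('bar', nil),
                           'some text'])
p foo.texts                    # prints ["some text"]
p foo.elements.map(&:value)    # prints ["bar"]
p foo.comments.map(&:value)    # prints ["nice comment"]
```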

Make CDATA more accessible

CDATA is usually just used as a way to escape text, but there is no really nice API to reach it in Ox. For example, compare <foo>bar</foo> and <quux><![CDATA[<garply>nice</garply>]]></quux> when trying to reach the text/literal data:

foo_xml.text
#=> "bar"
quux_xml.nodes.first.value
#=> "<garply>nice</garply>"

I am aware that <quux> could contain multiple CDATA nodes, but the same holds for a <foo> containing mixed strings and elements, where #text also returns only the first string node. Additionally, I would have to check that quux_xml.nodes.first is actually a CData node.
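A helper along these lines could treat plain strings and CDATA uniformly, including the class check mentioned above. It is a sketch under assumptions: the CDATA class is passed in as a parameter (in real use that would presumably be Ox::CData), and `literal_text` plus the stand-in structs are made-up names.

```ruby
# Return the first literal text of a node: either a plain String child
# or the value of the first child that is an instance of cdata_class.
def literal_text(node, cdata_class)
  Array(node.nodes).each do |child|
    return child if child.is_a?(String)
    return child.value if child.is_a?(cdata_class)
  end
  nil
end

# Stand-ins so the sketch runs without Ox.
SketchCData = Struct.new(:value)
SketchNode = Struct.new(:value, :nodes)

foo = SketchNode.new('foo', ['bar'])
quux = SketchNode.new('quux', [SketchCData.new('<garply>nice</garply>')])

puts literal_text(foo, SketchCData)    # prints "bar"
puts literal_text(quux, SketchCData)   # prints "<garply>nice</garply>"
```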

Ability to pass options in at Runtime

It would be fantastic to be able to pass options in when calling .parse instead of having to define them prior to parsing the document.

For example, supporting the following way of passing options:

Ox.parse('<?xml version="1.0"?><foo>bar</foo>', {:symbolize_keys => false})

in addition to the current way options are passed to Ox:

Ox.default_options = Ox.default_options.merge(:symbolize_keys => false)

Thoughts on this? I'll provide code if this seems desirable.
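In the meantime, per-call options can be approximated by swapping the defaults around a block. A minimal sketch: `with_default_options` is a hypothetical helper that works on any object with `default_options` and `default_options=`, and the stand-in struct is there only so the example runs without Ox. Note that it changes global state for the duration of the block, so it is not safe with concurrent parsing.

```ruby
# Temporarily merge overrides into an object's default options, run the
# block, then restore the previous defaults even if the block raises.
def with_default_options(target, overrides)
  saved = target.default_options
  target.default_options = saved.merge(overrides)
  yield
ensure
  target.default_options = saved if saved
end

# Stand-in for a module-level options holder.
FakeParser = Struct.new(:default_options)
parser = FakeParser.new({ symbolize_keys: true })

inside = with_default_options(parser, symbolize_keys: false) do
  parser.default_options[:symbolize_keys]
end

puts inside                                    # prints "false"
puts parser.default_options[:symbolize_keys]   # prints "true"
```

With Ox itself the call would look something like `with_default_options(Ox, :symbolize_keys => false) { Ox.parse(xml) }`.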

Reduce verbosity of #inspect

Currently, Node#inspect is way too verbose. When playing/testing with Ox documents in IRB or using p to debug some program, Ox produces pages and pages of text.
This is mainly due to the expansion of @nodes. Maybe #inspect could be changed to print something like:

doc = Ox.load('<foo bar="baz"><quux>meh</quux><garply/></foo>')
#=> #<Ox::Element:0x00000001bbcaa0  2 nodes, value: "foo", attributes: {:bar=>"baz"}>

IMO #inspect doesn't have to dump the entire object, as there already is Ox.dump and it is also easy enough to do doc.nodes.inspect if you really want to know.
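A rough sketch of such a summary, written as a standalone function rather than an #inspect override. The name `compact_inspect` and the `SketchElement` stand-in are illustrative only, and the output omits the object address shown in the proposal.

```ruby
# Summarise a node as class name, child count, value, and attributes,
# without expanding the child nodes themselves.
def compact_inspect(node)
  count = Array(node.nodes).size
  attrs = node.respond_to?(:attributes) ? (node.attributes || {}) : {}
  format('#<%s %d nodes, value: %p, attributes: %p>',
         node.class.name, count, node.value, attrs)
end

# Stand-in for a parsed element so the sketch runs without Ox.
SketchElement = Struct.new(:value, :attributes, :nodes)

doc = SketchElement.new('foo', { bar: 'baz' },
                        [SketchElement.new('quux', nil, ['meh']),
                         SketchElement.new('garply', nil, nil)])
puts compact_inspect(doc)
```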
