feature request: sax push parser (ox, CLOSED)

ohler55 commented on June 18, 2024
feature request: sax push parser

from ox.

Comments (11)

ohler55 commented on June 18, 2024

Please explain what you mean by a sax push parser. I believe the current Ox SAX parser is what is sometimes referred to as a push parser.

notezen commented on June 18, 2024

I mean that Ox currently has no analog of http://nokogiri.org/Nokogiri/XML/SAX/PushParser.html

ohler55 commented on June 18, 2024

It looks like that is just a standard SAX parser. I don't see the difference between that and the Ox::Sax parser, other than that the handler is created inline. Am I missing something?

notezen commented on June 18, 2024

The main feature of a push parser is that we push incoming data to the parser ourselves, instead of letting the parser read directly from an IO stream. This is useful in event-driven code (the eventmachine library, for example), where data arrives in chunks through callbacks.
Code example:
p = PushParser.new(sax_handler)
p.push '<chi'
p.push 'ld>some con'
p.push 'tent'

where sax_handler is an object with methods such as 'start_element', 'end_element', 'text', and so on.
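
To make the idea concrete, here is a minimal toy sketch of a push-style parser. It is not Ox or Nokogiri code: ToyPushParser and RecordingHandler are hypothetical names, and the parser understands only bare tags and text (no attributes, comments, entities, or CDATA). The point it illustrates is that unparsed input is kept in the object between push calls, so a tag or text run split across chunks is still reported once it is complete.

```ruby
# Toy push parser for illustration only (NOT Ox or Nokogiri code).
# Handles bare tags such as <a> and </a> plus text; nothing else.
class ToyPushParser
  def initialize(handler)
    @handler = handler
    @buffer = ''.dup # unparsed input carried over between push calls
  end

  # Accept the next chunk and emit every event that is now complete.
  def push(chunk)
    @buffer << chunk
    loop do
      if @buffer.start_with?('<')
        close = @buffer.index('>')
        break if close.nil? # tag is incomplete, wait for more data
        tag = @buffer.slice!(0..close)[1...-1]
        if tag.start_with?('/')
          @handler.end_element(tag[1..-1])
        else
          @handler.start_element(tag)
        end
      else
        open = @buffer.index('<')
        break if open.nil? # text may continue in the next chunk
        @handler.text(@buffer.slice!(0...open))
      end
    end
    self
  end
end

# Hypothetical handler that simply records the events it receives.
class RecordingHandler
  attr_reader :events

  def initialize
    @events = []
  end

  def start_element(name)
    @events << [:start_element, name]
  end

  def end_element(name)
    @events << [:end_element, name]
  end

  def text(value)
    @events << [:text, value]
  end
end

handler = RecordingHandler.new
parser = ToyPushParser.new(handler)
parser.push '<chi'
parser.push 'ld>some con'
parser.push 'tent</child>'
```

After the third push, handler.events holds a start_element, a text, and an end_element event for the child element, even though no single chunk contained a complete tag.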

ohler55 commented on June 18, 2024

I understand. Basically it parses a stream until the stream closes.

ohler55 commented on June 18, 2024

Look for a continuous SAX parser in some future release. Now that I understand what you were asking for, it makes a lot of sense and would be very useful.

Zapotek commented on June 18, 2024

I just found myself in need of this feature as well.

Zapotek commented on June 18, 2024

Not very efficient but does the job:

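buffer, buffer_in = IO.pipe

handler = MySAXHandler.new

# Ox.sax_parse blocks until the read end of the pipe reaches EOF, so run it in a thread.
# sax_options is assumed to be an options hash defined elsewhere.
parser = Thread.new do
    Ox.sax_parse( handler, buffer, sax_options )
end

buffer_in << '<parent><chi'
buffer_in << 'ld>some con'
buffer_in << 'tent</child></parent>'
buffer_in.close

# Wait for the parser to see EOF before using the handler.
parser.join

# Do stuff with handler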
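
The same workaround can be packaged behind a push-style interface. This is only a sketch under assumptions: PipePushParser is a hypothetical name, not part of Ox, and the block passed to new stands in for the real blocking parse call (something like Ox.sax_parse(handler, io)), so the example runs without Ox installed.

```ruby
# Hypothetical wrapper around the pipe-and-thread workaround (not part of Ox).
# The block given to new stands in for the real parse call that reads an IO.
class PipePushParser
  def initialize(&parse)
    @reader, @writer = IO.pipe
    # The blocking parse call runs in its own thread and reads from the pipe.
    @thread = Thread.new do
      begin
        parse.call(@reader)
      ensure
        @reader.close
      end
    end
  end

  # Hand the next chunk to the parser; returns self so calls can be chained.
  def push(chunk)
    @writer << chunk
    self
  end
  alias << push

  # Signal end of input, wait for the parse call, and return its result.
  def finish
    @writer.close
    @thread.value
  end
end

# Stand-in "parser" that just reads the whole stream.
parser = PipePushParser.new { |io| io.read }
parser.push '<parent><chi'
parser << 'ld>some con' << 'tent</child></parent>'
result = parser.finish
```

This still costs one thread and one pipe per document, so it hides the plumbing rather than removing the overhead.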

ohler55 commented on June 18, 2024

I prefer that over cooking something into Ox that does the same thing as a stream or pipe. You might be surprised how efficient it is.

Zapotek commented on June 18, 2024

Creating threads isn't efficient for this. In my case this can be called hundreds of times in series, leaving me with that many short-lived threads, which has a substantial impact on resource utilization and performance.

Pushing directly to Ox seems like the cleaner way and push parsers are an established paradigm.

ohler55 commented on June 18, 2024

Another approach would be to keep a longer-running thread that services parse requests. This feature will take some time to get done. Right now the state of the parser is on the stack and not carried around in a Ruby object. That would have to change for the parser to be reentrant. The solution may be to keep a thread in C to maintain the parser state, but I have no idea what problems that would cause for Ruby.
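
A rough sketch of the longer-running thread idea, under stated assumptions: ParseWorker and open_stream are hypothetical names, not part of Ox, and the block stands in for a blocking parse call such as Ox.sax_parse(handler, io). One thread is reused for every stream, so hundreds of parses in series do not create hundreds of threads.

```ruby
require 'thread'

# Hypothetical worker (not part of Ox): a single thread services all parse jobs.
class ParseWorker
  def initialize(&parse)
    @jobs = Queue.new
    @thread = Thread.new do
      # Queue#pop blocks until a job arrives; the :stop marker ends the loop.
      while (job = @jobs.pop) != :stop
        reader, done = job
        begin
          done << parse.call(reader)
        ensure
          reader.close
        end
      end
    end
  end

  # Returns the write end of a fresh pipe and a queue that receives the result.
  def open_stream
    reader, writer = IO.pipe
    done = Queue.new
    @jobs << [reader, done]
    [writer, done]
  end

  def stop
    @jobs << :stop
    @thread.join
  end
end

# Stand-in "parser": reads the stream and collects the opening tag names.
worker = ParseWorker.new { |io| io.read.scan(/<(\w+)>/).flatten }

results = 200.times.map do |i|
  writer, done = worker.open_stream
  writer << '<parent><chi'
  writer << "ld>#{i}</child>"
  writer << '</parent>'
  writer.close
  done.pop # wait for the worker to finish this stream
end
worker.stop
```

Streams are serviced one at a time in submission order, and an exception inside the block would end the worker thread, so a real version would need error handling and probably a way to pass a handler per stream.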
