gnp / xml-rs

This project forked from kornelski/xml-rs

An XML library in Rust

Home Page: https://lib.rs/xml-rs

License: MIT License

Rust 100.00%

xml-rs, an XML library for Rust

xml-rs is an XML library for the Rust programming language. It supports reading and writing of XML documents in a streaming fashion (without DOM).

Features

  • XML spec conformance better than other pure-Rust libraries.

  • Easy-to-use API based on iterators and regular Strings without tricky lifetimes.

  • Support for UTF-16, UTF-8, ISO-8859-1, and ASCII encodings.

  • Written entirely in the safe Rust subset. Designed to safely handle untrusted input.

The API is heavily inspired by the Java Streaming API for XML (StAX). It contains a pull parser much like the StAX event reader. It provides an iterator API, so you can leverage the features of Rust's existing iterator library.

It also provides a streaming document writer much like the StAX event writer. This writer consumes its own set of events, but reader events can easily be converted to writer events, so it is possible to write XML transformation chains in a clean manner.

The parser is mostly full-featured; however, there are limitations:

  • Legacy code pages and non-Unicode encodings other than ISO-8859-1 and ASCII are not supported;
  • DTD validation is not supported (but entities defined in the internal subset are supported);
  • attribute value normalization is not performed, and end-of-line characters are not normalized either.

Other than that, the parser tries to be mostly XML 1.1-compliant.

The writer is also mostly full-featured, with the following limitations:

  • no support for encodings other than UTF-8;
  • no support for emitting <!DOCTYPE> declarations;
  • more validation of input is needed, for example, checking that namespace prefixes are bound or that comments are well-formed.

Building and using

xml-rs uses Cargo, so add it with cargo add xml or modify Cargo.toml:

[dependencies]
xml = "0.8.16"

The package exposes a single crate called xml.

Reading XML documents

xml::reader::EventReader requires a Read instance to read from. It can be a File wrapped in BufReader, a &[u8] slice, or a Vec<u8> wrapped in std::io::Cursor (Vec<u8> itself does not implement Read).

EventReader implements the IntoIterator trait, so you can use it in a for loop directly:

use std::fs::File;
use std::io::BufReader;

use xml::reader::{EventReader, XmlEvent};

fn main() -> std::io::Result<()> {
    let file = File::open("file.xml")?;
    let file = BufReader::new(file); // Buffering is important for performance

    let parser = EventReader::new(file);
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{:spaces$}+{name}", "", spaces = depth * 2);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{:spaces$}-{name}", "", spaces = depth * 2);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
            // There's more: https://docs.rs/xml-rs/latest/xml/reader/enum.XmlEvent.html
            _ => {}
        }
    }

    Ok(())
}

Document parsing can end normally or with an error. Regardless of the exact cause, the parsing process stops and the iterator terminates normally.

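You can also have finer control over when to pull the next event from the parser by calling its own next() method, which requires the parser to be mutable:

let mut parser = EventReader::new(file);
match parser.next() {
    Ok(XmlEvent::EndDocument) => { /* the document is finished */ }
    Ok(other) => println!("{other:?}"),
    Err(e) => eprintln!("Error: {e}"),
}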

Upon reaching the end of the document or an error, the parser remembers the last event and returns it from every subsequent next() call. If the iterator is used instead, it yields the error or end-of-document event once and produces None afterwards.

It is also possible to tweak the parsing process using the xml::reader::ParserConfig structure. See its documentation for more information and examples.

You can find a more extensive example of using EventReader in src/analyze.rs, a small program that shows various statistics about a specified XML document (it is built by cargo build and can be run afterwards). It can also be used to check XML documents for well-formedness: if a document is not well-formed, the program exits with an error.

Parsing untrusted inputs

The parser is written in the safe Rust subset, so by Rust's guarantees the worst it can do is panic. You can use ParserConfig to set limits on the maximum lengths of names, attributes, text, entities, etc. You should also cap the document size with io::Read's take(max) method.

Writing XML documents

xml-rs also provides a streaming writer much like the StAX event writer. With it you can write an XML document to any Write implementor.

use std::io;
use xml::writer::{EmitterConfig, XmlEvent};

/// A simple demo syntax where "+foo" makes `<foo>`, "-foo" makes `</foo>`
fn make_event_from_line(line: &str) -> XmlEvent {
    let line = line.trim();
    if let Some(name) = line.strip_prefix("+") {
        XmlEvent::start_element(name).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(line).into()
    }
}

fn main() -> io::Result<()> {
    let input = io::stdin();
    let output = io::stdout();
    let mut writer = EmitterConfig::new()
        .perform_indent(true)
        .create_writer(output);

    let mut line = String::new();
    loop {
        line.clear();
        let bytes_read = input.read_line(&mut line)?;
        if bytes_read == 0 {
            break; // EOF
        }

        let event = make_event_from_line(&line);
        if let Err(e) = writer.write(event) {
            panic!("Write error: {e}")
        }
    }
    Ok(())
}

The code example above also demonstrates how to create a writer from its configuration. The same pattern works for EventReader: build a ParserConfig and call its create_reader() method.

The library provides an XML event building DSL which helps to construct complex events, e.g. ones having namespace definitions. Some examples:

// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="name" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "name").default_ns("urn:default:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")

Of course, one can create XmlEvent enum variants directly instead of using the builder DSL. There are more examples in xml::writer::XmlEvent documentation.

The writer has multiple configuration options; see EmitterConfig documentation for more information.

Bug reports

Please report issues at: https://github.com/kornelski/xml-rs/issues.
