XML Parser

A forward-only XML parser: it reads the document in a single pass and executes a callback for each node, passing that node's values.

Handles:

  1. XML comments
  2. CDATA sections
  3. XML escapes:
    &quot; (")
    &apos; (')
    &lt; (<)
    &gt; (>)
    &amp; (&)
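As an illustration of the escape handling listed above, here is a minimal sketch of decoding the five predefined XML entities in a single left-to-right pass. The function name decode_xml_escapes and the Entity struct are hypothetical and are not taken from node.cpp; the repo's own implementation may differ.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper type: one predefined XML entity and its character.
struct Entity {
    const char* text;
    char ch;
};

// Hypothetical function (not from the repo): replaces the five predefined
// XML entities with their characters. Anything else that starts with '&'
// is copied through unchanged.
std::string decode_xml_escapes(const std::string& input) {
    static const Entity kEntities[] = {
        {"&quot;", '"'}, {"&apos;", '\''}, {"&lt;", '<'},
        {"&gt;", '>'},   {"&amp;", '&'}};

    std::string output;
    output.reserve(input.size());

    std::size_t i = 0;
    while (i < input.size()) {
        bool matched = false;
        if (input[i] == '&') {
            for (const Entity& e : kEntities) {
                const std::size_t len = std::strlen(e.text);
                // compare() clamps the length at the end of the string, so a
                // truncated entity such as "&am" simply fails to match.
                if (input.compare(i, len, e.text) == 0) {
                    output.push_back(e.ch);
                    i += len;  // skip the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            output.push_back(input[i]);
            ++i;
        }
    }
    return output;
}
```

Because the input is scanned once and decoded characters are never re-examined, a doubly escaped sequence such as `&amp;lt;` decodes to `&lt;` rather than to `<`.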

Running demo

Run the following command in a terminal, from the directory where the repo is cloned:

    g++ sample_program.cpp -o sample_program && ./sample_program < "./data/sample_prog_input.txt"

The sample program reads a sample XML file and prints the ids of orders whose amounts exceed 100.

Files

  1. file_reader.cpp : Handles the fstream for a given file. Currently a singleton.
  2. xml_parser.h and xml_parser.cpp : The XML parser. Initiates and continues XML parsing.
  3. node.h and node.cpp : Contain the bulk of the logic. Responsible for parsing XML and storing the results.
  4. sample_program.cpp : The demo program.
  5. tests.cpp : Used for testing.

Data

  1. sample_prog_input.txt : The terminal input for sample program.
  2. test.xml : Testing data
  3. sample_program_test.xml : Data used for sample program.
  4. standard.xml : Used by tests.cpp to test performance. Has 2 million lines of XML. The data can be found here: https://drive.google.com/open?id=19GZ6SbPpgEuFF8Fu84Hgx8W_yFmqq2hK

Details

Callback syntax

The callback syntax is:

    std::function<void(std::string path, std::string name, std::shared_ptr<Node> node)>;

where path is the path to the current tag, name is the name of the current tag, and node contains the tag's data.
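A callback matching this signature might look like the sketch below. To keep it compilable on its own, it uses a reduced, hypothetical stand-in for Node that offers only the two documented accessors. The tag name "order", the attribute "id", the use of the tag's value as the amount, and the helper make_order_filter are all assumptions for illustration; they are not taken from sample_program.cpp.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the repo's Node (node.h), reduced to the two
// documented accessors so that this sketch compiles without the repo.
class Node {
public:
    Node(std::string value, std::unordered_map<std::string, std::string> attributes)
        : _text_value(std::move(value)), _attributes(std::move(attributes)) {}

    std::string& get_value() { return _text_value; }
    std::string& get_attribute(std::string key) { return _attributes[key]; }

private:
    std::string _text_value;
    std::unordered_map<std::string, std::string> _attributes;
};

// The callback type as documented above.
using callback_type =
    std::function<void(std::string path, std::string name, std::shared_ptr<Node> node)>;

// Hypothetical helper: builds a callback that appends the "id" attribute of
// every tag named "order" whose value, read as a number, exceeds `threshold`.
// `ids_out` must outlive the returned callback. std::stod throws if the
// value is not numeric; a real program would want to handle that.
callback_type make_order_filter(double threshold, std::vector<std::string>& ids_out) {
    return [threshold, &ids_out](std::string /*path*/, std::string name,
                                 std::shared_ptr<Node> node) {
        if (name != "order") {
            return;  // ignore every other tag
        }
        if (std::stod(node->get_value()) > threshold) {
            ids_out.push_back(node->get_attribute("id"));
        }
    };
}
```

Capturing the output vector by reference keeps the callback itself stateless, so the caller decides where results are collected.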

Node public functions

Currently, node supports only two public member functions:

    std::string& get_value();
    std::string& get_attribute(std::string key);

This can be expanded to allow iteration over node.

Node structure

Each node contains the tag's name, the path to that tag, the tag's attributes, the tag's value, and pointers to its child nodes.

The pointer to child nodes can be used to get the entire XML subtree.

Note that nodes are passed as smart pointers (std::shared_ptr), so if the callback does not save the root-level node, it is deleted from memory.

    std::string _name; // Node's name
    std::string _path; // Current path
    std::string _text_value; // Current node's value
    std::unordered_map<std::string, std::string> _attributes; // Node Attributes
    std::vector<std::shared_ptr<Node>> _child_nodes; // Pointer to child nodes

    callback_type _callback = NULL;
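The lifetime rule and the subtree access described above can be sketched with plain std::shared_ptr reference counting. The Node struct here is a hypothetical stand-in that keeps only two of the listed fields and makes them public; count_nodes and deliver_subtree are illustrative helpers, and deliver_subtree only mimics a parser that drops its own reference after the callback returns, which is an assumption about how xml_parser behaves.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the repo's Node, keeping only two of the fields
// listed above and making them public for brevity.
struct Node {
    std::string _name;
    std::vector<std::shared_ptr<Node>> _child_nodes;
};

// Counts a node and all of its descendants by following _child_nodes.
std::size_t count_nodes(const std::shared_ptr<Node>& node) {
    if (!node) {
        return 0;
    }
    std::size_t total = 1;
    for (const auto& child : node->_child_nodes) {
        total += count_nodes(child);
    }
    return total;
}

// Mimics a parser that builds a small subtree, hands it to the callback, and
// then releases its own reference. The returned weak_ptr only observes the
// root; it does not keep it alive.
template <typename Callback>
std::weak_ptr<Node> deliver_subtree(Callback callback) {
    auto root = std::make_shared<Node>();
    root->_name = "order";
    auto child = std::make_shared<Node>();
    child->_name = "amount";
    root->_child_nodes.push_back(child);

    std::weak_ptr<Node> observer = root;
    callback(root);
    return observer;  // the local `root` reference is released on return
}
```

A callback that copies the pointer into a longer-lived variable keeps the whole subtree reachable, because each parent owns its children. A callback that ignores the pointer lets the reference count fall to zero, and the subtree is freed.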

Running tests

To run the tests on standard.xml, execute:

    g++ tests.cpp -o test && ./test > res.txt

Please note that some of the tests are mutually exclusive.

Current limitations

  • Spaces at start of XML are ignored, which may be an issue if they are part of value.
  • Does not handle beginning of non-nested tag on same line.
  • Expects special XML tags like <?xml ...?> to be on their own line.

Future improvements

  • Improve space handling
  • Send callbacks on separate thread
  • Have xml_parser manage file_reader instances and make the latter non-singleton
  • Automate tests via google_test
