A forward-only XML parser.
It executes a callback with the values found at each node.
Handles:
- XML comments
- CDATA sections
- XML escapes:
  - &quot; (")
  - &apos; (')
  - &lt; (<)
  - &gt; (>)
  - &amp; (&)
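The five escapes above are the predefined XML entities. Below is a minimal sketch of decoding them in a single left-to-right pass. decode_xml_escapes is a hypothetical name, not a function from this repository, and the sketch assumes numeric character references (such as &#65;) are out of scope and can be left untouched.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: replaces the five predefined XML entities with the
// characters they stand for. Unknown entities, numeric references and bare
// '&' characters are copied through unchanged.
std::string decode_xml_escapes(const std::string& text) {
    static const std::unordered_map<std::string, char> entities = {
        {"&quot;", '"'}, {"&apos;", '\''}, {"&lt;", '<'},
        {"&gt;", '>'},   {"&amp;", '&'}};

    std::string out;
    out.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '&') {
            // Look for the terminating ';' and check whether the span is a known entity.
            std::size_t semi = text.find(';', i);
            if (semi != std::string::npos) {
                auto it = entities.find(text.substr(i, semi - i + 1));
                if (it != entities.end()) {
                    out += it->second;
                    i = semi + 1;  // Skip past the whole entity.
                    continue;
                }
            }
        }
        out += text[i];
        ++i;
    }
    return out;
}
```

Because decoded output is never rescanned, "&amp;lt;" becomes the literal text "&lt;" rather than "<", which is the behaviour XML requires.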
To build and run the sample program, execute the line below in a terminal from the directory where the repository is cloned:
g++ sample_program.cpp -o sample_program && ./sample_program < "./data/sample_prog_input.txt"
The sample program takes a sample XML document as input and prints the IDs of orders whose amounts exceed 100.
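A filter like this can be sketched as a callback. This is not the code in sample_program.cpp: it assumes each order is reported as an order tag carrying id and amount attributes (the real layout of sample_program_test.xml may differ), the Node class is a stand-in exposing only the two documented public members, and make_order_filter is a hypothetical name.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the repository's Node, exposing only the documented public API.
// Assumption: the real class stores more data and may behave differently.
class Node {
public:
    std::string& get_value() { return _text_value; }
    // Stub behaviour: a missing key yields an empty string.
    std::string& get_attribute(std::string key) { return _attributes[key]; }

    std::string _text_value;
    std::unordered_map<std::string, std::string> _attributes;
};

using callback_type =
    std::function<void(std::string path, std::string name, std::shared_ptr<Node> node)>;

// Hypothetical filter: assumes orders look like <order id="..." amount="..."/>.
// Appends the id of every order whose amount exceeds the threshold.
callback_type make_order_filter(std::vector<std::string>& ids, double threshold) {
    return [&ids, threshold](std::string /*path*/, std::string name,
                             std::shared_ptr<Node> node) {
        if (name != "order") return;
        const std::string& amount = node->get_attribute("amount");
        if (amount.empty()) return;
        if (std::stod(amount) > threshold) ids.push_back(node->get_attribute("id"));
    };
}
```

The lambda captures the result vector by reference, so the vector must outlive the callback.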
Source files:
- file_reader.cpp: Handles the fstream for a given file. Currently a singleton.
- xml_parser.h and xml_parser.cpp: The XML parser. Initiates and continues XML parsing.
- node.h and node.cpp: Contain the bulk of the logic. Responsible for parsing XML and storing the results.
- sample_program.cpp: The demo program.
- tests.cpp: Used for testing.

Data files:
- sample_prog_input.txt: The terminal input for the sample program.
- test.xml: Testing data.
- sample_program_test.xml: Data used by the sample program.
- standard.xml: Used by tests.cpp to test performance. It has 2 million lines of XML. The data can be found here: https://drive.google.com/open?id=19GZ6SbPpgEuFF8Fu84Hgx8W_yFmqq2hK
The callback type is:
std::function<void(std::string path, std::string name, std::shared_ptr<Node> node)>;
where path is the path to the current tag, name is the name of the current tag, and node contains the tag's data.
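Any callable with this signature can serve as a callback. As an illustration, the hypothetical helper make_path_counter below counts how often each path is reported. It only forward-declares Node because it never dereferences the node, and the path strings used to exercise it are an assumed format, not necessarily what the parser produces.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

class Node;  // Defined in node.h; this sketch never dereferences it.

// Same signature as described above.
using callback_type =
    std::function<void(std::string path, std::string name, std::shared_ptr<Node> node)>;

// Hypothetical example: tally how many times each tag path is reported.
callback_type make_path_counter(std::map<std::string, int>& counts) {
    return [&counts](std::string path, std::string /*name*/,
                     std::shared_ptr<Node> /*node*/) {
        ++counts[path];
    };
}
```

Capturing the map by reference keeps the callback cheap to copy, at the cost of requiring the map to outlive the parse.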
Currently, Node exposes only two public member functions:
std::string& get_value();
std::string& get_attribute(std::string key);
This could be expanded to allow iteration over a node.
Each node contains the tag's name, the path up to that tag, the tag's attributes, the tag's value, and pointers to its child nodes.
The child pointers can be used to reach the entire XML subtree.
Note that nodes are passed as smart pointers (std::shared_ptr), so if the callback does not save the root-level node, it will be freed from memory.
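The ownership rule can be illustrated with a small simulation. In the sketch below, emit_node is a hypothetical stand-in for the parser: it builds a two-node tree, passes the root to the callback, then drops its own reference. Node is a simplified stand-in, not the class from node.h, and a std::weak_ptr is used to observe whether the root survived.

```cpp
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for Node (assumption: only the fields this sketch needs).
struct Node {
    std::string _name;
    std::vector<std::shared_ptr<Node>> _child_nodes;
};

// Hypothetical parser side: creates a root with one child, hands the root to
// the callback, then lets its own shared_ptr go out of scope. Only a weak_ptr
// escapes, so the caller can check whether anything kept the tree alive.
template <typename Callback>
std::weak_ptr<Node> emit_node(Callback&& callback) {
    auto root = std::make_shared<Node>();
    root->_name = "orders";
    root->_child_nodes.push_back(std::make_shared<Node>());
    root->_child_nodes.back()->_name = "order";
    callback(std::string("/orders"), std::string("orders"), root);
    return root;  // Converted to weak_ptr; the local shared_ptr is destroyed here.
}
```

A callback that ignores its node argument leaves the returned weak_ptr expired, while one that copies the std::shared_ptr into a captured variable keeps the root, and through _child_nodes its whole subtree, alive.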
std::string _name; // Node's name
std::string _path; // Current path
std::string _text_value; // Current node's value
std::unordered_map<std::string, std::string> _attributes; // Node Attributes
std::vector<std::shared_ptr<Node>> _child_nodes; // Pointers to child nodes
callback_type _callback = nullptr; // Callback invoked for parsed nodes
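Given a saved root, the _child_nodes pointers allow a recursive walk over the subtree. The sketch below uses a stand-in struct mirroring the fields listed above (the callback member is omitted), and walk is a hypothetical helper, not part of node.h. It performs a depth-first, pre-order traversal.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in mirroring the Node fields listed above (callback member omitted).
struct Node {
    std::string _name;
    std::string _path;
    std::string _text_value;
    std::unordered_map<std::string, std::string> _attributes;
    std::vector<std::shared_ptr<Node>> _child_nodes;
};

// Hypothetical helper: depth-first, pre-order walk. visit(node, depth) is
// called on each node before any of its children.
template <typename Visit>
void walk(const std::shared_ptr<Node>& node, Visit&& visit, int depth = 0) {
    if (!node) return;
    visit(*node, depth);
    for (const auto& child : node->_child_nodes) {
        walk(child, visit, depth + 1);
    }
}
```

For example, walking a root orders node with two order children, the first of which has an amount child, visits orders (depth 0), order (1), amount (2), then order (1).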
To run the tests on standard.xml, execute:
g++ tests.cpp -o test && ./test > res.txt
Please note that some of the tests are mutually exclusive.
Known limitations:
- Spaces at the start of the XML are ignored, which may be a problem if they are part of a value.
- Does not handle a non-nested tag that begins on the same line as another tag.
- Expects special XML tags such as <?xml ...?> to be on their own line.
Possible improvements:
- Improve space handling.
- Send callbacks on a separate thread.
- Have xml_parser manage file_reader instances and make the latter non-singleton.
- Automate tests via GoogleTest.