wojciechmigda / tco-xmlprocessing

Solution to the "Offset Well Analysis - XML Data Extraction CLI" TopCoder competition

Home Page: http://www.topcoder.com/challenge-details/30151920/?type=develop

CMake 21.81% C++ 78.19%
xml-processing offset-well-analysis topcoder-solutions

tco-xmlprocessing's Introduction

Welcome to the wonderful world of XML processing!

Have you ever been given a bunch of large cluttered XML files and told to do something with them?

We have a solution for you!

Our modern application will take care of your troubles and transform your XML files in the blink of an eye.

Building the application

The application is built with CMake. The process follows the standard approach, but I will repeat the most important details below.

The code resides in the top-level folder, called solution. This is also where CMakeLists.txt resides.

The designated out-of-source build folder is located below it and is called .build. It contains a small README.md file with a basic cmake invocation example. As is customary with cmake, the software can be built using different toolchains under different operating systems.

Even though core development of this program was done under Linux (64-bit Ubuntu 18.04), it should also be possible to build the program on Windows, using either clang, MinGW, or Visual Studio. Take care, though, with the version of the compiler, as this program makes use of C++17 and std::filesystem features.

Specifically, I have used g++ 7.5.0 and cmake 3.10.2. There is also a dependency on boost.property_tree; the version I used is 1.68. Other dependencies are already included in the source code bundle. They include:

  • spdlog (logging)
  • fmt (printing and string operations)
  • neither (error handling)
  • clipp (CLI parameters)
  • nlohmann::json (JSON handling)
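
Because the build depends on C++17 std::filesystem support, it can help to probe a toolchain with a tiny translation unit before building the whole project. The sketch below is not taken from the program's sources; ensure_directory is a hypothetical helper that only illustrates the kind of std::filesystem call an output folder option would typically need. If it compiles and links with your compiler in C++17 mode, the required library support is most likely in place (older GCC releases may additionally need linking with -lstdc++fs).

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the program): makes sure `dir` exists
// as a directory, creating missing parent folders along the way.
// Returns false instead of throwing when the path cannot be used.
bool ensure_directory(const fs::path& dir) {
    std::error_code ec;
    fs::create_directories(dir, ec);  // no error if the directory already exists
    return !ec && fs::is_directory(dir, ec);
}
```

Calling ensure_directory on a path that already exists as a directory simply returns true, so the check is safe to repeat.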

Once in the .build folder, first generate the build system files. On Linux these are native Makefiles. Simply execute

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-Wall -Werror -march=native" ..

then issue

make -j

and, if you like, install the program by issuing

make install

Make sure, though, that you have the privileges to install into the default install paths. It is always possible to provide a custom install prefix instead.

A successful build creates the program executable in the app subfolder of the .build directory. The program is called main.

Usage

To see the program's synopsis, simply execute main without arguments.

This is an example of the output:

SYNOPSIS
        ./main --input <XML data file to read from> --output-dir <Destination folder> [--root-name
               <Name for the root node=<ROOT>>] [--skip-root|--no-skip-root] [--json-indent <JSON
               indentation level=-1>] [--verbose] [--debug]

OPTIONS
        --input, -i Input XML file
        --output-dir, -o
                    Destination folder where output files are to be created

        --root-name Name for the root node to use, if enabled
        --skip-root|--no-skip-root
                    Include or do not include root node in created json files, default=true

        --json-indent
                    Indentation level for produced JSON, -1 to disable

        --verbose, -v
                    Enable verbose output, default=false

        --debug     Enable debug output, default=false

There are two mandatory parameters: the input XML file (-i) and the output folder path (-o).

Their meaning is self-explanatory, and they were requested in the problem specification.

A few extra parameters deserve attention.

  • verbose and debug control the verbosity of the program output. debug is more verbose than verbose.
  • json-indent controls the formatting of the JSON written into the output files. With the default value of -1 no indentation is used:
{"attributes":[{"name":"data_encoding","required":true}],"children":[{"name":"CD_ATTACHMENT_JOURNAL","required":true}]}

Specifying a non-negative value causes indentation to be applied. A value of 2 gives this:

{
  "attributes": [
    {
      "name": "project_id",
      "required": true
    },
    {
      "name": "project_target_id",
      "required": true
    },
    {
      "name": "site_id",
      "required": true
    }
  ],
  "children": []
}
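
The program itself produces JSON with nlohmann::json, whose dump() function takes an indent argument that defaults to -1, which is presumably what json-indent maps to. To make the two layouts above concrete without pulling in that library, here is a standard-library-only sketch. Field, NodeSchema, line_break, fields_to_json and to_json are all hypothetical names made up for this illustration, and the sketch assumes that names need no JSON escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one output record; not the program's real types.
struct Field {
    std::string name;
    bool required;
};

struct NodeSchema {
    std::vector<Field> attributes;
    std::vector<Field> children;
};

// Line break plus `depth` levels of indentation, or nothing in compact mode.
std::string line_break(int indent, int depth) {
    if (indent < 0) return "";
    return "\n" + std::string(static_cast<std::size_t>(indent) *
                              static_cast<std::size_t>(depth), ' ');
}

// Serializes a list of fields as a JSON array nested at `depth`.
std::string fields_to_json(const std::vector<Field>& fields, int indent, int depth) {
    if (fields.empty()) return "[]";
    const std::string sep = indent < 0 ? ":" : ": ";
    std::string out = "[";
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) out += ",";
        out += line_break(indent, depth + 1) + "{";
        out += line_break(indent, depth + 2) + "\"name\"" + sep + "\"" + fields[i].name + "\",";
        out += line_break(indent, depth + 2) + "\"required\"" + sep +
               (fields[i].required ? "true" : "false");
        out += line_break(indent, depth + 1) + "}";
    }
    out += line_break(indent, depth) + "]";
    return out;
}

// indent < 0 gives the compact single-line form; indent >= 0 gives one
// entry per line with `indent` spaces per nesting level.
std::string to_json(const NodeSchema& node, int indent = -1) {
    const std::string sep = indent < 0 ? ":" : ": ";
    std::string out = "{";
    out += line_break(indent, 1) + "\"attributes\"" + sep +
           fields_to_json(node.attributes, indent, 1) + ",";
    out += line_break(indent, 1) + "\"children\"" + sep +
           fields_to_json(node.children, indent, 1);
    out += line_break(indent, 0) + "}";
    return out;
}
```

With the default argument, to_json returns the single-line form; passing 2 yields the multi-line layout, including the empty "children": [] array.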

  • skip-root/no-skip-root and root-name. Very specific details about these two options are provided in the source code. In short, they control how the top node produced by boost's XML parser is handled. That root node has no name of its own, so the code gives it an arbitrary name (<ROOT> by default), which can be changed with root-name. The name only matters when the non-default no-skip-root option is chosen: with skip-root the name of the root node does not matter, but with no-skip-root the root node is written into one of the resulting files. One thing to note: if you select a custom name for the root node, make sure that it is not the same as the name of any node in the processed XML file.
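
The interplay of skip-root and root-name can be pictured with a small model. This is only a sketch under assumptions: XmlNode, collect_names and reported_names are hypothetical names, the real logic lives in the program's source code and may differ, and the exception on a name clash is an illustrative guard, since the README only asks the user to avoid such clashes.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed XML tree; not the program's real types.
struct XmlNode {
    std::string name;               // empty for the parser's unnamed top node
    std::vector<XmlNode> children;
};

// Gathers the names of all nodes below `node` (excluding `node` itself).
void collect_names(const XmlNode& node, std::set<std::string>& names) {
    for (const XmlNode& child : node.children) {
        names.insert(child.name);
        collect_names(child, names);
    }
}

// Returns the node names that would be reported. With skip_root == false the
// unnamed top node is reported under root_name, so root_name must not clash
// with a real element name.
std::set<std::string> reported_names(const XmlNode& top, bool skip_root,
                                     const std::string& root_name = "<ROOT>") {
    std::set<std::string> names;
    collect_names(top, names);
    if (!skip_root) {
        if (names.count(root_name) != 0) {
            throw std::invalid_argument("root name clashes with an XML element: " + root_name);
        }
        names.insert(root_name);
    }
    return names;
}
```

In this model the default name <ROOT> cannot clash with a real element, because angle brackets are not allowed in XML element names; a custom name loses that guarantee, which is why the clash check is only relevant for custom names.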

Various comments

Resource consumption

I have not made any extra effort to optimize memory usage. With the provided example XML file, time -v reports that main ran for 12 seconds and reached a peak resident memory of 6.5 GB.

Code comments

There are three kinds of them:

  • in-code comments, present where needed, for instance to explain things that might not be immediately obvious,
  • public function interface comments, present in header files, because these are the files a user of the public interface looks at. They are not duplicated in translation units,
  • non-public function interface comments, present in translation units directly above the functions they pertain to. They are intended for the developers.
