GithubHelp home page GithubHelp logo

sinelaw / xml-to-json Goto Github PK

View Code? Open in Web Editor NEW
94.0 12.0 16.0 424 KB

Fast & easy command line tool for converting XML files to JSON

Home Page: hackage.haskell.org/package/xml-to-json

License: MIT License

Haskell 88.80% JavaScript 4.53% Shell 6.68%

xml-to-json's Introduction

xml-to-json

Fast & easy library & command line tool for converting XML files to JSON.

Heads up! The project has been split into two projects. See xml-to-json-fast for a low-memory-usage version that has less features.

Contents

Overview

xml-to-json converts xml to json. It includes a Haskell library and a command-line tool.

xml-to-json ships with two different executables:

  1. xml-to-json-fast ("fast") uses a lot less memory, but you can't control the output. Can be used on XML files of any size.
  2. xml-to-json ("classic") provides some control over json output format, but uses a lot of memory. Suitable for smaller files.

"Fast" xml-to-json-fast

The so-called "fast" version (which uses a lot less memory) has been forked into a separate project, xml-to-json-fast.

"Classic" xml-to-json

The fully featured "classic" xml-to-json provides compact json output that's designed to be easy to store and process using JSON-based databases, such as mongoDB or CouchDB. In fact, the original motivation for xml-to-json was to store and query a large (~10GB) XML-based dataset, using an off-the-shelf scalable JSON database.

When using "classic" xml-to-json, the input XML must be valid.

Currently the xml-to-json processes XMLs according to lossy rules designed to produce sensibly minimal output. If you need to convert without losing information at all consider something like the XSLT offered by the jsonml project. Unlike jsonml, this tool - xml-to-json - produces json output similar (but not identical) to the xml2json-xslt project.

Implementation Notes

xml-to-json is implemented in Haskell.

As of this writing, xml-to-json uses hxt with the expat-based hxt-expat parser. The pure Haskell parsers for hxt all seem to have memory issues which hxt-expat doesn't.

Installation

The easy way: use Stack

  1. Get Haskell Stack
  2. Run: stack install xml-to-json

You may need to install libcurl. For that, follow your platform's instructions. On debian/ubuntu, you can use: sudo apt-get install libcurl4-openssl-dev.

Can't use Stack? Use cabal-install

Note for Windows users: Only local files, not URLs, are supported as command line arguments. This is because curl doesn't compile on my (windows + cygwin) machine out-of-the-box.

To install the release version: Since xml-to-json is implemented in Haskell, "all you need to do" is install the latest cabal-install (you can try the Haskell platform) for your system, and then run:

cabal update
cabal install xml-to-json

To install from source: Clone this repository locally, and then (assuming you have Haskell platform installed) run cabal install:

cd xml-to-json
cabal install

Usage

Basic usage

Just run the tool with the filename as a single argument, and direct the stdout to a file or a pipe:

xml-to-json myfile.xml > myfile.js

Classic xml-to-json: Advanced Usage

Use the --help option to see the full command line options.

Here's a (possibly outdated) snapshot of the --help output:

Usage: <program> [OPTION...] files...
  -h      --help                      Show this help
  -t TAG  --tag-name=TAG              Start conversion with nodes named TAG (ignoring all parent nodes)
  -s      --skip-roots                Ignore the selected nodes, and start converting from their children
                                      (can be combined with the 'start-tag' option to process only children of the matching nodes)
  -a      --as-array                  Output the resulting objects in a top-level JSON array
  -m      --multiline                 When using 'as-array' output, print each of top-level json object on a seperate line.
                                      (If not using 'as-array', this option will be on regardless, and output is always line-seperated.)
          --no-collapse-text=PATTERN  For elements with tag matching regex PATTERN only:
	  			      Don't collapse elements that only contain text into a simple string property.
                                      Instead, always emit '.value' properties for text nodes, even if an element contains only text.
                                      (Output 'schema' will be more stable.)
          --no-ignore-nulls           Don't ignore nulls (and do output them) in the top level of output objects

Example output

Input file:

<?xml version="1.0"?>
<!DOCTYPE Test>
<Tests>
  <Test Name="The First Test">
    <SomeText>Some simple text</SomeText>
    <SomeMoreText>More text</SomeMoreText>
    <Description Format="FooFormat">
Just a dummy
<!-- comment -->
Xml file.
    </Description>
  </Test>
  <Test Name="Second"/>
</Tests>

JSON output using default settings:

{"Tests":{"Test":[{"Name":"The First Test","SomeText":"Some simple text","Description":{"Format":"FooFormat","value":"Just a dummy\n\nXml file."}},{"Name":"Second"}]}}

Formatted for readability (not the actual output):

{
   "Tests":{
      "Test":[
         {
            "Name":"The First Test",
	    "SomeMoreText":"More text",
            "SomeText":"Some simple text",
            "Description":{
               "Format":"FooFormat",
               "value":"Just a dummy\n\nXml file."
            }
         },
         {
            "Name":"Second"
         }
      ]
   }
}

Note that currently xml-to-json does not retain the order of elements / attributes.

Using the various options you can control various aspects of the output such as:

  • At which top-level nodes the conversion starts to work (-t)
  • Whether to wrap the output in a top-level JSON array,
  • Whether or not to collapse simple string elements, such as the element in the example, into a simple string property (you can specify a regular expression pattern to decide which nodes to collapse)
  • And more...

Use the --help option to see a full list of options.

Performance of "classic" xml-to-json

The "classic" xml-to-json cannot operate on large files. However, it is fast when operating on multiple small files. For large XML files, the speed on a core-i5 machine is about 2MB of xml / sec, with a 100MB XML file resulting in a 56MB json output. It took about 10 minutes to process 1GB of xml data. The main performance limit is memory - only one single-threaded process was running since every single large file (tens of megabytes) consumes a lot of memory - about 50 times the size of the file.

A few simple tests have shown this to be at least twice as fast as jsonml's xlst-based converter (however, the outputs are not similar, as stated above).

Currently the program processes files serially. If run in parallel on many small XML files (<5MB) the performance becomes cpu-bound and processing may be much faster, depending on the architecture. A simple test showed performance of about 5MB / sec (on the same core-i5).

xml-to-json's People

Contributors

ayamshanov avatar peaker avatar sinelaw avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xml-to-json's Issues

install

Hey man,

Sorry for posting this in issues but wasn't sure how else to contact you.

I need to convert a ~2GB XML file to JSON to get it into mongodb and came across this project and it looks like exactly what I need.

I've followed the install steps but when I run xml-to-json from my command line I get a command not found error. Are there some other steps I need to take as I'm not too familiar with Hackage.

Cheers.

Tim

fails to build with aeson-2.0

xml-to-json               > /tmp/stack-0f056ee49d19dd3b/xml-to-json-2.0.1/src/Text/XML/JSON/XmlToJson.hs:114:41: error:
xml-to-json               >     • Couldn't match expected type ‘Aeson.Key’
xml-to-json               >                   with actual type ‘T.Text’
xml-to-json               >     • In the expression: packJSValueName a
xml-to-json               >       In the expression: (packJSValueName a, b)
xml-to-json               >       In the first argument of ‘Aeson.object’, namely
xml-to-json               >         ‘[(packJSValueName a, b)]’
xml-to-json               >     |
xml-to-json               > 114 | wrapRoot (Just (a, b)) = Aeson.object [(packJSValueName a, b)]
xml-to-json               >     |                                         ^^^^^^^^^^^^^^^^^
xml-to-json               > 
xml-to-json               > /tmp/stack-0f056ee49d19dd3b/xml-to-json-2.0.1/src/Text/XML/JSON/XmlToJson.hs:122:20: error:
xml-to-json               >     • Couldn't match type: HashMap.HashMap T.Text Aeson.Value
xml-to-json               >                      with: Data.Aeson.KeyMap.KeyMap Aeson.Value
xml-to-json               >       Expected: [(JSValueName, Aeson.Value)] -> Aeson.Object
xml-to-json               >         Actual: [(JSValueName, Aeson.Value)]
xml-to-json               >                 -> HashMap.HashMap T.Text Aeson.Value
xml-to-json               >     • In the second argument of ‘(.)’, namely
xml-to-json               >         ‘HashMap.fromList . (map . first) packJSValueName’
xml-to-json               >       In the first argument of ‘($)’, namely
xml-to-json               >         ‘Aeson.Object . HashMap.fromList . (map . first) packJSValueName’
xml-to-json               >       In the expression:
xml-to-json               >         Aeson.Object . HashMap.fromList . (map . first) packJSValueName
xml-to-json               >           $ M.toList m
xml-to-json               >     |
xml-to-json               > 122 |     Aeson.Object . HashMap.fromList . (map . first) packJSValueName $ M.toList m
xml-to-json               >     |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not able to install

Hi,

I think this is amazing project and this is exactly what was looking for to convert large XML file into JSON. I have tried to install this and it throws me error as below.

I am new to Haskell so i have downloaded from the URL that you gave here and ran below commands but i see error related to dependency as below.

H:>cabal update
Downloading the latest package list from hackage.haskell.org
Note: there is a new version of cabal-install available.
To upgrade, run: cabal install cabal-install

H:>cabal install xml-to-json
Resolving dependencies...
cabal: Could not resolve dependencies:
trying: xml-to-json-1.0.0 (user goal)
next goal: base (dependency of xml-to-json-1.0.0)
rejecting: base-4.7.0.1/installed-b8b... (conflict: xml-to-json => base>=4.5.0
&& <4.7)
rejecting: base-4.7.0.1, 4.7.0.0, 4.6.0.1, 4.6.0.0, 4.5.1.0, 4.5.0.0, 4.4.1.0,
4.4.0.0, 4.3.1.0, 4.3.0.0, 4.2.0.2, 4.2.0.1, 4.2.0.0, 4.1.0.0, 4.0.0.0,
3.0.3.2, 3.0.3.1 (global constraint requires installed instance)
Dependency tree exhaustively searched.

I have tried below as well but getting same dependency error.

To install from source: Clone this repository locally, and then (assuming you have Haskell platform installed) run cabal install:

cd xml-to-json
cabal install

Please help. Thank you very much.

Not working

I have followed the instructions but when I went to run the command in the Terminal is says:
"-bash: xml-to-json: command not found".

Any help would be appreciated because this tools would be awesome!

Out of Memory

used to convert a 450M XML file to a JSON file(.js),
result: xml-to-json: out of memory

is there a config for memory used by the tool? if so, I can know and give a bigger number for it.
Now, seems it doesn't work for me.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.