bibjson-to-csv's Introduction

BibJSON To CSV - Data Engineer - Programming Test

This is a data conversion tool that parses BibJSON files and transforms them into CSV files. I decided to use Python because its standard library has good support for string handling, argument parsing, JSON, and CSV. The program is self-contained in the main.py file.

Considerations

When writing the transformer, I initially supported only the common keys within each entry of the BibJSON file. After further examination of the BibJSON specification, I found that its default set of keys comes from the BibTeX specification. The BibJSON specification is very loose and allows for missing values, so I found it reasonable to enable the full specification via a feature flag. For the most flexibility, the user should pass the --full-spec flag.

By default the program fills missing values with nothing, leaving an empty value in the row. I also added a flag that accepts a custom filler to suit the needs of the user.

A special note about the authors column: authors are separated by semicolons. This keeps the column easy to parse and easy to read if the user queries for the authors.

This transformer has some limitations. JSON allows arbitrarily nested objects, and that, combined with the loose BibJSON specification, makes a fully 1:1 transformer difficult. Given this, I did my best to allow for the full specification.
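The points above (empty or custom fillers, semicolon-separated authors, and the trouble with nested objects) can be illustrated with a minimal sketch. This is not the code in main.py: the function names, the COLUMNS list, and the handling of nested objects are assumptions made for illustration, and it assumes authors are stored as a list of objects with a "name" key, as is common in BibJSON.

```python
import csv
import io

# Hypothetical subset of BibTeX-style fields; the real tool's column list may differ.
COLUMNS = ["type", "title", "author", "year", "journal"]


def entry_to_row(entry, columns=COLUMNS, filler=""):
    """Flatten one BibJSON entry (a dict) into a list of CSV cell values."""
    row = []
    for key in columns:
        value = entry.get(key)
        if key == "author" and isinstance(value, list):
            # Assumption: each author is an object with a "name" key.
            names = [a.get("name", "") if isinstance(a, dict) else str(a) for a in value]
            value = ";".join(n for n in names if n)
        elif isinstance(value, dict):
            # Nested objects do not map cleanly to one cell; keep "name" if present.
            value = value.get("name")
        # Missing or empty values are replaced by the filler.
        row.append(filler if value in (None, "") else str(value))
    return row


def entries_to_csv(entries, columns=COLUMNS, filler=""):
    """Render a list of entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for entry in entries:
        writer.writerow(entry_to_row(entry, columns, filler))
    return buffer.getvalue()
```

Because the filler is applied per cell, every row keeps the same number of columns regardless of which keys an entry happens to contain.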

Requirements

Either:

  • Python
  • Docker

Usage

You can run the parser directly with Python, or build and run the Docker container.

Python

With default filler

python main.py -i data/xdd_sample.bibjson -o data/xdd_sample.csv

With custom filler

python main.py -i data/xdd_sample.bibjson -o data/xdd_sample.csv -f n/a

With the full BibTeX/BibJSON spec

python main.py -i data/xdd_sample.bibjson -o data/xdd_sample.csv --full-spec

Help

python main.py -h
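For readers curious how such a command line could be wired up, here is a minimal argparse sketch. Only the flags -i, -o, -f, and --full-spec come from the commands above. The dest names, help texts, and defaults are assumptions, and the actual parser in main.py may be organised differently.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Convert a BibJSON file to CSV.")
    # Assumption: input and output are both required.
    parser.add_argument("-i", dest="input", required=True,
                        help="path to the BibJSON input file")
    parser.add_argument("-o", dest="output", required=True,
                        help="path to the CSV output file")
    # Assumption: the default filler is the empty string (an empty cell).
    parser.add_argument("-f", dest="filler", default="",
                        help="value written for missing fields")
    parser.add_argument("--full-spec", action="store_true",
                        help="emit columns for the full BibTeX/BibJSON key set")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse generates the -h output automatically from the help strings, so no separate help handler is needed.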

Docker

Build the container

docker build -t bibjson-to-csv .

Run the container with the default missing-value filler

docker run \
-v "$(pwd)/data:/usr/src/app/data" \
bibjson-to-csv \
-i /usr/src/app/data/xdd_sample.bibjson \
-o /usr/src/app/data/xdd_sample.csv

Run the container with a custom missing-value filler

docker run \
-v "$(pwd)/data:/usr/src/app/data" \
bibjson-to-csv \
-i /usr/src/app/data/xdd_sample.bibjson \
-o /usr/src/app/data/xdd_sample.csv \
-f n/a

Run the container with the full BibTeX/BibJSON spec

docker run \
-v "$(pwd)/data:/usr/src/app/data" \
bibjson-to-csv \
-i /usr/src/app/data/xdd_sample.bibjson \
-o /usr/src/app/data/xdd_sample.csv \
--full-spec

Help

docker run \
-v "$(pwd)/data:/usr/src/app/data" \
bibjson-to-csv \
-h
