lrpinto / text-wrangler

Mini-wrangler system to massage a csv file as specified by given DSL

Text Wrangler

Text Wrangler is an app that transforms CSV data as specified by a bespoke ANTLR4 DSL.

Usage

  • Run mvn package to generate the ANTLR4 parser
  • Import the MySQL schema in main/java/resources/schema.sql
  • Update the MySQL connection details in main/java/resources/configuraiton.ini
  • Start with App.java for a demonstration of how to use the library

Transformations DSL

  • A sample DSL for transformations is provided in main/java/resources/transformations.dsl
  • The grammar and lexer describing the DSL rules and syntax can be found in src\main\antlr4\org\luisa\miniwrangler

Regex grammar

  • Java regex patterns are supported and can be used to skip data
  • In addition, sugar patterns are supported through a mapped pattern utility
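
A minimal sketch of how a mapped pattern utility might work, assuming that a sugar pattern is a short name that resolves to a full Java regex. The class name SugarPatterns and the sugar names YYYY, MM and DIGITS are hypothetical and are not taken from the project; anything that is not a known sugar name is treated as a plain Java regex.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class SugarPatterns {
    // Hypothetical sugar names; the project's real mapping may differ.
    private static final Map<String, String> SUGAR = Map.of(
            "YYYY", "\\d{4}",
            "MM", "(0[1-9]|1[0-2])",
            "DIGITS", "\\d+");

    /** Resolves a sugar name to its regex, or compiles the input as a raw Java regex. */
    public static Pattern resolve(String nameOrRegex) {
        return Pattern.compile(SUGAR.getOrDefault(nameOrRegex, nameOrRegex));
    }

    /** Returns true when the whole value matches the resolved pattern. */
    public static boolean accepts(String nameOrRegex, String value) {
        return resolve(nameOrRegex).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("YYYY", "2018"));       // sugar name
        System.out.println(accepts("YYYY", "18"));         // too short for \d{4}
        System.out.println(accepts("[A-Z]\\d+", "P1001")); // raw Java regex
    }
}
```

Falling back to the raw input keeps one entry point for both kinds of pattern, at the cost that a mistyped sugar name is silently compiled as a regex.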

CSV parser

  • ANTLR4 is also used for parsing CSV
  • A grammar and lexer describing CSV parsing rules can be found in src\main\antlr4\org\luisa\miniwrangler

CSV Data

  • CSV data sample is in main/java/resources/orders.csv
  • No orders are created from this sample
  • Data values that do not match the provided pattern (if one is given) are skipped
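
The skipping rule above can be sketched as a simple filter. This is an illustration under stated assumptions, not the project's implementation: the class SkipNonMatching and the order-number pattern are hypothetical, a full match (rather than a partial find) is assumed, and a missing pattern is modelled as null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SkipNonMatching {
    /** Keeps only values that fully match the pattern; a null pattern keeps everything. */
    public static List<String> keepMatching(List<String> values, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            if (pattern == null || pattern.matcher(value).matches()) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Pattern digitsOnly = Pattern.compile("\\d+"); // hypothetical field pattern
        // "abc" does not match and is skipped.
        System.out.println(keepMatching(List.of("1000", "abc", "1002"), digitsOnly));
    }
}
```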

Assumptions

  • CSV data has a header row with field names
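
To illustrate why the header row matters, here is a rough sketch that uses the first line as field names and maps every later line to name/value pairs. It uses a naive comma split purely for brevity, so quoted fields are not handled; the project itself parses CSV with an ANTLR4 grammar. The class HeaderRows is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderRows {
    /** Treats the first line as field names and maps each later line to name -> value. */
    public static List<Map<String, String>> read(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // Limit -1 keeps trailing empty cells instead of dropping them.
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            // Cells beyond the header width (or missing cells) are ignored.
            for (int i = 0; i < header.length && i < cells.length; i++) {
                row.put(header[i].trim(), cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical data, not taken from orders.csv.
        System.out.println(read(List.of("id,name", "1,apple", "2,pear")));
    }
}
```

Without a header row, the field names referenced by the transformations would have nothing to bind to, which is presumably why the assumption is stated.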

Tests

  • App.java runs and saves orders for the given data and DSL samples

Dependencies

  • MariaDB as the JDBC driver
  • ANTLR4 for DSL and CSV processing and parsing support
  • JUnit for unit tests

Javadoc

  • Javadoc is in the 'doc' folder and contains additional usage, assumptions and implementation notes

Future Work

  • Refactor to abstract the order type (details are given in the source code, marked with // TODO comments), making it easy to reuse the library for other types of data
  • Create a utility to support database schema creation from the DSL, or schema import from the database (most likely by bringing in a dependency such as QueryDSL and extending it to fit the purpose)
  • Extend the DSL grammar/lexer to allow setting up formatters for the representation of the target object in stdout
  • Extend the DSL grammar/lexer with a filter regex construct that lets a user parse only the objects that match the filter
  • Extend to support other database systems; for example, it would be quite interesting to support Redis
  • Extend to support a MapReduce processor to allow parallel processing of big data
  • Extend to support Reactive Streams and publishers/subscribers
  • Add a proper JUnit test suite with a rich set of examples
  • Provide proper technical documentation, such as class and interaction diagrams
