GithubHelp home page GithubHelp logo

dataops's Introduction

DataOps


The goal of this project is to provide an extensible Java/Groovy framework for reading and/or writing data, so that disparate data sources can be linked.

It uses convention over configuration (configuration being the fallback).

This library currently provides the ability to:

  • read Excel2013, CSV and FixedWidth files
  • write to any JDBC database connection (comes with a H2 database OOTB)

Distribution

There are no public JARs available yet, simply because I haven't got anywhere to host them. If there's interest, I'll gladly publish the library (to search.maven.org?).


Example Usage: Reading a CSV file into an inmemory database connection

test.csv

Name,Age
Nick,30
Bob,27
Jack,55

example.groovy

import org.dataops.writers.JDBCWriter

def minAge = 29
new JDBCWriter()
        .read('/test.csv')
        .eachRow("select * from data where age > $minAge"){
    println "$it.name is $it.age years old."
}

console output

Nick is 30 years old.
Jack is 55 years old.

console err

INFO: create table data (
    name varchar(2048) ,
    age decimal
)

Example Usage: Reading a CSV file into a custom database connection/schema

example.groovy (snippet)

new JDBCWriter(Sql.newInstance('jdbc:h2:mem:test'))
        .read('/test.txt', [schemaName: 'csv'])
        .eachRow("select * from csv.data where age > $minAge"){
        // do something
}

console err

INFO: create schema if not exists csv
INFO: create table csv.data (
    name varchar(2048) ,
    age decimal
)

Example Usage: Reading files from different locations

example.groovy (snippet)

def db = new JDBCWriter()
db.read('/test.csv')                // read from system
db.read('file:/test.csv')           // read from system (using url syntax)
db.read('classpath:///test.csv')    // read from classpath (using url syntax)
db.read('http://foo/bar/test.csv')  // read from http
// or you can just use a URL
db.read(new URL('http://foo/bar/test.csv'))

Example Usage: Using the DataOps.bat file (GroovySH)

Run bin\DataOps.bat

db = new JDBCWriter()
db.read '/test.csv'
db.rows 'select * from data'

Example Usage: Different JDBC Connection strings.

DataOps comes with H2, MySQL and PostgreSQL drivers OOTB. Below are some examples on how to connect to them:

new JDBCWriter(Sql.newInstance('jdbc:h2:mem:test'))
new JDBCWriter(Sql.newInstance('jdbc:h2:file:/test'))
new JDBCWriter(Sql.newInstance('jdbc:postgresql://localhost:5432/test?user=postgres&password=password'))
new JDBCWriter(Sql.newInstance([
       url: 'jdbc:h2:mem:foobar',
       user: 'sa',
       password: 'sa',
       driver: 'org.h2.Driver']))

TODO

  • DONE - Build executable jar, integrated with Groovy Shell.
  • Ability to configure tableName mappings in options
  • Ability to configure creation of schemas/tables
  • Document "How to write your own Reader/Writer" (extend AbsDataReader/AbsDataWriter and register it.)
  • Test reading files from URLs (e.g. http)
  • Extend reader formats (Read html tables from web sites? Xml? Xls<2013? Other JDBC databases? etc???)
  • Introduce batch mode (pipe data into a table from command line)

References

We make use of:


Bitdeli Badge

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.