
data-tools / big-data-types


A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the rest of the types, and vice versa; e.g. a Spark schema can be transformed into a BigQuery table.

Home Page: https://data-tools.github.io/big-data-types/

License: Apache License 2.0

Scala 95.01% JavaScript 4.15% CSS 0.84%
bigquery database-types schemas spark bigquery-tables typeclass typeclass-derivation typesafe cassandra scala

big-data-types's Introduction

Big Data Types


A type-safe library to transform case classes into database schemas and to convert implemented types into other types

Documentation

Check the Documentation website to learn more about how to use this library

Available conversions (per the description above, any implemented type converts into the rest, and vice versa):

| From / To    | Scala Types | BigQuery | Spark | Cassandra | Circe (JSON) |
|--------------|:-----------:|:--------:|:-----:|:---------:|:------------:|
| Scala        |      -      |    ✓     |   ✓   |     ✓     |      ✓       |
| BigQuery     |      ✓      |    -     |   ✓   |     ✓     |      ✓       |
| Spark        |      ✓      |    ✓     |   -   |     ✓     |      ✓       |
| Cassandra    |      ✓      |    ✓     |   ✓   |     -     |      ✓       |
| Circe (JSON) |      ✓      |    ✓     |   ✓   |     ✓     |      -       |

Versions for Scala 2.12, 2.13, and 3.x are available in Maven

Quick Start

The library has different modules that can be imported separately

  • BigQuery
libraryDependencies += "io.github.data-tools" %% "big-data-types-bigquery" % "{version}"
  • Spark
libraryDependencies += "io.github.data-tools" %% "big-data-types-spark" % "{version}"
  • Cassandra
libraryDependencies += "io.github.data-tools" %% "big-data-types-cassandra" % "{version}"
  • Circe (JSON)
libraryDependencies += "io.github.data-tools" %% "big-data-types-circe" % "{version}"
  • Core
    • Provides support for the abstract SqlTypes. It is included in the other modules, so you do not need it if you are already using one of them
libraryDependencies += "io.github.data-tools" %% "big-data-types-core" % "{version}"

In order to transform one type into another, both modules have to be imported.
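For instance, converting a Spark schema into a BigQuery one needs both of those modules on the classpath. A minimal build.sbt fragment sketching this (the version is a placeholder, as above):

```scala
// build.sbt — both modules are needed for Spark <-> BigQuery conversions
libraryDependencies ++= Seq(
  "io.github.data-tools" %% "big-data-types-spark"    % "{version}",
  "io.github.data-tools" %% "big-data-types-bigquery" % "{version}"
)
```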

How it works

The library internally uses a generic ADT (SqlType) that can store any schema representation, and from there it can be converted into any other. Transformations are done through two different type classes.
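The pattern can be sketched with a toy version. Only the name SqlType comes from the library; every other name below is an illustrative stand-in, not the library's real API:

```scala
// Toy sketch of the pattern: a generic ADT plus a type class per direction.
sealed trait SqlType
case object SqlString extends SqlType
case object SqlLong extends SqlType
final case class SqlStruct(fields: List[(String, SqlType)]) extends SqlType

// Type class: how to derive the generic SqlType representation of an A.
trait SqlTypeConversion[A] { def getType: SqlType }

object SchemaSketch {
  final case class User(name: String, age: Long)

  // In the real library this instance is derived automatically for product types.
  implicit val userConversion: SqlTypeConversion[User] = new SqlTypeConversion[User] {
    def getType: SqlType = SqlStruct(List("name" -> SqlString, "age" -> SqlLong))
  }

  // A trivial "target system": render the generic representation as DDL-ish text.
  // Any new target only needs a renderer over SqlType, and it instantly works
  // for every type that has a SqlTypeConversion instance.
  def render(t: SqlType): String = t match {
    case SqlString         => "STRING"
    case SqlLong           => "INT64"
    case SqlStruct(fields) => fields.map { case (n, f) => s"$n ${render(f)}" }.mkString(", ")
  }

  def main(args: Array[String]): Unit =
    println(render(implicitly[SqlTypeConversion[User]].getType)) // name STRING, age INT64
}
```

Because every schema passes through the one intermediate ADT, adding an (N+1)th system means writing two conversions (to and from SqlType) rather than 2·N pairwise ones.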

Quick examples

Case Classes to other types

//Spark
val s: StructType = SparkSchemas.schema[MyCaseClass]
//BigQuery
val bq: List[Field] = SqlTypeToBigQuery[MyCaseClass].bigQueryFields // just the schema
BigQueryTable.createTable[MyCaseClass]("myDataset", "myTable") // Create a table in a BigQuery real environment
//Cassandra
val c: CreateTable = CassandraTables.table[MyCaseClass]

There are also extension methods that make transformations between types easier when working with instances:

//from Case Class instance
val foo: MyCaseClass = ???
foo.asBigQuery // List[Field]
foo.asSparkSchema // StructType
foo.asCassandra("TableName", "primaryKey") // CreateTable

Conversion between types works in the same way

// From Spark to others
val foo: StructType = myDataFrame.schema
foo.asBigQuery // List[Field]
foo.asCassandra("TableName", "primaryKey") // CreateTable

//From BigQuery to others
val foo: Schema = ???
foo.asSparkFields // List[StructField]
foo.asSparkSchema // StructType
foo.asCassandra("TableName", "primaryKey") // CreateTable

//From Cassandra to others
val foo: CreateTable = ???
foo.asSparkFields // List[StructField]
foo.asSparkSchema // StructType
foo.asBigQuery // List[Field]
foo.asBigQuery.schema // Schema

big-data-types's People

Contributors

dependabot[bot] · javiermonton · scala-steward · xavierrdrgz


big-data-types's Issues

Rename extension methods to a better name

For each type, there are two type classes; both have the same method name, but one takes no arguments and the other expects an instance of A.

For example, BigQuery has bigQueryFields and bigQueryFields(value: A); both return a BigQuery type from any other type. This makes sense given how they are used: SqlTypeToBigQuery[Something].bigQueryFields or SqlInstanceToBigQuery[SomeType].bigQueryFields(instanceOfSomeType).

But there are also extension methods that allow extracting a BigQuery schema directly from an instance, like myInstance.bigQueryFields, and that is quite confusing. A better name would be myInstance.asBigQuery or, in the case of Spark, myInstance.asSpark.

This would be a breaking change, so it would require a major version bump.

As example code:
BigQuery code

We should make the same change for Spark and ensure that all tests still pass.
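The rename proposed above could be wired up roughly like this (a sketch with hypothetical stand-ins for the library's type class and for BigQuery's Field type, not the library's actual code):

```scala
// Sketch of the proposed extension-method rename. ToBigQuery and Field here are
// illustrative stand-ins for the library's SqlInstanceToBigQuery and BigQuery's Field.
final case class Field(name: String, fieldType: String)

trait ToBigQuery[A] { def bigQueryFields(value: A): List[Field] }

object BigQuerySyntax {
  // Extension method exposing the clearer `asBigQuery` name instead of `bigQueryFields`
  implicit class AsBigQueryOps[A](private val value: A) extends AnyVal {
    def asBigQuery(implicit tc: ToBigQuery[A]): List[Field] = tc.bigQueryFields(value)
  }
}

object RenameExample {
  final case class User(name: String, age: Long)

  implicit val userToBigQuery: ToBigQuery[User] = new ToBigQuery[User] {
    def bigQueryFields(u: User): List[Field] =
      List(Field("name", "STRING"), Field("age", "INT64"))
  }

  import BigQuerySyntax._
  // Reads as a conversion, not as a field accessor:
  def fields: List[Field] = User("ada", 36L).asBigQuery
}
```

Keeping the type-class method name (bigQueryFields) while renaming only the extension method keeps the change local to the syntax layer, though callers of the old extension name would still break, hence the major version bump.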
