julianpeeters / avrotuples

This project forked from massie/avrotuples

Avro Scala helper classes: AvroTuple[1-22]

License: Apache License 2.0

Avro Tuples

The Scala standard library provides Tuple1 through Tuple22, which let programmers hold a fixed number of items together so they can be passed as a single object. While all the elements in an Array have the same type, a TupleN can mix element types, e.g.

scala> val mytuple = ((2, "Be"), "Or", "Not", (2, "Be"))
mytuple: ((Int, String), String, String, (Int, String)) = ((2,Be),Or,Not,(2,Be))

scala> mytuple._1
res1: (Int, String) = (2,Be)

In this example, mytuple is a Tuple4 whose elements are two nested Tuple2[Int, String] values and two Strings.
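
To make the Array/tuple contrast concrete, here is a minimal sketch that uses only the Scala standard library (no Avro tuple classes). The value names are illustrative.

```scala
// An Array has a single element type.
val words: Array[String] = Array("Or", "Not")

// Mixing types in an Array widens the element type to Any,
// so the static type of each element is lost.
val mixed: Array[Any] = Array(2, "Be")

// A tuple keeps the precise type of every position.
val pair: (Int, String) = (2, "Be")
val n: Int = pair._1       // statically known to be an Int
val s: String = pair._2    // statically known to be a String

// Tuples nest, and productArity reports the number of elements.
val mytuple = ((2, "Be"), "Or", "Not", (2, "Be"))
assert(mytuple.productArity == 4)
assert(mytuple._1 == pair)
```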

The same code using Avro tuples looks like this:

scala> val mytuple = AvroTuple4(AvroTuple2(2, "Be"), "Or", "Not", AvroTuple2(2, "Be"))
mytuple: com.github.massie.avrotuples.AvroTuple4[com.github.massie.avrotuples.AvroTuple2[Int,String],String,String,com.github.massie.avrotuples.AvroTuple2[Int,String]] = ((2,Be),Or,Not,(2,Be))

scala> mytuple._1
res0: com.github.massie.avrotuples.AvroTuple2[Int,String] = (2,Be)

Using Avro Tuples with your project

Avro tuples is published to Maven Central.

In Maven, use

<dependency>
  <groupId>com.github.massie</groupId>
  <artifactId>avrotuples_**SCALA_VERSION**</artifactId>
  <version>**AVROTUPLES_VERSION**</version>
</dependency>

In sbt, add the line

libraryDependencies += "com.github.massie" %% "avrotuples" % "**AVROTUPLES_VERSION**"

Note that for sbt you don't need to specify the Scala version, since the %% operator in the line above automatically appends the correct Scala binary version to the artifact name.
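
As a sketch of what %% does, the build.sbt fragment below spells out the equivalent single-% form. The scalaVersion shown is only an assumed example; which Scala versions the artifact was actually published for is not stated here.

```scala
// build.sbt (sketch). The Scala version below is an assumed example.
scalaVersion := "2.11.12"

// With %%, sbt appends the Scala binary version ("_2.11" for this build)
// to the artifact name.
libraryDependencies += "com.github.massie" %% "avrotuples" % "**AVROTUPLES_VERSION**"

// For this build, the line above resolves the same artifact as:
// libraryDependencies += "com.github.massie" % "avrotuples_2.11" % "**AVROTUPLES_VERSION**"
```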

Avro Tuples are like Scala Tuples

  • Avro tuples can serve as a drop-in replacement for Scala tuples
  • AvroTuple2 has a swap method just like Tuple2
  • All Avro tuples extend ProductN, e.g. AvroTuple1[T1] extends Product1[T1]
  • Avro tuples implement Externalizable, making them Java-serializable
  • Avro tuples can be nested
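
For comparison, the sketch below shows the standard-library behavior that the list above refers to: swap, the ProductN view, and a Java serialization round trip. It uses only scala.Tuple2 and java.io. The helper javaRoundTrip is a hypothetical name introduced for illustration, and nothing here exercises the Avro tuple classes themselves.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

val pair: (String, Long) = ("One", 1L)

// Tuple2 has a swap method.
assert(pair.swap == (1L, "One"))

// Every TupleN is also a ProductN.
val asProduct: Product2[String, Long] = pair
assert(asProduct.productArity == 2)
assert(asProduct.productElement(0) == "One")

// Hypothetical helper: writes a value with Java serialization and reads it back.
def javaRoundTrip[T <: java.io.Serializable](value: T): T = {
  val buffer = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buffer)
  out.writeObject(value)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
  try in.readObject().asInstanceOf[T] finally in.close()
}

// Scala tuples are Java-serializable, so the round trip preserves equality.
assert(javaRoundTrip(pair) == pair)
```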

Avro Tuples have additional functionality over Scala tuples

Avro tuples implement SpecificRecord

This interface allows Avro to (de)serialize Avro tuples. An Avro serialize/deserialize round trip looks like this:

val tuple = AvroTuple2("This", AvroTuple4("That", "and", "the", "other"))
val outTuple = AvroTuple2.fromBytes(tuple.toBytes)
assert(tuple == outTuple)

Avro tuples implement KryoSerializable

If you pass Avro tuples to Kryo, they will be (de)serialized in Avro format using the Avro tuple schema.

Avro tuples are mutable

You can update the values of an Avro tuple in place, without creating a new tuple, e.g.

val tuple = AvroTuple2("One", 1L)
assert(tuple._1 == "One")
assert(tuple._2 == 1L)
tuple.update("Two", 2L)
assert(tuple._1 == "Two")
assert(tuple._2 == 2L)
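
To illustrate what in-place update means compared with an immutable Scala tuple, here is a minimal sketch. MutablePair is a hypothetical stand-in written for this example; it is not the library's AvroTuple2 and says nothing about how that class is implemented.

```scala
// Hypothetical stand-in for illustration only, NOT the library's AvroTuple2.
final class MutablePair[A, B](private var first: A, private var second: B)
    extends Product2[A, B] {
  def _1: A = first
  def _2: B = second

  // Replaces both values in place instead of allocating a new pair.
  def update(a: A, b: B): Unit = { first = a; second = b }

  def canEqual(that: Any): Boolean = that.isInstanceOf[MutablePair[_, _]]
  override def equals(that: Any): Boolean = that match {
    case p: MutablePair[_, _] => p._1 == first && p._2 == second
    case _                    => false
  }
  override def hashCode: Int = (first, second).hashCode
  override def toString: String = s"($first,$second)"
}

val tuple = new MutablePair("One", 1L)
val alias = tuple            // a second reference to the same object
tuple.update("Two", 2L)

// Every reference observes the change, unlike with an immutable Tuple2.
assert(alias._1 == "Two")
assert(alias._2 == 2L)
```

One consequence of this design, at least for a class like the sketch above where hashCode depends on the contents: updating an object after it has been put in a hash-based collection can make it unfindable there.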

Avro tuples have limitations (for now)

No syntactic sugar

Scala provides syntactic sugar that Avro tuples do not. In Scala, you don't need to write Tuple2("a", "b"); you can just use ("a", "b"). Avro tuple code is more verbose because the constructor must always be written out, e.g. AvroTuple2("a", "b").
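
For reference, this is the standard-library sugar in question, sketched with plain Scala tuples only:

```scala
// Three spellings of the same standard Tuple2 value.
val sugared  = ("a", "b")         // literal syntax
val explicit = Tuple2("a", "b")   // explicit constructor
val arrow    = "a" -> "b"         // arrow syntax, common for map entries

assert(sugared == explicit)
assert(sugared == arrow)

// Destructuring is also part of the sugar for standard tuples.
val (left, right) = sugared
assert(left == "a" && right == "b")
```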

Limited number of types

For now, Avro tuples can contain null values, strings, booleans, floats, doubles, ints, longs, and records (case classes that extend SpecificRecordBase). Support for more types, e.g. Option, is planned.

Records are hard to use

To use record datatypes in an Avro tuple, their schemas must be loaded before any Avro tuple is used:

val userRecordSchemas = List(AvroRecordTestClass.SCHEMA$)
AvroTupleSchemas.addRecordSchemas(userRecordSchemas)

Deserialized records also require an extra cast before their field values can be accessed:

AvroTuple1.fromBytes(tuple.toBytes)._1.asInstanceOf[AvroRecordTestClass].x
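
The cast is needed whenever the static type of an element has been lost. The sketch below reproduces that situation with plain Scala and shows a pattern match as an alternative that fails without throwing. UserRecord is a hypothetical case class standing in for a record type such as AvroRecordTestClass, and element simulates a deserialized value; neither comes from the library.

```scala
// Hypothetical record type standing in for a user-defined Avro record.
final case class UserRecord(x: Int)

// Simulates a deserialized element whose static type is no longer known.
val element: Any = UserRecord(42)

// The cast style used in the text: throws ClassCastException on a type mismatch.
val viaCast: Int = element.asInstanceOf[UserRecord].x

// A pattern match reaches the same field but reports a mismatch as None.
val viaMatch: Option[Int] = element match {
  case r: UserRecord => Some(r.x)
  case _             => None
}

assert(viaCast == 42)
assert(viaMatch == Some(42))
```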

Recursive schemas break Parquet

There is a known issue with Avro/Parquet and recursive schemas. Avro tuples use a recursive schema in order to support nesting.

License

Avro tuples is released under the Apache 2.0 license.

Pull requests are welcome.
