
avro's Introduction

A fast Go avro codec

Overview

Install with:

go get github.com/hamba/avro/v2

Note: This project has renamed the default branch from master to main. You will need to update your local environment.

Usage

type SimpleRecord struct {
	A int64  `avro:"a"`
	B string `avro:"b"`
}

schema, err := avro.Parse(`{
    "type": "record",
    "name": "simple",
    "namespace": "org.hamba.avro",
    "fields" : [
        {"name": "a", "type": "long"},
        {"name": "b", "type": "string"}
    ]
}`)
if err != nil {
	log.Fatal(err)
}

in := SimpleRecord{A: 27, B: "foo"}

data, err := avro.Marshal(schema, in)
if err != nil {
	log.Fatal(err)
}

fmt.Println(data)
// Outputs: [54 6 102 111 111]

out := SimpleRecord{}
err = avro.Unmarshal(schema, data, &out)
if err != nil {
	log.Fatal(err)
}

fmt.Println(out)
// Outputs: {27 foo}

More examples in the godoc.

Type Conversions

Avro Go Struct Go Interface
null nil nil
boolean bool bool
bytes []byte []byte
float float32 float32
double float64 float64
long int64, uint32* int64, uint32
int int, int32, int16, int8, uint8*, uint16* int, uint8, uint16
fixed uint64 uint64
string string string
array []T []any
enum string string
fixed [n]byte [n]byte
map map[string]T map[string]any
record struct map[string]any
union see below see below
int.date time.Time time.Time
int.time-millis time.Duration time.Duration
long.time-micros time.Duration time.Duration
long.timestamp-millis time.Time time.Time
long.timestamp-micros time.Time time.Time
long.local-timestamp-millis time.Time time.Time
long.local-timestamp-micros time.Time time.Time
bytes.decimal *big.Rat *big.Rat
fixed.decimal *big.Rat *big.Rat
string.uuid string string

* Please note that when the Go type is an unsigned integer, care must be taken to ensure that information is not lost when converting between the Avro type and the Go type. For example, an Avro value of int = -100 would be decoded as uint16 = 65,436 in Go. Similarly, an Avro value of int = 256 does not fit in a Go uint8 and wraps around to 0.
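The wraparound in both examples can be reproduced with the standard library alone (a sketch independent of this codec; the variable names are illustrative):

```go
package main

import "fmt"

func main() {
	// An Avro int holding -100, decoded into an unsigned Go type,
	// wraps around instead of preserving the sign.
	var decoded int32 = -100
	fmt.Println(uint16(decoded)) // 65436

	// An Avro int holding 256 does not fit in uint8 and wraps to 0.
	var large int32 = 256
	fmt.Println(uint8(large)) // 0
}
```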

Unions

The following union types are accepted: map[string]any, *T and any.

  • map[string]any: If the union value is nil, a nil map will be en/decoded. When a non-nil union value is encountered, a single key is en/decoded. The key is the Avro type name, or the schema's full name in the case of a named schema (enum, fixed or record).
  • *T: This is allowed in a "nullable" union. A nullable union is defined as a two-schema union with one of the types being null (i.e. ["null", "string"] or ["string", "null"]); in this case a *T is allowed, with T matching the conversion table above. In the case of a slice, the slice can be used directly.
  • any: An interface can be provided and the type or name resolved. Primitive types are pre-registered, but named types, maps and slices will need to be registered with the Register function. In the case of arrays and maps, the enclosed schema type or name is appended to the type with a : separator, e.g. "map:string". Behavior when a type cannot be resolved depends on your chosen configuration options:
    • !Config.UnionResolutionError && !Config.PartialUnionTypeResolution: the map type above is used
    • Config.UnionResolutionError && !Config.PartialUnionTypeResolution: an error is returned
    • !Config.UnionResolutionError && Config.PartialUnionTypeResolution: any registered type will get resolved while any unregistered type will fallback to the map type above.
    • Config.UnionResolutionError && Config.PartialUnionTypeResolution: any registered type will get resolved while any unregistered type will return an error.

TextMarshaler and TextUnmarshaler

The interfaces TextMarshaler and TextUnmarshaler are supported for the string schema type. For a string schema, the object is first tested for these interfaces before regular encoding and decoding is attempted.

Enums may also implement TextMarshaler and TextUnmarshaler, and must resolve to valid symbols in the given enum schema.
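As a sketch of what such an implementation might look like (the Suit type and its symbol names are hypothetical; the interfaces come from the standard encoding package, so this compiles without the avro library):

```go
package main

import "fmt"

// Suit is a hypothetical enum-like type. Implementing
// encoding.TextMarshaler / encoding.TextUnmarshaler lets a string
// (or enum) schema encode it via its text form.
type Suit int

const (
	Spades Suit = iota
	Hearts
)

var suitNames = map[Suit]string{Spades: "SPADES", Hearts: "HEARTS"}

func (s Suit) MarshalText() ([]byte, error) {
	name, ok := suitNames[s]
	if !ok {
		return nil, fmt.Errorf("unknown suit %d", int(s))
	}
	return []byte(name), nil
}

func (s *Suit) UnmarshalText(text []byte) error {
	for suit, name := range suitNames {
		if name == string(text) {
			*s = suit
			return nil
		}
	}
	return fmt.Errorf("unknown suit %q", text)
}

func main() {
	b, _ := Spades.MarshalText()
	fmt.Println(string(b)) // SPADES

	var s Suit
	_ = s.UnmarshalText([]byte("HEARTS"))
	fmt.Println(s == Hearts) // true
}
```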

Identical Underlying Types

One type can be converted to another if they have identical underlying types. A non-native type may therefore be used for the logical types above if it is convertible to time.Time, big.Rat or avro.LogicalDuration.

Ex.: type Timestamp time.Time

Untrusted Input With Bytes and Strings

For security reasons, the configuration Config.MaxByteSliceSize restricts the maximum size of bytes and string types created by the Reader. The default maximum size is 1MiB and is configurable. This is required to stop untrusted input from consuming all memory and crashing the application. Should this restriction not be needed, setting a negative number disables the behaviour.
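A minimal sketch of raising the limit, assuming the Config/Freeze pattern this library uses (the schema and data variables are as in the usage example above; the 10MiB value is illustrative):

```go
// Allow bytes and string values up to 10MiB; a negative value would
// disable the limit entirely.
cfg := avro.Config{MaxByteSliceSize: 10 * 1024 * 1024}.Freeze()

out := SimpleRecord{}
err := cfg.Unmarshal(schema, data, &out)
```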

Recursive Structs

Recursive structs are not currently supported. Support is planned for the future.

Benchmark

Benchmark source code can be found at: https://github.com/nrwiersma/avro-benchmarks

BenchmarkGoAvroDecode-8      	  788455	      1505 ns/op	     418 B/op	      27 allocs/op
BenchmarkGoAvroEncode-8      	  624343	      1908 ns/op	     806 B/op	      63 allocs/op
BenchmarkGoGenAvroDecode-8   	 1360375	       876.4 ns/op	     320 B/op	      11 allocs/op
BenchmarkGoGenAvroEncode-8   	 2801583	       425.9 ns/op	     240 B/op	       3 allocs/op
BenchmarkHambaDecode-8       	 5046832	       238.7 ns/op	      47 B/op	       0 allocs/op
BenchmarkHambaEncode-8       	 6017635	       196.2 ns/op	     112 B/op	       1 allocs/op
BenchmarkLinkedinDecode-8    	 1000000	      1003 ns/op	    1688 B/op	      35 allocs/op
BenchmarkLinkedinEncode-8    	 3170553	       381.5 ns/op	     248 B/op	       5 allocs/op

Always benchmark with your own workload. The result depends heavily on the data input.

Go structs generation

Go structs can be generated for you from the schema. The generated types follow the same logic as the type conversions above.

Install the struct generator with:

go install github.com/hamba/avro/v2/cmd/avrogen@<version>

Example usage assuming there's a valid schema in in.avsc:

avrogen -pkg avro -o bla.go -tags json:snake,yaml:upper-camel in.avsc

Tip: Omit -o FILE to dump the generated Go structs to stdout instead of a file.

Check the options and usage with -h:

avrogen -h

Alternatively, use it as a library in internal commands via the gen package.

Avro schema validation

A small Avro schema validation command-line utility is also available. This simple tool leverages the schema parsing functionality of the library, showing validation errors or optionally dumping parsed schemas to the console. It can be used in CI/CD pipelines to validate schema changes in a repository.

Install the Avro schema validator with:

go install github.com/hamba/avro/v2/cmd/avrosv@<version>

Example usage assuming there's a valid schema in in.avsc (exit status code is 0):

avrosv in.avsc

An invalid schema will result in a diagnostic output and a non-zero exit status code:

avrosv bad-default-schema.avsc; echo $?
Error: avro: invalid default for field someString. <nil> not a string
2

Schemas referencing other schemas can also be validated by providing all of them (schemas are parsed in order):

avrosv base-schema.avsc schema-withref.avsc

Check the options and usage with -h:

avrosv -h

Go Version Support

This library supports the last two versions of Go. While the minimum Go version is not guaranteed to increase alongside Go, it may jump from time to time to support additional features. This will not be considered a breaking change.


avro's Issues

Consider extensions point to hook into ser/de

Description of the problem

It could be practical to let library users hook into the serialization process. For example, with GDPR regulation, field encryption could be an interesting add-on:

{"type": "string", "tags": ["pii"]}

With a schema such as the above, one could implement a function that encrypts the string before it's serialized by Avro.

Do you think this could be a good addition, or is there already an existing way to achieve it?

Embedded structs not handled

Hello hamba and all the committers,

First thanks a lot for this lib, it's really great and simple to use!

It seems that support for embedded structs is missing. It would be great to have it, in order to get the same behaviour as standard libraries like the json Unmarshaler.

This is an example with some JSON (https://play.golang.org/p/9D_B3lXtJ0D):

package main

import (
	"encoding/json"
	"fmt"
)

type commonListResponse struct {
	PageSize int `json:"page_size"`
}

type Account struct {
	Name string `json:"name"`
}

type AccountList struct {
	commonListResponse
}

const jsonBlob = `{ "page_size": 50 }`

func main() {
	alb := AccountList{}
	json.Unmarshal([]byte(jsonBlob), &alb)
	fmt.Printf("With Embedded field: %#v\n", alb)
}

Nested logical types not working

This may be similar to #42, as it's another issue with nested types. But this time parsing succeeds, while trying to marshal fails.

package main

import (
	"fmt"
	"time"

	"github.com/hamba/avro"
)

type MyType struct {
	Timestamp time.Time
}

var schema = `
{"name":"MyType", "type":"record", "fields": [
	{"name":"Timestamp", "type":"long", "logicalType": "timestamp-micros"}
]}
`

func main() {
	s := avro.MustParse(schema)
	_, err := avro.Marshal(s, MyType{})
	fmt.Printf("Error: %s\n", err)
}
Error: Timestamp: avro: time.Time is unsupported for Avro long

question of thread safety ?

Hi,

I've come across this repo; I'm not very experienced with Avro. I liked the simple approach this project takes to parsing compared to other alternatives. I want to use it in production, with millions of requests per minute for encoding, so I want to make sure I've got the details right.

My question is: is the parsed schema thread-safe? Can I parse it once at server start, per struct type, and use it concurrently?

avro: unknown type: long.timestamp-micros

Sorry for another issue. I am using an encoder to encode a map[string]interface{}, but the schema seems unable to parse long.timestamp-micros. I have included my schema below, along with the map I am trying to encode when this fails. I can also confirm that TimestampTest is a time.Time object, as mentioned in your mappings.
Schema:

{
  type: "record",
  name: "TestTable",
  namespace: "TestTable.avsc",
  fields: [
    {
      name: "ExampleNest_DoubleNest_DOUBLENESTVAL",
      type: [
        "boolean",
        "null"
      ]
    },
    {
      name: "ExampleNest_NestedKey",
      type: [
        "int",
        "null"
      ]
    },
    {
      name: "TestData",
      type: [
        "string",
        "null"
      ]
    },
    {
      name: "NewValue",
      type: [
        "string",
        "null"
      ]
    },
    {
      name: "TimestampTest",
      type: [
        "long.timestamp-micros",
        "null"
      ]
    }
  ]
}

Record:

map[ExampleNest_DoubleNest_DOUBLENESTVAL:true, ExampleNest_NestedKey:<nil>, NewValue:<nil>, TestData:A, TimestampTest:2021-01-02 15:04:05 +0000 UTC]

ocf: append serialized avro

Hi!

Our use case is as follows: we are streaming Avro messages from multiple Kafka topics produced on Confluent Cloud and want them imported into BigQuery for analytics. From what I gather from the BigQuery docs, the Avro files must be in OCF format (schema included).

We already use registry from this project to look up the schema for the messages, and it would be nice if we could use ocf to produce the container files. We would like to avoid having to convert the raw Avro bytes to interface{} before we append them to the OCF, though. Could this be a possible addition to the API surface of ocf?

Example for slices

I'm having a difficult time understanding how to unmarshal into a slice of a typed struct. The readme mentions that you have to register the types, but I don't see where that is happening in the tests. https://github.com/hamba/avro/blob/master/decoder_array_test.go#L39-L51

When I naively register my type, I get the error avro: decode union type: unknown union type. I've copied the example from the readme below and turned some of it into slices. Can you help point me in the right direction? I think it would be very useful to add to the readme too.

type SimpleRecord struct {
	A int64  `avro:"a"`
	B string `avro:"b"`
}

schema, err := avro.Parse(`{
    "type": "record",
    "name": "simple",
    "namespace": "org.hamba.avro",
    "fields" : [
        {"name": "a", "type": "long"},
        {"name": "b", "type": "string"}
    ]
}`)
if err != nil {
	log.Fatal(err)
}

avro.Register("mytype:", SimpleRecord{})
avro.Register("mytype:", &SimpleRecord{})

in := []SimpleRecord{{A: 27, B: "foo"}}

data, err := avro.Marshal(schema, in)
if err != nil {
	log.Fatal(err)
}

fmt.Println(data)
// Outputs: [54 6 102 111 111]

out := []SimpleRecord{}
err = avro.Unmarshal(schema, data, &out)
if err != nil {
	log.Fatal(err)
}

fmt.Println(out)
// Outputs: {27 foo}

Support for Generic Data

Hi, I am trying to serialize json string.

schema := `{
  "fields": [
    {
      "name": "id",
      "type": "int"
    },
    {
      "default": null,
      "name": "name",
      "type": [
        "null",
        "string"
      ]
    }
  ],
  "name": "example",
  "namespace": "namespace",
  "type": "record"
}`

json_data := "{\"id\": 10, \"name\": \"test\"}"
var result map[string]interface{}
json.Unmarshal([]byte(json_data), &result)

a := avro.DefaultConfig
avroSchema, _ := avro.Parse(schema)
buf, err := a.Marshal(avroSchema, result)
if err != nil {
	fmt.Println(err.Error())
}

I am getting the following error: avro: float64 is unsupported for Avro int. Go automatically determines the types when creating JSON objects, and those types are not compatible. Is there any way to create a JSON object that is compatible with Avro?

Enum in map[string]interface{} does not encode

Hi, thanks for making this great library. My team and I have a use case where we are using Go to encode many kinds of messages for writing to kafka. As a result, we are using the map[string]interface{} encoding method. Maybe I am not configuring something correctly but it seems like nullable enums do not work in this scenario. Please let me know if there is something I need to fix. Here is a reproducible example:

package main

import (
	"fmt"
	"github.com/hamba/avro"
)

func main() {
	schema, err := avro.Parse(`
{
	"type": "record",
	"name": "test", 
	"fields": [
		{"name": "PET", "type": ["null", {"type": "enum", "name": "pet", "symbols": ["dog", "cat"]}]}
	]
}`)
	if err != nil {
		fmt.Print(err)
		return
	}

	inf := map[string]interface{}{
		"PET": "dog",
	}

	_, err = avro.Marshal(schema, inf)
	if err != nil {
		fmt.Printf("could not encode: %v\n", err)
	}
}

This returns the error:

could not encode: avro: unknown union type string

Stepping through a debugger, it looks like it's trying to find an encoder for "string" but "enum" and "pet" are found. I think this was in this function: https://github.com/hamba/avro/blob/master/schema.go#L96

Thanks for the help

Invalid name when name contains periods '.'

When parsing a schema with a name containing a period (e.g. "abcd_1.0_efg") I get an invalid name error. The issue seems to be that the parser splits the name on '.' and invalidates any substring starting with a number (0_efg is invalid in this case).

From here, I understand that periods in names are invalid. But I would like to disable name verification, and the library does not seem to have that option.

Is there any plan to add an option to disable name verification? If not, is there any hack I can use to disable name validation?

`date` encode/decode incorrect for small/large dates

Similar to #128. Unix time cannot be represented in nanoseconds for small/large values.

avro/codec_native.go

Lines 392 to 400 in bf2e271

func (c *dateCodec) Decode(ptr unsafe.Pointer, r *Reader) {
	i := r.ReadInt()
	*((*time.Time)(ptr)) = time.Unix(0, int64(i)*int64(24*time.Hour)).UTC()
}

func (c *dateCodec) Encode(ptr unsafe.Pointer, w *Writer) {
	t := *((*time.Time)(ptr))
	w.WriteInt(int32(t.UnixNano() / int64(24*time.Hour)))
}


How to encode top level union in structs

Hi. I'm trying to encode/decode content with a schema that is a top level union. I have a sample .go file that you can run with go run:

func main() {
	type person struct {
		Name string `avro:"name"`
		Age  int    `avro:"age"`
	}

	type child struct {
		Name string `avro:"name"`
		Age  int    `avro:"age"`
		Toy  string `avro:"toy"`
	}

	type union struct {
		child  *child  `avro:"child"`
		person *person `avro:"person"`
	}

	schemaText := `[{
		"type": "record",
		"name": "Child",
		"namespace": "sample",
		"fields": [
			{ "name": "name", "type": "string" },
			{ "name": "age", "type": "int" },
		  { "name": "toy", "type": "string" }
		]
	},{
		"type": "record",
		"name": "Person",
		"namespace": "sample",
		"fields": [
			{ "name": "name", "type": "string" },
			{ "name": "age", "type": "int" }
		]
	}]`

	schema := avro.MustParse(schemaText)

	p := person{Name: "name", Age: 1}
	value := union{
		person: &p,
	}
	encoded, err := avro.Marshal(schema, value)
	if err != nil {
		log.Fatal(err)
	}

	var decodedPerson person
	err = avro.Unmarshal(schema, encoded, &decodedPerson)
	if err != nil {
		log.Fatal(err)
	}

	log.Printf("decodedPerson = %+v\n", decodedPerson)
}

This must be one of person or child. How would I encode this using Go structs, and possibly map[string]interface{}? I'm getting the error avro: unable to resolve type main.union, which I'm finding difficult to figure out.

Would you consider exposing `field.Doc()`

Description of the problem

We are considering using hamba avro to validate our schema conforms to a given specification (naming, required fields etc, documentation etc..).

Unfortunately the type Field does not expose a method to access the doc attribute, since it's a reserved property.

Question

Would you consider a contribution to add a Doc() function to the Field type?

`Field.MarshalJSON` doesn't correctly encode null default values in unions

Hello, first thank you for writing and supporting this library!

As the title says, there is an issue when json encoding a schema containing unions with a default null value. For example a schema defined like this:

{
    "type":"record",
    "namespace": "org.hamba.avro",
    "name":"X",
    "fields":[{"name":"value", "type": ["null", "int"], "default": null}]
}

When json marshalled, it becomes:

{
    "type":"record",
    "namespace": "org.hamba.avro",
    "name":"X",
    "fields":[{"name":"value", "type": ["null", "int"], "default": {}}]
}

Note the default value being an empty object instead of null data type.

I identified the root cause in how Field.MarshalJSON handle fields with default values and prepared a pull request with a fix and updated tests to check this particular use case. I hope it helps.

Unexpected unknown union type errors

Hi, I'm using debezium connect with schema registry, and connect is using io.confluent.connect.avro.AvroConverter as value.converter. My schema is as below;

{
  "name": "images.Value",
  "type": "record",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "file",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "name": "width",
      "type": [
        "null",
        "int"
      ]
    },
    {
      "name": "height",
      "type": [
        "null",
        "int"
      ]
    }
  ]
}

And I'm trying to convert these messages into my Image struct, but it's failing with the unknown union type message.

type Image struct {
  ID string `avro:"id"`
  File *string `avro:"file"`
  Width *int `avro:"width"`
  Height *int `avro:"height"`
}
schema, _ := avro.Parse( `...my long schema json`)
c.SubscribeTopics([]string{"images"}, nil)

for {
  msg, err := c.ReadMessage(-1)
  if err == nil {
    img := Image{}
    err = avro.Unmarshal(schema, msg.Value, &img)
    if err != nil {
      log.Fatalf("Failed to read image %s: %v\n", msg.Key, err)
    }
    fmt.Printf("Read image %s: %v\n", msg.Key, img)
  }
}

Error is as below:

Failed to read image f66b15f0-a51e-4a00-b80f-1fcf89ea7bc7: Height: Width: File: avro: decode union type: unknown union type
exit status 1

What am I missing? Is there a bug?

`ocf.Decoder` produces incorrect `timestamp-millis`

First, thanks for a great Avro encoder/decoder!

It seems there is a problem decoding timestamp-millis (this would also affect timestamp-micros) correctly when using an ocf.Decoder like:

m := make(map[string]interface{})
dec, _ := ocf.NewDecoder(bytes.NewReader(b))
dec.Decode(&m)

and the timestamp is small/large, due to: "The result is undefined if the Unix time in nanoseconds cannot be represented by an int64 (a date before the year 1678 or after 2262)."


avro/reader_generic.go

Lines 38 to 42 in bf2e271

case TimestampMillis:
	return time.Unix(0, r.ReadLong()*int64(time.Millisecond)).UTC()
case TimestampMicros:
	return time.Unix(0, r.ReadLong()*int64(time.Microsecond)).UTC()

While correct decoding is done at:

avro/codec_native.go

Lines 402 to 423 in bf2e271

type timestampMillisCodec struct{}

func (c *timestampMillisCodec) Decode(ptr unsafe.Pointer, r *Reader) {
	i := r.ReadLong()
	sec := i / 1e3
	nsec := (i - sec*1e3) * 1e6
	*((*time.Time)(ptr)) = time.Unix(sec, nsec).UTC()
}

func (c *timestampMillisCodec) Encode(ptr unsafe.Pointer, w *Writer) {
	t := *((*time.Time)(ptr))
	w.WriteLong(t.Unix()*1e3 + int64(t.Nanosecond()/1e6))
}

type timestampMicrosCodec struct{}

func (c *timestampMicrosCodec) Decode(ptr unsafe.Pointer, r *Reader) {
	i := r.ReadLong()
	sec := i / 1e6
	nsec := (i - sec*1e6) * 1e3
	*((*time.Time)(ptr)) = time.Unix(sec, nsec).UTC()
}

avro: unknown union type int.date

We have a date logicalType in our schema as shown below. When marshalling from map[string]interface{} we are getting the error: avro: unknown union type int.date

Schema

    {
      "name": "someDt",
      "type": [
        "null",
        {
          "type": "int",
          "logicalType": "date"
        }
      ],
      "props": [
        {
          "sourceName": "myBaseDrpt"
        }
      ]
    },
...

Unmarshal works.

var generic map[string]interface{}
avro.Unmarshal(schema, data, &generic)

but when re-marshalling, it fails with an avro: unknown union type int.date error:

if binary, err = avro.Marshal(schema, generic); err != nil {
	s.handleError(ctx, msg, err)
	return
}

Schema fails to parse nested fixed type

package main

import (
	"fmt"

	"github.com/hamba/avro"
)

var schema = `
{"name":"MyType", "type":"record", "fields": [
	{"name":"duration", "type":"fixed", "size": 12}
]}
`

func main() {
	_, err := avro.Parse(schema)
	fmt.Printf("Error: %s\n", err)
}
Error: avro: unknown type: fixed

Note that it parses fine when not nested inside a record.
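Per the Avro specification, an inline complex type must be written as a full schema object rather than a bare type name, so the field would need to look something like this (a sketch of the spec-compliant form; the fixed type's name duration_fixed is hypothetical):

```json
{"name": "MyType", "type": "record", "fields": [
    {"name": "duration", "type": {"type": "fixed", "name": "duration_fixed", "size": 12}}
]}
```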

Custom types

Hi,

I'm reading some JSON data that contains a string with a timestamp that needs a custom UnmarshalJSON to unmarshal properly. I've created a type Timestamp time.Time with an UnmarshalJSON method that works. Next, I need to serialize this data to Avro.

Naively just trying to use avro.Marshal on it unsurprisingly returns an avro: mypkg.Timestamp is unsupported for Avro string error.

Is there an easy way to get Avro to treat my custom Timestamp type as a string for an Avro string field?

I can't seem to find anything similar to how encoding/json works, where I just need to implement my own MarshalJSON function.

I'd like to avoid having to create the same struct again, just with string types for the timestamps, and copy over all the values to get it to serialize to Avro; it's a fairly big struct.

Encoding/Decoding time.Time zero value does not result in same value

Hi,

I am struggling with encoding/decoding of zero value time.Time fields. If I encode a struct with a zero value time and decode it again, the value is no longer zero but instead 1754-08-30 22:43:41.128654848.

Here is a sample code to reproduce:

package main

import (
	"github.com/davecgh/go-spew/spew"
	"github.com/hamba/avro"
	"github.com/rs/zerolog/log"
	"time"
)

type TimeStruct struct {
	Timestamp time.Time `avro:"timestamp"`
}

func main() {
	const SchemaString = `{
    "type": "record",
    "name": "timestruct",
    "namespace": "test",
    "fields": [
		{"name": "timestamp", "type":{"type":"long","logicalType":"timestamp-micros"}}
    ]
}`
	schema, err := avro.Parse(SchemaString)
	if err != nil {
		log.Fatal().Msg(err.Error())
	}

	ts := TimeStruct{Timestamp: time.Time{}}

	bytes, err := avro.Marshal(schema, ts)
	if err != nil {
		log.Fatal().Msg(err.Error())
	}

	var ts2 TimeStruct

	err = avro.Unmarshal(schema, bytes, &ts2)
	if err != nil {
		log.Fatal().Msg(err.Error())
	}

	if ts.Timestamp != ts2.Timestamp {
		log.Fatal().Msgf("ts != ts2: %v != %v",ts,ts2)
	}

}

I found this on UnixNano function from time package:

UnixNano returns t as a Unix time, the number of nanoseconds elapsed since January 1, 1970 UTC. The result is undefined if the Unix time in nanoseconds cannot be represented by an int64 (a date before the year 1678 or after 2262). Note that this means the result of calling UnixNano on the zero Time is undefined. The result does not depend on the location associated with t.

Is this the expected behavior of encoding/decoding zero value or is it a bug?

aliyun dts schema parsed with error "avro: union type must be unique"

Below is the schema which cannot be parsed correctly: https://github.com/LioRoger/subscribe_example/blob/master/avro/Record.avsc?spm=a2c4g.11186623.2.9.605f741acYavoL&file=Record.avsc

It reports the error "avro: union type must be unique"; the problem comes from this section:

{
        "name": "beforeImages",
        "default": null,
        "type": [
          "null",
          "string",
          {
            "type": "array",
            "items": [
              "null",
              "com.alibaba.dts.formats.avro.Integer",
              "com.alibaba.dts.formats.avro.Character",
              "com.alibaba.dts.formats.avro.Decimal",
              "com.alibaba.dts.formats.avro.Float",
              "com.alibaba.dts.formats.avro.Timestamp",
              "com.alibaba.dts.formats.avro.DateTime",
              "com.alibaba.dts.formats.avro.TimestampWithTimeZone",
              "com.alibaba.dts.formats.avro.BinaryGeometry",
              "com.alibaba.dts.formats.avro.TextGeometry",
              "com.alibaba.dts.formats.avro.BinaryObject",
              "com.alibaba.dts.formats.avro.TextObject",
              "com.alibaba.dts.formats.avro.EmptyObject"
            ]
          }
        ]
}

I am not familiar with the Avro specification, so I cannot figure out the root cause, but "github.com/linkedin/goavro/v2" works well with this schema. Is there some part of the specification not implemented yet?

unmarshal reports error: "avro: decode union type: unknown union type"

Trying to unmarshal data with the schema below:

https://github.com/LioRoger/subscribe_example/blob/master/avro/Record.avsc?spm=a2c4g.11186623.2.9.605f741acYavoL&file=Record.avsc

The data we captured is: dts.log

schema, _ := avro.Parse(ALIYUN_DTS_SCHEMA)

data, _ := ioutil.ReadFile("dts.log")

var record interface{}
err := avro.Unmarshal(schema, data, &record)

It reports the error "avro: decode union type: unknown union type". Any hints?

Convert CSV line to avro format

Background

We have a very specific use case where the data is in a CSV file and there is a separate file which stores the Avro schema. We will have to convert the CSV data into OCF format and send it out.

The problem we face is that the CSV data is an array of strings, and because of this we get errors like

error: avro: string is unsupported for Avro long

For example

Schema

{
  "type": "record",
  "name": "simple",
  "namespace": "org.hamba.avro",
  "fields": [
    { "name": "a", "type": "long" },
    { "name": "b", "type": "string"}
  ]
}

Value

map[string]interface{}{
  "a": "27",
  "b": "foo",
}

We get an error like

error: avro: string is unsupported for Avro long

Play ground link - https://play.golang.org/p/P1Vz76K3F_E

Feature request

  1. Can we auto-parse strings to the type required by the Avro schema at run time?
  2. Also, if a value is not present (empty string), the default value must be populated.
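The first request can be sketched with the standard library alone: coerce the CSV string values to the Go types the schema expects before marshalling. The coerce helper and its field-to-type mapping are hypothetical; a real implementation would walk the parsed schema instead:

```go
package main

import (
	"fmt"
	"strconv"
)

// coerce converts CSV string values in row to the Go types named in
// types (a hypothetical field -> Avro type mapping).
func coerce(row map[string]interface{}, types map[string]string) error {
	for field, typ := range types {
		s, ok := row[field].(string)
		if !ok {
			continue
		}
		switch typ {
		case "long":
			v, err := strconv.ParseInt(s, 10, 64)
			if err != nil {
				return fmt.Errorf("field %s: %w", field, err)
			}
			row[field] = v
		case "double":
			v, err := strconv.ParseFloat(s, 64)
			if err != nil {
				return fmt.Errorf("field %s: %w", field, err)
			}
			row[field] = v
		}
	}
	return nil
}

func main() {
	row := map[string]interface{}{"a": "27", "b": "foo"}
	if err := coerce(row, map[string]string{"a": "long", "b": "string"}); err != nil {
		panic(err)
	}
	fmt.Printf("%T %v\n", row["a"], row["a"]) // int64 27
}
```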

MapUnionEncoder does not work correctly

I have a schema:

{
			"namespace":"test.test",
			"type":"record",
			"name":"In",
			"fields":[
				{"name":"m1","type":["null",{"type":"map","values":"string"}]}
			]
		}

but if I use Marshal() with a map:

rawSchema := `
{
			"namespace":"test.test",
			"type":"record",
			"name":"In",
			"fields":[
				{"name":"m1","type":["null",{"type":"map","values":"string"}]}
			]
		}`

	schema := avro.MustParse(rawSchema)

	b, err := avro.Marshal(schema, map[string]interface{}{
		"m1": map[string]interface{}{"map": map[string]string{"ASD": "asd"}},
	})

I got this:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x1 pc=0x40fbe2]

if I edit the schema to:

rawSchema := `
{
			"namespace":"test.test",
			"type":"record",
			"name":"In",
			"fields":[
				{"name":"m1","type":{"type":"map","values":"string"}}
			]
		}`

it works.

I reviewed the code and found that this bug happens in codec_union.go:144; reflect2.Ptr() does not return the correct ptr.

Redefining same type in array

Hello. I'm having the same issue as before with redeclaring a type ("fi.sok.schema.raflaamo.PobWeeklyOpeningTime"). However, this time it's an item in an array. I'm creating the ExceptionalOpeningTimes array using a ref, but in the generated Avro schema it is still redeclared.

type OpeningTimes struct {
	DefaultOpeningTimes     []PobWeeklyOpeningTime
	ExceptionalOpeningTimes []struct {
		Start                   string
		End                     string
		ExceptionalOpeningTimes []PobWeeklyOpeningTime
	}
}

type PobWeeklyOpeningTime struct {
	Day      string `json:"day,omitempty"`
	TimeType string `json:"type,omitempty"`
	Ranges   []struct {
		Start string `json:"start,omitempty"`
		End   string `json:"end,omitempty"`
	} `json:"ranges,omitempty"`
}
func openingTimesField() (*avro.Field, error) {
	pobWeeklyOpeningTime, err := pobWeeklyOpeningTimeSchema()
	if err != nil {
		return nil, err
	}

	pobWeeklyOpeningTimeRef := avro.NewRefSchema(pobWeeklyOpeningTime)
	exceptionalOpeningTimesArr, err := avro.NewField("ExceptionalOpeningTimes", avro.NewArraySchema(pobWeeklyOpeningTimeRef), nil)
	if err != nil {
		return nil, err
	}

	start, err := avro.NewField("Start", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	end, err := avro.NewField("End", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	exceptionalOpeningTimes, err := avro.NewRecordSchema("ExceptionalOpeningTime", "", []*avro.Field{start, end, exceptionalOpeningTimesArr})
	if err != nil {
		return nil, err
	}


	defaultOpeningTimesField, err := avro.NewField("DefaultOpeningTimes", avro.NewArraySchema(pobWeeklyOpeningTime), nil)
	if err != nil {
		return nil, err
	}

	exceptionalOpeningTimesField, err := avro.NewField("ExceptionalOpeningTimes", avro.NewArraySchema(exceptionalOpeningTimes), nil)
	if err != nil {
		return nil, err
	}

	openingTimes, err := avro.NewRecordSchema("OpeningTimes", "", []*avro.Field{defaultOpeningTimesField, exceptionalOpeningTimesField})
	if err != nil {
		return nil, err
	}

	return avro.NewField("OpeningTimes", openingTimes, nil)
}


func pobWeeklyOpeningTimeSchema() (*avro.RecordSchema, error) {
	day, err := avro.NewField("Day", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	timeType, err := avro.NewField("TimeType", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	start, err := avro.NewField("Start", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	end, err := avro.NewField("End", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	rangeSchema, err := avro.NewRecordSchema("Range", "", []*avro.Field{start, end})
	if err != nil {
		return nil, err
	}

	ranges := avro.NewArraySchema(rangeSchema)
	rangeField, err := avro.NewField("Ranges", ranges, nil)
	if err != nil {
		return nil, err
	}

	return avro.NewRecordSchema("PobWeeklyOpeningTime", namespace, []*avro.Field{day, timeType, rangeField})
}

Generated schema (OpeningTimes is a field in a larger schema, so this is only a section of the full schema):

   {
      "name": "OpeningTimes",
      "type": {
        "name": "OpeningTimes",
        "type": "record",
        "fields": [
          {
            "name": "DefaultOpeningTimes",
            "type": {
              "type": "array",
              "items": {
                "name": "fi.sok.schema.raflaamo.PobWeeklyOpeningTime",
                "type": "record",
                "fields": [
                  {
                    "name": "Day",
                    "type": "string"
                  },
                  {
                    "name": "TimeType",
                    "type": "string"
                  },
                  {
                    "name": "Ranges",
                    "type": {
                      "type": "array",
                      "items": {
                        "name": "Range",
                        "type": "record",
                        "fields": [
                          {
                            "name": "Start",
                            "type": "string"
                          },
                          {
                            "name": "End",
                            "type": "string"
                          }
                        ]
                      }
                    }
                  }
                ]
              }
            }
          },
          {
            "name": "ExceptionalOpeningTimes",
            "type": {
              "type": "array",
              "items": {
                "name": "ExceptionalOpeningTime",
                "type": "record",
                "fields": [
                  {
                    "name": "Start",
                    "type": "string"
                  },
                  {
                    "name": "End",
                    "type": "string"
                  },
                  {
                    "name": "ExceptionalOpeningTimes",
                    "type": {
                      "type": "array",
                      "items": {
                        "name": "fi.sok.schema.raflaamo.PobWeeklyOpeningTime",
                        "type": "record",
                        "fields": [
                          {
                            "name": "Day",
                            "type": "string"
                          },
                          {
                            "name": "TimeType",
                            "type": "string"
                          },
                          {
                            "name": "Ranges",
                            "type": {
                              "type": "array",
                              "items": {
                                "name": "Range",
                                "type": "record",
                                "fields": [
                                  {
                                    "name": "Start",
                                    "type": "string"
                                  },
                                  {
                                    "name": "End",
                                    "type": "string"
                                  }
                                ]
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    },

Implicit type coercion features

Hi @nrwiersma,

Does this library support the ability to convert numerical types implicitly? The LinkedIn goavro library seems to support this: https://github.com/linkedin/goavro#translating-from-go-to-avro-data

When translating from native Go to either binary or textual Avro data, goavro generally requires the same native Go data types as the decoder would provide, with some exceptions for programmer convenience. Goavro will accept any numerical data type provided there is no precision lost when encoding the value. For instance, providing float64(3.0) to an encoder expecting an Avro int would succeed, while sending float64(3.5) to the same encoder would return an error.

If not, would it be difficult to add a feature like this? As you know, our use case involves serializing arbitrary customer data. Some customers send us Parquet files with 32-bit types, and we would like to automatically serialize those into 64-bit Avro types. That way the customers don't have to care whether their files are 32- or 64-bit; we can handle both.

Thanks for the help!
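In the meantime, the widening can be done by the caller before marshaling. A minimal sketch in plain Go (no library features assumed; `widen` is a hypothetical helper, not part of this library):

```go
package main

import "fmt"

// widen applies the lossless numeric conversions a caller can perform
// before handing values to an encoder that expects 64-bit Avro types.
func widen(v interface{}) interface{} {
	switch t := v.(type) {
	case int32:
		return int64(t) // matches Avro long
	case float32:
		return float64(t) // matches Avro double
	default:
		return v
	}
}

func main() {
	fmt.Printf("%T %T\n", widen(int32(7)), widen(float32(1.5)))
}
```

A recursive variant of the same idea could walk maps and slices when the customer data arrives as generic `map[string]interface{}` values.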

Logical Types for Union and ComplexType

First of all, thank you very much for such a great package!!!

I'm having a problem with this schema:

{"type":["null", "int"],"logicalType":"date"}

parsePrimitive(typ, nil) passes nil for String, Bytes, Int, Long, Float, Double and Boolean, so parsePrimitiveLogicalType returns nil and never creates the LogicalType.
It also happens with complex structures, not only with unions.

Here is a failing example:

func TestEncoder_Time_Date(t *testing.T) {
	defer ConfigTeardown()

	schema := `{"type":["null", "int"],"logicalType":"date"}`
	buf := bytes.NewBuffer([]byte{})
	enc, err := avro.NewEncoder(schema, buf)
	assert.NoError(t, err)

	tm := time.Date(2020, 1, 2, 0, 0, 0, 0, time.UTC)

	err = enc.Encode(&tm)

	assert.NoError(t, err)
	assert.Equal(t, []byte{0xAE, 0x9D, 0x02}, buf.Bytes())
}

If you like, I can investigate further and try to propose a PR.
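For what it's worth, the Avro specification attaches `logicalType` to the branch type itself rather than to the union, so the spelling that parsers generally expect is:

```json
["null", {"type": "int", "logicalType": "date"}]
```

That may explain why the flat `{"type":["null","int"],"logicalType":"date"}` form never reaches the logical-type parsing path.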

Latest Schema method does not return "Schema Id"

The Avro registry client method GetLatestSchema only returns the schema, not the schema ID associated with it, and there is no other method in the library to retrieve it. https://github.com/hamba/avro/blob/master/registry/client.go#L183-L191

func (c *Client) GetLatestSchema(subject string) (avro.Schema, error) {
	var payload schemaPayload
	err := c.request(http.MethodGet, "/subjects/"+subject+"/versions/latest", nil, &payload)
	if err != nil {
		return nil, err
	}

	return avro.Parse(payload.Schema)
}

Serializing a map results in a panic

I'm converting the data to a map (using avro.Unmarshal), modifying the fields I'm interested in, and then calling avro.Marshal() on the map again.

In the above process avro.Marshal() panics at https://github.com/hamba/avro/blob/master/codec_record.go#L293

Here is a sample code to reproduce the panic:

package main

import (
	"github.com/sirupsen/logrus"
	"github.com/hamba/avro"
)

func main() {
	schema, _ := avro.Parse(`{
		"namespace": "People",
		"type": "record",
		"doc": "People information",
		"name": "People",
		"fields": [
			{"name": "name", "type": ["null", {"type": "string", "avro.java.string": "String"}], "doc": "Name of the person.", "default": null},
			{"name": "address", "type": ["null", {"type": "record", "doc": "Address information", "name": "Address", "fields": [
				{"name": "name", "type": ["null", {"type": "string", "avro.java.string": "String"}], "doc": "An optional name of the recipient", "default": null}
			], "doc": "", "default": null}]}
		]
	}`)

	p := People{
		Name:    "TD",
		Address: &Address{Name: "DT"},
	}

	pBytes, err := avro.Marshal(schema, &p)
	if err != nil {
		logrus.New().Fatal(err)
	}

	resMap := make(map[string]interface{})

	err = avro.Unmarshal(schema, pBytes, &resMap)
	if err != nil {
		logrus.New().Fatal(err)
	}

	pBytes, err = avro.Marshal(schema, &resMap)
	if err != nil {
		logrus.New().Fatal(err)
	}

}

type People struct {
	Name    string   `avro:"name"`
	Address *Address `avro:"address"`
}

type Address struct {
	Name string `avro:"string"`
}

Ordering of types in union of null / type affecting type resolution

A schema with a union of [ "null", "string" ] will happily Unmarshal to a *string, but a union of [ "string", "null" ] throws an error.

For example:

type SimpleRecord struct {
	ID *string `avro:"id"`
}
schema, err := avro.Parse(`{
	"name": "Person",
	"namespace": "umbrella.corp",
	"type": "record",
	"fields": [
		{ "name": "id", "type": [ "string", "null" ] }
	]
}`)
if err != nil {
	log.Fatal(err)
}
id := "abc"
in := SimpleRecord{ID: &id}
_, err = avro.Marshal(schema, in)
if err != nil {
	log.Fatal(err)
}

will fail with an error of ID: avro: unable to resolve type *string

Whereas changing the schema to

schema, err := avro.Parse(`{
	"name": "Person",
	"namespace": "umbrella.corp",
	"type": "record",
	"fields": [
		{ "name": "id", "type": [ "null", "string" ] }
	]
}`)

will work without error.

partial update to avro bytes

I have a streaming job that receives messages with a schema path (dynamic) in the header and raw data bytes in the body.
I am caching parsed schemas in my job's shared memory. Even though the schemas are dynamic, part of each schema is fixed, so I can extract the required data into a partial struct from the message body bytes.

Now I have to update that partial struct based on our internal business logic, and patch the original incoming message bytes with the updated partial struct.

This is the part I am not clear on: does this library provide any API for it?
Please let me know if there is a way to update an Avro byte buffer with partially changed data.

                      +------------------------------+
 +---------------+    |Extract partial Data          |     +-----------------------+
 | Schema path + |    |into a fixed Struct.          |     |                       |
 | Data bytes    +---->Update that Struct and        +----->  Updated Data bytes   |
 +---------------+    |Patch the original Data buffer|     +-----------------------+
                      |with new updated struct.      |
                      +------------------------------+

Maybe sync.Pool somehow causes panic

Running the code snippet below has a chance of causing a SIGSEGV.
Commenting out either the avro.Unmarshal() or the log.Info() call prevents the segmentation fault, so this is very possibly a memory problem...

package main

import (
	"fmt"
	"github.com/rs/zerolog"
	"os"
	"strings"
	"time"

	"github.com/hamba/avro"
	"github.com/rs/zerolog/log"
)

func main() {
	// Use zerolog console writer
	output := zerolog.ConsoleWriter{Out: os.Stderr, NoColor: false, TimeFormat: time.RFC3339}

	output.FormatLevel = func(i interface{}) string {
		return strings.ToUpper(fmt.Sprintf("| %-6s|", i))
	}
	output.FormatMessage = func(i interface{}) string {
		return fmt.Sprintf("%s", i)
	}
	output.FormatFieldName = func(i interface{}) string {
		return fmt.Sprintf("%s=", i)
	}
	output.FormatFieldValue = func(i interface{}) string {
		return strings.ToUpper(fmt.Sprintf("%s", i))
	}

	log.Logger = zerolog.New(output).With().Timestamp().Logger().With().Caller().Logger()

	// Begin our test
	type UserLabelMeta struct {
		Language string `avro:"language"`
		Ip       string `avro:"ip"`
	}
	type Event struct {
		UserId    string `avro:"user_id"`
		Timestamp int64  `avro:"timestamp"`
		// User related
		UserLabels map[string]string `avro:"user_labels"`
		MetaLabels *UserLabelMeta    `avro:"meta_labels"`
	}

	in := Event{
		UserId:    "UserId",
		Timestamp: int64(time.Now().UnixNano() / 1e6),

		UserLabels: map[string]string{"userLabel1": "userValue1"},

		MetaLabels: &UserLabelMeta{
			Language: "English",
			Ip:       "1.1.1.1",
		},
	}
	// Init schema
	schema, err := avro.Parse(string(`{
  "namespace": "minerva.dlive.tv",
  "type": "record",
  "name": "Event",
  "fields": [
    {
      "name": "user_id", "type":  "string"
    },
    {
      "name": "timestamp", "type": "long"
    },
    {
      "name": "user_labels",
      "type": {
        "type": "map",
        "values": "string"
      }
    },
    {
      "name": "meta_labels",
      "type": ["null", {
        "type": "record",
        "name": "UserLabelMeta",
        "fields": [
          { "name": "language", "type": "string" },
          { "name": "ip", "type": "string" }
        ]
      }],
      "default": null
    }
  ]
}`))
	if err != nil {
		panic(err)
	}

	// marshal
	data, err := avro.Marshal(schema, in)
	if err != nil {
		panic(err)
	}
	log.Info().Msgf("avro.Marshal(in)=% 08x", data)

	// unmarshal
	out := Event{}
	err = avro.Unmarshal(schema, data, &out)
	if err != nil {
		panic(err)
	}

	log.Info().Msgf("possibly cause SIGSEGV")

}

Log output:

2019-06-14T17:32:32-07:00 | INFO  | cmd/tmp/main.go:99 > avro.Marshal(in)=0c 55 73 65 72 49 64 ca ce e6 88 eb 5a 01 2c 14 75 73 65 72 4c 61 62 65 6c 31 14 75 73 65 72 56 61 6c 75 65 31 00 02 0e 45 6e 67 6c 69 73 68 0e 31 2e 31 2e 31 2e 31
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xf pc=0x100db95]

goroutine 1 [running]:
reflect.mapassign(0x11f72e0, 0x7, 0xc000013610, 0xc0000135f0)
    /usr/local/Cellar/go/1.12.5/libexec/src/runtime/map.go:1331 +0x3f
reflect.Value.SetMapIndex(0x11f72e0, 0xc0000103a8, 0x195, 0x11e51e0, 0xc000013610, 0x98, 0x11f6800, 0xc0000135f0, 0x194)
    /usr/local/Cellar/go/1.12.5/libexec/src/reflect/value.go:1615 +0x234
encoding/json.(*decodeState).object(0xc000014188, 0x11dcfa0, 0xc0000103a8, 0x16, 0xc0000141b0, 0xc00001a77b)
    /usr/local/Cellar/go/1.12.5/libexec/src/encoding/json/decode.go:805 +0xd97
encoding/json.(*decodeState).value(0xc000014188, 0x11dcfa0, 0xc0000103a8, 0x16, 0x20, 0xa)
    /usr/local/Cellar/go/1.12.5/libexec/src/encoding/json/decode.go:381 +0x6e
encoding/json.(*decodeState).unmarshal(0xc000014188, 0x11dcfa0, 0xc0000103a8, 0x0, 0xc000114001)
    /usr/local/Cellar/go/1.12.5/libexec/src/encoding/json/decode.go:179 +0x209
encoding/json.(*Decoder).Decode(0xc000014160, 0x11dcfa0, 0xc0000103a8, 0xc0000b8060, 0x1409c60)
    /usr/local/Cellar/go/1.12.5/libexec/src/encoding/json/stream.go:73 +0x187
github.com/rs/zerolog.ConsoleWriter.Write(0x1278800, 0xc000010018, 0x0, 0x123cd82, 0x19, 0xc00006c840, 0x4, 0x4, 0x0, 0x1243990, ...)
    /Users/xin/Developer/go/pkg/mod/github.com/rs/[email protected]/console.go:101 +0x186
github.com/rs/zerolog.levelWriterAdapter.WriteLevel(...)
    /Users/xin/Developer/go/pkg/mod/github.com/rs/[email protected]/writer.go:20
github.com/rs/zerolog.(*Event).write(0xc000024240, 0x9f, 0x1f4)
    /Users/xin/Developer/go/pkg/mod/github.com/rs/[email protected]/event.go:75 +0x125
github.com/rs/zerolog.(*Event).msg(0xc000024240, 0xc000018300, 0x2f)
    /Users/xin/Developer/go/pkg/mod/github.com/rs/[email protected]/event.go:134 +0x13d
github.com/rs/zerolog.(*Event).Msgf(0xc000024240, 0x1241e3c, 0x2f, 0x0, 0x0, 0x0)
    /Users/xin/Developer/go/pkg/mod/github.com/rs/[email protected]/event.go:116 +0x83
main.main()
    /Users/xin/Developer/go/src/github.com/lino-network/minerva/cmd/tmp/main.go:108 +0xb5e

Generating illegal avro schema

I'm generating an Avro schema for the AddressType struct:

type AddressType struct {
	Street         *LocalizedString `json:"street,omitempty"`
	ZipCode        string           `json:"zipCode,omitempty"`
	MunicipalityID string           `json:"municipalityID,omitempty"`
	Municipality   *LocalizedString `json:"municipality,omitempty"`
	Country        *LocalizedString `json:"country,omitempty"`
}
type LocalizedString struct {
	FiFi string `json:"fi_FI,omitempty"`
	SvFi string `json:"sv_FI,omitempty"`
	EnGb string `json:"en_GB,omitempty"`
	EtEe string `json:"et_EE,omitempty"`
	RuRu string `json:"ru_RU,omitempty"`
}
func LocalizedStringSchema() (*avro.RecordSchema, error) {
	fiFi, err := avro.NewField("fi_FI", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	svFi, err := avro.NewField("sv_FI", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	enGb, err := avro.NewField("en_GB", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	etEe, err := avro.NewField("et_EE", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	ruRu, err := avro.NewField("ru_RU", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	fields := []*avro.Field{fiFi, svFi, enGb, etEe, ruRu}
	localizedStringSchema, err := avro.NewRecordSchema("LocalizedString", "", fields)
	if err != nil {
		return nil, err
	}

	return localizedStringSchema, nil
}

func AddressTypeSchema() (avro.Schema, error) {
	localizedString, err := LocalizedStringSchema()
	if err != nil {
		return nil, err
	}

	types := []avro.Schema{avro.NewPrimitiveSchema(avro.Null), localizedString}
	unionSchema, err :=  avro.NewUnionSchema(types)
	if err != nil {
		return nil, err
	}

	street, err := avro.NewField("street", unionSchema, nil)
	if err != nil {
		return nil, err
	}

	zipCode, err := avro.NewField("zipCode", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	municipalityID, err := avro.NewField("municipalityID", avro.NewPrimitiveSchema(avro.String), nil)
	if err != nil {
		return nil, err
	}

	municipality, err := avro.NewField("municipality", unionSchema, nil)
	if err != nil {
		return nil, err
	}

	country, err := avro.NewField("country", unionSchema, nil)
	if err != nil {
		return nil, err
	}

	fields := []*avro.Field{street, zipCode, municipalityID, municipality, country}
	addressTypeSchema, err := avro.NewRecordSchema("AddressType", namespace, fields)
	if err != nil {
		return nil, err
	}


	return addressTypeSchema, nil
}

When I try to register this schema in the Confluent Schema Registry, I get an "Input schema is an invalid Avro schema" error. Could you please point me to what I'm doing wrong?

{
  "name": "fi.sok.schema.raflaamo.AddressType",
  "type": "record",
  "fields": [
    {
      "name": "street",
      "type": [
        "null",
        {
          "name": "LocalizedString",
          "type": "record",
          "fields": [
            {
              "name": "fi_FI",
              "type": "string"
            },
            {
              "name": "sv_FI",
              "type": "string"
            },
            {
              "name": "en_GB",
              "type": "string"
            },
            {
              "name": "et_EE",
              "type": "string"
            },
            {
              "name": "ru_RU",
              "type": "string"
            }
          ]
        }
      ]
    },
    {
      "name": "zipCode",
      "type": "string"
    },
    {
      "name": "municipalityID",
      "type": "string"
    },
    {
      "name": "municipality",
      "type": [
        "null",
        {
          "name": "LocalizedString",
          "type": "record",
          "fields": [
            {
              "name": "fi_FI",
              "type": "string"
            },
            {
              "name": "sv_FI",
              "type": "string"
            },
            {
              "name": "en_GB",
              "type": "string"
            },
            {
              "name": "et_EE",
              "type": "string"
            },
            {
              "name": "ru_RU",
              "type": "string"
            }
          ]
        }
      ]
    },
    {
      "name": "country",
      "type": [
        "null",
        {
          "name": "LocalizedString",
          "type": "record",
          "fields": [
            {
              "name": "fi_FI",
              "type": "string"
            },
            {
              "name": "sv_FI",
              "type": "string"
            },
            {
              "name": "en_GB",
              "type": "string"
            },
            {
              "name": "et_EE",
              "type": "string"
            },
            {
              "name": "ru_RU",
              "type": "string"
            }
          ]
        }
      ]
    }
  ]
}

Schema fields do not consider default in String() methods

I am building a tool that needs to derive a JSON Avro schema from a BigQuery schema. Say, for example, I have a simple table that is just nullable INTEGER types in BQ. This means that I need to generate a corresponding Avro schema where the field names match the BQ column names and the Avro types for the fields are "long".

Because the BQ column is nullable, I also want my Avro field to be a union type of ["null", "long"], because I want the Avro schema to define a default value of null, so "null" has to be the first branch of the union.

I can get mostly there with something like this:

fields := []*avro.Field{}
unionTypes := []avro.Schema{
	avro.NewPrimitiveSchema(avro.Null, nil),
	avro.NewPrimitiveSchema(avro.Long, nil),
}
union, err := avro.NewUnionSchema(unionTypes)
field, err := avro.NewField("my-field", union, nil)
fields = append(fields, field)
schema, err := avro.NewRecordSchema("my-name", "my-namespace", fields)

The problem is that, when I then try to create the JSON schema like this:

jsonSchema := schema.String()

The resulting schema's record fields only consist of the "name" and "type" values (both correct) but do not include "default": null. The String() function on the various field types does not consider the default value at all. For example, this only emits the "name" and "type" values:

// String returns the canonical form of a field.
func (s *Field) String() string {
	return `{"name":"` + s.name + `","type":` + s.typ.String() + `}`
}

Is this intentional or am I missing something in the way I am trying to generate the Avro schema?
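Worth noting: the method's comment says it returns the canonical form, and Avro's Parsing Canonical Form deliberately strips defaults (along with docs and aliases), so the omission may be intentional. The field JSON needed for registration, by contrast, looks like:

```json
{"name": "my-field", "type": ["null", "long"], "default": null}
```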

Trying to marshal []map[string]interface{}

Firstly, I apologize if this is obvious; this is my first real project in Go, so I am learning as I go along!

I am trying to create a microservice to ingest raw JSON (of any format) and create Avro out of it; this is a port of an existing Python job. I am using structs to keep track of fields and dynamically map them to the schema. However, I get an issue when marshalling the data. The Avro is in the format:

type Field struct {
	Name      string   `json:"name"`
	FieldType []string `json:"type"`
}

type Schema struct {
	Type      string  `json:"type"`
	Name      string  `json:"name"`
	Namespace string  `json:"namespace"`
	Fields    []Field `json:"fields"`
}

type ParsableSchema struct {
	Type  string `json:"type"`
	Items Schema `json:"items"`
}

In order to parse unknown JSON I am using maps of string to interface{} (to represent a JSON object). I have added below the AVSC that I generated with some sample data, and the JSON does match this schema, but when I try to marshal a slice of this JSON, I get the error: []map[string]interface {}: avro: unknown union type double.
I have tried wrapping the schema in {"type":"array", "items": the avsc here} but I still get the same issue. Again, this is my first Go project, so apologies if I am missing something "obvious"; this is a lot different from the Python I am used to :)

{
  "type": "record",
  "name": "TestTable",
  "namespace": "TestTable.avsc",
  "fields": [
    {
      "name": "TestData",
      "type": [
        "string",
        "null"
      ]
    },
    {
      "name": "NEWNEWROW",
      "type": [
        "boolean",
        "null"
      ]
    },
    {
      "name": "NewValue",
      "type": [
        "string",
        "null"
      ]
    },
    {
      "name": "ExampleNest_NestedKey",
      "type": [
        "int",
        "null"
      ]
    },
    {
      "name": "ExampleNest_DoubleNest_DOUBLENESTVAL",
      "type": [
        "boolean",
        "null"
      ]
    }
  ]
}
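A likely cause, assuming the raw JSON is decoded with encoding/json: that package turns every JSON number into a float64, while the schema's unions only contain "int", so the encoder reports an unknown "double". A sketch of a pre-pass that coerces whole-valued numbers back to int before marshaling (`coerceNumbers` is a hypothetical helper, not a library function):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// coerceNumbers walks a decoded JSON value and converts whole-valued
// float64 numbers (the only numeric type encoding/json produces for
// interface{} targets) to int, so they can match an Avro "int" branch.
func coerceNumbers(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, val := range t {
			t[k] = coerceNumbers(val)
		}
		return t
	case []interface{}:
		for i, val := range t {
			t[i] = coerceNumbers(val)
		}
		return t
	case float64:
		if t == float64(int64(t)) {
			return int(t)
		}
		return t
	default:
		return v
	}
}

func main() {
	var doc map[string]interface{}
	if err := json.Unmarshal([]byte(`{"ExampleNest_NestedKey": 3}`), &doc); err != nil {
		panic(err)
	}
	fixed := coerceNumbers(doc).(map[string]interface{})
	fmt.Printf("%T\n", fixed["ExampleNest_NestedKey"])
}
```

After this pass the slice of maps should line up with the int/boolean/string unions in the AVSC above.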

Support for obtaining schema or other metadata from OCF decoder

It might be nice to be able to access the schema or other metadata from an ocf.Decoder object.

In my use case, I have a few thousand avro files of a few MB each, that I needed to sort on a specific field. The files had a few different schemas, but all had this one field in common. So my intent was to open the file, unmarshal all the records into a generic map[string]interface{}, sort based on the known field, and then marshal the results out with the exact same schema as the input.

So for my use case, exposing the decoder's schema so I could pass it to the new encoder would be sufficient (I implemented this in my local fork).
However, as there can be other data in the metadata that might be useful to others, I suspect it would be better to expose all of the metadata, or possibly the entire Header object with a helper method like GetSchema().

Cannot encode optional boolean

I get avro: unable to resolve type bool when I try to marshal a bool value with a schema where the boolean is optional. It does not happen with an optional string or int.

package main

import (
	"github.com/hamba/avro"
	"log"
)

func main() {
	type SimpleRecord struct {
		B bool `avro:"b"`
	}

	schema, err := avro.Parse(`{
  "type": "record",
  "name": "simple",
  "namespace": "org.hamba.avro",
  "fields": [
    {
      "name": "b",
      "doc": "",
      "type": [
        "null",
        "boolean"
      ],
      "default": null
    }
  ]
}`)
	if err != nil {
		log.Fatal(err)
	}

	in := SimpleRecord{B: true}
	_, err = avro.Marshal(schema, in)
	if err != nil {
		log.Fatal(err)
	}
}

Convert textual Avro data (Avro in JSON format) to native Go

Is it possible to convert textual Avro data (Avro in JSON format) to a native Go interface{}?

Ref: https://github.com/linkedin/goavro

    // Convert textual Avro data (Avro in JSON format) to native Go form
    native, _, err := codec.NativeFromTextual(textual)
    if err != nil {
        fmt.Println(err)
    }

I need to read a multi-line JSONL file and, for each line:

  1. convert the Avro-in-JSON string to a generic Go interface{}
  2. convert the generic value to binary with avro.Marshal(schema, generic)
  3. publish the binary to a pub/sub messaging system.

How to create an avro file

	schema := `{
	    "type": "record",
	    "name": "simple",
	    "namespace": "org.hamba.avro",
	    "fields" : [
	        {"name": "a", "type": "long"},
	        {"name": "b", "type": "string"}
	    ]
	}`

	type SimpleRecord struct {
		A int64  `avro:"a"`
		B string `avro:"b"`
	}

Can you give me an example?

An issue with encoding

I don't know if this is by design, but when not specifying all tags I get a badly encoded byte array. For example:

package main

import (
	"encoding/hex"
	"fmt"
	"log"
	"github.com/hamba/avro"
)

type SimpleRecord struct {
	A string  `avro:"a"`
	C *string `avro:"c"`
}

func main() {
	schema, _ := avro.Parse(`{
    "type": "record",
    "name": "simple",
    "namespace": "org.hamba.avro",
    "fields" : [
        {"name": "a", "type": "string"},
        {"name": "b", "type": ["null", "string"], "default": null},
        {"name": "c", "type": ["null", "string"], "default": null}
    ]}`)

	in := SimpleRecord{A: "test", C: nil}
	data, _ := avro.Marshal(schema, in)
	fmt.Println(hex.Dump(data))
}

The output is:

00000000  08 74 65 73 74 00  |.test.|

It is missing a 00 for the b field, so it is not a correctly serialized binary Avro object.

I would expect default values to be encoded even if they are not specified in the struct.

Unable to resolve a record defined inside an array

Hi. I'm having a weird issue when decoding a record type nested as the items of an array type. I've extracted a runnable reproduction (requires the testify lib):

package sptc

import (
	"bytes"
	"encoding/binary"
	"fmt"

	"testing"

	"github.com/hamba/avro"
	"github.com/stretchr/testify/require"
)

func TestBla(t *testing.T) {
	type D struct {
		X int32 `avro:"x"`
	}
	type C struct {
		D []D `avro:"d"`
	}
	type B struct {
		C C `avro:"c"`
	}
	type A struct {
		ID string `avro:"id"`
	}
	type T struct {
		Record interface{} `avro:"t"`
	}

	d := []D{{X: 1}}
	sptcRecord := B{C: C{D: d}}

	value := T{Record: sptcRecord}

	schema, err := avro.Parse(`
{
    "name": "T",
    "namespace": "a.b.c",
    "type": "record",
    "fields": [
        {
            "name": "t",
            "type": [
                {
                    "name": "A",
                    "type": "record",
                    "fields": [ { "name": "id", "type": "string" } ]
                },
                {
                    "name": "B",
                    "type": "record",
                    "fields": [
                        {
                            "name": "c",
                            "type": {
                                "name": "C",
                                "type": "record",
                                "fields": [
                                    {
                                        "name": "d",
                                        "type": {
                                            "type": "array",
                                            "items": [
                                                {
                                                    "name": "D",
                                                    "type": "record",
                                                    "namespace": "x.y.z.C",
                                                    "fields": [
                                                        {"name": "x", "type": "int"}
                                                    ]
                                                }
                                            ]
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            ]
        }
    ]
}`)
	require.NoError(t, err)

	avro.Register("a.b.c", T{})
	avro.Register("a.b.c.B", B{})
	avro.Register("a.b.c.A", A{})

	data, err := avro.Marshal(schema, value)
	require.NoError(t, err)

	buffer := &bytes.Buffer{}
	err = binary.Write(buffer, binary.BigEndian, data)
	require.NoError(t, err)

	encoded := buffer.Bytes()
	require.NoError(t, err)

	var dst T
	err = avro.Unmarshal(schema, encoded, &dst)
	require.NoError(t, err)

	fmt.Printf("dst = %+v", dst)
}

Things I've noticed that are awkward:

  • Changing []D to D (and making the respective schema changes) makes this work
  • The inner namespace x.y.z mimics the real example, but I've also tested this without a different namespace

This fails, apparently because the type D can't be resolved:

Error: Received unexpected error:
Record: C: D: []sptc.D: avro: unable to resolve type sptc.D

Any help would be appreciated

Missing value of type `["null", "string"]` and a default `null` is not included in encoded value

This looks similar to #23, but I am still seeing a similar error: a missing value for a field of type ["null", "string"] with a default of null is not included in the encoding.

package main

import (
	"fmt"
	"log"

	"github.com/hamba/avro"
)

func main() {
	schemaStr := `{
  "type": "record",
  "name": "project",
  "fields": [
    {
      "name": "createdby",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "name",
      "type": [ "null",  "string"],
      "default": null
    }
  ],
  "namespace": "tesst"
}`

	schema, err := avro.Parse(schemaStr)
	if err != nil {
		log.Fatal(err)
	}

	msg := map[string]interface{}{
		"name": "nnnnn",
	}
	// encode
	encodedMsg, err := avro.Marshal(schema, msg)
	if err != nil {
		panic(err)
	}
	// decode
	decodedMsg := map[string]interface{}{}
	err = avro.Unmarshal(schema, encodedMsg, &decodedMsg)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("Original message:", msg)
	fmt.Println("After encoding/decoding:", decodedMsg)
}

Output:

Original message: map[name:nnnnn]
After encoding/decoding: map[createdby:nnnnn name:<nil>]

Compatible Schema Registry client

Hi, first of all congrats. This is the only Go library I've found that is truly simple to use and hides the complexity of Go <-> Avro conversions from the user, unlike others such as the LinkedIn one.

I was wondering if you know of a schema registry client compatible with the interfaces provided here. I started out using linkedin/goavro, but it still leaves a lot of details up to the user, like dealing with the quirks of unions in Go struct types, so I (fortunately) found this one.

riferrei/srclient already provides schema caching, but in the form of *srclient.Schema. I think this library would benefit from a complementary library (possibly in another repo) that interacts with the schema registry and provides in-memory caching for the parsed Schemas.

At the moment I can do (pseudo-code):

schema, err := srclient.FetchLatestSchema(topic) // srclient already provides caching, so won't make the network request if it already has the schema
return toHambaSchema(schema) // this however isn't cached and would constantly be parsing the schema

Is there such a client compatible with the interfaces exposed here? Would you see a benefit in such a client?

Support for time.Time format (Logical types)

Hi,

First of all, thanks for a great and simple-to-use Avro serializer/deserializer for Go!
It works great for our current streaming applications.

I'm just curious to know whether support for Go's time.Time struct is in the pipeline?

Currently I add a step that converts time.Time to an epoch value saved as int64 (Go) and long (Avro), but this entails extra steps and structs in my Go code.

Again, thanks for a great Avro parser!

Best Mads
