klarna / erlavro
Avro support for Erlang/Elixir (http://avro.apache.org/)
License: Apache License 2.0
I've already submitted #97 for this, but I thought I'd open an issue in case there was some dialogue to be had. As mentioned there, I'm not sure whether this is controversial in Erlang, but in Elixir it is very common to encode atom values as strings. For example, https://hexdocs.pm/jason/Jason.html#encode/2 happily encodes atom values as strings.
This test shows the behaviour: https://github.com/michalmuskala/jason/blob/v1.1.2/test/encode_test.exs#L51
We keep running into situations where we have data structures that should be Avro-encoded, but we first need to walk the map and change all atoms to strings before encoding. This is cumbersome, and it would be much simpler if the encoder did it for you.
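The pre-encoding walk we do today looks roughly like this sketch (module name hypothetical; this is our workaround, not erlavro code). Note that nil and booleans are atoms too, but map to Avro null/boolean, so they must be left alone:

```elixir
defmodule AvroPrep do
  @moduledoc "Sketch: recursively convert atom values to strings before Avro encoding."

  # nil and booleans map to Avro null/boolean, not string — leave them as-is.
  def stringify(value) when is_nil(value) or is_boolean(value), do: value
  def stringify(value) when is_atom(value), do: Atom.to_string(value)

  def stringify(value) when is_map(value) do
    Map.new(value, fn {k, v} -> {k, stringify(v)} end)
  end

  def stringify(value) when is_list(value), do: Enum.map(value, &stringify/1)
  def stringify(value), do: value
end
```

If the encoder accepted atoms directly, this whole pass would disappear.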
When running mix compile or mix test on Elixir 1.13.4 with Erlang/OTP 24 or 25, the following warning shows up (even on the latest erlavro release):
dependency :erlavro is using Rebar 2, which is no longer maintained and no longer works in recent Erlang/OTP versions. Remove the :manager option or set it to :rebar3 instead
The suggested fix of listing the dependency in mix.exs as {:erlavro, "~> 2.9.7", manager: :rebar3, override: true} does remove the warning, but it seems like something may break once newer Elixir versions require Erlang/OTP 25 and Rebar 2 is no longer supported at all.
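For reference, the full deps entry with that workaround might look like this (version number taken from the report; adjust to your project):

```elixir
# mix.exs — forces the rebar3 manager to silence the Rebar 2 warning
defp deps do
  [
    {:erlavro, "~> 2.9.7", manager: :rebar3, override: true}
  ]
end
```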
Avro schemas can be defined in two formats: JSON and Avro IDL.
erlavro currently supports only the JSON format.
Maybe it would be worth supporting .avdl as well?
Hi there,
My application uses erlavro to encode messages in Avro format and produce them to Kafka. It worked well before, but today I started it and received the following error:
** (Mix) Could not start application collector: Collector.Application.start(:normal, []) returned an error: shutdown: failed to start child: Collector.AvroEncoder
    ** (EXIT) an exception was raised:
        ** (FunctionClauseError) no function clause matching in :avro_schema_store.import_files/2
            (erlavro) /media/duyrau/WORK/LEARNING/Elixir/collector/deps/erlavro/src/avro_schema_store.erl:108: :avro_schema_store.import_files(['schema/click.avsc'], #Reference<0.2588577052.1602355204.236674>)
            (collector) lib/collector/avro_encoder.ex:9: Collector.AvroEncoder.init/1
            (stdlib) gen_server.erl:365: :gen_server.init_it/2
            (stdlib) gen_server.erl:333: :gen_server.init_it/6
            (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
My code hasn't changed; it worked when I used Elixir 1.4.2, but after updating to 1.4.4 it crashes.
Elixir version
Erlang/OTP 20 [RELEASE CANDIDATE 2] [erts-9.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false] Elixir 1.4.4
What should I do?
Thanks
The specification https://avro.apache.org/docs/1.8.1/spec.html#names says that
In record, enum and fixed definitions, the fullname is determined in one of the following ways:
...
- A fullname is specified. If the name specified contains a dot, then it is assumed to be a fullname, and any namespace also specified is ignored. For example, use "name": "org.foo.X" to indicate the fullname org.foo.X.
erlavro seems to follow that only partially: it understands a record name containing dots as name+namespace, but then also requires that namespace to be equal to either the global namespace or an explicitly given one:
https://github.com/klarna/erlavro/blob/2.4.0/src/avro_record.erl#L74
Here's an example from Elixir shell:
iex(7)> schema = ~S/ {
...(7)> "type": "record",
...(7)> "namespace": "org.foo",
...(7)> "name": "org.foo.X",
...(7)> "fields": [
...(7)> {
...(7)> "type": "string",
...(7)> "name": "field1"
...(7)> }
...(7)> ]
...(7)> } /
" {\n \"type\": \"record\",\n \"namespace\": \"org.foo\",\n \"name\": \"org.foo.X\",\n \"fields\": [\n {\n \"type\": \"string\",\n \"name\": \"field1\"\n }\n ]\n } "
iex(8)> :avro_json_decoder.decode_schema(schema)
{:avro_record_type, "X", "org.foo", "", [],
[{:avro_record_field, "field1", "", {:avro_primitive_type, "string", []},
:undefined, :ascending, []}], "org.foo.X", []}
iex(9)> schema = ~S/ {
...(9)> "type": "record",
...(9)> "name": "org.foo.X",
...(9)> "fields": [
...(9)> {
...(9)> "type": "string",
...(9)> "name": "field1"
...(9)> }
...(9)> ]
...(9)> } /
" {\n \"type\": \"record\",\n \"name\": \"org.foo.X\",\n \"fields\": [\n {\n \"type\": \"string\",\n \"name\": \"field1\"\n }\n ]\n } "
iex(10)> :avro_json_decoder.decode_schema(schema)
** (MatchError) no match of right hand side value: false
(erlavro) src/avro_record.erl:72: :avro_record.type/3
I'm trying to decode an OCF with invalid values for the namespace:
{"type":"record","name":"null","namespace":"null","fields":[{"name":"partition","type":"int"}, // [...]
Despite the namespace, erlavro would be able to decode it correctly except for a sanity check in avro_util.erl:579:
?ERROR_IF(lists:member(CanonicalName, ReservedNames),
{reserved, Name, CanonicalName}).
Which produces:
** (ErlangError) Erlang error: {:reserved, "null", "null"}
code: :avro_ocf.decode_file("redacted file path")
stacktrace:
src/avro_util.erl:579: :avro_util.verify_type_name/1
src/avro_record.erl:90: :avro_record.type/3
src/avro_json_decoder.erl:72: :avro_json_decoder.decode_schema/2
src/avro_ocf.erl:77: :avro_ocf.decode_binary/1
test/avro_archive_file_test.exs:14: (test)
Should that verification happen when deserializing existing data? Even though "null" is not a name accepted by the spec, erlavro works fine if we bypass that check; it is able to read the OCF file correctly.
I don't know the code base well enough to judge whether the check makes sense internally. From a user's point of view, on the other hand, it seems regrettable to prevent erlavro from decoding a file it is perfectly capable of decoding, if you see what I mean. Maybe there could be a "strict mode" option?
If relevant, we are ready to work on a PR to improve the situation.
I am using erlavro to generate Kafka messages, and am generating a schema fingerprint as described in https://avro.apache.org/docs/1.8.2/spec.html#schema_fingerprints
(I have written an Erlang implementation of the Rabin fingerprint which I could contribute to erlavro, if you would like it.)
I want to generate the fingerprint based on the Parsing Canonical Form as described in
https://avro.apache.org/docs/1.8.2/spec.html#Transforming+into+Parsing+Canonical+Form
I am trying to do this by running
avro_json_encoder:encode_schema(avro_json_decoder:decode_schema(Json))
avro_json_encoder:encode_schema outputs record fields in a different order, though. Per the spec, it should be:
"[ORDER] Order the appearance of fields of JSON objects as follows: name, type, fields, symbols, items, values, size. For example, if an object has type, name, and size fields, then the name field should appear first, followed by the type and then the size fields."
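As a sketch of just that ordering rule (module and function names hypothetical — the full Parsing Canonical Form transformation involves more steps than field ordering, e.g. stripping attributes outside this list):

```elixir
defmodule PcfOrder do
  @moduledoc "Sketch of the Parsing Canonical Form [ORDER] rule only."

  # Order mandated by the spec: name, type, fields, symbols, items, values, size.
  @order ["name", "type", "fields", "symbols", "items", "values", "size"]

  # Given a JSON object as a key/value list, emit keys in canonical order.
  # Keys outside the list sort last (real PCF strips them entirely).
  def order_fields(kvs) do
    Enum.sort_by(kvs, fn {k, _v} -> Enum.find_index(@order, &(&1 == k)) end)
  end
end
```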
Hello there!!!
I'm having some problems decoding a bytes type with decimal as the logical type, whenever a default value is set.
Example schema:
{
"doc": "Testing decimals",
"fields": [
{
"default": "\u0000",
"name": "field",
"type": {
"logicalType": "decimal",
"precision": 2,
"scale": 1,
"type": "bytes"
}
}
],
"name": "sampleRecord",
"namespace": "jperi",
"type": "record"
}
The error I get from the elixir consumer is:
{:error,
{:bad_default,
[record: "jperi.sampleRecord", field: "field", reason: :function_clause]}}
The issue can be circumvented if the ignore_bad_default_values option is set to true.
But I don't think the default value is wrongly set: for a bytes type, it should be possible to set the default as "\u0000". In the case of the decimal logical type, that value means zero.
I took a look at the code, but I'm not sure where a change to address this should go. (For example, should it go in parse, or in parse_bytes? And why does parse_bytes have a \\u00 at the beginning of its pattern match?)
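For context on why "\u0000" denotes zero: the Avro spec stores a decimal's unscaled value as a big-endian two's-complement integer in the bytes, so a single zero byte with scale 1 is 0.0. A minimal sketch of that interpretation (not erlavro's internal code; module name hypothetical):

```elixir
defmodule DecimalBytes do
  @moduledoc "Sketch: interpret Avro decimal-logical-type bytes."

  # value = signed big-endian two's-complement integer / 10^scale
  def decode(bytes, scale) when is_binary(bytes) and byte_size(bytes) > 0 do
    bits = bit_size(bytes)
    <<unscaled::signed-big-integer-size(bits)>> = bytes
    unscaled / :math.pow(10, scale)
  end
end
```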
The following diff makes the test suite of the library fail:
diff --git a/test/data/interop.avsc b/test/data/interop.avsc
index 8cfbba2..7be2c72 100644
--- a/test/data/interop.avsc
+++ b/test/data/interop.avsc
@@ -6,7 +6,7 @@
{"name": "boolField", "type": "boolean"},
{"name": "floatField", "type": "float"},
{"name": "doubleField", "type": "double"},
- {"name": "bytesField", "type": "bytes"},
+ {"name": "bytesField", "type": "bytes", "default": "\u0000"},
{"name": "nullField", "type": "null"},
{"name": "arrayField", "type": {"type": "array", "items": "double"}},
{"name": "mapField", "type":
Do you think this use case is something that should be available to users of the library?
Thank you!!
jsone 1.4.6, which erlavro depends on, ships with a vendored rebar binary and therefore builds within deps/ in a Mix project instead of _build/. This causes issues on our end when switching OTP versions and with caching in CI. Could this be bumped to at least 1.5.0, which no longer vendors rebar, so that jsone builds in the expected folders?
https://diff.hex.pm/diff/jsone/1.4.6..1.5.0 seems to suggest the only backwards-incompatible change is requiring OTP 21 for the use of the OTP_RELEASE macro.
Hey there ...
With version 2.3.1, decoding schemas was working nicely. After upgrading, however, it fails for some default values for primitives (such as long) that specify null in a union.
Trace:
** (ErlangError) Erlang error: {:bad_default, [record: "MyCrappySchema", field: "someLongTypeField", reason: :function_clause]}
(erlavro) src/avro_json_decoder.erl:368: :avro_json_decoder.parse_prim(:null, {:avro_primitive_type, "long", []})
(erlavro) src/avro_json_decoder.erl:334: anonymous fn/3 in :avro_json_decoder.parse/5
(erlavro) src/avro_record.erl:220: :avro_record.do_default/4
(erlavro) src/avro_record.erl:169: anonymous fn/3 in :avro_record.parse_defaults/2
(stdlib) lists.erl:1239: :lists.map/2
(stdlib) lists.erl:1239: :lists.map/2
(erlavro) src/avro_record.erl:177: :avro_record.parse_defaults/2
(erlavro) src/avro_schema_store.erl:236: :avro_schema_store.import_schema_json/3
I reviewed the versions since 2.3.1 and see that some handling of default values changed. I see there is an ignore_bad_default_values option when using decode_schema; however, I don't call that directly.
I make a schema store using this fun:
store = :avro_schema_store.import_schema_json(schema_json, :avro_schema_store.new([]))
:avro.make_decoder(store, [])
Then I use the decoder to deserialize messages. Given this usage pattern, where is the best place to pass the ignore_bad_default_values option? Or if there is another way, what is it?
Is there any plan to add support for the map data structure? Am I correct in assuming the current format [{key, val}] is used for record/map types because when the project started there were no maps in Erlang?
{
"namespace": "com.example",
"type": "record",
"name": "Record",
"fields": [
{"name": "UUID", "type": ["null", "string"], "default": "null"},
{"name": "Version", "type": ["null", "string"], "default": "null"}
]
}
The above is a valid Avro specification, but when make_simple_decoder is applied to it, it fails, complaining about a bad default. If you remove the quotes from null it works. The decoder should also match the string "null" when determining Avro null types.
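For comparison, the variant erlavro currently accepts uses a JSON null literal rather than the string "null" (same schema as above, only the defaults changed):

```json
{
  "namespace": "com.example",
  "type": "record",
  "name": "Record",
  "fields": [
    {"name": "UUID", "type": ["null", "string"], "default": null},
    {"name": "Version", "type": ["null", "string"], "default": null}
  ]
}
```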
Avro IDL can generate schemas which allow default values of null for any type. The current erlavro decoder doesn't decode these.
I have a local change in the JSON decoder which allows the decoding; however, if I update the interop schema with a field that has a default value of null, all kinds of things start to break.
I could just write a test that covers the decoding case I have and ignore any encoding changes for now.
Thoughts?
Thank you for merging #92
Can this tag be published to hex.pm for consumption? Thank you!
We're using erlavro in a system with lots of mobile devices and encountered the need to deal with Avro data that was encoded using a newer or older version of a schema, without knowing the exact writer's schema.
For forward compatibility we made erlavro able to not require the whole data to be matched, so we can append fields. For backwards compatibility we made it use the defaults of record fields if an explicit value was missing. This is aligned with the schema resolution suggestions in the spec, sans knowing the writer's schema.
The following are our "brute force" changes, but we're wondering whether there's a possibility to bring that functionality upstream, likely behind some flag to toggle the behaviour.
diff --git a/src/avro_binary_decoder.erl b/src/avro_binary_decoder.erl
index 93d480f..a0b6129 100644
--- a/src/avro_binary_decoder.erl
+++ b/src/avro_binary_decoder.erl
@@ -63,7 +63,7 @@ decode(IoData, Type, StoreOrLkupFun) ->
decode(IoData, Type, StoreOrLkupFun, Options) ->
%% return decoded value as raw erlang term directly
Lkup = avro_util:ensure_lkup_fun(StoreOrLkupFun),
- {Value, <<>>} = do_decode(IoData, Type, Lkup, Options),
+ {Value, _} = do_decode(IoData, Type, Lkup, Options),
Value.
%% @doc decode_stream/4 equivalent with default hook fun.
@@ -149,12 +149,18 @@ dec(Bin, T, _Lkup, #{hook := Hook}) when ?IS_FIXED_TYPE(T) ->
-spec dec_record(binary(), record_type(), lkup_fun(),
decoder_options()) -> {avro:out(), binary()}.
dec_record(Bin, T, Lkup, #{record_type := RecordType} = Options) ->
- FieldTypes = avro_record:get_all_field_types(T),
+ FieldTypes = avro_record:get_all_field_data(T),
{FieldValuesReversed, Tail} =
lists:foldl(
- fun({FieldName, FieldType}, {Values, BinIn}) ->
- {Value, BinOut} = dec_item(T, FieldName, FieldType,
- BinIn, Lkup, Options),
+ fun({FieldName, FieldType, ?NO_VALUE}, {Values, BinIn}) ->
+ {Value, BinOut} = dec_item(T, FieldName, FieldType, BinIn, Lkup, Options),
+ {[{FieldName, Value} | Values], BinOut};
+ ({FieldName, FieldType, Default}, {Values, BinIn}) ->
+ {Value, BinOut} = try dec_item(T, FieldName, FieldType, BinIn, Lkup, Options) of
+ {DecodedValue, BinRest} -> {DecodedValue, BinRest}
+ catch
+ _:_ -> {Default, BinIn}
+ end,
{[{FieldName, Value} | Values], BinOut}
end, {[], Bin}, FieldTypes),
FieldValues1 = case RecordType of
Hey guys, I want to say thanks for an amazing library!
I've been using it to build an Elixir library, Avrora, and I'm missing a small piece of functionality: a decoder hook for OCF similar to the binary decoder hook.
The need comes from handling null values of primitive fields with a value of NULL. I would like to transform them into Elixir nil values. I can do it with the binary decoder via a hook, but I can't with OCF.
I'm not sure about the interface (should it be a props list or just a hook) for OCF, but the idea is that a test might look like this:
decoder_hook_test() ->
InteropOcfFile = test_data("interop.ocf"),
Hook = fun(Type, _SubNameOrId, Data, DecodeFun) ->
case avro:get_type_name(Type) of
null -> {<<"modifiedNullValue">>, Data}
_ -> DecodeFun(Data)
end
end,
{_, _, [Object]} = avro_ocf:decode_file(InteropOcfFile, Hook),
?assertEqual(<<"modifiedNullValue">>,
proplists:get_value(<<"nullField">>, Object)).
WDYT?
Line 51 in c55e552
This line seems to cause :epp to error for all records using it, and therefore the only record extractable by Elixir is :avro_value.
In Avro specification, it is said that only certain values are allowed to be used as defaults (depending on the field's type), but the library does not validate that. For example, this produces a valid output:
1> Schema = "{ \"type\": \"long\", \"name\": \"field8\", \"default\":\"fff\" }".
2> avro_json_decoder:decode_schema(Schema).
{avro_primitive_type, "long", [{"default", "fff"}]}
But such a schema does not make any sense, as the string "fff" is not a valid value for the long type.
It would be better if the parser raised an error to signify that the given default value is invalid.
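A sketch of the kind of check the parser could run on primitive types (type names per the Avro spec; the module and function are hypothetical, and complex types would need their own rules):

```elixir
defmodule DefaultCheck do
  @moduledoc "Sketch: validate a JSON default against an Avro primitive type name."

  def valid_default?("null", v), do: is_nil(v)
  def valid_default?("boolean", v), do: is_boolean(v)
  def valid_default?(t, v) when t in ["int", "long"], do: is_integer(v)
  def valid_default?(t, v) when t in ["float", "double"], do: is_number(v)
  # Per the spec, bytes defaults are JSON strings too.
  def valid_default?(t, v) when t in ["bytes", "string"], do: is_binary(v)
  def valid_default?(_other, _v), do: false
end
```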
How do I create a binary with the schema as in interop.ocf?
Hi!
I am having some trouble decoding some OCF files with this client that we have no issues decoding with a Ruby client. If there is a better place to seek help, please let me know and close this issue!
I have 2 simple avro schemas:
post.avro
{
"type": "record",
"name": "post_event",
"fields": [
{ "name": "id", "type": "string" },
{ "name": "created_at", "type": "double" },
{ "name": "body", "type": "string" },
{ "name": "nsfw", "type": "boolean" },
{ "name": "url", "type": "string" }
]
}
post_was_loved.avro (stripped to smallest failing schema)
[
{
"type": "record",
"name": "post_was_loved",
"fields" : [
{ "name": "post", "type": "post" }
]
}
]
avro_ocf.decode_file/1 works fine with the first schema; however, when trying to import the second I get an error:
** (ErlangError) erlang error: {:unnamed_type, {:avro_union_type, {1, {0, "post_was_loved", nil, nil}}, {1, {"post_was_loved", {0, true}, nil, nil}}}}
(erlavro) /Users/alan/Ello/ello_event_stream/deps/erlavro/src/avro_schema_store.erl:161: :avro_schema_store.add_type/3
(erlavro) /Users/alan/Ello/ello_event_stream/deps/erlavro/src/avro_ocf.erl:65: :avro_ocf.decode_file/1
(Note I am using Elixir and the master branch of this repo)
I am not an avro (nor erlang!) expert so I am very likely doing something wrong, but any guidance or troubleshooting advice would be very appreciated!
Thanks for the hard work on this repo!
I was trying out erlavro for schema evolution. I'm not sure what the stance on evolution is here: is it not supported, or is there a way to support it that I'm not aware of? For example:
the schema (block-ethereum.avsc) has a lot of fields deleted (in contrast to the data provided, which was encoded using an earlier version of the schema). No fields are added, and none of the retained fields are modified in this new schema:
{
"type":"record",
"namespace":"com.covalenthq.brp.avro",
"name":"ReplicationSegment",
"fields":[
{
"name":"startBlock",
"type":"long"
},
{
"name":"endBlock",
"type":"long"
},
{
"name":"elements",
"type":"long"
},
{
"name":"codecVersion",
"type":"double",
"default":0.33
}
]
}
the encoded binary is here
When I try running the decoder:
iex> {:ok, scheman} = Avrora.Schema.Name.parse("block-ethereum")
iex> scheman1 = %Avrora.Schema{full_name: scheman.name}
iex> {:ok, schema} = Avrora.Resolver.resolve(scheman1.full_name)
iex> :avro_binary_decoder.decode(specimen, schema.full_name, schema.lookup_table, Avrora.AvroDecoderOptions.options())
** (MatchError) no match of right hand side value: {%{"codecVersion" => 4.3256427312130535e-37, "elements" => 24, "endBlock" => 66, "startBlock" => 1}, <<97, 100, 54, 99, 57, 57, 54, 51, 97, 56, 55, 52, 56, 48, 54, 97, 51, 102, 50, 51, 51, 49, 49, 50, 50, 99, 56, 50, 99, 54, 50, 55, 48, 49, 98, 55, 54, 48, 55, 50, 98, 48, 100, 55, 55, 50, 49, 56, ...>>}
(erlavro 2.9.8) /Users/sudeep/repos/rudder/deps/erlavro/src/avro_binary_decoder.erl:66: :avro_binary_decoder.decode/4
iex:6: (file)
How can I support backward or forward schema evolution here?
Thanks for the great project!
I am interested in implementing this myself, but I thought I'd raise it as an issue in case someone gets to it first. For Elixir clients of erlavro, nil needs to be swapped with :null on the encoding side, and on the decoding side :null should be swapped with nil.
Since this might not be desirable for all consumers, it seems this could just be an option passed to :avro.make_simple_encoder/2 and :avro.make_simple_decoder/2.
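Until such an option exists, a post-processing sketch over decoded output (erlavro returns records as key/value lists by default; module name hypothetical, and this does not cover every shape a decoder can produce):

```elixir
defmodule NullSwap do
  @moduledoc "Sketch: swap :null and nil in decoded / to-be-encoded Avro terms."

  # Decoding side: erlavro's :null becomes Elixir's nil.
  def to_nil(:null), do: nil
  def to_nil(list) when is_list(list), do: Enum.map(list, &to_nil/1)
  def to_nil({k, v}), do: {k, to_nil(v)}
  def to_nil(other), do: other

  # Encoding side: nil becomes :null before handing data to erlavro.
  def to_null(nil), do: :null
  def to_null(list) when is_list(list), do: Enum.map(list, &to_null/1)
  def to_null({k, v}), do: {k, to_null(v)}
  def to_null(other), do: other
end
```

Doing this as a built-in option would avoid the extra pass over every message.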
When I git clone this repo and run make on macOS, it shows an error while compiling jsone.erl:
rebar3 get-deps
===> Verifying dependencies...
rebar3 compile
===> Verifying dependencies...
===> Compiling jsone
===> Compiling _build/default/lib/jsone/src/jsone.erl failed
_build/default/lib/jsone/src/jsone.erl:294: erlang:get_stacktrace/0: deprecated; use the new try/catch syntax for retrieving the stack backtrace
_build/default/lib/jsone/src/jsone.erl:343: erlang:get_stacktrace/0: deprecated; use the new try/catch syntax for retrieving the stack backtrace
make: *** [compile] Error 1
I also tried building it on my Ubuntu server, whose Erlang/OTP version is 20, and there it builds successfully.
So I think you may need to use the new syntax to retrieve the stack backtrace.
I'm using erlavro to decode messages from Kafka that contain the following element structure:
{
"name":"Limit",
"type":["null",{
"type":"bytes",
"scale":2,
"precision":64,
"connect.version":1,
"connect.parameters":{"scale":"2"},
"connect.name":"org.apache.kafka.connect.data.Decimal",
"logicalType":"decimal"
}],
"default":null
}
The decode function fails trying to find a non-existent clause of avro_util:ensure_binary/1, which doesn't handle the object case ({"scale":"2"}).
Could you please help with this issue?
Hey! Just wanted to flag that the use of snappyer in recent versions seems to now require a C++ toolchain in order to build erlavro.
This is fixable at the end-user level but breaking in Docker environments. For example, with FROM hexpm/elixir:1.13.3-erlang-24.2.2-ubuntu-bionic-20210930 as builder, I'm now seeing a build error like:
10:33:49.099Z #13 40.78 ===> Compiling /app/deps/snappyer/c_src/snappy-sinksource.cc
10:33:49.099Z #13 40.78 ===> sh: exec: line 1: c++: not found
10:33:49.099Z #13 40.78
10:33:49.099Z #13 40.80 ** (Mix) Could not compile dependency :snappyer, "/root/.mix/rebar3 bare compile --paths /app/_build/test/lib/*/ebin" command failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile snappyer", update it with "mix deps.update snappyer" or clean it with "mix deps.clean snappyer"
10:33:49.099Z #13 ERROR: executor failed running [/bin/sh -c mix do deps.get --only $MIX_ENV, deps.compile]: exit code: 1
10:33:49.099Z ------
10:33:49.099Z > [ 9/15] RUN mix do deps.get --only test, deps.compile:
10:33:49.099Z ------
10:33:49.099Z executor failed running [/bin/sh -c mix do deps.get --only $MIX_ENV, deps.compile]: exit code: 1
I only flag this because I wasn't sure of the intentions and wanted to provide the datapoint. I went from 2.9.3 to 2.9.8 and saw this issue, which is unexpected for patch upgrades.
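In case it helps others hitting this, adding a C++ toolchain to the build stage works around it (Debian/Ubuntu package names; adjust for your base image):

```dockerfile
FROM hexpm/elixir:1.13.3-erlang-24.2.2-ubuntu-bionic-20210930 AS builder

# snappyer's NIF needs a C++ compiler and make
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
```

Still, documenting the new toolchain requirement (or keeping snappy optional) would avoid the surprise on a patch upgrade.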
I was looking to decode keys to atoms as an option in cogini/avro_schema@2c07ca2 and cogini/avro_schema#25.
However, as far as I can reason through the erlavro decoder hook code, I don't think it is possible to implement this as a decoder hook. In that case, it might make more sense to implement this feature directly in erlavro, and provide options similar to the Jason.decode/2 function.
I am happy to implement, but I would appreciate feedback if I'm wasting my time and this is easily implemented as a decoder hook. Thanks!
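The workaround I have today is a post-processing pass over the decoded result rather than a hook (module name hypothetical; it assumes binary keys and the proplist/map shapes erlavro emits):

```elixir
defmodule AtomKeys do
  @moduledoc "Sketch: convert decoded string keys to existing atoms after decoding."

  # String.to_existing_atom/1 raises on unknown keys instead of leaking
  # unbounded atoms — mirrors Jason's keys: :atoms! option.
  def atomize(map) when is_map(map) do
    Map.new(map, fn {k, v} -> {String.to_existing_atom(k), atomize(v)} end)
  end

  def atomize(list) when is_list(list), do: Enum.map(list, &atomize/1)
  def atomize({k, v}) when is_binary(k), do: {String.to_existing_atom(k), atomize(v)}
  def atomize(other), do: other
end
```

Having this inside erlavro would save a full traversal per decoded message.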
Trying to compile with mix triggered a deprecation error from crypto:
$ mix deps.compile erlavro
==> erlavro (compile)
src/avro_ocf.erl:134: crypto:rand_bytes/1 is deprecated and will be removed in a future release; use crypto:strong_rand_bytes/1
Compiling src/avro_ocf.erl failed:
ERROR: compile failed while processing /Users/dhembree/code/transformer/deps/erlavro: rebar_abort
** (Mix) Could not compile dependency :erlavro, "/Users/dhembree/.mix/rebar compile skip_deps=true deps_dir="/Users/dhembree/code/transformer/_build/dev/lib"" command failed. You can recompile this dependency with "mix deps.compile erlavro", update it with "mix deps.update erlavro" or clean it with "mix deps.clean erlavro"
This can be fixed by using strong_rand_bytes here:
https://github.com/klarna/erlavro/blob/master/src/avro_ocf.erl#L134
Can you please publish the latest releases to Hex.pm?