tarantool / avro-schema Goto Github PK

View Code? Open in Web Editor NEW

57.0 57.0 3.0 878 KB

Apache Avro schema tools for Tarantool

CMake 1.32% Lua 74.97% C 7.34% Makefile 0.71% Perl 0.35% Shell 5.31% C++ 8.68% Python 0.51% HTML 0.58% JavaScript 0.23%

avro-schema's Introduction

Tarantool

Tarantool is an in-memory computing platform consisting of a database and an application server.

It is distributed under BSD 2-Clause terms.

Key features of the application server:

Heavily optimized Lua interpreter with incredibly fast tracing JIT compiler, based on LuaJIT 2.1.
Cooperative multitasking, non-blocking IO.
Persistent queues.
Sharding.
Cluster and application management framework.
Access to external databases such as MySQL and PostgreSQL.
A rich set of built-in and standalone modules.

Key features of the database:

MessagePack data format and MessagePack based client-server protocol.
Two data engines: 100% in-memory with complete WAL-based persistence and an own implementation of LSM-tree, to use with large data sets.
Multiple index types: HASH, TREE, RTREE, BITSET.
Document oriented JSON path indexes.
Asynchronous master-master replication.
Synchronous quorum-based replication.
RAFT-based automatic leader election for the single-leader configuration.
Authentication and access control.
ANSI SQL, including views, joins, referential and check constraints.
Connectors for many programming languages.
The database is a C extension of the application server and can be turned off.

Supported platforms are Linux (x86_64, aarch64), Mac OS X (x86_64, M1), FreeBSD (x86_64).

Tarantool is ideal for data-enriched components of scalable Web architecture: queue servers, caches, stateful Web applications.

To download and install Tarantool as a binary package for your OS or using Docker, please see the download instructions.

To build Tarantool from source, see detailed instructions in the Tarantool documentation.

To find modules, connectors and tools for Tarantool, check out our Awesome Tarantool list.

Please report bugs to our issue tracker. We also warmly welcome your feedback on the discussions page and questions on Stack Overflow.

We accept contributions via pull requests. Check out our contributing guide.

Thank you for your interest in Tarantool!

avro-schema's People

Contributors

Stargazers

Watchers

Forkers

dlintw mejedi vikorzel

avro-schema's Issues

Unstable tests in ubuntu xenial

Here is make test output

taransible@sh6:~/avro/tarantool-avro$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.2 LTS
Release:	16.04
Codename:	xenial
taransible@sh6:~/avro/tarantool-avro$ make test
Running tests...
Test project /home/taransible/avro/tarantool-avro
    Start 1: ddt_tests
1/3 Test #1: ddt_tests ........................   Passed    0.24 sec
    Start 2: api_tests
2/3 Test #2: api_tests ........................   Passed    0.05 sec
    Start 3: buf_grow_test
3/3 Test #3: buf_grow_test ....................   Passed    0.06 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =   0.35 sec
taransible@sh6:~/avro/tarantool-avro$ make test
Running tests...
Test project /home/taransible/avro/tarantool-avro
    Start 1: ddt_tests
1/3 Test #1: ddt_tests ........................***Failed    0.26 sec
    Start 2: api_tests
2/3 Test #2: api_tests ........................   Passed    0.04 sec
    Start 3: buf_grow_test
3/3 Test #3: buf_grow_test ....................   Passed    0.04 sec

67% tests passed, 1 tests failed out of 3

Total Test time (real) =   0.34 sec

The following tests FAILED:
	  1 - ddt_tests (Failed)
Errors while running CTest
Makefile:61: recipe for target 'test' failed
make: *** [test] Error 8

ddt tests chance of failure ~= 30%

Looks like floating bug

Consistent use of avro-schema vs. avro_schema.

The rockspec name is avro-schema, so the module name should be the same.
All the examples should be updated.

avro-schema does not support default value for record

Subj.

it says: 'too complex'

`fixed` type nullability do not work

local json = require('json')
local schema, ok, res, handle
local schema = [[
        {"type":"record","name":"X","fields":
                [{"name":"f1","type":{"type":"fixed*","name":"ff","size":4}},
                {"name":"f2","type":"int"}]}]]
schema = json.decode(schema)
ok, handle = avro.create(schema)
local obj = json.decode('{"f2":1}')
ok, res = avro.validate(handle, obj)
print("validate", json.encode(res))

result

validate	"Field f1 missing"

Add tests

objects are GC-ed
resolver cache
flatten/unflatten API
different Avro schema features
error cases

Unions do not support preserve feature

local schema = {
    {
        type = "record",
        name = "X",
        fields = {
            { name = "f1", type = "int" }
        },
        extra = 1
    },
    "string",
    extra = 2
}
print(schema.extra)
ok, res = avro.create(schema, {preserve_in_ast={"extra"}})
schema, ok = avro.export(res)
print(schema.extra)

result

2
nil

Feature flag to allow arbitrary UTF-8 in enums

I'd like to be able to use UTF-8 in enums. This is strictly against what the avro standard, but enterprise customers need this.

Example:

#!/usr/bin/env tarantool

local avro = require('avro_schema')
local json = require('json')

local model = [[
[
    {
    "type": "enum",
    "name": "CyrillicEnum",
    "symbols": ["Значение 1","Значение 2","Значение 3"]
    }
]
]]


local dict = json.decode(model)

local ok, schema = avro.create(dict)

if not ok then
    error(schema)
end

Expected behavior: parsed and validated correctly

Actual behavior:

./avro_test.lua:22: <union>/<branch-1>/CyrillicEnum: Bad enum symbol name: Значение 1

In order to not disrupt regular users' workflow, I propose to hide it behind a feature flag.

Cyclic type declaration

Sometimes I need types that contain each other in their fields. Example:

{
        {
            name="typeA",
            type="record",
            fields={{name='fieldA', type='typeB'}}
        },
        {
            name="typeB",
            type="record",
            fields={{name='fieldB', type='typeA'}},
        }
}

Currently the library can't parse such schema, complaining that typeB is not defined.

xflatten fails to update two fields in a nested record

How to reproduce:

#!/usr/bin/tarantool

local avro = require('avro_schema')
local log = require('log')
local json = require('json')

local schema={
    name='foo',
    type='record',
    fields={
        {name='bar', type={
            name='bar_t',
            type='record',
            fields={
                {name='value_1', type='string'},
                {name='value_2', type='string'},
            }}  
        },  
    }   
}

local _, schema_handler = avro.create(schema)
local _, model = avro.compile(schema_handler)

-- uncomment to produce case 2
-- model.unflatten({"ccc", "ddd"})

local obj_partial = {bar={value_1="aaa",value_2="bbb"}}
local ok, exps = model.xflatten(obj_partial)
log.info('ok: ' .. json.encode(ok))
log.info('exps: ' .. json.encode(exps))

Example of output:

./avro_case_3.lua: ok: false
./avro_case_3.lua: exps: "Internal error: unknown code"

Or with uncommented unflatten:

./avro_case_3.lua: ok: true
./avro_case_3.lua: exps: [["=",2,"bbb"],"value_2",["=",1,"aaa"]]

gdb output after break in unparse_msgpack() ('unknown code' case, w/ commented unflatten):

(gdb) p nitems
$1 = 10
(gdb) p state->ot
$4 = (uint8_t *) 0x9427d0 "\v\v\022\004\b"
(gdb) p state->ot+5
$9 = (uint8_t *) 0x9427d5 ""
(gdb) p state->ot+6
$10 = (uint8_t *) 0x9427d6 "\v\022\004\b"
(gdb) p *typeid
$14 = 0 '\000'

I will follow up with temporary workarounds in the comments.

Double definition of field with the same name leads to unintelligible error

This is what I do:

#!/usr/bin/env tarantool

local avro = require('avro_schema')
local json = require('json')

local model = [[
[
    {
        "name": "typeA",
        "type": "record",
        "fields": [
            {"name": "foo", "type": "int"},
            {"name": "foo", "type": "int"}
        ]
    }
]
]]


local dict = json.decode(model)

local ok, schema = avro.create(dict)

if not ok then
    error(schema)
end

Actual result:

./avro_test.lua:25: .../scratch/.rocks/share/tarantool/avro_schema/frontend.lua:508: attempt to index local 'next_node' (a number value)

Expected result:

An intelligible message saying that field 'foo' has been declared multiple times.

Incorrect behavior of `get_types()` for complex types

avro.get_types() на такую схему

{
   "type":"record",
   "fields":[
         {"type":"long","name":"individual_id"},
         {
              "type":[
                    "string",
                     {"type":"array","items":"string"}
                ],
                "name":"last_name"
          },
          {"type":"string","name":"first_name"}
    ],
    "name":"individual"
}

возвращает ["long",null,null,"string"]

Permit forward type references

Please implement a flag that will allow to reference types that are declared below their usage point.

This is needed for business customers to simplify schema definition

Investigate random test failures

validate/196: FAILED (schema.validate: "Not a \"Hello!\": Hello!" instead of "Not a double: Hello!")

Presumably LuaJIT bug.

can not use complex type in record fields

tarantool> schema

type: record
name: test
fields:
- type: string
  name: data
- type: enum
  name: status
  symbols:
  - A
  - B
    ...

tarantool> avro.create(schema)

false
'test/status: Unknown Avro type: enum'
...

tarantool>

but here create call is expected to success

Implement schema fingerprinting and parsing canonical form

Please refer to the spec: https://avro.apache.org/docs/1.8.1/spec.html#Schema+Fingerprints

Fingerprinting is necessary to tag objects with a schema ID to figure out which schema was used to produce them

get_schema_names inconsistency

Have problem with nested records:
Output

---
- a.b
- a.c
- d
- e
- f
- j
...

path a is alse valid =(

`get_names()`, `get_types()` do not take service fields into account

Service fields that are used for flattening and unflattening are not considered in mentioned functions

Nullability feature does not work for `get_names()` and `get_types()`

Schema:

{
        "type": "record",
        "name": "person",
        "fields": [
            {"name": "uid", "type": "long"},
            {"name": "last_name", "type": "string*"},
            {"name": "first_name", "type": "string"},
            {"name": "additional_name", "type": "string*"}
        ]
 }

get_types(handle): ["long","string","string","string"]
get_names(handle): ["uid","last_name","first_name","additional_name"]

I proposed changes for get_types() in commit 6bef3cb

flatten({}) fails on a 'record' schema with all fields have default values

#!/usr/bin/tarantool

local avro = require('avro_schema')
local yaml = require('yaml')

local schema={
    name='example',
    type='record',
    fields={
        {name='f1', type='long', default=0},
        {name='f2', type='long', default=0},
        {name='f3', type='long', default=0},
    }   
}

local _, created = avro.create(schema)
local _, compiled = avro.compile(created)
local _, fl = compiled.flatten({})
print(yaml.encode(fl))

Expected:

--- [0, 0, 0]
...

Got:

--- Expecting MAP, encountered ARRAY
...

Generate tuple:update() operation from a sequence of schema field updates

Make it possible to assign a value of a Tarantool tuple without actually fetching and de-flattening it, but purely based on its schema, e.g. when I want to set a single field.

Create Tarantool update statement from input

xflatten method does not work for nullable fields

We have schema as follows:

    "user": {
        "type": "record",
        "name": "service",
        "fields": [
            {"name": "bar", "type": "long*"}
            {
                  "name": "nested",
                  "type": {
                      "name": "nested",
                      "type": "record",
                      "fields": [{ "name":"foo", "type": "long*" }]
                   }
             }
        ],
     }

Does not work neither for normal nor nested fields:

tarantool> compiled.flatten({nested={foo=100}, bar=200})
---
- true
- [1, 200, 1, 100]
...

tarantool> compiled.xflatten({nested={foo=100}, bar=200})
---
- true
- [['=', 2, 1], 100, ['=', 1, 1], 200]
...

My dirty hack:

local function good_xflatten(schema, data)
    local ok, fields = schema.compiled.xflatten(data)
    if ok then
        local is_incorrect = false
        for _, field in ipairs(fields) do
            if type(field) ~= 'table' then
                is_incorrect = true
            end
        end
        if is_incorrect then
            fields = recalculate_fields_in_xflatten(schema, data)
        end
    end
    return ok, fields
end

local function recalculate_fields_in_xflatten(schema, data)
    local field_names = avro.get_names(schema.pure)
    local field_types = avro.get_types(schema.pure)
    local names, vals = keys_and_vals_from_data(data)
    local result = {}
    for id, name in ipairs(names) do
        local start_position = 4
        local offset = 0
        for i, field in ipairs(field_names) do
            if string.endswith(field_types[i], '*') then
                -- take nullability offset into account
                offset = offset + 1
            end
            if field == name then
                if string.endswith(field_types[i], '*') then
                    -- previous field is a type value
                    -- (offset - 1) since we have to exclude itself
                    local type_field_value = vals[id] == box.NULL and 0 or 1
                    table.insert(result, { "=", start_position + i + offset - 1, type_field_value })
                end
                table.insert(result, { "=", start_position + i + offset, vals[id] })
            end
        end
    end
    return result
end

namespace should be ignored in presence of fullname

schema

{
    type = "record",
    name = "ns2.ns3.system_settings_protected",
    namespace = "ns1",
...
}

is parsed like

{
    type = "record",
    name = "ns1.system_settings_protected"
...
}

and according to specification should be parsed as

{
    type = "record",
    name = "ns2.ns3.system_settings_protected"
...
}

No support for optional fields

Schema:

{
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "username", "type": "string"},
            {"name": "phone", "type": "long"},
            {"name": "age", "type": ["int", "null"]}
        ]
}

and given data:

{
 "username": "tester",
 "phone": 123456789,
 "age": 123
}

returns on validator:

[
  "age: Expecting NIL or MAP, encountered LONG"
]

It should parse correctly, because Avro schema is correct.

luarocks --local install doesn't work

$ luarocks install --local avro-schema
Installing http://rocks.tarantool.org/avro-schema-scm-1.rockspec...
Using http://rocks.tarantool.org/avro-schema-scm-1.rockspec... switching to 'build' mode
Cloning into 'avro-schema'...
remote: Counting objects: 99, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 99 (delta 14), reused 64 (delta 14), pack-reused 0
Receiving objects: 100% (99/99), 120.14 KiB | 0 bytes/s, done.
Resolving deltas: 100% (14/14), done.
Checking connectivity... done.
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find TARANTOOL (missing: TARANTOOL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  FindTarantool.cmake:26 (find_package_handle_standard_args)
  CMakeLists.txt:9 (find_package)


-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_avro-schema-scm-1-4506/avro-schema/CMakeFiles/CMakeOutput.log".

Error: Build error: Failed cmake.

Compilation fails on ubuntu precise

[ubuntu-precise] cc1plus: error: unrecognized command line option ‘-std=c++11’

Nullability is broken when inside of array of records

This testcase fails to flatten/validate:

kyukhin@claudius:/export/kyukhin/tarantool/tmp$ cat 1.lua
#!/usr/bin/env tarantool

local yaml = require 'yaml'
local avro_schema = require 'avro_schema'

schema = yaml.decode([[
--- [
{
  'type': 'record', 'name': 'S1', 'fields': [
    {'name': 'A', 'type':
      {'type': 'array', 'items':
        { 'type': 'record', 'name': 'S2', 'fields': [
          {'name': 'b', 'type': 'string*'},
          {'name': 'a', 'type': 'string'}
        ]}
     }
   }]
}
]
...
]])

object = yaml.decode([[
---
  A:
  -
   a: "g"

...]])

local ok, handle = avro_schema.create(schema)

if not ok then
    error(handle)
end

local ok, ast = avro_schema.compile({handle, dump_il='il', dump_src='ir.lua', debug=true, enable_loop_peeling=false, enable_fast_strings=false})

local ok, f = ast.flatten({S1=object})

if not ok then
    error(f)
end

Enabled debug info in runtime gives:

kyukhin@claudius:/export/kyukhin/tarantool/tmp$ ./avro_test.lua
parse_msgpack; s: 81 A2 53 31 81 A1 41 91 81 A1 61 A1 67
unparse_msgpack; *typeid: 0x00 (0) -- (zero); value: 1041
unparse_msgpack; *typeid: 0x0B (11) -- PUTARRAYC; value: 2
unparse_msgpack; *typeid: 0x04 (4) -- PUTINT / PUTLONG; value: 0
unparse_msgpack; *typeid: 0x0B (11) -- PUTARRAYC; value: 1
unparse_msgpack; *typeid: 0x0B (11) -- PUTARRAYC; value: 1
unparse_msgpack; *typeid: 0x0B (11) -- PUTARRAYC; value: 3
unparse_msgpack; *typeid: 0x00 (0) -- (zero); value: 0
unparse_msgpack; *typeid: 0x00 (0) -- (zero); value: 0
unparse_msgpack; *typeid: 0x08 (8) -- PUTSTR; value: 1
00; unparse_msgpack; in for; *typeid: 0x0B (11) -- PUTARRAYC; value: 2
01; unparse_msgpack; in for; *typeid: 0x04 (4) -- PUTINT / PUTLONG; value: 0
02; unparse_msgpack; in for; *typeid: 0x0B (11) -- PUTARRAYC; value: 1
03; unparse_msgpack; in for; *typeid: 0x0B (11) -- PUTARRAYC; value: 1
04; unparse_msgpack; in for; *typeid: 0x0B (11) -- PUTARRAYC; value: 3
05; unparse_msgpack; in for; *typeid: 0x00 (0) -- (zero); value: 0
./1.lua:42: Internal error: unknown code (0)

docs for fingerprints

fingerprints & additional attributes in ast should be documented

avro_schema.export(handle) must not change type schema for nullable fields

We define following schema

schema = json.decode([[
     {"name": "foo", "type": "record", "fields": [{"name": "bar", "type": "long*"}]}
]])
ok, handle = avro.create(schema)

After export nullable field's type will be replaced with a table { type, nullable }

avro.export(handle)
---
- type: record
  fields:
  - type:
      type: long
      nullable: true
    name: bar
  name: foo
...

This behavior is unobvious and follows internal representation which is unsuitable for external API.

It should return type: long*

Get rid of non-compliant nullable fields

Support of nullable fields is already present in avro like this: ["null", "Type"].
We can tune our "wire format" to not require a type tag in case when a union contains 2 types, one of which is null.

I'm against the asterisk symbol, because it makes our schema non-compliant with the standard.

Not a float: 452.56

I've got a strange report that valid nullable float is not recognized as a float:

#!/usr/bin/env tarantool

local avro = require('avro_schema')
local json = require('json')

local model = [[
[
    {
    "name": "Foo",
    "type": "record",
    "fields": [{"name": "bar", "type": "float*"}]
    }
]
]]


local dict = json.decode(model)

local ok, schema = avro.create(dict)

if not ok then
    error(schema)
end


local obj = {bar=452.56}

local ok, normalized = avro.validate(schema, {Foo=obj})

if not ok then
    error(normalized)
end

Non valid fingerprint returned in case of type refs

Type references are represented in AST as tables of the type itself.
Fingerprint relies directly on AST, which leads to wrong hash in case of type references.
(Instead of reference the hash of the definition is calculated)

Evaluate performance

Strange structure sizes

Found that sizes of State and schema_rt_State structures are not equal.

struct schema_rt_State {
    size_t                    t_capacity;
    size_t                    ot_capacity;
    size_t                    res_capacity;
    size_t                    res_size;
    uint8_t                  *res;
    const uint8_t            *b1;
    union {
        const uint8_t        *b2;
        const uint16_t       *b2_16;
        const uint32_t       *b2_32;
    };
    uint8_t                  *t;
    struct schema_rt_Value   *v;
    uint8_t                  *ot;
    struct schema_rt_Value   *ov;
    int32_t                   k;
};

struct State {
    size_t             t_capacity;   // capacity of t/v   bufs (items)
    size_t             ot_capacity;  // capacity of ot/ov bufs (items)
    size_t             res_capacity; // capacity of res   buf
    size_t             res_size;
    uint8_t           *res;      // filled by unparse_msgpack, others
    const uint8_t     *b1;       // bank1: input data
    const uint8_t     *b2;       // bank2: program constants
    uint8_t           *t;        // filled by parse_msgpack
    struct Value      *v;        // .......................
    uint8_t           *ot;       // consumed by unparse_msgpack
    struct Value      *ov;       // ...........................
};

We can add debug output in parse_msgpack and unparse_msgpack:

parse -> State sizeof = 88, Rt State sizeof = 96
unparse -> State sizeof = 88, Rt State sizeof = 96

nullable type reference exported as type definition

snippet

local json = require('json')
local schema = {
    name = "X",
    type = "record",
    fields = {
        {
            name = "first",
            type = {
                name = "first",
                type = "fixed",
                size = 16
            }

        },
        {
            name = "second",
            type = "first*"
        }
    }
}
print(json.encode(schema))
ok, res = avro.create(schema, {deferred_definition=true})
schema = avro.export(res)
print(json.encode(schema))
ok, res = avro.create(schema, {deferred_definition=true})
print(ok, res)

result

false	X/second/<fixed>: Type name already defined: first

Reason:
Export function uses type tables as keys to determine if it is already exported, however if type definition and type reference have diferent nullabilities, deepcopy function is called and tables have different addresses.

Implement a universal JSON verificator

The following features desired to be parts of a universal JSON verificator and does not supported by avro-schema:

~~Nullability of an object field~~ (resolved).
Unions w/o an explicit type specification:
2.1. A union as a field type.
2.2. A union as part of fields list, kinda this: {f1, f2, {{f3a, f4a}, {f3b, f4b}}, f5}.
~~removed~~
Support for custom types, type inheritance and type validators:
4.1. Each custom type must (?) have a parent, which is a basic type (like int, string) or an other custom type.
4.2. Each custom type must (?) have validator in form of a PCRE regexp or a function returns ok, err.
4.3. Validate length of an array by min / max bounds.
Support for constant field values.
A string contains hyphen as a name of an object field.
Handle the null field value and absence of the value as different cases.
Construct a schema by parts: match the same sub-schemas instead of giving an error re names clash.

The first one point (nullability) is most annoing lack of a feature.

The features will extend avro-schema standard.

Nested record bug

Found problem with nested records schema. Here is test case, we got error if deep >= 32. Also you can uncomment array and map, after that you will receive error if deep >= 30

local avro = require('avro_schema')
local fiber = require('fiber')
local yaml = require('yaml')

function deepcopy(orig)
    local orig_type = type(orig)
    local copy
    if orig_type == 'table' then
        copy = {}
        for orig_key, orig_value in next, orig, nil do
            copy[deepcopy(orig_key)] = deepcopy(orig_value)
        end
        setmetatable(copy, deepcopy(getmetatable(orig)))
    else -- number, string, boolean, etc
        copy = orig
    end
    return copy
end

local deep = 31
local pattern = {
    type='record',
    name='nested',
    fields={
         {name="test", type="long"},
         --{name="test2", type='string'},
         --{name="test3", type={type='array', items='string'}},
         --{name="test4", type={type='map', values='string'}}
    }
}

local data_pattern = {
    test=12345678,
    --test2="qwertyuioasdfghjkzxcvbn",
    --test3 = {'qwe4567', 'qwe4567','qwe4567','qwe4567','qwe4567','qwe4567','qwe4567','qwe4567'},
    --test4 = {hello="world"}
}

local schema = deepcopy(pattern)
local prev = schema

for i=1,deep do
    table.insert(prev.fields, {name='nested', type=deepcopy(pattern)})
    prev.name='nested' .. tostring(i)
    prev = prev.fields[#prev.fields].type
end

local _, sh = avro.create(schema)
local ok, compiled = avro.compile({sh})

local data = deepcopy(data_pattern)
local ptr = data

for i=1,deep do
    ptr['nested'] = deepcopy(data_pattern)
    ptr = ptr.nested
end
local _, tuple = compiled.flatten(data)
print(require('yaml').encode(tuple))

deep=31 output:

$ tarantool test.lua 
--- [12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678,
  12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678,
  12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678,
  12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678, 12345678]
...

deep=32 output:

tarantool test.lua 
parse -> State sizeof = 88, Rt State sizeof = 96
--- 'nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested/nested:
  Expecting MAP, encountered NIL'

More elaborate flattening

{ foo = 42, bar = { x = -19, y = 7} } —> {42, -19, 7}

Avro serialization and deserialization

Hi, thanks for your effort. I am looking for a Lua library to do Avro binary serialization and deserialization. After going through your library I noticed that it only deals with MsgPack array. The decoder doesn't even use the schema. It appears the current architecture is not capable of processing the Avro binary records, especially on the decoding side.
Correct me if I am wrong.

Can't reload code of avro-schema

When I set package.loaded['avro-schema'] = nil and attempt to require() it again, I get:

tarantool> ---
- error: '...antool/ib-core/.rocks/share/tarantool/avro_schema/il.lua:12: attempt
    to redefine ''schema_il_Opcode'''
...

UPD: ability to do hot code reload is very important for my development workflow

Bug with default values of `null` type

Default values are ignored for null typed fields and unions.
With other types, like int it's working as expected

tarantool>  ok, schema = avro_schema.create {
    type = "record",
    name = "test",
    fields = {
      { name = "bar", type = "null", default = msgpack.NULL },
      { name = "foo", type = {"int", "null"}, default = msgpack.NULL }
    }
}
 
 
tarantool>  avro_schema.validate(schema, { foo = { int = 42 } })
---
- false
- Field bar missing
...
 
 
tarantool> avro_schema.validate(schema, { bar = msgpack.NULL })
---
- false
- Field foo missing
...

A type that allows any valid lua table

Please add a type that will allow to declare fields that can store arbitrary Lua atomic values or tables.

This is required by our business customers who sometimes need to store unstructured data in object fields.

Nullable types inside of arrays

Here is a simple test case which should not fail, but it does:

#!/usr/bin/env tarantool

box.cfg{wal_mode = "none"}
local json = require("json")
local avro = require('avro_schema')
local msgpack = require('msgpack')
local jprint = function(data)
    print(json.encode(data))
end
local schema = {
    type="array",
    items="long"
}
local data = {3,2,1}
local ok, ash, r, fs
local flattened
ok, ash = avro.create(schema)
assert(ok)
ok, r = avro.validate(ash, data)
assert(ok)
ok, fs = avro.compile(ash)
ok, r = fs.flatten(data)
print("flatten", json.encode(r))
ok, r = fs.unflatten(r)
print("unflatten", json.encode(r))

schema = {
    type="array",
    items="long*"
}
ok, ash = avro.create(schema)
assert(ok)
ok, r = avro.validate(ash, data)
assert(ok)
ok, fs = avro.compile(ash)
ok, r = fs.flatten(data)
print("flatten", json.encode(r))
ok, r = fs.unflatten(r)
print("unflatten", json.encode(r))

schema =  {
    type = "array",
    items = {
        type = "record",
        name = "lol",
        fields = {
            {
                type="long",
                name="f1"
            },
            {
                type="long",
                name="f2"
            }
        }
    }
}
data = {{f1=1,f2=2}, {f1=1,f2=2}}
ok, ash = avro.create(schema)
assert(ok)
ok, r = avro.validate(ash, data)
assert(ok)
ok, fs = avro.compile(ash)
ok, r = fs.flatten(data)
print("flatten", json.encode(r))
ok, r = fs.unflatten(r)
print("unflatten", json.encode(r))

schema =  {
    type = "array",
    items = {
        type = "record",
        name = "lol",
        fields = {
            {
                type="long",
                name="f1"
            },
            {
                type="long*",
                name="f2"
            }
        }
    }
}
ok, ash = avro.create(schema)
assert(ok)
ok, r = avro.validate(ash, data)
assert(ok)
ok, fs = avro.compile(ash)
ok, r = fs.flatten(data)
print("flatten", json.encode(r))
ok, r = fs.unflatten(r)
print("unflatten", json.encode(r))

os.exit()