This project forked from pascaldekloe/colfer

Schema-based binary format

License: Other

Colfer

Colfer is a schema-based binary data format optimized for speed and size.

The project's compiler colf(1) generates source code from schema definitions to marshal and unmarshal data structures.

This is free and unencumbered software released into the public domain. The format is inspired by Protocol Buffers.

Features

  • Simple and straightforward to use
  • Support for: Go, Java and ECMAScript/JavaScript
  • No dependencies other than the core library
  • Both faster and smaller than: Protocol Buffers, FlatBuffers and MessagePack
  • The generated code is human-readable
  • Configurable data limits with sane defaults (memory protection)
  • Maximum of 127 fields per data structure
  • No support for enumerations
  • Framed; suitable for concatenation/streaming

TODOs

  • RMI
  • Lists for numbers, timestamps and binaries

Use

Download a prebuilt compiler or run go get -u github.com/pascaldekloe/colfer/cmd/colf to make one yourself.

NAME
	colf — compile Colfer schemas

SYNOPSIS
	colf [-b <dir>] [-p <path>] <language> [<file> ...]

DESCRIPTION
	Generates source code for the given language. The options are: Go,
	Java and ECMAScript.
	The file operands are processed in command-line order. If file is
	absent, colf reads all ".colf" files in the working directory.

  -b string
	Use a specific destination base directory. (default ".")
  -p string
	Adds a package prefix. Use slash as a separator when nesting.

BUGS
	Report bugs at https://github.com/pascaldekloe/colfer/issues

SEE ALSO
	protoc(1)

It is recommended to commit the generated source code to version control. Maven users may disagree.

Schema

Data structures are defined per package in .colf files. The format is largely self-explanatory.

package example

type member struct {
	id     int64
	name   text
	joined timestamp
	avatar binary
	allies []member
}

The following table shows how Colfer data types are applied per language.

Colfer     ECMAScript     Go         Java
bool       Boolean        bool       boolean
uint32     Number         uint32     int †
uint64     Number ‡       uint64     long †
int32      Number         int32      int
int64      Number ‡       int64      long
float32    Number         float32    float
float64    Number         float64    double
timestamp  Date + Number  time.Time  java.time.Instant
text       String ‡‡      string     java.lang.String ††
binary     Uint8Array     []byte     byte[]
  • † signed representation of the unsigned data
  • ‡ range limited to (1 - 2⁵³, 2⁵³ - 1)
  • †† characters limited by UTF-16 (U+0000, U+10FFFF)
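
As a rough illustration of the Go column, the member struct from the example schema could correspond to a Go type along the following lines. This is a hand-written sketch: the exported names and the use of pointers for the allies list are assumptions, and the code that colf actually generates may look different.

```go
package main

import (
	"fmt"
	"time"
)

// Member approximates the Go type for the "member" struct of the
// example schema. Names and the pointer slice are assumptions made
// for illustration; this is not colf output.
type Member struct {
	ID     int64     // id     int64
	Name   string    // name   text
	Joined time.Time // joined timestamp
	Avatar []byte    // avatar binary
	Allies []*Member // allies []member
}

func main() {
	m := Member{ID: 42, Name: "Ada", Joined: time.Unix(0, 0).UTC()}
	m.Allies = append(m.Allies, &Member{ID: 43, Name: "Grace"})
	fmt.Println(m.Name, "has", len(m.Allies), "ally")
}
```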

Compatibility

Name changes do not affect the serialization format. Deprecated fields can be renamed to clearly discourage their use.

The following changes are backward compatible.

  • Adding new fields to the end of Colfer structs
  • Raising the bit size (which is actually just a limit) of signed integers

Performance

% go test -bench .
PASS
BenchmarkMarshal-8               	20000000	        96.7 ns/op	      52 B/op	       1 allocs/op
BenchmarkMarshalProtoBuf-8       	20000000	       104 ns/op	      52 B/op	       1 allocs/op
BenchmarkMarshalFlatBuf-8        	 1000000	      1030 ns/op	     472 B/op	      12 allocs/op
BenchmarkUnmarshal-8             	10000000	       120 ns/op	      84 B/op	       2 allocs/op
BenchmarkUnmarshalProtoBuf-8     	10000000	       151 ns/op	      84 B/op	       2 allocs/op
BenchmarkUnmarshalFlatBuf-8      	10000000	       205 ns/op	      84 B/op	       2 allocs/op
BenchmarkMarshalReuse-8          	30000000	        43.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkMarshalProtoBufReuse-8  	30000000	        53.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkMarshalFlatBufReuse-8   	 5000000	       312 ns/op	       0 B/op	       0 allocs/op
BenchmarkUnmarshalReuse-8        	20000000	        72.5 ns/op	      20 B/op	       1 allocs/op
BenchmarkUnmarshalProtoBufReuse-8	20000000	       111 ns/op	      20 B/op	       1 allocs/op
BenchmarkUnmarshalFlatBufReuse-8 	10000000	       167 ns/op	      20 B/op	       1 allocs/op
ok  	github.com/pascaldekloe/colfer	21.206s

For Java the numbers look even better.

Running testdata.bench.bench
20M unmarshals avg 67ns
20M marshals avg 49ns
20M marshals with buffer reuse avg 34ns

Format

Data structures consist of zero or more field value definitions followed by a termination byte 0x7f. Only those fields with a value other than the zero value may be serialized. Fields appear in order as stated by the schema.

The zero value for booleans is false, integers: 0, floating points: 0.0, timestamps: 1970-01-01T00:00:00.000000000Z, text & binary: the empty string, nested data structures: null and an empty list for data structure lists.

Data is represented in a big-endian manner. The format relies on varints, also known as variable-length quantities.
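
For readers unfamiliar with varints, the general idea can be tried out with Go's standard encoding/binary package: each byte carries 7 bits of payload and a continuation bit. This is only a sketch of the concept. The helper names are made up here, and the exact Colfer layout (the order of the 7-bit groups and the shortened final byte for 64-bit values) is not reproduced.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVarint returns x in the varint form used by Go's standard
// library. It stands in for the general technique only and is not
// claimed to be byte-compatible with Colfer.
func encodeVarint(x uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, x)
	return buf[:n]
}

// decodeVarint reads one varint and reports how many bytes it used.
func decodeVarint(data []byte) (value uint64, n int) {
	return binary.Uvarint(data)
}

func main() {
	b := encodeVarint(300)
	fmt.Printf("% x\n", b) // ac 02
	v, n := decodeVarint(b)
	fmt.Println(v, n) // 300 2
}
```

Small values take a single byte, which is where much of the size advantage over fixed-width integers comes from.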

Value Definition

Each definition starts with an 8-bit header. The 7 least significant bits identify the field by its (0-based position) index in the schema. The most significant bit is used as a flag.
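
The header layout described above can be sketched in Go with two small helpers. The function and constant names are hypothetical. Note that the termination byte 0x7f occupies the last 7-bit index value, which fits the limit of 127 fields per data structure.

```go
package main

import "fmt"

const (
	flagBit   = 0x80 // most significant bit: the flag
	indexMask = 0x7f // 7 least significant bits: the field index
)

// makeHeader packs a 0-based field index and the flag into one byte.
func makeHeader(index int, flag bool) byte {
	h := byte(index) & indexMask
	if flag {
		h |= flagBit
	}
	return h
}

// splitHeader is the inverse of makeHeader.
func splitHeader(h byte) (index int, flag bool) {
	return int(h & indexMask), h&flagBit != 0
}

func main() {
	h := makeHeader(5, true)
	fmt.Printf("%#02x\n", h) // 0x85
	fmt.Println(splitHeader(h))
}
```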

Boolean occurrences set the value to true.

Integers are encoded as varints. The header flag indicates negative for signed types and fixed size for unsigned types. The tenth byte for 64-bit integers is skipped for encoding since its value is fixed to 0x01.
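
A signed integer field could then be written roughly as follows. The sketch assumes that a negative value is stored as its magnitude with the header flag set, and it borrows the varint form from Go's standard library, so the output is not claimed to match Colfer byte for byte. The function name is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendInt64Field appends a signed integer field to buf.
// Assumptions: negatives are stored as magnitude plus header flag,
// and the standard library varint stands in for Colfer's own.
func appendInt64Field(buf []byte, index byte, v int64) []byte {
	if v == 0 {
		return buf // zero values are not serialized
	}
	header := index & 0x7f
	magnitude := uint64(v)
	if v < 0 {
		header |= 0x80         // flag marks the value as negative
		magnitude = uint64(-v) // wraps correctly for the minimum int64
	}
	buf = append(buf, header)
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], magnitude)
	return append(buf, tmp[:n]...)
}

func main() {
	fmt.Printf("% x\n", appendInt64Field(nil, 3, -5)) // 83 05
	fmt.Printf("% x\n", appendInt64Field(nil, 3, 5))  // 03 05
}
```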

Floating-point values are encoded in conformance with IEEE 754.

Timestamps are encoded as a 32-bit unsigned integer for the number of seconds that have elapsed since 00:00:00 UTC, Thursday, 1 January 1970, not counting leap seconds. When the header flag is set then the number of seconds is encoded as a 64-bit two's complement integer. In both cases the value is followed with 32 bits for the nanosecond fraction. Note that the first two bits are not in use (reserved).
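
The two timestamp layouts can be sketched in Go as shown below. The function name is hypothetical and only the payload after the header byte is produced. The returned flag is the value the header's most significant bit would take.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"time"
)

// timestampPayload encodes seconds since the Unix epoch plus a
// nanosecond fraction, big-endian. The compact 8-byte form is used
// when the seconds fit in 32 unsigned bits; otherwise the seconds
// take 64 bits (two's complement) and flag is true.
func timestampPayload(sec int64, nsec uint32) (payload []byte, flag bool) {
	if sec >= 0 && sec <= math.MaxUint32 {
		payload = make([]byte, 8)
		binary.BigEndian.PutUint32(payload, uint32(sec))
		binary.BigEndian.PutUint32(payload[4:], nsec)
		return payload, false
	}
	payload = make([]byte, 12)
	binary.BigEndian.PutUint64(payload, uint64(sec))
	binary.BigEndian.PutUint32(payload[8:], nsec)
	return payload, true
}

func main() {
	t := time.Date(2016, time.January, 1, 0, 0, 0, 0, time.UTC)
	p, flag := timestampPayload(t.Unix(), uint32(t.Nanosecond()))
	fmt.Printf("% x flag=%v\n", p, flag)

	before := time.Date(1969, time.December, 31, 23, 59, 59, 0, time.UTC)
	p, flag = timestampPayload(before.Unix(), uint32(before.Nanosecond()))
	fmt.Printf("% x flag=%v\n", p, flag)
}
```

A nanosecond fraction never exceeds 999,999,999, which is below 2³⁰, so the two most significant bits of that field are always zero.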

The data for text and binaries is prefixed with a varint byte size declaration. Text is encoded as UTF-8.
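
The size prefix counts bytes, not characters, which matters for non-ASCII text. A minimal sketch with hypothetical helper names, again using the standard library varint as a stand-in for Colfer's own:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendText writes the UTF-8 byte length of s as a varint,
// followed by the bytes of s.
func appendText(buf []byte, s string) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], uint64(len(s)))
	buf = append(buf, tmp[:n]...)
	return append(buf, s...)
}

// readText reads one size-prefixed text and returns the remainder.
func readText(data []byte) (s string, rest []byte, err error) {
	size, n := binary.Uvarint(data)
	if n <= 0 {
		return "", nil, errors.New("malformed size prefix")
	}
	data = data[n:]
	if size > uint64(len(data)) {
		return "", nil, errors.New("size exceeds available data")
	}
	return string(data[:size]), data[size:], nil
}

func main() {
	buf := appendText(nil, "héllo") // 5 characters, 6 bytes in UTF-8
	fmt.Printf("% x\n", buf)
	fmt.Println(readText(buf))
}
```

Checking the declared size against the remaining input before slicing is one simple form of the memory protection mentioned under Features.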

Lists of objects and strings are prefixed with a varint element size declaration.
