go-faster / jx

json encoding and decoding

License: MIT License

jx's Introduction

jx

Package jx implements encoding and decoding of json [RFC 7159]. Lightweight fork of jsoniter.

go get github.com/go-faster/jx

Features

  • Mostly zero-allocation and highly optimized
  • Directly encode and decode json values
  • No reflect or interface{}
  • Pools and direct buffer access for fewer (or no) allocations
  • Multi-pass decoding
  • Validation

See usage for examples. Mostly suitable for fast, low-level json manipulation with a high degree of control, and for dynamic parsing and encoding of unstructured data. Used in the ogen project for json (un)marshaling code generation based on json and OpenAPI schemas.

For example, consider the following OpenTelemetry log entry:

{
  "Timestamp": "1586960586000000000",
  "Attributes": {
    "http.status_code": 500,
    "http.url": "http://example.com",
    "my.custom.application.tag": "hello"
  },
  "Resource": {
    "service.name": "donut_shop",
    "service.version": "2.0.0",
    "k8s.pod.uid": "1138528c-c36e-11e9-a1a7-42010a800198"
  },
  "TraceId": "13e2a0921288b3ff80df0a0482d4fc46",
  "SpanId": "43222c2d51a7abe3",
  "SeverityText": "INFO",
  "SeverityNumber": 9,
  "Body": "20200415T072306-0700 INFO I like donuts"
}

The flexibility of jx enables highly efficient semantic-aware encoding and decoding, e.g. using [16]byte for TraceId with zero-allocation hex encoding in json:

Name      Speed      Allocations
Decode    1279 MB/s  0 allocs/op
Validate  1914 MB/s  0 allocs/op
Encode    1202 MB/s  0 allocs/op
Write     2055 MB/s  0 allocs/op

cpu: AMD Ryzen 9 7950X

See otel_test.go for example.
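
As a rough illustration of what "semantic-aware" means here, the sketch below parses the TraceId hex string from the log entry above straight into a [16]byte, using only the standard library's encoding/hex. It does not use jx at all: the TraceID type and the parseTraceID and appendTraceID helpers are hypothetical names chosen for this example, and the project's own version is the one in otel_test.go.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// TraceID is a hypothetical fixed-size representation of the "TraceId" field.
type TraceID [16]byte

var errTraceIDLen = errors.New("trace id: expected 32 hex characters")

// parseTraceID decodes 32 hex characters into a TraceID.
// hex.Decode writes directly into the array, so no intermediate
// string or slice has to be created for the value.
func parseTraceID(src []byte) (TraceID, error) {
	var id TraceID
	if len(src) != hex.EncodedLen(len(id)) {
		return id, errTraceIDLen
	}
	if _, err := hex.Decode(id[:], src); err != nil {
		return TraceID{}, err
	}
	return id, nil
}

// appendTraceID appends the hex form of id to dst, reusing dst's capacity
// when there is enough of it.
func appendTraceID(dst []byte, id TraceID) []byte {
	var buf [32]byte
	hex.Encode(buf[:], id[:])
	return append(dst, buf[:]...)
}

func main() {
	id, err := parseTraceID([]byte("13e2a0921288b3ff80df0a0482d4fc46"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", id[:4])                  // 13e2a092
	fmt.Println(string(appendTraceID(nil, id))) // 13e2a0921288b3ff80df0a0482d4fc46
}
```

Storing the identifier as a fixed-size array rather than a string halves its size in memory and makes comparisons a plain array equality check.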

Why

Most jsoniter issues are caused by the need to be a drop-in replacement for the standard encoding/json. Removing such constraints greatly simplified the implementation and reduced its scope, making it possible to focus on json stream processing.

  • Commas are handled automatically while encoding
  • Raw json, Number and Base64 support
  • Reduced scope
    • No reflection
    • No encoding/json adapter
    • 3.5x less code (8.5K to 2.4K SLOC)
  • Fuzzing, improved test coverage
  • Drastically refactored and simplified
    • Explicit error returns
    • No Config or API

Usage

Decode

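Use jx.Decoder. Zero value is valid, but constructors such as jx.DecodeStr and jx.DecodeBytes (both used in the examples in this document) are available for convenience.

To reuse decoders and their buffers, use jx.GetDecoder and jx.PutDecoder along with the decoder's reset functions. Decoder is reset on PutDecoder.

For example, to collect all integers from an array field: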

d := jx.DecodeStr(`{"values":[4,8,15,16,23,42]}`)

// Save all integers from "values" array to slice.
var values []int

// Iterate over each object field.
if err := d.Obj(func(d *jx.Decoder, key string) error {
    switch key {
    case "values":
        // Iterate over each array element.
        return d.Arr(func(d *jx.Decoder) error {
            v, err := d.Int()
            if err != nil {
                return err
            }
            values = append(values, v)
            return nil
        })
    default:
        // Skip unknown fields if any.
        return d.Skip()
    }
}); err != nil {
    panic(err)
}

fmt.Println(values)
// Output: [4 8 15 16 23 42]

Encode

Use jx.Encoder. Zero value is valid, reuse with jx.GetEncoder, jx.PutEncoder and jx.Encoder.Reset(). Encoder is reset on PutEncoder.

var e jx.Encoder
e.ObjStart()           // {
e.FieldStart("values") // "values":
e.ArrStart()           // [
for _, v := range []int{4, 8, 15, 16, 23, 42} {
    e.Int(v)
}
e.ArrEnd() // ]
e.ObjEnd() // }
fmt.Println(e)
fmt.Println("Buffer len:", len(e.Bytes()))
// Output: {"values":[4,8,15,16,23,42]}
// Buffer len: 28

Writer

Use jx.Writer for low-level json writing.

It adds no automatic commas or indentation, which keeps overhead as low as possible and is useful for code-generated json encoding.

Raw

Use jx.Decoder.Raw to read raw json values, similar to json.RawMessage.

d := jx.DecodeStr(`{"foo": [1, 2, 3]}`)

var raw jx.Raw
if err := d.Obj(func(d *jx.Decoder, key string) error {
    v, err := d.Raw()
    if err != nil {
        return err
    }
    raw = v
    return nil
}); err != nil {
    panic(err)
}

fmt.Println(raw.Type(), raw)
// Output:
// array [1, 2, 3]

Number

Use jx.Decoder.Num to read numbers, similar to json.Number. It also supports number strings, like "12345", which is a common, compatible way to represent uint64.

d := jx.DecodeStr(`{"foo": "10531.0"}`)

var n jx.Num
if err := d.Obj(func(d *jx.Decoder, key string) error {
    v, err := d.Num()
    if err != nil {
        return err
    }
    n = v
    return nil
}); err != nil {
    panic(err)
}

fmt.Println(n)
fmt.Println("positive:", n.Positive())

// Can decode floats with zero fractional part as integers:
v, err := n.Int64()
if err != nil {
    panic(err)
}
fmt.Println("int64:", v)
// Output:
// "10531.0"
// positive: true
// int64: 10531
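
To show why number strings are a common choice for uint64, here is a small standard-library sketch. It is independent of jx.Num: parseUint64 is a hypothetical helper that accepts either a bare number token or a quoted one, and the second half of main shows the precision loss that occurs when a large integer passes through float64, as it does in consumers that treat every json number as a double.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseUint64 accepts a JSON number token (12345) or a number string ("12345").
// It is a hypothetical helper for illustration, not part of jx.
func parseUint64(tok []byte) (uint64, error) {
	if n := len(tok); n >= 2 && tok[0] == '"' && tok[n-1] == '"' {
		tok = tok[1 : n-1] // strip the surrounding quotes
	}
	return strconv.ParseUint(string(tok), 10, 64)
}

func main() {
	v, err := parseUint64([]byte(`"18446744073709551615"`))
	if err != nil {
		panic(err)
	}
	fmt.Println(v == math.MaxUint64) // true

	// A float64 has a 53-bit mantissa, so 2^53+1 cannot be represented exactly.
	big := uint64(1)<<53 + 1
	fmt.Println(uint64(float64(big)) == big) // false
}
```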

Base64

Use jx.Encoder.Base64 and jx.Decoder.Base64 or jx.Decoder.Base64Append.

Same as encoding/json, base64.StdEncoding or [RFC 4648].

var e jx.Encoder
e.Base64([]byte("Hello"))
fmt.Println(e)

data, _ := jx.DecodeBytes(e.Bytes()).Base64()
fmt.Printf("%s", data)
// Output:
// "SGVsbG8="
// Hello
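
For comparison, the following standard-library sketch shows the encoding/json behaviour that the text refers to: a []byte value is marshaled as a base64.StdEncoding string. The stdBase64JSON helper is a hypothetical name used only in this example.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// stdBase64JSON returns data as a quoted base64.StdEncoding JSON string.
// Base64 output never contains characters that need JSON escaping.
func stdBase64JSON(data []byte) string {
	return `"` + base64.StdEncoding.EncodeToString(data) + `"`
}

func main() {
	data := []byte("Hello")

	// encoding/json encodes []byte as a base64 string.
	fromJSON, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(fromJSON))    // "SGVsbG8="
	fmt.Println(stdBase64JSON(data)) // "SGVsbG8="
}
```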

Validate

Check that byte slice is valid json with jx.Valid:

fmt.Println(jx.Valid([]byte(`{"field": "value"}`))) // true
fmt.Println(jx.Valid([]byte(`"Hello, world!"`)))    // true
fmt.Println(jx.Valid([]byte(`["foo"}`)))            // false

Capture

The jx.Decoder.Capture method lets you unread everything that was read in the callback. Useful for multi-pass parsing:

d := jx.DecodeStr(`["foo", "bar", "baz"]`)
var elems int
// NB: Currently Capture does not support io.Reader, only buffers.
if err := d.Capture(func(d *jx.Decoder) error {
	// Everything decoded in this callback will be rolled back.
	return d.Arr(func(d *jx.Decoder) error {
		elems++
		return d.Skip()
	})
}); err != nil {
	panic(err)
}
// Decoder is rolled back to state before "Capture" call.
fmt.Println("Read", elems, "elements on first pass")
fmt.Println("Next element is", d.Next(), "again")

// Output:
// Read 3 elements on first pass
// Next element is array again

ObjBytes

The jx.Decoder.ObjBytes method tries not to allocate memory for keys, reusing the existing buffer.

d := jx.DecodeStr(`{"id":1,"randomNumber":10}`)
if err := d.ObjBytes(func(d *jx.Decoder, key []byte) error {
    // The key slice is only valid inside the callback.
    switch string(key) {
    case "id":
    case "randomNumber":
    }
    return d.Skip()
}); err != nil {
    panic(err)
}
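
The switch string(key) pattern matters here because a []byte key is only useful without allocation if matching it does not copy it. The sketch below uses only the standard library to measure this: fieldIndex is a hypothetical helper, and the claim that the conversion in the switch does not allocate is an assumption about current Go compilers that the program checks at run time rather than a guarantee of the language.

```go
package main

import (
	"fmt"
	"testing"
)

// fieldIndex maps a raw key to a small integer without retaining the key.
// The string(key) conversion is used only for comparison inside the switch.
func fieldIndex(key []byte) int {
	switch string(key) {
	case "id":
		return 0
	case "randomNumber":
		return 1
	default:
		return -1
	}
}

func main() {
	key := []byte("randomNumber")
	fmt.Println(fieldIndex(key))

	// Measure the average number of heap allocations per call.
	allocs := testing.AllocsPerRun(1000, func() { _ = fieldIndex(key) })
	fmt.Println("allocs per run:", allocs)
}
```

Mapping keys to integers like this also lets the caller store results in a fixed-size array instead of a map keyed by string.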

Roadmap

  • Rework and export Any
  • Support Raw for io.Reader
  • Support Capture for io.Reader
  • Improve Num
    • Better validation on decoding
    • Support BigFloat and BigInt
    • Support equivalence check, like eq(1.0, 1) == true
  • Add non-callback decoding of objects

Non-goals

  • Code generation for decoding or encoding
  • Replacement for encoding/json
  • Reflection or interface{} based encoding or decoding
  • Support for json path or similar

This package should be kept as simple as possible and be used as a low-level foundation for higher-level projects such as code generators.

License

MIT, same as jsoniter

jx's People

Contributors

1046102779, alextomaili, allenx2018, bboreham, bbrks, carlcarl, cch123, ceshihao, dependabot[bot], dvrkps, elee1766, ernado, eruca, fishyww, ggaaooppeenngg, javierprovecho, liggitt, nikhita, olegshaldybin, onelrdm, polyzy, quasilyte, superfashi, taowen, tdakkota, teou, thockin, toffaletti, yjhmelody, zhaitianduo

jx's Issues

ability to stream arbitrary bytes to the underlying writer when in streaming mode

i have a use case where i would like part of my message to be json streamed encoded from reflection, but i would still like to use jx for the remaining encoding, as most requests are small and do not require any reflection.

currently, when the writer is in streaming mode, all calls to Write will cause an error.

i propose to simply flush the buffer and then pass the call to Write, which would allow other json-encoders to stream into jx's encoder. i did this in a fork https://github.com/elee1766/jx and it seems to be working?

given that the package does not want to do any reflection based encoding, this seems like a good compromise?
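
a minimal sketch of the flush-then-write idea, to make the proposal concrete. this is not jx's writer and not the code from the fork: streamBuf, its fields and the demo function are hypothetical names, and only the standard library is used.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// streamBuf is a hypothetical stand-in for a buffered writer in streaming mode.
type streamBuf struct {
	buf []byte    // pending bytes not yet sent to out
	out io.Writer // underlying stream
}

// Flush sends pending bytes to the underlying stream and clears the buffer.
func (s *streamBuf) Flush() error {
	if len(s.buf) == 0 {
		return nil
	}
	_, err := s.out.Write(s.buf)
	s.buf = s.buf[:0]
	return err
}

// Write implements io.Writer: it flushes what is buffered first and then
// passes p through, so bytes from an external encoder keep their order.
func (s *streamBuf) Write(p []byte) (int, error) {
	if err := s.Flush(); err != nil {
		return 0, err
	}
	return s.out.Write(p)
}

// demo mixes buffered output with bytes written by "another encoder".
func demo() (string, error) {
	var out bytes.Buffer
	s := &streamBuf{out: &out}

	s.buf = append(s.buf, `{"payload":`...) // buffered by our own encoder
	if _, err := io.WriteString(s, `{"a":1}`); err != nil {
		return "", err // bytes coming from an external encoder
	}
	s.buf = append(s.buf, '}')
	if err := s.Flush(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // {"payload":{"a":1}}
}
```

the cost of this approach is one extra write to the underlying stream per external Write call, since the buffer can no longer batch across it.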

test: cleanup

  • Use Go test naming convention (TestDecoder_Skip instead of Test_skip)
  • Unify benchmarks and delete duplicates (BenchmarkValid vs BenchmarkSkip)
  • Populate testdata with different cases (real world objects, big array of primitives, use some benchmark sets)
    • #12 (added floats.json)
    • #15 (added small/medium/large/etc)
    • #18 (added bools.json and nulls.json)
    • e00d36c (added corpus from JSONTestSuite)
  • Use table tests, generate cases for in-memory and streaming decoding

perf: research performance impact of bytesets

Currently we use some bytesets to improve matching speed

  • To match space characters

    jx/dec_read.go

    Lines 15 to 17 in 3ed0c1e

    var spaceSet = [256]byte{
    ' ': 1, '\n': 1, '\t': 1, '\r': 1,
    }
  • To match digits and skip number

    jx/dec_skip.go

    Lines 47 to 66 in 97d4e11

    skipNumberSet = [256]byte{
    '0': 1,
    '1': 1,
    '2': 1,
    '3': 1,
    '4': 1,
    '5': 1,
    '6': 1,
    '7': 1,
    '8': 1,
    '9': 1,
    ',': 2,
    ']': 2,
    '}': 2,
    ' ': 2,
    '\t': 2,
    '\n': 2,
    '\r': 2,
    }
  • To match and validate hex and fast string escaping

    jx/dec_skip.go

    Lines 242 to 256 in 97d4e11

    escapedStrSet = [256]byte{
    '"': 1, '\\': 1, '/': 1, 'b': 1, 'f': 1, 'n': 1, 'r': 1, 't': 1,
    'u': 2,
    }
    hexSet = [256]byte{
    '0': 1, '1': 1, '2': 1, '3': 1,
    '4': 1, '5': 1, '6': 1, '7': 1,
    '8': 1, '9': 1,
    'A': 1, 'B': 1, 'C': 1, 'D': 1,
    'E': 1, 'F': 1,
    'a': 1, 'b': 1, 'c': 1, 'd': 1,
    'e': 1, 'f': 1,
    }

    jx/w_str.go

    Lines 211 to 219 in c048666

    var safeSet = [256]byte{
    // First 31 characters.
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    '"': 1,
    '\\': 1,
    }
  • To make HTML-safe escaping

    jx/w_str.go

    Line 14 in c048666

    var htmlSafeSet = [utf8.RuneSelf]bool{

In total we have 6 sets, 5 of [256]byte and one with 128 entries ([utf8.RuneSelf]bool), which take 256*5 + 128 = 1408 bytes.
In benchmarks this performs well because only 1-2 of them are in use.
But it may cause cache pollution and slow down the parser in realistic cases.

We need to research the real impact of such optimizations and possibly use comparison instead, or merge some sets and use bit masks.
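
The two alternatives mentioned above can be sketched as follows. This is only an illustration of the idea, not a proposed patch: the classes table, the class constants and the helper functions are hypothetical names, and no claim is made about which variant is faster, since that is exactly what would need to be benchmarked.

```go
package main

import "fmt"

// Character classes stored as bit flags in a single table entry.
const (
	classSpace byte = 1 << iota // ' ', '\n', '\t', '\r'
	classDigit                  // '0'..'9'
	classHex                    // '0'..'9', 'a'..'f', 'A'..'F'
)

// classes is one 256-byte table that replaces three separate byte sets.
var classes = func() (t [256]byte) {
	for _, c := range []byte(" \n\t\r") {
		t[c] |= classSpace
	}
	for c := byte('0'); c <= '9'; c++ {
		t[c] |= classDigit | classHex
	}
	for c := byte('a'); c <= 'f'; c++ {
		t[c] |= classHex
		t[c-'a'+'A'] |= classHex
	}
	return t
}()

// isSpaceTable uses the merged table with a bit mask.
func isSpaceTable(c byte) bool { return classes[c]&classSpace != 0 }

// isSpaceCmp uses plain comparisons and touches no table at all.
func isSpaceCmp(c byte) bool {
	return c == ' ' || c == '\n' || c == '\t' || c == '\r'
}

// isHex reports whether c is a hexadecimal digit.
func isHex(c byte) bool { return classes[c]&classHex != 0 }

func main() {
	// Both space checks must agree on every possible byte value.
	for c := 0; c < 256; c++ {
		if isSpaceTable(byte(c)) != isSpaceCmp(byte(c)) {
			panic(fmt.Sprintf("mismatch at %d", c))
		}
	}
	fmt.Println(len(classes), "bytes for the merged table") // 256 bytes for the merged table
	fmt.Println(isHex('F'), isHex('g'))                     // true false
}
```

With up to eight classes per byte, a merged table of this kind occupies 256 bytes (4 cache lines of 64 bytes) regardless of how many sets it replaces.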

perf: get rid of allocation in float decoding slow path

Currently, the float decoder falls back to strconv.ParseFloat, which causes a []byte -> string allocation. Possibly we could avoid this conversion by vendoring the float implementation from the Go standard library. Also we should try to improve the current float parsing algorithm.
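
One possible alternative to vendoring, shown here only as a sketch, is to give strconv.ParseFloat a string that aliases the byte slice instead of copying it. This is a different technique from the one proposed above, it relies on the unsafe package (unsafe.String and unsafe.SliceData, available since Go 1.20), and parseFloatBytes is a hypothetical helper name. It assumes the slice is not modified while the call is in progress.

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// parseFloatBytes parses b as a float64 without copying b into a new string.
// The temporary string shares memory with b, so b must not be mutated
// until the call returns.
func parseFloatBytes(b []byte) (float64, error) {
	s := unsafe.String(unsafe.SliceData(b), len(b))
	return strconv.ParseFloat(s, 64)
}

func main() {
	f, err := parseFloatBytes([]byte("10531.25"))
	if err != nil {
		panic(err)
	}
	fmt.Println(f) // 10531.25
}
```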
