GithubHelp home page GithubHelp logo

dikunongithub / fastavro Goto Github PK

View Code? Open in Web Editor NEW

This project forked from fastavro/fastavro

0.0 0.0 0.0 3.23 MB

Fast Avro for Python

License: MIT License

Makefile 0.13% Python 98.81% Shell 0.87% Dockerfile 0.06% Batchfile 0.13%

fastavro's Introduction

fastavro

Build Status Documentation Status codecov

Because the Apache Python avro package is written in pure Python, it is relatively slow. In one test case, it takes about 14 seconds to iterate through a file of 10,000 records. By comparison, the JAVA avro SDK reads the same file in 1.9 seconds.

The fastavro library was written to offer performance comparable to the Java library. With regular CPython, fastavro uses C extensions which allow it to iterate the same 10,000 record file in 1.7 seconds. With PyPy, this drops to 1.5 seconds (to be fair, the JAVA benchmark is doing some extra JSON encoding/decoding).

fastavro supports the following Python versions:

  • Python 3.6
  • Python 3.7
  • Python 3.8
  • Python 3.9
  • PyPy3

Supported Features

  • File Writer
  • File Reader (iterating via records or blocks)
  • Schemaless Writer
  • Schemaless Reader
  • JSON Writer
  • JSON Reader
  • Codecs (Snappy, Deflate, Zstandard, Bzip2, LZ4, XZ)
  • Schema resolution
  • Aliases
  • Logical Types
  • Parsing schemas into the canonical form
  • Schema fingerprinting

Missing Features

  • Anything involving Avro's RPC features

Documentation

Documentation is available at http://fastavro.readthedocs.io/en/latest/

Installing

fastavro is available both on PyPi

pip install fastavro

and on conda-forge conda channel.

conda install -c conda-forge fastavro

Contributing

  • Bugs and new feature requests typically start as github issues where they can be discussed. I try to resolve these as time affords, but PRs are welcome from all.
  • Get approval from discussing on the github issue before opening the pull request
  • Tests must be passing for pull request to be considered

Developer requirements can be installed with pip install -r developer_requirements.txt. If those are installed, you can run the tests with ./run-tests.sh. If you have trouble installing those dependencies, you can run docker build . to run the tests inside a docker container. This won't test on all versions of python or on pypy, so it's possible to still get CI failures after making a pull request, but we can work through those errors if/when they happen. .run-tests.sh only covers the Cython tests. In order to test the pure Python implementation, comment out FASTAVRO_USE_CYTHON=1 python setup.py build_ext --inplace and re-run.

NOTE: Some tests might fail when running the tests locally. An example of this is this codec tests. If the supporting codec library is not availabe, the test will fail. These failures can be ignored since the tests will on pull requests and will be run in the correct environments with the correct dependecies set up.

Releasing

We release both to pypi and to conda-forge.

We assume you have twine installed and that you've created your own fork of fastavro-feedstock.

Changes

See the ChangeLog

Contact

Project Home

fastavro's People

Contributors

tebeka avatar scottbelden avatar rhaarm avatar kkirsanov avatar barrywhart avatar pkoch avatar kurtostfeld avatar natb1 avatar rodcarroll avatar rouge8 avatar dacjames avatar jquast avatar regisb avatar spenczar avatar fthyssen avatar gudjonragnar avatar marcosschroh avatar ryan-williams avatar ksunden avatar theianrobertson avatar matpuk avatar mtth avatar nicusor97 avatar nobo728x avatar mcguipat avatar pawelrubin avatar pbabics avatar chobeat avatar lemurchik avatar jmgpeeters avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.