teskalabs / cysimdjson

Very fast Python JSON parsing library

License: Apache License 2.0

Python 0.28% C++ 99.46% Dockerfile 0.01% C 0.03% Cython 0.21% Shell 0.02%
python cython json simdjson

cysimdjson's Introduction

cysimdjson

Fast JSON parsing library for Python, 7-12 times faster than the standard Python JSON parser.
It provides Python bindings for simdjson, built with Cython.

The standard Python JSON parser (json.load() etc.) is relatively slow; if you need to parse large JSON files or a large number of small JSON files, it can become a significant bottleneck.

While there are other fast Python JSON parsers, such as pysimdjson, libpy_simdjson or orjson, they don't reach the raw speed provided by the brilliant SIMDJSON project. SIMDJSON is a C++ JSON parser based on SIMD instructions, reportedly the fastest JSON parser on the planet.

Python versions: 3.7, 3.8, 3.9, 3.10, 3.11

Usage

import cysimdjson

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

parser = cysimdjson.JSONParser()
json_element = parser.parse(json_bytes)

# Access using JSON Pointer
print(json_element.at_pointer("/foo/2/0"))

Note: The parser object can be reused for maximum performance.
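
To make the pointer syntax above concrete, here is a minimal pure-Python sketch of how a JSON Pointer (RFC 6901) such as "/foo/2/0" walks a document. It uses the standard json module only and is not how cysimdjson implements at_pointer; the function name resolve_pointer is hypothetical.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against data parsed by the json module."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Per RFC 6901, '~1' decodes to '/' first, then '~0' decodes to '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array elements are addressed by index
        else:
            current = current[token]  # object members are addressed by key
    return current


document = json.loads(b'{"foo": [1,2,[3]]}')
print(resolve_pointer(document, "/foo/2/0"))  # prints 3
```

Each path segment selects either an object member or an array index, so "/foo/2/0" means: member "foo", then element 2, then element 0.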

Pythonic drop-in API

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access in a Python way
print(json_parsed['foo'])

json_parsed is a read-only, dictionary-like object that provides access to the JSON data.

WARNING: This access method will be deprecated in the future, likely in favour of JSON Pointer.

Trade-offs

The speed of cysimdjson is based on these assumptions:

  1. The output of the parser is read-only; you cannot modify it
  2. The output of the parser is not a Python dictionary, but a lazily evaluated dictionary-like object
  3. The parser output is valid only while the JSONParser object is alive (not destroyed); otherwise you will get ugly errors
  4. If you convert the parser output into a Python dictionary, you will lose the speed

If your design is not aligned with these assumptions, cysimdjson is not a good choice.

Documentation

JSONParser.parse(json_bytes)

Parse JSON json_bytes, represented as bytes.

JSONParser.parse_in_place(bytes)

Parse JSON bytes, represented as bytes, assuming that the padding expected by SIMDJSON is present. This is the fastest parsing variant.

JSONParser.parse_string(string)

Parse JSON string, represented as str (string).

JSONParser.load(path)

Load and parse the JSON file at path.

Installation

pip3 install cysimdjson

The cysimdjson project is distributed via PyPI: https://pypi.org/project/cysimdjson/

If you want to install cysimdjson from source, you need to install Cython first: pip3 install cython.

Performance

----------------------------------------------------------------
# 'jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          510291.81 EPS (  1.00)  1223.17 MB/s
* libpy_simdjson loads      374615.54 EPS (  1.36)   897.95 MB/s
* pysimdjson parse          362195.46 EPS (  1.41)   868.18 MB/s
* orjson loads              110615.70 EPS (  4.61)   265.15 MB/s
* python json loads          72096.80 EPS (  7.08)   172.82 MB/s
----------------------------------------------------------------

SIMDJSON: 543335.93 EPS, 1241.52 MB/s
----------------------------------------------------------------
# 'jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson parse            2556.10 EPS (  1.00)  1614.22 MB/s
* libpy_simdjson loads        2444.53 EPS (  1.05)  1543.76 MB/s
* pysimdjson parse            2415.46 EPS (  1.06)  1525.40 MB/s
* orjson loads                 387.11 EPS (  6.60)   244.47 MB/s
* python json loads            278.63 EPS (  9.17)   175.96 MB/s
----------------------------------------------------------------

SIMDJSON: 2536.16 EPS,  1527.28 MB/s
----------------------------------------------------------------
# 'jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson parse             284.67 EPS (  1.00)   640.81 MB/s
* pysimdjson parse             284.62 EPS (  1.00)   640.70 MB/s
* libpy_simdjson loads         277.13 EPS (  1.03)   623.84 MB/s
* orjson loads                  81.80 EPS (  3.48)   184.13 MB/s
* python json loads             22.52 EPS ( 12.64)    50.68 MB/s
----------------------------------------------------------------

SIMDJSON: 307.95 EPS, 661.08 MB/s
----------------------------------------------------------------
# 'jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson parse             775.61 EPS (  1.00)  2581.09 MB/s
* pysimdjson parse             743.67 EPS (  1.04)  2474.81 MB/s
* libpy_simdjson loads         654.15 EPS (  1.19)  2176.88 MB/s
* orjson loads                 166.67 EPS (  4.65)   554.66 MB/s
* python json loads            113.72 EPS (  6.82)   378.43 MB/s
----------------------------------------------------------------

SIMDJSON: 703.59 EPS, 2232.92 MB/s
----------------------------------------------------------------
# 'jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         3972376.53 EPS (  1.00)    27.81 MB/s
* orjson loads             3637369.63 EPS (  1.09)    25.46 MB/s
* libpy_simdjson loads     1774211.19 EPS (  2.24)    12.42 MB/s
* pysimdjson parse          977530.90 EPS (  4.06)     6.84 MB/s
* python json loads         527932.65 EPS (  7.52)     3.70 MB/s
----------------------------------------------------------------

SIMDJSON: 3799392.10 EPS

CPU: AMD EPYC 7452
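
The columns in the tables above appear to relate as follows: EPS is parsed documents per second, the number in parentheses is the fastest EPS in the table divided by that row's EPS, and MB/s is EPS multiplied by the file size, with 1 MB taken as 10^6 bytes. This reading is inferred from the published numbers rather than stated by the benchmark, and the helper names are hypothetical.

```python
def throughput_mb_per_s(events_per_second, document_bytes):
    """Bytes parsed per second, expressed in units of 10**6 bytes."""
    return events_per_second * document_bytes / 1_000_000


def slowdown(fastest_eps, eps):
    """How many times slower a parser is than the fastest one in the table."""
    return fastest_eps / eps


# 'jsonexamples/test.json' is 2397 bytes; cysimdjson reached 510291.81 EPS.
print(round(throughput_mb_per_s(510291.81, 2397), 2))  # prints 1223.17

# Standard json.loads reached 72096.80 EPS on the same file.
print(round(slowdown(510291.81, 72096.80), 2))  # prints 7.08
```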

More performance testing:

Tests are reproducible

pip3 install orjson
pip3 install pysimdjson
pip3 install libpy_simdjson
python3 setup.py build_ext --inplace
PYTHONPATH=. python3 ./perftest/test_benchmark.py

Manual build

python3 setup.py build_ext --inplace

cysimdjson's People

Contributors

andrewdelong, ateska, lemire, plesoun, premyslcerny, vizonex

cysimdjson's Issues

Is performance of JSONObject.get() and JSONObject[] method slower than python dict?

In my case, the speed of parser.parse() is about 4x faster than orjson.loads(), great!
However, accessing inner attributes through JSONObject.get() and [] is 2x slower than the Python dict.get() and [] methods.

Are there any good ways to load JSON with this module and then use a Python dict to access values? P.S. The export() method is too heavy.
Or is there a more efficient way to access the values?

Thanks

More wheels?

Wow, is this library ever fast!

Might it be possible to add Windows and Alpine Linux (musllinux) wheels?

Thank you!

Segfault when installed from pip

$ pip3 install --user cysimdjson
Collecting cysimdjson
  Downloading cysimdjson-21.11-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 1.3 MB/s
Installing collected packages: cysimdjson
Successfully installed cysimdjson-21.11
$ python3 convert-pa-json.py infile.geojson
ID,Name,Polygon
Segmentation fault (core dumped)

Implement __iter__ on JSONArray

When looping over a long array (e.g. 100,000 elements), the current JSONArray implementation with __len__ and __getitem__ manifests quadratic running time. If a new __iter__ method on JSONArray were implemented using the yield keyword and the loop pattern already used in __contains__, this could be reduced to the expected linear time.

(The same logic would apply to implementing __iter__ on JSONElement.)
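
The difference between the two loop styles can be sketched with a toy class. It assumes that reaching element i by index costs a scan from the start of the array, which is one way quadratic time can arise; whether this is exactly what happens inside cysimdjson is an assumption, and ForwardOnlyArray is a hypothetical stand-in, not the library's JSONArray.

```python
class ForwardOnlyArray:
    """Toy array whose elements can only be reached by walking from the start."""

    def __init__(self, items):
        self._items = list(items)
        self.steps = 0  # counts how many elements were walked over

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if not 0 <= index < len(self._items):
            raise IndexError(index)
        # Simulated forward scan: reaching element `index` costs index + 1 steps.
        self.steps += index + 1
        return self._items[index]

    def __iter__(self):
        # Single pass: each element is visited exactly once.
        for item in self._items:
            self.steps += 1
            yield item


n = 1000

by_index = ForwardOnlyArray(range(n))
for i in range(len(by_index)):
    by_index[i]

by_iter = ForwardOnlyArray(range(n))
for _ in by_iter:
    pass

print(by_index.steps, by_iter.steps)  # prints 500500 1000
```

With index access the total cost is 1 + 2 + ... + n, which grows quadratically, while the generator-based __iter__ touches each element once.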

How to convert to a dict when needed?

What is the best way to convert to a dict? I tried dict(obj) and was able to get a dict but some keys are still things like <cysimdjson.cysimdjson.JSONArray object at 0x7f523a0ff090>

Would creating a method to recursively go through the fields and convert be the best method or is there a built-in method to convert a JSON with nested fields / arrays into a dict? I understand the sacrifice in speed but we would only be doing this with JSON objects that match a specific key value within the JSON.

Thanks so much for this amazing module! It is amazingly fast.
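
A recursive conversion of the kind asked about could look like the sketch below. It relies only on duck typing: it assumes object-like nodes expose keys() and item access (which dict(obj) working suggests) and array-like nodes expose __len__ and __getitem__. The classes LazyObject and LazyArray are hypothetical stand-ins so the example runs without the library, and to_python is not a cysimdjson function. Another issue on this page mentions an export() method, which may already cover this use case.

```python
def to_python(value):
    """Recursively copy a dictionary-like / array-like tree into dict and list."""
    # Scalars and text are returned unchanged.
    if value is None or isinstance(value, (str, bytes, bool, int, float)):
        return value
    # Dictionary-like: anything exposing keys() and item access.
    if hasattr(value, "keys") and hasattr(value, "__getitem__"):
        return {key: to_python(value[key]) for key in value.keys()}
    # Array-like: anything with a length and integer indexing.
    if hasattr(value, "__len__") and hasattr(value, "__getitem__"):
        return [to_python(value[i]) for i in range(len(value))]
    return value


class LazyObject:
    """Hypothetical stand-in for a read-only dictionary-like parser result."""

    def __init__(self, data):
        self._data = data

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


class LazyArray:
    """Hypothetical stand-in for a read-only array-like parser result."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


nested = LazyObject({"id": 7, "tags": LazyArray(["a", LazyObject({"deep": None})])})
print(to_python(nested))  # prints {'id': 7, 'tags': ['a', {'deep': None}]}
```

Converting only the objects that match a specific key, as described above, keeps the cost of the copy limited to the few documents that need it.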

Doesn't build on Windows + MSVC

Installing cysimdjson using pip on Windows with MSVC fails.

It is due to MSVC not understanding the compiler flag -Wno-deprecated when compiling cysimdjson.cpp.

Also, MSVC won't understand -std=c++17 and will ignore the flag (falling back to its default language standard, C++14). The syntax for MSVC is /std:c++17.

Here is the relevant error log:
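
One way a build script could pick flags per platform is sketched below. The flags come from this report and the build log; the helper name cpp17_compile_args is hypothetical, the project's actual setup.py is not shown here, and the sketch assumes that Windows builds use MSVC rather than MinGW.

```python
import sys


def cpp17_compile_args(platform=sys.platform):
    """Return extra compiler flags for building a C++17 extension."""
    if platform == "win32":
        # MSVC spells the standard switch differently and rejects -Wno-deprecated.
        return ["/std:c++17"]
    # GCC and Clang accept the flags seen in the reported command line.
    return ["-std=c++17", "-Wno-deprecated"]


print(cpp17_compile_args("win32"))  # prints ['/std:c++17']
print(cpp17_compile_args("linux"))  # prints ['-std=c++17', '-Wno-deprecated']
```

The returned list could then be passed as extra_compile_args to a setuptools Extension.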

  building 'cysimdjson.cysimdjson' extension
  creating build\temp.win-amd64-cpython-37
  creating build\temp.win-amd64-cpython-37\Release
  creating build\temp.win-amd64-cpython-37\Release\cysimdjson
  creating build\temp.win-amd64-cpython-37\Release\cysimdjson\pysimdjson
  creating build\temp.win-amd64-cpython-37\Release\cysimdjson\simdjson
  "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.31.31103\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -Icysimdjson "-IC:\Program Files\Python37\include" "-IC:\Program Files\Python37\Include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.31.31103\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.31.31103\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.19041.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.19041.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.19041.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.19041.0\\cppwinrt" /EHsc /Tpcysimdjson/cysimdjson.cpp /Fobuild\temp.win-amd64-cpython-37\Release\cysimdjson/cysimdjson.obj -std=c++17 -Wno-deprecated
  cl : command line error D8021 : Invalid numeric argument '/Wno-deprecated'
  error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.31.31103\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
  ----------------------------------------
  ERROR: Failed building wheel for cysimdjson

load to dict is slower than ujson

When converting a big JSON file to a Python dict using simdjson.Parser() and Object.as_dict(), the load speed is slower than ujson.

  • python ujson loads 1.43 EPS ( 6.30) 67.02 MB/s
  • pysimdjson parse 1.42 EPS ( 6.36) 66.36 MB/s
  • python json loads 1.19 EPS ( 7.57) 55.76 MB/s

Get "ambiguous template specialization ‘get<>’" error when trying to install with pip

This is on CentOS 7 with gcc 7.3.0 trying to install for Python 3.10. Is the latter maybe not supported yet?

Here is the entire pip error trace:

Collecting cysimdjson
  Using cached cysimdjson-21.4a4.tar.gz (370 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: cysimdjson
  Building wheel for cysimdjson (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python3.10 /usr/local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmp1cpu0ci3
       cwd: /tmp/pip-install-7c4qotgp/cysimdjson_9b536c74e59e4472990b6003d8d6d3e9
  Complete output (263 lines):
  /tmp/pip-build-env-tzj3aepc/overlay/lib/python3.10/site-packages/setuptools/dist.py:487: UserWarning: Normalizing '21.4-a4' to '21.4a4'
    warnings.warn(tmpl.format(**locals()))
  running bdist_wheel
  running build
  running build_py
  package init file 'cysimdjson/__init__.py' not found (or not a regular file)
  running build_ext
  building 'cysimdjson' extension
  creating build
  creating build/temp.linux-x86_64-3.10
  creating build/temp.linux-x86_64-3.10/cysimdjson
  creating build/temp.linux-x86_64-3.10/cysimdjson/pysimdjson
  creating build/temp.linux-x86_64-3.10/cysimdjson/simdjson
  /usr/local/bin/gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icysimdjson -I/usr/local/include/python3.10 -c cysimdjson/cysimdjson.cpp -o build/temp.linux-x86_64-3.10/cysimdjson/cysimdjson.o -std=c++17 -O3
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:6262:38: warning: ‘Iterator’ is deprecated [-Wdeprecated-declarations]
     inline Iterator(const Iterator &o) noexcept;
                                        ^~~~~~~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:6265:49: warning: ‘Iterator’ is deprecated: Use the new DOM navigation API instead (see doc/basics.md) [-Wdeprecated-declarations]
     inline Iterator& operator=(const Iterator&) = delete;
                                                   ^~~~~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:6259:97: note: declared here
   class [[deprecated("Use the new DOM navigation API instead (see doc/basics.md)")]] dom::parser::Iterator {
                                                                                                   ^~~~~~~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:6265:49: warning: ‘Iterator’ is deprecated [-Wdeprecated-declarations]
     inline Iterator& operator=(const Iterator&) = delete;
                                                   ^~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22965:58: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<simdjson::fallback::ondemand::array> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<array> document::get() & noexcept { return get_array(); }
                                                            ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22966:59: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<simdjson::fallback::ondemand::object> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<object> document::get() & noexcept { return get_object(); }
                                                             ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22967:68: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<simdjson::fallback::ondemand::raw_json_string> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<raw_json_string> document::get() & noexcept { return get_raw_json_string(); }
                                                                      ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22968:69: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<std::basic_string_view<char> > simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<std::string_view> document::get() & noexcept { return get_string(); }
                                                                       ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22969:59: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<double> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<double> document::get() & noexcept { return get_double(); }
                                                             ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22970:61: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<long unsigned int> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<uint64_t> document::get() & noexcept { return get_uint64(); }
                                                               ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22971:60: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<long int> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<int64_t> document::get() & noexcept { return get_int64(); }
                                                              ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22972:57: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<bool> simdjson::fallback::ondemand::document::get() &’
   template<> simdjson_really_inline simdjson_result<bool> document::get() & noexcept { return get_bool(); }
                                                           ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22974:68: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<simdjson::fallback::ondemand::raw_json_string> simdjson::fallback::ondemand::document::get() &&’
   template<> simdjson_really_inline simdjson_result<raw_json_string> document::get() && noexcept { return get_raw_json_string(); }
                                                                      ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22975:69: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<std::basic_string_view<char> > simdjson::fallback::ondemand::document::get() &&’
   template<> simdjson_really_inline simdjson_result<std::string_view> document::get() && noexcept { return get_string(); }
                                                                       ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22976:59: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<double> simdjson::fallback::ondemand::document::get() &&’
   template<> simdjson_really_inline simdjson_result<double> document::get() && noexcept { return std::forward<document>(*this).get_double(); }
                                                             ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22977:61: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<long unsigned int> simdjson::fallback::ondemand::document::get() &&’
   template<> simdjson_really_inline simdjson_result<uint64_t> document::get() && noexcept { return std::forward<document>(*this).get_uint64(); }
                                                               ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22978:60: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<long int> simdjson::fallback::ondemand::document::get() &&’
   template<> simdjson_really_inline simdjson_result<int64_t> document::get() && noexcept { return std::forward<document>(*this).get_int64(); }
                                                              ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:22979:57: error: ambiguous template specialization ‘get<>’ for ‘simdjson::simdjson_result<bool> simdjson::fallback::ondemand::document::get() &&’
   template<> simdjson_really_inline simdjson_result<bool> document::get() && noexcept { return std::forward<document>(*this).get_bool(); }
                                                           ^~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20099:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept {
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20105:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::fallback::ondemand::document::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept {
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:23146:104: error: ambiguous template specialization ‘get<simdjson::fallback::ondemand::document>’ for ‘simdjson::simdjson_result<simdjson::fallback::ondemand::document> simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get() &’
   template<> simdjson_really_inline simdjson_result<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document> simdjson_result<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>::get<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>() & noexcept = delete;
                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20335:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept;
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20336:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept;
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:23147:104: error: ambiguous template specialization ‘get<simdjson::fallback::ondemand::document>’ for ‘simdjson::simdjson_result<simdjson::fallback::ondemand::document> simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get() &&’
   template<> simdjson_really_inline simdjson_result<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document> simdjson_result<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>::get<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>() && noexcept {
                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20335:66: note: candidates are: template<class T> simdjson::simdjson_result<T> simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get() &
     template<typename T> simdjson_really_inline simdjson_result<T> get() & noexcept;
                                                                    ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20336:66: note:                 template<class T> simdjson::simdjson_result<T> simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get() &&
     template<typename T> simdjson_really_inline simdjson_result<T> get() && noexcept;
                                                                    ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:23151:46: error: ambiguous template specialization ‘get<simdjson::fallback::ondemand::document>’ for ‘simdjson::error_code simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get(simdjson::fallback::ondemand::document&) &’
   template<> simdjson_really_inline error_code simdjson_result<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>::get<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>(SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document &out) & noexcept = delete;
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20338:58: note: candidates are: template<class T> simdjson::error_code simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get(T&) &
     template<typename T> simdjson_really_inline error_code get(T &out) & noexcept;
                                                            ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20339:58: note:                 template<class T> simdjson::error_code simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get(T&) &&
     template<typename T> simdjson_really_inline error_code get(T &out) && noexcept;
                                                            ^~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:23152:46: error: ambiguous template specialization ‘get<simdjson::fallback::ondemand::document>’ for ‘simdjson::error_code simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get(simdjson::fallback::ondemand::document&) &&’
   template<> simdjson_really_inline error_code simdjson_result<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>::get<SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document>(SIMDJSON_BUILTIN_IMPLEMENTATION::ondemand::document &out) && noexcept {
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from cysimdjson/pysimdjson/errors.h:3:0,
                   from cysimdjson/cysimdjson.cpp:668:
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20338:58: note: candidates are: template<class T> simdjson::error_code simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get(T&) &
     template<typename T> simdjson_really_inline error_code get(T &out) & noexcept;
                                                            ^~~
  cysimdjson/pysimdjson/../simdjson/simdjson.h:20339:58: note:                 template<class T> simdjson::error_code simdjson::simdjson_result<simdjson::fallback::ondemand::document>::get(T&) &&
     template<typename T> simdjson_really_inline error_code get(T &out) && noexcept;
                                                            ^~~
  error: command '/usr/local/bin/gcc' failed with exit code 1
  ----------------------------------------
  ERROR: Failed building wheel for cysimdjson
Failed to build cysimdjson
ERROR: Could not build wheels for cysimdjson which use PEP 517 and cannot be installed directly

GIL?

Hello,

I was wondering if there's a reason not to drop the GIL during simdjson parse() invocations.

Thanks!

OBJECT HOOK OR DEFAULTS while parsing

import cysimdjson
from datetime import datetime
json_bytes = b'{"a":"'+str(datetime.now()).encode()+b'"}'
print(json_bytes)
parser = cysimdjson.JSONParser()
json_element = parser.parse(json_bytes).export()
print(type(json_element["a"])) # it must return dict

#expected result ->>> <class 'datetime.datetime'>
#got ->>>> <class 'str'>

I want to parse objects back into classes, as in the example above. The standard json module has the object_hook and default options, which add markers such as $date, so that parsing returns datetime.datetime or another class.

How can I achieve this here?

I am working on a NoSQL database system that is faster than MongoDB and Redis, and now I am looking to optimize it further with faster parsing.
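
One possible workaround, sketched under assumptions: nothing in this thread shows an object_hook equivalent in cysimdjson, so the conversion could be done after .export() has produced plain Python objects. The helper names `apply_object_hook` and `date_hook`, and the `{"$date": ...}` wrapper layout, are illustrative and not part of the library.

```python
from datetime import datetime


def apply_object_hook(value, hook):
    """Recursively rebuild `value`, calling `hook` on every dict (innermost first)."""
    if isinstance(value, dict):
        return hook({k: apply_object_hook(v, hook) for k, v in value.items()})
    if isinstance(value, list):
        return [apply_object_hook(v, hook) for v in value]
    return value


def date_hook(obj):
    """Hypothetical hook: turn {"$date": "<ISO string>"} into a datetime."""
    if set(obj) == {"$date"} and isinstance(obj["$date"], str):
        return datetime.fromisoformat(obj["$date"])
    return obj


# `exported` stands in for the result of parser.parse(json_bytes).export()
exported = {"a": {"$date": "2021-01-19 14:39:10.942000"}, "b": [1, 2]}
revived = apply_object_hook(exported, date_hook)
print(type(revived["a"]))
```

This walks the whole structure in Python, so it gives up part of the speed advantage of lazy access; it only makes sense when the full document is exported anyway.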

Performance comparison from readme seems a bit unfair

Hi!

First of all thanks for this library! This sounds like a very good idea.

However, I noticed that because it evaluates fields lazily, comparing it directly to other JSON libraries that give you the full dictionary right away is a bit unfair.

Assuming you will use the whole dict anyway, a fairer comparison, IMHO, would include the .export() call.

>>> _json_string = '{"a fairly": "expensive", "json": "goes-in", "here": 121}'
>>> timeit(lambda: cysimdjson_parser.parse_string(_json_string).export(), number=100000)
3.6677306999990833
>>> timeit(lambda: orjson_parser.loads(_json_string), number=100000)
2.9754124999963096

With that call, however, it is slower than orjson.

It gets a lot faster if you do not use the whole dictionary.

>>> timeit(lambda: cysimdjson_parser.loads(_json_string)[7]['revisionNumber'], number=100000)
0.4328116000033333
>>> timeit(lambda: orjson_parser.loads(_json_string)[7]['revisionNumber'], number=100000)
3.0126906000004965

However, in my experience this is rarely the case.

Documenting JSON pointer usage?

This seems to do what I expect...

import cysimdjson
parser = cysimdjson.JSONParser()
parser.parse("[1,2,[3]]".encode('utf8')).at_pointer("/2/0")

Might I suggest that it be covered, if only briefly, in the README?
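
To illustrate what a pointer such as /2/0 selects, here is a minimal pure-Python resolver following RFC 6901. It is a sketch over plain lists and dicts, not cysimdjson's implementation, and `resolve_pointer` is a hypothetical name.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against plain Python lists/dicts."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" -> "/", then "~0" -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array tokens are decimal indexes
        else:
            current = current[token]       # object tokens are member names
    return current


print(resolve_pointer([1, 2, [3]], "/2/0"))
print(resolve_pointer({"foo": [1, 2, [3]]}, "/foo/2/0"))
```

Each /-separated token steps one level down: an index for arrays, a key for objects.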

parsing dicts in a list (support a list on top level of a JSON document)

Thanks for the great wrapper, awesome results. I get data that contains dicts in a list, something like this:

m = b'[{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},{"evl":"Quest","DAY":"Monday","classes":[1,2,6,9],"sub1":10.23,"sub2":-13.123,"sub3":2.01,"hours1":200,"hours2":4000,"date":1607614205,"signature":1},]'

I have tried it with cysimdjson.JSONParser() but with no success. Is there any possible workaround? Thanks.
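
A possible cause, offered as an assumption rather than a diagnosis: the sample as posted ends with `},]`, and a trailing comma before `]` is not valid JSON under RFC 8259, so a strict parser would reject it regardless of the top-level type. The check below uses only the standard json module and a shortened version of the sample.

```python
import json

with_trailing_comma = b'[{"evl":"Quest","signature":1},]'
without_trailing_comma = b'[{"evl":"Quest","signature":1}]'

try:
    json.loads(with_trailing_comma)
    trailing_comma_ok = True
except json.JSONDecodeError as exc:
    trailing_comma_ok = False
    print("rejected:", exc)

# The same payload without the trailing comma parses as a top-level list.
records = json.loads(without_trailing_comma)
print(trailing_comma_ok, records[0]["evl"])
```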

implement JSONObject.get()

Implementing a get() method on JSONObject would improve dict compatibility.

I'm using this library to import a large GeoJSON file. Trying to feed a geometry object directly to the shapely library raises an exception:

Traceback (most recent call last):
  File "workspace/convert-pa-json.py", line 18, in <module>
    polygon = geometry.shape(feat['geometry']['coordinates'])
  File ".local/lib/python3.9/site-packages/shapely/geometry/geo.py", line 102, in shape
    geom_type = ob.get("type").lower()
AttributeError: 'cysimdjson.JSONArray' object has no attribute 'get'
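
Until such a method exists, one workaround sketch is a thin adapter that adds get() on top of item access. `GetAdapter` is a hypothetical name, and the sketch assumes that a missing key raises KeyError (or IndexError/TypeError for arrays), which this thread does not confirm for cysimdjson objects. The demonstration uses a plain dict as a stand-in.

```python
class GetAdapter:
    """Wrap any subscriptable object and add a dict-style get()."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getitem__(self, key):
        return self._wrapped[key]

    def get(self, key, default=None):
        # Assumption: failed lookups surface as one of these exceptions.
        try:
            return self._wrapped[key]
        except (KeyError, IndexError, TypeError):
            return default


geometry = GetAdapter({"type": "Point", "coordinates": [1.0, 2.0]})
print(geometry.get("type"), geometry.get("missing", "n/a"))
```

Note that the traceback reports a JSONArray, which is what the `coordinates` value would be, so the call site may need to pass `feat['geometry']` rather than its coordinates. Calling .export(), as used elsewhere in this thread, is another option when a real dict is required.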

Support for serializing

Hi!
Does cysimdjson not support dumping/serializing a loaded JSON? I want to extract certain parts and then write them back to disk.

Thank you,
Patrick
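
One route, sketched under the assumption that .export() (used elsewhere in this thread) returns plain dicts, lists, strings and numbers: export the part of interest and write it with the standard json module. `dump_exported` is a hypothetical helper, not a cysimdjson function.

```python
import json


def dump_exported(element, path):
    """Write `element.export()` to `path` as JSON using the standard library."""
    plain = element.export()  # assumed to return dicts/lists/str/numbers
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(plain, handle)
    return plain
```

With cysimdjson installed, `element` could be, for example, the result of parser.parse(data).at_pointer("/foo"), provided that element supports .export().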

Memory leak

I am observing a memory leak.

Part of the code:
metadata_parser, gamedata_parser = cysimdjson.JSONParser(), cysimdjson.JSONParser()
with lz4.frame.open(filepath) as file:
  for line in file:
      idx, metadata, gamedata = line.rstrip(b'\n').split(chr(31).encode())
      metadata, gamedata = metadata_parser.parse(metadata), gamedata_parser.parse(gamedata)
      for key, value in gamedata.at_pointer('/0/common').items():
          if key not in test_data['common']:
              test_data['common'][key] = []
          value_type = str(type(value))
          if value_type not in test_data['common'][key]:
              test_data['common'][key].append(value_type)
      for _, player in gamedata.at_pointer('/1').items():
          for key, value in player.items():
              if key not in test_data['player']:
                  test_data['player'][key] = []
              value_type = str(type(value))
              if value_type not in test_data['player'][key]:
                  test_data['player'][key].append(value_type)
      for _, vehicles in gamedata.at_pointer('/0/vehicles').items():
          vehicles = [vehicles] if isinstance(vehicles, dict) else vehicles
          for vehicle in vehicles:
              if isinstance(vehicle, str):
                  continue
              for key, value in vehicle.items():
                  if key not in test_data['vehicle']:
                      test_data['vehicle'][key] = []
                  value_type = str(type(value))
                  if value_type not in test_data['vehicle'][key]:
                      test_data['vehicle'][key].append(value_type)

I am analyzing a large data dump, over 100 GB, and memory leaks are preventing the process from completing successfully. The leak is somewhere on the C side of the extension, since profiling the Python part didn't show anything. I followed the first manual and ran valgrind.

valgrind log

Gist

I can provide more information, just tell me what and how ))

Reusing parser modifies previous results

Installed via PyPI

from cysimdjson import JSONParser
parser = JSONParser()
result = parser.parse(b'{"hello": "world"}')
print(result['hello'])  # "world"
new_result = parser.parse(b'{"hello": "universe"}')
print(result['hello'])  # "universe"

This only happens when reusing a previous parser instance. I'm not sure whether this is by design, but if it is, it should probably be explicitly documented to avoid confusion.

It also becomes especially iffy when mixing different types:

result = parser.parse(b'{"hello": "world"}')
print(type(result))  # JSONObject
print(list(result.keys()))  # ['hello']
new_result = parser.parse(b'[1,2,3]')
print(type(result))  # JSONObject
print(type(new_result))  # JSONArray
print(list(result.keys()))  # ['hello', 'hello', 'hello']

So if it is by design, it might be worthwhile to somehow invalidate the previous reference when a new document is parsed.
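
The invalidation idea can be sketched in pure Python with a generation counter: each parse bumps the counter, and a result refuses access once the counter has moved on. `GuardedParser`, `GuardedResult` and `StaleResultError` are hypothetical names; the wrapper only assumes that the wrapped object has a parse(data) method.

```python
class StaleResultError(RuntimeError):
    """Raised when a result is used after its parser has parsed something newer."""


class GuardedResult:
    def __init__(self, owner, generation, value):
        self._owner = owner
        self._generation = generation
        self._value = value

    def value(self):
        # Refuse access once the owning parser has moved on to a newer document.
        if self._generation != self._owner.generation:
            raise StaleResultError("parser was reused; this result is no longer valid")
        return self._value


class GuardedParser:
    """Wrap any object with a parse(data) method and stamp each result."""

    def __init__(self, parser):
        self._parser = parser
        self.generation = 0

    def parse(self, data):
        self.generation += 1
        return GuardedResult(self, self.generation, self._parser.parse(data))
```

With cysimdjson installed this would be used as GuardedParser(cysimdjson.JSONParser()). A simpler alternative, if the data is needed after the next parse, is to copy it out first (for example with .export()) or to use one parser per live document.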

TypeError: expected bytes, str found

I'm not a computer science guy (mechanical eng.), so hopefully the question is not too far off. My project involves streaming millions of data points (speed is very important) via a websocket client, delivered as string messages containing a system's operational parameters in JSON/dict format. Each pushed message is a list (an actual list in a string) containing several dictionaries, which we iterate over and analyse in real time. Three abbreviated, separately pushed messages would look something like this:

[{"t":1611064750942,"Signal003":"Q","Signal003_MODE":1,"Signal003_TARGET":11,"Signal003_NEXT":8,"Signal003_IRV":9.05,"Signal003_ONOFF":9.06,"Signal003_TEMP":5,"Signal001_PRES":98, ,"Signal001_ENT":3}]
[{"t":1611064750943,"Signal033":"T","Signal003_MODE":1,"Signal003_TARGET":5170,"Signal001_ONOFF":19,"Signal001_TEMP":91.28,"s":6445,"Signal033_IRV":[12],"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":2,"Signal003_TARGET":5171,"Signal001_ONOFF":8,"Signal001_TEMP":9.04,"s":100,"Signal033_IRV":[12],"t":1611064750943,"Signal003_ENT":3}]
[{"t":1611064750943,"Signal065":"T","Signal003_MODE":3,"Signal003_TARGET":5172,"Signal001_ONOFF":8,"Signal001_TEMP":9.04,"s":1000,"Signal033_IRV":[12],"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":5173,"Signal001_ONOFF":8,"Signal001_TEMP":9.04,"s":100,"Signal033_IRV":[12],"t":1611064750943,"Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":7116,"Signal001_ONOFF":12,"Signal001_TEMP":9.04,"s":10,"Signal033_IRV":[14,12,37,41],"t":1611064750943,Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":961,"Signal001_ONOFF":19,"Signal001_TEMP":9.04,"s":4,"Signal033_IRV":[14,12,37,41],"t":1611064750943,Signal003_ENT":3},{"Signal003":"T","Signal003_MODE":1,"Signal003_TARGET":962,"Signal001_ONOFF":19,"Signal001_TEMP":9.04,"s":10,"Signal033_IRV":[14,12,37,41],"t":1611064750943,"Signal003_ENT":3}]

On your front page I saw the example and, if I understood it correctly, the message should be in bytes format, hence I understand the TypeError I'm getting:

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

Would it be possible to parse a string message within the cysimdjson library, as we have no influence on the type of the message pushed from the websocket? I hope the question is not too far off, but since websocket libraries are widely used in conjunction with JSON, I thought the problem might be worth looking into. I can imagine that one could do the string conversion in Python, but that would probably cost some of the C speed. At the moment I use orjson, which is pretty fast and works well, but your results spark my interest. Regardless of the answer, thank you for your efforts with the library.
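
One low-cost option, sketched here: encode the str to UTF-8 bytes in Python before handing it to parse(), as another report in this thread does with .encode('utf8'). The helper name `to_json_bytes` is illustrative. A parse_string call also appears in another report in this thread, which suggests a str entry point exists, though its exact behavior is not confirmed here.

```python
def to_json_bytes(message):
    """Return UTF-8 bytes for a str message; pass bytes-like input through."""
    if isinstance(message, str):
        return message.encode("utf-8")
    return bytes(message)


payload = to_json_bytes('[{"t":1611064750942,"Signal003":"Q"}]')
print(payload)
# With cysimdjson installed: cysimdjson.JSONParser().parse(payload)
```

The encode step is a single pass over the message, so its cost is usually small next to the parsing itself; measuring on real messages is the only way to be sure.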

Accessing results outside of scope where parser was referenced leads to segfault?

I found this experimentally: is it assumed that access to the results is only valid as long as the parser object lives / is in scope?

I noticed that when I have a function that instantiates the parser and returns the result of the parse, I regularly get segfaults. This isn't very Pythonic and might need a bigger warning and/or some utilities to help manage the lifecycle ...
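
The README states that the parser output is valid only while the JSONParser object is alive, so one lifecycle utility could simply bundle the two. The sketch below is pure Python; `ParsedDocument` and `load_document` are hypothetical names, and keeping the raw input alive as well is a precaution whose necessity is an assumption.

```python
class ParsedDocument:
    """Bundle a parse result with the parser that produced it."""

    def __init__(self, parser, raw):
        self._parser = parser  # strong reference: the parser lives as long as this object
        self._raw = raw        # precaution only; may be unnecessary
        self.root = parser.parse(raw)


def load_document(parser_factory, raw):
    """Create a parser, parse `raw`, and return both bundled together."""
    return ParsedDocument(parser_factory(), raw)
```

With cysimdjson installed, a function could return load_document(cysimdjson.JSONParser, raw) instead of the bare parse result. The protection only holds while the ParsedDocument itself is referenced; pulling out `root` and dropping the wrapper brings the original problem back.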

Documentation for Pythonic drop-in API is wrong

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access using JSON Pointer
print(json_parsed.json_parsed['foo'])
  1. Passing bytes to parser.loads raises a TypeError: the parameter is expected to be str https://github.com/TeskaLabs/cysimdjson/blob/main/cysimdjson/cysimdjson.pyx#L454
  2. Even when passing a str, json_parsed has no json_parsed attribute, contrary to what the print example shows

I ended up doing something like:

parser.parse(payload).get_value()

I still have to validate that's what I need :)
(at the moment I'm parsing only and not accessing the object)

Error when installing 21.11b2 from pypi

Hi.
When I try to install the latest version from PyPI, I get this error: ValueError: 'cysimdjson/cysimdjson.pyx' doesn't match any files
It looks like cysimdjson.pyx is missing.

Readme example raises TypeError

The example given in the README throws the following error:

    json_parsed = parser.loads(json_bytes)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Argument 'content' has incorrect type (expected str, got bytes)

Example copied from the README that throws this error:

import cysimdjson

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access in a Python way
print(json_parsed.json_parsed['foo'])

Installation error

No matter where I try to install it, I get the following error...

$ pip3 install git+https://github.com/TeskaLabs/cysimdjson.git
Collecting git+https://github.com/TeskaLabs/cysimdjson.git
  Cloning https://github.com/TeskaLabs/cysimdjson.git to /tmp/pip-req-build-p05qbrke
  Running command git clone -q https://github.com/TeskaLabs/cysimdjson.git /tmp/pip-req-build-p05qbrke
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmppry884i2 get_requires_for_build_wheel /tmp/tmpexwmoc4h
       cwd: /tmp/pip-req-build-p05qbrke
  Complete output (39 lines):
  Compiling cysimdjson/cysimdjson.pyx because it changed.
  [1/1] Cythonizing cysimdjson/cysimdjson.pyx
  running egg_info
  creating cysimdjson.egg-info
  writing cysimdjson.egg-info/PKG-INFO
  Traceback (most recent call last):
    File "/tmp/tmppry884i2", line 280, in <module>
      main()
    File "/tmp/tmppry884i2", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/tmp/tmppry884i2", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 149, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 130, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 253, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 145, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 29, in <module>
      setup(
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 292, in run
      writer(self, ep.name, os.path.join(self.egg_info, ep.name))
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 628, in write_pkg_info
      metadata.write_pkg_info(cmd.egg_info)
    File "/usr/lib/python3.8/distutils/dist.py", line 1117, in write_pkg_info
      self.write_pkg_file(pkg_info)
    File "/tmp/pip-build-env-ctarly3y/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 165, in write_pkg_file
      for project_url in self.project_urls.items():
  AttributeError: 'set' object has no attribute 'items'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmppry884i2 get_requires_for_build_wheel /tmp/tmpexwmoc4h Check the logs for full command output.

Can be reproduced with the following container...

FROM ubuntu:20.10
 RUN apt-get update -qq
 RUN apt-get install -y vim valgrind golang llvm gdb lldb clang-format sudo pip python python-dev wget cmake g++ g++-9 git clang++-9 linux-tools-generic ruby ruby-dev python3-pip  libboost-all-dev git
 RUN  pip3 install ipython
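
A reading of the traceback above, offered as an assumption since setup.py is not shown: setuptools iterates project_urls.items(), so the final AttributeError is what would happen if project_urls were written as a set literal (comma-separated values) rather than a dict (key: value pairs). The label and the way the URL is used below are illustrative.

```python
# A set literal: braces with comma-separated values and no colons.
project_urls_as_set = {"Source", "https://github.com/TeskaLabs/cysimdjson"}

# A dict literal: the mapping form, which provides .items().
project_urls_as_dict = {"Source": "https://github.com/TeskaLabs/cysimdjson"}

print(type(project_urls_as_set).__name__, hasattr(project_urls_as_set, "items"))
print(type(project_urls_as_dict).__name__, hasattr(project_urls_as_dict, "items"))
```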


