jsonreader's Introduction

JsonReader

This is a streaming pull parser - like XMLReader but for JSON.

Requirements

PHP 7.3 or higher with the Intl extension.

Installation

To install with composer:

composer require pcrov/jsonreader

Usage

JsonReader's interface and behavior are very much like XMLReader's. If you've worked with that, this will feel familiar.

For examples and API documentation see the wiki.

Note

Only UTF-8 encoded JSON is supported. If you need to parse JSON in another encoding see Handling Non UTF-8 Encodings on the wiki.

jsonreader's People

Contributors

pcrov

jsonreader's Issues

Large Json file

Hi,

How do I deal with large JSON files (a 2 GB JSON file)? My PHP script gets 'Killed' by the console.
This is my code:

<?php

require 'vendor/autoload.php';

use pcrov\JsonReader\JsonReader;

$reader = new JsonReader();
$reader->open("2018.json");

while($reader->read()) {

    print_r($reader->value());

}

$reader->close();
?>
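The script above calls value() on every node, including the outer array, and on that first node value() has to build the entire document as a PHP array. A minimal sketch of the usual alternative, assuming the root of the file is an array and using only the read(), depth(), next() and value() methods that appear elsewhere on this page: step inside the root array and decode one element at a time. The helper name topLevelElements is hypothetical, and the loop assumes (as the depth-based loop in the wiki example does) that the end-of-array node reports the same depth as the array itself.

```php
<?php
// Assumes pcrov/jsonreader is installed and autoloaded via Composer
// (require 'vendor/autoload.php';).
use pcrov\JsonReader\JsonReader;

/**
 * Hypothetical helper: yields each element of a top-level JSON array,
 * decoding only one element into memory at a time.
 */
function topLevelElements(JsonReader $reader): Generator
{
    if (!$reader->read()) {      // Move onto the outer array node.
        return;
    }
    $depth = $reader->depth();   // Depth of the outer array.

    // Step to the first element; for an empty array this lands on the
    // end node, which is assumed to share the outer array's depth.
    if (!$reader->read() || $reader->depth() <= $depth) {
        return;
    }
    do {
        yield $reader->value();  // Only the current element is materialized.
    } while ($reader->next() && $reader->depth() > $depth); // Next sibling.
}

// Usage (not run here):
// $reader = new JsonReader();
// $reader->open("2018.json");
// foreach (topLevelElements($reader) as $element) {
//     print_r($element);
// }
// $reader->close();
```

Because it is a generator, the caller decides what to keep; anything not stored is released before the next element is decoded.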

Length of File

Is it possible to know how large the file is and where the stream currently is? If I knew the total size of the file and the current position of the stream, I could calculate a progress bar or something similar.
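One way to approximate this, sketched under assumptions: open the file yourself with fopen(), pass the handle to $reader->stream() (the method used in the PSR7 workaround on this page), and compare ftell() with the size reported by fstat(). Since the reader pulls data from the handle in chunks, ftell() measures how much has been read from the file, which can run slightly ahead of the node being parsed, so treat the result as an estimate. progressPercent and streamProgress are hypothetical helper names.

```php
<?php
/**
 * Hypothetical helper: percentage of $totalBytes covered by $bytesRead,
 * clamped to 100 (an empty file counts as fully read).
 */
function progressPercent(int $bytesRead, int $totalBytes): float
{
    if ($totalBytes <= 0) {
        return 100.0;
    }
    return min(100.0, 100.0 * $bytesRead / $totalBytes);
}

/**
 * Hypothetical helper: progress of a seekable file handle, based on its
 * current ftell() position and the size reported by fstat().
 */
function streamProgress($handle): float
{
    $position = ftell($handle);
    $size = fstat($handle)['size'];
    return progressPercent($position === false ? 0 : $position, $size);
}

// Usage with JsonReader (not run here; assumes pcrov/jsonreader is autoloaded):
// $handle = fopen("2018.json", "rb");
// $reader = new \pcrov\JsonReader\JsonReader();
// $reader->stream($handle);
// while ($reader->read()) {
//     printf("\r%5.1f%%", streamProgress($handle));
// }
// $reader->close();
// if (is_resource($handle)) {
//     fclose($handle); // Close it only if the reader has not already done so.
// }
```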

Handling very large JSON files

I need to be able to process 1 GB JSON files, one node at a time. This package was recommended to me as a solution, but I can't seem to get it to work the way I expect. Does it only pull out specific fields, rather than letting you parse a huge file one node at a time?

Thanks for any help.

Should JsonReader return JSON number values as strings?

The JSON specification has a "number" type. It makes no distinction between integers and floats, nor does it set limits on their range or precision though it allows implementations to do so.

Currently JsonReader does just that - automatically casts numbers to the appropriate PHP type, float or int, while imposing the limits associated with them. While this is generally useful it might be limiting to make this decision for a consumer who may prefer to work with numbers in a more precise way, e.g. via BC Math or GMP.

No type information would be lost as $reader->getNodeType() would still return JsonReader::NUMBER as expected, and the existing automatic cast would be easy enough for a user to do, e.g. $value = +$reader->getValue() (yes, this is what the lexer is doing now.)

PHP's json_decode somewhat addresses part of the issue by allowing the JSON_BIGINT_AS_STRING option to return numbers greater than PHP_INT_MAX as strings instead of casting them to floats, but this is of limited help and the type you end up with still varies depending on the value.

In any case I'm not interested in adding an option for the behavior. Sane behavior needs no option and I'm sure there's a right way to go - I just don't know which yet.


Example:

use pcrov\JsonReader\JsonReader;

$reader = new JsonReader();
$reader->json('[42, 42.0, "42", "42.0"]');

while ($reader->read()) {
    switch ($reader->getNodeType()) {
        case JsonReader::NUMBER:
            echo "Number: ";
            var_dump($reader->getValue());
            break;
        case JsonReader::STRING:
            echo "String: ";
            var_dump($reader->getValue());
            break;
    }
}
$reader->close();

Current result:

Number: int(42)
Number: float(42)
String: string(2) "42"
String: string(4) "42.0"

Proposed:

Number: string(2) "42"
Number: string(4) "42.0"
String: string(2) "42"
String: string(4) "42.0"
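For comparison, the json_decode() behavior described in this issue can be reproduced with the standard library alone. The sketch below (helper names are hypothetical) reports the PHP type each element decodes to, with and without JSON_BIGINT_AS_STRING, and applies the unary-plus cast mentioned above to numeric strings. It assumes a 64-bit PHP build, where 9223372036854775808 is one past PHP_INT_MAX.

```php
<?php
/**
 * Hypothetical helper: decodes a JSON array and returns the PHP type
 * (as reported by gettype()) of each element.
 */
function decodedTypes(string $json, int $flags = 0): array
{
    $decoded = json_decode($json, true, 512, $flags | JSON_THROW_ON_ERROR);
    return array_map('gettype', $decoded);
}

/**
 * Hypothetical helper: the "automatic cast" from the proposal; unary plus
 * turns a numeric string into an int or a float.
 */
function castNumber(string $number)
{
    return +$number;
}

// The last element is one past PHP_INT_MAX on 64-bit builds.
$json = '[42, 42.0, 9223372036854775808]';
echo implode(', ', decodedTypes($json)), "\n";                        // integer, double, double
echo implode(', ', decodedTypes($json, JSON_BIGINT_AS_STRING)), "\n"; // integer, double, string
var_dump(castNumber("42"), castNumber("42.0"));                      // int(42) float(42)
```

Even with the flag, the resulting type still depends on the magnitude of the value, which is the limitation the issue points out.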

Search file for ID and return full value of matching objects

Hi there, and first of all, thank you for this amazing parser. A true lifesaver.

I'm currently trying to fully understand how it works, but have run into a problem that I can't figure out how to solve.

I have a large JSON file that looks like this (part of it):

[
  {
    "id": 2584,
    "name": "John",
    "parentCategory": 2570,
    "url": "john",
    "dateUpd": "2016-06-23 14:27:32",
    "dateAdd": "2016-05-13 11:33:35",
    "urlImages": [
      "http://imageurl.com/2584_header.jpg",
      "http://imageurl.com/2584_menu.jpg",
      "http://imageurl.com/2584_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2429,
    "name": "Carol",
    "parentCategory": 2570,
    "url": "carol",
    "dateUpd": "2016-06-23 14:33:36",
    "dateAdd": "2016-05-13 10:11:30",
    "urlImages": [
      "http://imageurl.com/2429_header.jpg",
      "http://imageurl.com/2429_menu.jpg",
      "http://imageurl.com/2429_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2568,
    "name": "Andy",
    "parentCategory": 2552,
    "url": "andy",
    "dateUpd": "2016-06-23 13:55:13",
    "dateAdd": "2016-05-13 11:29:32",
    "urlImages": [
      "http://imageurl.com/2568_header.jpg",
      "http://imageurl.com/2568_menu.jpg",
      "http://imageurl.com/2568_mini.jpg"
    ],
    "isoCode": "sv"
  }
]

What I'm trying to do is search this file for all instances where "parentCategory" equals 2570 and then print/echo the whole object that each match belongs to.

So far, this is what I've got:

$reader = new JsonReader();
$reader->json($json);

while($reader->read("parentCategory")) {
    $parentID = $reader->value();
    if ($parentID == 2570) {
      echo $reader->value()."\n";
    }
}
$reader->close();

This prints the parentCategory ID, but what I need is to be able to use the parentCategory name and value to identify the whole object it belongs to and in the end return the following:

[
  {
    "id": 2584,
    "name": "John",
    "parentCategory": 2570,
    "url": "john",
    "dateUpd": "2016-06-23 14:27:32",
    "dateAdd": "2016-05-13 11:33:35",
    "urlImages": [
      "http://imageurl.com/2584_header.jpg",
      "http://imageurl.com/2584_menu.jpg",
      "http://imageurl.com/2584_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2429,
    "name": "Carol",
    "parentCategory": 2570,
    "url": "carol",
    "dateUpd": "2016-06-23 14:33:36",
    "dateAdd": "2016-05-13 10:11:30",
    "urlImages": [
      "http://imageurl.com/2429_header.jpg",
      "http://imageurl.com/2429_menu.jpg",
      "http://imageurl.com/2429_mini.jpg"
    ],
    "isoCode": "sv"
  }
]

Is this achievable with your parser?

Thank you so much for any help you can give me!
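One approach, sketched here under assumptions rather than taken from the library's documentation: walk the top-level array one object at a time with the read()/depth()/next() pattern, call value() on each object (assumed to return it as an associative array), and keep only those whose parentCategory matches. Only the current object and the matches are held in memory. filterByField and eachObject are hypothetical names, and $json is the string from the question.

```php
<?php
// Assumes pcrov/jsonreader is installed and autoloaded via Composer.
use pcrov\JsonReader\JsonReader;

/**
 * Hypothetical helper: keeps the decoded objects whose $field equals $wanted
 * (strict comparison). Works on any iterable of associative arrays.
 */
function filterByField(iterable $objects, string $field, $wanted): array
{
    $matches = [];
    foreach ($objects as $object) {
        if (is_array($object) && array_key_exists($field, $object) && $object[$field] === $wanted) {
            $matches[] = $object;
        }
    }
    return $matches;
}

/**
 * Hypothetical helper: yields each object of a top-level JSON array,
 * decoding one object at a time.
 */
function eachObject(JsonReader $reader): Generator
{
    $reader->read();              // Outer array.
    $depth = $reader->depth();
    $reader->read();              // First object (or the end of an empty array).
    while ($reader->depth() > $depth) {
        yield $reader->value();   // The whole current object.
        if (!$reader->next()) {   // Skip to the next sibling.
            break;
        }
    }
}

// Usage (not run here):
// $reader = new JsonReader();
// $reader->json($json);
// $matches = filterByField(eachObject($reader), 'parentCategory', 2570);
// $reader->close();
// echo json_encode($matches, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
```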

Feature - Skip file content

Hello,

Thank you for this great package. I'm trying to use it to read a .json file generated by phpMyAdmin. The file has several comments at the top (/* */ and //), which throw an exception, and there is no way to remove them when the file is generated except by modifying the phpMyAdmin source code.

pcrov\JsonReader\Parser\ParseException : Line 1: Unexpected '/'.

Is there a way to skip part of the file or to allow comments?

Thanks
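JSON as defined in RFC 7159 has no comment syntax, so one workaround, assuming the export is small enough to load into memory, is to strip the comments before handing the text to $reader->json(). The hypothetical stripJsonComments() below removes line and block comments that appear outside string literals and leaves comment-like text inside strings (such as "//" in URLs) untouched. It is a preprocessing sketch, not a JsonReader feature.

```php
<?php
// Hypothetical helper: strips line comments (// ...) and block comments
// (/* ... */) that appear outside JSON string literals. Each block comment
// becomes a single space so the tokens around it stay separated.
function stripJsonComments(string $json): string
{
    $out = '';
    $length = strlen($json);
    $inString = false;

    for ($i = 0; $i < $length; $i++) {
        $char = $json[$i];

        if ($inString) {
            $out .= $char;
            if ($char === '\\' && $i + 1 < $length) {
                $out .= $json[++$i];      // Copy the escaped character verbatim.
            } elseif ($char === '"') {
                $inString = false;        // Closing quote.
            }
            continue;
        }

        $next = $i + 1 < $length ? $json[$i + 1] : '';
        if ($char === '"') {
            $inString = true;
            $out .= $char;
        } elseif ($char === '/' && $next === '/') {
            $end = strpos($json, "\n", $i);
            if ($end === false) {
                break;                    // Comment runs to the end of input.
            }
            $i = $end - 1;                // Resume at the newline, which is kept.
        } elseif ($char === '/' && $next === '*') {
            $end = strpos($json, '*/', $i + 2);
            if ($end === false) {
                throw new InvalidArgumentException('Unterminated block comment');
            }
            $out .= ' ';
            $i = $end + 1;                // Resume after the closing delimiter.
        } else {
            $out .= $char;
        }
    }
    return $out;
}

// Usage (not run here; "export.json" is a placeholder file name):
// $reader = new \pcrov\JsonReader\JsonReader();
// $reader->json(stripJsonComments(file_get_contents("export.json")));
```

Loading the whole file defeats streaming, so this suits small exports; very large files would need the same logic applied chunk by chunk.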

Add support for PSR7 StreamInterface

Is it possible to extend the API with a new method that accepts a PSR7 StreamInterface?
This would make it easier to parse JSON from responses returned by HTTP clients like Guzzle.
My current workaround uses a StreamWrapper to map a StreamInterface to a resource, and it took me a while to find the following solution:

use GuzzleHttp\Psr7\StreamWrapper;
use pcrov\JsonReader\JsonReader;

$stream = $response->getBody();
$resource = StreamWrapper::getResource($stream);
$reader = new JsonReader();
$reader->stream($resource);

This could be realised with a new InputStream implementation for the PSR7 StreamInterface.

Example: how to iterate an unnamed array of objects

Thanks for JsonReader.

The structure below is typical of an API response or a JSON file; in my case it holds historical weather data:

[
    { object 1 },
    { object 2 },
    ...
]

An example showing how to iterate over this so that each value is one complete top-level object would be very helpful; at least, I couldn't figure it out. The contents of each object vary, so each value should be the complete (sub)object for later analysis.

When I tried calling value() on the JsonReader::ARRAY node, it used up all memory (a 100 MB JSON file with ~260,000 objects: 30 years × 365 days × 24 hours).

Thanks in advance.

Empty arrays throw MalformedJsonException

If a JSON object contains a key whose value is an empty array, a MalformedJsonException is thrown.

Adding the following to readValue() resolves the issue:

if ($character == ']') {
    $this->structStack->top()->setState(self::STATE_ARRAY_END);
    $this->value = null;
    return;
}

next() should skip the current array or object's end node

Currently calling next() while on an object or array takes you to the end of that object or array. In a certain way this makes sense but it's not intuitive and differs from the behavior of XMLReader which will skip the current item's associated end node.

Maybe the wrong place -- Wiki Edit

I'm kind of a newbie on GitHub.
Thanks a lot for this amazing script: instead of a 20 s loading time, it now takes less than a second!

I think many people would enjoy the following example, so please consider adding it to the wiki (I can't seem to do that myself). For some reason it is missing from the examples, yet it should be basic.

I found part of it on Stack Overflow and modified it a bit:
<?php
/* data.json

[{"event_id":"943815","name":"AT&T Byron Nelson Golf Championship - Thursday Competition","url":"http://www.ticketmonster.com/buy-AT&T-Byron-Nelson-Golf-Championship---Thursday-Competition-tickets-at-TPC-Four-Seasons-Resort-Las-Colinas-Irving-Texas-05-18-2017/943815?aid=20770","datetime":"2017-05-18T08:00:00Z","performers":[{"performer_id":"24797"},{"performer_id":"1714920"}],"categories":[{"category_id":"1"},{"category_id":"10"}],"venue":{"city":"Irving","state":"Texas","name":"TPC Four Seasons Resort Las Colinas","venue_id":"20017"}},{"event_id":"993368","name":"Scranton/Wilkes-Barre RailRiders vs. Pawtucket Red Sox","url":"http://www.ticketmonster.com/buy-Scranton/Wilkes-Barre-RailRiders-vs.-Pawtucket-Red-Sox-tickets-at-PNC-Field-Scranton-Pennsylvania-05-18-2017/993368?aid=20770","datetime":"2017-05-18T10:35:00Z","performers":[{"performer_id":"15966"},{"performer_id":"3629"}],"categories":[{"category_id":"1"},{"category_id":"5"}],"venue":{"city":"Scranton","state":"Pennsylvania","name":"PNC Field","venue_id":"12805"}}]

*/

require 'vendor/autoload.php';

use pcrov\JsonReader\JsonReader;

$reader = new JsonReader();
$reader->open("data.json");

$reader->read(); // Outer array.
$depth = $reader->depth(); // Remember the array's depth to stop when it ends.

$reader->read(); // Step to the first object.
do {
    print_r($reader->value()); // Print the whole current object.
} while ($reader->next() && $reader->depth() > $depth); // Move to each sibling.

$reader->close();

Decrease peak memory usage when getting value of single JSON element

Thanks for such a useful library for working with large JSON objects in PHP! However, I have a problem with high peak memory usage when using the value() function.
For example, if I get the value of one inner JSON element (an array item) that is 4.2 MB long, peak memory usage is 257 MB, more than 60 times the size of the actual JSON data!

Can you recommend ways to decrease peak memory usage with your library?

Composer not working

Hi,
Composer fails to install this package.

Could you have a look?

Thanks!

Incorrect control characters rejected in strings

RFC 7159 specifies that the control characters disallowed in JSON strings are those in the range U+0000 through U+001F, while the lexer currently rejects everything matched by IntlChar::iscntrl(), which according to the manual includes a bit more:

  • ISO 8-bit control character (U+0000..U+001F and U+007F..U+009F)
  • IntlChar::CHAR_CATEGORY_CONTROL_CHAR (Cc)
  • IntlChar::CHAR_CATEGORY_FORMAT_CHAR (Cf)
  • IntlChar::CHAR_CATEGORY_LINE_SEPARATOR (Zl)
  • IntlChar::CHAR_CATEGORY_PARAGRAPH_SEPARATOR (Zp)

(Though testing shows that not all of these are actually matched.)

This is a regression introduced in 0.4.0.
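A minimal sketch of the RFC 7159 rule using PCRE from core PHP instead of IntlChar: a string is rejected only if it contains a raw byte from 0x00 to 0x1F. A byte-level check is safe on UTF-8 input because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so bytes 0x00 to 0x1F can only encode U+0000 to U+001F. The function name is hypothetical; this is not the library's lexer code.

```php
<?php
// Hypothetical helper: true if $s contains a character that RFC 7159 forbids
// unescaped inside a JSON string, i.e. U+0000 through U+001F. In UTF-8 those
// code points are exactly the single bytes 0x00-0x1F, so no /u flag is needed.
function hasDisallowedControlChar(string $s): bool
{
    return preg_match('/[\x00-\x1F]/', $s) === 1;
}

var_dump(hasDisallowedControlChar("tab\there"));   // bool(true)
var_dump(hasDisallowedControlChar("del\x7F"));     // bool(false): U+007F is allowed
var_dump(hasDisallowedControlChar("nel\u{0085}")); // bool(false): U+0085 is allowed
var_dump(hasDisallowedControlChar("ls\u{2028}"));  // bool(false): U+2028 (Zl) is allowed
```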

Strictly enforce UTF-8

The only additional enforcement needed is within strings as invalid UTF-8 anywhere else will throw an exception anyway.
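One way to sketch that extra check in core PHP: PCRE's /u mode validates its subject as UTF-8, and preg_match() returns false when validation fails, so an empty pattern works as a validator for string contents. It rejects stray Latin-1 bytes, overlong encodings and encoded UTF-16 surrogates. isValidUtf8 is a hypothetical name; mb_check_encoding($s, 'UTF-8') is an alternative when the mbstring extension is available.

```php
<?php
// Hypothetical helper: true only if $s is well-formed UTF-8.
// preg_match() returns false (not 0) when the subject fails PCRE's UTF-8 check.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("caf\u{E9}"));    // bool(true)
var_dump(isValidUtf8("caf\xE9"));      // bool(false): Latin-1 byte
var_dump(isValidUtf8("\xC0\xAF"));     // bool(false): overlong "/"
var_dump(isValidUtf8("\xED\xA0\x80")); // bool(false): encoded surrogate U+D800
```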

Make "ext-intl" dependency optional

I use your package as a dependency of my package, and my users are very often confused by Composer errors because their systems are missing the ext-intl extension. I know it's needed to encode characters from JSON as UTF-8.

But in my situation all JSON files contain only ASCII characters, so the UTF-8 decoder is not needed, yet it is still required. There are many other cases where UTF-8 decoding is not needed either.

To solve this problem, it would be good to make the "ext-intl" dependency optional and throw an exception explaining that it is needed only when UTF-8 characters are actually encountered.

Or, as an alternative, provide a separate copy of this package without the "ext-intl" dependency.

What do you think about this idea?
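A rough sketch of the lazy check this issue proposes, under the assumption (not confirmed by the library) that ext-intl is only needed when non-ASCII text has to be produced. Even pure-ASCII JSON can contain \u escapes that decode to non-ASCII characters, so the hypothetical needsIntl() conservatively treats those as requiring it too. On the packaging side, Composer's "suggest" section is the usual place to list optional extensions.

```php
<?php
// Hypothetical helper: conservatively decides whether decoding $json could
// require UTF-8 handling: any byte >= 0x80, or any "\u" escape sequence.
function needsIntl(string $json): bool
{
    return preg_match('/[\x80-\xFF]|\\\\u/', $json) === 1;
}

// Hypothetical guard: fail with a clear message only when ext-intl is
// actually needed and missing, instead of requiring it up front.
function assertIntlAvailableFor(string $json): void
{
    if (needsIntl($json) && !extension_loaded('intl')) {
        throw new RuntimeException(
            'The intl extension is required to parse JSON containing non-ASCII text or \\u escapes.'
        );
    }
}
```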
