JsonCollectionParser

Event-based parser for large JSON collections with low memory consumption. Built on top of JSON Streaming Parser.

This package complies with the PSR-4 autoloading standard and the PSR-12 coding style, and supports parsing of PSR-7 message interfaces. If you notice compliance oversights, please send a patch via pull request.

Installation

You will need Composer to install the package:

composer require maxakawizard/json-collection-parser:~1.0

Input data format

Data must be in one of the following formats:

Array of objects (valid JSON):

[
    {
        "id": 78,
        "title": "Title",
        "dealType": "sale",
        "propertyType": "townhouse",
        "properties": {
            "bedroomsCount": 6,
            "parking": "yes"
        },
        "photos": [
            "1.jpg",
            "2.jpg"
        ],
        "agents": [
            {
                "name": "Joe",
                "email": "[email protected]"
            },
            {
                "name": "Sally",
                "email": "[email protected]"
            }
        ]
    },
    {
        "id": 729,
        "dealType": "rent_long",
        "propertyType": "villa"
    },
    {
        "id": 5165,
        "dealType": "rent_short",
        "propertyType": "villa"
    }
]

Sequence of object literals:

{
    "id": 78,
    "dealType": "sale",
    "propertyType": "townhouse"
}
{
    "id": 729,
    "dealType": "rent_long",
    "propertyType": "villa"
}
{
    "id": 5165,
    "dealType": "rent_short",
    "propertyType": "villa"
}

Sequence of object and array literals:

[[{
    "id": 78,
    "dealType": "sale",
    "propertyType": "townhouse"
}]]
{
    "id": 729,
    "dealType": "rent_long",
    "propertyType": "villa"
}
[{
    "id": 5165,
    "dealType": "rent_short",
    "propertyType": "villa"
}]

Sequence of object and array literals (some objects in subarrays, comma-separated):

[
{
    "id": 78,
    "dealType": "sale",
    "propertyType": "townhouse"
},
{
    "id": 729,
    "dealType": "rent_long",
    "propertyType": "villa"
}
]
{
    "id": 5165,
    "dealType": "rent_short",
    "propertyType": "villa"
}
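
To experiment with the formats above, a small input file can be generated with plain PHP. The following is a minimal sketch that writes a sequence of object literals (the second format); the `$records` data and the temporary file name are illustrative assumptions, and nothing here uses the library itself.

```php
<?php
// Hypothetical sample records, mirroring the examples above.
$records = [
    ['id' => 78, 'dealType' => 'sale', 'propertyType' => 'townhouse'],
    ['id' => 729, 'dealType' => 'rent_long', 'propertyType' => 'villa'],
    ['id' => 5165, 'dealType' => 'rent_short', 'propertyType' => 'villa'],
];

// Illustrative location; any writable path works.
$path = tempnam(sys_get_temp_dir(), 'jcp_');

$handle = fopen($path, 'w');
foreach ($records as $record) {
    // One object literal per record, separated by a newline and no commas.
    fwrite($handle, json_encode($record, JSON_PRETTY_PRINT) . "\n");
}
fclose($handle);
```

The resulting file is not a single valid JSON document, which is why it cannot be read with `json_decode()` alone.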

Usage

Function as callback:

function processItem(array $item)
{
    is_array($item); //true
    print_r($item);
}

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', 'processItem');

Closure as callback:

$items = [];

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', function (array $item) use (&$items) {
    $items[] = $item;
});

Static method as callback:

class ItemProcessor {
    public static function process(array $item)
    {
        is_array($item); //true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', ['ItemProcessor', 'process']);

Instance method as callback:

class ItemProcessor {
    public function process(array $item)
    {
        is_array($item); //true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$processor = new \ItemProcessor();
$parser->parse('/path/to/file.json', [$processor, 'process']);

Receive items as objects:

function processItem(\stdClass $item)
{
    is_array($item); //false
    is_object($item); //true
    print_r($item);
}

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects('/path/to/file.json', 'processItem');

Receive chunks of items as arrays:

function processChunk(array $chunk)
{
    is_array($chunk);    //true
    count($chunk) === 5; //true

    foreach ($chunk as $item) {
        is_array($item);  //true
        is_object($item); //false
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->chunk('/path/to/file.json', 'processChunk', 5);

Receive chunks of items as objects:

function processChunk(array $chunk)
{
    is_array($chunk);    //true
    count($chunk) === 5; //true

    foreach ($chunk as $item) {
        is_array($item);  //false
        is_object($item); //true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->chunkAsObjects('/path/to/file.json', 'processChunk', 5);
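
Chunking can be pictured as buffering on top of a per-item callback. The sketch below shows that idea in plain PHP; `makeBatcher` is a hypothetical helper and not the library's implementation. In this sketch the final chunk may hold fewer items than the requested size; whether `chunk()` behaves the same way is an assumption worth checking against the library source.

```php
<?php
/**
 * Hypothetical helper (not part of the library): returns a pair of closures,
 * one that buffers items and one that flushes whatever is left.
 */
function makeBatcher(callable $onChunk, int $size): array
{
    $buffer = [];

    $push = function ($item) use (&$buffer, $onChunk, $size) {
        $buffer[] = $item;
        if (count($buffer) === $size) {
            $onChunk($buffer); // deliver a full chunk
            $buffer = [];
        }
    };

    $flush = function () use (&$buffer, $onChunk) {
        if ($buffer !== []) {
            $onChunk($buffer); // deliver the shorter, final chunk
            $buffer = [];
        }
    };

    return [$push, $flush];
}

$sizes = [];
[$push, $flush] = makeBatcher(function (array $chunk) use (&$sizes) {
    $sizes[] = count($chunk);
}, 5);

// Feed 12 illustrative items, then flush the remainder.
foreach (range(1, 12) as $id) {
    $push(['id' => $id]);
}
$flush();
// $sizes is now [5, 5, 2]
```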

Pass stream as parser input:

$stream = fopen('/path/to/file.json', 'r');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($stream, 'processItem');
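
For quick experiments or tests, the stream does not have to come from a file. The following is a minimal sketch that builds an in-memory stream with PHP's `php://memory` wrapper; the JSON payload is illustrative, and the parser call is guarded with `class_exists()` so that the snippet also runs where the package is not installed.

```php
<?php
// Illustrative payload: a small array of objects.
$json = '[{"id": 78, "dealType": "sale"}, {"id": 729, "dealType": "rent_long"}]';

// Write the payload into an in-memory stream.
$stream = fopen('php://memory', 'r+');
$length = fwrite($stream, $json);
rewind($stream); // move back to the start so a reader sees the whole document

$items = [];

// Only call the library when it is actually available.
if (class_exists(\JsonCollectionParser\Parser::class)) {
    $parser = new \JsonCollectionParser\Parser();
    $parser->parse($stream, function (array $item) use (&$items) {
        $items[] = $item;
    });
}
```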

Pass PSR-7 MessageInterface as parser input:

use Psr\Http\Message\MessageInterface;

/** @var MessageInterface $resource */
$resource = $httpClient->get('https://httpbin.org/get');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($resource, 'processItem');

Pass PSR-7 StreamInterface as parser input:

use Psr\Http\Message\MessageInterface;

/** @var MessageInterface $resource */
$resource = $httpClient->get('https://httpbin.org/get');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($resource->getBody(), 'processItem');

Supported formats

  • .json - raw JSON
  • .gz - GZIP-compressed JSON (requires the zlib PHP extension)
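
A compressed input file can be prepared with PHP's own zlib functions. The sketch below is illustrative only: the sample data and file path are made up, and it assumes that the `.gz` suffix is what marks the file as compressed for the parser. The parser call is guarded so the snippet runs without the package installed.

```php
<?php
// Requires the zlib extension (gzencode).
$json = json_encode([
    ['id' => 78, 'dealType' => 'sale'],
    ['id' => 729, 'dealType' => 'rent_long'],
]);

// Illustrative path ending in ".gz".
$path = sys_get_temp_dir() . '/jcp_sample_' . uniqid() . '.json.gz';
file_put_contents($path, gzencode($json));

$ids = [];

// Only call the library when it is actually available.
if (class_exists(\JsonCollectionParser\Parser::class)) {
    $parser = new \JsonCollectionParser\Parser();
    $parser->parse($path, function (array $item) use (&$ids) {
        $ids[] = $item['id'];
    });
}
```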

Supported sources

  • file
  • string
  • stream / resource
  • HTTP message interface PSR-7

Running tests

composer test

License

This library is released under the MIT license.

jsoncollectionparser's Issues

Support for list of objects {}, {},...

I have some JSON data that is presented as a list of objects, e.g.:
{object},
{object},

So in order to process them I have to use sed to add [ at the beginning and ] at the end.

It seems very trivial, so it would be great if the parser could support files that are not wrapped in an array but otherwise follow the format.

Thanks

Support for document streams, objects, objects in subarrays

Currently the \JsonCollectionParser\Listener supports only:

  • objects inside array: [ { } , { } , { } , ... ]

It seems reasonable to add support for more forms of input data, at least object-oriented:

  • object: { }
  • stream of objects: { } { } { } ...
  • objects in subarrays: [ [ { } , { } , ... ] ]
  • and combination of above, e.g: [ { } , { } ] { }

Except for the first case, the data is actually a concatenation of JSON documents, a frequent case when working with stream data.

Unfortunately, the underlying library didn't support multiple documents, hence this PR:
salsify/jsonstreamingparser#60

If and when it's accepted, please check out and evaluate this branch: https://github.com/OnkelTem/JsonCollectionParser/tree/documents-stream-support
It implements the cases mentioned above.

Input data format descend

Hello,

Is it possible to descend into the JSON structure? Suppose I have a data format like:

         {
            "objects": [
                {
                    "uid": 1,
                    "name": "Name 1"
                },
                {
                    "uid": 2,
                    "name": "Name 2"
                }
            ]
        }

I would like to walk through the items of "objects".
Thank you.

PHP8 support

Hi 👋

I've noticed that support for PHP8 is implemented, but not released even though CHANGELOG mentions v1.8.0.

Just wanted to see if there is a plan to release a new version with PHP8 support soon?

Thanks for your work on this great package! Cheers!

500mb json file timeout error

I have a 500 MB JSON file, which is also available here (https://archive.scryfall.com/json/scryfall-all-cards.json). When I try to process this file, I need to raise the max execution time to something like 1000 seconds; otherwise I get a timeout in the salsify parser.

Is there any way to solve this? It's great that the file is not loaded into memory, but many servers don't allow raising the execution time to such high values.

The error hits at Parser.php line 152, 197, or 201.

Thanks

Support for parsing progress

JsonStreamingParser is able to report the progress of file parsing. It would be really handy to have this progress, because processing a large file can take a lot of time.

All you need to do is add a "filePosition" function to your Listener and propagate the progress values to the callback (or add another callback).
