Ijson is an iterative JSON parser with a standard Python iterator interface.
All usage examples use a JSON document describing geographical objects:
```json
{
  "earth": {
    "europe": [
      {"name": "Paris", "type": "city", "info": { ... }},
      {"name": "Thames", "type": "river", "info": { ... }},
      // ...
    ],
    "america": [
      {"name": "Texas", "type": "state", "info": { ... }},
      // ...
    ]
  }
}
```
The most common usage is to have ijson yield native Python objects from a JSON stream located under a prefix. Here's how to process all European cities:
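```python
from urllib.request import urlopen

import ijson

f = urlopen('http://.../')
# Lazily yield each element of the "europe" array as a native Python dict
objects = ijson.items(f, 'earth.europe.item')
cities = (o for o in objects if o['type'] == 'city')
for city in cities:
    do_something_with(city)  # placeholder for your own processing
```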
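To make the prefix notation concrete, here is a small, non-streaming sketch that walks an object already loaded with the standard json module and yields every value located at a given ijson-style prefix. The helper name items_at_prefix and the trimmed-down sample data are illustrative assumptions; ijson does the equivalent incrementally while reading the stream, so this is not its implementation.

```python
import json
from typing import Any, Iterator


def items_at_prefix(node: Any, target: str, prefix: str = '') -> Iterator[Any]:
    """Yield every value whose ijson-style prefix equals `target`.

    Works on an already-parsed object, so it is NOT streaming; it only
    illustrates how prefixes such as 'earth.europe.item' address values.
    """
    if prefix == target:
        yield node
        return
    if isinstance(node, dict):
        for key, value in node.items():
            # Map values extend the prefix with their key
            child = f'{prefix}.{key}' if prefix else key
            yield from items_at_prefix(value, target, child)
    elif isinstance(node, list):
        # Every array element shares the same '.item' suffix
        child = f'{prefix}.item' if prefix else 'item'
        for value in node:
            yield from items_at_prefix(value, target, child)


doc = json.loads(
    '{"earth": {"europe": ['
    '{"name": "Paris", "type": "city"},'
    '{"name": "Thames", "type": "river"}]}}'
)
cities = [o['name'] for o in items_at_prefix(doc, 'earth.europe.item')
          if o['type'] == 'city']
print(cities)  # ['Paris']
```

Note that array elements never get a numeric index in the prefix: all of them share the single `item` component, which is why one prefix selects every city at once.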
Sometimes, when dealing with a particularly large JSON payload, it may be worth not constructing individual Python objects at all, and instead reacting to individual parse events immediately to produce some result:
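```python
import sys
from urllib.request import urlopen

import ijson

stream = sys.stdout  # any writable text stream
parser = ijson.parse(urlopen('http://.../'))
continent = None  # no continent seen yet
stream.write('<geo>')
for prefix, event, value in parser:
    if (prefix, event) == ('earth', 'map_key'):
        # Each key under "earth" names a continent and opens an element
        stream.write('<%s>' % value)
        continent = value
    elif prefix.endswith('.name'):
        stream.write('<object name="%s"/>' % value)
    elif (prefix, event) == ('earth.%s' % continent, 'end_array'):
        # Continents hold arrays, so they close with an end_array event
        stream.write('</%s>' % continent)
stream.write('</geo>')
```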
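The (prefix, event, value) triples consumed by the loop above can be illustrated with the following sketch, which generates triples of the same shape from an object that is already in memory. The function walk_events is hypothetical and only mimics the event shape; it also assumes a single 'number' event for every numeric value, which may not match what a given ijson backend reports.

```python
import json
from typing import Any, Iterator, Tuple

Event = Tuple[str, str, Any]


def walk_events(node: Any, prefix: str = '') -> Iterator[Event]:
    """Yield ijson-style (prefix, event, value) tuples for a parsed object.

    Illustration only: it walks an object that is already in memory,
    whereas ijson produces these events while reading the byte stream.
    """
    if isinstance(node, dict):
        yield prefix, 'start_map', None
        for key, value in node.items():
            # The key event carries the map's own prefix...
            yield prefix, 'map_key', key
            # ...while the value's events carry the extended prefix
            yield from walk_events(value, f'{prefix}.{key}' if prefix else key)
        yield prefix, 'end_map', None
    elif isinstance(node, list):
        yield prefix, 'start_array', None
        for value in node:
            yield from walk_events(value, f'{prefix}.item' if prefix else 'item')
        yield prefix, 'end_array', None
    elif node is None:
        yield prefix, 'null', None
    elif isinstance(node, bool):  # checked before numbers: bool subclasses int
        yield prefix, 'boolean', node
    elif isinstance(node, (int, float)):
        yield prefix, 'number', node  # assumed single event name for numbers
    else:
        yield prefix, 'string', node


doc = json.loads('{"earth": {"europe": [{"name": "Paris"}]}}')
for event in walk_events(doc):
    print(event)
```

Running it prints twelve events, starting with ('', 'start_map', None) and ending with ('', 'end_map', None). The europe array opens and closes with start_array and end_array events at prefix 'earth.europe', and the city name arrives as a string event at 'earth.europe.item.name'.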
Ijson provides several implementations of the actual parsing in the form of backends located in ijson/backends:
yajl2_cffi
: wrapper around YAJL 2.x using CFFI; this is the fastest.

yajl2
: wrapper around YAJL 2.x using ctypes, for when you can't use CFFI for some reason.

yajl
: deprecated YAJL 1.x + ctypes wrapper, for even older systems.

python
: pure Python parser, good to use with PyPy.
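Because the backends share one interface, a program can try them in order of speed and fall back when a C library or binding is missing. The following is a minimal sketch under that assumption: load_backend and PREFERRED_BACKENDS are hypothetical names, and catching ImportError and OSError is an assumption about how a missing CFFI installation or YAJL shared library surfaces at import time.

```python
import importlib
from types import ModuleType
from typing import Sequence

# Hypothetical preference order, fastest first, following the list above
PREFERRED_BACKENDS = ('yajl2_cffi', 'yajl2', 'yajl', 'python')


def load_backend(names: Sequence[str] = PREFERRED_BACKENDS,
                 package: str = 'ijson.backends') -> ModuleType:
    """Return the first module `package.name` that imports cleanly."""
    errors = []
    for name in names:
        try:
            return importlib.import_module(f'{package}.{name}')
        except (ImportError, OSError) as exc:
            # Assumed failure modes (missing CFFI, missing YAJL library):
            # remember why and try the next, slower backend
            errors.append(f'{name}: {exc}')
    raise ImportError('no usable backend: ' + '; '.join(errors))
```

With ijson installed, `ijson = load_backend()` returns a backend module that can then be used as `ijson.items(...)` or `ijson.parse(...)`, just like the top-level library.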
You can import a specific backend and use it in the same way as the top level library:
```python
import ijson.backends.yajl2_cffi as ijson

for item in ijson.items(...):
    ...  # process each item exactly as with the top-level library
```
Importing the top-level library with `import ijson` uses the pure Python backend.
The Python parser in ijson is relatively simple thanks to Douglas Crockford, who invented a strict, easy-to-parse syntax.
The YAJL library by Lloyd Hilaiel is the most popular and efficient way to parse JSON in an iterative fashion.
Ijson was inspired by the yajl-py wrapper by Hatem Nassrat. Although ijson borrows almost nothing from the actual yajl-py code, it was used as an example of integration with YAJL using ctypes.