A Scala JSON parsing benchmark experiment

Introduction

Reading and writing JSON data is an "exciting" subject in Scala, because there are many competing libraries, and it seems all of them refuse to die. This project adds one more benchmark of these libraries, specifically Rapture, Lift, Spray, Play and Json4s.

This project also demonstrates how to use these different JSON libraries to deserialize a Scala case class. Of course, we also provide the code to run the benchmark and to generate test data, which might be of use.

About the libraries

Some of these libraries (Rapture and Json4s) use Jackson, the Java JSON-parsing library. Lift has a custom native parser. Spray was based on Parboiled, but moved in late 2014 to a custom native parser too, acquiring a huge performance improvement. All of them offer similar JSON ASTs to handle the parsed data, usually with some way to query the objects, and also some way to generate and extract JSON objects from a case class definition.

It seems from the documentation that Play took its time to offer a nice automatic way to extract case classes instead of forcing you to write code to pick the values from each field. But now all these libraries work in similar ways, although there is still some room for choosing one of them for your project based on taste or need. They have subtly different restrictions on what kind of implicits you have to define for your classes for the extraction to work. And there are other practical concerns, of course. If you are using Play, Lift or Spray you would probably like to stick to their native JSON handling libraries, and some libraries are not available on every platform.

Experiment

In our study here we were only concerned with a "big data" scenario. We have a large file with thousands of JSON objects, one per text line, and we just wanted to see which library could read them into case classes faster.

The ScalaJsonBenchmark.scala file contains classes with JSON parsers created using each of the tested libraries. That file also carries out the experiment, first reading the input file into memory as an Array[String]. This array is then processed 11 times with each parser. In each iteration all the parsers are executed one after another, rather than one parser running all of its 11 iterations before the next one starts.

The execution time was measured using System.currentTimeMillis(), which is not the perfect method, but was good enough to provide some interesting results. This experiment was performed in April 2015 on an old lousy notebook computer (4GB RAM + Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz) with Scala 2.11.6, using a jar file created with sbt-assembly.
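The procedure described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from ScalaJsonBenchmark.scala: the names BenchmarkSketch, Parser, loadLines, timePass and run are assumptions made up for this example, and a "parser" is just any function from a text line to a value.

```scala
object BenchmarkSketch {
  // Hypothetical stand-in: a parser is any function from one JSON text line to a value.
  // A real entry would wrap one of the libraries under test.
  type Parser = String => Any

  /** Reads the whole input file into memory, one JSON object per line. */
  def loadLines(path: String): Array[String] = {
    val source = scala.io.Source.fromFile(path)
    try source.getLines().toArray finally source.close()
  }

  /** Times one pass of `parse` over all lines, in milliseconds. */
  def timePass(lines: Array[String], parse: Parser): Long = {
    val start = System.currentTimeMillis()
    var i = 0
    while (i < lines.length) { parse(lines(i)); i += 1 }
    System.currentTimeMillis() - start
  }

  /** Runs every parser once per iteration, so the parsers are interleaved
    * instead of each one running all its iterations in a row.
    * Returns the timings per parser name, in iteration order. */
  def run(lines: Array[String],
          parsers: Seq[(String, Parser)],
          iterations: Int = 11): Map[String, Vector[Long]] = {
    val samples: Seq[(String, Long)] =
      (0 until iterations).flatMap { _ =>
        parsers.map { case (name, parse) => name -> timePass(lines, parse) }
      }
    samples.groupBy(_._1).map { case (name, ts) => name -> ts.map(_._2).toVector }
  }
}
```

Interleaving the parsers means that any slow drift in the machine's state during the run is spread over all libraries rather than hitting only the one that happens to run last.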

Results from set 1

These are the test results from birds.dat, a file of roughly 10 megabytes. (We used to call that a "box of diskettes" back in the day...) This file is provided in this project so you can reproduce the experiment. We also provide the program used to generate this data from the model classes (Bird and Place) with a bunch of random data generating functions. The tables below show the minimum, median and maximum times, in milliseconds, taken by each library to parse all the data over the 11 iterations.
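For reference, the three statistics can be computed from a list of timings roughly like this. It is a small sketch with made-up names (TimingStats, Summary, summarize), not the project's own code.

```scala
object TimingStats {
  final case class Summary(min: Long, median: Long, max: Long)

  /** Summarizes a non-empty series of timings in milliseconds.
    * With an odd number of samples (11 in this experiment) the median
    * is simply the middle value of the sorted series. */
  def summarize(timingsMs: Seq[Long]): Summary = {
    require(timingsMs.nonEmpty, "need at least one timing")
    val sorted = timingsMs.sorted
    val mid = sorted.length / 2
    val median =
      if (sorted.length % 2 == 1) sorted(mid)
      else (sorted(mid - 1) + sorted(mid)) / 2 // even count: mean of the two middle values, integer division
    Summary(sorted.head, median, sorted.last)
  }
}
```

The median is much less sensitive than the mean to a single slow iteration, which matters here because the maxima are several times larger than the medians.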

First run:

library	min (ms)	median (ms)	max (ms)
Play	759	982	4906
Lift	563	813	2125
Json4s	667	746	3012
Rapture	472	583	2501
Spray	440	544	2537

Second run:

library	min (ms)	median (ms)	max (ms)
Lift	575	1028	3090
Play	782	892	3693
Json4s	607	806	2462
Rapture	495	707	2668
Spray	439	539	1753

Results from set 2

We have also performed the same test with another file of 4 MB and about 10,000 lines. The objects were similar in having a lot of fields with different types, some of them optional, and some of them lists of objects.

First run:

library	min (ms)	median (ms)	max (ms)
Rapture	1113	1754	4229
Json4s	438	893	2221
Play	471	612	3318
Spray	516	582	2048
Lift	310	375	1882

Second run:

library	min (ms)	median (ms)	max (ms)
Rapture	1014	1566	4452
Spray	470	661	1659
Play	507	645	2490
Json4s	390	520	3924
Lift	340	444	2374

Conclusion

This experiment was quite surprising because, while the performance of each library was consistent across different runs on the same test file, the libraries performed quite differently on the two test cases. More specifically, Lift was kind of bad at parsing birds.dat, but it kicked ass on the second ("secret") data file. Rapture showed the opposite behavior. We haven't figured out yet what difference in the models might have caused this effect.

As for the other libraries, Spray, Play and Json4s all exhibited a roughly similar performance.
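To make the reversal concrete, here is a small sketch that ranks the libraries by their median time, using the numbers from the "First run" tables above. The object and function names are made up for this illustration.

```scala
object MedianRanking {
  // Median times in milliseconds, copied from the "First run" table of each set.
  val set1: Map[String, Int] =
    Map("Play" -> 982, "Lift" -> 813, "Json4s" -> 746, "Rapture" -> 583, "Spray" -> 544)
  val set2: Map[String, Int] =
    Map("Rapture" -> 1754, "Json4s" -> 893, "Play" -> 612, "Spray" -> 582, "Lift" -> 375)

  /** Library names ordered from fastest to slowest median. */
  def ranking(medians: Map[String, Int]): List[String] =
    medians.toList.sortBy(_._2).map(_._1)

  def main(args: Array[String]): Unit = {
    println(ranking(set1).mkString(" < ")) // Spray < Rapture < Json4s < Lift < Play
    println(ranking(set2).mkString(" < ")) // Lift < Spray < Play < Json4s < Rapture
  }
}
```

Lift moves from fourth place to first and Rapture from second to last, while Spray stays near the top in both sets.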

This study has demonstrated the great level of maturity being reached by all of these projects. It is pretty hard today to defend any of them based solely on speed. There is also no obvious answer to which one is the fastest in every case. Performance does differ, but the difference depends on the specific application.

The jury is still out on which library provides the best interface. But after this experiment I intend to stop worrying about any of these libraries having much worse performance than the others, which was indeed a problem back in the day (the Scala native parser still seems to be horrid). We should just concern ourselves with the code now, as the necessary performance improvements have apparently been applied to these projects already.

nlw0 / scala-json-benchmark