Comments (9)
I also compared QJson with Qt-Json and it seems Qt-Json is much faster. Here are my results:
QJson:
RESULT : TestJsonSerializer::deserializingBenchmark():
219 msecs per iteration (total: 219, iterations: 1)
QtJson:
RESULT : TestJsonSerializer::deserializingBenchmark():
38 msecs per iteration (total: 38, iterations: 1)
Test data is too big to attach; it is a ~420 KB JSON file.
from qjson.
I'm aware of that; some time ago I performed the same comparison. I noticed Qt-Json doesn't use any Qt class internally; last time I looked it used just plain C (maybe I'm wrong or the code has changed in the meantime).
I did some profiling using valgrind but I have not been able to find the cause of the bottleneck. My guess is that using QVariant inside the parser slows down the execution quite a bit.
Do you have any hint?
from qjson.
After looking at the sources of QJson, the only thing I might suspect is the Bison-generated parser, though I haven't looked closely at the generated code yet.
from qjson.
I have been working on this issue lately and I'd like to share some of the results here.
Profiling
As you have already done, I profiled qjson in valgrind to see if something caught my eye. Specifically, I ran a slightly modified version of cmdline_tester.cpp
that parses the "Hard Times" sample 10 times through callgrind; then I used kcachegrind for analysis.
I couldn't find evidence of a specific issue either but that was more or less expected: qjson is a parser, so any subtlety throughout any core function could be important since they normally have very high call counts.
Anyway I did notice some interesting facts (see image below):
- about 50% of the time is spent in Bison stack-handling methods (yy::stack*, yy::slice* and yypop). I think this is at least in part because QVariants usually take up 12 bytes, so some copying is involved;
- scanning took about 30% of overall time (yy::yylex()). I felt that was a little much for the relatively simple JSON lexicon;
- about 14% of the time spent in yy::yylex() isn't actually spent in scanning (JSonScanner::yylex).
Ideas
Analyzing qjson gave me some ideas for optimizations. These include:
- analyzing scanner code to see where that 14% extra time goes;
- further analyzing and modifying (or replacing) the scanner to be faster;
- shifting some responsibilities from the parser to the scanner, e.g. using a single token for an entire string (quote marks included) and one for an entire number, leaving only the recursive aspects (arrays and dictionaries) to the parser. This should make it possible to use more non-QVariant types at the scanner level, and also reduce the number of expansions, thus reducing stack use;
- refactoring the grammar itself.
Optimizing qjsonDebug()
Regarding yy::yylex(), it turns out that the time not spent in scanning goes into outputting debug information. These are the relevant lines in json_parser.yy:
...
int ret = scanner->yylex(yylval, yylloc);
qjsonDebug() << "json_parser::yylex - calling scanner yylval==|" << yylval->toByteArray() << "|, ret==|" << QString::number(ret) << "|";
return ret;
Calls to toByteArray() and QString::number() do have some impact on overall parsing performance because they are made once per recognized token. This happens even when debug logging is turned off, because the unprinted strings have to be computed anyway; see qjson_debug.h:
#ifdef QJSON_VERBOSE_DEBUG_OUTPUT
inline QDebug qjsonDebug() { return QDebug(QtDebugMsg); }
#else
inline QNoDebug qjsonDebug() { return QNoDebug(); }
#endif
I changed the above code to the following version:
#ifdef QJSON_VERBOSE_DEBUG_OUTPUT
inline QDebug qjsonDebug() { return QDebug(QtDebugMsg); }
#else
#define qjsonDebug() if(false) QDebug(QtDebugMsg)
#endif
Results:
silvio@linux-7z9g:~/software/qjson> ./tests/cmdline_tester/cmdline_tester Hard_Times-20101226190115.json
Parsing of "Hard_Times-20101226190115.json" took 344 ms
Parsing of "Hard_Times-20101226190115.json" took 344 ms
Parsing of "Hard_Times-20101226190115.json" took 346 ms
Parsing of "Hard_Times-20101226190115.json" took 346 ms
Parsing of "Hard_Times-20101226190115.json" took 345 ms
Parsing of "Hard_Times-20101226190115.json" took 347 ms
Parsing of "Hard_Times-20101226190115.json" took 346 ms
Parsing of "Hard_Times-20101226190115.json" took 345 ms
Parsing of "Hard_Times-20101226190115.json" took 344 ms
Parsing of "Hard_Times-20101226190115.json" took 345 ms
MEAN Parsing of "Hard_Times-20101226190115.json" took 345.2 ms
JOB DONE, BYE
silvio@linux-7z9g:~/software/qjson> ./tests/cmdline_tester/cmdline_tester Hard_Times-20101226190115.json
Parsing of "Hard_Times-20101226190115.json" took 285 ms
Parsing of "Hard_Times-20101226190115.json" took 285 ms
Parsing of "Hard_Times-20101226190115.json" took 282 ms
Parsing of "Hard_Times-20101226190115.json" took 284 ms
Parsing of "Hard_Times-20101226190115.json" took 283 ms
Parsing of "Hard_Times-20101226190115.json" took 298 ms
Parsing of "Hard_Times-20101226190115.json" took 290 ms
Parsing of "Hard_Times-20101226190115.json" took 307 ms
Parsing of "Hard_Times-20101226190115.json" took 284 ms
Parsing of "Hard_Times-20101226190115.json" took 285 ms
MEAN Parsing of "Hard_Times-20101226190115.json" took 288.3 ms
JOB DONE, BYE
I am not sure this is the prettiest (nor the most appropriate) way to solve the problem, so I'm asking for your comments about it.
Replacing the scanner
Since I am a fan of regular expressions, I felt that the scanner should really have regular-expression support. I also wanted something quite fast, so I decided to replace the scanner with a Flex-based one.
To make things simpler I decided to write it in two steps: a first drop-in replacement of the current scanner (without any parser modification) and a second, more refined version also changing the scanner/parser interface.
To minimize regressions, I also added unit tests for JSonScanner covering every token type and most error conditions.
Obviously the current scanner passes all the tests.
Then I rewrote the scanner in Flex and ensured that all unit tests (scanner-wise and parser-wise) were still passing. I also ran a (rather crude) integration test with a script that runs cmdline_parser on a directory of sample files, checks out the new scanner, rebuilds, and reruns cmdline_parser checking for output differences.
Finally I re-checked for performance gains:
silvio@linux-7z9g:~/software/qjson2> ./tests/cmdline_tester/cmdline_tester testfiles/Hard_Times-20101226190115.json
Parsing of "testfiles/Hard_Times-20101226190115.json" took 266 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 268 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 269 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 268 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 267 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 269 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 267 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 273 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 268 ms
Parsing of "testfiles/Hard_Times-20101226190115.json" took 269 ms
MEAN Parsing of "testfiles/Hard_Times-20101226190115.json" took 268.4 ms
Conclusions & Next Steps
The new Flex scanner, together with the qjsonDebug() optimization, reduces parsing time by about 22% in my tests, which I think is already a positive result considering that no bisons were hacked in the process.
As a next step I'm going to modify both parser and scanner, hoping to get some extra speedup.
Any comments are welcome!
from qjson.
Hi Silvio,
first of all thanks for your efforts!
I think we can use the new version of qjsonDebug(). I don't care if it's ugly since it improves the overall performance.
I also appreciate the new scanner and its unit tests, it looks great.
I'm looking forward to your next commits :)
from qjson.
In this commit, number-interpretation code has been moved from the parser to the scanner.
Results are encouraging:
Parsing of "Hard_Times-20101226190115.json" took 105 ms
Parsing of "Hard_Times-20101226190115.json" took 102 ms
Parsing of "Hard_Times-20101226190115.json" took 103 ms
Parsing of "Hard_Times-20101226190115.json" took 104 ms
Parsing of "Hard_Times-20101226190115.json" took 102 ms
Parsing of "Hard_Times-20101226190115.json" took 104 ms
Parsing of "Hard_Times-20101226190115.json" took 102 ms
Parsing of "Hard_Times-20101226190115.json" took 105 ms
Parsing of "Hard_Times-20101226190115.json" took 103 ms
Parsing of "Hard_Times-20101226190115.json" took 104 ms
MEAN Parsing of "Hard_Times-20101226190115.json" took 103.4 ms
JOB DONE, BYE
That is, parsing time has been reduced by 70%.
Note that the performance improvement in this case is quite good because the file contains a lot of numbers; in other tests I measured differences ranging from 30% to 90%.
Next step: strings should get the same treatment.
Flavio: should I send you pull requests?
from qjson.
That's really really good. Please send a pull request and keep working inside of this branch.
I'll merge the code into master as soon as all the changes are in place.
from qjson.
Now strings get the same treatment. Obviously "Hard Times" did not get any better, but some string-heavy samples did, like this one.
Before:
Parsing of "performance_test_files/citylots-small.json" took 1297 ms
Parsing of "performance_test_files/citylots-small.json" took 1375 ms
Parsing of "performance_test_files/citylots-small.json" took 1364 ms
Parsing of "performance_test_files/citylots-small.json" took 1371 ms
Parsing of "performance_test_files/citylots-small.json" took 1356 ms
Parsing of "performance_test_files/citylots-small.json" took 1354 ms
Parsing of "performance_test_files/citylots-small.json" took 1355 ms
Parsing of "performance_test_files/citylots-small.json" took 1356 ms
Parsing of "performance_test_files/citylots-small.json" took 1373 ms
Parsing of "performance_test_files/citylots-small.json" took 1357 ms
MEAN Parsing of "performance_test_files/citylots-small.json" took 1355.8 ms
JOB DONE, BYE
After:
Parsing of "performance_test_files/citylots-small.json" took 781 ms
Parsing of "performance_test_files/citylots-small.json" took 860 ms
Parsing of "performance_test_files/citylots-small.json" took 853 ms
Parsing of "performance_test_files/citylots-small.json" took 848 ms
Parsing of "performance_test_files/citylots-small.json" took 845 ms
Parsing of "performance_test_files/citylots-small.json" took 847 ms
Parsing of "performance_test_files/citylots-small.json" took 851 ms
Parsing of "performance_test_files/citylots-small.json" took 849 ms
Parsing of "performance_test_files/citylots-small.json" took 849 ms
Parsing of "performance_test_files/citylots-small.json" took 849 ms
MEAN Parsing of "performance_test_files/citylots-small.json" took 843.2 ms
JOB DONE, BYE
Now I'm attacking the core grammar rules to see whether any speedup can be obtained from arrays and objects. I'll keep you posted as soon as there is news.
from qjson.
Today I finished refactoring the parser with left-recursive production rules for both arrays and objects. The result is a significant, but not dramatic, time reduction: about 15% on "Hard Times" and up to 25% on "citylots-small". Some memory is saved as well, but it's not substantial, since most of it is used for input buffering rather than parsing.
Regardless of performance considerations, I think the new parser is easier to understand, even if it adds a little extra complexity because it uses QVector* and QVariantMap* internally to avoid needless copying.
I think that in order to get more significant speedups, QVariant and/or QVariantList should be abandoned: according to my tests, more appropriate data structures, such as unions and QLinkedList, could be much more beneficial performance-wise. That, however, would mean changing the API significantly and losing much of its convenience; all in all, I don't think it's worth it.
I will send a pull request soon; meanwhile, please send me any test JSON files you have around so that I can do some extra regression testing!
from qjson.