chlumsky / json-cpp-gen
A generator of JSON parser & serializer C++ code from structure header files
License: Other
Currently the files output by the program are always saved with Unix-style line endings (LF). Instead, a global lineEndings setting should be added to configuration with possible values NATIVE, LF, CRLF, and possibly a value representing "same as configuration JSON file", although I'm not sure about the last one.
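A minimal sketch of how such a setting might be applied when writing output. The LineEndings enumeration and the applyLineEndings helper are hypothetical names, not part of the project, and the sketch assumes the generated text uses LF internally:

```cpp
#include <string>

// Hypothetical setting values; the names mirror the ones proposed above.
enum class LineEndings { NATIVE, LF, CRLF };

// Converts LF-only text to the requested line ending style.
// Assumes the input contains no CR characters.
inline std::string applyLineEndings(const std::string &text, LineEndings mode) {
    if (mode == LineEndings::NATIVE) {
#ifdef _WIN32
        mode = LineEndings::CRLF;
#else
        mode = LineEndings::LF;
#endif
    }
    if (mode == LineEndings::LF)
        return text;
    std::string result;
    result.reserve(text.size() + text.size() / 16);
    for (char c : text) {
        if (c == '\n')
            result += '\r'; // emit CR before every LF
        result += c;
    }
    return result;
}
```

With this approach the output file would have to be opened in binary mode so that the standard library does not translate line endings a second time.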
Not only are these functions archaic and overly complex (format string parsing etc.) but I have found that sscanf in particular is extremely slow. They need to be done away with ASAP. However, the implementation for floats is very complex. There should also be some way for the user to select an implementation or provide their own functions for number (de)serialization.
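For integers, a hand-written replacement is short. The following is only a sketch under the assumption that the parser keeps a cursor into a null-terminated input; parseUnsigned is a hypothetical name and floats are not covered:

```cpp
#include <limits>

// Parses an unsigned decimal integer starting at cur and advances cur past the digits.
// Returns false if no digit is present or if the value does not fit into unsigned.
inline bool parseUnsigned(const char *&cur, unsigned &value) {
    if (!(*cur >= '0' && *cur <= '9'))
        return false;
    unsigned result = 0;
    do {
        unsigned digit = unsigned(*cur - '0');
        // Equivalent to checking 10*result+digit > max without overflowing
        if (result > (std::numeric_limits<unsigned>::max() - digit) / 10)
            return false;
        result = 10 * result + digit;
        ++cur;
    } while (*cur >= '0' && *cur <= '9');
    value = result;
    return true;
}
```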
Currently, only a StringType can be a key for object types. Make it possible to also use ConstStringType as such.
The serializer function void write(const char *str); could have the length of str passed as a second argument, since it is a literal in the majority of the cases. This could very slightly improve performance since string length is almost surely computed when appending to the output JSON string. String API would have to be updated to allow appending a string with known length.
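One way this could look, as a sketch only: SerializerSketch and its json member are hypothetical stand-ins for the generated class and its output string, and std::string::append stands in for the updated string API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical serializer fragment; only the write overloads matter here.
class SerializerSketch {
public:
    std::string json;

    // Appends a string whose length is already known.
    void write(const char *str, std::size_t length) {
        json.append(str, length);
    }

    // Deduces the length of a string literal at compile time
    // (N includes the terminating null character).
    template <std::size_t N>
    void write(const char (&str)[N]) {
        write(str, N - 1);
    }
};
```

The template overload is only correct for literals and fully used arrays; a partially filled char buffer would be measured incorrectly, so such calls would need to pass the length explicitly.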
To avoid errors, I have put the const keyword for references of unknown types in the safe spot after the type name. However, this is somewhat inconsistent and it is a general convention to have it at the beginning whenever possible. Since pointer and reference types are not supported and static array types do not use references (this part is resolved), I believe it would actually be safe to put const back at the beginning without changing anything else. It also needs to be updated in the standard library APIs and the Readme file. Is it a good idea though? How about normalizing it the other way around (always putting const in this position)?
(Linux, gcc 9.4.0)
The project looks promising, I'd rather use a generator for serialization instead of code annotation and other weird hacks.
I'm using an enum as a struct field and it is not serializing / parsing as desired. The field is being ignored completely or always parsed as zero.
The desire is to get this:
{"_allkeys":[
{"_act":"NextImage","_keyval":11},
{"_act":"NextImage","_keyval":12},
{"_act":"PrevImage","_keyval":13}
]}
But I'm getting this:
{"_allkeys":[
{"_keyval":11},
{"_keyval":12},
{"_keyval":13}
]}
Note the _act field, which is an enum, is missing.
I attach what I hope is a complete set of files to demo the problem.
For example:
struct Foo {
    struct Bar {
        int x;
    } bar;
};
Support enumerations with an explicit base type, e.g.
enum Foo : short {
    BAR
};
Simply skip the part between enum name and the opening brace.
Apparently it is possible to alias namespaces as
namespace standard_library = std;
namespace schrono = std::chrono;
Header parser would fail if this was encountered.
Parse the typedef keyword and treat it as a valid type. Examples:
typedef std::string StringType;
typedef int FooString;
struct Foo {
    typedef StringType FooString;
    FooString bar;
};
Foo::bar is std::string and not int!
If a "raw" string literal (R"(text)") was present in the parsed header files, its range may not be properly detected at least in some cases and it may break the header parser. Make sure this cannot happen.
Ignore final in cases like
struct Foo final { };
Another idea I had was to support miscellaneous types in the structures, in different ways:
This way, a type in the structure would be converted from / to a type supported by the generator. For example a timestamp type, represented by a string in the JSON, and able to be converted from / to a std::string
. Each type like this would have an API template for both conversions.
Another possibility would be that the custom type would have to provide an implementation of the parse and serialize functions of itself. This option is just an idea and probably should not be implemented.
A sort-of hybrid between the two previous options (and raw JSON string #32) would be a type that would parse / serialize itself but not directly. Instead the parser / serializer would first construct a string object and use that to interface with the custom type's API. This would be another way to implement custom number types (#35) if they provide a conversion from / to string.
Make sure that code is generated properly even if input files or structures are not in the order they are used.
Look into
json-cpp-gen/src/HeaderParser.h
Line 35 in bb1a666
Add support for annotations in the input code. These would be specially-marked comments or parts of comments. They should allow:
class as if it were a struct.
Many Settings flags are currently ignored due to not being implemented yet. These include:
verboseErrors
checkMissingKeys
checkRepeatingKeys
nanPolicy
infPolicy
Some of the API commands, especially iteration for the serializer, tend to result in a long string of commands on a single line (example below). This is not only ugly but also inconsistent with the format of the rest of the generated code, which is formatted very strictly. Simply adding newlines would not be a solution because the indentation would still be messed up. I think it would be best if newlines and indentation would be added as a post-process step to pattern fill.
for (const std::pair<std::string, float> &i : value) { std::string const &key = i.first; float const &elem = i.second; if (prev) write(','); prev = true; serializeStdString(key); write(':'); serializeFloat(elem); }
The way parsers and serializers are generated could be changed so that they are configurable with macro definitions rather than settings for the generator, with a notable exception of noThrow as that would result in a mess with function return types. This would also make it easier to check different versions of the generated functions as all would be visible simultaneously. The settings can be kept for default macro values, e.g.
#ifndef JSON_CPP_STRICT_SYNTAX_CHECK
#define JSON_CPP_STRICT_SYNTAX_CHECK 0
#endif
// In actual function:
#if JSON_CPP_STRICT_SYNTAX_CHECK
// ...
#endif
For checkMissingKeys and checkRepeatingKeys, this would work as
if (buffer == "firstKey") {
    JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
    parseXYZ(value.firstKey);
    continue;
}
with JSON_CPP_KEY_ENCOUNTERED being defined at the beginning of the file to either nothing, just flagging doneKeys, or also checking to throw a REPEATED_KEY based on the configuration macros.
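The three variants of the macro could be defined roughly as follows. This is a sketch: the configuration macro names JSON_CPP_CHECK_MISSING_KEYS and JSON_CPP_CHECK_REPEATING_KEYS are assumptions, doneKeys is assumed to be an array of unsigned words, and parseTwice is only a stand-in for a generated parse function.

```cpp
// Hypothetical configuration macros; defaults would come from the generator settings.
#ifndef JSON_CPP_CHECK_MISSING_KEYS
#define JSON_CPP_CHECK_MISSING_KEYS 1
#endif
#ifndef JSON_CPP_CHECK_REPEATING_KEYS
#define JSON_CPP_CHECK_REPEATING_KEYS 1
#endif

enum ErrorType { OK, REPEATED_KEY };

// I = index into doneKeys, MASK = bit assigned to the key
#if JSON_CPP_CHECK_REPEATING_KEYS
#define JSON_CPP_KEY_ENCOUNTERED(I, MASK) \
    if (doneKeys[I]&(MASK)) \
        throw REPEATED_KEY; \
    doneKeys[I] |= (MASK);
#elif JSON_CPP_CHECK_MISSING_KEYS
#define JSON_CPP_KEY_ENCOUNTERED(I, MASK) doneKeys[I] |= (MASK);
#else
#define JSON_CPP_KEY_ENCOUNTERED(I, MASK)
#endif

// Stand-in for a generated parse function that encounters the same key twice.
inline ErrorType parseTwice() {
    unsigned doneKeys[1] = { };
    try {
        JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
        JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
    } catch (ErrorType error) {
        return error;
    }
    return OK;
}
```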
Since #1 will not be implemented for a while, there needs to be another way to resolve cases where JSON field names (or even enumeration values) are not valid C++ names, e.g. protected (reserved keyword) or value-with-dashes, or when the user wants the JSON names to be different for whatever reason.
The intended behavior is to skip any structure elements with unrecognized types. I believe this works fine if the unrecognized type is the direct type of the element, but if it is used within a template argument, the header parser fails.
For example:
struct Foo {
    struct Bar {
        int x;
    };
};
using FooAlias = Foo;
In this case, we should be able to refer to FooAlias::Bar but due to the way nested names and alias resolution are currently implemented, this is not possible.
The generated parser should report the position within the input string if parsing fails. This is simply cur - jsonString. To achieve this, Error must be a structure containing both the error code and position. The current enumeration can be renamed to ErrorType. Proposed error structure:
struct Error {
    ErrorType type;
    int position;
    inline Error(ErrorType type = OK, int position = -1) : type(type), position(position) { }
    operator ErrorType() const;
    operator bool() const;
};
For serializers, reporting the source of error would be tricky because it is an element within a structure. Providing a pointer to the faulty element would be possible but probably not too helpful, because it isn't enough to easily find it within the structure tree. Still, serializers' error enumeration should also be renamed to ErrorType for consistency (with a possibility of typedef ErrorType Error;).
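A possible completion of the proposed parser error structure, as a sketch. The error values other than OK and TYPE_MISMATCH, the errorAt helper, and the choice that operator bool returns true on failure are all assumptions:

```cpp
enum ErrorType { OK, UNEXPECTED_END_OF_FILE, TYPE_MISMATCH };

struct Error {
    ErrorType type;
    int position;

    inline Error(ErrorType type = OK, int position = -1) : type(type), position(position) { }
    operator ErrorType() const { return type; }
    // Assumption: true signals that an error occurred
    operator bool() const { return type != OK; }
};

// Hypothetical helper: computes the position as cur - jsonString.
inline Error errorAt(ErrorType type, const char *jsonString, const char *cur) {
    return Error(type, int(cur - jsonString));
}
```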
The source code of the following functions is exactly the same between the classes ObjectContainerType and ObjectMapContainerType:
generateParserFunctionBody
generateSerializerFunctionBody
generateClear
generateRefByKey
generateIterateElements
However, each class must have a different base class. I think this could be fixed with templates and if that fails, simply convert these to static in one class and expose them for the other to use.
Support the using keyword for types, e.g.
struct Foo {
    using String = std::string;
    String bar;
};
Similar to #9. Probably don't bother with templated using.
Currently, if the generated files are supposed to go in directories that do not exist yet, the program will fail, unable to create the file. Perhaps, instead the directory structure could be automatically created.
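With C++17 available, this could be done through std::filesystem. A sketch, where writeFileCreatingDirs is a hypothetical helper and not an existing function of the program:

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Creates any missing parent directories, then writes the file.
// Returns false if the directories or the file could not be created.
inline bool writeFileCreatingDirs(const std::string &path, const std::string &contents) {
    std::filesystem::path filePath(path);
    if (filePath.has_parent_path()) {
        std::error_code error;
        // Does nothing (and reports no error) if the directories already exist
        std::filesystem::create_directories(filePath.parent_path(), error);
        if (error)
            return false;
    }
    std::ofstream file(filePath, std::ios::binary);
    if (!file)
        return false;
    file << contents;
    return bool(file);
}
```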
Detect and honor using namespace x; when encountered, so that users can write e.g. string instead of std::string.
It may be slightly beneficial for performance if the generated parser class had a common string buffer member variable instead of temporaries in individual functions, namely:
json-cpp-gen/src/types/StructureType.cpp
Lines 24 to 25 in bb1a666
json-cpp-gen/src/types/EnumType.cpp
Lines 15 to 16 in bb1a666
I had a syntax error in my struct as so:
struct Foo {
    enum Bar _bar;
};
and the error code in json-cpp-gen trying to parse this was "INVALID_STRUCT_SYNTAX". Should that be "INVALID_ENUM_SYNTAX"?
[HeaderParser.cpp, parseEnum(), line 242]
For example:
struct Foo {
    struct {
        int x;
    } bar;
};
Or perhaps even
struct Foo {
    struct {
        int x;
    };
};
where x would be referred to as Foo::x.
Probably do together with #4.
Make it possible to generate parsers and serializers in inlined form that doesn't need to have its own translation unit. The user should be able to pick between the following modes:
Add a section to the configuration file for "typedefs" (e.g. artery::MemInt = std::ptrdiff_t). This is especially useful before #9 is implemented, but even then it may be good for types included from libraries or conditional aliases.
Release the first official stable version along with a binary. Depending on the state the project is in, it could be version 1.0.0, or something like 0.9.0.
Add support for pair and tuple types like std::pair, represented as a JSON array. The main challenge for std::tuple would be the need to support variadic templates in header parser and custom type definition.
Implement the Configuration::GeneratorDef::baseClass property. This should allow users to generate a common parser / serializer for the basic types, which take up a lot of space, or even implement their own version (without sscanf etc.), and subsequent parsers / serializers would inherit from it.
Add a new data type - a byte array represented by a base64 string in the JSON.
I have realized that when skipEmptyFields is true, there is no way to output null into the JSON. A possible solution for this very niche use case would be to add a NullType class that would be used for std::nullptr_t and would always serialize as null, and when parsed, it would just throw TYPE_MISMATCH if the value is anything else. I would like to add this mainly because it's a pretty elegant use of the available nullptr_t type.
If certain parts of the JSON tree don't have a static structure or we don't care about their structure but want to preserve it, it might be useful to provide a special string type to store the subtree without parsing it, and writing it into output JSON as-is. This would also allow users to use another JSON parser for these portions or delay the parsing until it's requested. There could be an option to preserve / strip whitespace formatting for these portions.
Add an option to allow values of enum variables not corresponding to a named value. These values would be serialized / parsed as simple integers.
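On the serializer side, the fallback could look like the sketch below. Color and serializeColor are hypothetical examples; the enumeration is given a fixed underlying type so that holding an unnamed value is well defined.

```cpp
#include <string>

// Example enumeration; the fixed underlying type makes unnamed values valid.
enum Color : int { RED, GREEN };

// Named values serialize as strings; anything else falls back to its integer value.
inline std::string serializeColor(Color value) {
    switch (value) {
        case RED: return "\"RED\"";
        case GREEN: return "\"GREEN\"";
    }
    return std::to_string(static_cast<int>(value));
}
```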
A comprehensive test project should be prepared to verify that everything works correctly and no new bugs are introduced with additional changes. It should be ready before the release of version 1.0.
Some compilers may have problems with expressions such as std::vector<std::vector<int>>(orig) due to potentially interpreting the two closing brackets as a shift operator. To maximize portability of generated code, it should be ensured that a space is added in such scenarios.
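A sketch of how type names could be normalized before being written into generated code. separateClosingBrackets is a hypothetical helper, and it assumes its input is a type name rather than an arbitrary expression that might contain a real shift operator:

```cpp
#include <string>

// Inserts a space between consecutive closing angle brackets in a type name.
inline std::string separateClosingBrackets(const std::string &typeName) {
    std::string result;
    for (char c : typeName) {
        if (c == '>' && !result.empty() && result.back() == '>')
            result += ' ';
        result += c;
    }
    return result;
}
```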
Currently, if parse / serialize functions for different types end up with the same name, the conflict is resolved by simply adding an underscore at the end of the function name until it is unique. This needs to be improved. Identifiers containing two consecutive underscores are also reserved in C++, so the result may not even be valid. On the other hand, I don't think these cases would happen too often, and more or less only in cases such as the pair of types std::string and StdString.
If the program fails, it is pretty hard to guess why without using a debugger. Try to provide some useful information regarding the cause of the failure.
Add a public static const char * errorString(Error error); to parsers and serializers if enabled in Settings. Also get rid of listing all error types in multiple places while at it, instead putting them in a list or a macro iterator, e.g.
json-cpp-gen/src/ParserGenerator.cpp
Lines 296 to 304 in bb1a666
Add the possibility to define custom number types (integer or real), e.g. a dynamic-sized "big integer" type. The parser API could look like this:
clear - sets $S to zero
appendDigit - appends (decimal) digit $X to the whole part of $S - equivalent to 10*$S+$X, $E is VALUE_OUT_OF_RANGE error statement
appendFractionalDigit - appends $I-th (decimal) fractional digit $X to $S, if left blank, the type is assumed to be integer-only
setExponent - multiplies $S by 10 to the power of $X
makeNegative - changes the value to negative, guaranteed to be called at the end
Or, instead of the last two, there could be finalize with arguments for sign and exponent.
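To illustrate the order in which a generated parser might invoke these commands, here is a sketch for the integer-only case. LongLongApi is a hypothetical stand-in that implements the commands for long long (a big-integer type would supply its own), and parseInteger is not the generator's actual output:

```cpp
#include <limits>

// Stand-in implementation of the integer commands for long long.
struct LongLongApi {
    static void clear(long long &s) { s = 0; }
    // Returns false where the generated code would raise VALUE_OUT_OF_RANGE.
    static bool appendDigit(long long &s, int x) {
        if (s > (std::numeric_limits<long long>::max() - x) / 10)
            return false;
        s = 10 * s + x;
        return true;
    }
    static void makeNegative(long long &s) { s = -s; }
};

// Drives the API in the order described above: clear, digits, then sign at the end.
template <class Api, typename T>
bool parseInteger(const char *cur, T &value) {
    bool negative = *cur == '-';
    if (negative)
        ++cur;
    if (!(*cur >= '0' && *cur <= '9'))
        return false;
    Api::clear(value);
    for (; *cur >= '0' && *cur <= '9'; ++cur) {
        if (!Api::appendDigit(value, *cur - '0'))
            return false;
    }
    if (negative)
        Api::makeNegative(value);
    return true;
}
```

Because the sign is applied last, this stand-in cannot represent the most negative long long value; a real implementation would need to account for that.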
Support parsing simple macros such as
#define STRING_TYPE std::string
#define MAX_ARRAY_LENGTH 256
Do not bother with macros with arguments, nested macros, etc.
E.g.
struct Foo {
    struct Bar bar;
};
With the currently available data, the program could not only generate parsers and serializers, but also the JSON schema of the root structures. Add a schemas array on the same level as parsers and serializers.