chlumsky / json-cpp-gen

A generator of JSON parser & serializer C++ code from structure header files

License: Other

CMake 0.27% C++ 99.73%
code-generation json parser serializer

json-cpp-gen's People

Contributors

chlumsky

json-cpp-gen's Issues

Configurable line endings

Currently the files output by the program are always saved with Unix-style line endings (LF). Instead, a global lineEndings setting should be added to the configuration with possible values NATIVE, LF, and CRLF, and possibly a value representing "same as configuration JSON file", although I'm not sure about the last one.
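A minimal sketch of how such a setting might be applied as a final pass over the generated text. The LineEndings enum and the applyLineEndings function are hypothetical names, not part of json-cpp-gen, and NATIVE is resolved here with a simple _WIN32 check.

```cpp
#include <string>

// Hypothetical setting mirroring the proposed values.
enum class LineEndings { NATIVE, LF, CRLF };

// Converts LF-only text (the current output) to the requested convention.
inline std::string applyLineEndings(const std::string &text, LineEndings mode) {
    if (mode == LineEndings::NATIVE) {
#ifdef _WIN32
        mode = LineEndings::CRLF;
#else
        mode = LineEndings::LF;
#endif
    }
    if (mode == LineEndings::LF)
        return text;
    std::string result;
    result.reserve(text.size() + text.size() / 16);
    for (char c : text) {
        if (c == '\n')
            result += '\r'; // LF becomes CR LF
        result += c;
    }
    return result;
}
```

With this approach the output file would have to be opened in binary mode, otherwise a Windows runtime would translate each '\n' a second time.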

Get rid of sscanf / sprintf

Not only are these functions archaic and overly complex (format string parsing etc.) but I have found that sscanf in particular is extremely slow. This needs to be done away with ASAP. However, the implementation for floats is very complex. There should also be some way for the user to select an implementation or provide their own functions for number (de)serialization.
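For integers, a hand-written digit loop is enough to replace sscanf. The following is only a sketch under that assumption; parseInt is a hypothetical helper, not the generator's actual code, and it does not address the much harder floating-point case.

```cpp
#include <limits>

// Hypothetical helper: parses a decimal integer starting at cur.
// On success stores the result in value, advances cur past the digits
// and returns true. Returns false on missing digits or overflow.
inline bool parseInt(const char *&cur, int &value) {
    const char *p = cur;
    const bool negative = *p == '-';
    if (negative)
        ++p;
    if (*p < '0' || *p > '9')
        return false;
    // Accumulate as a non-positive number so that INT_MIN is representable.
    const int min = std::numeric_limits<int>::min();
    int acc = 0;
    do {
        const int digit = *p - '0';
        // 10*acc - digit would fall below INT_MIN
        if (acc < (min + digit) / 10)
            return false;
        acc = 10 * acc - digit;
        ++p;
    } while (*p >= '0' && *p <= '9');
    if (!negative) {
        if (acc == min)
            return false; // magnitude does not fit a positive int
        acc = -acc;
    }
    value = acc;
    cur = p;
    return true;
}
```

Letting the user provide their own functions could then amount to exposing a call of this shape as a replaceable API pattern.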

Provide string length to serializer write function

The serializer function void write(const char *str); could have the length of str passed as a second argument, since it is a literal in the majority of the cases. This could very slightly improve performance since string length is almost surely computed when appending to the output JSON string. String API would have to be updated to allow appending a string with known length.
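One possible shape of this change, as a sketch only. The Serializer class below is a hypothetical stand-in that appends to a std::string; the array-reference overload is an assumption about how literal lengths could be obtained at compile time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a generated serializer class.
class Serializer {
public:
    std::string json;

    // Proposed form: the caller supplies the length.
    void write(const char *str, std::size_t length) {
        json.append(str, length);
    }

    // Convenience overload for string literals.
    // N counts the terminating zero, hence N - 1.
    template <std::size_t N>
    void write(const char (&str)[N]) {
        write(str, N - 1);
    }
};
```

Note that the array overload would also accept non-literal char arrays whose contents are shorter than the array, so the generator would have to use it for literals only.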

Always put const at the beginning

To avoid errors, I have put the const keyword for references of unknown types in the safe spot after the type name. However, this is somewhat inconsistent, and the general convention is to have it at the beginning whenever possible. Since pointer and reference types are not supported and static array types do not use references (this part is resolved), I believe it would actually be safe to put const back at the beginning without changing anything else. It also needs to be updated in the standard library APIs and the Readme file. Is it a good idea though? How about normalizing it the other way around (always putting const after the type name)?
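The following sketch illustrates why the trailing position is the safe one, assuming type names are pasted into patterns textually (as in "$T const &"). For plain types both spellings are the same type, but a pointer type pasted after a leading const would change meaning.

```cpp
#include <type_traits>

// For a plain type, both spellings denote the same type.
constexpr bool sameForPlainType =
    std::is_same<const int &, int const &>::value;

// Textual substitution of $T = "int *":
//   "const $T &"  ->  const int * &   (reference to pointer to const int)
//   "$T const &"  ->  int * const &   (reference to const pointer to int)
constexpr bool sameForPointerType =
    std::is_same<const int * &, int * const &>::value;

static_assert(sameForPlainType, "leading and trailing const agree for plain types");
static_assert(!sameForPointerType, "leading const changes meaning for pasted pointer types");
```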

enums not working as desired?

(Linux, gcc 9.4.0)

The project looks promising, I'd rather use a generator for serialization instead of code annotation and other weird hacks.

I'm using an enum as a struct field and it is not serializing / parsing as desired. The field is being ignored completely or always parsed as zero.

The desire is to get this:

{"_allkeys":[
{"_act":"NextImage","_keyval":11},
{"_act":"NextImage","_keyval":12},
{"_act":"PrevImage","_keyval":13}
]}

But I'm getting this:

{"_allkeys":[
{"_keyval":11},
{"_keyval":12},
{"_keyval":13}
]}

Note the _act field, which is an enum, is missing.

I attach what I hope is a complete set of files to demo the problem.

actionKeysTest.zip

Support enum base type

Support enumerations with an explicit base type, e.g.

enum Foo : short {
    BAR
};

Simply skip the part between enum name and the opening brace.

Namespace alias support

Apparently it is possible to alias namespaces as

namespace standard_library = std;
namespace schrono = std::chrono;

Header parser would fail if this was encountered.

typedef support

Parse the typedef keyword and treat it as a valid type. Examples:

typedef std::string StringType;
typedef int FooString;

struct Foo {
    typedef StringType FooString;
    FooString bar;
};

Foo::bar is std::string and not int!

Support raw string literals

If a "raw" string literal (R"(text)") is present in the parsed header files, its range may not be detected properly, at least in some cases, and this may break the header parser. Make sure this cannot happen.
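A sketch of how the range of a raw string literal could be detected. skipRawStringLiteral is a hypothetical helper and not taken from the header parser; it does not handle encoding prefixes (u8R, LR, ...), does not validate the delimiter, and leaves it to the caller to make sure the R is not the tail of an identifier.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: if cur points at R"delim( ... )delim",
// returns a pointer just past the closing quote.
// Returns nullptr if cur is not a raw string literal or it is unterminated.
inline const char *skipRawStringLiteral(const char *cur) {
    if (cur[0] != 'R' || cur[1] != '"')
        return nullptr;
    const char *delimStart = cur + 2;
    const char *open = std::strchr(delimStart, '(');
    if (!open)
        return nullptr;
    // The literal ends at the first occurrence of )delim"
    std::string terminator(")");
    terminator.append(delimStart, open);
    terminator += '"';
    const char *end = std::strstr(open + 1, terminator.c_str());
    return end ? end + terminator.size() : nullptr;
}
```

The key point is that quotes and parentheses inside the literal must not be interpreted at all; only the exact terminator sequence ends it.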

Custom general type

Another idea I had was to support miscellaneous types in the structures, in different ways:

Converted type

This way, a type in the structure would be converted from / to a type supported by the generator. For example a timestamp type, represented by a string in the JSON, and able to be converted from / to a std::string. Each type like this would have an API template for both conversions.

Custom parse / serialize implementation

Another possibility would be that the custom type would have to provide an implementation of the parse and serialize functions of itself. This option is just an idea and probably should not be implemented.

String conversion

A sort-of hybrid between the two previous options (and raw JSON string #32) would be a type that would parse / serialize itself but not directly. Instead the parser / serializer would first construct a string object and use that to interface with the custom type's API. This would be another way to implement custom number types (#35) if they provide a conversion from / to string.

Out of order declaration

Make sure that code is generated properly even if input files or structures are not in the order they are used.

Look into

bool parseStructNamesOnly = false; // prepass in case input files are in the wrong order

which suggests that simply running the parser in two passes using this flag may be enough to resolve this.

Annotations

Add support for annotations in the input code. These would be specially-marked comments or parts of comments. They should allow:

  • Ignoring a section (structure, member, ...),
  • representing a structure member under a different key in the JSON,
  • parsing a class as if it were a struct.

Full settings support

Many Settings flags are currently ignored due to not being implemented yet. These include:

  • verboseErrors
  • checkMissingKeys
  • checkRepeatingKeys
  • nanPolicy
  • infPolicy

Format multi-statement API commands

Some of the API commands, especially iteration for the serializer, tend to result in a long string of commands on a single line (example below). This is not only ugly but also inconsistent with the format of the rest of the generated code, which is formatted very strictly. Simply adding newlines would not be a solution because the indentation would still be messed up. I think it would be best if newlines and indentation were added as a post-process step after pattern fill.

api.iterateElements = "for (const std::pair<$U, $T> &$I : $S) { $U const &$K = $I.first; $T const &$V = $I.second; $F }";

for (const std::pair<std::string, float> &i : value) { std::string const &key = i.first; float const &elem = i.second; if (prev) write(','); prev = true; serializeStdString(key); write(':'); serializeFloat(elem); }
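A rough sketch of what such a post-process step could look like, assuming 4-space indentation. reflowStatements is a hypothetical name; the function breaks lines after '{', '}' and after ';' outside of parentheses, and copies string and character literals verbatim. It would also split "} else {" across two lines, which a real implementation would have to special-case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical post-process step: splits a one-line statement sequence
// into one statement per line, indented by brace depth.
inline std::string reflowStatements(const std::string &code, const std::string &baseIndent) {
    std::string out, current;
    int braceDepth = 0, parenDepth = 0;
    char quote = 0; // delimiter of the literal being copied, if any
    auto flush = [&](int depth) {
        while (!current.empty() && current.back() == ' ')
            current.pop_back();
        if (current.empty())
            return;
        out += baseIndent;
        out.append(static_cast<std::size_t>(4 * depth), ' ');
        out += current;
        out += '\n';
        current.clear();
    };
    for (std::size_t i = 0; i < code.size(); ++i) {
        const char c = code[i];
        if (current.empty() && c == ' ')
            continue; // drop leading spaces of each output line
        if (quote) { // inside a literal: copy verbatim
            current += c;
            if (c == '\\' && i + 1 < code.size())
                current += code[++i];
            else if (c == quote)
                quote = 0;
            continue;
        }
        switch (c) {
            case '"': case '\'':
                quote = c; current += c; break;
            case '(':
                ++parenDepth; current += c; break;
            case ')':
                --parenDepth; current += c; break;
            case '{':
                current += c; flush(braceDepth++); break;
            case '}':
                flush(braceDepth);
                if (braceDepth > 0)
                    --braceDepth;
                current += c; flush(braceDepth); break;
            case ';':
                current += c;
                if (parenDepth == 0) // keep for (;;) headers intact
                    flush(braceDepth);
                break;
            default:
                current += c;
        }
    }
    flush(braceDepth);
    return out;
}
```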

Implement settings with ifdefs

The way parsers and serializers are generated could be changed so that they are configurable with macro definitions rather than settings for the generator, with a notable exception of noThrow as that would result in a mess with function return types. This would also make it easier to check different versions of the generated functions as all would be visible simultaneously. The settings can be kept for default macro values, e.g.

#ifndef JSON_CPP_STRICT_SYNTAX_CHECK
#define JSON_CPP_STRICT_SYNTAX_CHECK 0
#endif
// In actual function:
#if JSON_CPP_STRICT_SYNTAX_CHECK
// ...
#endif

For checkMissingKeys and checkRepeatingKeys, this would work as

if (buffer == "firstKey") {
    JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
    parseXYZ(value.firstKey);
    continue;
}

with JSON_CPP_KEY_ENCOUNTERED being defined at the beginning of the file to either nothing, just flagging doneKeys, or also checking to throw a REPEATED_KEY based on the configuration macros.
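The three variants of the macro could be sketched as follows. The macro names JSON_CPP_CHECK_MISSING_KEYS and JSON_CPP_CHECK_REPEATING_KEYS, their defaults of 1, the MISSING_KEY error, and the markKeys function (a stand-in for a generated parse function) are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical configuration macros; defaults would come from the generator settings.
#ifndef JSON_CPP_CHECK_MISSING_KEYS
#define JSON_CPP_CHECK_MISSING_KEYS 1
#endif
#ifndef JSON_CPP_CHECK_REPEATING_KEYS
#define JSON_CPP_CHECK_REPEATING_KEYS 1
#endif

// I = index into doneKeys, M = bit mask of the key.
#if JSON_CPP_CHECK_REPEATING_KEYS
#define JSON_CPP_KEY_ENCOUNTERED(I, M) { if (doneKeys[I]&(M)) throw Error::REPEATED_KEY; doneKeys[I] |= (M); }
#elif JSON_CPP_CHECK_MISSING_KEYS
#define JSON_CPP_KEY_ENCOUNTERED(I, M) doneKeys[I] |= (M);
#else
#define JSON_CPP_KEY_ENCOUNTERED(I, M)
#endif

enum class Error { OK, REPEATED_KEY, MISSING_KEY };

// Stand-in for a generated parse function:
// keys lists the indices of the keys in the order they are encountered.
inline Error markKeys(const int *keys, int count) {
    std::uint32_t doneKeys[1] = { };
    try {
        for (int i = 0; i < count; ++i) {
            if (keys[i] == 0) {
                JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
                continue;
            }
            if (keys[i] == 1) {
                JSON_CPP_KEY_ENCOUNTERED(0, 0x00000002)
                continue;
            }
        }
    } catch (Error error) {
        return error;
    }
#if JSON_CPP_CHECK_MISSING_KEYS
    if (doneKeys[0] != 0x00000003)
        return Error::MISSING_KEY;
#endif
    (void) doneKeys; // unused if both checks are disabled
    return Error::OK;
}
```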

Name aliases in configuration

Since #1 will not be implemented for a while, there needs to be another way to resolve cases where JSON field names (or even enumeration values) are not valid C++ names, e.g. protected (reserved keyword) or value-with-dashes, or when the user wants the JSON names to be different for whatever reason.

Ignore elements with unrecognized template arguments

The intended behavior is to skip any structure elements with unrecognized types. I believe this works fine if the unrecognized type is the direct type of the element, but if it is used within a template argument, the header parser fails.

Nested types of aliased structures cannot be found

For example:

struct Foo {
    struct Bar {
        int x;
    };
};

using FooAlias = Foo;

In this case, we should be able to refer to FooAlias::Bar but due to the way nested names and alias resolution is currently implemented, this is not possible.

Output position when JSON parser fails

The generated parser should report the position within the input string if parsing fails. This is simply cur - jsonString. To achieve this, Error must be a structure containing both the error code and position. The current enumeration can be renamed to ErrorType. Proposed error structure:

struct Error {
    ErrorType type;
    int position;

    inline Error(ErrorType type = OK, int position = -1) : type(type), position(position) { }
    operator ErrorType() const;
    operator bool() const;
};
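A sketch of how the conversion operators might be defined and how the position would be filled in. The abbreviated ErrorType list and the makeError helper are hypothetical, and it is an assumption that the bool conversion yields true when an error occurred.

```cpp
// Abbreviated, illustrative list of error types.
enum ErrorType { OK, JSON_SYNTAX_ERROR, UNEXPECTED_END_OF_FILE };

struct Error {
    ErrorType type;
    int position;

    inline Error(ErrorType type = OK, int position = -1) : type(type), position(position) { }
    // Allows an Error to be used where the plain enumeration was used before.
    operator ErrorType() const { return type; }
    // Assumption: true means that an error occurred.
    operator bool() const { return type != OK; }
};

// Hypothetical helper: position is the offset of cur from the start of the input.
inline Error makeError(ErrorType type, const char *jsonString, const char *cur) {
    return Error(type, static_cast<int>(cur - jsonString));
}
```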

For serializers, reporting the source of error would be tricky because it is an element within a structure. Providing a pointer to the faulty element would be possible but probably not too helpful, because it isn't enough to easily find it within the structure tree. Still, serializers' error enumeration should also be renamed to ErrorType for consistency (with a possibility of typedef ErrorType Error;).

Deduplicate code in object map container type

The source code of the following functions is exactly the same between the classes ObjectContainerType and ObjectMapContainerType:

  • generateParserFunctionBody
  • generateSerializerFunctionBody
  • generateClear
  • generateRefByKey
  • generateIterateElements

However, each class must have a different base class. I think this could be fixed with templates and if that fails, simply convert these to static in one class and expose them for the other to use.
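The template approach could look roughly like this. The base class names and the single generateClear member with a placeholder body are hypothetical; the real classes have different bases and signatures.

```cpp
#include <string>

// Hypothetical stand-ins for the two distinct base classes.
struct ContainerTypeBase {
    virtual ~ContainerTypeBase() { }
    virtual std::string generateClear(const std::string &subject) const = 0;
};
struct MapContainerTypeBase {
    virtual ~MapContainerTypeBase() { }
    virtual std::string generateClear(const std::string &subject) const = 0;
};

// Shared implementation, written once and layered over either base.
template <class Base>
class ObjectContainerImpl : public Base {
public:
    std::string generateClear(const std::string &subject) const override {
        return subject + ".clear()"; // placeholder body
    }
};

class ObjectContainerType : public ObjectContainerImpl<ContainerTypeBase> { };
class ObjectMapContainerType : public ObjectContainerImpl<MapContainerTypeBase> { };
```

This only works if the shared functions use nothing from the bases beyond what both provide under the same names.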

using support

Support the using keyword for types, e.g.

struct Foo {
    using String = std::string;
    String bar;
};

Similar to #9. Probably don't bother with templated using.

Automatically create directories for output files

Currently, if the generated files are supposed to go in directories that do not exist yet, the program will fail, unable to create the file. Perhaps, instead the directory structure could be automatically created.
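If C++17 is acceptable for the generator itself (an assumption), std::filesystem::create_directories would cover this. writeOutputFile below is a hypothetical helper, not the program's current file-writing code.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: creates missing parent directories, then writes the file.
inline bool writeOutputFile(const std::filesystem::path &path, const std::string &contents) {
    const std::filesystem::path dir = path.parent_path();
    if (!dir.empty()) {
        std::error_code ec;
        // Not an error if the directories already exist.
        std::filesystem::create_directories(dir, ec);
        if (ec)
            return false;
    }
    std::ofstream file(path, std::ios::binary);
    if (!file)
        return false;
    file << contents;
    return bool(file);
}
```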

using namespace support

Detect and honor using namespace x; when encountered, so that users can write e.g. string instead of std::string.

Common parser string buffer

It may be slightly beneficial for performance if the generated parser class had a common string buffer member variable instead of temporaries in individual functions, namely:

// TODO make key a class member to reduce the number of allocations
body += indent+generator->stringType()->name().variableDeclaration("key")+";\n";

// TODO make str a class member to reduce the number of allocations
body += indent+generator->stringType()->name().variableDeclaration("str")+";\n";

Incorrect error?

I had a syntax error in my struct as so:

struct Foo {
   enum Bar _bar;
};

and the error code in json-cpp-gen trying to parse this was "INVALID_STRUCT_SYNTAX". Should that be "INVALID_ENUM_SYNTAX"?

[HeaderParser.cpp, parseEnum(), line 242]

Anonymous structure support

For example:

struct Foo {
    struct {
        int x;
    } bar;
};

Or perhaps even

struct Foo {
    struct {
        int x;
    };
};

where x would be referred to as Foo::x.

Probably do together with #4.

Allow generation of header-only code

Make it possible to generate parsers and serializers in inlined form that doesn't need to have its own translation unit. The user should be able to pick between the following modes:

  • Header + source file (current output)
  • Single header with implementation directly inlined in class definition
  • Single header with implementation directly following the class definition in the same file
  • Definition header + inline implementation header which is included at the bottom of the first one

Type aliases in configuration

Add a section to the configuration file for "typedefs" (e.g. artery::MemInt = std::ptrdiff_t). This is especially useful before #9 is implemented, but even then it may be good for types included from libraries or conditional aliases.

First official release

Release the first official stable version along with a binary. Depending on the state the project is in, it could be version 1.0.0, or something like 0.9.0.

Support pair / tuple

Add support for pair and tuple types like std::pair, represented as a JSON array. The main challenge for std::tuple would be the need to support variadic templates in header parser and custom type definition.

Common parser / serializer base class

Implement the Configuration::GeneratorDef::baseClass property. This should allow users to generate a common parser / serializer for the basic types, which take up a lot of space, or even implement their own version (without sscanf etc.), and subsequent parsers / serializers would inherit from it.

Support nullptr_t

I have realized that when skipEmptyFields is true, there is no way to output null into the JSON. A possible solution for this very niche use case would be to add a NullType class that would be used for std::nullptr_t and would always serialize as null and when parsed, it would just throw TYPE_MISMATCH if the value is anything else. I would like to add this mainly because it's a pretty elegant use of the available nullptr_t type.

Raw JSON string element

If certain parts of the JSON tree don't have a static structure or we don't care about their structure but want to preserve it, it might be useful to provide a special string type to store the subtree without parsing it, and writing it into output JSON as-is. This would also allow users to use another JSON parser for these portions or delay the parsing until it's requested. There could be an option to preserve / strip whitespace formatting for these portions.

Integer enumeration fallback

Add an option to allow values of enum variables not corresponding to a named value. These values would be serialized / parsed as simple integers.

Test suite

A comprehensive test project should be prepared to verify that everything works correctly and no new bugs are introduced with additional changes. It should be ready before the release of version 1.0.

Add space in between multiple closing template brackets

Some compilers, notably those predating C++11, may have problems with expressions such as std::vector<std::vector<int>>(orig) because they interpret the two closing brackets as a right-shift operator. To maximize portability of generated code, a space should be inserted between consecutive closing brackets.
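A minimal sketch of one way to ensure this when composing type names. templateInstanceName is a hypothetical helper, not the generator's actual name-building code.

```cpp
#include <string>

// Hypothetical helper: builds "Outer<Arg>" while never emitting ">>".
inline std::string templateInstanceName(const std::string &outer, const std::string &arg) {
    std::string name = outer + '<' + arg;
    if (!arg.empty() && arg.back() == '>')
        name += ' '; // separate consecutive closing brackets
    name += '>';
    return name;
}
```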

Improve function name collision resolution

Currently, if parse / serialize functions for different types end up with the same name, the conflict is resolved by simply adding an underscore at the end of the function name until it is unique. This needs to be improved. I also believe that identifiers containing two consecutive underscores are reserved in C++, so the result may not even be strictly valid. On the other hand, I don't think these cases would happen too often, and more or less only in cases such as a pair of types std::string and StdString.

Improve error reporting

If the program fails, it is pretty hard to guess why without using a debugger. Try to provide some useful information regarding the cause of the failure.

Add error to string conversion to parsers & serializers

Add a public

static const char * errorString(Error error);

to parsers and serializers if enabled in Settings. Also get rid of listing all error types in multiple places while at it, instead putting them in a list or a macro iterator, e.g.

code += std::string(INDENT INDENT)+Error::JSON_SYNTAX_ERROR+",\n";
code += std::string(INDENT INDENT)+Error::UNEXPECTED_END_OF_FILE+",\n";
code += std::string(INDENT INDENT)+Error::TYPE_MISMATCH+",\n";
code += std::string(INDENT INDENT)+Error::ARRAY_SIZE_MISMATCH+",\n";
code += std::string(INDENT INDENT)+Error::UNKNOWN_KEY+",\n";
code += std::string(INDENT INDENT)+Error::UNKNOWN_ENUM_VALUE+",\n";
code += std::string(INDENT INDENT)+Error::VALUE_OUT_OF_RANGE+",\n";
code += std::string(INDENT INDENT)+Error::STRING_EXPECTED+",\n";
code += std::string(INDENT INDENT)+Error::UTF16_ENCODING_ERROR+",\n";
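The "macro iterator" idea can be sketched with an X-macro, shown here on the generated side for brevity. The macro names are hypothetical, the list reuses the error names from above, and errorString is written as a free function rather than a static member.

```cpp
// Single list of all error types; M is applied to each name.
#define JSON_CPP_FOR_EACH_ERROR(M) \
    M(OK) \
    M(JSON_SYNTAX_ERROR) \
    M(UNEXPECTED_END_OF_FILE) \
    M(TYPE_MISMATCH) \
    M(ARRAY_SIZE_MISMATCH) \
    M(UNKNOWN_KEY) \
    M(UNKNOWN_ENUM_VALUE) \
    M(VALUE_OUT_OF_RANGE) \
    M(STRING_EXPECTED) \
    M(UTF16_ENCODING_ERROR)

#define JSON_CPP_ERROR_ENUM_ENTRY(name) name,
#define JSON_CPP_ERROR_STRING_CASE(name) case name: return #name;

// The enumeration and the string conversion are both derived from the list.
enum Error {
    JSON_CPP_FOR_EACH_ERROR(JSON_CPP_ERROR_ENUM_ENTRY)
};

inline const char *errorString(Error error) {
    switch (error) {
        JSON_CPP_FOR_EACH_ERROR(JSON_CPP_ERROR_STRING_CASE)
    }
    return ""; // value outside of the enumeration
}
```

The same list could drive the generator's own code += lines, so that adding an error type requires a change in one place only.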

Custom number type

Add the possibility to define custom number types (integer or real), e.g. a dynamic-sized "big integer" type. The parser API could look like this:

  • clear - sets $S to zero
  • appendDigit - appends (decimal) digit $X to the whole part of $S - equivalent to 10*$S+$X, $E is VALUE_OUT_OF_RANGE error statement
  • appendFractionalDigit - appends $I-th (decimal) fractional digit $X to $S, if left blank, the type is assumed to be integer-only
  • setExponent - multiplies $S by 10 to the power of $X
  • makeNegative - changes the value to negative, guaranteed to be called at the end

Or, instead of the last two, there could be finalize with arguments for sign and exponent.

Basic macro support

Support parsing simple macros such as

#define STRING_TYPE std::string
#define MAX_ARRAY_LENGTH 256

Do not bother with macros with arguments, nested macros, etc.

Generate JSON schema

With the currently available data, the program could not only generate parsers and serializers, but also the JSON schema of the root structures. Add schemas array on the same level as parsers and serializers.
