keenerd commented on July 16, 2024

Bump.

I've received a bug report for my program (jshon) caused by this issue. For example:

# test case
jshon -e test <<< '{"test":200.123456789}'
# ver >= 2.1 (JSON_ENCODE_ANY)
200.12345678899999
# ver < 2.1
200.123457

Neither of these outcomes is good.

from jansson.

akheron commented on July 16, 2024

So you'd like to decode the value as a string? Or is the imprecise decimal representation the problem?

keenerd commented on July 16, 2024

I would like to skip decoding altogether and access the original string. The imprecise decimal representation is only a symptom of why this is needed. Jshon does not do any math; it is only supposed to extract and spit out chunks of JSON. (Arguably, one could attach a precision field to the json_t struct, read it back at print time, and generate a printf format string on the fly. But that seems silly.)

akheron commented on July 16, 2024

Ok. But this decoding flag would give you a string, i.e. "200.123456789", not a special value that's a number but whose contained value is a string. So you would lose the information that the value in JSON input was a number.

keenerd commented on July 16, 2024

That doesn't seem good. It also breaks a chunk of the JSON spec, since the type information would be screwed up. Two ideas off the top of my head:

Two more int fields in the structs. These fields would contain the index and length of the substring of the original input from which the decoded value was derived. Probably accessed through a json_get_source() function? Hacky, but less hacky than denying that numbers exist at all.

A new non-numerical number type. It would be neither int nor float, just a JavaScript-style "number", internally represented as a string and left to the user to decode however they want. This would also make the people who want a single JS-style numerical type happy. Probably enabled by a parse flag?
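
Both ideas come down to locating a number's text in the input without converting it. A minimal sketch, assuming a hand-written scanner rather than jansson's actual lexer: `source_span` and `scan_json_number` are hypothetical names, and the grammar follows the number production in RFC 8259.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record of where a decoded value came from in the input. */
typedef struct {
    size_t offset; /* index of the value's first character */
    size_t length; /* number of characters in the value's text */
} source_span;

/* Scan a JSON number (RFC 8259: -?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?)
 * starting at input[pos]. On success, fill *span and return 1; return 0 if
 * no valid number starts at pos. The text itself is never converted. */
static int scan_json_number(const char *input, size_t pos, source_span *span)
{
    size_t i = pos;

    assert(input != NULL && span != NULL);
    if (input[i] == '-')
        i++;
    if (input[i] == '0') {
        i++; /* a leading zero ends the integer part */
    } else if (input[i] >= '1' && input[i] <= '9') {
        while (isdigit((unsigned char)input[i]))
            i++;
    } else {
        return 0;
    }
    if (input[i] == '.') {
        i++;
        if (!isdigit((unsigned char)input[i]))
            return 0; /* a fraction needs at least one digit */
        while (isdigit((unsigned char)input[i]))
            i++;
    }
    if (input[i] == 'e' || input[i] == 'E') {
        i++;
        if (input[i] == '+' || input[i] == '-')
            i++;
        if (!isdigit((unsigned char)input[i]))
            return 0; /* an exponent needs at least one digit */
        while (isdigit((unsigned char)input[i]))
            i++;
    }
    span->offset = pos;
    span->length = i - pos;
    return 1;
}
```

Called on `{"test":200.123456789}` at offset 8, it reports a length of 13, so `200.123456789` can be sliced back out verbatim. The scanner stops at the first character that cannot extend the number, so a caller still has to check that a valid delimiter follows (for example, `0123` scans as just `0`).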

akheron commented on July 16, 2024

Adding a precision field to the json_real_t struct sounds like the best option so far. It would only be used by the decoder, and by the encoder if set.
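
A rough sketch of how such a field might be filled and used, assuming "precision" means the number of significant digits in the input text. The field itself and both helper names are hypothetical; this is not jansson's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 17 significant digits are enough to round-trip any IEEE double. */
#define MAX_REAL_PRECISION 17

/* Decoder side (hypothetical): count the significant digits in the mantissa
 * of a JSON number lexeme such as "200.123456789" or "-0.00120e5".
 * Leading zeros are not significant; trailing zeros are counted. */
static int count_significant_digits(const char *lexeme)
{
    size_t mantissa_len, i;
    int count = 0, seen_nonzero = 0;

    assert(lexeme != NULL);
    mantissa_len = strcspn(lexeme, "eE"); /* ignore any exponent */
    for (i = 0; i < mantissa_len; i++) {
        char c = lexeme[i];
        if (c < '0' || c > '9')
            continue; /* skip the sign and the decimal point */
        if (c != '0')
            seen_nonzero = 1;
        if (seen_nonzero)
            count++;
    }
    if (count == 0)
        count = 1; /* the lexeme denotes zero */
    return count > MAX_REAL_PRECISION ? MAX_REAL_PRECISION : count;
}

/* Encoder side: "%.*g" takes the precision as an argument, so no format
 * string has to be generated at run time. */
static int format_real(double value, int precision, char *buf, size_t size)
{
    return snprintf(buf, size, "%.*g", precision, value);
}
```

For `200.123456789` this stores a precision of 12, and `format_real` writes `200.123456789` back. Note that `%g` still drops trailing zeros and may switch to exponent notation, so the precision restores the digit count, not necessarily the exact spelling of the input (`1.50` would come back as `1.5`).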

keenerd commented on July 16, 2024

Darn. I was hoping for a cleaner fix, something that did not involve piling more patches onto a string -> number -> string conversion by using sprintf to dynamically generate a format string for printf. Yuck.

To that end, I've got half of the code put together for a new SNUMBER type at https://github.com/keenerd/jansson

Only the code for basic loading and dumping has been written. It is missing almost all of the helper functions (setting, deleting, etc.) and is a little kludgy because the parse flags are not available in the lexing stage (so there can't be a TOKEN_SNUMBER). It does build, but I have not been able to test it properly because cmake refuses to build shared .so libraries. Figuring out how to make cmake build these was more complicated than writing the prototype SNUMBER code.

akheron commented on July 16, 2024

Introducing a new type for this use case doesn't sound so good. The ultimately correct fix would be to replace sprintf("%.17g") with the algorithm in David Gay's dtoa.c or similar.
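
To illustrate the goal (printing the shortest decimal string that still reads back as the same double), here is a deliberately simple sketch. It is not Gay's algorithm, which is far more efficient; it just tries increasing precisions with `snprintf` and checks each candidate with `strtod`. The function name is hypothetical, and the sketch assumes the C library's conversions are correctly rounded.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the shortest "%.Ng" rendering of a finite `value` that strtod()
 * parses back to exactly the same double. Returns the length written,
 * or -1 if the buffer is too small or no candidate round-trips. */
static int format_shortest_roundtrip(double value, char *buf, size_t size)
{
    int precision;

    assert(buf != NULL && size > 0);
    for (precision = 1; precision <= 17; precision++) {
        int len = snprintf(buf, size, "%.*g", precision, value);
        if (len < 0 || (size_t)len >= size)
            return -1; /* output was truncated */
        if (strtod(buf, NULL) == value)
            return len; /* first (shortest) candidate that round-trips */
    }
    return -1; /* not reached for finite values: 17 digits always round-trip */
}
```

For the value from the original bug report this yields `200.123456789` (precision 12) rather than the 17-digit `200.12345678899999`, while still reading back as the identical double.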

keenerd commented on July 16, 2024

For an example of how another library does this: yajl-tree simply stores the original string and lets you access it directly: https://github.com/lloyd/yajl/blob/master/src/api/yajl_tree.h#L81

akheron commented on July 16, 2024

Yeah, this is also an option. But allocating the extra memory to store input strings of every number doesn't sound good for people using this on embedded devices. It could of course be enabled by a decoding flag to make it optional.

Would it be beneficial to do this for strings as well? They have many possible input forms, because any Unicode code point can be escaped with \u or represented directly in UTF-8.
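
A sketch of the optional-storage trade-off for numbers, assuming a hypothetical decode option (`keep_source` below) that controls whether a number's input text is copied. None of these names exist in jansson; a real decoder would likely reuse its own token buffer instead of the temporary copy made here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded number that keeps its input text only on request,
 * similar in spirit to yajl-tree, but optional for memory-constrained users. */
typedef struct {
    double value;
    char *source; /* copy of the input lexeme, or NULL if not kept */
} decoded_number;

/* Decode `len` bytes of an already-validated JSON number lexeme.
 * Returns 0 on success, -1 on allocation failure. */
static int decoded_number_init(decoded_number *num, const char *lexeme,
                               size_t len, int keep_source)
{
    char *copy;

    assert(num != NULL && lexeme != NULL);
    copy = malloc(len + 1); /* strtod needs a NUL-terminated string */
    if (copy == NULL)
        return -1;
    memcpy(copy, lexeme, len);
    copy[len] = '\0';
    num->value = strtod(copy, NULL);
    if (keep_source) {
        num->source = copy; /* extra memory is held only when requested */
    } else {
        num->source = NULL;
        free(copy);
    }
    return 0;
}

static void decoded_number_clear(decoded_number *num)
{
    free(num->source);
    num->source = NULL;
}
```

With the option off, each number costs only its `double` plus a NULL pointer; with it on, the exact input text is available for re-encoding.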

commented on July 16, 2024

Because of the decimal precision issue we made a change to our copy of Jansson to allow all numbers to be treated as strings. This might not be efficient for many use cases but for ours it's actually more efficient. In brief:

  • New flag JSON_DECODE_NUMBER_AS_STRING causes json_load* to decode all numbers as strings.
  • New int field in json_string_t to indicate whether a string is a number. It is set to true when decoding if the source value is a number. When encoding, if it is true, json_dump* encodes the string as a number (no quotes).
  • New function json_string_is_number() returns true if a string is a number.
  • New function json_string_set_is_number() to set the "string is a number" indicator.

If there's interest in adding this to Jansson let me know and I'll create a pull request.
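
The encoding rule described in the list above can be modeled in a few lines. This is a sketch of the described behavior, not the patch itself: `flagged_string` and `flagged_string_dump` are invented for illustration, only the flag's effect on quoting is shown, and string escaping is omitted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the patched json_string_t: the text plus a flag
 * recording that the value was a number in the input. */
typedef struct {
    const char *text;
    int is_number;
} flagged_string;

/* Encode the value: numbers are written bare, strings are quoted.
 * Returns the snprintf result (the length that would have been written). */
static int flagged_string_dump(const flagged_string *s, char *buf, size_t size)
{
    assert(s != NULL && s->text != NULL);
    if (s->is_number)
        return snprintf(buf, size, "%s", s->text);
    return snprintf(buf, size, "\"%s\"", s->text); /* escaping omitted */
}
```

Because the digits are never converted to a double, `200.123456789` is written back exactly as it was read, and the flag preserves the fact that it was a number.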

nertpinx commented on July 16, 2024

@dbelliveau, if it's still possible to submit your patches as a PR, it might help with #425 (my guess is that it takes a similar approach but comes from a different source).

davebelliveau commented on July 16, 2024

Yes I can do that, but I probably won't get to it until the weekend.

coreyfarrell commented on July 16, 2024

@akheron, is there any reason you are against creating a new json_type for this (maybe JSON_RAW)? My complaint about the current patches is that they increase the size of json_string_t. I realize adding a new type will require more care, but I think it's worth the effort. One thing I noticed about the patch @keenerd created is that JSON_SNUMBER is added to the middle of the enum; it would need to be last in the enum, otherwise the ABI would change. Naming it JSON_RAW instead of JSON_SNUMBER would leave us the option to use the same type to support raw / non-decoded strings as well (a possible micro-optimization for some use cases).
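
Why the enum position matters: enum constants are compiled into applications as plain integers. A sketch using an illustrative copy of the enum; the `EX_` names are made up to avoid clashing with jansson.h, and the member order is assumed to mirror the public header.

```c
#include <assert.h>

/* Illustrative enum modeled on jansson's json_type. */
typedef enum {
    EX_JSON_OBJECT,  /* 0 */
    EX_JSON_ARRAY,   /* 1 */
    EX_JSON_STRING,  /* 2 */
    EX_JSON_INTEGER, /* 3 */
    EX_JSON_REAL,    /* 4 */
    EX_JSON_TRUE,    /* 5 */
    EX_JSON_FALSE,   /* 6 */
    EX_JSON_NULL,    /* 7 */
    EX_JSON_RAW      /* 8: appended last, so no existing value changes */
} ex_json_type;

/* Compile-time guards: an application built against the old header compares
 * json_typeof() results against these numbers, so they must not move.
 * Inserting EX_JSON_RAW earlier in the list would shift every later member. */
static_assert(EX_JSON_NULL == 7, "existing members must keep their values");
static_assert(EX_JSON_RAW == EX_JSON_NULL + 1, "new members go at the end");
```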

nertpinx commented on July 16, 2024

For us, a new type would work as well.

akheron commented on July 16, 2024

I'm not against adding a new type if it's not only for working around quirks of floating point formatting.

What are the use cases that we're trying to support here? I think I've lost track :)

Supporting numbers that don't have a native representation ("bignums") comes to mind, but are there other use cases?

coreyfarrell commented on July 16, 2024

In addition to dealing with numbers that cannot reliably round-trip through parsing and dumping, I could see JSON_RAW being useful for certain resource-constrained situations / performance-critical code paths. A few ways I could see it used:

  • Not converting numbers to native representation as you mentioned.
  • Not decoding / unescaping strings.
  • A limited recursion json_load.
{
  "jsonrpc": "2.0",
  "id": "123",
  "method": "request-from-client",
  "params": {"dest": "server2", "payload": {"key": "value"}}
}

I can't post a realistic example directly, so assume that params.payload is an object with 1000 keys or an array with 1000 elements. By default we parse all sub-objects and arrays, which can use large amounts of memory for data that does not need to be split apart. I believe JSON_RAW could be used to store payload items that do not need to be understood.

I think more thought is required to decide how to control json_load recursion. Maybe we could have a few flags like JSON_DECODE_RECURSIVE_0 or JSON_DECODE_RECURSIVE_1 that would serve most needs in a simple way. Another way might be an interface like json_pack / json_unpack, where a format string would control how contents are parsed, or maybe even a callback-based interface. In any case, I think this would be a follow-up; I'm just making the case that JSON_RAW would have uses beyond handling numbers that cannot be represented natively.
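
The core of storing a payload as raw text is finding where the nested value ends without building objects for its contents. A minimal sketch with a hypothetical helper, `raw_container_length`: it tracks only nesting depth and string quoting, so it does not validate the contents or check that `{` is closed by `}` rather than `]`.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the JSON object or array that starts at text[0]
 * ('{' or '['), or 0 if text does not start with one or it is unterminated.
 * Brackets inside strings are ignored; escaped quotes are skipped. */
static size_t raw_container_length(const char *text)
{
    size_t depth = 0;
    int in_string = 0;

    assert(text != NULL);
    if (text[0] != '{' && text[0] != '[')
        return 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        if (in_string) {
            if (c == '\\' && text[i + 1] != '\0')
                i++; /* skip the escaped character */
            else if (c == '"')
                in_string = 0;
            continue;
        }
        if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            depth++;
        } else if (c == '}' || c == ']') {
            if (--depth == 0)
                return i + 1; /* length includes the closing bracket */
        }
    }
    return 0; /* ran out of input before the container closed */
}
```

Applied to the `payload` member in the example above, the returned length identifies the exact slice of input to keep as a string and emit again later, however many keys or elements it contains.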

akheron commented on July 16, 2024

Alright, sounds good.

Given a JSON_RAW value, how does the user know what it contains (number/string/etc.)?

coreyfarrell commented on July 16, 2024

I was thinking that JSON_RAW would use json_string_t in the background and thus would not store further information about the contents. In theory we could provide a function json_type json_raw_typeof(json_t *json) to identify the contents by checking the first character of a JSON_RAW value:

  • t or f means it is a boolean
  • n means it is a null
  • " means it is a string
  • { means it is an object
  • [ means it is an array
  • -0123456789 would be a number
  • Anything else is an error

We would have json_raw support functions matching the json_string functions, though the create/set functions would behave slightly differently from their json_string counterparts: they would strip whitespace at the start and end, but not within the value.

  • json_raw() / json_raw_set() would perform a validation-only parse with JSON_DECODE_ANY enabled.
  • The _nocheck variants would not perform any validation but would still trim whitespace.

The following code would print strcmp: 0:

json_t *raw = json_raw(" \n\t[ 1, 2\n\t ] \n\t");
printf("strcmp: %d\n", strcmp(json_raw_value(raw), "[ 1, 2\n\t ]"));
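
The trimming behavior can be sketched independently of any JSON_RAW type. `trim_json_whitespace` is a hypothetical helper; it treats the four characters RFC 8259 defines as whitespace (space, tab, line feed, carriage return) and leaves interior whitespace alone.

```c
#include <assert.h>
#include <string.h>

static int is_json_whitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Return a pointer to the first non-whitespace character of `text` and
 * store in *len the length of the span ending at the last non-whitespace
 * character. Interior whitespace is preserved; nothing is copied. */
static const char *trim_json_whitespace(const char *text, size_t *len)
{
    size_t start = 0, end;

    assert(text != NULL && len != NULL);
    end = strlen(text);
    while (start < end && is_json_whitespace(text[start]))
        start++;
    while (end > start && is_json_whitespace(text[end - 1]))
        end--;
    *len = end - start;
    return text + start;
}
```

A json_raw() built on this would copy `len` bytes starting at the returned pointer, which gives exactly the `[ 1, 2\n\t ]` text that the strcmp example above expects.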

akheron commented on July 16, 2024

Would it be useful to have an option to not store anything at all in JSON_RAW? That would make the JSON structure non-encodable, but on the other hand it would save memory if only a part of the input is interesting for the user.

coreyfarrell commented on July 16, 2024

I hadn't thought of that; I'm not sure. My idea was that JSON_RAW would be used where the value is not locally interesting but still needs to be saved to disk or forwarded to clients / other servers for processing.

akheron commented on July 16, 2024

Well, I guess it can be added later if needed.

coreyfarrell commented on July 16, 2024

@akheron I've done some work on an actual implementation of the JSON_RAW type. I think the correct way to determine what a JSON_RAW contains would be to parse it with JSON_DECODE_ANY and look at the result. In some cases you could make assumptions based on the code used to create the object (you might know that only certain types can appear in the JSON_RAW fields).

I've posted #446 with the first attempt implementation of JSON_RAW.

AllenX2018 commented on July 16, 2024

Is there any progress on this issue? @coreyfarrell Are you still working on the new json raw type?

coreyfarrell commented on July 16, 2024

#446 is the last I've worked on this. Unfortunately I've been busy with other commitments lately. Until I get some critical feedback on my PR it's unlikely to see any further progress.
