jeroen / jsonlite

A Robust, High Performance JSON Parser and Generator for R

Home Page: http://arxiv.org/abs/1403.2805

License: Other

Languages: R 32.16%, TeX 18.15%, C 48.84%, C++ 0.84%
Topics: json, r, rstats, parser

jsonlite's Introduction

jsonlite

A Robust, High Performance JSON Parser and Generator for R

A reasonably fast JSON parser and generator, optimized for statistical data and the web. Offers simple, flexible tools for working with JSON in R, and is particularly powerful for building pipelines and interacting with a web API. The implementation is based on the mapping described in the vignette (Ooms, 2014). In addition to converting JSON data from/to R objects, 'jsonlite' contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.

Have a look at the vignette to get started!

Code of Conduct

Please note that the jsonlite project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

jsonlite's People

Contributors

aekoroglu, ameliamn, batpigandme, cat-zeppelin, coolbutuseless, duncantl, gaborcsardi, george-chamoun, hongooi73, isomorphisms, jasonelaw, jeroen, johnnygenomics, kbroman, krlmlr, maarten-vermeyen, maelle, michaelchirico, nfultz, ralmond, saraemoore, shrektan, wch

jsonlite's Issues

UTF-8 character not handled correctly

Example:

url <- "http://api.trakt.tv/show/episode/summary.json/7a708cd94fa162478591660d981e2daa/lost/3/14"
t1 <- content(GET(url), , as = "text", encoding = "UTF-8")
> jsonlite::fromJSON(t1)$episode$title
[1] "Expos\xe9"
> rjson::fromJSON(t1)$episode$title
[1] "Exposé"

The API itself returns "Expos\xe9", and httr::content seems to use jsonlite to parse the returned JSON, so content() without as = "text" also returns the wrong string.

Currently I'm using rjson::fromJSON in this scenario as a workaround, but I would much rather be able to rely on only one JSON package.

Oh, and I should probably also note that of course

> jsonlite::fromJSON(url)$episode$title
 [1] "Expos\xe9"

Also returns the escape sequence instead of the character.
And yes, my locale is set to en_US.UTF-8 all the way.

Can it work on Windows?

Trying to install jsonlite on Windows, I get an error:

install_github("jeroenooms/jsonlite")

.... gives ....

      -----------------------------------
* installing *source* package 'jsonlite' ...
Warning: running command 'sh ./configure.win' had status 127
ERROR: configuration failed for package 'jsonlite'
* removing 'C:/Users/famille/AppData/Local/Temp/Rtmp0kndNo/Rinstd2c505d1de/jsonlite'
      -----------------------------------
ERROR: package installation failed
Error: Command failed (1)

nested matrix

Add unit tests for

x <- data.frame(foo=1:2);
x$bar <- matrix(1:6, 2);
toJSON(x)
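
A possible shape for such a test, as a sketch only: it uses base stopifnot() rather than the package's own test framework, and the checks chosen here (valid output, one record per row) are assumptions about what the test should cover.

```r
library(jsonlite)

# Data frame with a matrix column, as in the issue
x <- data.frame(foo = 1:2)
x$bar <- matrix(1:6, 2)

json <- toJSON(x)

# The output should be valid JSON and parse back to one record per row
stopifnot(validate(json))
y <- fromJSON(json)
stopifnot(is.data.frame(y), nrow(y) == 2, identical(y$foo, 1:2))
```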

More precision in decimals?

Hi there,

I was wondering how to get more precision in decimals converted toJSON from R. I have 5-10 decimal places in values in a list, but the JSON coming back only has 2 decimal places.

There's probably an option somewhere to control this?

Thanks!
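
For reference, current versions of jsonlite expose a digits argument on toJSON(). The sketch below assumes such a version is installed; the list x is made up for illustration.

```r
library(jsonlite)

x <- list(value = 0.123456789)

# Ask for up to 10 decimal digits
json10 <- toJSON(x, digits = 10)

# digits = NA requests maximum precision
json_max <- toJSON(x, digits = NA)

# Round trip: the parsed values match the original
back10 <- fromJSON(json10)$value
back_max <- fromJSON(json_max)$value
```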

flattening data frames

Need more testing for flattening. E.g.

test1 <- fromJSON("https://api.github.com/users/hadley/repos")
test2 <- fromJSON("https://api.github.com/users/hadley/repos", flatten=TRUE)
names(test1)
names(test2)

Handling of JSON attributes of 'Object' type

It seems to me that 'jsonlite' doesn't correctly handle JSON attributes of 'Object' type. The following is a reproducible example, which is essentially the same as in the first of my two relevant e-mail messages to Jeroen on 03/04/2014.

For a particular JSON record in AngelList API reply, we have:

{
    "id":341570,
    "hidden":false,
    "community_profile":false,
    "name":"BrainControl",
    "angellist_url":"https://angel.co/braincontrol-1",
    "logo_url":"https://s3.amazonaws.com/photos.angel.co/startups/i/341570-c6f28e9934a51d0cdc05cf425932ccab-medium_jpg.jpg?buster=1392223054",
    "thumb_url":"https://s3.amazonaws.com/photos.angel.co/startups/i/341570-c6f28e9934a51d0cdc05cf425932ccab-thumb_jpg.jpg?buster=1392223054",
    "launch_date":null,
    "quality":1,
    "product_desc":"BrainControl is the world's first pure HTML5 full-featured Bitcoin wallet. It does not store any Bitcoin on any device and can be embedded into anything with an internet connection and LocalStorage capability. \n\nNo need to download the blockchain - get started instantly. \n\nNo server side scripting required - can be hosted anywhere.",
    "high_concept":"HTML5 Bitcoin Wallet without Server",
    "follower_count":7,
    "company_url":"http://braincontrol.me",
    "created_at":"2014-02-12T16:37:41Z",
    "updated_at":"2014-02-18T00:37:57Z",
    "crunchbase_url":null,
    "twitter_url":null,
    "blog_url":null,
    "video_url":"",
    "markets":[
            {
               "id":59,
               "tag_type":"MarketTag",
               "name":"open source",
               "display_name":"Open Source",
               "angellist_url":"https://angel.co/open-source"
            },
            {
               "id":93839,
               "tag_type":"MarketTag",
               "name":"bitcoin",
               "display_name":"Bitcoin",
               "angellist_url":"https://angel.co/bitcoin"
            }
         ],
    "locations":[
            {
               "id":2114,
               "tag_type":"LocationTag",
               "name":"kuala lumpur",
               "display_name":"Kuala Lumpur",
               "angellist_url":"https://angel.co/kuala-lumpur"
            }
         ],
    "company_type":[

         ],
    "status":{
            "id":125125,
            "message":"Uploaded our first investor deck - http://braincontrol.me/presentations/bc-deck-01.pdf - see the story behind the HTML5 Bitcoin Wallet",
            "created_at":"2014-02-17T12:53:25Z"
         },
    "screenshots":[
            {
               "thumb":"https://s3.amazonaws.com/screenshots.angel.co/f8/341570/fb5da2ddedfb3404ba8eb0b04e98313d-thumb_jpg.jpg",
               "original":"https://s3.amazonaws.com/screenshots.angel.co/f8/341570/fb5da2ddedfb3404ba8eb0b04e98313d-original.jpg"
            },
            {
               "thumb":"https://s3.amazonaws.com/screenshots.angel.co/f8/341570/b2929bbe752b77a008a2a9f5a7a0431f-thumb_jpg.jpg",
               "original":"https://s3.amazonaws.com/screenshots.angel.co/f8/341570/b2929bbe752b77a008a2a9f5a7a0431f-original.jpg"
            },
            {
               "thumb":"https://s3.amazonaws.com/screenshots.angel.co/f8/341570/082afcb835a9225626405e0c99546004-thumb_jpg.jpg",
               "original":"https://s3.amazonaws.com/screenshots.angel.co/f8/341570/082afcb835a9225626405e0c99546004-original.jpg"
            }
         ],
    "fundraising":{
            "round_opened_at":"2014-02-12",
            "raising_amount":100000,
            "pre_money_valuation":500000,
            "discount":null,
            "equity_basis":"equity",
            "updated_at":"2014-02-18T00:17:45Z",
            "raised_amount":0.0
         }
}

With the same JSON record as above, let us consider ‘jsonlite’ functionality in analyzing how field ‘status’ is converted. According to original JSON data, record “id: 341570 – BrainControl” has status with id = 125125. However, after fromJSON() and first rbind.fill() (please see my code here: https://github.com/abnova/diss-floss/blob/master/import/getAngelListData.R), I expect ‘status’ to be a data frame, belonging to the corresponding BrainControl’s record (id: 341570). However, its ‘status’ data is scattered across multiple startups, like this:

id: 344277 – c(NA, NA, NA, 125125, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)

id: 343878 – c(NA, NA, NA, "Uploaded our first investor deck - http://braincontrol.me/presentations/bc-deck-01.pdf - see the story behind the HTML5 Bitcoin Wallet", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)

id: 342820 – c(NA, NA, NA, "2014-02-17T12:53:25Z", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA) 

id: 341570 – EMPTY field

While, for id: 341570, I expect the following data:

c(125125, "Uploaded our first investor deck - http://braincontrol.me/presentations/bc-deck-01.pdf - see the story behind the HTML5 Bitcoin Wallet", "2014-02-17T12:53:25Z")

...EXACTLY each fourth value from the previous three records. It feels to me like somebody carried out the ‘transpose’ operation on the data...

Regards,
Aleksandr Blekh

Error with SSL

R version 3.1.0
jsonlite version 0.9.7

After:
bitcurex <- fromJSON("https://pln.bitcurex.com/data/orderbook.json")
I get this result:
Error in function (type, msg, asError = TRUE) :
error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure

It looks like a standard SSL error. How can I avoid it with the fromJSON function?

Handling of Scalars

I read your vignette in detail and I understand the reasoning behind the decision to convert even scalars to vectors in JSON, to ensure consistency. For example

jsonlite::toJSON(1)
# "[ 1 ]"

However, this makes it impossible to use jsonlite to create JSON where a scalar is expected and NOT an array.

The RJSONIO package handles this by allowing the user to pass a parameter called container that changes the default behaviour. So:

RJSONIO::toJSON(1)
# "[ 1 ]"
RJSONIO::toJSON(1, container = F)
# " 1"

Is it possible to add such an option to jsonlite? I really like jsonlite and I am considering switching from RJSONIO to jsonlite for rCharts. This is the only limitation I have encountered so far, and I would appreciate your thoughts on how to handle it.
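
For readers landing here later: current versions of jsonlite provide unbox() for this case. A short sketch, assuming a version that includes it:

```r
library(jsonlite)

boxed   <- toJSON(1)          # [1]
unboxed <- toJSON(unbox(1))   # 1

# unbox() can also mark single elements inside a list
mixed <- toJSON(list(id = unbox(1), tags = "a"))   # {"id":1,"tags":["a"]}
```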

Automatically validate in fromJSON

In some cases, presumably when it is obvious, fromJSON throws the error "String does not seem to be valid JSON". In other less-obvious cases, like when quotation marks are unbalanced, fromJSON does not error and instead just leaves much of the structure unparsed. Example:

good.json <- '{"a":{"b":"c","d":"e"}}'
fromJSON(good.json)
# as expected:
# $a
# $a$b
# [1] "c"
# 
# $a$d
# [1] "e"


## unbalance by turning "b" into "b
bad.json <- '{"a":{"b:"c","d":"e"}}'
fromJSON(bad.json)
# should error, instead:
# $a
# list()
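
Until validation is automatic, one workaround is to call validate() first. The wrapper below is a sketch; strict_fromJSON is a hypothetical name, not part of jsonlite.

```r
library(jsonlite)

# Hypothetical wrapper: refuse to parse anything validate() rejects
strict_fromJSON <- function(txt, ...) {
  ok <- validate(txt)
  if (!isTRUE(as.logical(ok))) {
    stop("invalid JSON: ", attr(ok, "err"), call. = FALSE)
  }
  fromJSON(txt, ...)
}

good.json <- '{"a":{"b":"c","d":"e"}}'
bad.json  <- '{"a":{"b:"c","d":"e"}}'

good <- strict_fromJSON(good.json)$a$b   # "c"
bad  <- tryCatch(strict_fromJSON(bad.json), error = function(e) "rejected")
```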

box()

Could we have the complement of unbox() to use when auto_unbox = TRUE ?

Character mapped to array - how to avoid this

I'm trying to build JSON for a request to the Adobe Analytics API, and am having trouble generating the required format with jsonlite's toJSON method.

Here is a very simple example of the JSON I am trying to generate.

{
    "reportDescription":{
        "reportSuiteID":"my_reportsuite_id"
    }
}

I am trying to generate this using:

report.description <- c()
report.description$reportDescription <- c(data.frame(matrix(ncol=0, nrow=1)))
report.description$reportDescription$reportSuiteID <- "my_reportsuite_id"

cat(jsonlite::toJSON(report.description,pretty=TRUE))

But that converts the character (reportSuiteID) to a single element array, which the API doesn't handle.

I have also tried:

report.description <- list(reportDescription="my_reportsuite_id")
cat(jsonlite::toJSON(report.description,pretty=TRUE))

Again, that converts the character (reportSuiteID) to a single element array.

Is there any way to force toJSON to treat characters as strings rather than converting them to arrays?

Other than this problem, I'm finding the library really useful. Saves loads of time, and results are much easier to deal with than with other R JSON packages.

Possible uncovered json structure?

I read the package vignette with the proposed mapping and then tried to produce this structure with the toJSON function: {"span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 }}}
I used a nested data frame, but I was unable to produce the structure above. Is there a way to achieve this, or is it a structure the mapping does not cover?

R proxy settings not respected on Windows

On Windows, the ‘fromJSON()’ function doesn’t seem to respect the proxy settings in R.

Example:

library(jsonlite)
setInternet2()
d = fromJSON("http://data.ssb.no/api/v0/dataset/49623.json?lang=no")

The file is downloaded directly, not using the proxy settings. It's perhaps easiest to see the effect if you change the IE proxy settings to an invalid proxy server.

(The actual result for me is that the file is not downloaded at all, and results in this error message:
Error in function (type, msg, asError = TRUE) : couldn't connect to host
The reason is that our corporate firewall blocks all download requests that don't go via our proxy server.)

However, downloading the file using the HTTP support in R works:

l = readLines("http://data.ssb.no/api/v0/dataset/49623.json?lang=no",
            encoding="UTF-8", warn=FALSE)
d = fromJSON(l)

So if jsonlite used the built-in download feature of R, as shown above, everything would be fine.

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] jsonlite_0.9.12

NA in data frames

Add more tests to check if the na argument traverses properly through deeply nested structures.

Should have some way to control precision of numbers

It would be really useful to be able to control the precision of numbers, not just the number of digits after the decimal point. This is especially important when working with small numbers. (This is blocking rstudio/shiny#606.)

compare <- function(x, ...) {
  resr <- jsonlite::minify(RJSONIO::toJSON(x, ...))
  resj <- jsonlite::minify(jsonlite::toJSON(x, dataframe="columns",
            null="null", na="null", auto_unbox=TRUE, ...))

  if (!identical(resr, resj)) {
    cat(paste0("Results differ.",
      "\nRJSONIO:  ", resr,
      "\njsonlite: ", resj
    ))
  } else {
    cat(resj)
  }

  invisible()
}


compare(
  c(1.2323123231, 432.2344e5, 5.12345678901234e19, 0.00000012345),
  digits = 2
)
# Results differ.
# RJSONIO:  [1.2,4.3e+07,5.1e+19,1.2e-07]
# jsonlite: [1.23,43223440,5.12345678901234e+19,1.23e-07]

It looks like this is what's happening:

  • For numbers >1, digits controls the max number of digits after the decimal.
  • For numbers <1, digits controls the precision -- that is, the number of digits after the decimal in the mantissa.

I don't think it really makes sense to have these two different behaviors depending on the size of the number.

To me, the RJSONIO behavior makes more sense in general, although @jeroenooms, I know you made the conscious decision to move away from it. It would be good to at least have a precision argument (mutually exclusive with digits) that yields in that behavior.
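
As a point of comparison, current jsonlite documentation describes wrapping digits in I() to request significant digits instead of decimal places. A sketch assuming such a version; no exact output is shown because formatting may vary between releases.

```r
library(jsonlite)

x <- c(1.2323123231, 432.2344e5, 5.12345678901234e19, 0.00000012345)

# Plain digits: maximum number of decimal places
decimals <- toJSON(x, digits = 2)

# Wrapped in I(): interpreted as significant digits
signif_json <- toJSON(x, digits = I(2))

parsed <- fromJSON(signif_json)
```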

Setting class in C

Use SET_CLASS in C to set the json class to prevent R from making copies of the json blob.

fromJSON no longer returns data.frames in 0.9.7

The following lines used to return data.frames in <= 0.9.6:

strings <- fromJSON("http://assets.wurstmineberg.de/json/strings.json")
achievements <- fromJSON("http://assets.wurstmineberg.de/json/achievements.json")
deaths <- fromJSON("http://api.wurstmineberg.de/server/deaths/latest.json")
itemData <- fromJSON("http://assets.wurstmineberg.de/json/items.json")  

But since I upgraded to 0.9.7, all these return lists now, which is… not exactly what I wanted, since I chose jsonlite specifically over RJSONIO and rjson for my project because I could not handle the output as lists. I have made multiple attempts to recreate the old behavior via simplifyDataFrame or flatten, but right now my only option seems to be to manually try recreating the old output from lists, which sadly is kind of above my head (or at least, difficult and unpleasant).

Is this a bug? Am I missing something? Or is my inability to handle lists just finally catching up with me?

Any hints regarding my problem would be appreciated.

EDIT: I fixed my problems as of now, but I'm still curious about the changed behaviour.

How to encode NULL

Currently NULL is encoded as {}. The reasoning behind this is that json null is reserved for NA values, and in R, NULL is technically an empty pairlist:

> identical(NULL, pairlist())
[1] TRUE

However, in more recent versions of jsonlite, null values are converted to NA only in the context of an atomic vector:

> fromJSON('{"x":null, "y":[1,null,3]}')
$x
NULL

$y
[1]  1 NA  3

Same holds for data frames:

> fromJSON('[{"x":null, "y":"bar"}]')
   x   y
1 NA bar

So we could consider encoding NULL as null rather than {}. The benefit of this is that it round-trips better and is compatible with RJSONIO and rjson. One disadvantage is that we need a workaround for the edge case toJSON(NULL), because null on its own is not valid JSON.
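
The two encodings can be compared side by side. This sketch assumes a jsonlite version whose toJSON() accepts the null argument:

```r
library(jsonlite)

x <- list(a = NULL, b = 1)

as_list <- toJSON(x)                  # {"a":{},"b":[1]}
as_null <- toJSON(x, null = "null")   # {"a":null,"b":[1]}
```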

unbox() recursively

unbox() is very useful to generate JSON strings. I think it would be great if unbox() can process list elements recursively.

It is somehow tiresome to generate JSONs with multiple nested single values like this:

{
    "jsonrpc": "2.0",
    "method": "method1",
    "params": {
        "param1": "value1",
        "param2": "value2"
    },
    "auth": "auth1",
    "id": 1
}

In order to generate this by jsonlite, we have to write many unboxes:

> toJSON( list(
+     jsonrpc = unbox("2.0"),
+     method = unbox("method1"),
+     params = list(
+         param1 = unbox("value1"),
+         param2 = unbox("value2")
+     ),
+     auth = unbox("auth1"),
+     id = unbox(1)
+ ))
{"jsonrpc":"2.0","method":"method1","params":{"param1":"value1","param2":"value2"},"auth":"auth1","id":1}

or use some *apply function that applies unbox() recursively.

> x <- list(
+     jsonrpc = "2.0",
+     method = "method1",
+     params = list(
+         param1 = "value1",
+         param2 = "value2"
+     ),
+     auth = "auth1",
+     id = 1
+ )
> x <- rapply(x, function(a) if(length(a) == 1) unbox(a) else a, how="list")
> toJSON(x)
{"jsonrpc":"2.0","method":"method1","params":{"param1":"value1","param2":"value2"},"auth":"auth1","id":1}

I need recursive option that works like this:

> toJSON( unbox(x, recursive = TRUE) )
{"jsonrpc":"2.0","method":"method1","params":{"param1":"value1","param2":"value2"},"auth":"auth1","id":1} 
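
In current versions of jsonlite, the auto_unbox argument of toJSON() gives much the same effect as the requested recursive option. A sketch, assuming a version that supports it:

```r
library(jsonlite)

x <- list(
  jsonrpc = "2.0",
  method = "method1",
  params = list(param1 = "value1", param2 = "value2"),
  auth = "auth1",
  id = 1
)

# Every length-1 atomic vector, at any depth, is written as a scalar
json <- toJSON(x, auto_unbox = TRUE)
```

Note that auto_unbox applies to every length-1 vector, so a field that should stay an array even with one element still needs explicit handling, for example by wrapping it in I().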

nested data.frame as data.table

I have a fairly complex JSON result like this: https://blockchain.info/address/1DP2zBa3X561sRLUcFAh9emEeQXkqkHphA?format=json
It is parsed as a list containing a data.frame, which contains a data.frame, which contains another data.frame.
Is there any trick (an option?) to have all those data.frames stored as data.tables right away?

Something like a simple setDT() call on each data.frame created by jsonlite.

Ideally this would happen during parsing, to avoid the overhead of recursively walking the whole parsed result to find the data.frames.

If nothing like that exists, maybe I can file a feature request for an option like this:
options("jsonlite.on.data.frame" = my_custom_postprocess_fun_to_be_applied_on_all_DF())
?

Feature request: toJSON method for environments

Hi Jeroen,

and kudos for jsonlite!!

Quick question: would it be possible to implement a method for jsonlite::toJSON (or jsonlite::asJSON) that transforms nested environment structures to JSON (instead of data frames and/or lists)?

I kind of like environments a lot because of their pass-by-reference feature and sometimes they seem to be the better choice over lists and data frames (e.g. for custom options, registries [sorry, the README is not fully refactored yet; see the last unit tests for setShinyReactive for my currently best approach w.r.t. reactivity] etc.).

See my current approach to breaking everything down to a list again before I feed it to jsonlite::toJSON: https://github.com/Rappster/nestr/blob/master/R/toJson.r. Works just fine, but I'm sure you would have some awesome Rcpp Kung-Fu at hand to make that much more efficient ;-)

Thanks a lot should you ever consider this. If not: then thanks for jsonlite and openCPU in any event! ;-)
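
A plain-R version of the "break it down to a list first" approach might look like the sketch below. env_to_list is a hypothetical helper, not a jsonlite function, and it assumes the environments do not reference each other in a cycle.

```r
library(jsonlite)

# Hypothetical helper: recursively replace environments with named lists
env_to_list <- function(x) {
  if (is.environment(x)) {
    x <- as.list(x)
    x <- x[order(names(x))]   # as.list() on an environment has no fixed order
  }
  if (is.list(x) && !is.data.frame(x)) {
    x <- lapply(x, env_to_list)
  }
  x
}

opts <- new.env()
opts$name <- "demo"
opts$child <- new.env()
opts$child$level <- 2

json <- toJSON(env_to_list(opts), auto_unbox = TRUE)
# {"child":{"level":2},"name":"demo"}
```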

toJSON/fromJSON asymmetry

Detailed demonstration of the bug: google groups

Long story short:

  • Inside R: list(list(d=data.frame,m=matrix),list(d=data.frame,m=matrix))
  • (R -> JS) list(list(d=data.frame,m=matrix),list(d=data.frame,m=matrix)) (same)
  • (JS -> R) no mods, R sees data.frame(m=list(list,list),d=list(data.frame,data.frame))

handling attributed vector

First of all, thanks very much for the package. I like it more than RJSONIO and the others.

The current implementation of jsonlite ignores the names attribute of a vector. To see this, consider:

> cat(jsonlite::toJSON(c(a=1,b=2)))
[ 1, 2 ]

The resulting JSON is an array.
Compare this to RJSONIO, which keeps the attribute and changes the output to a dictionary:

> cat(RJSONIO::toJSON(c(a=1,b=2)))
{
 "a":      1,
"b":      2 
}

It may be useful to add an option use.name.
If use.name is off, output [ 1, 2 ], otherwise output { "a": 1, "b": 2}. RJSONIO actually has a similar option .withNames.
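
Without such an option, one workaround is to convert the named vector to a list before encoding. A sketch, assuming a jsonlite version with auto_unbox:

```r
library(jsonlite)

x <- c(a = 1, b = 2)

as_array  <- toJSON(x)                               # [1,2]
as_object <- toJSON(as.list(x), auto_unbox = TRUE)   # {"a":1,"b":2}
```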

vignette and S3 objects/classes

In the vignette, it's written that jsonlite is S3-based, and that "Users in R can easily extend this system by implementing additional toJSON methods for other classes."
This doesn't seem to be the case, however.
The toJSON method itself is not generic (either S3 or S4), and the delegation to the (hidden) asJSON method relies on the S4 class system.
Should the vignette be updated to reflect this?

AsIs in toJSON

Is there a reason for not handling AsIs in toJSON, or was it just left out by mistake?

library(httr)
library(jsonlite)
M <- content(GET("http://rpkg.igraph.org/Matrix/all"), as="text")
toJSON(fromJSON(M, simplifyVector=FALSE))
# Error: No method asJSON S3 class: AsIs

A smaller example:

foo <- structure(list(Package = "Matrix", releases = structure(list(), class = "AsIs")), 
               .Names = c("Package", "releases"))
toJSON(foo)
# Error: No method asJSON S3 class: AsIs

unsupported S3 class: table

@jeroenooms, it seems that table is not supported:

> asJSON(table(mtcars$am))
Error: could not find function "asJSON"

While:

> rjson::toJSON(table(mtcars$am))
[1] "{\"0\":19,\"1\":13}"
> RJSONIO::toJSON(table(mtcars$am))
[1] "[\n19, 13\n]"

Is it intentional? Also, wouldn't it be a good idea to have toJSON.default for not supported classes?
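
In the meantime, a one-dimensional table can be converted by hand before encoding. This is a sketch of one possible conversion; the variable names are illustrative.

```r
library(jsonlite)

tab <- table(mtcars$am)

# One-dimensional table -> named list of counts -> JSON object
counts <- setNames(as.list(as.vector(tab)), names(tab))
as_object <- toJSON(counts, auto_unbox = TRUE)   # {"0":19,"1":13}

# Alternatively, as.data.frame() gives one record per level
as_records <- toJSON(as.data.frame(tab, stringsAsFactors = FALSE))
```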

validate error messages appear later as attributes in parsed JSON

When using jsonlite::validate on an invalid json character string, the resulting error message can appear later in the parsed JSON for valid false JSON data.

For example, when running this code:

library(jsonlite)
sessionInfo()
fromJSON('false') # This works just fine
validate('')      # Validate that '' is not valid JSON
fromJSON('false') # This now picks up the error message from validate('')
fromJSON('true')  # But this is fine

The output produced is:

> library(jsonlite)

Attaching package: ‘jsonlite’

The following object is masked from ‘package:utils’:

    View

Warning message:
package ‘jsonlite’ was built under R version 3.1.2
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] jsonlite_0.9.13
> fromJSON('false') # This works just fine
[1] FALSE
> validate('')      # Validate that '' is not valid JSON
[1] FALSE
attr(,"err")
[1] "parse error: premature EOF\n                                       \n                     (right here) ------^\n"
> fromJSON('false') # This now picks up the error message from validate('')
[1] FALSE
attr(,"err")
[1] "parse error: premature EOF\n                                       \n                     (right here) ------^\n"
> fromJSON('true')  # But this is fine
[1] TRUE
>

Why is toJSON slow in jsonlite?

I did a quick benchmark of speeds of toJSON conversions across packages. Here is the code

library(microbenchmark)
microbenchmark(
  jsonlite::toJSON(mtcars),
  RJSONIO::toJSON(mtcars),
  rjson::toJSON(mtcars),
  times = 10
)

The results seem to indicate that jsonlite is much slower as compared to the other two packages. Is this because of the more verbose serialization?

Unit: microseconds
                     expr        min         lq      median         uq        max neval
 jsonlite::toJSON(mtcars) 137884.341 142062.506 145352.0815 150982.395 177053.330    10
  RJSONIO::toJSON(mtcars)   1187.170   1280.926   1301.4975   1487.285   2240.849    10
    rjson::toJSON(mtcars)    226.272    230.432    271.4605    275.248    325.430    10

C code for special case of numeric vector

Optimized asJSON C code for numeric vectors: https://gist.github.com/wch/562de64335cf986322d4

Some thoughts:

  • About 3x to 4x speedup for numeric vectors.
  • Supports both types of NA encoding.
  • Switches to scientific notation somewhat differently than the R version does. Currently does not use options(scipen).
  • Perhaps try to decouple the converting from the collapsing, so that we can have a vectorized version as well and get consistent number formatting for row- and column-based data frame encoding.

Add encoding argument to fromJSON

Is it possible to allow users to specify the encoding in fromJSON()? Sometimes I have to download a JSON file from a remote server, read it explicitly with readLines(encoding = "UTF-8"), and then pass the text to fromJSON(); otherwise the resulting list is not decoded correctly.

It would be very convenient to support encoding in fromJSON(), or there are better solutions for this?
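
The workaround described above can be sketched as follows. The temporary file and its contents are made up for the example and only stand in for a downloaded file.

```r
library(jsonlite)

# Write a small UTF-8 file to stand in for a downloaded one
path <- tempfile(fileext = ".json")
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8('{"source":"Statistisk sentralbyr\u00e5"}')), con)
close(con)

# Read the lines, declaring them as UTF-8, then parse the text
lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
d <- fromJSON(paste(lines, collapse = "\n"))
d$source
```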

UTF-8 data not handled correctly for local files

The help page for ‘fromJSON()’ says that UTF-8 is used by default. This doesn’t seem to be true for local files. Using the JSON file http://data.ssb.no/api/v0/dataset/49623.json?lang=no downloaded as a local file, I get this result on my Windows version of R:

d = fromJSON("c:/tmp/49623.json")
d$dataset$source

[1] "Statistisk sentralbyrÃ¥"

The last word should be ‘sentralbyrå’. The ‘Ã¥’ is what you get if you interpret the UTF-8 sequence for ‘å’ as an ISO 8859-1 sequence (the å character takes up two bytes in UTF-8).

R itself can correctly handle UTF-8 on Windows, as shown by the following code:

l = readLines("c:/tmp/49623.json", encoding="UTF-8", warn=FALSE)
gsub(".*(Statistisk[^\"]+)\".*", "\\1", l)

[1] "Statistisk sentralbyrå"

If you change the "UTF-8" to "ISO-8859-1" in the above code, you get the same (incorrect) result as returned by ‘fromJSON()’. The ‘fromJSON()’ uses readLines internally, but doesn’t specify the encoding argument (on Linux, this will probably work, but not on Windows). A default of "UTF-8" should be reasonable (at least much better than the platform default), as ~all downloaded JSON files will be in UTF-8.

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] jsonlite_0.9.12

pretty = TRUE produces invalid JSON if newline character '\n' appears in Key

Here's a quick example:

text_with_spaces <- c(
  "A long name 

    with a newline  
    or two",
  "some other name"
)

x <- data.frame(1:5, 5:1)
names(x) <- text_with_spaces

Without pretty = TRUE it works fine, jsonlite::toJSON(x) will produce valid JSON:

[{"A long name \n  \n    with a newline  \n    or two":1,"some other name":5},{"A long name \n  \n    with a newline  \n    or two":2,"some other name":4},{"A long name \n  \n    with a newline  \n    or two":3,"some other name":3},{"A long name \n  \n    with a newline  \n    or two":4,"some other name":2},{"A long name \n  \n    with a newline  \n    or two":5,"some other name":1}]

But adding pretty = TRUE strips out the escaped newline characters and replaces them with real spaces:

[
    {
        "A long name 

    with a newline  
    or two" : 1,
        "some other name" : 5
    },
    {
        "A long name 

    with a newline  
    or two" : 2,
        "some other name" : 4
    },
    {
        "A long name 

    with a newline  
    or two" : 3,
        "some other name" : 3
    },
    {
        "A long name 

    with a newline  
    or two" : 4,
        "some other name" : 2
    },
    {
        "A long name 

    with a newline  
    or two" : 5,
        "some other name" : 1
    }
]

This behaviour only happens for keys and not values:

y <- data.frame(col1 = text_with_spaces, col2 = 1:2)
jsonlite::toJSON(y)

produces

[{"col1":"A long name \n  \n    with a newline  \n    or two","col2":1},{"col1":"some other name","col2":2}]

and jsonlite::toJSON(y, pretty = TRUE) produces:

[
    {
        "col1" : "A long name \n  \n    with a newline  \n    or two",
        "col2" : 1
    },
    {
        "col1" : "some other name",
        "col2" : 2
    }
]

Which leaves the escaped newline characters intact.

"successful" parsing of invalid JSON

Both RJSONIO and jsonlite will parse invalid JSON, and return "valid" R objects.
I tend to think that any invalid JSON string (e.g. those that fail the validator at http://jsonlint.com) should result in an error when calling fromJSON.
Example to replicate:

> library(jsonlite)
> fromJSON("{foobar:barfoo}")
$ooba
NULL

This code block fails on two issues:
(1) The JSON object's key is invalid, since it's not a properly quoted string.
The first and last character of the string at the key's location are used in lieu of quotation marks, but this behavior seems very risky and introduces the potential for many downstream parsing/syntax bugs.
(2) The JSON object's value is also invalid, since it's not a properly quoted string.
Here, rather than using the first & last characters as pseudo-quotation characters (as done when parsing the key), the parser simply decides it's bad JSON and returns NULL.
I'd argue this behavior is closer to what might be expected, but it should still (IMHO) throw an error instead (or at the very least, a warning).

While I understand the desire to keep parsing flexible to help with rapid development of R programs, perhaps a compromise would be to have an optional argument specifying a "strict" mode.
For example:

> fromJSON("{foobar:barfoo}", strict = FALSE)

would behave as above, but

> fromJSON("{foobar:barfoo}", strict = TRUE)

would result in an error.

Parsing JSONP

fromJSON(RCurl::getURL("http://live.nhle.com/GameData/GCScoreboard/2014-11-10.jsonp"))
Error in parseJSON(txt) : lexical error: invalid char in json text.
                                       loadScoreboard({"games":[{"usna
                     (right here) ------^
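
The parser stops at the first character because the response is JSONP: the JSON payload is wrapped in a JavaScript callback call (here loadScoreboard(...)), and that wrapper is not valid JSON. One possible workaround is to strip the wrapper before parsing. The sketch below uses a hypothetical helper named `strip_jsonp` and a made-up payload for illustration; it assumes the wrapper is a single callback name followed by parentheses and an optional semicolon.

```r
# Hypothetical helper: strip a JSONP wrapper such as callbackName({...});
strip_jsonp <- function(txt) {
  txt <- paste(txt, collapse = "\n")
  # Remove the leading "callbackName(" ...
  txt <- sub("^\\s*[A-Za-z_$][A-Za-z0-9_$.]*\\s*\\(", "", txt, perl = TRUE)
  # ... and the trailing ")" with an optional ";"
  sub("\\)\\s*;?\\s*$", "", txt, perl = TRUE)
}

# Made-up payload for illustration; the real response is not reproduced here.
jsonp <- 'loadScoreboard({"games":[{"id":1},{"id":2}]})'
json <- strip_jsonp(jsonp)
print(json)

# Parse the remaining JSON if jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::fromJSON(json))
}
```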

Add scalar function

How about adding:

scalar <- function(x) {
  stopifnot(is.atomic(x))
  class(x) <- c("scalar", class(x))
  x
}

to use instead of auto_unbox.
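
For comparison, later jsonlite releases export `unbox()`, which serves a similar per-value purpose: it marks a single length-one value to be written as a JSON scalar rather than an array. A short sketch, assuming a jsonlite version that provides `unbox()`:

```r
# Guarded so the snippet does nothing when jsonlite is not installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  x <- list(a = jsonlite::unbox(1), b = 1)
  # "a" is marked as a scalar, "b" keeps the default array encoding
  out <- as.character(jsonlite::toJSON(x))
  print(out)  # {"a":1,"b":[1]}
}
```

Marking individual values gives finer control than auto_unbox, which applies to every length-one vector in the object.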

Handling of Unicode characters

See how the behavior of jsonlite differs from rjson in the following case:

A short snippet of code that demonstrates the problem:

> library(jsonlite)
> nchar(fromJSON("{ \"S\": \"L\\u00e9vis\"\n        }")$S)
Error in nchar(fromJSON("{ \"S\": \"L\\u00e9vis\"\n        }")) : 
  invalid multibyte string 1

See how ‘rjson’ handles the same string, which does not cause the error:

> library(rjson)
> fromJSON("{ \"S\": \"L\\u00e9vis\"\n        }")
$S
[1] "Lévis"
> detach("package:rjson", unload=TRUE)
> library(jsonlite)
> fromJSON("{ \"S\": \"L\\u00e9vis\"\n        }")
$S
[1] "L\xe9vis"

It is worth mentioning that RJSONIO has the exact same issue, so the problem might be located in the code that jsonlite shares with it.
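
As a stopgap on the R side, the affected strings can be re-declared and converted by hand. This is a sketch under the assumption that the returned bytes are Latin-1 with no declared encoding, which is what the `\xe9` in the output above suggests; it does not fix the parser itself.

```r
# The string as returned above: a raw Latin-1 byte with no declared encoding
x <- "L\xe9vis"

# Declare how the bytes should be interpreted, then convert to UTF-8
Encoding(x) <- "latin1"
y <- enc2utf8(x)

print(nchar(y))     # 5
print(Encoding(y))  # "UTF-8"
```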

New versions of 'jsonlite' can't handle embedded newline characters

New versions of 'jsonlite' (since version 0.9.12) can't parse JSON values with embedded newline characters. Here's a minimal reproducible example:

json <- '[{"request": "impRQ", "SQL": "
SELECT *
FROM theTable
WHERE a < 100
"}]'

# correct behavior (older versions)
fromJSON(json)
  request                                        SQL
1   impRQ \nSELECT *\nFROM theTable\nWHERE a < 100\n

# incorrect behavior (versions 0.9.12+)
fromJSON(json)
Error in parseJSON(txt) : lexical error: invalid character inside string.
          [{"request": "impRQ", "SQL": " SELECT * FROM theTable WHERE 
                     (right here) ------^
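
Note that the JSON specification (RFC 8259) requires control characters inside strings, including newlines, to be escaped, so a strict parser is expected to reject the input above. One workaround is to escape the raw newlines before parsing. The sketch below is deliberately crude: it assumes, as in this example, that every newline in the text falls inside a string value, and it would corrupt a document that also uses newlines as whitespace between tokens.

```r
json <- '[{"request": "impRQ", "SQL": "
SELECT *
FROM theTable
WHERE a < 100
"}]'

# Replace each raw newline with the two-character escape sequence \n
escaped <- gsub("\n", "\\\\n", json)
print(escaped)

# Parse the escaped text if jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::fromJSON(escaped))
}
```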

Switch to stream parser in yajl

Currently, we first parse the entire JSON document using yajl_tree_parse before converting it to an R list. However, it might be faster and more memory-efficient to use yajl callbacks to create the R objects directly while parsing the JSON tree, in a similar way to the prettify functions.

jsonlite and dates

I'm using jsonlite to send data from a database via PHP to OpenCPU.
How would you approach automatic handling of dates?

  1. In PHP I know which fields are dates and times etc. I could transmit that metadata as another JSON object and use it to transform dates. Seems like a lot of overhead.
  2. I can do some string parsing in R and coerce matches. Because all my dates follow a known format, it doesn't need to be super-flexible. Seems very expensive to do...
  3. Ideally, I'd like to mark up my dates in JSON in a way that is recognisable by jsonlite, but that doesn't seem possible yet. Could I somehow make it easier using lists?
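
Option 2 can be sketched in a few lines of base R. The helper name `coerce_dates` and the `YYYY-MM-DD` format are assumptions made for illustration (the actual date format is not stated above). Because the pattern match runs once per column rather than once per value, the cost is usually modest.

```r
# Hypothetical helper: convert character columns whose non-missing values
# all look like YYYY-MM-DD into Date columns.
coerce_dates <- function(df, pattern = "^\\d{4}-\\d{2}-\\d{2}$") {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (!is.character(col)) next
    vals <- col[!is.na(col)]
    if (length(vals) > 0 && all(grepl(pattern, vals, perl = TRUE))) {
      df[[nm]] <- as.Date(col, format = "%Y-%m-%d")
    }
  }
  df
}

# Made-up data standing in for the result of fromJSON()
df <- data.frame(id = 1:2,
                 created = c("2014-11-10", "2014-12-01"),
                 stringsAsFactors = FALSE)
out <- coerce_dates(df)
print(class(out$created))  # "Date"
```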

simplify() returns list, when data frame is expected

I have the following snippet of code, using jsonlite package:

convData <- jsonlite::fromJSON("http://pastebin.com/raw.php?i=EhpsgKdA",
                               simplifyVector = FALSE)
myResult <- jsonlite:::simplify(convData, simplifyDataFrame = TRUE)

By using simplifyDataFrame = TRUE, I was expecting this simplify call to return a data frame. However, depending on the contents of convData, the call actually returns either a data frame or a list. For the case of returning a list, myData consists of the following (validated) JSON data. I'd appreciate a clarification of jsonlite's behavior in this situation.

Nested Tables

First of all, congrats on creating such a wonderful package, Jeroen! I very much enjoyed reading the vignette on the mapping between JSON and R, and I'd like to chat about the nested tables section (2.4.5). You say:

In R, the only way to model [a one-to-many] relation is as a column containing a list of data frames, one separate data frame for each row

I think a list of data frames also works (provided you haven't lost parent-to-child relations) and fits better into the tidy data framework (cc @hadley). In my XML2R package, I achieve this with the combination of three verbs. To demonstrate, I'll borrow your one-to-one example and extend it to one-to-many in XML format.

library(XML2R)
o <- XML2Obs("https://gist.githubusercontent.com/cpsievert/85e340814cb855a60dc4/raw/651b7626e34751c7485cff2d7ea3ea66413609b8/mariokart.xml")
# o is a named (flat) list of one row matrices which allows
# us to 'recycle key(s)' from parents to children
o <- add_key(o, parent = "mariokart//driver", recycle = "name")
# collapse observations with the same name into a common matrix
collapse_obs(o)
$`mariokart//driver`
     name     occupation
[1,] "Bowser" "Koopa"   
[2,] "Peach"  "Princess"

$`mariokart//driver//vehicle`
     speed weight XML_value           name    
[1,] "55"  "25"   " Wario Bike "      "Bowser"
[2,] "40"  "67"   " Piranha Prowler " "Bowser"
[3,] "54"  "29"   " Royal Racer "     "Peach" 
[4,] "50"  "34"   " Wild Wing "       "Peach" 

In the JSON case, I don't see an easy way to achieve this using jsonlite. Perhaps there is a reasonable solution and I just don't see it, but this doesn't seem right to me:

jsonlite::fromJSON("https://gist.githubusercontent.com/cpsievert/b55ac4a210842da78ed9/raw/f744d44e834e54295977accf9bf828710d2f892a/mariokart.json")
  driver occupation                                    vehicles
1 Bowser      Koopa Wario Bike, Piranha Prowler, 55, 40, 25, 67
2  Peach   Princess      Royal Racer, Wild Wing, 54, 50, 29, 34

Provided there isn't already a good solution, do you think this idea is worth bringing into jsonlite? Clearly it wouldn't work the same as XML2R since there is no equivalent of a tag in JSON, but maybe you just always recycle the first key/value pair to any existing children?
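
To make the idea concrete, here is a rough base R sketch of recycling a parent key into the child rows. It assumes the vehicles column is a list holding one data frame per driver, rebuilt here by hand from the values printed above; the child column name `model` is a guess, since the JSON itself is not shown.

```r
# Parent table with a list column of child data frames (hand-built stand-in
# for the fromJSON() result; "model" is an assumed column name).
drivers <- data.frame(driver = c("Bowser", "Peach"),
                      occupation = c("Koopa", "Princess"),
                      stringsAsFactors = FALSE)
drivers$vehicles <- list(
  data.frame(model = c("Wario Bike", "Piranha Prowler"),
             speed = c(55, 40), weight = c(25, 67), stringsAsFactors = FALSE),
  data.frame(model = c("Royal Racer", "Wild Wing"),
             speed = c(54, 50), weight = c(29, 34), stringsAsFactors = FALSE)
)

# Recycle the parent key into each child table, then stack the children
children <- Map(function(key, v) {
  data.frame(driver = key, v, stringsAsFactors = FALSE)
}, drivers$driver, drivers$vehicles)
vehicles <- do.call(rbind, unname(children))
rownames(vehicles) <- NULL

# Drop the list column to leave a flat parent table
drivers$vehicles <- NULL

print(drivers)
print(vehicles)
```

The result is two flat tables linked by the driver column, mirroring the collapse_obs output shown earlier.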
