
memDecompress error in aws.s3 (closed, 22 comments)

cloudyr commented on August 19, 2024
memDecompress error


Comments (22)

Serenthia commented on August 19, 2024

Thanks! 0.2.5 looks perfect πŸ‘

leeper commented on August 19, 2024

This looks like it might be a bug. Are you able to get the object as a raw vector using get_object(object = "SAMHDA/RAWdata/vcat.08-14.rds", bucket = "chek1")?

achekroud commented on August 19, 2024

Yeah, the command executes. I wasn't sure what the output means, though.
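For context, the output of get_object() is a raw vector of the object's bytes; a quick sanity check on it (a sketch reusing the call above) might be:

r <- get_object(object = "SAMHDA/RAWdata/vcat.08-14.rds", bucket = "chek1")
class(r)    # should be "raw": the object's bytes as downloaded
length(r)   # number of bytes, which should match the file size on S3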

leeper commented on August 19, 2024

I am unable to reproduce this. Given that you can read the file using get_object(), it is probably an issue with the file rather than with this package. I'm closing for now. Feel free to open a new issue or follow up here if you continue to experience issues.

yasminlucero commented on August 19, 2024

I had this exact behavior as well. Notably, the RDS that failed was a large file (85 MB); s3readRDS worked fine on a small file (1 KB). I also verified that I can read the file via other means (an s3fs mount), so there is no reason to expect that the file is corrupt.

big.test <- s3readRDS(object = "bigtest.RDS", bucket = "grv-myexamplebucket")

Error in memDecompress(from = as.vector(r), type = "gzip") : 
  internal error -3 in memDecompress(2)

big.test.raw <- get_object(object = "bigtest.RDS", bucket = "grv-myexamplebucket")

  attr(big.test.raw, 'content-type')
[1] "application/octet-stream"
  attr(big.test.raw, 'content-length')
[1] "88697837"

I haven't figured out yet how to parse the raw object.

The error is raised around line 6045 of https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/main/connections.c
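One possible way to parse the raw object, sketched here on the assumption that the downloaded bytes are an unmodified copy of the uploaded .rds file, is to write them to a temporary file and read that back with readRDS():

big.test.raw <- get_object(object = "bigtest.RDS", bucket = "grv-myexamplebucket")
tmp <- tempfile(fileext = ".rds")
writeBin(as.vector(big.test.raw), tmp)   # strip the header attributes and dump the bytes to disk
big.test <- readRDS(tmp)                 # read them back as an ordinary RDS file
unlink(tmp)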

vicmayrink commented on August 19, 2024

I'm experiencing exactly the same issue. Did you find any solution?

mjpdenver commented on August 19, 2024

Likewise - I get the same response as yasminlucero trying to read an RDS file.

Thanks

fanghaolei commented on August 19, 2024

I'm experiencing the exact same issue. It appears that this memDecompress error only occurs when I first sync an .rds file to a bucket via the AWS CLI and then try to download it with s3readRDS().

Thanks!

ieaves commented on August 19, 2024

I have no idea if this is related to the issue everyone else is seeing, but in my use case s3saveRDS requires headers = list("x-amz-server-side-encryption" = "AES256"), like so:

s3saveRDS(my_object, bucket=my_bucket, object=my_file_name, headers=list("x-amz-server-side-encryption" = "AES256"))

However, attempting to use s3readRDS with the same headers results in the cryptic memDecompress error.

Removing the headers from the read call, i.e. s3readRDS(bucket = my_bucket, object = my_file_name), allowed me to load from S3 successfully.
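In other words, with the same placeholder names as above, the combination that works here looks like:

# save: the SSE header is required in this setup
s3saveRDS(my_object, bucket = my_bucket, object = my_file_name,
          headers = list("x-amz-server-side-encryption" = "AES256"))
# read: pass no headers, otherwise the memDecompress error appears
my_object <- s3readRDS(bucket = my_bucket, object = my_file_name)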

leonawicz commented on August 19, 2024

I am experiencing the same issue with package version aws.s3_0.2.2.

First I tried to use s3readRDS on .rds files I had previously uploaded to an AWS S3 bucket using the S3 web GUI uploader. This gave the same memDecompress error noted above. I can always read the raw vector with get_object.

The second way I did this was to use put_object to upload .rds files to my bucket. Trying to load such a file with s3readRDS results in the same error.

The third way I tried was to upload .rds files to my bucket strictly using the s3saveRDS wrapper. Only if uploaded in this manner can I subsequently load the .rds files using s3readRDS.

I am not sure what is different about these files based on the method of upload. I was hopeful that at least the second approach, using put_object on local .rds files, would be a solution, because it is analogous to the approach I have to use for uploading .RData files: using put_object directly instead of s3save (see issue #128).
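For concreteness, a minimal sketch of the second and third upload paths described above (object and bucket names are placeholders; the first path is the S3 web GUI uploader and has no code):

# second path: save locally with base saveRDS(), then upload the file with put_object()
saveRDS(x, "x.rds")
put_object(file = "x.rds", object = "x.rds", bucket = "my-bucket")

# third path: upload the in-memory object directly with the s3saveRDS wrapper
s3saveRDS(x, object = "x.rds", bucket = "my-bucket")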

For the time being, it seems that uploading strictly via s3saveRDS will avoid the reading errors with s3readRDS. Not ideal, but this is working for me. And at least at a glance (I haven't fully tested), doing so fortunately does not appear to lead to the file-size bloat described in the above-referenced issue.

Regards,
Matt

leeper commented on August 19, 2024

@leonawicz Can you give this a try on the latest version from GitHub?

leonawicz commented on August 19, 2024

I can confirm that with the latest GitHub version, aws.s3_0.2.4, I can load an object into R via s3readRDS regardless of which of the three upload methods I'd previously used: uploading the R object directly with s3saveRDS, uploading a previously saved (via base saveRDS) local .rds file with put_object, or uploading a previously saved .rds file with the AWS GUI uploader utility.

Serenthia commented on August 19, 2024

FYI, this change has meant that I can't read any binary files I previously saved to S3 with the old method, which is a breaking change as far as I'm concerned.

Re-uploading them with the new s3saveRDS method means they can then be read; however, I can't do this for thousands of past files...

leeper commented on August 19, 2024

@Serenthia what error do you get when trying to read a previously uploaded RDS?

leonawicz commented on August 19, 2024

@leeper I also noticed just now that I could no longer read .rds files uploaded with the previous package version. I had to delete them all from AWS and re-upload them before I could read them with the newer package version's s3readRDS. The error is:

Error in readRDS(tmp) : unknown input format

This occurs when trying to read older .rds files; newer ones are fine. It seems the file that was created somehow depended on the aws.s3 package version. Hopefully it was a bug unique to the old version? I'm unsure why, when reading an .rds file with s3readRDS, it would matter how the file was created and uploaded to AWS, but for some reason the package version the file was made with seems to matter.

Serenthia commented on August 19, 2024

Can confirm that that's the same behaviour and error message that I'm experiencing. Thanks for the reopen!

leeper commented on August 19, 2024

Okay, I think I've tracked this down to being a decompression issue. Just to confirm that you're experiencing it the same way (@Serenthia, @leonawicz), if you do this for one of the older files:

o <- get_object("s3://yourbucket/yourobject")
unserialize(memDecompress(o, "gzip"))

Do you get back what you expect?

Serenthia commented on August 19, 2024

@leeper Yes - using that, I can successfully read a file that returns the unknown input format error using readRDS.

leeper commented on August 19, 2024

Okay, I've tracked this down to the previous behavior being a bug: specifically, serialize() sets xdr = TRUE by default (writing big-endian), which is basically never what we want. The current behavior is correct and more consistent with using saveRDS() and readRDS() directly.

However, because it would be annoying to figure this out for a given file, s3readRDS() now tries to read the file normally and then falls back to unserializing it if that fails, so it should work on both older (incorrectly written) and new files.

Let me know if not and I'll continue to patch.
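For anyone curious, a rough sketch of that read-then-unserialize fallback (not the package's actual source; the function name here is made up) might be:

# Sketch only: download the bytes, try readRDS() first, and fall back to
# unserialize(memDecompress(...)) for files written by the older code path.
read_rds_with_fallback <- function(object, bucket, ...) {
  r <- get_object(object = object, bucket = bucket, ...)
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp))
  writeBin(as.vector(r), tmp)
  tryCatch(readRDS(tmp),
           error = function(e) unserialize(memDecompress(as.vector(r), type = "gzip")))
}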

drorata commented on August 19, 2024

What about non-RDS files? I'm failing to load a compressed JSON file from S3.

drorata commented on August 19, 2024

I found a workaround, something like this:

read_gzip_json_from_s3_to_df <- function(path) {
  #' Read a single gzipped JSON file from an S3 location into a data frame
  #'
  #' The compressed JSON should contain a single object per line,
  #' with no commas or array structure wrapping the objects.
  #'
  #' @param path S3 location of an object; e.g. s3://my-bucket/some/folders/file.json.gz
  # Requires aws.s3, magrittr (for %>%), and jsonlite.
  raw_data <- path %>%
    get_object() %>%           # download the object as a raw vector
    rawConnection() %>%        # expose the raw bytes as a connection
    gzcon() %>%                # decompress the gzip stream on the fly
    jsonlite::stream_in() %>%  # parse newline-delimited JSON into a data frame
    jsonlite::flatten()        # flatten any nested columns
  raw_data
}
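A hypothetical call, with a placeholder path:

df <- read_gzip_json_from_s3_to_df("s3://my-bucket/some/folders/file.json.gz")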

dmaupin12 commented on August 19, 2024

I just had this happen on a fairly large dataset as well. The following code is how I upload to the server. Is there a better way to do this, to avoid this happening in the future?

tmp <- tempfile()
saveRDS(full_data, tmp)
put_object(tmp, object = paste0(s3_path,"full_data.rds"), show_progress = TRUE, multipart = TRUE)
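As discussed earlier in the thread, uploading via the s3saveRDS wrapper is another option; a sketch (the bucket name and key below are placeholders):

# Sketch: let the wrapper serialize and upload the in-memory object in one step
s3saveRDS(full_data, object = "full_data.rds", bucket = "my-bucket")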
