Comments (17)
I believe that discussion is happening here, if it's of interest: irods/irods_client_rest_cpp#160
from irods_client_library_rirods.
Ok, so when you say chunked, you're talking about manually sending parts of the file, not HTTP's chunked encoding. That's good.
I noticed you're passing truncate=0. Try changing it to truncate=false and seeing if it works.
from irods_client_library_rirods.
Oh my, this works: truncate=false
from irods_client_library_rirods.
How big is the maximum buffer?
Can it be increased, and should it be a parameter?
What is the downside to increasing such a value?
from irods_client_library_rirods.
Should the buffer size be defined in the header of the curl request? If so, then I can include another parameter or define a useful value. I am actually also unsure how one knows what buffer size is needed and what it's for anyway.
from irods_client_library_rirods.
It sounds like the count parameter is what you want.
It is specified as part of the URL just like logical-path. See https://github.com/irods/irods_client_rest_cpp#stream
&count=<integer>
The /stream endpoint is meant to be used like POSIX read/write functions. You must make multiple calls to stream large amounts of data. The count parameter tells the endpoint how many bytes are to be read or written.
As for the size of the buffer, larger values for count mean fewer network calls. Smaller values for count mean more network calls. Try starting with a count of 8192 bytes and increasing it to see what kind of performance you get. The value you land on can be the default. The user can then choose to override that if they feel it is too small or large.
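To make the trade-off between count and the number of calls concrete, here is a minimal sketch that only does the arithmetic: it lists the offset and count pair each request would carry for a given file size. The helper name chunk_plan is made up for illustration and is not part of the REST API or rirods.

```sh
#!/bin/sh
# chunk_plan SIZE CHUNK (hypothetical helper, for illustration only)
# Prints one "offset count" pair per request needed to cover SIZE bytes
# in pieces of at most CHUNK bytes.
chunk_plan() {
  size=$1
  chunk=$2
  offset=0
  while [ "$offset" -lt "$size" ]; do
    remaining=$((size - offset))
    if [ "$remaining" -lt "$chunk" ]; then
      count=$remaining          # last piece: only what is left
    else
      count=$chunk
    fi
    echo "$offset $count"
    offset=$((offset + count))
  done
}

# Example: a 20000-byte file sent 8192 bytes at a time needs three requests.
chunk_plan 20000 8192
```

Doubling the chunk size roughly halves the number of lines printed, which is the "fewer network calls" effect described above.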
from irods_client_library_rirods.
I see. I did set it to 1000 by default, so I will change that but leave it accessible for the user.
from irods_client_library_rirods.
I made an example, mostly written as a bash script, with only a piece of R that creates a 10,000-row matrix. This mimics roughly what the R function does, but I can't seem to set count in such a way that this works without exceeding the buffer.
#!/bin/sh
# Get an auth token (printf is more portable than echo -n under /bin/sh)
export SECRETS=$(printf '%s' 'rods:rods' | base64)
export TOKEN=$(curl -s -X POST -H "Authorization: Basic ${SECRETS}" http://localhost/irods-rest/0.9.3/auth)
# Create an R object and save it to disk
Rscript -e "foo <- matrix(1:10000); saveRDS(foo, 'foo.rds')"
# URL-encode the logical path with PHP
export LPATH=$(php -r "echo urlencode('/tempZone/home/rods/foo.rds');")
# Upload the file; --data-binary sends the bytes unmodified (-d would strip newlines)
curl -X PUT -H "Authorization: ${TOKEN}" \
  -H "Accept-Encoding: gzip, deflate, br" \
  --data-binary @foo.rds \
  "http://localhost/irods-rest/0.9.3/stream?logical-path=${LPATH}&offset=0&count=8192"
# Delete the local file
rm foo.rds
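If PHP is not available, the URL-encoding step can be approximated in plain shell. The sketch below is an assumption-laden stand-in, not what the script above uses: urlencode here is a made-up helper, it handles ASCII input only, and it encodes a space as %20 where PHP's urlencode would emit +.

```sh
#!/bin/sh
# urlencode STRING (hypothetical helper, ASCII input only)
# Percent-encodes every character except unreserved ones (A-Z a-z 0-9 . _ ~ -).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
  done
  printf '%s\n' "$out"
}

urlencode '/tempZone/home/rods/foo.rds'
```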
from irods_client_library_rirods.
How big is 'foo.rds' on the disk? That is the number that 'should' work for the count=.
from irods_client_library_rirods.
> How big is 'foo.rds' on the disk? That is the number that 'should' work for the count=.
That is true if it is small. The REST API allocates a buffer of size count bytes. If count is too large, the server could throw a std::bad_alloc exception.
offset and count must be used to read/write large files.
from irods_client_library_rirods.
Does that mean R has to chop the foo.rds file into pieces and then send the pieces over the REST API to one and the same location?
from irods_client_library_rirods.
That is correct.
The C++ REST API does not support parallel transfer yet, so the speed of a transfer will depend on the size of the file. That is a known issue and will be worked on in a future release of the API.
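As a rough sketch of what "chop the file into pieces and send them to the same location" could look like from the shell: the loop below cuts a local file into pieces with dd and builds one PUT per piece, using the /stream URL and the logical-path, offset, count and truncate parameters that appear in this thread. The function name upload_in_chunks and the DRY_RUN switch are made up for illustration. By default it only prints the requests, and the real curl branch is untested here.

```sh
#!/bin/sh
# upload_in_chunks FILE ENCODED_LOGICAL_PATH CHUNK_BYTES  (hypothetical helper)
# With DRY_RUN=1 (the default) it only prints the requests it would make;
# set DRY_RUN=0 and TOKEN to actually call curl.
upload_in_chunks() {
  file=$1 lpath=$2 chunk=$3
  base="http://localhost/irods-rest/0.9.3/stream"   # URL as used earlier in the thread
  size=$(wc -c < "$file" | tr -d ' ')
  piece=$(mktemp)
  i=0
  while [ $((i * chunk)) -lt "$size" ]; do
    offset=$((i * chunk))
    # Extract piece i into a scratch file (the last piece may be shorter).
    dd if="$file" of="$piece" bs="$chunk" skip="$i" count=1 2>/dev/null
    count=$(wc -c < "$piece" | tr -d ' ')
    url="${base}?logical-path=${lpath}&offset=${offset}&count=${count}&truncate=false"
    if [ "${DRY_RUN:-1}" -eq 1 ]; then
      echo "PUT ${url}"
    else
      curl -s -X PUT -H "Authorization: ${TOKEN}" --data-binary @"$piece" "$url"
    fi
    i=$((i + 1))
  done
  rm -f "$piece"
}

# Example with a locally generated file and an 8192-byte chunk size.
demo=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i }' > "$demo"
upload_in_chunks "$demo" "%2FtempZone%2Fhome%2Frods%2Ffoo.rds" 8192
```

Every request targets the same logical path; only offset and count change from one call to the next.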
from irods_client_library_rirods.
I am now able to upload larger objects with chunking. You can see here that the local and iRODS file sizes match:
library("rirods")
rirods:::local_create_irods()
iauth("rods", "rods")
# big object
foo <- matrix(1:100000)
# save locally
saveRDS(foo, "foo.rds")
# check size
file.size("foo.rds")
#> [1] 212424
# put in irods
iput(foo)
# check size on irods
ils(stat = TRUE)
#> logical_path status_information.last_write_time
#> 1 /tempZone/home/rods/foo.rds 1670143173
#> status_information.size type
#> 1 212424 data_object
But something goes wrong during the upload: on closer inspection of the raw vector, it seems that the chunks at the front get overwritten with 00 00 00 bytes, whereas the last chunk represents the source truthfully.
Any ideas?
Created on 2022-12-04 with reprex v2.0.2
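One reading consistent with the truncate=false fix reported elsewhere in this thread is that each request sent with truncate=0 emptied the data object before writing at its offset. That is an assumption, not confirmed server behavior. The local simulation below, which involves no iRODS server and uses a made-up write_chunks helper, reproduces the same symptom: the final size matches, the front reads as zero bytes, and only the last chunk is intact.

```sh
#!/bin/sh
# write_chunks SRC DST CHUNK TRUNC  (hypothetical helper, local files only)
# Copies SRC to DST piece by piece, each piece written at its own offset.
# TRUNC=1 empties DST before every piece (the assumed faulty behavior);
# TRUNC=0 leaves earlier pieces in place.
write_chunks() {
  src=$1 dst=$2 chunk=$3 trunc=$4
  size=$(wc -c < "$src" | tr -d ' ')
  : > "$dst"                                  # start from an empty target
  i=0
  while [ $((i * chunk)) -lt "$size" ]; do
    if [ "$trunc" -eq 1 ]; then : > "$dst"; fi
    # conv=notrunc tells dd not to truncate the target when it opens it.
    dd if="$src" of="$dst" bs="$chunk" skip="$i" seek="$i" count=1 conv=notrunc 2>/dev/null
    i=$((i + 1))
  done
}

tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i }' > "$tmp/src.txt"
write_chunks "$tmp/src.txt" "$tmp/bad.txt" 8192 1
write_chunks "$tmp/src.txt" "$tmp/good.txt" 8192 0
cmp -s "$tmp/src.txt" "$tmp/good.txt" && echo "without truncation: identical"
cmp -s "$tmp/src.txt" "$tmp/bad.txt" || echo "with truncation: differs"
od -An -tx1 -N 4 "$tmp/bad.txt"               # first four bytes of the damaged copy
```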
from irods_client_library_rirods.
So only the last part of the data is correct?
How is the chunked transfer implemented in rirods?
Is there a loop?
What does the HTTP request look like?
What API parameters are being set on the request?
from irods_client_library_rirods.
Technically a loop is at work, which outputs this response n times:
<- HTTP/1.1 200 OK
<- Server: nginx/1.23.1
<- Date: Tue, 06 Dec 2022 16:54:23 GMT
<- Content-Length: 33
<- Connection: keep-alive
<- Access-Control-Allow-Origin: *
<- Access-Control-Allow-Headers: *
<- Access-Control-Allow-Methods: AUTHORIZATION,ACCEPT,GET,POST,OPTIONS,PUT,DELETE
The request would look like this (2 steps at the end of the loop/file):
-> PUT /irods-rest/0.9.3/stream?logical-path=%2FtempZone%2Fhome%2Frods%2Fbaz.rds&offset=206356&count=3034&truncate=0 HTTP/1.1
-> Host: localhost
-> User-Agent: httr2/0.2.2 r-curl/4.3.3 libcurl/7.68.0
-> Accept: */*
-> Accept-Encoding: deflate, gzip, br
-> Authorization: <REDACTED>
-> Content-Length: 3034
-> PUT /irods-rest/0.9.3/stream?logical-path=%2FtempZone%2Fhome%2Frods%2Fbaz.rds&offset=209390&count=3034&truncate=0 HTTP/1.1
-> Host: localhost
-> User-Agent: httr2/0.2.2 r-curl/4.3.3 libcurl/7.68.0
-> Accept: */*
-> Accept-Encoding: deflate, gzip, br
-> Authorization: <REDACTED>
-> Content-Length: 3034
from irods_client_library_rirods.
We've made a tweak to the C++ REST API to allow larger HTTP requests. Before the change, the /stream endpoint for PUT was limited to about 4096 bytes (not good for uploads).
If that seems important to have for your talk, you can try building a package with that change. The PR is available at the following:
Generating a package can be accomplished using the Docker builder provided by the project. See the instructions at the following:
And here's instructions for running with Docker Compose.
from irods_client_library_rirods.
We're going to change that in the next release to accept 0/1 instead of true/false.
from irods_client_library_rirods.