irods / irods_client_library_rirods
rirods R Package
Home Page: https://rirods.irods4r.org
License: Other
With the current behavior of ils(), the default value is the current collection, but if the user offers a different path it has to be an absolute path. A relative path throws an error.
library(rirods)
library(tibble)
create_irods("http://localhost/irods-rest/0.9.3", "/tempZone/home", overwrite = TRUE)
iauth('rods', 'rods')
ils()
#> logical_path type
#> 1 /tempZone/home/rods/collection collection
#> 2 /tempZone/home/rods/foo.rds data_object
ils('collection')
#> Error in `resp_abort()`:
#> ! HTTP 400 Bad Request.
#> • Logical path [collection] is not accessible.
#> Backtrace:
#> ▆
#> 1. └─rirods::ils("collection")
#> 2. └─rirods:::irods_rest_call("list", "GET", args, verbose)
#> 3. └─httr2::req_perform(req)
#> 4. └─httr2:::resp_abort(resp, error_body(req, resp))
#> 5. └─rlang::abort(...)
ils('/tempZone/home/rods/collection')
#> This collection does not contain any objects or collections.
Created on 2023-03-17 with reprex v2.0.2
This is inconsistent, in that the default '.' is in itself relative to the current collection.
I would suggest replacing the following if-chain in ils():
irods_client_library_rirods/R/navigation.R
Lines 142 to 147 in 9aace28
icd() already has some functionality to deal with relative paths (but see #24): if it is transferred to an internal function that finds the right logical path based on the argument, it can be used by both icd() and ils() for a more robust implementation.
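A minimal sketch of what such a shared internal function could look like. The name resolve_lpath() and its arguments are hypothetical and not part of rirods; it only manipulates strings and does not contact the server.

```r
# Hypothetical helper (not part of rirods): resolve `path` against `current`.
resolve_lpath <- function(path = ".", current) {
  # Absolute paths are used as-is; relative paths are appended to `current`.
  full <- if (startsWith(path, "/")) path else paste(current, path, sep = "/")
  parts <- strsplit(full, "/", fixed = TRUE)[[1]]
  resolved <- character(0)
  for (part in parts) {
    if (part == "" || part == ".") {
      next                                        # skip empty segments and "."
    } else if (part == "..") {
      resolved <- resolved[-length(resolved)]     # go up one level
    } else {
      resolved <- c(resolved, part)
    }
  }
  paste0("/", paste(resolved, collapse = "/"))
}

resolve_lpath("collection", current = "/tempZone/home/rods")
#> [1] "/tempZone/home/rods/collection"
resolve_lpath("../", current = "/tempZone/home/rods/collection")
#> [1] "/tempZone/home/rods"
```

Both icd() and ils() could then call the same helper before building the request, so that relative paths behave identically in both.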
Currently, icd() (1) has no default value, although the documentation says it does, and (2) does not accept a trailing slash in path names.
The first point is consistent with the behavior of setwd() but not with the iCommands behavior of icd or the command-line behavior of cd, which defaults to the top-most level ('/zone/home/' for iRODS). That's what the documentation of icd() suggests, but, as the example below shows, icd() currently throws an error instead.
The second point is more concerning: based on the behavior of setwd(), icd() and cd(), a trailing slash should make no difference. Below it's shown that using a relative path (either "collection_name" or "..") or an absolute path (e.g. "/tempZone/home/rods") works well if there are no trailing slashes, but fails with a trailing slash. ipwd() doesn't catch the issue because it returns the absolute version of the provided path, but ils() shows that the contents are not correct.
library(rirods)
create_irods("http://localhost/irods-rest/0.9.3", "/tempZone/home", overwrite = TRUE)
iauth('rods', 'rods')
ipwd()
#> [1] "/tempZone/home/rods"
icd() # this fails because `dir` is not provided
#> Error in icd(): argument "dir" is missing, with no default
icd('.') # this brings us to where we already are
ipwd()
#> [1] "/tempZone/home/rods"
ils() # current content of top collection
#> logical_path type
#> 1 /tempZone/home/rods/collection collection
#> 2 /tempZone/home/rods/foo.rds data_object
icd('collection') # this brings us to 'collection' inside our pwd
ipwd()
#> [1] "/tempZone/home/rods/collection"
ils()
#> logical_path type
#> 1 /tempZone/home/rods/collection/subcollection collection
icd('../') # this fails
ipwd() # it's not obvious here because the right path is returned
#> [1] "/tempZone/home/rods/"
ils() # the contents are not correct: this is not the right path!
#> This collection does not contain any objects or collections.
icd('/tempZone/home/rods/') # absolute path is not the solution: trailing slash is the problem
ipwd()
#> [1] "/tempZone/home/rods/"
ils()
#> This collection does not contain any objects or collections.
icd('/tempZone/home/rods') # with no trailing slash, results are correct
ils()
#> logical_path type
#> 1 /tempZone/home/rods/collection collection
#> 2 /tempZone/home/rods/foo.rds data_object
icd('collection')
ils()
#> logical_path type
#> 1 /tempZone/home/rods/collection/subcollection collection
icd('..') # with no trailing slash, it works
ipwd()
#> [1] "/tempZone/home/rods"
ils()
#> logical_path type
#> 1 /tempZone/home/rods/collection collection
#> 2 /tempZone/home/rods/foo.rds data_object
Created on 2023-03-17 with reprex v2.0.2
I'm leaving this here to report the problem, but I also offer to try to fix it if you agree that the trailing slash should be accepted (or, if it won't, then a message should be shown).
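If the trailing slash is to be accepted, one possible approach is to normalize the path before it is stored or sent. This is only a sketch; strip_trailing_slash() is a hypothetical name and not an existing rirods function.

```r
# Hypothetical helper: drop trailing slashes from a single path,
# but keep the root "/" intact.
strip_trailing_slash <- function(path) {
  stripped <- sub("/+$", "", path)
  if (nzchar(stripped)) stripped else "/"
}

strip_trailing_slash("/tempZone/home/rods/")
#> [1] "/tempZone/home/rods"
strip_trailing_slash("/tempZone/home/rods")
#> [1] "/tempZone/home/rods"
```

If the decision is instead to reject trailing slashes, the same check (comparing the input with its stripped version) could be used to show a message.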
If a data object or collection carries some more metadata items, the ils command output becomes a bit too cluttered.
> ils(path="/bobZone/home/christine/test/foo", metadata = TRUE)
logical_path
1 /bobZone/home/christine/test/foo
metadata
1 foo, key1, key2, key3, key4, key5, bar, value1, value2, value3, value4, value5, baz, , , , ,
type
1 data_object
We start by reading in a text file:
helloWorld = read.delim("helloWorld.txt", sep=" ")
Content of the file:
> helloWorld
[1] Hello World.
<0 rows> (or 0-length row.names)
Now we store the file helloWorld.txt (not the variable) in iRODS:
iput("helloWorld.txt")
ils()
logical_path type
1 /bobZone/home/christine/test collection
2 /bobZone/home/christine/foo data_object
3 /bobZone/home/christine/helloWorld.txt data_object
We rename our local file helloWorld.txt to helloWorld1.txt and download helloWorld.txt again from iRODS:
iget("helloWorld.txt")
read.delim("helloWorld.txt", sep=" ")
[1] helloWorld.txt
<0 rows> (or 0-length row.names)
I can confirm that the iput is overwriting the content, since I get the same content of the file with the commands:
cstaiger@rirods:~$ iget helloWorld.txt
cstaiger@rirods:~$ cat helloWorld.txt
helloWorld.txt
I think the solution is to make a connection to iRODS and then stream from memory to the final destination. Although I do not know what this looks like with a connection to iRODS, locally it could look like this:
# test object
x <- matrix(1:100, 10, 10)
# serialize the R object to a raw vector in memory (connection = NULL)
y <- serialize(x, connection = NULL)
# length of object for chunking
size_y <- length(y)
# split point as a whole number, so no byte is lost when size_y is odd
half <- size_y %/% 2
# make a file -> this should then be an object on iRODS
fil <- tempfile()
# this is an R connection (IO stream object) -> this should become a connection to iRODS REST
tmp <- file(fil)
# open the connection
open(tmp, "wb")
# chunk 1
writeBin(y[1:half], tmp)
# chunk 2
writeBin(y[(half + 1):size_y], tmp)
# destroy connections
close(tmp)
# open connection -> this should become a connection to iRODS REST
con <- file(fil, "rb")
# read object -> back to memory
# chunk 1 (`fil` would work as well but I use a connection here as it is
# closer to the iRODS REST situation)
x1 <- readBin(con, raw(), n = half)
# chunk 2
x2 <- readBin(con, raw(), n = size_y - half)
# close the read connection
close(con)
# fuse chunks
z <- c(x1, x2)
# check if complete
all.equal(z, y)
# unserialize
unserialize(z)
I think one can include this in the DESCRIPTION file under "SystemRequirements".
I tried to update a metadata item (foo, bar, baz) to (foo, bar, bay).
> imeta(
+ "test/foo",
+ "data_object",
+ operations =
+ list(operation = "mod", attribute = "foo", value = "bar", units = "bay")
+ )
Error in `resp_abort()`:
! HTTP 400 Bad Request.
• {"error_message":"Invalid metadata operation.","operation":{"attribute":"foo","operation":"mod","units":"bay","value":"bar"},"operation_index":0}
Run `rlang::last_error()` to see where the error occurred.
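If the REST endpoint only understands "add" and "remove" operations (an assumption; the error above only says that "mod" is invalid), an update could be expressed as a remove followed by an add. The sketch below only builds the operations list and does not contact the server; avu_update_operations() is a hypothetical helper.

```r
# Hypothetical helper: express an AVU update as "remove" + "add".
# Assumption: the endpoint accepts the operations "add" and "remove".
avu_update_operations <- function(attribute, old_value, old_units,
                                  new_value, new_units) {
  list(
    list(operation = "remove", attribute = attribute,
         value = old_value, units = old_units),
    list(operation = "add", attribute = attribute,
         value = new_value, units = new_units)
  )
}

ops <- avu_update_operations("foo", "bar", "baz", "bar", "bay")
str(ops)

# The result could then be passed on, e.g.:
# imeta("test/foo", "data_object", operations = ops)
```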
I guess it would be good for people to find the package if we add some tags; probably "r", "r-package", and "rstats".
Read here about discoverability of packages: https://devguide.ropensci.org/maintenance_github_grooming.html
After an initial review by a CRAN member the following seems an issue:
"Please make sure that you do not change the user's options, par or working directory. If you really have to do so within functions, please ensure with an immediate call of on.exit() that the settings are reset when the function is exited. e.g.:
...
oldwd <- getwd() # code line i
on.exit(setwd(oldwd)) # code line i+1
...
setwd(...) # somewhere after
...
e.g.: R/create-irods.R"
So, we need to figure out a new way to store server configuration information that is persistent. on.exit() as suggested here would not be the solution for us, as then the information is lost upon exiting the function.
I think answers can be found here: https://blog.r-hub.io/2020/03/12/user-preferences/
We probably need to set a user-level environment variable.
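As an alternative to an environment variable, the configuration could live in a file under the per-user configuration directory, without ever touching the working directory. This is a sketch only: save_config(), load_config() and the file name are hypothetical, and tools::R_user_dir() requires R >= 4.0.0.

```r
# Hypothetical helpers: persist server configuration in a user-level directory.
save_config <- function(host, zone_path,
                        dir = tools::R_user_dir("rirods", which = "config")) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, "conf-irods.rds")   # file name is an assumption
  saveRDS(list(host = host, zone_path = zone_path), path)
  invisible(path)
}

load_config <- function(dir = tools::R_user_dir("rirods", which = "config")) {
  path <- file.path(dir, "conf-irods.rds")
  if (!file.exists(path)) {
    stop("No configuration found; run save_config() first.", call. = FALSE)
  }
  readRDS(path)
}

# Demonstration in a temporary directory, so nothing is written to the
# real user configuration directory.
demo_dir <- file.path(tempdir(), "rirods-config-demo")
save_config("http://localhost/irods-rest/0.9.3", "/tempZone/home", dir = demo_dir)
cfg <- load_config(demo_dir)
cfg$host
#> [1] "http://localhost/irods-rest/0.9.3"
```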
PRs and pushes to the repository become easier to review by separating the current workflow for R-CMD checks into a dedicated workflow for the checks triggered by pushes/PRs, and one that generates HTTP snapshots by a cron schedule.
The code coverage badge cannot be rendered properly. This can probably be fixed if you change the name of the default branch from "master" to "main" on https://app.codecov.io/gh/irods/irods_client_library_rirods under settings.
I think iput and iget could be more consistent and mirror each other's behavior ... with strong defaults...
iput should take an R variable and a target string for the name of the target data object in iRODS (with a default value of the name of the variable with an appended '.rds')
and reciprocally...
iget should take a string of the name of the absolute or relative logical path of the data object in iRODS and return an l-value of the content to an R variable
Hmmm, but what if iput or iget want to just transfer a file, rather than interacting with R variables?
# puts local variable's content into iRODS - without additional .rds extension
iput(x, "logical_path")
# puts a local file into iRODS
iput("local_file", "logical_path")
and
# gets iRODS data object, tries to put the content into local variable x (what does failure look like?)
iget("logical_path", x)  # more consistent
# OR
x <- iget("logical_path")  # more R-ish?
# gets iRODS data object, saves to local file
iget("logical_path", "local_file")
Currently, argument x of iget() only allows character strings and not expressions. In the situation that one stores an R object first with iput(foo) it would be more logical to also retrieve the object with iget(foo) (and not iget("foo")). This means supplying an expression and thus non-standard evaluation (NSE), similar to e.g. subset(airquality, Temp > 80, select = c(Ozone, Temp)).
The downside is that NSE can make code less obvious to interpret and maintain.
In addition, the requirement of character strings for iget() is not documented currently.
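To make the NSE idea concrete, here is a small sketch of how an argument could be accepted either as a bare name or as a string. target_name() is a hypothetical helper, not the current iget() or iput() implementation; it only derives the data object name and does not transfer anything.

```r
# Hypothetical helper: accept both a bare name (NSE) and a character string.
target_name <- function(x) {
  expr <- substitute(x)                 # capture the argument unevaluated
  name <- if (is.character(expr)) expr else deparse(expr)
  paste0(name, ".rds")                  # default extension for R objects
}

foo <- 1:3
target_name(foo)
#> [1] "foo.rds"
target_name("foo")
#> [1] "foo.rds"
```

Note that this only works when the helper is called directly with the user's argument; passing x through another function first would capture the name `x` instead, which is one of the maintenance costs of NSE mentioned above.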
Add a new function that will get information from a new REST API endpoint, preferably prior to any authentication.
Possible function names:
This new endpoint would return zonename, server version, authentication requirements, etc.
requires upgrading to minimum REST API v0.9.3
new HTTP API: https://github.com/irods/irods_client_http_api
Two options...
A) convert all the existing calls from using the REST API to the HTTP API
OR
B) add the ability for this library to talk to the HTTP API, in addition to the REST API
It's not clear whether there will be any demand for B) once most people upgrade to the HTTP API (and iRODS 4.3.1+).
The definition and documentation of imeta() suggest that the operations argument requires a named list with "operation", "attribute", "value", "units".
However, it also works with a list of such lists, and the body of the function tries to determine whether it's a list or a list of lists.
I would suggest:
operations = list()
, without names.NULL
.operations
is a list of lists.
If you agree, I can make the changes (at least to the documentation).
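The check for a single operation versus a list of operations could be isolated in a small helper, so that the rest of the function always works with a list of lists. A minimal sketch; normalize_operations() is a hypothetical name and this is not how imeta() currently does it.

```r
# Hypothetical helper: always return a list of operation lists.
normalize_operations <- function(operations = list()) {
  # A single operation is a named list that has an "operation" element.
  is_single <- !is.null(names(operations)) &&
    "operation" %in% names(operations)
  if (is_single) list(operations) else operations
}

single <- list(operation = "add", attribute = "foo", value = "bar", units = "baz")
length(normalize_operations(single))
#> [1] 1
length(normalize_operations(list(single, single)))
#> [1] 2
```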
The new rirods logo is inspired, but we should update it with proper spacing and color to be in line with the intent at https://irods.org/logo
Prepare for release:
git pull
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
git push
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
git push
usethis::use_github_release()
usethis::use_dev_version()
git push
I guess this is more of a thought on the future of rirods. It was stimulated by discussing this R package: https://github.com/jspijker/ricmd made by @jspijker. Would there be a benefit of implementing interfaces to multiple iRODS APIs under a unified R package?
I would then envision a generalized rirods/ricmd (R) interface with multiple plugins for linking to the HTTP, Python, ... APIs, respectively, based on availability and user preferences. In this thought experiment some heuristics would figure out which API is available.
@jspijker, is your package still in use at the RIVM? And are you still considering submitting it to CRAN?
Assign a new file name when downloading data from iRODS and when uploading data to iRODS
Case for downloading. In my workspace I already have
> list.files()
[1] "helloWorld.txt" "penguins.csv"
In iRODS I also have a file helloWorld.txt:
> ils()
logical_path type
1 /bobZone/home/christine/test collection
...
4 /bobZone/home/christine/helloWorld.txt data_object
When I now try to download the file, I get:
> iget("helloWorld.txt")
Error: Local file aready exists. Set `overwrite = TRUE` to explicitely overwrite the object.
Is it possible to rewrite the calls for iget and iput like this:
iget("helloWorld.txt", name="helloWorld1.txt", ... <all other parameter> )
iput("helloWorld.txt", name="helloWorld1.txt", ... <all other parameters>)
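A rough sketch of how a name argument could determine the local destination while keeping the overwrite protection. local_target() is a hypothetical helper and the argument names are assumptions taken from the proposal above; no data is transferred here.

```r
# Hypothetical helper: decide where a downloaded data object is saved locally.
local_target <- function(logical_path, name = basename(logical_path),
                         overwrite = FALSE) {
  if (file.exists(name) && !overwrite) {
    stop("Local file '", name, "' already exists. ",
         "Set `overwrite = TRUE` or choose another `name`.", call. = FALSE)
  }
  name
}

# Demonstration in a temporary workspace.
workspace <- tempfile("workspace")
dir.create(workspace)
existing <- file.path(workspace, "helloWorld.txt")
writeLines("Hello World.", existing)

# Same name as an existing local file: refused.
clash <- tryCatch(
  local_target("/bobZone/home/christine/helloWorld.txt", name = existing),
  error = function(e) conditionMessage(e)
)

# A different name: accepted.
renamed <- local_target("/bobZone/home/christine/helloWorld.txt",
                        name = file.path(workspace, "helloWorld1.txt"))
basename(renamed)
#> [1] "helloWorld1.txt"
```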
Uploading larger R objects fails unexpectedly.
library(rirods)
rirods:::local_create_irods()
iauth("rods", "rods")
iput(iris)
#> Error in `httr2::req_perform()`:
#> ! Failed to parse error body with method defined in req_error()
#> Caused by error:
#> ! lexical error: invalid char in json text.
#> Request exceeded maximum buffer
#> (right here) ------^
#> Backtrace:
#> ▆
#> 1. ├─rirods::iput(iris)
#> 2. │ └─rirods:::irods_rest_call("stream", "PUT", args, verbose, x)
#> 3. │ └─httr2::req_perform(req)
#> 4. │ ├─httr2:::resp_abort(resp, error_body(req, resp))
#> 5. │ │ └─rlang::abort(...)
#> 6. │ │ └─rlang::is_formula(message, scoped = TRUE, lhs = FALSE)
#> 7. │ └─httr2:::error_body(req, resp)
#> 8. │ ├─rlang::try_fetch(...)
#> 9. │ │ ├─base::tryCatch(...)
#> 10. │ │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 11. │ │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 12. │ │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 13. │ │ └─base::withCallingHandlers(...)
#> 14. │ └─httr2:::req_policy_call(req, "error_body", list(resp), default = NULL)
#> 15. │ ├─rlang::exec(req$policies[[name]], !!!args)
#> 16. │ └─rirods (local) `<fn>`(`<httr2_rs>`)
#> 17. │ └─httr2::resp_body_json(resp, check_type = FALSE)
#> 18. │ └─jsonlite::fromJSON(text, simplifyVector = simplifyVector, ...)
#> 19. │ └─jsonlite:::parse_and_simplify(...)
#> 20. │ └─jsonlite:::parseJSON(txt, bigint_as_char)
#> 21. │ └─jsonlite:::parse_string(txt, bigint_as_char)
#> 22. └─base::.handleSimpleError(...)
#> 23. └─rlang (local) h(simpleError(msg, call))
#> 24. └─handlers[[1L]](cnd)
#> 25. └─rlang::abort(...)
Created on 2022-11-21 with reprex v2.0.2
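Since the server reports that the request exceeded its maximum buffer, one possible direction is to split the serialized object into chunks and send them one by one. The sketch below only shows the splitting and reassembly locally; split_raw() is a hypothetical helper, the chunk size is arbitrary, and whether the stream endpoint accepts an offset per request is an assumption.

```r
# Hypothetical helper: split a raw vector into chunks with zero-based offsets.
split_raw <- function(x, chunk_size) {
  stopifnot(is.raw(x), length(x) > 0, chunk_size >= 1)
  starts <- seq(1, length(x), by = chunk_size)
  lapply(starts, function(start) {
    end <- min(start + chunk_size - 1, length(x))
    list(offset = start - 1, data = x[start:end])
  })
}

payload <- serialize(iris, connection = NULL)
chunks <- split_raw(payload, chunk_size = 1024)

# Each chunk would be sent in its own request; here we only check that
# putting the chunks back together gives the original bytes.
rebuilt <- do.call(c, lapply(chunks, `[[`, "data"))
identical(rebuilt, payload)
#> [1] TRUE
```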
Hi! I'm sorry if this is a dumb question, but I wanted to try out this package and I'm stuck at connecting with the demo REST API.
I have run the code to start the demo and get no errors:
# clone the repository
git clone --recursive https://github.com/irods/irods_demo
# start the REST API
cd irods_demo
docker-compose up -d nginx-reverse-proxy
Then I loaded the library and ran create_irods() as indicated, with the same arguments. But when I have to authenticate with iauth() I have no idea how to authenticate in order to get access and try this out. Is there a user-password combination I can use for testing? Or how can I create my own? Or what other alternative is there?
I'm running this from Ubuntu 20.04 with rstudio-server.
Thank you in advance for any guidance you can give me!
If there are multiple data objects or collections and one of them has no metadata, the order of the AVU-parts for the existing metadata is wrong: attribute-units-value, i.e. alphabetically.
rirods:::metadata_reorder() is meant to fix this, but it's failing silently when it runs into the empty dataframe of the item with no metadata.
At the bottom of the example I suggest an alternative Map() call for rirods:::metadata_reorder(), although this might also end up linked to #12 (this could be dealt with in a custom printing method).
library(rirods)
library(tibble)
create_irods("http://localhost/irods-rest/0.9.3", "/tempZone/home", overwrite = TRUE)
iauth('rods', 'rods')
# Problem ----
files <- ils(metadata = TRUE)
files
#> logical_path metadata type
#> 1 /tempZone/home/rods/collection NULL collection
#> 2 /tempZone/home/rods/foo.rds foo, baz, bar data_object
# The order of the columns is alphabetical
files$metadata
#> [[1]]
#> data frame with 0 columns and 0 rows
#>
#> [[2]]
#> attribute units value
#> 1 foo baz bar
# Diagnosis ----
# Silent error in `metadata_reorder()`
# code from `ils()`:
out <- rirods:::irods_rest_call('list', 'GET', list(
`logical-path` = rirods:::.rirods$current_dir,
stat = as.integer(FALSE),
metadata = as.integer(TRUE),
offset = 0,
limit = 100
), FALSE)
x <- httr2::resp_body_json(
out,
check_type = FALSE,
simplifyVector = TRUE
)$`_embedded`
x$metadata
#> [[1]]
#> data frame with 0 columns and 0 rows
#>
#> [[2]]
#> attribute units value
#> 1 foo baz bar
# what `metadata_reorder()` tries to do
Map(function(x) {x <- x[ ,c("attribute", "value", "units")]; x}, x$metadata)
#> Error in `[.data.frame`(x, , c("attribute", "value", "units")): undefined columns selected
# because it's called silently, it doesn't tell us there is an error with the empty dataframe
# Suggestion ----
# replacement for the `Map()` call in `metadata_reorder()`
Map(function(x) {if (length(x) > 0) x[ ,c("attribute", "value", "units")] else x}, x$metadata)
#> [[1]]
#> data frame with 0 columns and 0 rows
#>
#> [[2]]
#> attribute value units
#> 1 foo bar baz
Created on 2023-03-17 with reprex v2.0.2
But it has a CRAN release.
All of the columns available to be returned by a GenQuery are possible inputs to iquery.
It should return the column names as is, without interpretation... as there are many...
$ iquest attrs | wc -l
327
Since calculating checksums on the /stream endpoint is not viable today...
rirods could provide an ichksum("logical_path") function that would return the value from the iRODS server.
This will require the REST API to provide a checksum endpoint.
Need to consider checksum algorithm as well.