
tidyopeneo: tidy datacubes

The tidyopeneo package was created to make the processes of the openeo package more flexible. As the name suggests, "tidy" refers to the tidyverse: the idea is to be able to apply standard tidyverse, and especially dplyr, code to the most relevant openeo functions.

The need for this package comes from the fact that the so-called openeo processes are usually too "API-ish". The mechanism is functional and helpful, but it does not look like R code. So, if a regular R user wants to dive into the world of data cubes using the openeo package, they will usually have some difficulty getting familiar with this way of coding. So far, the feeling when using openeo processes is mainly that one is not writing R code per se, but handling a specific API.

For example, a simple filter_bands(), which keeps only some of the raster's colour bands, looks something like this:

library(openeo)

con = connect(host = "https://openeo.cloud")
p = processes()

dc = p$load_collection(
  id = "SENTINEL_5P_L2",
  spatial_extent = list(west = 6.09, south = 46.15, east = 6.99, north = 46.57),
  temporal_extent = c("2018-07-01", "2018-10-31")
)

dc_no2 = p$filter_bands(data = dc, bands = "NO2")

The use of the dollar sign is typical of API calls. It may make more sense to users familiar with object-oriented programming (OOP), since one is actually calling a method (load_collection()) of an object (p).

Moreover, one has to remember too many of these processes in order to use them. That is exactly where the tidyverse comes onto the scene. The dplyr verbs are straightforward and natural, which is why many tools and scripts are being converted into this so-called "universe". So why not wrap some of the main openEO processes into dplyr functions? Why could datacubes not be treated similarly to data frames or tibbles? The aim here was therefore to wrap some of the processes from the openEO API into dplyr's functions, such as the well-known filter() and mutate(). A guide to exactly how this was done can be seen in the figure below.

Figure: openEO processes and their corresponding wrapped functions in tidyopeneo

All of the functions were wrapped into dplyr functions, with the exception of resample(), which is entirely new. In this case it is more of an experiment: it tests whether wrapping dplyr functions is enough, or whether new functions could also be accepted. Since the broader idea is to simplify the openEO API processes, this came to mind quickly.

As this is a work in progress, more processes and wrapped functions may be added. Another important topic is group_by and summarise. The idea is to mirror dplyr, where one first groups the data and then uses summarise on the grouped data. Here, grouping can be per day, per geometry or per time interval, and summarise is then applied. group_by simply creates a subclass called "grouped datacube" and saves the ".by" argument into the environment of the "grouped datacube" object. summarise later retrieves it from there. If the datacube is not a "grouped datacube", the reduce_dimension process is simply run.

The hope is that, as this package develops, more people from the R community will feel encouraged to use openEO for Earth Observation tasks.

The aim is to aggregate and simplify some of the main openeo processes. As a first example, below one can see how to create an object of the tidyopeneo class "datacube", which is the first step when working with the package. The datacube() constructor already sets up a connection to openeo.cloud and only requires the id argument. An existing openeo "ProcessNode" object can also be passed to it.

In the example below, the package is loaded and a datacube is created right away:

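library(tidyopeneo)
library(dplyr)  # provides %>% and the dplyr verb generics, in case tidyopeneo does not re-export them
library(sf)     # used below to build the polygon for filter_spatial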

dc = datacube(id = "SENTINEL_5P_L2")

filter

From now on, we can work entirely in tidy syntax and even use the well-known pipe (%>%). The first example below shows how the filter_bands process is wrapped in tidyopeneo, namely by select():

dc_select = dc %>% select(.bands = "NO2")
dc_select_co = dc %>% select(.bands = "CO")  # Sentinel-5P measures CO, not CO2; a separate name keeps dc_select as NO2

The filter() wrapper covers four further openEO processes. Let's have a look at some of them.

dc_filtered = dc_select %>% 
  filter(.extent = c("2018-01-01", "2018-01-02")) %>% #filter_temporal
  filter(.extent = list(west = 6.09, south = 46.15, east = 6.99, north = 46.57)) #filter_bbox

#filter_spatial
lon = c(6.22, 6.24)
lat = c(46.20, 46.25)
pol_coords = dplyr::tibble(lon, lat)
pol <- pol_coords %>%
     st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%  # lon/lat are in degrees (WGS 84), not EPSG:3857 metres
     st_bbox() %>%
     st_as_sfc()

dc_filtered = dc_select %>% filter(.geometries = pol)

Note how the filter is defined and how much it looks like a dplyr workflow. Depending on the arguments passed, the wrapper deploys a different process: a pair of dates gives filter_temporal, a west/south/east/north list gives filter_bbox, and .geometries gives filter_spatial. To know which process will run, check the documentation of the wrapper functions and also that of the openeo processes. The parameters are the same, but in tidyopeneo they are all combined in a single function, which is the whole idea of the wrapper.
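The argument-based dispatch described above can be pictured with a tiny base-R sketch. It is a hypothetical illustration, not tidyopeneo's source. The function name `pick_filter_process()` is made up, and the rules only reproduce the three cases shown in the examples: two date strings, a named bounding-box list, and `.geometries`. The fourth process that filter() covers is not modelled.

```r
# Hypothetical sketch (not tidyopeneo's code): choose an openEO filter
# process from the shape of the arguments, mirroring the examples above.
pick_filter_process <- function(.extent = NULL, .geometries = NULL) {
  bbox_names <- c("west", "south", "east", "north")
  if (!is.null(.geometries)) {
    # geometries (e.g. an sf polygon) -> spatial filter
    return("filter_spatial")
  }
  if (is.character(.extent)) {
    # a pair of date strings -> temporal filter
    return("filter_temporal")
  }
  if (is.list(.extent) && all(bbox_names %in% names(.extent))) {
    # named west/south/east/north list -> bounding-box filter
    return("filter_bbox")
  }
  stop("cannot tell which filter process these arguments belong to")
}

pick_filter_process(.extent = c("2018-01-01", "2018-01-02"))
#> [1] "filter_temporal"
pick_filter_process(.extent = list(west = 6.09, south = 46.15, east = 6.99, north = 46.57))
#> [1] "filter_bbox"
pick_filter_process(.geometries = "placeholder for an sf polygon")
#> [1] "filter_spatial"
```

Inferring the process from argument types keeps the user-facing call uniform. The trade-off is that ambiguous inputs need an explicit error, as in the last branch.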

mutate

The mutate() function is also a broad wrapper. It again covers four processes, all of them from the apply family of openEO processes. The example below demonstrates its use:

dc_cloud = dc %>% 
  filter(.extent = c("2018-01-01", "2018-01-02")) %>% #filter_temporal
  filter(.extent = list(west = 6.09, south = 46.15, east = 6.99, north = 46.57)) %>%  #filter_bbox
  select(.bands = "CLOUD_FRACTION")

## mask for cloud cover
threshold_ <- function(data, context) {
  p = openeo::processes()
  threshold <- p$gte(data[1], 0.5)
  return(threshold)
}

# apply the threshold to the cube
cloud_threshold = dc_cloud %>% 
  mutate(.process = threshold_)

# mask the cloud cover with the calculated mask
# (p is the openeo process collection, as in the first example)
p = openeo::processes()
dc_masked <- p$mask(dc_filtered, cloud_threshold)

This also shows that openEO API processes can be used as usual on "datacube" objects from tidyopeneo, which is quite useful. The reverse is also possible: the datacube() constructor can be used to turn the result back into a "datacube" object.

rename

The rename() function wraps only the rename_dimension process. The main idea is simply to be able to call it inside a pipe.

dc_rename = dc_filtered %>% rename(.source = "t", .target = "time")

Another aim of tidyopeneo is to provide more examples of the many ways the different API processes can be used. Such examples are still lacking in the openEO documentation, and this package could help fill that gap. This part is still being built.

resample

As stated, at the current stage resample() is the only function in tidyopeneo that does not come from dplyr; it is a completely new function. It is meant as an example of how several processes can be simplified into a single wrapped function.

dc_resample = dc_filtered %>% resample(.resolution = 10/111)

summarise

The meaning of summarise in tidyopeneo is quite different from its meaning in dplyr. Instead of working at the column level, summarise reduces all pixels along a given dimension, wrapping the reduce_dimension process. Again, it is a single function for a single process, which allows it to run in magrittr pipes.

dc_summarised = dc_resample %>% summarise(.reducer = "mean")

If you prefer the American English spelling, summarize() is also available, just as in dplyr.

group_by

Finally, there is the group_by() function. As mentioned, it only creates a subclass of the "datacube" class, namely "grouped datacube". Here are some examples of how it interacts with summarise:

dc_sum <- dc_resample %>%
  group_by("t") %>%
  summarise("sum")

dc_sum <- dc_resample %>%
  group_by("space") %>%
  summarise("sum")

And to see the subclass:

dc_grouped <- dc_resample %>% group_by("day")
dc_grouped %>% class() %>% print()
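The hand-off between group_by() and summarise() can be illustrated with a small, self-contained mock in base R. This is an assumption-laden sketch, not tidyopeneo's implementation:

- A plain list stands in for the datacube.
- The subclass is spelled `grouped_datacube`; the text calls it "grouped datacube", and the real class name may differ.
- `new_mock_cube()`, `group_by_mock()` and `summarise_mock()` are hypothetical names. They return strings instead of building openEO processes.

```r
# Hypothetical mock, not tidyopeneo's code: a plain list stands in for a
# "datacube", and the methods return strings naming what a real method
# would build instead of building openEO processes.
new_mock_cube <- function(id) {
  structure(list(id = id), class = "datacube")
}

group_by_mock <- function(.data, .by) {
  # store the grouping in a fresh environment attached to the copy ...
  .data$env <- new.env()
  assign(".by", .by, envir = .data$env)
  # ... and prepend the subclass so S3 dispatch finds it before "datacube"
  class(.data) <- c("grouped_datacube", class(.data))
  .data
}

summarise_mock <- function(.data, .reducer) UseMethod("summarise_mock")

summarise_mock.grouped_datacube <- function(.data, .reducer) {
  by <- get(".by", envir = .data$env)  # read back what group_by_mock() stored
  sprintf("aggregate over '%s' with '%s'", by, .reducer)
}

summarise_mock.datacube <- function(.data, .reducer) {
  # ungrouped cube: fall back to a plain reduction
  sprintf("reduce_dimension with '%s'", .reducer)
}

s5p_cube <- new_mock_cube("SENTINEL_5P_L2")
summarise_mock(s5p_cube, "mean")
#> [1] "reduce_dimension with 'mean'"
grouped <- group_by_mock(s5p_cube, "t")
class(grouped)
#> [1] "grouped_datacube" "datacube"
summarise_mock(grouped, "sum")
#> [1] "aggregate over 't' with 'sum'"
```

Because `group_by_mock()` modifies a copy, the original cube stays ungrouped. Putting the subclass first in the class vector is what lets dispatch reach the grouped method, which is also the point of item 6 in the issue list below.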

tidyopeneo's People

Contributors

edzer, hurielreichel


tidyopeneo's Issues

Introducing slice causes a long error in testthat()

----------- FAILURE REPORT --------------
--- failure: length > 1 in coercion to logical ---
--- srcref ---
:
--- package (from environment) ---
openeo
--- call from context ---
FUN(X[[i]], ...)
--- call from argument ---
is.environment(value) || !is.na(value)
--- R stacktrace ---
where 1: FUN(X[[i]], ...)
where 2: lapply(node$parameters, function(param) {
value = param$getValue()
if (length(value) > 0 && (is.environment(value) || !is.na(value))) {
if ("Graph" %in% class(value)) {
return(value$getVariables())
}
else if ("ProcessGraphParameter" %in% class(value) &&
length(value$getProcess()) == 0) {
return(value)
}
else if (is.list(value)) {
return(lapply(value, function(array_elem) {
if ("ProcessGraphParameter" %in% class(array_elem) &&
length(array_elem$getProcess()) == 0) {
return(array_elem)
}
return(NULL)
}))
}
}
return(NULL)
})
where 3: FUN(X[[i]], ...)
where 4: lapply(used_nodes, function(node) {
node_variables = lapply(node$parameters, function(param) {
value = param$getValue()
if (length(value) > 0 && (is.environment(value) || !is.na(value))) {
if ("Graph" %in% class(value)) {
return(value$getVariables())
}
else if ("ProcessGraphParameter" %in% class(value) &&
length(value$getProcess()) == 0) {
return(value)
}
else if (is.list(value)) {
return(lapply(value, function(array_elem) {
if ("ProcessGraphParameter" %in% class(array_elem) &&
length(array_elem$getProcess()) == 0) {
return(array_elem)
}
return(NULL)
}))
}
}
return(NULL)
})
})
where 5: variables(final_node)
where 6: doTryCatch(return(expr), name, parentenv, handler)
where 7: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 8: tryCatchList(expr, classes, parentenv, handlers)
where 9: tryCatch({
private$variables = variables(final_node)
}, error = function(e) {
})
where 10: doTryCatch(return(expr), name, parentenv, handler)
where 11: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 12: tryCatchList(expr, classes, parentenv, handlers)
where 13: tryCatch({
con = .assure_connection(con)
if (is.null(final_node)) {
stop("The final node (endpoint of the graph) has to be set.")
}
if ("ProcessNode" %in% class(final_node)) {
node_list = .final_node_serializer(final_node)
private$nodes = unname(node_list)
private$final_node_id = final_node$getNodeId()
tryCatch({
private$variables = variables(final_node)
}, error = function(e) {
})
}
else {
stop("The final node has to be a ProcessNode.")
}
invisible(self)
}, error = .capturedErrorToMessage)
where 14: initialize(...)
where 15: Graph$new(final_node = from)
where 16: asMethod(object)
where 17: as(process_graph, "Graph")
where 18: self$setProcessGraph(process_graph = process_graph)
where 19: initialize(...)
where 20: Process$new(id = NULL, process_graph = from)
where 21: asMethod(object)
where 22: as(., "Process")
where 23: openeo::toJSON(.)
where 24: rjson::fromJSON(.)
where 25: withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
where 26: suppressWarnings(.)
where 27: .p$save_result(data = .data, format = list_file_formats()$output$JSON) %>%
as("Process") %>% openeo::toJSON() %>% rjson::fromJSON() %>%
suppressWarnings()
where 28: slice.datacube(., n = 10)
where 29: slice(., n = 10)
where 30 at test-slice.R#4: dc %>% filter(.extent = c("2021-01-01", "2021-03-03")) %>% slice(n = 10)
where 31: eval(code, test_env)
where 32: eval(code, test_env)
where 33: withCallingHandlers({
eval(code, test_env)
if (!handled && !is.null(test)) {
skip_empty()
}
}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,
message = handle_message, error = handle_error)
where 34: doTryCatch(return(expr), name, parentenv, handler)
where 35: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 36: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
where 37: doTryCatch(return(expr), name, parentenv, handler)
where 38: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),
names[nh], parentenv, handlers[[nh]])
where 39: tryCatchList(expr, classes, parentenv, handlers)
where 40: tryCatch(withCallingHandlers({
eval(code, test_env)
if (!handled && !is.null(test)) {
skip_empty()
}
}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,
message = handle_message, error = handle_error), error = handle_fatal,
skip = function(e) {
})
where 41: test_code(desc, code, env = parent.frame(), reporter = reporter)
where 42 at test-slice.R#1: test_that("slice creates object of class 'datacube'", {
dc = datacube(id = "SENTINEL_5P_L2")
dc_1 = dc %>% filter(.extent = c("2021-01-01", "2021-03-03")) %>%
slice(n = 10)
dc_2 = dc %>% filter(.extent = c("2021-01-01", "2021-03-03")) %>%
slice(n = -5)
dc_3 = dc %>% filter(.extent = c("2021-01-01", "2021-03-03")) %>%
slice(prop = 0.55)
dc_4 = dc %>% filter(.extent = c("2021-01-01", "2021-03-03")) %>%
slice(prop = -0.3)
expect_equal(all(inherits(dc_1, "datacube"), inherits(dc_1,
"ProcessNode"), inherits(dc_2, "datacube"), inherits(dc_2,
"ProcessNode"), inherits(dc_3, "datacube"), inherits(dc_3,
"ProcessNode"), inherits(dc_4, "datacube"), inherits(dc_4,
"ProcessNode")), TRUE)
})
where 43: eval(code, test_env)
where 44: eval(code, test_env)
where 45: withCallingHandlers({
eval(code, test_env)
if (!handled && !is.null(test)) {
skip_empty()
}
}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,
message = handle_message, error = handle_error)
where 46: doTryCatch(return(expr), name, parentenv, handler)
where 47: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 48: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
where 49: doTryCatch(return(expr), name, parentenv, handler)
where 50: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),
names[nh], parentenv, handlers[[nh]])
where 51: tryCatchList(expr, classes, parentenv, handlers)
where 52: tryCatch(withCallingHandlers({
eval(code, test_env)
if (!handled && !is.null(test)) {
skip_empty()
}
}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning,
message = handle_message, error = handle_error), error = handle_fatal,
skip = function(e) {
})
where 53: test_code(NULL, exprs, env)
where 54: source_file(path, child_env(env), wrap = wrap)
where 55: FUN(X[[i]], ...)
where 56: lapply(test_paths, test_one_file, env = env, wrap = wrap)
where 57: doTryCatch(return(expr), name, parentenv, handler)
where 58: tryCatchOne(expr, names, parentenv, handlers[[1L]])
where 59: tryCatchList(expr, classes, parentenv, handlers)
where 60: tryCatch(code, testthat_abort_reporter = function(cnd) {
cat(conditionMessage(cnd), "\n")
NULL
})
where 61: with_reporter(reporters$multi, lapply(test_paths, test_one_file,
env = env, wrap = wrap))
where 62: test_files(test_dir = test_dir, test_package = test_package,
test_paths = test_paths, load_helpers = load_helpers, reporter = reporter,
env = env, stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,
wrap = wrap, load_package = load_package)
where 63: test_files(test_dir = path, test_paths = test_paths, test_package = package,
reporter = reporter, load_helpers = load_helpers, env = env,
stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning,
wrap = wrap, load_package = load_package, parallel = parallel)
where 64: test_dir("testthat", package = package, reporter = reporter,
..., load_package = "installed")
where 65: test_check("tidyopeneo")

--- value of length: 2 type: logical ---
[1] TRUE TRUE
--- function from context ---
function (param)
{
value = param$getValue()
if (length(value) > 0 && (is.environment(value) || !is.na(value))) {
if ("Graph" %in% class(value)) {
return(value$getVariables())
}
else if ("ProcessGraphParameter" %in% class(value) &&
length(value$getProcess()) == 0) {
return(value)
}
else if (is.list(value)) {
return(lapply(value, function(array_elem) {
if ("ProcessGraphParameter" %in% class(array_elem) &&
length(array_elem$getProcess()) == 0) {
return(array_elem)
}
return(NULL)
}))
}
}
return(NULL)
}
<bytecode: 0x55e111abcbd0>
<environment: 0x55e112297000>
--- function search by body ---
----------- END OF FAILURE REPORT --------------
Fatal error: length > 1 in coercion to logical
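The report points at the culprit. Inside openeo's `variables()` helper, `!is.na(value)` is evaluated on a parameter value of length 2 (`[1] TRUE TRUE`), but `&&` and `||` expect length-one operands. This kind of report appears to come from R's `_R_CHECK_LENGTH_1_LOGIC2_` check. Recent R versions (4.3.0 and later) treat the same code as an error even outside such checks.

Below is a minimal sketch of one possible fix. It assumes the intent is "a value is present unless it is empty or entirely NA". `has_value()` is a hypothetical standalone version of the check, not a patch from the openeo maintainers.

```r
# Original condition from the stack trace (fails for length(value) > 1):
#   length(value) > 0 && (is.environment(value) || !is.na(value))

# Hypothetical fixed version: all() collapses the element-wise NA test
# to a single TRUE/FALSE, so `||` always receives a length-one operand.
has_value <- function(value) {
  length(value) > 0 && (is.environment(value) || !all(is.na(value)))
}

has_value(c("2021-01-01", "2021-03-03"))  # length-2 value, like a temporal extent
#> [1] TRUE
has_value(NA)
#> [1] FALSE
has_value(NULL)
#> [1] FALSE
```

For length-one values this behaves exactly like the original condition. Only vectors and lists with more than one element are affected.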

new functionalities

  • 1. pass .p and .con as function parameters, as in Edzer's commit.
  • 2. group_by + summarise behaviour {different interactions depending on whether the data cube is grouped or not} - add a new subclass for that and therefore a (sub)method.
  • 3. pass what slice does to filter.
  • 4. year %in% (2000:2002) / group_by(year, month) [use list_connection()$extent as reference]
  • 5. check mandatory params -> is.null() stop()
  • 6. "datacube" should be first in the list of inherits(), not last
  • 7. Maths and Ops: Math.datacube = function(x, ...) { p = openeo::processes(); p[.Generic] }
  • 8. Add examples for all functions + add them to the vignette
  • 9. make slice behave more like dplyr's slice
  • 10. remove the default con from all functions... to speed up the code.
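Item 7 proposes S3 group generics. Inside an `Ops` method, `.Generic` holds the operator symbol (for example `">="`), which is not an openEO process name. A lookup like `p[.Generic]` would therefore need a translation table.

The base-R sketch below assumes such a table covering a few arithmetic and comparison operators. `add`, `subtract`, `multiply`, `divide`, `gt`, `gte`, `lt`, `lte`, `eq` and `neq` are openEO process names. The `mock_datacube` class and the returned list are hypothetical stand-ins for a real datacube and process node.

```r
# Hypothetical translation table from R operator symbols to openEO process ids
openeo_ops <- c(
  "+" = "add", "-" = "subtract", "*" = "multiply", "/" = "divide",
  ">" = "gt", ">=" = "gte", "<" = "lt", "<=" = "lte",
  "==" = "eq", "!=" = "neq"
)

# Ops group generic for a mock class: .Generic is the operator that was used
Ops.mock_datacube <- function(e1, e2) {
  if (missing(e2)) stop("unary operators are not covered by this sketch")
  process <- unname(openeo_ops[.Generic])
  if (is.na(process)) stop("no openEO process mapped for ", .Generic)
  # A real method would build the process node here (e.g. from the openeo
  # process collection); this mock only records which process was chosen.
  list(process = process, x = e1, y = e2)
}

mock_cube <- structure(list(id = "SENTINEL_5P_L2"), class = "mock_datacube")
(mock_cube >= 0.5)$process
#> [1] "gte"
(mock_cube * 2)$process
#> [1] "multiply"
```

Since R dispatches `Ops` on either operand, `1 + mock_cube` reaches the same method. Operators missing from the table fail loudly instead of silently.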

NOTEs when running check

❯ checking for future file timestamps ... NOTE
unable to verify current time

❯ checking R code for possible problems ... [12s/10s] NOTE
filter.datacube: no visible binding for global variable ‘.’
Undefined global functions or variables:
.
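The second NOTE is the usual one for package code that uses the magrittr placeholder `.`. A common remedy is to declare `.` once with `utils::globalVariables()`, typically in a small file such as `R/globals.R` (hypothetical name). This is a sketch of that assumption, not necessarily how tidyopeneo handles it. Rewriting the affected pipe in filter.datacube so that it no longer refers to `.` would also silence the NOTE.

```r
# R/globals.R (hypothetical file)
# Tell R CMD check that `.` (the magrittr pipe placeholder) is intentional,
# which silences "no visible binding for global variable '.'".
utils::globalVariables(".")
```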

@edzerpebesma comments

  • create a select for bands
  • improve operation : simplify
  • Do not mask from dplyr
  • change group_by function so it can have either a reducer string (e.g. "mean") or keep as it is
  • change package name
  • three dots
  • group_by process operations
