unitar's Introduction

Unitar

Unitar is a simple R package that wraps the targets package. To use unitar, you will need to already be familiar with targets, because the functionality is just an extension of targets and the naming conventions all follow the targets package approach. Unitar adds new functionality to use targets that span projects, which is outside the scope of the base targets package. With unitar, you can very easily link two targets projects, loading in built targets from other projects so that you can share caches and computing across users and projects.

Please see the vignette in the vignettes folder for installation and usage instructions.

Install

For now, install with:

devtools::install_github("databio/unitar")

unitar's People

Contributors

nsheff

unitar's Issues

A target factory for CSV lists of targets

One of my use cases for unitar is that I have a bunch of general-purpose files that I re-use across lots of R projects. I also use these files outside of R.

For my R projects, I process the files in various ways and then re-use these derived files across many projects. I want to use targets to track the processing and caching. Then, I want to use unitar to think of this as a central repository with targets I re-use across many projects.

To make this simpler for me, I created a new target factory that builds targets from a list in a CSV file. This CSV contains 1 row per target. For example, each row can correspond to one of my resource files, and it tracks that file and specifies a function for loading it into R. I like this because the CSV file helps me keep track of each of my resource files, and it feels convenient to me that this is the way I specify targets. But in addition, a row in the CSV can also correspond to an R function call that would process data in some other way.
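A minimal sketch of the idea, not the factory that ships with unitar: the column names `name`, `path`, and `loader` and the helper `build_command()` are assumptions made for illustration. Each CSV row is turned into an unevaluated call of the form `loader(path)`, which can then be handed to `targets::tar_target_raw()` when targets is installed. Tracking the resource file itself would need an additional `format = "file"` target, which is left out here.

```r
# Hypothetical CSV layout: one row per target.
#   name   = target name
#   path   = resource file to load
#   loader = name of the function that reads the file into R
csv_text <- "name,path,loader
genes,data/genes.csv,read.csv
notes,data/notes.txt,readLines"

resources <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Build the unevaluated command loader(path) for one row.
build_command <- function(loader, path) {
  as.call(list(as.symbol(loader), path))
}

commands <- Map(build_command, resources$loader, resources$path)
names(commands) <- resources$name

# With targets installed, each command becomes one target object.
if (requireNamespace("targets", quietly = TRUE)) {
  target_list <- lapply(names(commands), function(n) {
    targets::tar_target_raw(n, commands[[n]])
  })
}
```

Building the commands as plain language objects first keeps the CSV parsing independent of targets, so the same table can still be used outside of R.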

I created a demo repository to show how this works here: unitar resources demo. For now I put this target factory in the unitar package, but I'm now realizing it might be a more general concept. While my original intent was to use it for a "resources repository" that works with unitar's cross-project concept, in fact, it's really just a way to specify targets using a CSV file instead of traditional R functions.

@wlandau, I'm curious to hear your thoughts on this kind of a CSV-to-target factory.

A target factory for tracking, but not duplicating, external targets

In #3, we talked about 3 use cases for unitar and created a target factory to track external targets in a secondary project.

A common use case (for me at least) is tracking the upstream target as an input without duplicating it. But as @wlandau said,

I think it might be difficult to make a sufficiently flexible target factory for (3).

I've been thinking about it a bit more. I think we could create a target factory that would take as input:

  1. An upstream target name
  2. A function call on that name.

I think it may be possible to use something like:

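# func is assumed to be the name of a reader function, passed as a string.
track_external_target = function(tname, ext_tname, func) {
  fullpath = unitar_path(ext_tname)
  # The file target is named after tname, e.g. "mydata_file".
  name_file = paste0(tname, "_file")
  # Build the call func(<tname>_file) so the data target depends on the file target.
  command_data = substitute(func(fullpath), env=list(fullpath=as.symbol(name_file), func=as.symbol(func)))

  list(
    tar_target_raw(name_file, fullpath, format = "file"),
    tar_target_raw(tname, command_data)
  )
}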

I just wanted to capture this as a separate issue.

Target factory to take output from one pipeline as input into another

As I propose in ropensci/targets#297 (comment), it may be possible to take output from one pipeline (A) as input into another pipeline (B). For local files, this would rely on file tracking. ("url" and "aws_*" storage formats would need different workarounds.) Sketch of the tar_target() calls:

# _targets.R
# ...
list(
  tar_target(file_a, unitar_path(...), format = "file"),
  tar_target(data_a, unitar_read_from_path(file_a))
)

unitar_read_from_path() would need to be defined separately. Given a path like .../project_a/_targets/objects/target_a, it could call withr::with_dir("../project_a", tar_read_raw("target_a")) or something like that.
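One possible shape for such a helper, as a sketch only: `split_store_path()` is a hypothetical name, and the code assumes the default store layout `<project>/_targets/objects/<target>` described above. The path is split with base R, and the actual read is delegated to `targets::tar_read_raw()` inside `withr::with_dir()`, so both packages are needed only when `unitar_read_from_path()` is called.

```r
# Split a store path of the form <project>/_targets/objects/<target>
# into the project directory and the target name.
split_store_path <- function(path) {
  objects_dir <- dirname(path)
  store_dir <- dirname(objects_dir)
  stopifnot(
    basename(objects_dir) == "objects",
    basename(store_dir) == "_targets"
  )
  list(project = dirname(store_dir), target = basename(path))
}

# Read the target from the other project's data store.
# Requires the withr and targets packages at call time.
unitar_read_from_path <- function(path) {
  parts <- split_store_path(path)
  withr::with_dir(parts$project, targets::tar_read_raw(parts$target))
}

parts <- split_store_path("../project_a/_targets/objects/target_a")
```

Here `parts$project` is the directory to switch into and `parts$target` is the name passed to `tar_read_raw()`.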

It should be possible to simplify the above two target calls down to a single target factory:

# _targets.R
# ...
list(
  tar_unitar_read(new_name, "other_pipeline_dir", "target_name_in_other_pipeline")
)

Use tar_read_raw()

unitar_load() calls readRDS() to load data:

return(withr::with_dir(folder, readRDS(paste0("_targets/objects/", tname))))

Maybe use tar_read_raw() instead? That way, if a target has a local storage format other than "rds", unitar_load() will still be able to read it.

Naming conventions: "load" vs "read"

In targets, I use "read" for functions that return values and "load" for functions that assign to an environment and return NULL. In other words, in targets, "read" has a return value and no side effects, and "load" is the reverse. Interested in aligning on this, or do you prefer to stick with the name unitar_load()?
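To make the distinction concrete, here is a small base R illustration. The functions `demo_read()` and `demo_load()` and the in-memory `store` list are hypothetical stand-ins, not part of targets or unitar.

```r
# Stand-in for a data store: a named list of already-built values.
store <- list(target_a = 1:3)

# "read": returns the value and has no side effects.
demo_read <- function(name) {
  store[[name]]
}

# "load": assigns the value into an environment and returns NULL.
demo_load <- function(name, envir = parent.frame()) {
  assign(name, store[[name]], envir = envir)
  invisible(NULL)
}

x <- demo_read("target_a")               # value comes back to the caller
e <- new.env()
result <- demo_load("target_a", envir = e)  # value lands in e, result is NULL
```

Under this convention, a function that returns the loaded object, as unitar_load() does, would fall on the "read" side.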
