checkpoint's Issues

"snapshots" of RRT repository

This feature is apparently in packrat as well as GRAN.

It seems that packrat snapshots only the packages used in the repo, that is, the packrat/ dir in the user's repo. I think GRAN snapshots the whole repo, including the user's files, etc.

This is basically what git does really well, so it would be nice to just use git, but can we do something similar without requiring the user to have git? Ideas:

  • We could store the user's files and their rrt_manifest.yml file in a "versions" folder somewhere in the repo, named with repoid + version number, like 1944a6826bee36f5925f1d13f11c6dc6_v1. Then a user can switch between versions with a single function. I think this can easily be done, but we'd probably want to make the user go through a prompt to confirm that they actually want to overwrite files.

A note on terminology: since we have the notion of "snapshots" in marmoset/mran, if we have the notion of versions of a user's RRT repositories on their machine, we should use a different term, perhaps "version"?
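
The "versions" folder idea above could be sketched roughly as follows. This is only an illustration: save_version() and its arguments are hypothetical names, not existing RRT functions, and it copies top-level files only. Switching back to a stored version (with the overwrite prompt) is not shown.

```r
# Hypothetical helper: copy a repo's top-level files (including the manifest)
# into versions/<repoid>_v<n>. Names are illustrative, not part of RRT.
save_version <- function(repo, repoid) {
  versions_dir <- file.path(repo, "versions")
  dir.create(versions_dir, showWarnings = FALSE)

  # Next version number = number of stored versions for this repoid + 1
  existing <- list.files(versions_dir,
                         pattern = paste0("^", repoid, "_v[0-9]+$"))
  target <- file.path(versions_dir,
                      paste0(repoid, "_v", length(existing) + 1))
  dir.create(target)

  # Copy regular top-level files only; rrt/, versions/ and other dirs are skipped
  files <- list.files(repo, full.names = TRUE)
  files <- files[!dir.exists(files)]
  file.copy(files, target)
  invisible(target)
}
```

Calling save_version(getwd(), "1944a6826bee36f5925f1d13f11c6dc6") twice would create ..._v1 and then ..._v2.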

When users install using install.packages()

When users do this after starting R in their RRT repo, the package installs in the library in the repo (this is good), but the source is only downloaded to a temporary dir and deleted at the end of the session. Perhaps, whenever rrt_install or rrt_refresh is run, we should look for installed packages that lack source files and download the sources of the appropriate version.
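
A minimal sketch of the detection step, assuming sources are stored as <package>_<version>.tar.gz files. missing_sources() is a hypothetical helper, and the directory used in the commented usage is an assumption about the repo layout.

```r
# Hypothetical helper: given installed package names/versions and the source
# tarballs already present, return the tarball names that are still missing.
missing_sources <- function(pkgs, versions, tarballs) {
  expected <- paste0(pkgs, "_", versions, ".tar.gz")
  setdiff(expected, basename(tarballs))
}

# Usage sketch against a real library (the src/contrib location is assumed):
# ip <- installed.packages(lib.loc = lib)
# missing_sources(ip[, "Package"], ip[, "Version"],
#                 list.files(file.path(lib, "src", "contrib")))
```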

Unexpected folders in a repository cause rrt_install() to fail

If you try to run rrt_install on a repository that includes any folders OTHER THAN rrt or .git, it will fail.

> rrt_install()
Checking to make sure repository exists...
Checing to make sure rrt directory exists inside your repository...
Error in rrt_install() : rrt directory doesn't exist

The reason in this case is that my repository includes the .Rproj.user folder created by RStudio:

> repo=getwd()
> file.path(repo, "rrt", "lib", R.version$platform,
+ base::getRversion())
[1] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1.0"
> present <- list.dirs(repo)[-1]
> present
 [1] "/Users/david/RRT-example2/.Rproj.user"                                          
 [2] "/Users/david/RRT-example2/.Rproj.user/9CD34B73"                                 
 [3] "/Users/david/RRT-example2/.Rproj.user/9CD34B73/ctx"                             
 [4] "/Users/david/RRT-example2/.Rproj.user/9CD34B73/presentation"                    
 [5] "/Users/david/RRT-example2/.Rproj.user/9CD34B73/sdb"                             
 [6] "/Users/david/RRT-example2/.Rproj.user/9CD34B73/sdb/prop"                        
 [7] "/Users/david/RRT-example2/.Rproj.user/9CD34B73/sdb/s-3B2F7D3D"                  
 [8] "/Users/david/RRT-example2/rrt"                                                  
 [9] "/Users/david/RRT-example2/rrt/lib"                                              
[10] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0"                    
[11] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1"                
[12] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1.0"              
[13] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1.0/src"          
[14] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1.0/src/contrib"  
[15] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/manipulate"     
[16] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/manipulate/help"
[17] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/manipulate/html"
[18] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/manipulate/Meta"
[19] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/manipulate/R"   
[20] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/rstudio"        
[21] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/rstudio/help"   
[22] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/rstudio/html"   
[23] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/rstudio/Meta"   
[24] "/Users/david/RRT-example2/rrt/lib/x86_64-apple-darwin13.1.0/3.1/rstudio/R"      

Which causes the following test in rrt_install to fail:

    if (!all(grepl("rrt", present))) {
        stop("rrt directory doesn't exist")
    }
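
One possible fix, sketched under the assumption that the check only needs to confirm that an rrt directory exists at the top of the repository: test for that directory directly instead of requiring every listed directory to match "rrt". has_rrt_dir() is a hypothetical name.

```r
# Sketch: check for <repo>/rrt directly, so unrelated folders such as
# .Rproj.user no longer cause a failure. `has_rrt_dir` is a hypothetical name.
has_rrt_dir <- function(repo) {
  # file.info()$isdir is NA when the path does not exist, hence isTRUE()
  isTRUE(file.info(file.path(repo, "rrt"))$isdir)
}

# if (!has_rrt_dir(repo)) stop("rrt directory doesn't exist")
```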

Setup workflow for others to test

  1. rrt_init()
  2. Create some .R files, or .Rmd, etc. with some code, including some libraries
  3. Then try rrt_refresh() - to look for new packages that will be downloaded
  4. Try rrt_install() - to install packages in the repository
  5. ...

Write MRAN snapshot ID in new field to manifest file

from discussion in #2

We need a field for the snapshot ID too. I can add this to the programmatic manifest file changes: when a user decides to use MRAN, we can write a field called MRAN_ID or something like that.
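
A rough sketch of writing such a field, assuming the manifest is a flat file of "Key: value" lines. set_manifest_field() is a hypothetical helper, and the ID value in the usage line is a made-up placeholder.

```r
# Hypothetical helper: add or update a "Key: value" line in a flat manifest.
set_manifest_field <- function(path, key, value) {
  lines <- if (file.exists(path)) readLines(path) else character(0)
  entry <- paste0(key, ": ", value)
  hit <- startsWith(lines, paste0(key, ":"))
  if (any(hit)) {
    lines[hit] <- entry          # field already present: overwrite it
  } else {
    lines <- c(lines, entry)     # otherwise append it
  }
  writeLines(lines, path)
  invisible(lines)
}

# set_manifest_field("rrt_manifest.yml", "MRAN_ID", "placeholder-id")
```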

Testing system

I'm familiar with Travis-CI, but is there a preference for Jenkins or something else? @cmosetick

There is quite good support for R packages now on Travis-CI (if we go the Travis route), so perhaps structuring this repo as a package, even if not essential, would give us an easy way to get CI.

Add created date to manifest file

Right now we have DateCreated, but it is actually overwritten with the latest modification date. We should have both DateCreated, written once, and DateModified or DateUpdated, refreshed on every change.

Also, signify time zone in dates.
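
As a sketch of both points, assuming the manifest is held in memory as a named list: DateCreated is set only when absent, DateUpdated is refreshed on every call, and timestamps are written in UTC with an explicit zone marker. stamp() and touch_dates() are hypothetical names.

```r
# Format a timestamp in UTC with an explicit "Z" zone designator
stamp <- function(time = Sys.time()) {
  format(time, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Hypothetical helper: set DateCreated once, refresh DateUpdated every time
touch_dates <- function(manifest, now = Sys.time()) {
  if (is.null(manifest$DateCreated)) manifest$DateCreated <- stamp(now)
  manifest$DateUpdated <- stamp(now)
  manifest
}
```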

Debug paths on Windows

Currently testing on Windows. There seem to be a few bugs in how paths are found and used; looking into them now.

Proposed repository layout

Thoughts on this?

Proposed layout of a RRT repository

myrepo - |
         |- manifest.txt # user specified list of pkgs
         |- code.R # any number of user supplied files
         |- rrt # this dir created by rrt_init()
            |- lib # holds packages in pkgs/
                |- pkgs - |
                          |-doMC
                          |-plyr
            |- manifest.lock # RRT generated metadata file

server testing

Things to consider doing on the server

These tests should be more robust than those on CRAN obviously.

  • R CMD check
  • incorporate checks already done on CRAN?
  • force run examples
  • force run tests
  • force run vignettes
  • check downstream pkgs install and check fine
  • check reverse dependencies
  • as a separate, longer-term thing, we could test package x with all versions of its dependencies, not just current versions, so if a user wants to use ver 1.2 of pkg y, we know whether or not it's compatible with pkg x

Test that pkgs are compatible with one another

We talked about this on the call... While we can get dependencies, we don't know if the packages actually work together. There is some filtering for this given the CRAN package submission process (but it's not strictly enforced).

Perhaps: A single function (and helpers) to test that packages are compatible with one another in the project, either

  • locally, or
  • on a server?

The problem with the local testing is that with a large number of packages this could take a long time, but will explore ways to get the time down.

rrt_refresh generates warnings about packageDescription

Running R from an initialized repository where the only file (myscript.R) has the following content:

## Example script using packages
require(ggplot2)
require(plyr)
require(data.table)
require(knitr)
require(lattice)

print(sessionInfo())

I run rrt_refresh:

> rrt_refresh()
Warning messages:
1: In packageDescription(y, encoding = NA) : no package 'hexbin' was found
2: In packageDescription(y, encoding = NA) : no package 'Hmisc' was found
3: In packageDescription(y, encoding = NA) : no package 'mapproj' was found
4: In packageDescription(y, encoding = NA) : no package 'maps' was found
5: In packageDescription(y, encoding = NA) : no package 'maptools' was found
6: In packageDescription(y, encoding = NA) : no package 'multcomp' was found
7: In packageDescription(y, encoding = NA) : no package 'quantreg' was found
8: In packageDescription(y, encoding = NA) : no package 'abind' was found
9: In packageDescription(y, encoding = NA) : no package 'doMC' was found
10: In packageDescription(y, encoding = NA) : no package 'foreach' was found
11: In packageDescription(y, encoding = NA) : no package 'iterators' was found
12: In packageDescription(y, encoding = NA) : no package 'itertools' was found
13: In packageDescription(y, encoding = NA) : no package 'bit64' was found
14: In packageDescription(y, encoding = NA) : no package 'chron' was found
15: In packageDescription(y, encoding = NA) : no package 'data.table' was found
16: In packageDescription(y, encoding = NA) : no package 'fastmatch' was found
17: In packageDescription(y, encoding = NA) : no package 'reshape' was found
18: In packageDescription(y, encoding = NA) : no package 'xts' was found
19: In packageDescription(y, encoding = NA) : no package 'rgl' was found
20: In packageDescription(y, encoding = NA) : no package 'testit' was found
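
The warnings appear to come from calling packageDescription() on packages that are not installed. One way to avoid them, sketched here with a hypothetical wrapper, is to check for the package first and return NULL when it is absent:

```r
# Hypothetical wrapper: return NULL instead of warning when a package is
# not installed. system.file() returns "" for packages it cannot find.
safe_description <- function(pkg, lib.loc = NULL) {
  if (!nzchar(system.file(package = pkg, lib.loc = lib.loc))) {
    return(NULL)
  }
  utils::packageDescription(pkg, lib.loc = lib.loc, encoding = NA)
}
```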

Get dependencies

A number of options here to start off with:

I'll have a peek at each of them and figure out what else is needed. For now it is probably better not to have too many dependencies, so I think new code is in order.

Manifest

  • What to use for a manifest? Options include .R, .md, .txt, .json, .yml
  • What to use for the markup? This is linked to the item above of course since we wouldn't do e.g., yaml in a .R file. Could have no markup and simply a list of packages like
plyr
ggplot2
...

or could use yaml markup as in a Makefile. I like the idea of using yaml as it could be extended to have other information besides package names, e.g. , urls for the project if needed, project owner, a license, etc.
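
For the no-markup option, reading the manifest takes only a few lines of base R. This is a sketch: read_pkg_list() is a hypothetical name, and treating lines that start with # as comments is an added assumption.

```r
# Hypothetical reader for a plain-list manifest: one package name per line.
# Blank lines and lines starting with "#" are ignored (an assumption).
read_pkg_list <- function(path) {
  lines <- trimws(readLines(path))
  lines[nzchar(lines) & !startsWith(lines, "#")]
}
```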

Example usecases

I was trying to think of some example workflows - do these make sense? Any others?

Proposed Workflows

No project exists yet

Install and load the pkg

devtools::install_github("RevolutionAnalytics/RRT")
library("RRT")

Initialize a repo. Defaults to creating in the current working directory.

rrt_init()

Write some code...

Then refresh the repo by running rrt_refresh() which will look through the repository again and install any new packages needed, update internal manifest, etc.

rrt_refresh()

Project (folder) already exists

The same process as above, except that since the repo already exists, running rrt_init() creates an internal rrt directory with associated files inside the directory given by the user, and collects packages and deps from the files already in the directory.

Add function to clean out installed packages

When a user wants to install all new packages, this function can clean out the old packages for them and print a success message.

Most likely to be used before installing from snapshot.
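
A minimal sketch of such a function, assuming an installed package can be recognised by a DESCRIPTION file in its top-level folder. clean_library() is a hypothetical name; other folders in the library (for example one holding sources) are left alone.

```r
# Hypothetical helper: remove every installed package from a library dir.
clean_library <- function(lib) {
  dirs <- list.dirs(lib, full.names = TRUE, recursive = FALSE)
  # Only folders that look like installed packages are removed
  pkgs <- dirs[file.exists(file.path(dirs, "DESCRIPTION"))]
  unlink(pkgs, recursive = TRUE)
  message("Removed ", length(pkgs), " package(s) from ", lib)
  invisible(basename(pkgs))
}
```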

Start R with package list from a RRT repository

The user specifies the repository via one of: folder path, identifier (UUID?), repo name, URL, etc. All of these should be parsed from a single parameter to keep things simple for the user.

Perhaps look up the path to the list of repos in the .Renviron or .Rprofile file.

Repository tracking

Where should we keep track of the various repositories on a user's machine?

Perhaps write a record to .Renviron like RRT_REPO_LIST="/path/to/list.txt", where list.txt holds the list of paths to repos and their names.
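
Reading that record back could look roughly like this. The tab-separated name/path layout of list.txt is an assumption made for the sketch, and list_repos() is a hypothetical name.

```r
# Hypothetical reader: RRT_REPO_LIST points to a tab-separated file with
# one repository per line: <name><TAB><path>. The file layout is assumed.
list_repos <- function() {
  path <- Sys.getenv("RRT_REPO_LIST")
  if (!nzchar(path) || !file.exists(path)) {
    return(data.frame(name = character(0), path = character(0),
                      stringsAsFactors = FALSE))
  }
  read.table(path, sep = "\t", quote = "", comment.char = "",
             col.names = c("name", "path"), colClasses = "character",
             stringsAsFactors = FALSE)
}
```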

rrt_init in a new folder fails

I started RStudio in a new project directory, and then tried rrt_init() to set it up as an RRT repository. Then I realised interactive=TRUE is the default. This is what I got:

rrt_init(interactive=FALSE)
Checking to see if repository exists already...
Checing to make sure rrt directory exists inside your repository...
Creating rrt directory /Users/david/R/RRT-example/rrt/lib/x86_64-apple-darwin13.1.0/3.1
Looking for packages used in your repository...
Error in if (repos["CRAN"] == "@cran@") { :
missing value where TRUE/FALSE needed
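
The error suggests that repos["CRAN"] is NA in this session, for instance because the repos option has no element named CRAN, so the comparison inside if() yields NA. A guarded lookup might look like the sketch below; resolve_cran() and its default mirror URL are assumptions for illustration, not what rrt_init() does.

```r
# Hypothetical helper: pick a CRAN URL, falling back to a default when the
# "repos" option is unset, has no CRAN entry, or still holds "@cran@".
resolve_cran <- function(repos = getOption("repos"),
                         default = "https://cloud.r-project.org") {
  cran <- repos["CRAN"]   # NA when there is no element named "CRAN"
  if (is.null(repos) || is.na(cran) || cran == "@cran@") {
    default
  } else {
    unname(cran)
  }
}
```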

Command line interface

What's the ideal tool to use if we have a command line command like RRT --optional flags...

I've written command line tools in Ruby and Python, but not in R. If we're trying to stick with R, maybe there is a way to do it in R. Using Ruby/Python would, I guess, introduce new deps.
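
Staying in R is possible with base tools: a script run through Rscript can read its arguments with commandArgs(). The sketch below uses a hypothetical parse_args() helper, and the "command plus --flags" convention is an assumption.

```r
# Hypothetical argument parser for a script invoked via Rscript.
# The first non-flag argument is the command; "--x" arguments are flags.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  is_flag <- startsWith(args, "--")
  list(
    command = if (any(!is_flag)) args[!is_flag][1] else NA_character_,
    flags   = sub("^--", "", args[is_flag])
  )
}
```

With this saved in a script (say rrt.R, a hypothetical file name), running Rscript rrt.R init --interactive would yield the command "init" and the flag "interactive".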

create RRT demo screencast

Create a short screencast demo of RRT 0.1 functionality. Maximum length of ~10 minutes, can be shorter. Due date: 6-24 by end of day.

Create metadata catalog for server

I'll start working on this locally, then we can move to test on server

  • Do all in R first - see how painful
  • tools::write_PACKAGES to create metadata for each package
  • convert to JSON? or not? if JSON (or other structured text), will be easier to add to, and parse downstream
  • add additional metadata (with a schema predefined)
    • snapshot metadata
    • snapshot diff metadata
    • include compatibility (down the road): define the key-value pair now, but leave the value as NULL

Allow optional interactive version of rrt_init()

I was just using bower init on the CLI this morning; it has an interactive process that asks for input to set up a bower.json file.

It seems this could easily be done for RRT, asking for input on where to set up the new repository, what packages to install, authors, license, online git/svn repo, etc.

bower init suggests defaults for each question, which we could do as well, I think.

Share RRT repository

This is labeled with the next milestone (v0.2), but seems like an important feature.

Possible options include wrapping up into a compressed zip or tarball, or sharing to the web via github, etc.
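
The tarball option can be sketched with base R alone. bundle_repo() is a hypothetical name. Note that the bundle would include rrt/lib, whose compiled packages are platform-specific, so a real implementation might exclude it.

```r
# Hypothetical helper: pack everything under `repo` into a gzipped tarball.
# `tarfile` should live outside `repo`, otherwise it would be archived too.
bundle_repo <- function(repo, tarfile) {
  tarfile <- file.path(normalizePath(dirname(tarfile)), basename(tarfile))
  old <- setwd(repo)
  on.exit(setwd(old))
  # With files = NULL (the default), tar() archives the current directory
  utils::tar(tarfile, compression = "gzip", tar = "internal")
  invisible(tarfile)
}
```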
