posterior's People

Contributors

ahartikainen, alyst, avehtari, ben18785, helske, jgabry, jsocolar, karldw, mjskay, n-kall, paul-buerkner, rok-cesnovar, tillea

posterior's Issues

support data frame and tibble?

Really just tibble but let users get regular data frame if they want, e.g. as_draws_df() just wraps as_draws_tibble(), etc.

Compound subsetting?

The "subset" operation is currently a fairly generic "slice" operation, right? In which case, I wonder if it makes sense to allow combined subsetting / slicing operations, like:

subset(posterior, chain = 1, iteration = 1:100, variable = c("x", "y"))

This might be a more straightforward API than a separate function for slicing along each dimension. Although it would have to enforce no slicing along draws and chains/iterations simultaneously (or whatever the solution to #6 is).

The current syntax may lead to constructions like this:

posterior %>%
  subset_chain(1) %>%
  subset_iteration(1:100) %>%
  subset_variable(c("x", "y"))

Which I think is also nice, but may be more verbose than the compound version without additional clarity. subset() is also already a base-R generic intended for this kind of slicing, I think.
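As a rough illustration of the compound form, here is a minimal base-R sketch that slices a plain iterations x chains x variables array in one call. The helper name subset_compound and the array layout are assumptions made for this example; it is not the package's implementation.

```r
# Hypothetical helper: slice an iterations x chains x variables array in one call.
subset_compound <- function(x, iteration = NULL, chain = NULL, variable = NULL) {
  stopifnot(is.array(x), length(dim(x)) == 3)
  # A NULL argument means "keep everything along that dimension"
  if (is.null(iteration)) iteration <- seq_len(dim(x)[1])
  if (is.null(chain)) chain <- seq_len(dim(x)[2])
  if (is.null(variable)) variable <- seq_len(dim(x)[3])
  # drop = FALSE keeps all three dimensions even when one has length 1
  x[iteration, chain, variable, drop = FALSE]
}

set.seed(1)
draws <- array(
  rnorm(100 * 4 * 3),
  dim = c(100, 4, 3),
  dimnames = list(NULL, NULL, c("x", "y", "z"))
)

sub <- subset_compound(draws, chain = 1, iteration = 1:10, variable = c("x", "y"))
dim(sub)
```

The single call replaces three chained calls, at the cost of having to validate all arguments in one place.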

How to handle discrete parameters?

I'm not sure whether you have discussed this before. I guess it is important to handle, since many non-Stan models may have discrete parameters.

Which representations of posterior draws are included?

Possibilities:

  • matrix (chains merged)
  • array (with chain dimension, maybe use new rray types?)
  • list (chains merged, in the style of rstan::extract() when it returns a list)
  • list of lists style 1 (separate chains, i.e., same as list but with separate sub-lists for each chain)
  • maybe a list of lists style 2 (nested lists corresponding to grouping structure of params, e.g. x$group_var$slope$level)
  • tidy data frames (maybe just tibbles?, coordinate with tidybayes. does @mjskay want to coauthor this package?)

Questions:

  • Which of the above types do we include?
  • Are there other types not listed to include?
  • Will we need these types to have additional classes (e.g. 'posterior_matrix') or can/should we try to avoid that?

Convert mcmc and mcmc.list objects to posterior formats

The mcmc and mcmc.list objects of the coda package have been used in a lot of places. It would be beneficial to transform them into formats the posterior package supports natively. However, for reasons detailed in #4, we don't want to fully support those formats, that is, don't want to write all the methods and transformation functions for them.

confused by different summary output for list and array formats

@paul-buerkner Am I missing something or is what I'm seeing below weird? I labelled this as "bug" but maybe it's not and I'm just overlooking something simple.

Presumably summarise_draws() should return the identical diagnostic results for all the formats that keep the chain information and different results for the matrix format (for any diagnostics that use chain info). This is what happens except for the list format, which gives very different results:

x <- draws_eight_schools
x_arr <- as_draws_array(x)
x_list <- as_draws_list(x)
x_df <- as_draws_df(x)
x_mat <- as_draws_matrix(x)

Here's for matrix (different than others but makes sense since it thinks it's all one chain):

summarise_draws(x_mat, c("rhat", "ess_bulk", "ess_tail"))
# A tibble: 10 x 4
   variable  rhat ess_bulk ess_tail
   <chr>    <dbl>    <dbl>    <dbl>
 1 mu       1.00      801.     269.
 2 tau      1.00      337.     306.
 3 theta[1] 1.00      512.     263.
 4 theta[2] 1.03      669.     285.
 5 theta[3] 0.999     528.     226.
 6 theta[4] 0.999     615.     329.
 7 theta[5] 1.00      570.     309.
 8 theta[6] 1.000     598.     294.
 9 theta[7] 1.00      538.     323.
10 theta[8] 1.01      587.     309.

Here's for array (different than matrix, which makes sense):

summarise_draws(x_arr, c("rhat", "ess_bulk", "ess_tail"))
# A tibble: 10 x 4
   variable  rhat ess_bulk ess_tail
   <chr>    <dbl>    <dbl>    <dbl>
 1 mu       1.00      878.     300.
 2 tau      0.998     387.     311.
 3 theta[1] 1.000     551.     272.
 4 theta[2] 1.04      765.     344.
 5 theta[3] 1.02      553.     246.
 6 theta[4] 0.998     655.     370.
 7 theta[5] 1.000     608.     326.
 8 theta[6] 0.998     643.     305.
 9 theta[7] 0.995     622.     345.
10 theta[8] 1.01      618.     332.

Here's for df (same as array, which makes sense):

summarise_draws(x_df, c("rhat", "ess_bulk", "ess_tail"))
# A tibble: 10 x 4
   variable  rhat ess_bulk ess_tail
   <chr>    <dbl>    <dbl>    <dbl>
 1 mu       1.00      878.     300.
 2 tau      0.998     387.     311.
 3 theta[1] 1.000     551.     272.
 4 theta[2] 1.04      765.     344.
 5 theta[3] 1.02      553.     246.
 6 theta[4] 0.998     655.     370.
 7 theta[5] 1.000     608.     326.
 8 theta[6] 0.998     643.     305.
 9 theta[7] 0.995     622.     345.
10 theta[8] 1.01      618.     332.

But here's for list (very different from the others):

summarise_draws(x_list, c("rhat", "ess_bulk", "ess_tail"))
# A tibble: 10 x 4
   variable  rhat ess_bulk ess_tail
   <chr>    <dbl>    <dbl>    <dbl>
 1 mu       1.00     3204.    1133.
 2 tau      1.00     1383.    1226.
 3 theta[1] 1.00     2049.    1069.
 4 theta[2] 1.01     2690.    1289.
 5 theta[3] 0.998    2119.    1048.
 6 theta[4] 0.998    2466.    1449.
 7 theta[5] 1.00     2281.    1246.
 8 theta[6] 0.999    2408.    1180.
 9 theta[7] 1.00     2195.    1335.
10 theta[8] 1.00     2348.    1246.

Should we rename `draws_df` to `draws_tibble` (or `draws_tbl`)?

Since we're using tibbles I guess the names draws_tibble, draws_tbl, or draws_tbl_df would be more consistent with the naming convention of draws_{class}. What do you think? Personally I like the name draws_df better (it just looks nicer I think) but perhaps we should change it?

Are these functions being used anywhere?

@paul-buerkner It seems that internal functions like .as_draws_list, .as_draws_array, .as_draws_df, etc. (i.e., the ones with names starting with .) are not called by any other functions. At least, if I search for those function names in the entire package source code, the only hits I get are their own implementations. Are these just older versions that can be safely removed, or are they actually used somewhere that I'm missing?

autocorrelation values, autocorrelation time, Geyer's truncation lag

For diagnostics, it would be good to have:

  • autocorrelation values up to some lag (to be checked as numbers or as a plot)
  • autocorrelation time (this is related to ESS, but can be useful separately to choose thin value)
  • Geyer's truncation lag (this is a safer option for thin value)
  • lag which has 95% (or user defined) cumulative autocorrelation time (produces smaller but almost as good thin value as Geyer's rule)

It would also be good to have a thin function built on the subsetting methods.
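To make the requested quantities concrete, here is a minimal base-R sketch on a simulated single chain. It uses stats::acf for the autocorrelation values and one common reading of Geyer's initial positive sequence rule for the truncation lag. The helper thin_by and the choice of thinning by the rounded-up autocorrelation time are assumptions for illustration, not a recommendation from the issue.

```r
set.seed(1)
# Simulated AR(1) chain standing in for MCMC draws of one variable
x <- as.numeric(arima.sim(list(ar = 0.5), n = 2000))

# Autocorrelation values for lags 0..51 (rho[1] is lag 0)
rho <- as.numeric(acf(x, lag.max = 51, plot = FALSE)$acf)

# Geyer's initial positive sequence: sum adjacent pairs (lag 0+1, lag 2+3, ...)
# and stop at the first pair whose sum is not positive
pair_sums <- rho[seq(1, 51, by = 2)] + rho[seq(2, 52, by = 2)]
first_bad <- which(pair_sums <= 0)[1]
n_keep <- if (is.na(first_bad)) length(pair_sums) else first_bad - 1

# Integrated autocorrelation time and the last lag included in the sum
act <- -1 + 2 * sum(pair_sums[seq_len(n_keep)])
truncation_lag <- 2 * n_keep - 1

# Hypothetical thinning helper built on plain subsetting
thin_by <- function(x, thin) x[seq(1, length(x), by = thin)]
x_thin <- thin_by(x, max(1, ceiling(act)))
```

For an AR(1) process with coefficient 0.5 the theoretical autocorrelation time is 3, so the estimate should land in that neighbourhood.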

Representation as json

I'm impressed by the progress. Since I'm finalizing the posteriordb beta, I realized that it would be good to also formalize how a posterior is represented as JSON. This would also enable cross-language representations.

Have you thought about this?

I currently represent draws as a named object of numeric arrays. Think of a named list of vectors in R, with one vector per parameter.
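A minimal sketch of that representation, using base R only: a named list of numeric vectors written out as a JSON object with one array per parameter. The helper draws_to_json is hypothetical and deliberately naive (it does not handle NA, infinite values, or names containing quotes); a real implementation would more likely rely on a JSON library.

```r
# Hypothetical helper: serialise a named list of numeric vectors to a JSON object
draws_to_json <- function(draws) {
  stopifnot(is.list(draws), !is.null(names(draws)))
  fields <- vapply(names(draws), function(nm) {
    # One "name": [v1, v2, ...] entry per parameter
    paste0('"', nm, '": [', paste(draws[[nm]], collapse = ", "), "]")
  }, character(1))
  paste0("{", paste(fields, collapse = ", "), "}")
}

draws <- list(mu = c(1, 2), tau = c(0.5, 1.5))
json <- draws_to_json(draws)
cat(json, "\n")
```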

Example dataset with a matrix and >1 chains

Following on #18, it might be helpful to have an example dataset with a 2-or-more dimensional variable (maybe a covariance matrix) and 2 or more chains for the purposes of examples and testing. Anyone have a good (and small) example like that on hand?

summarize_draws: column name incorrect if only one value supplied to `probs`

This results in the column name .quantile instead of q25:

summarize_draws(example_draws(), probs = 0.25)
# A tibble: 10 x 9
   variable  mean median    sd   mad .quantile  rhat ess_bulk ess_tail
   <chr>    <dbl>  <dbl> <dbl> <dbl>     <dbl> <dbl>    <dbl>    <dbl>
 1 mu        4.56   4.49  3.36  3.45      2.26 1.00      881.     300.
 2 tau       3.85   2.90  3.32  2.65      1.46 0.998     386.     311.
 3 theta[1]  6.57   5.47  6.45  4.92      2.65 1.000     552.     272.
 4 theta[2]  4.74   4.53  4.63  4.14      1.85 1.04      767.     344.
 5 theta[3]  4.22   4.52  5.03  4.63      1.20 1.02      554.     246.
 6 theta[4]  4.79   4.95  4.45  4.65      1.74 0.998     656.     370.
 7 theta[5]  3.75   3.85  4.89  4.25      1.10 1.000     609.     326.
 8 theta[6]  4.28   4.36  4.88  4.65      1.39 0.998     644.     305.
 9 theta[7]  6.53   6.18  5.38  4.52      3.27 0.995     624.     345.
10 theta[8]  5.00   4.52  5.21  4.55      1.81 1.01      618.     332.
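One way to get consistent column names regardless of how many probabilities are supplied is to build the names from probs directly instead of relying on the names the quantile computation returns. The sketch below uses base R only; named_quantiles is a hypothetical helper, not the package's internal function.

```r
# Hypothetical helper: compute quantiles and always name them q<percent>
named_quantiles <- function(x, probs) {
  out <- quantile(x, probs = probs, names = FALSE)
  # Names are derived from probs, so a single value is handled like any other
  names(out) <- paste0("q", 100 * probs)
  out
}

one <- named_quantiles(1:9, 0.25)
two <- named_quantiles(1:9, c(0.05, 0.95))
names(one)
names(two)
```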

summarise_draws errors when called from another package

I noticed this when trying to incorporate posterior into cmdstanr, but here's an example that doesn't require cmdstanr:

# simulate calling posterior from another package
detach(package:posterior) # or just start a new session and don't load the package
f <- function(...) {
  draws <- posterior::example_draws()
  posterior::summarise_draws(draws, ...)
}
f()
Error: value for ‘rhat’ not found

@paul-buerkner looking at the code here

https://github.com/jgabry/posterior/blob/c9ec5096474a083664040099e3827360d03c9390/R/summarise_draws.R#L79-L84

maybe this is because the functions like rhat aren't being looked for in the posterior package namespace?
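The suspected mechanism can be reproduced in base R without any packages: a function that exists only inside an environment that is not on the search path cannot be found by name from the global environment, but it can be found when the lookup starts in that environment. Here fake_ns is a stand-in for a package namespace and my_rhat is a made-up function name; both are assumptions for illustration.

```r
# Stand-in for a package namespace that is loaded but not attached
fake_ns <- new.env(parent = baseenv())
assign("my_rhat", function(x) 1, envir = fake_ns)

# Look a function up by name, returning NULL instead of erroring when missing
lookup <- function(name, env) {
  tryCatch(
    get(name, envir = env, mode = "function"),
    error = function(e) NULL
  )
}

found_global <- lookup("my_rhat", globalenv())  # search starts at the global env
found_ns <- lookup("my_rhat", fake_ns)          # search starts in the "namespace"

is.null(found_global)
is.function(found_ns)
```

If the summary code resolves measure names starting from the caller's environment, falling back to the package's own namespace would make the names resolvable even when the package is not attached.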

Which diagnostics are included?

At a minimum I’d say

  • Effective sample sizes (new robust version)
  • R-hat (new robust version)
  • Monte Carlo standard errors

Questions:

  • Which other general MCMC diagnostics are worth including?
  • Do we include algorithm-specific diagnostics? For example for hmc and variants.
  • For things like ESS and R-hat that have new versions, do we include the old versions for comparison?

Round summaries according to Monte Carlo standard error

@avehtari and I discussed the option today to add an option to round summaries in summarise_draws according to the corresponding MCSE, to reflect the actual precision of the estimates. This should not be activated by default and we need to think of a good argument name for it.
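As a starting point for discussion, one possible convention is to keep digits down to the order of magnitude of the MCSE. The sketch below is base R only; the helper name round_to_mcse and this particular rounding rule are assumptions, not something decided in the issue.

```r
# Hypothetical helper: round each estimate to the decimal place
# given by the order of magnitude of its MCSE
round_to_mcse <- function(estimate, mcse) {
  stopifnot(length(estimate) == length(mcse), all(mcse > 0))
  digits <- -floor(log10(mcse))
  mapply(function(e, d) round(e, d), estimate, digits)
}

# MCSE 0.12 keeps one decimal; MCSE 3 keeps none
rounded <- round_to_mcse(c(4.5632, 123.4), c(0.12, 3))
rounded
```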

Choose license

I don't have a strong opinion about it, but it may be sensible not to use GPL3, as this causes a lot of trouble in industry (not necessarily for good reasons, but still). GPL2 seems fine, as do more permissive licenses such as those used by most tidyverse packages.

Should duplicate variable names be supported?

I am currently implementing rename_variables(). To simplify things I am also implementing variables<-() (in the same way that names<-() and whatnot work). In the process of writing tests for these functions I've noticed that duplicate variable names are not handled consistently across types:

> m = matrix(11:20, ncol = 2, dimnames = list(NULL, c("a", "a")))
> variables(as_draws_matrix(m))
[1] "a" "a"
> variables(as_draws_array(m))
[1] "a" "a"
> variables(as_draws_list(m))
[1] "a"  "V2"
> variables(as_draws_df(m))
[1] "a"  "V2"

Since in theory all of these formats could support duplicate names, I think the only question is if we want to support duplicate names. Natural choices seem to be:

  1. Support duplicate names (and adjust as_draws_list / as_draws_df accordingly)
  2. Make duplicate names throw an error
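For option 2, a validation step could look roughly like the base-R sketch below. The helper check_unique_variables is hypothetical; it only illustrates the check, not where in the conversion functions it would be called.

```r
# Hypothetical validator: error on duplicate variable names
check_unique_variables <- function(variables) {
  dup <- unique(variables[duplicated(variables)])
  if (length(dup) > 0) {
    stop("Duplicate variable names are not allowed: ",
         paste(dup, collapse = ", "))
  }
  invisible(variables)
}

m <- matrix(11:20, ncol = 2, dimnames = list(NULL, c("a", "a")))

# Capture the error message instead of stopping the script
res <- tryCatch(
  check_unique_variables(colnames(m)),
  error = function(e) conditionMessage(e)
)
res
```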

version numbers for alpha and beta releases?

Now that the repo is public should we start using version numbers and tagging releases? For example, we could do something like 0.0.1 for the alpha release (incrementing the 1 if we make important changes before beta), 0.1.0 for the beta release (same comment about incrementing), and 1.0.0 for the first CRAN release. Thoughts?

Rethinking the package name for a second

I love the name posterior and am strongly in favor of it, but I still want to bring up this issue. In the process of the initial discussions, we basically switched our naming conventions away from posterior_ to draws_, as the draws we deal with may not necessarily come from a posterior distribution. Accordingly, a somewhat more accurate package name would be draws, but it is much less catchy and more difficult to remember.

As I said above, I would like to keep the name posterior but make sure we have thought about it at least for a second before openly releasing it.

I would appreciate your thoughts on this! :-)

Vignette(s)

We don’t necessarily need it for the beta release, but I'm starting an issue to decide what content we should include in a vignette or several. I could imagine one long vignette, or perhaps several smaller vignettes focused on separate things (e.g., one about the formats, one about summaries and diagnostics, etc.).

Tentative list (add more items by editing this comment):

  • Vignette demonstrating the different draws formats and manipulating them (e.g., subset, mutate, etc.)
  • Vignette demonstrating summaries and diagnostics
  • Vignette explaining how to interpret and use MCSEs (see also #44)

Convert a draws_list to a draws_df

> class(x)
[1] "pdb_gold_standard_draws" "draws_list"              "draws"                  
[4] "list"                   
> posterior::as_draws_df(x)
Error in class(out) <- class_draws_df() : 
  attempt to set an attribute on NULL
>   xdf <- posterior::as_draws_matrix(x)
Error in class(out) <- class_draws_df() : 
  attempt to set an attribute on NULL
> packageVersion("posterior")
[1] ‘0.0.1’

Update README

We should update the README before the beta release. @jgabry do you think this is something you could do? My English writing is still much less clear than yours and @mjskay's.

Bind multiple draws objects

As a kind of inverse operation to subset, we should allow binding draws objects together. This involves (at least) two different kinds of bind operations.

  • bind variables together, that is, extend the variables dimension after making sure the draws objects have the same number of draws (and iterations and chains). If draws are in the draws_df format, this basically corresponds to a cbind operation.
  • bind draws together, that is, extend the draws (or chains and iterations) dimension. If draws are in the draws_df format, this basically corresponds to an rbind operation.

I am not sure yet about good names for these two functions. Some ad-hoc ones could be bind_* where * could be one of variables, chains, iterations, or draws. What do you think?
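The two kinds of binding can be sketched on plain data frames that mimic the draws_df layout (.chain, .iteration, .draw plus variable columns). This is base R only; the renumbering of .draw after binding chains is an assumption about what a bind function would need to do.

```r
draws_a <- data.frame(.chain = 1L, .iteration = 1:3, .draw = 1:3,
                      mu = c(0.1, 0.2, 0.3))
draws_b <- data.frame(.chain = 1L, .iteration = 1:3, .draw = 1:3,
                      tau = c(1.1, 1.2, 1.3))
draws_c <- data.frame(.chain = 2L, .iteration = 1:3, .draw = 1:3,
                      mu = c(0.4, 0.5, 0.6))

# Bind variables: the index columns must agree, then add the new column(s)
stopifnot(identical(draws_a$.draw, draws_b$.draw))
bound_variables <- cbind(draws_a, draws_b["tau"])

# Bind draws (here: a second chain): stack rows, then renumber .draw
bound_draws <- rbind(draws_a, draws_c)
bound_draws$.draw <- seq_len(nrow(bound_draws))

names(bound_variables)
bound_draws
```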

Do the different representations need custom classes?

@paul-buerkner and I were discussing this today. It should be possible to do the conversions between representations without using methods by just checking the input objects, but we could use classes and methods if we want. Both have their appeal. We also have to consider the diagnostics and summaries. Without classes and methods we’d end up doing lots of checking of inputs in many different parts of the package, whereas the methods know the types of their inputs already. Anyway, this is just something to consider, and we can proceed with implementing the internals of the conversion, summary, and diagnostic functions without having decided this.

Which summaries are included?

  • mean, median
  • sd, mad_sd
  • quantiles

Questions:

  • Which other summaries to include?
  • What are the naming conventions?
  • What is (are) the output type(s)?

Improve documentation to an acceptable level

This should include at least

  • basic templating of documentation snippets
  • references between documentation pages
  • everything required for R CMD Check to pass without documentation related warnings or errors

Specifically, the following parts need to be documented:

  • convergence diagnostics
  • as_draws_{format} generics/methods
  • summarise_draws generic/methods
  • subset methods
  • repair_draws generic/methods
  • order_draws generic/methods
  • extract_one_variable_matrix generic/methods
  • package doc page in posterior-package.R

summary with one line per _named_ parameter

The discussion in #32 reminded me that one thing @andrewgelman and others have been wanting for a while (and that I think would also be useful as an alternative to the standard summary output) is a summary where each vector/matrix/array parameter only occupies a single line just like scalars. That is, each line corresponds to a named parameter rather than a parameter element.

(This could be its own thing or just an option to a custom print method for our existing summary objects.)

The quantities to display are debatable (e.g. maybe min of all the individual elements’ ESS, max of the Rhats, etc), so it would be good to get input from Andrew and anyone else interested.
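A minimal base-R sketch of such a collapsed summary, assuming element names follow the name[index] pattern and using the max-Rhat / min-ESS aggregates floated above. The input values are copied from the array-format output earlier on this page; the column names of the result are made up for the example.

```r
# Per-element summary, as it might come out of a standard summary table
summ <- data.frame(
  variable = c("mu", "tau", "theta[1]", "theta[2]", "theta[3]"),
  rhat = c(1.00, 0.998, 1.000, 1.04, 1.02),
  ess_bulk = c(878, 387, 551, 765, 553)
)

# Strip the trailing index to get the named parameter; keep first-seen order
base_name <- sub("\\[.*\\]$", "", summ$variable)
param <- factor(base_name, levels = unique(base_name))

by_param <- data.frame(
  parameter = levels(param),
  n_elements = as.integer(table(param)),
  max_rhat = as.numeric(tapply(summ$rhat, param, max)),
  min_ess_bulk = as.numeric(tapply(summ$ess_bulk, param, min))
)
by_param
```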

Rename and mutate variables

For users it might be convenient to have rename and mutate functionality to make changes to variables similar to how we do it with the related dplyr functions. To avoid function masking, we may want to name them rename_variables and mutate_variables which would also be consistent with our naming conventions.

If we decide to implement such functions, we need to think of a proper backend for non-standard evaluation. We could of course use the tidyverse one, but I am not sure how many additional dependencies this implies. @mjskay you may have more experience with this than I have.

Subsetting draws vs. subsetting iterations

I currently think we should have both a subset_draws and a subset_iterations method for consistency with our current naming conventions #5. What I am unsure about is what these methods should do for specific formats.

For instance, how shall we handle subsetting draws in the draws_array format? If we took draws literally there, we would have to insert missing values when the subsetted draws are not symmetric over chains.

For other formats, such as data.frames with .iteration and .draw, we can easily subset for both, but when we subset via draws and then try to transform into a draws_array, we may end up running into the same problem as above.

Any suggestion on how to handle this?
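The asymmetry problem is easy to reproduce on a plain data frame with .chain, .iteration and .draw columns. The sketch below is base R only and assumes draws are numbered chain by chain; it just shows how a draw-based subset can leave chains with different numbers of iterations, which a rectangular iterations x chains array cannot hold without padding.

```r
# Two chains with five iterations each; .draw numbers rows across chains
df <- expand.grid(.iteration = 1:5, .chain = 1:2)
df$.draw <- seq_len(nrow(df))

# Subsetting by draw index picks two draws from chain 1 and one from chain 2
sub <- df[df$.draw %in% c(1, 2, 7), ]
per_chain <- table(sub$.chain)
per_chain

# A rectangular array needs the same number of iterations in every chain
is_balanced <- length(unique(as.integer(per_chain))) == 1
is_balanced
```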

Subset variables via regular expressions

Not sure what would be the best interface for that in subset, but it would definitely be nice to extract variables using regular expressions. We could either have a new argument variable_regex to specify the regular expression, or a flag (called regex for instance) to indicate whether variable should be interpreted as a regular expression. What would you prefer? Or would you prefer yet another approach?
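To compare the two behaviours, here is a minimal base-R sketch of the flag variant. The helper select_variables is hypothetical; it only operates on a character vector of variable names.

```r
# Hypothetical helper: select variable names literally or by regular expression
select_variables <- function(variables, variable, regex = FALSE) {
  if (regex) {
    # Treat `variable` as a regular expression
    grep(variable, variables, value = TRUE)
  } else {
    # Treat `variable` as a set of exact names
    variables[variables %in% variable]
  }
}

vars <- c("mu", "tau", "theta[1]", "theta[2]")

by_regex <- select_variables(vars, "^theta\\[", regex = TRUE)
by_name <- select_variables(vars, "theta[1]")
by_regex
by_name
```

The flag matters because names such as theta[1] contain regex metacharacters: interpreted as a pattern, "theta[1]" would match the string "theta1" rather than the variable theta[1].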

Repairing indices in draws objects

When working with standard indexing operations it is easy to leave the iteration, chain, etc. indices in weird (most likely unwanted) states. For instance,

# example draws
x <- round(rnorm(200), 2)
x <- array(x, dim = c(10, 4, 5))
dimnames(x) <- list(NULL, NULL, paste0("theta", 1:5))

post <- as_draws_array(x)
(post_sub <- post[c(4, 6), , ])

gives us a valid draws object but with two iterations now named 4 and 6 instead of 1 and 2.
What I am proposing is to define a repair_indices function (we may name it differently of course),
that repairs the indices to be continuously numbered again. That is,

repair_indices(post_sub)

gives us iteration indices 1 and 2 again (instead of 4 and 6) as previously. This function should also be used automatically in subset to ensure validity and continuous numbering of both input and output. In fact, I would argue, the only index structure that our internal functions should expect and output is the repaired form, but this is my discussion point (1) for which I would like to have your input. I have implemented a prototype of repair_indices for us to play around with.

Discussion point (2) is how to handle broken orderings. For instance

(post_sub <- post[c(6, 4), , ])

would result in the "wrong" ordering of the two iterations. The question is now how repair_indices (and therefore functions which depend on it) should handle those. Shall we have the mapping 6 = 1 and 4 = 2, that is, just take the supplied order (not considering iteration names), or shall we have 4 = 1 and 6 = 2, that is, reorder samples according to the iteration names? Here I am less certain about what is preferable. The second one may be very surprising to users, as functions may "randomly" change the ordering, I guess. Any thoughts from your side?
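The two candidate behaviours can be compared side by side on a plain array with named iterations. This is a base-R sketch, not the prototype mentioned above; the reorder argument is an assumption introduced to switch between the two mappings.

```r
set.seed(1)
x <- array(
  round(rnorm(200), 2),
  dim = c(10, 4, 5),
  dimnames = list(as.character(1:10), as.character(1:4), paste0("theta", 1:5))
)

# Subsetting leaves the iterations named "6" and "4", in that order
post_sub <- x[c(6, 4), , , drop = FALSE]

# Illustrative version of repair_indices for plain arrays
repair_indices <- function(x, reorder = FALSE) {
  if (reorder) {
    # Mapping 4 = 1, 6 = 2: sort by the original iteration names first
    ord <- order(as.integer(dimnames(x)[[1]]))
    x <- x[ord, , , drop = FALSE]
  }
  # Renumber iterations continuously from 1
  dimnames(x)[[1]] <- as.character(seq_len(dim(x)[1]))
  x
}

kept <- repair_indices(post_sub)                    # 6 = 1, 4 = 2
sorted <- repair_indices(post_sub, reorder = TRUE)  # 4 = 1, 6 = 2
```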

Is there a reason `.iter` is int but `.draw` is dbl?

See types in the print output below.

> print(as_draws_df(eight_schools))
# A tibble: 400 x 13
   .chain .iteration .draw    mu    tau `theta[1]` `theta[2]` `theta[3]` `theta[4]`
    <int>      <int> <dbl> <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
 1      1          1     1 -4.78  1.62      -1.61       -4.96      -5.41    -2.88  
 2      1          2     2  6.92  3.40      11.0         9.34       8.60     8.37  
...

Is there a reason or just an oversight?

Potential 'draws' name ambiguities

Currently, we use 'draws' for two different things: the objects themselves and the draw indices. I think this is acceptable in general and does not cause too much confusion. However, some function names are not optimal in that regard. For instance, one would assume that the draws() function, if it exists, creates a draws object, not that it returns the draw indices of an existing object.

Accordingly, I think we should at least rethink what we call the draw index extraction function to avoid any unnecessary confusion.

Extractor functions of iterations, chains etc.

I think we need extractor functions for iterations, chains, draws, and variables. Two questions:

  1. How do we name them? Simply iterations, chains, draws and variables?

  2. For the former 3, shall we return a vector of indices, e.g., (1, 2, 3, 4) in case of 4 chains, or simply the number of chains, etc.?

Depending on the answer for (2), we may want to go for niterations etc. as names to indicate that it is about the number of iterations, not the indices themselves. Or should we even support both?

Adding generics of general Bayesian methods

We discussed at some point that we want to have a common place where a lot of the generics should live that packages such as rstanarm, brms, etc. use. Right now, a lot of them are in rstantools, but I don't think this is a good place for them, as the methods are not only relevant for rstan-based packages. Some methods are better suited to special packages and as such should live there, for instance, in bayesplot for all plotting-related generics and in loo for all cross-validation-related procedures.

Please amend this initial comment to add more methods that we may want to put into posterior.

  • posterior_predict compute predictions and return an array of draws
  • posterior_linpred compute linear predictor values and return an array of draws
  • posterior_epred compute means of the posterior predictive distribution and return an array of draws (see paul-buerkner/brms#644)
  • predictive_error compute differences between observed and predicted responses and return an array of draws
  • posterior_interval basically summarise_draws with just two quantiles
  • predictive_interval basically summarise_draws with just two quantiles after calling posterior_predict

Glossary of nouns and verbs for the package

Pinging off of @jgabry's comment in #1 I thought I'd open an issue to gather thoughts on names of basic concepts in the package. Since it's easy to change names (as long as we do it before an official release) I don't think we should wait on finalizing these before continuing to build stuff, but I wanted to make sure we keep track of what names we need to discuss.

I suggest we edit this first comment to keep track of naming decisions and use discussion on this issue to make those decisions. For now I've filled in what we seem to have so far, not to suggest that these are set in stone at all.

core nouns / concepts

  • chain: chain index
  • iteration: within-chain index
  • draw: unique index across all chains
  • variable: a single variable / parameter / etc
  • draws: a collection of variables, chain and iteration info, and draws

verbs / operations

  • as_draws_{format}(): convert draws to {format}
  • subset_draws(), subset(): select a subset of variables/iterations/chains/draws
  • thin_draws(): thin draws
  • summarise_draws(), summarize_draws(), summary(): compute summary measures for each variable
  • extract_variable_matrix(): select a given variable and return its draws in an iterations x chains format; for use in convergence diagnostics and other summary measures
  • repair_draws(): repair indices (iterations, chains, draws) of draws objects to be consistent after subsetting or related operations
  • order_draws(): (re-)order draws objects after subsetting or related operations
  • bind_draws(): bind multiple draws objects together
  • rename_variables(): change names of variables
  • mutate_variables(): transform variables and add them to the draws object

If I've missed anything please edit this issue to add it.

Arrays of draws of arbitrary dimension

In brms, I need draws arrays in various places that hold multiple dimensions of draws, and I feel we should support them natively in posterior. Such arrays should have the following properties:

  • Iterations/draws are always the first dimension
  • All other dimensions can be freely named and otherwise receive default names
  • No chain information is stored unless manually done by the user (in contrast to the current draws_array class, which will remain in the current form)
  • subsetting should not lead to dropping dimensions
  • methods subset, summarise_draws, etc. work canonically on these objects

We should come up with a good name for this class. The question is mostly which of the two array classes should be called draws_array and which should receive another name.

Add a basic set of unit tests

This should at least cover all the transformation and post-processing functions.

  • as_draws_{format} methods
  • summarise_draws and aliases
  • convergence functions
  • extract_one_variable_matrix methods
  • iterations/chains etc. extracting functions
  • rename_variables
  • subset and thin methods
  • repair_draws and order_draws methods

rv-like interface

I wanted to gather conversations about a potential rv-like interface here so as not to derail other conversations (like #4).

I think a solid rv-like interface would be incredibly useful. More specifically, something that:

  • Acts as a high-level interface to "random variables" which can be vectors, matrices, arrays, etc.
  • Allows math operations on those
  • Supports nice syntax for things like P() and E()

rv already has those down, but it would be even better if it:

  • Used a naming scheme consistent with ours
  • Supported pretty-printing in tibbles
  • Kept chain information

With all of those requirements in place, I could see tidybayes moving largely towards using tables of these rv-like objects. It would be very useful for a lot of the posterior manipulation/summarization/visualization tasks tidybayes is designed for.

If you all are interested in supporting such a format here, then the question is what's the best way to get there? The options might be:

  1. Reach out to the rv maintainer and ask if they are willing to let us take it over and make backwards-incompatible changes to it, followed by a new major release deprecating some stuff.
  2. Bring some of the existing rv code that we want to build on into this package, come up with a new class name (to not clobber "rv") and go from there.
  3. Start from scratch here.
  4. Start from scratch in a different package and add that dependency here.

(1) Could work if the new maintainer is not planning much with the package and if there aren't a lot of users. Currently the package has ~400 downloads/month and no revdeps on CRAN.
(2) Would be doable depending on license preferences (it is GPL-2). This could also be aided by the fact that rv looks to have been written by one of Andrew Gelman's former students, Jouni Kerman (@jgabry do you know him?).

I would be willing to float (1) to the current rv maintainer, Joseph Stachelek --- I've interacted with him once or twice on twitter and github so it wouldn't be a complete cold email (unless either of you know him better). If we'd rather go for (2) or (3) it might be good to reach out to Jouni Kerman to get his thoughts (either on using his code or on things he would have done differently if he wrote the package again).

should we eventually move this repo to stan-dev?

Here are some pros and cons of eventually moving this package from a personal repo to the stan-dev org. I can think of more pros than cons, but let me know if you think of more, in particular cons that I haven't thought of.

Cons:

  • it's not Stan-specific
  • possibly others, but they don't immediately come to mind

Pros:

  • users would be able to ask questions about the package on the Stan forums (this might also be negotiable even without moving to stan-dev, but it wouldn't be automatic)
  • more attention from GitHub users and potential contributors if repo is in stan-dev (probably, right?)
  • we are planning to use this together with/in many Stan related packages
  • stan-dev also has packages that can be used with any MCMC software (e.g. loo, bayesplot, even shinystan actually, etc.) so it's not new for stan-dev to host packages that are useful for Stan users and devs and also for non-Stan users and devs

Anyway, no pressure to agree with me on this! And this isn't an urgent question. Just wanted to raise the possibility.

Default print() output

As I've been writing rename_variables() I've found it's a little awkward to work with draws objects when the default print output at the console is typically gigantic. This also makes examples a little verbose, as it feels necessary to call summarise_draws() constantly.

Two thoughts:

  1. Any objections to making the default print() for draws objects call summarise_draws()?
  2. If we agree to do (1), we will now have three ways of getting the same info (print, summary, and summarise_draws). That possibly feels a bit overkill? I can see how they are typically used in different ways, so having them all as aliases is probably fine, but it is worth considering.

ESS bug

> x <- as.data.frame(matrix(rnorm(10000), ncol = 10))
> posterior::ess_bulk(x)
[1] 9007.076
> posterior::ess_tail(x)
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) : 
  undefined columns selected
