rmcelreath / rethinking Goto Github PK
View Code? Open in Web Editor NEWStatistical Rethinking course and book package
Statistical Rethinking course and book package
In the textbook, p. 115, the summary says
To fit these models to data, the chapter introduced maximum a prior (MAP) estimation.
I assume what is meant and was introduced is maximum a posteriori estimation.
I really like being able to do this:
df1 <- data_frame(x=rnorm(100), z=rnorm(100)) %>% mutate(y=x*z+rnorm(n()))
lm1 <- glimmer(y~x+z, data=df1)
lm1.fit <- map(lm1$f, lm1$d)
sim(lm1.fit, data=expand.grid(x=-1:1, z=-1:1))
But I can't do this
lm2 <- glimmer(y~x*z, data=df1)
lm2.fit <- map(lm2$f, lm2$d)
sim(lm2.fit, data=expand.grid(x=-1:1, z=-1:1))
Error is "Error in eval(expr, envir, enclos) : object 'b_x_X_z' not found"
Is there a way of doing this without manually reconstructing all the interaction terms required via model matrix and regexes for replacing : with X?
Would you please advise how I can address the following installation error?
install.packages(c("coda","mvtnorm","devtools"))
There is a binary version available but the source version is later:
Error in install.packages : 8 arguments passed to .Internal(format) which requires 9
I am running current version of R (3.2.3) on 64-bit Windows 10.
I have installed Rtools33.exe and verified that all is OK in R with the commands:
Sys.getenv("PATH") #Check Rtools installation and path
system('g++ -v')
system('where make')
I have also run the following commands in R:
install.packages("loo") ## per #13
install.packages("rstan", dependencies = TRUE)
library(rstan) # observe startup messages
Any advice will be appreciated.
Red Owl (yes, my real name)
I can't seem to get glimmer
to work when family
is "binomial"
:
glimmer(Species == "virginica" ~ Petal.Width, iris, "binomial")
Error in family$link : $ operator is invalid for atomic vectors
Using the standard glm
, the formula works fine:
glm(Species == "virginica" ~ Petal.Width, "binomial", iris)
For example:
> library(rethinking)
> data(milk)
> d <- milk
> glimmer(kcal.per.g ~ clade, data = d)
Error in parse(text = flist_txt) : <text>:4:20: unexpected symbol
3: mu <- Intercept +
4: b_cladeNew World
^
Possible fix (untested) - in xparse_glimmer_formula, change
fixef <- colnames(model.matrix(f_nobars, data))
to
fixef <- make.names(colnames(model.matrix(f_nobars, data)))
When selecting a subset of parameters for plotting the traces from a map2stan object, the method plot does not correctly select their n_eff.
Consider the model from the example in the help file to map2stan and compare
precis(m2)
precis(m2, pars = c('bp', 'bpc'))
plot(m2, pars = c('bp', 'bpc'))
The plot shows n_eff for the first two parameters instead of the selected ones.
[Package rethinking version 1.59]
So, I'm working with rethinking in a Structural Equation Modeling framework. It works beautifully, but, I'm having some issues in generating posterior predictions. Namely, effects of linked relationships don't seem to translate. Consider the following model
x1 -> y1 -> y2
where each coefficient is a slope of 2.
I can create a nice fake data frame
set.seed(31415)
ndf <- data.frame(x1=runif(100,0,100))
ndf$y1 <- rnorm(100, ndf$x1*2, 40)
ndf$y2 <- rnorm(100, ndf$y1*2, 80)
And then a model that fits nicely using map2stan
nf <- alist(
y1 ~ dnorm(yhat1, sigma1),
y2 ~ dnorm(yhat2, sigma2),
yhat1 <- b1*x1,
yhat2 <- b2*y1,
b1 ~ dnorm( 0 , 1000 ),
b2 ~ dnorm( 0 , 1000 ),
sigma1 ~ dcauchy( 0 , 1 ),
sigma2 ~ dcauchy( 0 , 1 )
)
n.fit <- map2stan(
nf ,
data=ndf ,
start=list(a1=1, a2=1, b1=1, b2=1, sigma1=10, sigma2=10)
)
here
> coef(n.fit)
b1 b2 sigma1 sigma2
1.930119 1.965780 37.934121 86.538586
Great! But, when I try and do posterior prediction with x starting as 3, which should produce roughly 12 for y2...
> link(n.fit, data=list(x1=3)) -> a
[ 1000 / 1000 ] 200 / 1000 ]
> sapply(a, mean)
yhat1 yhat2
5.790357 1.965780
`
Huh....why is yhat incorrect here?
Thank you for this great textbook. This book is truly what I have wished to read for a long time.
I tried a compare function in an overthinking box, on page 310, but an error message was displayed like this.
compare(m10.8,m10.9,func=DIC)
Error in newvec[j] <- pars_orig[[name_orig]] :
replacement has length zero
Without the func argument, this code worked.
In version 1.58, using a caret in a linear model like this:
library(rethinking)
data(cars)
m <- map2stan(
alist(
y ~ dnorm(mu,sigma),
mu <- a*x^b,
a ~ dnorm(0,10),
b ~ dnorm(0,1),
sigma ~ dexp(1)
),
data=list(y=cars$dist,x=cars$speed) )
produces an odd error, telling the user that the variable x cannot be found.
This bug is fixed in the dev branch and so will be gone in 1.59. In the meantime, a work-around is to put parentheses around the variable name:
m <- map2stan(
alist(
y ~ dnorm(mu,sigma),
mu <- a*(x)^b,
a ~ dnorm(0,10),
b ~ dnorm(0,1),
sigma ~ dexp(1)
),
data=list(y=cars$dist,x=cars$speed) )
Hi Richard!
Congratulations for your new book and your YouTube videos. I find them amazing, I'm learning a lot with them. Today I was replicating some of your examples and I realised that the results obtained with map()
are quite different from those estimated with map2stan()
. This gives the right answer:
m4.3 <- map(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*weight,
a ~ dnorm( 156 , 100 ) ,
b ~ dnorm( 0 , 10 ) ,
sigma ~ dunif( 0 , 50 )
) ,
data=d2 )
But this one does not:
m4.3 <- map2stan(
alist(
height ~ dnorm( mu , sigma ) ,
mu <- a + b*weight,
a ~ dnorm( 156 , 100 ) ,
b ~ dnorm( 0 , 10 ) ,
sigma ~ dunif( 0 , 50 )
) ,
data=d2 )
Any leads? Thanks a lot! :)
Hi Richard,
I'm hoping you could help me look under the hood on how your package plots the traceplots of map2stan objects. Specifically, is there a way to pull out the plots for just some of the parameters (my current models have dozens and I don't always want to look at all of them). Also, can I adjust the device parameters (e.g., ask or mfrow)?
Thanks for the help
after successfully fitting the following model:
m3 <- map2stan(
alist(
s ~ student_t( v+1, mu, sigma ),
mu <- k1*v1 + k2*v2
k1 ~ dnorm(0,100),
k2 ~ dnorm(0,100),
sigma ~ dcauchy(0,0.5),
v ~ dexp(30)
),
data=db,
constraints=list(sigma="lower=0", v="lower=0"),
chains=12,
cores=12,
WAIC=FALSE )
A following call to WAIC will produce the following error:
WAIC(m3)
Constructing posterior predictions
[ 12000 / 12000 ] 2400 / 12000 ]
Error in do.call(dname, args = args_list, envir = e1) :
could not find function "dstudent"
Hi,
is there any way to pass a function defining a complex non-linear data model
through map2stan ???
My hand built .stan file would look like this:
functions {
vector viscosity(vector x, real eps, real sig) {
... function body
}
}
model {
y ~ normal(viscosity(x,eps,sig), sigma);
}
First of all, many thanks for this great book and package!
When playing with the examples in the documentation of map2stan
, I noticed that using dmvnormNC
leads to saving parameter samples of z_N_<group>
and L_Rho_<group>
. The former samples are irrelevant because they are just normal(0,1)
and the latter are redundant (in my opinion) as Rho_<group>
is also saved. Accordingly, I feel that it would make sense to exclude them via the pars
argument of stan
. What do you think?
Is there a way to make map2stan re-use a compiled model with new data?
I know it's possible to extract the stanfit object and use that, but I'm running a simulation study and would like to be able to re-use the map2stan object to fit new data and then use post() simulate data from these fitted values.
I am trying to build on the stancode
generated from the model in code block 13.23 by adding a predictor to the random intercept (more generally, I want to code cross-level interactions) but I can't figure out the syntax in a model with MVN priors. I'm probably missing something obvious, and this may not be the place for what is essentially a Stan (not map2stan
) question, but would appreciate suggestions or example code.
To clarify further: In a simpler model--say, one with a variable intercept and fixed effects coefficients--the relevant Stan code would be:
`....
transformed parameters {
vector[J] eb0;
for (j in 1:J)
eb0[j] = z[j] * g0 ;
}
model {
g0 ~ normal(0, 5);
sigma_b0 ~ cauchy(0, 2);
for (j in 1:J)
b0[j] ~ normal(eb0[j], sigma_b0);
for (n in 1:N)
y[n] ~ normal(b0[cid[n]] + x[n] * beta, sigma_y[cid[n]]);
}
`
where z is a matrix of predictors with 1 in the first column and g is a vector of coefficients. There is no obvious way to plug this into my version of code block 13.23:
`data {
int<lower=0> N; // num country-years
int<lower=1> J; // num countries
real y[N]; // outcome
real dincr[N];
real gdpgr[N];
int<lower=1, upper=17> cid[N]; // country
}
parameters {
matrix[3, J] z_mat;
cholesky_factor_corr[3] L_Rho;
real g1;
real g2;
real g3;
vector<lower=0.0001>[3] sigma_b;
real<lower=0.0001> sigma_y;
}
transformed parameters{
matrix[J, 3] b_mat;
vector[J] b1;
vector[J] b2;
vector[J] b3;
vector[J] eb1;
matrix[3, 3] Rho;
b_mat = (diag_pre_multiply(sigma_b, L_Rho) * z_mat)';
b1 = col(b_mat, 1);
b2 = col(b_mat, 2);
b3 = col(b_mat, 3);
Rho = L_Rho * L_Rho' ;
}
model {
vector[N] mu;
vector[N] B1;
vector[N] B2;
vector[N] B3;
L_Rho ~ lkj_corr_cholesky(2);
sigma_y ~ cauchy(0, 2);
sigma_b ~ cauchy(0, 2);
g1 ~ normal(4.8, 10);
g2 ~ normal(0, 5);
g3 ~ normal(0, 5);
to_vector(z_mat) ~ normal(0, 1);
for (i in 1:N){
B1[i] = g1 + b1[cid[i]];
}
for (i in 1:N){
B2[i] = g2 + b2[cid[i]];
}
for (i in 1:N){
B3[i] = g3 + b3[cid[i]];
}
for (i in 1:N){
mu[i] = B1[i] + B2[i] * dincr[i] + B3[i] * gdpgr[i];
}
y ~ normal(mu, sigma_y);
}`
Thanks for any attention to this.
This probably not a real problem, but I thought I'd put it on your radar.
The following warning occurs in the midst of a bunch of messages about Creating a generic function for ‘x’ from package ‘y’ in package ‘rethinking’
:
Warning: undefined slot classes in definition of "map2stan": stanfit(class "stanfit")
Setting rng_seed
does not appear to ensure repeatable analyses with map2stan()
. For example, a simple model from Rethinking:
library(rethinking)
data("Howell1")
d <- Howell1
d2 <- d[ d$age >= 18 , ]
m1 <- map2stan(
alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50)),
data = d2,
WAIC = FALSE,
seed = 8
)
m2 <- map2stan(
alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50)),
data = d2,
WAIC = FALSE,
seed = 8
)
m1
m2
Produce slightly different results.
> m1
map2stan model fit
1000 samples from 1 chain
Formula:
height ~ dnorm(mu, sigma)
mu ~ dnorm(178, 20)
sigma ~ dunif(0, 50)
Log-likelihood at expected values: -1219.41
Deviance: 2438.83
DIC: 2442.6
Effective number of parameters (pD): 1.89
> m2
map2stan model fit
1000 samples from 1 chain
Formula:
height ~ dnorm(mu, sigma)
mu ~ dnorm(178, 20)
sigma ~ dunif(0, 50)
Log-likelihood at expected values: -1219.41
Deviance: 2438.82
DIC: 2442.72
Effective number of parameters (pD): 1.95
all.equal(m1, m2)
show lots of differences, which seem to be fundamental rather than related to sampling (e.g., timing).:
[1] "Attributes: < Component “coef”: Mean relative difference: 0.0001766267 >"
[2] "Attributes: < Component “deviance”: Mean relative difference: 3.669022e-06 >"
[3] "Attributes: < Component “DIC”: Mean relative difference: 6.030028e-05 >"
[4] "Attributes: < Component “pD”: Mean relative difference: 0.03561425 >"
...
[54] "Attributes: < Component “start”: Component 1: Component 1: Mean relative difference: 0.2771743 >"
[55] "Attributes: < Component “start”: Component 1: Component 2: Mean relative difference: 0.1926305 >"
[56] "Attributes: < Component “vcov”: Mean relative difference: 0.03782353 >"
These differences are admittedly small, but my thinking is that the models should be identical except for timing differences (the case when using seed =
is a pure stan()
call).
plot(m8.1stan)
Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'
Similar to the bug with plot precise, I am getting the same error with trace plots. I've reinstalled everything, but R itself. Any assistance would be greatly appreciated. This is Chapter 8 R code 8.12, I've followed the book coding exactly, yet I'm still getting this error. If I plot with:
plot(post$a,type="l")
it works, but it still isn't what I was hoping for since it lacks information.
Thanks.
Hi Richard, hope you're having a great time in Germany.
My understanding of the LKJ correlation matrix was that the shape parameter should have a consistent interpretation (e.g. eta==1 implies uniform) regardless of the number of dimensions. In version 1.58 of the rethinking package, I'm not finding that to be the case.
If I run the example with K==2 and eta==1, I get a uniform distribution, as expected. But if I change K to 20, I get a bell-shaped curve with SD≈0.2.
library(rethinking)
R <- rlkjcorr(1e4,K=20,eta=1)
dens( R[,1,2] )
Hi Richard, I could be wrong, but it seems that precis()
output is an S4 object that doesn't inherit the data.frame class.
> myPrecis = precis(myModel)
> class(myPrecis)
[1] "precis"
attr(,"package")
[1] "rethinking"
> as.data.frame(myPrecis)
Error in as.data.frame.default(myPrecis) :
cannot coerce class "structure("precis", package = "rethinking")" to a data.frame
First of all, thank you very much for your extremely well written and insightful book and the accompanying software package, which has already changed my 'statistical life'.
I was trying to use map2stan for solving a multilevel CFA, and in the course of that endeavour, came across a problem which I will try to demonstrate with the following minimal working example (the original CFA model is no longer recognizable from this, of course):
library(rethinking)
## package rethinking_1.59, installed from github
d <- data.frame( ## Never mind, just a small data set …
x = c(1, 2, 3, 4),
y = c(3, 4, 1, 2),
z = c(1, 3, 2, 4),
grp = c(1L, 1L, 2L, 2L) ## … with a group indicator
)
model <- alist(
y ~ dnorm(mu_y, sd_y),
z ~ dnorm(mu_z, sd_z),
mu_y <- b_y * x + a[grp],
mu_z <- b_z * x + a[grp], ## Probably the rest is irrelevant, but for completeness …
a[grp] ~ dnorm(mu_a, sd_a),
c(b_y, b_z) ~ dnorm(0, 100),
c(mu_y, mu_z, mu_a) ~ dnorm(0, 100),
c(sd_y, sd_z, sd_a) ~ dcauchy(0, 1)
)
model0 <- map2stan(model, data = d, sample = FALSE)
stancode(model0)
## (output omitted)
It seems like in the data declaration of the group index, int grp[N_z];
, a wrong symbol, N_z
, is given for the array size, which probably should be N_grp
, declared in the stancode. This is not really a problem unless z
contains missing values, because in that case, N_z
will no longer be declared in the stancode and that code will not run any more:
d2 <- d
d2$z[1] <- NA
model2 <- map2stan(model, data = d2, sample = FALSE)
stancode(model2)
## data{
## int<lower=1> N;
## int<lower=1> N_z_missing;
## int<lower=1> N_grp;
## int<lower=1> z_missingness;
## real y[N];
## real z[N];
## real x[N];
## int grp[N_z];
## }
## (further output omitted)
I tried to find my way around the source code of map2stan, but don't really feel up to fixing that. Thanks for looking into this issue. If I can be of any further help, please let me know.
I'd like to compute estimates of LOO and K-fold cross validation, as described here:
http://www.stat.columbia.edu/~gelman/research/unpublished/waic_stan.pdf
Is there any way to add to the generated quantities block of the stan code maptostan emits to enable one to capture the log likelihood as Gelman et al describe? e.g.
generated quantities {
vector[N] log_lik;
for (n in 1:N){
log_lik[n] <- normal_log(y[n], X[n]*b, sigma);
}
}
I realise it's possible to use maptostan first and then extract and edit the code, but this is somewhat of a fiddle. More generally, I wonder if it's possible to include the LL quantity always to enable map2stan fits to be compatible with this LOO package: https://github.com/stan-dev/loo
The documentation gives the default value of prob=0.95 but the code uses prob=0.89.
The prediction functions seem to require the variables in data
to be provided in the order they appear in the model formula. For example:
data(nettle)
n = nettle
n$loglpc = log10(n$num.lang / n$k.pop)
n$logArea = log10(n$area)
m = lm(loglpc ~ mean.growing.season * sd.growing.season + log10(area), data = n)
dpred = data.frame(mean.growing.season = 11,
sd.growing.season = seq(0, 6, len = 100),
logArea = mean(n$logArea))
l = link(m, dpred)
summary(apply(l, 2, mean))
dpred = dpred[, 3:1]
l = link(m, dpred)
summary(apply(l, 2, mean))
After the rstan 2.14.1 (it might have been 2.14, but at least by 2.14.1), running a second map2stan() model in the same R session causes a segfault in my setup. Example:
library(rethinking)
data(chimpanzees)
# don't want any variables with NAs
d <- list(
pulled_left = chimpanzees$pulled_left ,
prosoc_left = chimpanzees$prosoc_left ,
condition = chimpanzees$condition ,
actor = as.integer( chimpanzees$actor ) ,
blockid = as.integer( chimpanzees$block )
)
# RStan fit
m2 <- map2stan(
alist(
pulled_left ~ dbinom(1,theta),
logit(theta) <- a + bp*prosoc_left + bpc*condition*prosoc_left ,
a ~ dnorm(0,10),
bp ~ dnorm(0,10),
bpc ~ dnorm(0,10)
) ,
data=d, chains=1, cores=1 )
m2_2 <- map2stan(
alist(
pulled_left ~ dbinom(1,theta),
logit(theta) <- a + bp*prosoc_left + bpc*condition*prosoc_left ,
a ~ dnorm(0,10),
bp ~ dnorm(0,10),
bpc ~ dnorm(0,10)
) ,
data=d, chains=1, cores=1 )
Gives:
*** caught segfault ***
address 0x1c, cause 'memory not mapped'
Traceback:
1: FUN(X[[i]], ...)
2: lapply(name, function(id) { v <- .Internal(getSymbolInfo(as.character(id), PACKAGE, as.logical(withRegistrationInfo))) ...
3: getNativeSymbolInfo(name, PACKAGE)
4: doTryCatch(return(expr), name, parentenv, handler)
5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
6: tryCatchList(expr, classes, parentenv, handlers)
7: tryCatch(getNativeSymbolInfo(name, PACKAGE), error = function(e) e)
8: Module(module, mustStart = TRUE)
9: .getModulePointer(x)
10: <S4 object of class "Module">$stan_fit4model11743559f5a88_pulled_left___dbinom_1__theta_
11: eval(expr, envir, enclos)
12: eval(call("$", mod, paste("stan_fit4", model_cppname, sep = "")))
13: object@mk_cppmodule(object)
14: .local(object, ...)
15: sampling(sm, data, pars, chains, iter, warmup, thin, seed, init, check_data = TRUE, sample_file = sample_file, diagnostic_file = diagnostic_file, ...
16: sampling(sm, data, pars, chains, iter, warmup, thin, seed, init, check_data = TRUE, sample_file = sample_file, diagnostic_file = diagnostic_file, ...
17: stan(model_code = model_code, model_name = modname, data = d, init = initlist, iter = iter, warmup = warmup, chains = chains, ...
18: doTryCatch(return(expr), name, parentenv, handler)
19: tryCatchOne(expr, names, parentenv, handlers[[1L]])
20: tryCatchList(expr, classes, parentenv, handlers)
21: tryCatch(expr, error = function(e) { call <- conditionCall(e) ...
22: try(stan(model_code = model_code, model_name = modname, data = d, init = initlist, iter = iter, warmup = warmup, chains = chains, ...
23: map2stan(alist(pulled_left ~ dbinom(1, theta), logit(theta) <- a + bp * prosoc_left + bpc * condition * prosoc_left, a ~ dnorm(0, ...
I am running R 3.3.2 Patched and reinstalled rethinking and all dependencies from source. I'm not sure if this is reproducible by anyone else.
My sessionInfo():
R version 3.3.2 Patched (2017-01-03 r71888)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rethinking_1.59 rstan_2.14.1 StanHeaders_2.14.0 ggplot2_2.2.1 fortunes_1.5-4 devtools_1.12.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 MASS_7.3-45 munsell_0.4.3 colorspace_1.3-2 lattice_0.20-34 plyr_1.8.4
[7] tools_3.3.2 grid_3.3.2 gtable_0.2.0 loo_1.0.0 coda_0.19-1 withr_1.0.2
[13] matrixStats_0.51.0 lazyeval_0.2.0 assertthat_0.1 digest_0.6.11 tibble_1.2 gridExtra_2.2.1
[19] codetools_0.2-15 memoise_1.0.0 inline_0.3.14 scales_0.4.1 stats4_3.3.2 mvtnorm_1.0-5
Hi,
thanks for this package, which seems really great. I also enjoyed the book a lot.
I wondered if there is a way in map2stan
to use matrix notation, as is possible in stan. Suppose for example I have a matrix of covariates X
and a vector of parameters B
, such that y ~ XB
, is there anyway to indicate it in map2stan?
Thanks!
This is mostly an issue for the textbook chapter 4.
In some cases the model definitions given in the book don't match with the corresponding R code. Furthermore the book references code which doesn't appear.
Examples:
start
list for the model but the R code doesn't contain such a list.I've been working with a collaborator who has a model build in STAN, but we'd like to translate it to rethinking as it's so much more accessible to the rest of our collaborators on the project. An odd feature of the model is that each Study (this is a meta-analysis - although I can think of other models this should be true for) has it's own residual error. So, in our model, our likelihood function is
y ~ lognormal(y_loc, sd_e[Study])
for which our prior would be stated as
sd_e[Study] ~ dcauchy(0,1) #prior for study level errors
However, before map2stan will run the model, I get the error
Error in constr_list[[sigma_name]] : no such index at level 1
This wasn't super helpful in debugging, but I eventually tracked it down to the sd_e parameter, as when I took [Study] out of the residual, the model fit just fine. I'm thinking that either this is the wrong way to use dcauchy or there is something wonkier here in terms of adding having multiple residual errors. But given that normal distributions work just fine with a group structure, I cannot think of why this should fail. Is there something in how the model is translated to STAN code?
The following test model compiles correctly with uniform(0,1) but fails to compile with uniform(-1,1).
Stan doesn't have a problem with uniform(-1,1) so it's something with map2stan I believe.
m2 <- map2stan(
alist(
response ~ dnorm(m, 10),
m <- k * v1,
k ~ uniform(-1,1)
) ,
data = db,
)
Dear Richard and others,
I have installed rstan, but when I try to install rethinking, I get the following error:
ERROR: dependency 'loo' is not available for package 'rethinking'
removing 'C:/Users/drlabuser/Documents/R/win-library/3.1/rethinking'
Error: Command failed (1)
Any suggestions welcome!
Thank you,
Colleen
A few people have reported this error:
> plot(precis(m5.3))
Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'
It appears to be related to precis methods not exporting right. But I have been unable to replicate the error. It's on my radar. In the meantime, accessing the plot method directly with precis.plot() (or precis_plot()
) does work.
Hello Richard.
Great package, great book, and great lectures! Thank you.
I am wondering if there is a way of using map2stan for multivariate outcomes. You mention it briefly in one your lectures, but I haven't seen any code for it on the web or book.
I want to quantify the effect of disease on a bird species. I think it might affect weight, and skeleton size. So, we have paired measures of weight and skeleton size (which are related as you might imagine), and we have a predictor variable that is healthy/sick. At some point, I want scale up to multiple species analysis, and it would be awesome to get some pooling going at different levels.
I was hoping something like this might work:
alist(
c( weight, skeleton ) ~ dmvnorm2( c(mu_w, mu_s), sigma, Rho ),
c( mu_w, mu_s ) <- c(alpha_w, alpha_s)[ status_id ],
c( alpha_w, alpha_s )[ status_id ] ~ dmvnorm2( c(mu_wa, mu_sa), sigma_a, Rho_a),
mu_wa ~ dnorm( 0, 50 ),
mu_sa ~ dnorm( 0, 5 ),
sigma ~ dcauchy( 0, 2 ),
sigma_a ~ dcauchy( 0, 2 ),
Rho ~ dlkjcorr( 2 ),
Rho_a ~ dlkjcorr( 2 )
)
This does not seem work.
At first I got an error about not knowing the dimensions of Rho, and suggesting giving it a starting value. So, I gave map2stan this: start = list( Rho = diag(2))
. But, that was to no avail.
Do I have the syntax completely wrong, or the model, or both, or am I asking too much of the map2stan AI? All of the above might be an acceptable answer.
Thank you again.
Anders.
Hello,
Is there a way to access the course material (http://xcelab.net/rm/statistical-rethinking/)?
Thank You.
Not sure if this is intentional.
Hi Richard.
When using Hadley Wickam's tidyverse
tools, data.frame
s become tibble
s, which return a character vector when put through class
:
c("rowwise_df","tbl_df","tbl","data.frame")
This causes map
and map2stan
to fail the check whether the data are list
or data.frame
(lines 39 in map.R
and 74 in map2stan.r
).
I wondering if it would be possible to wrap the test in a function call to any
, so that it would look like this:
!any((class(data) %in% c("list", "data.frame")))
This would lead to returning a single TRUE
when checking with a tibble
, rather than a bool vector, which then R
defaults to the first result on the vector when testing the condition in the if
clause.
Thank you.
Anders.
When you train a model with map2stan
with a non-specified n
and then use link
or extract.samples
with a large n
, the results are nonobvious: extract.samples
(used by link
) returns a set of size n
but only filled up to the default map2stan
-size.
Example code:
Train model:
m <- map2stan (
formula,
data=ddd)
summary(m)
Relevant output:
Inference for Stan model: totalScore ~ dbinom(40, p).
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
Subsequent call to link
with the same dataset as example but now n=1010
:
posterior <- link(m, data=ddd, n=1010)
Result of 5th row:
posterior[,5]
[1] 0.4477779 0.4661901 0.5257805 0.4876305 0.4294614 0.4631741 0.5056220 0.5049068 0.4612833 0.4821353 0.4681812 0.4635064 0.5055391 0.5054395
...
[995] 0.5142565 0.4737347 0.4642776 0.4575579 0.4403849 0.4595757 NA NA NA NA NA NA NA NA
[1009] NA NA
The last 10 samples are set to NA, due to link
depending on extract.samples
which silently ignores its n
-argument when used in combination with map2stan
.
Perhaps this should be relayed back to the user (ie. a warning message about n
to be curbed to the map2stan
n-value).
arrowspline
in plotting.r
looks generally useful, but is undocumented. Same with segmentsby
and a few others.
The Stan reference manual recommends reparameterizing the Cauchy distribution in terms of the uniform distribution and a tangent function (manual v2.11.0 p225). In the example in the manual, this
beta ~ cauchy(mu, tau);
becomes
beta_unif ~ uniform(-pi()/2, pi()/2);
beta = mu + tau * tan(beta_unif);
Writing this directly as part of the model specification for map2stan doesn't work, as it complains that tan isn't a known distribution. In my case mu=0
and tau=1
, so we can rewrite the second line as:
atan(beta) = beta_unif
However, now the problem is that atan
isn't a defined link function in this package. Would that be difficult to add? I'm happy to attempt the code modification myself if you think it would be straightforward.
I ran into this error message today when playing with map
on the foxes
data, and it took me a moment to figure out what was going on:
Error in solve.default(fit$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0`
I initially looked for multicollinearity based on the error message, but that turned out not to be the problem.
My code is below. The problem is that I forgot to include dnorm
and just gave it a formula for the distribution's mean.
Since this will probably be a common form of user error, it would be nice if map
had a more informative error message that triggered some time before solve
got called. I unfortunately don't know enough about working with expressions to know exactly how one would go about defining the conditions for the error to trigger, but I thought I'd put it on your radar.
library(rethinking)
data("foxes")
head(foxes)
map(
list(weight ~ a + b_f * avgfood + b_s * groupsize),
data = foxes,
start = list(a = mean(foxes$weight), b_f = 0, b_s = 0)
)
When I run model 13.12 on the simulated data, I get the following error:
Error in map2stan(alist(wait ~ dnorm(mu, Sigma), mu <- a_cafe[cafe] + :
Rho : no dimension found for this matrix. Use explicit start value to define its dimension.
For example: Rho = diag(2)
All previous code is exactly per the book. In particular,
Rho <- matrix(c(1,rho,rho,1), nrow = 2) # correlation matrix
Is there a way of specifying priors parameters from code/file? For example, in the code below, rather than writing dnorm(0,100) we may want to specify the prior mean and variance as a result of a calculation. Is there a way of feeding such computed prior parameters to the model?
`m3 <- map2stan(
alist(
s ~ student_t( v+1, mu, sigma ),
mu <- k1*v1 + k2*v2
k1 ~ dnorm(0,100),
k2 ~ dnorm(0,100),
sigma ~ dcauchy(0,0.5),
v ~ dexp(30)
),
data=db,
constraints=list(sigma="lower=0", v="lower=0"),
chains=12,
cores=12,
WAIC=FALSE )`
Attempting to recreate the following examples
System:
RStudio 99.902, R version 3.2.1 for Mac
Description:
Loading required package: rstan
Loading required package: ggplot2
Loading required package: StanHeaders
rstan (Version 2.10.1, packaged: 2016-06-24 13:22:16 UTC, GitRev: 85f7a56811da)
For execution on a local, multicore CPU with excess RAM we recommend calling
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Loading required package: parallel
rethinking (Version 1.59)
Warning messages:
1: package ‘rstan’ was built under R version 3.2.5
2: package ‘ggplot2’ was built under R version 3.2.4
3: package ‘StanHeaders’ was built under R version 3.2.5
Found an odd (and new) bug with using imputation in a varying effects model. I have an easy fix in local repository, but want to change algorithm a little to provide a better fix.
So fix pending in next release, which should be 1.54.
Hi!
I've been running the code in chapter 13 of your book and currently the code does not extract out the correct Rho value after fitting the model.
The code in question is the following:
## R code 13.1
a <- 3.5 # average morning wait time
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- (-0.7) # correlation between intercepts and slopes
## R code 13.2
Mu <- c( a , b )
## R code 13.3
cov_ab <- sigma_a*sigma_b*rho
Sigma <- matrix( c(sigma_a^2,cov_ab,cov_ab,sigma_b^2) , ncol=2 )
## R code 13.4
matrix( c(1,2,3,4) , nrow=2 , ncol=2 )
## R code 13.5
sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix
# now matrix multiply to get covariance matrix
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)
## R code 13.6
N_cafes <- 20
## R code 13.7
library(MASS)
set.seed(5) # used to replicate example
vary_effects <- mvrnorm( N_cafes , Mu , Sigma )
## R code 13.8
a_cafe <- vary_effects[,1]
b_cafe <- vary_effects[,2]
## R code 13.9
plot( a_cafe , b_cafe , col=rangi2 ,
xlab="intercepts (a_cafe)" , ylab="slopes (b_cafe)" )
# overlay population distribution
library(ellipse)
for ( l in c(0.1,0.3,0.5,0.8,0.99) )
lines(ellipse(Sigma,centre=Mu,level=l),col=col.alpha("black",0.2))
## R code 13.10
N_visits <- 10
afternoon <- rep(0:1,N_visits*N_cafes/2)
cafe_id <- rep( 1:N_cafes , each=N_visits )
mu <- a_cafe[cafe_id] + b_cafe[cafe_id]*afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm( N_visits*N_cafes , mu , sigma )
d <- data.frame( cafe=cafe_id , afternoon=afternoon , wait=wait )
## R code 13.11
R <- rlkjcorr( 1e4 , K=2 , eta=2 )
dens( R[,1,2] , xlab="correlation" )
## R code 13.12
m13.1 <- map2stan(
alist(
wait ~ dnorm( mu , sigma ),
mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon,
c(a_cafe,b_cafe)[cafe] ~ dmvnorm2(c(a,b),sigma_cafe,Rho),
a ~ dnorm(0,10),
b ~ dnorm(0,10),
sigma_cafe ~ dcauchy(0,2),
sigma ~ dcauchy(0,2),
Rho ~ dlkjcorr(2)
) ,
data=d ,
iter=5000 , warmup=2000 , chains=2 )
## R code 13.13
post <- extract.samples(m13.1)
dens( post$Rho[,1,2] )
You do not get a plot similar to that in Figure 13.4 when running your code.
You instead get the following:
If you run 1000000 reps with 4000 warmup you get the following, which doesn't include the big spike on zero:
Either way, it is centered at -0.19, not at -0.7 at it should be by design.
There is also a few errors that pop up on each chain, besides the one addressed by the overthinking box in the chapter:
[1] "The following numerical problems occured the indicated number of times after warmup on chain 2"
count
validate transformed params: SRS_sigma_cafeRho is not symmetric. SRS_sigma_cafeRho[1,2] = inf, but S 1
[1] "When a numerical problem occurs, the Metropolis proposal gets rejected."
[1] "However, by design Metropolis proposals sometimes get rejected even when there are no numerical problems."
[1] "Thus, if the number in the 'count' column is small, do not ask about this message on stan-users."
Anyway, I am not sure what is going on, but from the looks of things, the sampling seems to be getting stuck on 0 in covariance estimation, any ideas on how to fix this?
Thanks!
For example Is there a way to extend the following example given at page 337 in the book to the multilevel case?
m11.1stan <- map2stan(
alist(
response ~ dordlogit( phi , cutpoints ),
phi <- 0,
cutpoints ~ dnorm(0,10)
) ,
data=list(response=d$response),
start=list(cutpoints=c(-2,-1,0,1,2,2.5)) ,
chains=2 , cores=2 )
Let's suppose we would like to estimate the 'cutpoints' for different education groups in a multilevel fashion. It would seem that we need to impose a hierarchical structure to 'cutpoints' .How could this be achieved within the Rethinking package? I have tried different options but none compiled for different reasons.
Thanks,
Antonio
After sampling for all chains has finished, it prints out something like the following:
Chain 1, Iteration: 1 / 1 [100%] (Sampling)#
# Elapsed Time: 2e-06 seconds (Warm-up)
# 0.000223 seconds (Sampling)
# 0.000225 seconds (Total)
#
I then get a warning about a single divergent transition
There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.X may help.
where 0.X is some adapt_delta setting (changing it hasn't helped).
This seems to occur no matter what model I'm fitting, including very simple models like the following:
fakeData <- data.frame(y=rnorm(100,10,5))
test <- map2stan(alist(y ~ dnorm(mu,sigma), mu~dnorm(0,10), sigma~dunif(1,20)), data=fakeData)
Using rethinking
v. 1.58. This has occurred with multiple versions of R, running on multiple versions of Ubuntu, and rstan 2.8 and 2.9
From the documentation, it is not clear to me how map() will form it's function calls to the likelihood function, in a way that makes it straightforward to write a custom function. Could you clarify this in the documentation?
It seems that map() expects the likelihood function to return log(p) values, and that it passes the option "log=true", and that it also passes the data itself as "x=". I can't find that this is described anywhere. My apologies if this is already explained somewhere, but I missed it.
It seems that perhaps this is standardized in R itself, but I wasn't able to find any official probability distribution API documentation, except this PDF on the cran list of distributions:
http://www.rmetrics.org/Meielisalp2009/Presentations/Scott.pdf
https://cran.r-project.org/web/views/Distributions.html
Some published likelihood functions I have seen don't seem to follow these guidelines, for example the Wallenius noncentral hypergeometric distribution provided by the biasedUrn package.
Hi,
Thank you so much writing an amazing textbook. I really enjoyed it a lot and also really liked the rethinking
package. It made bayesian analysis enjoyable thing to do.
While using rethinking::rgampois
, I noticed some inconsistency.
Here's my small experiment that illustrates the point:
# this is the function definition of rethinking::rgampois
rgampois_rethinking <- function (n, mu, scale) {
shape <- mu/scale
prob <- 1/(1 + scale)
rnbinom(n, size = shape, prob = prob)
}
# this is what I propose
rgampois_proposed <- function(n, mu, scale) {
prob <- scale/(scale + mu)
# this definition is directly copied off of the documentation of rnbionom :
# "...An alternative parametrization (often used in ecology) is
# by the mean mu (see above), and size, the dispersion parameter,
# where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization...."
rnbinom(n = n, size = scale, prob = prob)
}
library(tidyverse)
library(rethinking)
b0_truth <- 3
phi_truth <- 10
data_proposed <- rgampois_proposed(100,exp(b0_truth), phi_truth)
data_rethinking <- rgampois_rethinking(100, exp(b0_truth), phi_truth)
We can see from the bottom plot that the samples are very different from each other
list(data_proposed=data_proposed, data_rethinking=data_rethinking) %>%
bind_rows() %>%
gather(type, data) %>%
ggplot(aes(x=data, fill=type)) +
geom_density(alpha=0.3)
m2s_proposed <- map2stan(
alist(
y ~ dgampois(mu,phi),
log(mu) ~ b0,
b0 ~ dnorm(0,1),
phi ~ dcauchy(0,1)
),
data=data.frame(y=data_proposed)
)
m2s_rethinking <- map2stan(
alist(
y ~ dgampois(mu,phi),
log(mu) ~ b0,
b0 ~ dnorm(0,1),
phi ~ dcauchy(0,1)
),
data=data.frame(y=data_rethinking)
)
precis(m2s_proposed)
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## b0 3.03 0.05 2.97 3.11 734 1
## phi 8.27 1.91 5.44 11.07 753 1
precis(m2s_rethinking)
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## b0 3.05 0.09 2.92 3.18 719 1
## phi 1.78 0.28 1.36 2.21 616 1
As you can see, the estimate of phi
is more or less closer to the correct phi_truth=10 in m2s_proposed
but way off in m2s_rethinking
.
Is it possible to convert map2stan model fit objects to coda objects, as it is for standard stan objects (e.g. http://jeromyanglim.tumblr.com/post/91434443911/how-to-convert-a-stan-fit-object-to-work-with-coda)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.