suyusung / arm
Data Analysis Using Regression and Multilevel/Hierarchical Models
Can we have a sim() method for coxph objects?
I could submit a PR, but I am not sure what exactly the stat theory is. In particular, I am not sure whether the example from the ARM book generalizes directly to Cox regression. Assuming that object is of class coxph, does it boil down to simply sampling from a multivariate normal with object$coefficients and vcov(object)?
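To make the proposal concrete, here is a minimal sketch of what "sampling from a multivariate normal with the fitted coefficients and their covariance" would look like. The helper name sim_coef_normal is hypothetical and not part of arm, and the sketch does not settle the open question of whether this normal approximation is adequate for a Cox model. The lung data and coxph call come from the survival package.

```r
library(MASS)      # mvrnorm
library(survival)  # coxph, Surv, lung data

# Hypothetical helper (not an arm function): draw coefficient vectors from
# the normal approximation N(coef(object), vcov(object)).
sim_coef_normal <- function(object, n.sims = 100) {
  beta.hat <- coef(object)
  V.beta <- vcov(object)
  draws <- MASS::mvrnorm(n.sims, beta.hat, V.beta)
  # mvrnorm() returns a plain vector when n.sims is 1, so force a matrix
  matrix(draws, nrow = n.sims, dimnames = list(NULL, names(beta.hat)))
}

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
set.seed(1)
beta.sims <- sim_coef_normal(fit, n.sims = 1000)
dim(beta.sims)            # one row per simulation, one column per coefficient
apply(beta.sims, 2, sd)   # compare with sqrt(diag(vcov(fit)))
```

Unlike the linear-model case in the ARM book, there is no residual sigma to simulate here, so only the coefficient draws are produced.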
The line
Line 22 in 3e7f29f
Error in thedata[!is.na(thedata)] : object of type 'closure' is not subsettable
This is due to the
Line 16 in 3e7f29f
function (f, levelsToKeep)
for a binary variable (vector of type factor).
Am I missing something here?
The dummy I have in my regression formula:
dummy [1] 0 0 0 1
...
Dear Yu-Sung Su,
I noticed recently a rather strange behaviour of the bayesglm-function. More specifically, it seems that the results of the bayesglm-function depend on the order of covariates in the formula argument. To give an example in R:
library(arm)
data(lalonde)
m1 <- bayesglm(married ~ age + educ, lalonde, family = binomial("logit"))
summary(m1)
m2 <- bayesglm(married ~ educ + age, lalonde, family = binomial("logit"))
summary(m2)
In the example, the point estimates and standard errors differ only slightly. However, for other datasets I use in my daily work, I noticed that the differences can become quite large. Could there be something wrong with the function?
Best,
André
Is it possible to use a likelihood ratio test (anova in R) to compare models?
Hello arm staff,
I've been doing some analysis using arm::bayesglm, and to predict confidence intervals I use arm::sim with 1E5 simulations, but it was running too slowly.
Checking the source code, I found the following for loop in the function for sim.glm.
for (s in 1:n.sims){
beta[s,] <- MASS::mvrnorm (1, beta.hat, V.beta)
}
I don't understand why you chose to run this as a for loop instead of:
beta <- MASS::mvrnorm(n.sims, beta.hat, V.beta)
I've run some tests, and while the second one is almost instantaneous, the first takes several seconds.
> system.time(x <- MASS::mvrnorm (n, beta.hat, V.beta))
   user  system elapsed
  0.184   0.038   0.204
> system.time(for (i in 1:n) y[i,] <- MASS::mvrnorm (1, beta.hat, V.beta))
   user  system elapsed
 27.824   0.681  29.698
> system.time(replicate(n, MASS::mvrnorm (1, beta.hat, V.beta)))
   user  system elapsed
 29.766   0.510  31.146
Is there a statistical reason for the choice? I would love to understand it. If so, would you consider a blocked alternative, such as:
blocks <- 100
blocksize <- ceiling(n.sims/blocks)
for (s in 1:blocks){
  from <- (s-1)*blocksize + 1   # first row of this block (1-based)
  if (from > n.sims) break      # nothing left to fill
  to <- min(s*blocksize, n.sims)
  beta[from:to,] <- MASS::mvrnorm (to-from+1, beta.hat, V.beta)
}
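As a small self-contained check of the single-call version, the sketch below draws all simulations at once and compares the sample mean and covariance with the inputs. The values of beta.hat and V.beta are made up for illustration. Each row returned by mvrnorm() is an independent draw, so the matrix has the same distribution as one filled row by row in a loop.

```r
library(MASS)  # mvrnorm

set.seed(42)
n.sims <- 10000
beta.hat <- c(a = 1, b = -2)                   # made-up coefficient vector
V.beta <- matrix(c(0.5, 0.1, 0.1, 0.3), 2, 2)  # made-up covariance matrix

# One call: an n.sims x length(beta.hat) matrix when n.sims > 1
beta <- MASS::mvrnorm(n.sims, beta.hat, V.beta)

# mvrnorm() drops to a plain vector when n = 1, so force the matrix shape
beta <- matrix(beta, nrow = n.sims, dimnames = list(NULL, names(beta.hat)))

colMeans(beta)  # should be close to beta.hat
cov(beta)       # should be close to V.beta
```

The explicit matrix() call is only there to keep the result's shape stable when n.sims is 1, which is the one case where the vectorized call and the loop would otherwise differ in what they return.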
Regards
Dear Yu-Sung Su,
I recently wrote some lines of code to use the sim function with plm objects. I didn't test the code in different scenarios, but it might help to implement the sim function for class plm objects (if this is an option).
sim.plm <- function(object, n.sims=100)
{
  object.class <- class(object)[[1]]
  summ <- summary (object)
  coef <- summ$coef[,1:2,drop=FALSE]
  dimnames(coef)[[2]] <- c("coef.est","coef.sd")
  # sigma.hat <- summ$sigma
  # TR: define sigma by hand
  NN <- nrow(object$model)
  PP <- nrow(coef)
  sigma.hat <- sqrt(deviance(object) / (NN-PP))
  # TR: end
  beta.hat <- coef[,1,drop = FALSE]
  # V.beta <- summ$cov.unscaled
  V.beta <- vcov(summ)/sigma.hat^2 # TR: unscale scaled vcov
  # n <- summ$df[1] + summ$df[2]
  # k <- summ$df[1]
  n <- nrow(summ$model) # TR: define n
  k <- nrow(summ$coefficients) # TR: define k
  sigma <- rep (NA, n.sims)
  beta <- array (NA, c(n.sims,k))
  dimnames(beta) <- list (NULL, rownames(beta.hat))
  for (s in 1:n.sims){
    sigma[s] <- sigma.hat*sqrt((n-k)/rchisq(1,n-k))
    beta[s,] <- MASS::mvrnorm(1, beta.hat, V.beta * sigma[s]^2)
  }
  ans <- new("sim",
             coef = beta,
             sigma = sigma)
  return (ans)
}
#### Example ####
library(arm)
library(plm)
data(Cigar)
plm.mod<-plm(sales~ price + pop + pimin + price*pimin,
model="within", effect="individual",
index=c("state", "year"), data=Cigar)
summary(plm.mod)
plm.mod.sim<-sim.plm(plm.mod, n.sims=500)
Best,
Tobias
Yu Sung,
Any chance we could get a horseshoe prior added to bayesglm? Vincent Dorie graciously did something similar for blme, but it would also be great to have arm offer this option too. Thanks much.
Here is Vincent's addition: vdorie/blme#3
I expected that rescale(a_binary_factor, binary.inputs = "-0.5,0.5") would return -0.5 and 0.5, but it actually returns 0.5 and 1.5. According to the R code of rescale(), it converts a binary factor to numbers (1 and 2) using as.numeric() and then subtracts 0.5 (giving 0.5 and 1.5) under a recent R version.
It seems that the author incorrectly assumed that R converts a binary factor to the integers 0/1 (I'm not sure whether previous R versions had such a feature), but R actually returns 1/2. Please consider fixing this bug. Many thanks.
To reproduce this bug: arm::rescale(gl(2,1), binary.inputs = "-0.5,0.5") # returns c(0.5, 1.5)
(R version 4.2.0; Package arm version 1.12-2).
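The conversion described in this report can be reproduced in base R without arm. The last line shows one possible way to centre the codes; it is an illustration only, not the package's code or a proposed patch.

```r
f <- gl(2, 1)                 # binary factor with levels "1" and "2"
codes <- as.numeric(f)        # internal integer codes of the factor: 1 2
codes - 0.5                   # 0.5 1.5, the values reported above

# One possible centring (an assumption, not arm's implementation):
# shift the codes to 0/1 first, then subtract 0.5
centred <- (codes - 1) - 0.5  # -0.5 0.5
```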
When I use family=quasipoisson in bayesglm(...), and then summary(...), it gives an estimate of
It appears that the former uses the divisor
This is also seen by comparing the SEs of the regression coefficients given by summary(...), which are larger than the SDs of the posteriors produced by sim(...), by a factor of
I assume that a similar issue arises with family=quasibinomial, but I haven't checked that.
Is there a reason for this discrepancy? It certainly leads to worse frequentist coverage for credible intervals based on the posteriors.
Question 1. After deriving an estimate of the dependent variable (y) with the Bayesian multiple regression estimation function (postf), I calculate R squared using the cumulative sum function (cumsum). Is there any problem with finding SSR (regression sum of squares) and SST (total sum of squares) with the following code? I have confirmed that SSR and SST are produced.
postf <- bayesglm.fit(xx,y, family = gaussian(), prior.mean = c(8663265, 325.9304, 1435.758, -4388.022, -1973.862, 2686.825, 1830.709, -442.933, 762.1138), prior.df = Inf)
summary(postf)
postf$coeff
postf$fit
postf$y
postf$res
SSE <- cumsum((postf$res)^2) # SSE (error sum of squares)
SST <- cumsum((postf$y-mean(postf$y))^2) # SST (total sum of squares)
SSR <- cumsum((postf$fit-mean(postf$y))^2) # SSR (regression sum of squares)
print(SST)
print(SSR)
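One point worth illustrating about the code above: cumsum() returns a vector of running totals, so only its last element equals the sum of squares, whereas sum() returns that single number directly. The sketch below uses an ordinary lm() fit on simulated data as a stand-in for the model in the question. For a fit that is not ordinary least squares, such as one with informative priors, the identity SST = SSR + SSE need not hold exactly.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)  # stand-in for the fitted model in the question

SSE <- sum(residuals(fit)^2)           # error sum of squares
SST <- sum((y - mean(y))^2)            # total sum of squares
SSR <- sum((fitted(fit) - mean(y))^2)  # regression sum of squares

running <- cumsum(residuals(fit)^2)    # 50 running totals, not one number
running[length(running)]               # last element equals SSE

R2 <- 1 - SSE / SST                    # matches summary(fit)$r.squared
```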
Question 2. The Bayesian multiple regression I intend to run has nine variables, and I supplied their prior means as a vector (c()) accordingly. However, when I pass the prior means, the error "invalid length for prior.mean" appears. In this case, is it impossible to proceed with the Bayesian estimation? Or is there another way?
> postf <- bayesglm.fit(xx,y, family = gaussian(), prior.mean = c(8663265, 325.9304, 1435.758, -4388.022, -1973.862, 2686.825, 1830.709, -442.933, 762.1138))
Error in bayesglm.fit(xx, y, family = gaussian(), prior.mean = c(8663265, :
invalid length for prior.mean
Is it possible to also specify an inverse-gamma prior in bayesglm() for the variance of a Gaussian-family model?
install.packages('arm')
Installing package into ‘\CNAS.RU.NL/U759233/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/arm_1.9-3.zip'
Warning in install.packages :
cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/arm_1.9-3.zip': HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/arm_1.9-3.zip'
Warning in install.packages :
download of package ‘arm’ failed
library(sjPlot) # table functions
Error: package or namespace load failed for ‘sjPlot’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called ‘arm’
When running reverse package checks on {arm}, I'm getting:
Error in setClass("balance", representation(rawdata = "data.frame", matched = "data.frame", :
could not find function "setClass"
Please add "setClass" to the list of imports from {methods} in the NAMESPACE file:
Lines 23 to 29 in 7bbc274
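For reference, the requested change would be a directive of the following form in NAMESPACE. This is a sketch only: the package's existing import list at the referenced lines is not reproduced here, so in practice setClass would be added to whatever importFrom(methods, ...) entry is already there.

```r
# NAMESPACE fragment (sketch; other existing imports are omitted)
importFrom(methods, setClass)
```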
When I use arm::rescale for each of my variables that I use in the plm, I get almost exactly the same result (same F, R2 etc.) as my raw data.
However, when I run the same code again after restarting RStudio, I get a totally different result, because arm::rescale produces a very different result the next time. After several tries, I sometimes get the right results and sometimes the wrong ones. I have no idea how I get the right ones, since I don't change anything between the wrong and the right results.
What am I missing here?
FYI, my data is panel data.
When creating the M1.2 objects in the example code, would it make more sense to use the standardize function on the original model M1 rather than the model that has already used rescale (M1.1)? i.e.
M1.2 <- standardize(M1.1)
should be
M1.2 <- standardize(M1)
in both examples
I am attempting to load the arm package. In the end I get this specific message: package 'arm' is not available (for R version 3.3.2). I could explore a newer version of R but I would be risking backward compatibility issues. Any suggestions? thx
Hi Team, first of all thanks for the great work. I need a clarification: would you be able to state under which GPL version you are licensing your software at the moment? In the DESCRIPTION I can see "(> 2)". If I read this correctly, it would map to "GPL 3.0 or later". See the current SPDX version list at https://spdx.org/licenses/
Thanks!