arm's People

Contributors

mariusbarth, suyusung

arm's Issues

`sim` method for `coxph` objects

Can we have a sim() method for coxph objects?

I could submit a PR, but I am not sure about the exact statistical theory. In particular, I am not sure whether the example from the ARM book generalizes directly to Cox regression. Assuming that object is of class coxph, does it boil down to simply sampling from a multivariate normal with

  • means equal to object$coefficients
  • covariance matrix from vcov(object)

?
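If so, a minimal sketch of what I have in mind (sim.coxph here is hypothetical, not part of arm, and it returns a plain matrix of draws rather than arm's sim object, since coxph has no residual sigma):

library(MASS)
library(survival)

# Hypothetical sketch, not arm's implementation: draw coefficient vectors
# from a multivariate normal centered at the partial-likelihood estimates.
sim.coxph <- function(object, n.sims = 100) {
  beta.hat <- object$coefficients   # means
  V.beta <- vcov(object)            # covariance matrix
  MASS::mvrnorm(n.sims, mu = beta.hat, Sigma = V.beta)
}

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
beta.sims <- sim.coxph(fit, n.sims = 1000)
apply(beta.sims, 2, quantile, probs = c(0.025, 0.975))  # simulation intervals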

standardize.R: Error in thedata[!is.na(thedata)] : object of type 'closure' is not subsettable

The line

num.categories <- length (unique(thedata[!is.na(thedata)]))
produces the following error:

Error in thedata[!is.na(thedata)] : object of type 'closure' is not subsettable

This is due to the

thedata <- get(v)
call returning a function (printed as function (f, levelsToKeep)) instead of the data, for a binary variable (a vector of type factor).

Am I missing something here?


The dummy variable in my regression formula looks like this:

> dummy
[1] 0 0 0 1 ...
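For what it's worth, here is a minimal standalone reproduction of the failure mode, assuming get() falls through to the search path when the variable only exists inside the data frame (the colliding column name below is hypothetical):

d <- data.frame(mean = c(0, 0, 1, 1))  # column named after base::mean
thedata <- get("mean")                 # finds the function, not the column
num.categories <- length(unique(thedata[!is.na(thedata)]))
# Error in thedata[!is.na(thedata)] : object of type 'closure' is not subsettable

# A defensive fix would be to resolve the name in the model frame first:
thedata <- eval(as.name("mean"), d)    # now the 0/1 column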

Estimation Result depending on order of covariates?

Dear Yu-Sung Su,

I recently noticed some rather strange behaviour of the bayesglm function. More specifically, the results of bayesglm seem to depend on the order of covariates in the formula argument. To give an example in R:

library(arm)

data(lalonde)

m1 <- bayesglm(married ~ age + educ, data = lalonde, family = binomial("logit"))
summary(m1)
m2 <- bayesglm(married ~ educ + age, data = lalonde, family = binomial("logit"))
summary(m2)
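To quantify the discrepancy, one can align the two fits by coefficient name and compare directly (a sketch; se.coef() is arm's standard-error extractor):

common <- intersect(names(coef(m1)), names(coef(m2)))
max(abs(coef(m1)[common] - coef(m2)[common]))        # point estimates
max(abs(se.coef(m1)[common] - se.coef(m2)[common]))  # standard errors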

In this example, the point estimates and standard errors differ only slightly. However, for other datasets I use in my daily work, I have noticed that the differences can become quite large. Could there be something wrong with the function?

Best,
André

arm::sim speed

Hello arm staff,

I've been doing some analysis using arm::bayesglm, and to compute confidence intervals I use arm::sim with 1e5 simulations, but it was running too slowly.
Checking the source code, I found the following for loop in sim.glm:

for (s in 1:n.sims){
      beta[s,] <- MASS::mvrnorm (1, beta.hat, V.beta)
 }

I don't understand why you chose to run this as a for loop instead of:

beta <- MASS::mvrnorm(n.sims, beta.hat, V.beta)

I've run some tests, and while the second one is almost instantaneous, the first takes several seconds.

> system.time(x <- MASS::mvrnorm (n, beta.hat, V.beta))
   user  system elapsed 
  0.184   0.038   0.204 
> system.time(for (i in 1:n) y[i,] <-  MASS::mvrnorm (1, beta.hat, V.beta))
   user  system elapsed 
 27.824   0.681  29.698 
> system.time(replicate(n, MASS::mvrnorm (1, beta.hat, V.beta)))
   user  system elapsed 
 29.766   0.510  31.146 

Is there a statistical reason for the choice? I would love to understand it. If so, would you consider a blocked alternative, such as:

blocks <- 100
blocksize <- ceiling(n.sims/blocks)
for (s in 1:blocks){
    from <- (s-1)*blocksize + 1
    to <- min(s*blocksize, n.sims)
    if (from > to) break
    beta[from:to,] <- MASS::mvrnorm (to - from + 1, beta.hat, V.beta)
}

Regards

'sim' method for 'plm' objects

Dear Yu-Sung Su,

I recently wrote some lines of code to use the sim function with plm objects. I didn't test the code in different scenarios, but it might help to implement the sim function for class plm objects (if this is an option).

sim.plm<-function(object, n.sims=100)
{
  object.class <- class(object)[[1]]
  summ <- summary (object)
  coef <- summ$coef[,1:2,drop=FALSE]
  dimnames(coef)[[2]] <- c("coef.est","coef.sd")
  # sigma.hat <- summ$sigma 
  # TR: define sigma by hand
  NN <- nrow(object$model)
  PP <- nrow(coef)
  sigma.hat <- sqrt(deviance(object) / (NN-PP))
  # TR: end              
  beta.hat <- coef[,1,drop = FALSE]
  # V.beta <- summ$cov.unscaled
  V.beta <- vcov(summ)/sigma.hat^2 # TR: unscale scaled vcov
  # n <- summ$df[1] + summ$df[2]
  # k <- summ$df[1]
  n <- nrow(object$model) # TR: define n (model frame on the plm object, as for NN above)
  k <- nrow(summ$coefficients) # TR: define k
  sigma <- rep (NA, n.sims)
  beta <- array (NA, c(n.sims,k))
  dimnames(beta) <- list (NULL, rownames(beta.hat))
  for (s in 1:n.sims){
    sigma[s] <- sigma.hat*sqrt((n-k)/rchisq(1,n-k))
    beta[s,] <- MASS::mvrnorm(1, beta.hat, V.beta * sigma[s]^2)
  }
  
  ans <- new("sim",
             coef = beta,
             sigma = sigma)
  return (ans)
}


#### Example ####

library(arm)
library(plm)

data(Cigar)

plm.mod <- plm(sales ~ price + pop + pimin + price*pimin,
               model = "within", effect = "individual",
               index = c("state", "year"), data = Cigar)

summary(plm.mod)

plm.mod.sim <- sim.plm(plm.mod, n.sims = 500)

Best,
Tobias

Horseshoe prior?

Yu-Sung,

Any chance we could get a horseshoe prior added to bayesglm? Vincent Dorie graciously did something similar for blme, and it would be great to have arm offer this option too. Thanks much.

Here is Vincent's addition: vdorie/blme#3

arm::rescale(a_binary_factor, binary.inputs = "-0.5,0.5") returns 0.5/1.5 instead of -0.5/0.5

I expected that rescale(a_binary_factor, binary.inputs = "-0.5,0.5") would return -0.5 and 0.5, but it actually returns 0.5 and 1.5. According to the R code of rescale(), it converts a binary factor to numeric (giving 1 and 2) with as.numeric() and then subtracts 0.5 (giving 0.5 and 1.5) under recent R versions.

It seems the author incorrectly assumed that R converts a binary factor to integer 0/1 (I'm not sure whether a previous R version did so), but R actually returns 1/2. Please consider fixing this bug. Many thanks.

To reproduce this bug: arm::rescale(gl(2,1), binary.inputs = "-0.5,0.5") # returns c(0.5, 1.5) (R version 4.2.0; Package arm version 1.12-2).
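Until this is fixed, a possible workaround (not part of arm) is to recode the factor to 0/1 by hand before centering:

x <- gl(2, 1)
as.numeric(x) - 1          # 0 1: R codes a two-level factor as 1/2, so subtract 1
(as.numeric(x) - 1) - 0.5  # -0.5 0.5: the intended "-0.5,0.5" coding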

family=quasipoisson issue

When I use family=quasipoisson in bayesglm(...), and then summary(...), it gives an estimate of $\sigma^2$ (the dispersion parameter) that is larger than the square of the estimate of $\sigma$ obtained using sim(...)@sigma, by a factor of $n/(n-p)$.

It appears that the former uses the divisor $n-p$, and the latter uses $n$, where $n$ = # observations and $p$ = # parameters.

This is also seen by comparing the SEs of the regression coefficients given by summary(...), which are larger than the SDs of the posteriors produced by sim(...), by a factor of $\sqrt{(n-p)/n}$.

I assume that a similar issue arises with family=quasibinomial, but I haven't checked that.

Is there a reason for this discrepancy? It certainly leads to worse frequentist coverage for credible intervals based on the posteriors.
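A sketch that should reproduce the discrepancy (the data below are simulated purely for illustration):

library(arm)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, exp(1 + 0.5 * x))
fit <- bayesglm(y ~ x, family = quasipoisson)

disp.summary <- summary(fit)$dispersion            # uses divisor n - p
disp.sim <- mean(sim(fit, n.sims = 5000)@sigma^2)  # apparently uses divisor n
disp.summary / disp.sim                            # expect roughly n / (n - p)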

How to get R squared in bayesglm?

Question 1. After obtaining estimates of the dependent variable (y) from the Bayesian multiple regression estimation function (postf), I calculate R squared using the cumulative sum function (cumsum). Is there any problem finding SSR (Sum of Squares Regression) and SST (Sum of Squares Total) with the following code? I have confirmed that SSR and SST are produced.

postf <- bayesglm.fit(xx,y, family = gaussian(), prior.mean = c(8663265, 325.9304, 1435.758, -4388.022, -1973.862, 2686.825, 1830.709, -442.933, 762.1138), prior.df = Inf)
summary(postf)
postf$coeff
postf$fit
postf$y
postf$res

SSE <- cumsum((postf$res)^2) # SSE (Sum of Squares Error)
SST <- cumsum((postf$y-mean(postf$y))^2) # SST (Sum of Squares Total)
SSR <- cumsum((postf$fit-mean(postf$y))^2) # SSR (Sum of Squares Regression)

print(SST)
print(SSR)
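For comparison, the usual scalar versions use sum() rather than cumsum() (the final element of each cumsum() above equals these values); a sketch using the same quantities:

SSE <- sum(postf$res^2)                   # sum of squared errors
SST <- sum((postf$y - mean(postf$y))^2)   # total sum of squares
SSR <- sum((postf$fit - mean(postf$y))^2) # regression sum of squares
R2 <- 1 - SSE/SST                         # equals SSR/SST when an intercept is included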

Question 2. The Bayesian multiple regression I intend to run has nine variables, and I have supplied the prior means of the nine variables as a vector (c()) accordingly. However, when I pass prior.mean, the error "invalid length for prior.mean" appears. In this case, is it impossible to proceed with the Bayesian estimation? Or is there another way?

> postf <- bayesglm.fit(xx,y, family = gaussian(), prior.mean = c(8663265, 325.9304, 1435.758, -4388.022, -1973.862, 2686.825, 1830.709, -442.933, 762.1138))
Error in bayesglm.fit(xx, y, family = gaussian(), prior.mean = c(8663265,  : 
  invalid length for prior.mean

Variance prior for bayesglm

Is it possible to also specify an inverse-gamma prior in bayesglm() for the variance of a Gaussian-family model?

package "arm" cannot be downloaded, therefore sjPlot cannot be used

install.packages('arm')
Installing package into ‘\\CNAS.RU.NL/U759233/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/arm_1.9-3.zip'
Warning in install.packages :
cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/arm_1.9-3.zip': HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/arm_1.9-3.zip'
Warning in install.packages :
download of package ‘arm’ failed
library(sjPlot) # table functions
Error: package or namespace load failed for ‘sjPlot’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called ‘arm’

rescale returning different results each time.

When I use arm::rescale on each of the variables in my plm model, I get almost exactly the same result (same F, R2, etc.) as with my raw data.

However, when I run the same code again after restarting RStudio, I get a totally different result, because arm::rescale produces very different values the next time. After some tries I sometimes get the right results and sometimes the wrong ones, and I have no idea what determines which (since I don't change anything between the wrong and the right runs).

What am I missing here?

FYI, my data is panel data.

small typo in standardize example

When creating the M1.2 objects in the example code, would it make more sense to use the standardize function on the original model M1 rather than on the model that has already been rescaled (M1.1)? I.e.
M1.2 <- standardize(M1.1)
should be
M1.2 <- standardize(M1)
in both examples

arm not available for version?

I am attempting to load the arm package, but I get this message: package 'arm' is not available (for R version 3.3.2). I could move to a newer version of R, but I would risk backward-compatibility issues. Any suggestions? Thanks.
