kkholst / lava

Latent Variable Models in R https://kkholst.github.io/lava/

R 99.04% TeX 0.96%
r statistics latent-variable-models structural-equation-models simulation

lava's Introduction

Latent Variable Models: lava

A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <10.1007/s00180-012-0344-y>). The package also supports mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <10.1093/biostatistics/kxy082>), and provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.

Installation

install.packages("lava", dependencies=TRUE)
library("lava")
demo("lava")

For graphical capabilities the Rgraphviz package is needed (first install the BiocManager package)

# install.packages("BiocManager")
BiocManager::install("Rgraphviz")

or the igraph or visNetwork packages

install.packages("igraph")
install.packages("visNetwork")

The development version of lava may also be installed directly from GitHub:

# install.packages("remotes")
remotes::install_github("kkholst/lava")

Citation

To cite the lava package, please use one of the following references:

Klaus K. Holst and Esben Budtz-Joergensen (2013). Linear Latent Variable Models: The lava-package. Computational Statistics 28 (4), pp 1385-1452. http://dx.doi.org/10.1007/s00180-012-0344-y

@article{lava,
  title = {Linear Latent Variable Models: The lava-package},
  author = {Klaus Kähler Holst and Esben Budtz-Jørgensen},
  year = {2013},
  volume = {28},
  number = {4},
  pages = {1385-1452},
  journal = {Computational Statistics},
  doi = {10.1007/s00180-012-0344-y}
}

Klaus K. Holst and Esben Budtz-Jørgensen (2020). A two-stage estimation procedure for non-linear structural equation models. Biostatistics 21 (4), pp 676-691. http://dx.doi.org/10.1093/biostatistics/kxy082

@article{lava_nlin,
  title = {A two-stage estimation procedure for non-linear structural equation models},
  author = {Klaus Kähler Holst and Esben Budtz-Jørgensen},
  journal = {Biostatistics},
  year = {2020},
  volume = {21},
  number = {4},
  pages = {676-691},
  doi = {10.1093/biostatistics/kxy082},
}

Examples

Structural Equation Model

Specify a structural equation model with two factors

m <- lvm()
regression(m) <- y1 + y2 + y3 ~ eta1
regression(m) <- z1 + z2 + z3 ~ eta2
latent(m) <- ~ eta1 + eta2
regression(m) <- eta2 ~ eta1 + x
regression(m) <- eta1 ~ x

labels(m) <- c(eta1=expression(eta[1]), eta2=expression(eta[2]))
plot(m)

plot of chunk lvm1

Simulation

d <- sim(m, 100, seed=1)

Estimation

e <- estimate(m, d)
e
#>                     Estimate Std. Error  Z-value   P-value
#> Measurements:                                             
#>    y2~eta1           0.95462    0.08083 11.80993    <1e-12
#>    y3~eta1           0.98476    0.08922 11.03722    <1e-12
#>     z2~eta2          0.97038    0.05368 18.07714    <1e-12
#>     z3~eta2          0.95608    0.05643 16.94182    <1e-12
#> Regressions:                                              
#>    eta1~x            1.24587    0.11486 10.84694    <1e-12
#>     eta2~eta1        0.95608    0.18008  5.30910 1.102e-07
#>     eta2~x           1.11495    0.25228  4.41951 9.893e-06
#> Intercepts:                                               
#>    y2               -0.13896    0.12458 -1.11537    0.2647
#>    y3               -0.07661    0.13869 -0.55241    0.5807
#>    eta1              0.15801    0.12780  1.23644    0.2163
#>    z2               -0.00441    0.14858 -0.02969    0.9763
#>    z3               -0.15900    0.15731 -1.01076    0.3121
#>    eta2             -0.14143    0.18380 -0.76949    0.4416
#> Residual Variances:                                       
#>    y1                0.69684    0.14858  4.69004          
#>    y2                0.89804    0.16630  5.40026          
#>    y3                1.22456    0.21182  5.78109          
#>    eta1              0.93620    0.19623  4.77084          
#>    z1                1.41422    0.26259  5.38570          
#>    z2                0.87569    0.19463  4.49934          
#>    z3                1.18155    0.22640  5.21883          
#>    eta2              1.24430    0.28992  4.29195

Model assessment

Assessing goodness-of-fit, here the linearity between eta2 and eta1 (requires the gof package)

# install.packages("gof", repos="https://kkholst.github.io/r_repo/")
library("gof")
set.seed(1)
g <- cumres(e, eta2 ~ eta1)
plot(g)

plot of chunk gof1

Non-linear measurement error model

Simulate non-linear model

m <- lvm(y1 + y2 + y3 ~ u, u ~ x)
transform(m,u2 ~ u) <- function(x) x^2
regression(m) <- z~u2+u

d <- sim(m,200,p=c("z"=-1, "z~u2"=-0.5), seed=1)

Stage 1:

m1 <- lvm(c(y1[0:s], y2[0:s], y3[0:s]) ~ 1*u, u ~ x)
latent(m1) <- ~ u
(e1 <- estimate(m1, d))
#>                     Estimate Std. Error  Z-value  P-value
#> Regressions:                                             
#>    u~x               1.06998    0.08208 13.03542   <1e-12
#> Intercepts:                                              
#>    u                -0.08871    0.08753 -1.01344   0.3108
#> Residual Variances:                                      
#>    y1                1.00054    0.07075 14.14214         
#>    u                 1.19873    0.15503  7.73233

Stage 2:

pp <- function(mu,var,data,...) cbind(u=mu[,"u"], u2=mu[,"u"]^2+var["u","u"])
(e <- measurement.error(e1, z~1+x, data=d, predictfun=pp))
#>             Estimate Std.Err    2.5%   97.5%   P-value
#> (Intercept)  -1.1181 0.13795 -1.3885 -0.8477 5.273e-16
#> x            -0.0537 0.13213 -0.3127  0.2053 6.844e-01
#> u             1.0039 0.11504  0.7785  1.2294 2.609e-18
#> u2           -0.4718 0.05213 -0.5740 -0.3697 1.410e-19
f <- function(p) p[1]+p["u"]*u+p["u2"]*u^2
u <- seq(-1, 1, length.out=100)
plot(e, f, data=data.frame(u))

plot of chunk nlin1
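
The prediction function pp in Stage 2 builds the second column from the identity E[u^2] = E[u]^2 + Var(u), applied to the conditional mean and variance of u. A quick base-R check of that identity by simulation is sketched below; the values of mu and v are made up for illustration and are not taken from the fitted model.

```r
# Monte Carlo check of E[U^2] = E[U]^2 + Var(U) for a normal variable.
# mu and v are hypothetical values chosen only for this illustration.
set.seed(1)
mu <- 0.7
v <- 1.2
u <- rnorm(1e6, mean = mu, sd = sqrt(v))

empirical <- mean(u^2)      # simulated second moment
theoretical <- mu^2 + v     # value given by the identity
print(c(empirical = empirical, theoretical = theoretical))
```

The two numbers should agree up to simulation noise, which is why plugging mu^2 alone into the second stage would understate the quadratic term.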

Simulation

Studying the small-sample properties of a mediation analysis

m <- lvm(y~x, c~1)
regression(m) <- y+x ~ z
eventTime(m) <- t~min(y=1, c=0)
transform(m,S~t+status) <- function(x) survival::Surv(x[,1],x[,2])
plot(m)

plot of chunk mediation1

Simulate from model and estimate indirect effects

onerun <- function(...) {
    d <- sim(m, 100)
    m0 <- lvm(S~x+z, x~z)
    e <- estimate(m0, d, estimator="glm")
    vec(summary(effects(e, S~z))$coef[,1:2])
}
val <- sim(onerun, 100)
summary(val, estimate=1:4, se=5:8, short=TRUE)
#> 100 replications					Time: 3.667s
#> 
#>         Total.Estimate Direct.Estimate Indirect.Estimate S~x~z.Estimate
#> Mean           1.97292         0.96537           1.00755        1.00755
#> SD             0.16900         0.18782           0.15924        0.15924
#> SE             0.18665         0.18090           0.16431        0.16431
#> SE/SD          1.10446         0.96315           1.03183        1.03183
#>                                                                        
#> Min            1.47243         0.54497           0.54554        0.54554
#> 2.5%           1.63496         0.61228           0.64914        0.64914
#> 50%            1.95574         0.97154           0.99120        0.99120
#> 97.5%          2.27887         1.32443           1.27807        1.27807
#> Max            2.45746         1.49491           1.33446        1.33446
#>                                                                        
#> Missing        0.00000         0.00000           0.00000        0.00000

Add additional simulations and visualize results

val <- sim(val,500) ## Add 500 simulations
plot(val, estimate=c("Total.Estimate", "Indirect.Estimate"),
     true=c(2, 1), se=c("Total.Std.Err", "Indirect.Std.Err"),
     scatter.plot=TRUE)

plot of chunk simres1

lava's People

Contributors

kkholst, klaus-holst, paulkaefer, scheike, tagteam

Forkers

paulkaefer

lava's Issues

Installation issues on R < 4.1

The latest package version uses the \(x) anonymous function notation in the pairwise.diff function. This syntax is not supported in R versions prior to 4.1, causing package installation to fail.

46: pairwise.diff <- function(n) {
47:   pdiff <- function(n) lava::contr(lapply(seq(n-1), \
                                                        ^
ERROR: unable to collate and parse R files for package lava
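
For reference, the \(x) shorthand is equivalent to function(x), and the long form parses on older R versions as well. A minimal sketch of the difference follows; squares_compat is a hypothetical helper written for this illustration, not the package's pairwise.diff code.

```r
# Backward-compatible anonymous function: parses on R < 4.1 as well.
# squares_compat is a hypothetical helper used only for illustration.
squares_compat <- function(n) {
  vapply(seq_len(n), function(i) i^2, numeric(1))
}

# On R >= 4.1 the same call could use the lambda shorthand:
#   vapply(seq_len(n), \(i) i^2, numeric(1))
# A source file containing that shorthand fails to parse on older R,
# which matches the "unexpected input" error reported above.
print(squares_compat(4))
```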

R >=3.5 requirement

Hi there! It looks like you recently made a release of lava that has a new R >= 3.5 requirement. According to this commit, it seems you want to enforce that readRDS() has support for the version 3 format that came with R 3.5. Is this required by your package? 28dd87b#diff-35ba4a2677442e210c23a00a5601aba3

In tidymodels, we implicitly rely on your package through recipes -> ipred -> prodlim -> lava. We try to maintain support for 5 versions of R where possible, which means that we support back through R 3.3 at the moment. Do you think it might be possible for lava to continue to support R versions back this far?
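
As background, serialized files written in format version 3 (the default for saveRDS() since R 3.6.0) cannot be read by R older than 3.5.0. If the requirement only comes from serialized data shipped with the package, one possible workaround is to write those files in format version 2. The sketch below assumes that this is the only reason for the dependency, which the issue does not confirm; the object and file are made up.

```r
# Sketch: write an RDS file in serialization format 2 so that
# R versions older than 3.5.0 can still read it.
# obj and path are hypothetical; they stand in for packaged data.
obj <- list(a = 1:3, b = "text")
path <- tempfile(fileext = ".rds")

saveRDS(obj, path, version = 2)   # explicit format version
restored <- readRDS(path)

print(identical(obj, restored))
unlink(path)
```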

modelsearch output

Hi Klaus,
The function modelsearch in lava (v 1.4.7) appears to return each additional model path 3 times. Adjusted p-values appear sensitive to this redundancy.

m <- lvm(c(v1, v2, v3, v4) ~ u)
latent(m) <- ~u
dtempo <- sim(m, 1e2)
e <- estimate(m, dtempo)
esearch <- modelsearch(e)
esearch

Thanks,
Patrick

Error when removing the last external parameter

Hi Klaus,

I get an error when I remove the last external parameter from an lvm model (lava version 1.4.4.13):
lv1 <- lvm(Y ~ X1 + X2)
parameter(lv1) <- "YX3"
parameter(lv1) <- "YX4"
coef(lv1)
parameter(lv1, remove = TRUE) <- "YX3"
coef(lv1)
parameter(lv1, remove = TRUE) <- "YX4"

Error in x$exfix[enamed] : invalid subscript type 'list'

I tried to look at what was going on, the error appears
in "parameter<-.lvm" line "index(x) <- reindex(x)"
Then in the reindex function the line:
"eparname <- unlist(unique(x$exfix[enamed]))"
is problematic as enamed is an empty list.
A possible solution (for my small example) is to replace this line by:

if(length(enamed)>0){
    eparname <- unlist(unique(x$exfix[enamed]))
}else{
    eparname <- NULL
}
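
The failure mode can be reproduced in base R without lava: subsetting a list with an empty list raises the same error, and a length check avoids it. In the sketch below, exfix and safe_names are hypothetical stand-ins for x$exfix and the guarded line, not the package's code.

```r
# exfix is a made-up stand-in for x$exfix; enamed is the empty index.
exfix <- list(a = "p1", b = "p2")
enamed <- list()

# Subsetting with an empty list fails; capture the message instead of stopping.
res <- tryCatch(exfix[enamed], error = function(e) conditionMessage(e))
print(res)

# Guarded version, following the fix proposed in the issue.
safe_names <- function(exfix, enamed) {
  if (length(enamed) > 0) {
    unlist(unique(exfix[enamed]))
  } else {
    NULL
  }
}
print(safe_names(exfix, enamed))        # NULL, no error
print(safe_names(exfix, c("a", "b")))   # values for the named entries
```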

Thanks!

trend.delta in 'spaghetti' function doesn't change shading region when altered

Hi Klaus,

Hope you are doing well. I am trying to change the confidence region interval for a spaghetti plot I am making using the trend.delta specification in the 'spaghetti' function. When I alter the value for trend.delta, the shaded region does not change. This seems to be the case using my data as well as in the generated data in the example provided online:
These two
if (interactive() & requireNamespace("mets")) {
  K <- 5
  y <- "y"%++%seq(K)
  m <- lvm()
  regression(m,y=y,x=~u) <- 1
  regression(m,y=y,x=~s) <- seq(K)-1
  regression(m,y=y,x=~x) <- "b"
  d <- sim(m,500)
  dd <- mets::fast.reshape(d);
  dd$num <- dd$num+rnorm(nrow(dd),sd=0.5) ## Unbalance
  spaghetti(y~num,dd,id="id",lty=1,col=Col(1,.4),trend=TRUE,trend.col="darkblue")
}

The resulting plot when I specify trend.delta=0.05 and trend.delta=0.5 for dd is the same

Sorry if this is an error on my part -- I am not especially strong in R. Any advice you have on how to fix this would be greatly appreciated. If you need any additional information from me or if I can help to resolve this please let me know.

Thanks

ERROR (test-model.R:82:5): Graph attributes

Hi,
Thanks for using data.table. For the next release, in revdep testing I get the following error from R CMD check. It happens with data.table 1.13.2, too, as released on CRAN, so I don't think it is due to the data.table update. CRAN checks seem all OK for lava so it seems to be problem just local for me. Any ideas please? It would be nice to pass OK locally to clear this up.
I see that Suggested packages gof and lava.tobit have been removed from CRAN, but that doesn't seem to cause CRAN checks any problem. I'm running with R_CHECK_FORCE_SUGGESTS=false to get past that, which I assume CRAN checks do too.
Thanks, Matt

00check.log
test-all.Rout.fail

Test suite issue

Hi,
I intend to upgrade the Debian package of lava. Unfortunately the test suite has a failure in our setup. You can see the full test log, which contains:

== Failed tests ================================================================
-- Error ('test-model.R:97:5'): Graph attributes -------------------------------
Error in `testthat::expect_match(col, graph::nodeRenderInfo(g2)$fill[v])`: is.character(regexp) is not TRUE
Backtrace:
    x
 1. \-testthat::expect_match(col, graph::nodeRenderInfo(g2)$fill[v]) at test-model.R:97:4
 2.   \-base::stopifnot(is.character(regexp), length(regexp) == 1)
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 254 ]

Any idea what might be wrong here?
Kind regards, Andreas.

fix examples in closed.testing

There is a call loadNamespace(mets) in examples - https://github.com/kkholst/lava/blob/master/man/closed.testing.Rd#L39
As mets is a suggested dependency, it should not be required to run the examples, according to R-exts. This can be conditionally escaped with requireNamespace.
It is possible that the same issue occurs in other examples; this one was the first that fails.

* checking examples ... ERROR
Running examples in ‘lava-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: closed.testing
> ### Title: Closed testing procedure
> ### Aliases: closed.testing p.correct
> 
> ### ** Examples
> 
> m <- lvm()
> regression(m, c(y1,y2,y3,y4,y5,y6,y7)~x) <- c(0,0.25,0,0.25,0.25,0,0)
> regression(m, to=endogenous(m), from="u") <- 1
> variance(m,endogenous(m)) <- 1
> set.seed(2)
> d <- sim(m,200)
> l1 <- lm(y1~x,d)
> l2 <- lm(y2~x,d)
> l3 <- lm(y3~x,d)
> l4 <- lm(y4~x,d)
> l5 <- lm(y5~x,d)
> l6 <- lm(y6~x,d)
> l7 <- lm(y7~x,d)
> 
> (a <- merge(l1,l2,l3,l4,l5,l6,l7,subset=2))
    Estimate Std.Err     2.5% 97.5%  P-value
x    -0.0220  0.0993 -0.21668 0.173 8.25e-01
x.1   0.3723  0.1157  0.14564 0.599 1.29e-03
x.2   0.1198  0.1110 -0.09780 0.337 2.81e-01
x.3   0.4223  0.0926  0.24076 0.604 5.14e-06
x.4   0.2934  0.1214  0.05558 0.531 1.56e-02
x.5   0.2057  0.1062 -0.00246 0.414 5.28e-02
x.6   0.0524  0.1182 -0.17922 0.284 6.57e-01
> p.correct(a)
Error in loadNamespace(name) : there is no package called ‘mets’
Calls: p.correct ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
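
The conditional escape mentioned above could look roughly like the following. This is only a sketch of the general pattern: run_if_available is a hypothetical helper, and the package names in the calls are chosen so the snippet runs anywhere rather than taken from the lava examples.

```r
# Run code that needs an optional package only when it is installed.
# run_if_available is a hypothetical helper, not part of lava.
run_if_available <- function(pkg, code) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    code()
  } else {
    message("Skipping: package '", pkg, "' is not installed")
    invisible(NULL)
  }
}

# "stats" ships with R, so this branch runs.
res <- run_if_available("stats", function() stats::median(c(1, 3, 5)))
print(res)

# A package name that does not exist: the code is skipped, not failed.
skipped <- run_if_available("notARealPackage12345", function() stop("never runs"))
```

In an Rd example the same check would wrap the call that needs mets, so the check passes when the suggested package is absent.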

Rate vs. scale in coxExponential.lvm

Hi Klaus,

Someone made me notice that the argument rate in coxExponential.lvm should actually be called scale
to be consistent with the rexp function in R.
A small example:

p.rate <- 5
m <- lvm(Y~1)
distribution(m,"Y") <- coxExponential.lvm(rate = p.rate)
set.seed(10); round(quantile(lava::sim(m, 1e4)[,1]),3)
gives something much closer to
set.seed(10); round(quantile(rexp(1e4, rate = 1/p.rate)),3)
than
set.seed(10); round(quantile(rexp(1e4, rate = p.rate)),3)
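
The distinction matters because an exponential variable with rate r has mean 1/r, while one with scale s has mean s. A base-R sketch of the two parametrizations, independent of lava and using the same value 5 as above:

```r
# Compare the rate and scale readings of the same number in rexp().
set.seed(10)
p <- 5
x_rate  <- rexp(1e5, rate = p)       # p read as a rate:  mean close to 1/p
x_scale <- rexp(1e5, rate = 1 / p)   # p read as a scale: mean close to p
print(c(mean_rate = mean(x_rate), mean_scale = mean(x_scale)))
```

If the simulated values from coxExponential.lvm have a mean near p rather than 1/p, the argument behaves as a scale in the sense of rexp.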

modelsearch output

The function modelsearch appears to return each additional possible model path evaluated 3 times. Adjusted p-values appear sensitive to this redundancy.

m <- lvm(c(v1, v2, v3, v4) ~ u)
latent(m) <- ~u
dtempo <- sim(m, 1e2)
e <- estimate(m, dtempo)
esearch <- modelsearch(e)

plotConf for lmer models

Hi Klaus,

Melanie would like to use the plotConf function for mixed models (e.g. lmer object).
But she gets the following error:
"Error in if (fixed.only) { : argument is of length zero"

example:

library(lme4)
library(lava)
n <- 100
x0 <- rnorm(n)
x1 <- seq(-3,3, length.out=n)
x2 <- factor(rep(c(1,2),each=n/2), labels=c("A","B"))
y <- 5 + 2*x0 + 0.5*x1 + -1*(x2=="B")*x1 + 0.5*(x2=="B") + rnorm(n, sd=0.25)
dd <- data.frame(y=y, x1=x1, x2=x2)
dd$Id <- rbinom(n, size = 3, prob = 0.3)
lmer0 <- lmer(y ~ x0 + x1*x2 + (1|Id), dd)
plotConf(lmer0, var1="x1", var2="x2")

I think this is because lme4:::model.frame.merMod has an additional argument fixed.only compared to the standard stats:::model.frame.lm.
I have tried to fix this, as well as another error caused by the fact that extracting the covariance matrix from an lmer fit returns an object rather than a plain matrix (so I have applied as.matrix to it).
In the following file you will find the change that I suggest:
plotConf.txt
(it should be an .R file but I could not load it so I converted it to a .txt file)

Thanks

Ordinal regression

Hi Klaus,

I wanted to add a multinomial variable to the model:
m <- lvm()
distribution(m, ~y) <- multinomial.lvm(m, prob=c(0.2, 0.2, 0.6))
But I got the error <could not find function "multinomial.lvm"> (which is however listed in the reference manual)

Thanks!

lava_1.7.2.1.tar.gz is missing from mirrors

Hi,
it seems that only lava_1.7.2.tar.gz made it to the CRAN mirrors. See http://cran.r-project.org/src/contrib/Archive/lava/ where lava_1.7.2.1.tar.gz is missing.

The bug #12 is hence still unfixed.

$ for f in lava_1.7.*.tar.gz; do echo $f; R CMD INSTALL $f; done
lava_1.7.1.tar.gz
* installing to library ‘/home/bionic/R/x86_64-pc-linux-gnu-library/3.5’
* installing *source* package ‘lava’ ...
** package ‘lava’ successfully unpacked and MD5 sums checked
** R
** data
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (lava)
lava_1.7.2.tar.gz
* installing to library ‘/home/bionic/R/x86_64-pc-linux-gnu-library/3.5’
* installing *source* package ‘lava’ ...
** package ‘lava’ successfully unpacked and MD5 sums checked
** R
Error in parse(outFile) : 
  /tmp/Rtmpdt3HA8/R.INSTALL1ebd4eadab44/lava/R/contr.R:47:53: unexpected input
46: pairwise.diff <- function(n) {
47:   pdiff <- function(n) lava::contr(lapply(seq(n-1), \
                                                        ^
ERROR: unable to collate and parse R files for package ‘lava’
* removing ‘/home/bionic/R/x86_64-pc-linux-gnu-library/3.5/lava’
* restoring previous ‘/home/bionic/R/x86_64-pc-linux-gnu-library/3.5/lava’
$ ls
