murphymv / semeff

Automatic Calculation of Effects for Piecewise Structural Equation Models

Home Page: https://murphymv.github.io/semEff/

License: GNU General Public License v3.0

R 96.84% CSS 1.88% TeX 1.29%

semeff's Issues

How to cite semEff?
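
One general way to look up a package's recommended citation is base R's `citation()`. The sketch below wraps it in a hypothetical helper, `cite_pkg()` (not part of semEff), which returns `NULL` if the package is not installed. What the semEff citation entry actually contains is not stated in this thread.

```r
# Hypothetical helper: return the citation entry for an installed package,
# or NULL (with a message) if the package is not available.
cite_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(NULL)
  }
  citation(pkg)
}

# Print the citation for semEff if it is installed.
sem_cite <- cite_pkg("semEff")
if (!is.null(sem_cite)) print(sem_cite)
```

`toBibtex(sem_cite)` should convert the entry to BibTeX for a reference manager.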

semEff documentation not clear for using psem objects

The documentation for semEff suggests that for sem, one can provide either a list or a psem object. It appears that this is the only object that would need to be provided.

keeley.psem <- psem(
lm(age ~ distance, data = keeley),
lm(hetero ~ distance, data = keeley),
lm(abiotic ~ distance, data = keeley),
lm(firesev ~ age, data = keeley),
lm(cover ~ firesev, data = keeley),
lm(rich ~ distance + hetero + abiotic + cover, data = keeley)
)

x <- semEff(keeley.psem)
Error in bootEff(sem, ...) :
Number of bootstrap resamples (R) must be specified.

I deduced from this error message that if a psem object is provided, bootEff must be run within the function. It would be helpful to indicate this in the documentation, and that additional arguments are required when sem is a psem object.
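
A minimal sketch of the call pattern the error message points to: supply the number of resamples `R`, either to `bootEff()` first or directly to `semEff()`. It assumes piecewiseSEM and semEff are installed and is skipped otherwise. `R = 100`, `seed = 13`, `parallel = "no"` and `ci.type = "perc"` are illustrative choices borrowed from other reports in this list, not recommendations.

```r
# Sketch: pass the number of bootstrap resamples (R) explicitly.
# Runs only if both packages are installed.
if (requireNamespace("piecewiseSEM", quietly = TRUE) &&
    requireNamespace("semEff", quietly = TRUE)) {

  library(piecewiseSEM)
  library(semEff)
  data(keeley, package = "piecewiseSEM")

  keeley.psem <- psem(
    lm(age ~ distance, data = keeley),
    lm(hetero ~ distance, data = keeley),
    lm(abiotic ~ distance, data = keeley),
    lm(firesev ~ age, data = keeley),
    lm(cover ~ firesev, data = keeley),
    lm(rich ~ distance + hetero + abiotic + cover, data = keeley)
  )

  # Step 1: bootstrap the model effects; R must be given here.
  keeley.boot <- bootEff(keeley.psem, R = 100, seed = 13, parallel = "no")

  # Step 2: calculate direct, indirect and total effects from the boot objects.
  # Percentile intervals are used because R is small in this sketch.
  keeley.eff <- semEff(keeley.boot, ci.type = "perc")
  summary(keeley.eff, response = "rich")
}
```

Another report in this list passes `R` straight to `semEff()` on a list of models, which suggests the bootstrap arguments are forwarded to `bootEff()`.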

SemEff using gls models

Hello!

First of all, thank you for creating this wonderful R package 😊.

I have fitted the following psem, which consists of a list of phylogenetic generalized least squares (gls) models:

psem(gls(genome_size ~ temperature + precipitation, tree.y, correlation = corBrownian(phy= newT2)),
gls(PC1_def ~ genome_size + temperature + precipitation, data.y, correlation = corBrownian(phy= newT2)),
gls(PC2_def ~ genome_size + temperature + precipitation, data.y, correlation = corBrownian(phy= newT2)),
gls(PC3_def ~ genome_size + temperature + precipitation, data.y, correlation = corBrownian(phy= newT2)),
gls(herbivory ~ PC1_def + PC2_def + PC3_def + genome_size + temperature + precipitation, data.y, correlation = corBrownian(phy= newT2)))

However, when I tried to obtain total, direct, and indirect effects using 'semEff()', I encountered the following error:

Error in getData(m) :
'data' does not contain all variables used to fit model.

I assume this error occurs because 'gls' models require both the data and a correlation structure based on a phylogenetic tree, which is not included in the 'data' object.

Is there any way to resolve this issue? I would greatly appreciate your help.

Thanks!

Carla

Error message "estimated adjustment 'a' is NA"

Dear Murphy, dear semEff users,

I found out myself that this was a stupid question and the solution is very easy:

I found out that the error message comes from the built-in function boot::boot.ci. With type = "bca", this error occurs because the bias-corrected and accelerated (BCa) confidence interval calculation is unstable for a small number of replicates. The error no longer occurred when I used >= 1000 replicates. Finally, I am thinking about choosing type = "perc", because it is conceptually much simpler and more straightforward.
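
The behaviour described above can be reproduced with the boot package alone. This is a sketch on simulated data, not the data from this report. As far as I can tell, BCa fails when the number of resamples is smaller than the number of observations; the values 50, 10 and 1000 are arbitrary.

```r
library(boot)

set.seed(42)
x <- rnorm(50)                           # 50 simulated observations
mean_stat <- function(d, i) mean(d[i])   # statistic: mean of the resample

# Few resamples (R = 10, fewer than the 50 observations):
# the BCa interval cannot be estimated and boot.ci() stops with an error.
b_small <- boot(x, mean_stat, R = 10)
bca_small <- tryCatch(
  boot.ci(b_small, type = "bca"),
  error = function(e) conditionMessage(e)
)
print(bca_small)

# Many resamples: both BCa and percentile intervals are available.
b_large <- boot(x, mean_stat, R = 1000)
ci_large <- boot.ci(b_large, type = c("bca", "perc"))
print(ci_large)
```

The same reasoning would apply to `bootEff(..., R = 10)` followed by `semEff(..., ci.type = "bca")`: either increase `R` or request `ci.type = "perc"`.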

Nevertheless, I did not delete the question, because somebody else might come across the same issue. However, I am also happy to delete it.

The original question was the following:
Dear Murphy,
first of all many thanks for developing the semEff package. We had a hard time in the past calculating indirect and direct effects based on the output of piecewiseSEM::psem.

I have just downloaded semEff yesterday. I set up a piecewiseSEM::psem and the summary gives reasonable results for the standard estimates (path coefficients).

I then used semEff to bootstrap the path coefficients. However, when I try to use this bootEff object as an input for semEff, I get the following error message: "Error in bca.ci(boot.out, conf, index[1L], L = L, t = t.o, t0 = t0.o, : estimated adjustment 'a' is NA".

I tried to understand where this error comes from but was not able to do so.

Do you have any idea what might be wrong? See the code and data attached below. Please do not hesitate to contact me if you need any further information!

Many thanks

Jochem

Code:
#SEM package
library(piecewiseSEM)
#semEFF to calculate direct, indirect and total effects of variables in the SEM
library(semEff)

#GLMs
GLMTNFull3GaussTrans <- glm(TN_mgl ~ B10_Agri + B10_Urban, family = gaussian(link = "identity"), data=dfsitesGLMTrans)
GLMTPFull3GaussTrans <- glm(TP_mgl ~ B10_Agri + B10_Urban, family = gaussian(link = "identity"), data=dfsitesGLMTrans)
GLMChlaFull3GaussTrans <- glm(Chla_mcgl ~ TN_mgl + TP_mgl, family = gaussian(link = "identity"), data=dfsitesGLMTrans)
GLMMPAllFull3GaussTrans <- glm(MP_EQR01 ~ Chla_mcgl + TN_mgl + TP_mgl + B10_Agri + B10_Urban + LakeScore , family = gaussian(link = "identity"), data=dfsitesGLMTrans)

#SEM and evaluation
SEMAllFull3GaussTrans <- psem(GLMTNFull3GaussTrans, GLMTPFull3GaussTrans, GLMChlaFull3GaussTrans, GLMMPAllFull3GaussTrans, TN_mgl %~~% TP_mgl)
summary(SEMAllFull3GaussTrans, .progressBar = F, conserve = T)

#Some diagnostics on the path coefficients from the piecewiseSEM package
coefs(SEMAllFull3GaussTrans) # path coefficients
plot(SEMAllFull3GaussTrans, alpha=0.05)
rsquared(SEMAllFull3GaussTrans, method = "trigamma")

#Calculate direct, indirect and total effect using the new package semEff!, which allows to do a bootstrapping first for the path coefficients to finally get confidence intervals!
SEM3Boot <- bootEff(SEMAllFull3GaussTrans, 10, seed=12345) #bootstrap the coefficients of the SEM first
summary(SEM3Boot)
View(SEM3Boot)

SEM3Eff <- semEff(SEM3Boot, predictors = NULL, mediators = NULL, use.raw = FALSE, ci.conf = 0.95, ci.type = "bca", digits = 3, bci.arg=NULL)

dfsitesGLMTrans.txt

Considerations for GLMM?

Hi Mark,

Great package! I'm interested to know how semEff handles estimating standardized coefficients for GLMMs, if at all. Should users set use.raw = T with GLMMs? How are direct/indirect effects handled in this case?

Thanks

Sean

getData does not work

Dear Mark V. Murphy,
I am trying to bootstrap some CI using the semEff pkg.

I created a piecewiseSEM object and then used the semEff function.
It looks like there is a small issue with the getData function. What do you think?

edit 1:
I also tried
bootEff(fit, ran.eff="random effect")
but the error is the same.

Thank you in advance

edit 2: I solved it this way

fit <- psem(
lmer( Y ~ X + M + (1|groupe_recode),
na.action = na.omit, data = dati)
)
summary(fit)

fitB <- list(
lmer( Y ~ -1 + X + M+ (1|groupe_recode),
na.action = na.omit, data = dati)
)

A=bootEff(fitB, ran.eff="groupe_recode",
type="perc", ncpu=3, R=1000)
CI=bootCI(A)

confidence intervals for getEff() functions

Hi,

I am wondering if there is some way to output the confidence intervals along with the reported getEff() functions?

One of the issues I'm running into is that I have some pathways that flow through multiple predictors and while the getAllInd() function does provide me with the effect estimates, I want to check the confidence intervals, as well.

The summary(semEff(sem.Eff.boot)) does output confidence intervals, but I am unable to get some of the paths through and around multiple mediators to show with the summary output.

For example, in my model I have the following paths that I want the direct/indirect effects and confidence intervals for:
Daycount_poly1 --> turb_pre_ln1 --> nitrite_pre_ln --> cl2_tot_pre
Daycount_poly1 --> turb_pre_ln1 --> cl2_tot_pre
^Note: not the "total" indirect path through turb_pre_ln1, but the path that passes through turb_pre_ln1 and not nitrite_pre_ln

Daycount_poly1 --> temp_pre_ln --> nitrite_pre_ln --> cl2_tot_pre
Daycount_poly1 --> temp_pre_ln --> cl2_tot_pre
^Note: not the "total" indirect path through temp_pre_ln, but the path that passes through temp_pre_ln and not nitrite_pre_ln

Daycount_poly1 --> nitrite_pre_ln --> cl2_tot_pre
^Note: the path through nitrite_pre_ln1 that does not first go through turb_pre_ln1 or through temp_pre_ln

Here is the output from my getAllInd(sem.Eff.boot) that I want to have confidence intervals for each effect reported:

I tried to dive into the sem.eff object to extract values myself but couldn't find them all.

Thanks for any help!
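
A generic way to get an interval for one specific path is to bootstrap the product of its path coefficients and take percentile limits. The sketch below does this with the boot package on simulated data. It does not use semEff's internals, it works on raw (unstandardized) coefficients, and all names in it (`d`, `x`, `m1`, `m2`, `y`, `path_effect`) are hypothetical.

```r
library(boot)

# Simulated chain x -> m1 -> m2 -> y, with an extra m1 -> y path.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n))
d$m1 <- 0.5 * d$x + rnorm(n)
d$m2 <- 0.4 * d$m1 + rnorm(n)
d$y  <- 0.3 * d$m2 + 0.2 * d$m1 + rnorm(n)

# Effect along the single path x -> m1 -> m2 -> y:
# the product of the three coefficients on that path, refitted per resample.
path_effect <- function(data, i) {
  b <- data[i, ]
  a1 <- coef(lm(m1 ~ x, data = b))["x"]
  a2 <- coef(lm(m2 ~ m1, data = b))["m1"]
  a3 <- coef(lm(y ~ m1 + m2, data = b))["m2"]
  unname(a1 * a2 * a3)
}

path_boot <- boot(d, path_effect, R = 1000)

# Percentile 95% interval for this one path.
path_ci <- quantile(path_boot$t, c(0.025, 0.975))
print(path_boot$t0)
print(path_ci)
```

A path that skips a mediator (for example x -> m1 -> y) would use the product of only the coefficients on that path.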

semEff() provides much lower estimates than piecewiseSEM::coefs() for psem objects with lme4 (poisson errors)

Hi Mark,

I'm trying to obtain indirect and direct effects of a psem object which contains mixed models with poisson family (lme4 package).

After bootstrapping, I get a summary table with direct and indirect effects from semEff(), but the estimates (even the direct effects) are much lower than the standardized coefficients that I obtained using piecewiseSEM::coefs().

For instance, considering only the response nseedlings:
coefs(sem, standardize = "scale", standardize.type = "Menard.OE", test.type="III")

Response    Predictor          Estimate  Std.Error   DF  Crit.Value  P.Value  Std.Estimate
[...]
nseedlings  nseeds.germinated    0.0337     0.0033  120     10.3457   0.0000        0.3815  ***
nseedlings  Habitat             -0.3172     0.1667  120     -1.9032   0.0570       -0.1206
nseedlings  Microsite            0.0124     0.1157  120      0.1072   0.9146        0.0077

sem_boot <- 
  bootEff(sem, 
          R = 100, 
          seed = 13, 
          ran.eff = "Location")

(sem_eff<- semEff(sem_boot))

summary(sem_eff, response = "nseedlings")


                              Effect    Bias   Std. Err.   Lower CI  Upper CI
DIRECT    Habitat           | -0.016 | 0.006 |     0.031 |   -0.047     0.040 |
          Microsite         |  0.001 | 0.000 |     0.022 |   -0.025     0.043 |
          nseeds.germinated |  0.044 | 0.002 |     0.010 |    0.034     0.069 | *

INDIRECT  Predators         |  0.000 | 0.000 |     0.000 |    0.000     0.000 |
          Habitat           |  0.000 | 0.000 |     0.001 |   -0.001     0.003 |
          Microsite         | -0.001 | 0.000 |     0.001 |   -0.002     0.000 |
          nseeds            |  0.000 | 0.000 |     0.000 |    0.000     0.000 |

Predators, Habitat and Microsite are ordinal variables and the remaining variables are counts.

I have checked for multicollinearity and everything looks OK:
RVIF(sem[[6]])

nseeds.germinated Habitat Microsite
1.316408 1.115216 1.212450

What may be happening?

Thanks in advance!

How to use semEff with glmmTMB?

Hi Mark,

I would like to calculate the direct and indirect effects of a psem() using glmmTMB() and lm().

I use this code:
keeley.sem <- list(
glmmTMB(Variable1 ~ Variable2 + (1 | random), data = dat, family =gaussian(link = "identity")),
glmmTMB(Variable2 ~ Variable3 +Variable4 + Variable5 + Variable6 +
Variable7+ (1 | random), data = dat, family =gaussian(link = "identity")),
lm(Variable6 ~ Variable7+ Variable3, data = dat),
lm(Variable4 ~ Variable5+ Variable6 , data = dat)
)
keeley.sem.eff <- semEff(keeley.sem, R = 1000, seed = 13, ran.eff= "random")
summary(keeley.sem.eff)

I tried using bootEff() before semEff(), as in R's help page, but that didn't work either.

I also got a warning message when I tried the code with lmer() instead of glmmTMB(): "Warning message:
In bootEff(sem, ...): Mixed and unmixed models together in one list. Resampling will treat all models as mixed."

I have two questions:

1. Does semEff() work with glmmTMB(), and if not, how can I handle it?
2. Can I use mixed and non-mixed models in one model list and get reliable results?

Thanks a lot!
Sina

Documentation on standardization

I cannot find documentation on how exactly the effects in the SEM are standardized.

Do you refit the model with standardized variables (i.e. centered and scaled), or do you standardize the coefficients after model fitting? Could you also point me toward a reference for the adjustment for semi-partial correlation?
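
For an ordinary linear model with main effects only, the two approaches in the question give the same standardized coefficients, as the base R sketch below shows with the built-in `mtcars` data. It does not show which approach semEff uses, and it does not cover the semi-partial correlation adjustment.

```r
# Approach 1: fit on raw data, then rescale each slope by sd(x) / sd(y).
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
b_posthoc <- coef(fit_raw)[-1] *
  sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

# Approach 2: center and scale all variables first, then refit.
mtcars_z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
fit_z <- lm(mpg ~ wt + hp, data = mtcars_z)
b_refit <- coef(fit_z)[-1]

# Both give the same standardized slopes (up to numerical tolerance).
print(b_posthoc)
print(b_refit)
print(all.equal(b_posthoc, b_refit))
```

The two approaches can differ once interactions, polynomial terms or non-identity link functions are involved, which is where a package's documented choice matters.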

Transfer `digits` argument from `semEff()` to `print.semEff()`/`summary.semEff()`?

"WARNING: All values of t1* are NA" - only on one computer when running the same code, also 10x slower

Hi again,

I'm having another strange issue with the bootEff function that seems to be localized to computer A, and doesn't occur when the same code is run on computer B. I tried to make a test version of the dataset to illustrate the issue for you here, and interestingly, the code works fine with the shorter test dataset on both computers. The test dataset has 100 rows of the full dataset, and I removed some of the columns. In my previous issue, the bug was due to a column I wasn't calling in the function, so I wonder if perhaps this is a similar bug.

When the bootEff function is run with the test dataset on either computer, or with the full dataset on computer A, the results for bootEff make sense. For example:

#t1* -0.00000000000002290888 0.00000000000002787414 0.00000000000000501701
#t2* -0.60260267367821596096 0.00415074893531663935 0.03539173696975370098
#t3* 0.16633473320324301814 -0.00470021776430315508 0.04867780856680303803
#t4* 0.55316550100493566688 -0.00822456537248661412 0.03631034514791206536

When the bootEff function is run with the full dataset on computer B, there is a warning for all variables in the model:

Bootstrap statistics:
WARNING: All values of t1* are NA

Also, when I tested with the full dataset, computer A took 56 mins to run 100 iterations, while computer B took 6 mins. The test dataset takes about 50 seconds on both computers.

Because I can't share the full dataset here, I'll share the script and associated data here:

https://drive.google.com/drive/folders/13xM9x7BcP2MwApEjSdRMmlN4i3WOYqcn?usp=sharing

Computer A session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

About your PC
Processor: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz 3.51 GHz
Installed RAM: 80.0GB

Computer B session info:

sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8 LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C LC_TIME=English_Canada.utf8

About your PC: Not available due to administrative controls....

Can't list correlated errors with cor.err

I am trying to run a few betareg models and to understand what goes wrong when I specify which variables have correlated errors (cor.err), particularly when there are multiple variables in a model. Here is an example and the console output:

library(dplyr)
library(semEff)
library(betareg)  # provides betareg()

shipley2<-shipley %>% mutate(Growth_perc=Growth/100)

model1<-betareg( Growth_perc  ~   Survival, data=shipley2)

model2<-betareg( Survival ~ year + Date + DD ,data=shipley2)

testing<-bootEff(list(
  model1,
  model2),
  cor.err = c("Growth_perc ~~ year"),
  R = 10)


#Error in bootEff(list(model1, model2), cor.err = c("Growth_perc ~~ year"),  : 
# Names of variable(s) with correlated errors missing from list of models and/or weights.

The problem does not seem to be related to betareg per se.
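
The error message says that the names in `cor.err` must appear in the list of models. One plausible reading is that they are matched against the names of the model list, which here is unnamed, and that `year` is a predictor in `model2` rather than a model in the list. Both points are assumptions. The hypothetical helper below (not part of semEff) checks such names before calling `bootEff()`; character placeholders stand in for the fitted models.

```r
# Hypothetical helper: which variables named in cor.err are NOT names
# of elements in the model list?
missing_cor_err_names <- function(models, cor.err) {
  vars <- unique(trimws(unlist(strsplit(cor.err, "~~", fixed = TRUE))))
  setdiff(vars, names(models))
}

# Placeholders stand in for the fitted betareg models.
unnamed <- list("model1", "model2")
named   <- list(Growth_perc = "model1", Survival = "model2")

# Unnamed list: nothing can be matched.
print(missing_cor_err_names(unnamed, "Growth_perc ~~ year"))

# Named list: 'year' is still not a model in the list.
print(missing_cor_err_names(named, "Growth_perc ~~ year"))

# Both names refer to models in the list: nothing is missing.
print(missing_cor_err_names(named, "Growth_perc ~~ Survival"))
```

If this reading is right, naming the list elements by response and giving `cor.err` only names from that list should avoid the error. Whether a correlated error between `Growth_perc` and `Survival` makes sense when one predicts the other is a separate modelling question.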

In (function (m, w) : 101 model fit(s) or parameter estimation(s) failed. NAs reported/generated.

Hi there. I'm trying to follow along with the instructions on the R vignette and am having absolutely no luck pinpointing why I can't get this to work.

I've created a path model in piecewiseSEM as follows:

test_model <- psem(glm(presence ~ Annual_mean_temp + Snowmelt_DOY + LC_tussock_gram + Substrate_Circumneutral, REPH2, family = binomial),
glm(LC_tussock_gram ~ Annual_mean_temp + Substrate_Acidic + Substrate_Circumneutral + Snowmelt_DOY, REPH2, family = binomial),
lm(Snowmelt_DOY ~ Annual_mean_temp + Substrate_Acidic + Substrate_Circumneutral , REPH2))

When I try to use semEff functions, I experience the following errors:

test_SEM_boot <- bootEff(test_model, R = 100, seed = 13, parallel = "no")

Warning messages:
1: In (function (m, w) ... :
101 model fit(s) or parameter estimation(s) failed. NAs reported/generated.

test_SEM_eff <- semEff(test_SEM_boot)

Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic

I've tried quite a few things to narrow down what this issue might be. I tried making a simpler model with my data, using only the variables that would require lm instead of glm. I tried specifying the model exactly as in the vignette, which uses list() instead of psem(). I wondered if something about the structure of my model was problematic, so I tried using the exact same path model as in the vignette, but with my data instead of the keeley data.

All of these return Warning messages:
1: In (function (m, w) :
101 model fit(s) or parameter estimation(s) failed. NAs reported/generated. ....

This makes me think my data must be the issue, but I can't see why. Each individual model, and the piecewise SEM model I created seem to be fine - no convergence issues, no errors or warnings, the outputs seem to make sense.

Any thoughts on what I could do to address this issue?

summary(semEff(sem.Eff.boot)) output to matrix/dataframe

Hi there,

First of all thanks so much for this package - it has helped so much with my work on mixed effects multiple regression / sem modeling.

I am wondering if there is some way of converting the summary(semEff(sem.Eff.boot)) output from the text based tables into a matrix or dataframe format?

I was trying to look through the model object components to extract all of the numbers myself and put into a dataframe, but have been having trouble locating them. I am referring to the Effect, Bias, Std. Err., Lower CI, and Upper CI summary output. For example, this output I have been trying to get into a dataframe format:

Thanks for any help!

lapply bootEff

Hi again,

I'd like to apply the same path model structure to multiple groups, and look at the results using semEff. My strategy has been to use lapply. It doesn't seem to work with bootEff. Do you have any thoughts on how to do this?

data(keeley, package = "piecewiseSEM")
keeley <- as_tibble(keeley) %>%
mutate(group = sample(c("A", "B"), nrow(keeley), TRUE))

keeley_split <- group_split(keeley, group)

path_structure_keeley <- function(x) {

path <- psem(
lm(age ~ distance, data = x),
lm(hetero ~ distance, data = x),
lm(abiotic ~ distance, data = x),
lm(firesev ~ age, data = x),
lm(cover ~ firesev, data = x),
lm(rich ~ distance + hetero + abiotic + cover, data = x)
)
}

keeley_models <- lapply(keeley_split, path_structure_keeley)

keeley_boot <- lapply(keeley_models, bootEff, R = 1000)

Error in getData(m) :
'data' does not contain all variables used to fit model.
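
One possible contributor, and this is an assumption rather than a confirmed diagnosis, is that models fitted inside a function store `data = x` in their call, and `x` no longer exists once the function returns. The base R sketch below shows that mechanism and one generic workaround that splices the data into the stored call. The names `fit_local`, `fit_spliced` and `d_local` are hypothetical.

```r
# A model fitted inside a function records the *name* of the local data object.
fit_local <- function(d_local) lm(mpg ~ wt, data = d_local)
m_local <- fit_local(mtcars)
print(m_local$call$data)            # the symbol `d_local`

# Outside the function that symbol cannot be resolved.
lookup <- try(eval(m_local$call$data, envir = globalenv()), silent = TRUE)
print(inherits(lookup, "try-error"))

# Workaround sketch: splice the data frame itself into the call with bquote(),
# so the stored call no longer depends on a local variable name.
fit_spliced <- function(d_local) {
  eval(bquote(lm(mpg ~ wt, data = .(d_local))))
}
m_spliced <- fit_spliced(mtcars)
print(is.data.frame(m_spliced$call$data))
```

Converting each group back to a plain data frame with `as.data.frame()` before fitting is a second thing to try, since the split data here are tibbles.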

bootEff in psem with interaction terms

Hello!
I'm trying to use the bootEff() function on a psem model composed of lmer models with some interaction terms (code attached at the end). I always get this warning:

Warning message:
In (function (m, w) : 1001 model fit(s) or parameter estimation(s) failed. NAs reported/generated.

It seems that bootEff is not able to estimate these models. But if I remove the interaction terms from the lmer models within the psem, it works perfectly. The problem is not related to random effects, because even with lm() models I get the same warning when there are interaction terms.

Prod_psem <- psem(
lmer( Production ~ grass_proportion+
Treatment+Soil_fertility+Annual_Prep+
Treatment:Annual_Prep+
(1| Farm/Plot), data= Out_canopy_prod),

lmer( grass_proportion ~ Soil_fertility+Treatment+Mean_annual_Temp+
Annual_Prep+Treatment:Annual_Prep+
(1| Farm/Plot),data= Out_canopy_prod) ,

lmer(Soil_fertility ~ Treatment+
Mean_annual_Temp+(1| Farm),
data= Out_canopy_prod),
data = Out_canopy_prod)

Prod.sem.boot <- bootEff(Prod_psem, R = 1000, seed = 13, parallel = "snow", ran.eff = "Farm")

Warning message:
In (function (m, w) :
1001 model fit(s) or parameter estimation(s) failed. NAs reported/generated.

I've also run the bootEff() including catch.err = FALSE and this is the error reported:

Error in contr.sum(levels(i)) :
not enough degrees of freedom to define contrasts

What goes wrong?

Thank you very much
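
The reported error comes from `contr.sum(levels(i))`. In base R that call fails with this message when `levels()` returns `NULL` (the variable is not a factor) or a single level. Whether `Treatment` is stored as a character vector in `Out_canopy_prod` is an assumption, and the values below are made up for illustration.

```r
# A character vector has no levels, so contr.sum() cannot build contrasts.
treatment_chr <- c("control", "grazed", "control", "grazed")
print(levels(treatment_chr))        # NULL

msg <- tryCatch(
  contr.sum(levels(treatment_chr)),
  error = function(e) conditionMessage(e)
)
print(msg)

# After conversion to a factor with two levels, the contrasts are defined.
treatment_fac <- factor(treatment_chr)
cs <- contr.sum(levels(treatment_fac))
print(cs)

# A factor with only one level fails in the same way.
msg_one <- tryCatch(
  contr.sum(levels(factor("control"))),
  error = function(e) conditionMessage(e)
)
print(msg_one)
```

If this is the cause, converting the categorical predictors in the interaction to factors before fitting the models may be worth trying.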

Get the CI as a data.frame after bootstrap

Thank you so much for the nice package!

semEff() provides a very nice print summary of the CI and all the effects, that is great.
I wanted to get this summary as a table but I did not find how to (Have I missed something?)
getEff() does return the bootstrapped effects but not the CIs, right? I found it hard to recompute the BCa intervals from the getEff() result.

In the meantime, I just made a function which takes the output of semEff() and extract the coefficients from the summary.

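# Packages used by the helpers below (map_*, str_*, %>%, as_tibble).
library(purrr)
library(stringr)
library(dplyr)

get_table_semEff <- function(semeff = NULL) {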

  effect_type <-  c("^DIRECT", "^INDIRECT", "TOTAL", "MEDIATORS")
  x  <- semeff[-1, ]
  effect_type_col <- x[[1]] 
  predictor <- x[[2]] 

  row_effect_type <- map_int(effect_type, ~which(str_detect(x[[1]], .x)))

  numeric_table <- map(x, as.numeric)
  # suppr table columns
  mask_chr_column <- map_lgl(numeric_table, ~all(is.na(.x)))
  # Keep the second columns: 
  mask_chr_column[2] <- FALSE 

  tab <- x[, !mask_chr_column]
  colnames(tab)[1] <- "predictor"

  # Note: `:` binds tighter than `-`, so each range below is shifted down by
  # one row; the parentheses make that explicit without changing behavior.
  output <- rbind(
    cbind(effect_type = "direct", tab[(row_effect_type[1]:row_effect_type[2]) - 1, ]),
    cbind(effect_type = "indirect", tab[(row_effect_type[2]:row_effect_type[3]) - 1, ]),
    cbind(effect_type = "total", tab[(row_effect_type[3]:row_effect_type[4]) - 1, ]),
    cbind(effect_type = "mediators", tab[row_effect_type[4]:nrow(tab), ])
  )

  numeric_rows <- str_detect(output[["Effect"]], "\\d")
  output <- output[numeric_rows,]
  output[, -c(1:2)] <- apply(output[, -c(1:2)], 2, as.numeric)

  janitor::clean_names(output)
}

from_semEff_to_table <- function(x = NULL) {

  output <- map_dfr(x$Summary[-1], get_table_semEff, .id = "response") %>%
    as_tibble()

  output[, c("response", "predictor")] <- apply(
    output[, c("response", "predictor")], 2,
    function(x) str_replace_all(x, c("\\." = "_", "\\s" = ""))
  )
  output[, colnames(output) != "bias"]

}

ci_df <- from_semEff_to_table(x = my_semeff_output)
