
insp-rh / pifpaf

Stars: 3, Watchers: 4, Forks: 1, Size: 7.53 MB

Estimation of the Population Impact Fraction and Potential Impact Fraction

License: GNU General Public License v3.0

Languages: R 14.25%, HTML 85.75%
potential-impact-fraction population-attributable-fraction pif paf public-health counterfactual-analysis comparative-risk-assesment r package

pifpaf's People

Contributors

dalia1992, rodrigozepeda

Forkers

al00014

pifpaf's Issues

Incorrect warning

The following code throws the warnings:
Warning messages:
1: In if (thetaup < thetahat) { :
the condition has length > 1 and only the first element will be used
2: In if (thetahat < thetalow) { :
the condition has length > 1 and only the first element will be used

#Clear all
rm(list = ls())

#Load the data that make it fail
load("Nosirve.rda")

#Create risk function
risk <- function(X,theta){

#Start empty vectors
r1 <- c()
r2 <- c()

#Transform SSB to BMI
Y <- X*theta[1]
#Scenario for BMI < 25
Y1 <- Y[which(Y < 25)]
if (length(Y1) != 0){ r1 <- exp(Y1*theta[2]/5) }

#Scenario for BMI >= 25
Y2 <- Y[which(Y >= 25)]
if (length(Y2) != 0){ r2 <- exp(Y2*theta[3]/5) }

return(c(r1,r2))
}

#Note that, evaluated at 0, the RR equals 1
paf.confidence.one2one(X, thetavec, thetalow, thetaup, risk)

The data used:
thetavec <- c(1,2,3)
X <- c(1,2,3)
thetalow <- thetavec/2
thetaup <- thetavec + thetavec/2
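The warning arises because `if` receives a logical vector: thetahat, thetalow and thetaup each have length 3 here, so only the first comparison is tested (and since R 4.2.0 this is an error rather than a warning). A minimal sketch of a vector-safe check, assuming the three arguments are numeric vectors of equal length; `check_theta_bounds` is a hypothetical helper, not part of pifpaf:

```r
# Hypothetical helper (not part of pifpaf): check that every component of
# thetahat lies within [thetalow, thetaup]. any() collapses the element-wise
# comparison so that if() always receives a single TRUE/FALSE.
check_theta_bounds <- function(thetahat, thetalow, thetaup) {
  if (length(thetahat) != length(thetalow) || length(thetahat) != length(thetaup)) {
    stop("thetahat, thetalow and thetaup must have the same length")
  }
  if (any(thetaup < thetahat)) {
    warning("thetaup < thetahat for at least one component")
    return(FALSE)
  }
  if (any(thetahat < thetalow)) {
    warning("thetahat < thetalow for at least one component")
    return(FALSE)
  }
  TRUE
}

# Data from the report: every component satisfies its bounds
thetavec <- c(1, 2, 3)
check_theta_bounds(thetavec, thetavec / 2, thetavec + thetavec / 2)
```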

create pif.confidence

We are missing a pif.confidence function that includes both confidence methods, as paf.confidence does for the Population Attributable Fraction.

On Problem 1: Computing PAF with diverging integrals (2.2.2, Gamma case)

I think that by using a beta distribution for the true exposure distribution, you weaken your argument, since this is no longer strictly a diverging-integrals situation; it is also a distribution-choice situation (from Problem 3). The example also seems a bit contrived in two ways:

  1. If the mean is negative, one would not use a gamma distribution to describe the exposure distribution in the first place (especially for this true distribution, where most observations would be negative). By making the true mean negative, you've manufactured a scenario in which we fail the theta < mu/sigma^2 requirement for convergence.

  2. I still don't quite understand whether a negative PAF makes sense. PAF stands for population attributable fraction; "fraction" implies that it is bounded between 0 and 1, and the value is interpreted as the proportion of outcomes in the population that can be attributed to the exposure. On that level, I don't understand what a negative PAF means. I'm also wary of this scenario in which intake can be negative. While in practice I've seen people (including us) integrate over the negative real line, I believe the literature consistently writes the PAF as an integral from 0 to m (0 being the minimum possible value and m the maximum possible value), so allowing negative intakes is actually a huge paradigm shift, in my opinion. However, the negative true PAF is probably driven by the combination of a positive theta (weight gain is unhealthy) and the fact that most people in the true population are losing weight: the exposed population is overall healthier than a hypothetical unexposed population. Unless you assume that theta = 0 (as we do) when exposure is better than the "ideal" scenario (here, when exposure is negative), it may not be possible, or make sense, to estimate the proportion of outcomes attributable to the exposure. It may help to think of the PAF as a comparison between an exposed distribution and an ideal distribution. Given the RR function and distribution in this scenario, the ideal would not be everyone at 0 BMI change, so it might not make sense to calculate the PIF here.
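The sign argument can be made concrete with the usual definition PAF = 1 - 1/E[RR(X)] for a relative risk with RR(0) = 1. A sketch, assuming RR(x) = exp(theta * x) and a normally distributed exposure (chosen for its closed form; the scenario discussed above uses a beta true distribution): with theta > 0, E[RR(X)] = exp(theta*mu + theta^2*sigma^2/2) falls below 1, and the PAF turns negative, whenever mu < -theta*sigma^2/2. Holding the risk at baseline for negative exposures, i.e. RR(x) = exp(theta * max(x, 0)), keeps RR >= 1 and therefore PAF >= 0. `paf_empirical` and the parameter values are illustrative, not pifpaf code:

```r
# Hypothetical helper: empirical PAF = 1 - 1 / mean(RR(X)),
# assuming the relative risk equals 1 at zero exposure.
paf_empirical <- function(X, theta, rr) {
  1 - 1 / mean(rr(X, theta))
}

rr_exp     <- function(X, theta) exp(theta * X)           # risk keeps falling below 0
rr_floored <- function(X, theta) exp(theta * pmax(X, 0))  # no extra benefit below 0

set.seed(2374)
theta <- 0.5
mu    <- -1     # most of the population loses weight
sigma <- 1      # mu < -theta * sigma^2 / 2 = -0.25, so the PAF is negative
X     <- rnorm(1e5, mean = mu, sd = sigma)

# Closed form under normality: 1 - exp(-(theta * mu + theta^2 * sigma^2 / 2))
paf_closed <- 1 - exp(-(theta * mu + theta^2 * sigma^2 / 2))

paf_empirical(X, theta, rr_exp)      # negative, close to paf_closed
paf_empirical(X, theta, rr_floored)  # non-negative by construction
```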

However, while I have problems with the example, the point is taken that certain scenarios (high exposure effect, low mean, high variance) will lead to a theoretical PAF of 1 (not good!).
Edit1: This is not necessarily a purely theoretical scenario either. Nuts in the US may actually hit all three points.
Edit2: On second thought, since we make some assumptions ourselves (the TMRED is normally distributed and the RRs do not change beyond the TMRED), we may have avoided this situation for nuts as well. That would explain why we didn't encounter PAFs near 1 for nuts. (P.S. Sorry for all these edits.)
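For the gamma case, both the convergence requirement and the PAF-of-1 limit follow from the gamma moment-generating function. Parameterising by mean mu and variance sigma^2 (shape k = mu^2/sigma^2, rate beta = mu/sigma^2) and assuming RR(x) = exp(theta * x), E[RR(X)] = (1 - theta*sigma^2/mu)^(-mu^2/sigma^2) for theta < mu/sigma^2 and is infinite otherwise, so PAF = 1 - (1 - theta*sigma^2/mu)^(mu^2/sigma^2), which tends to 1 as theta approaches mu/sigma^2. The sketch below (`paf_gamma` and `paf_gamma_numeric` are hypothetical helpers; the parameter values are illustrative) checks the closed form against numerical integration:

```r
# Hypothetical helper: theoretical PAF for RR(x) = exp(theta * x) when the
# exposure is gamma with mean mu and variance sigma2. Returns 1 when the
# integral E[RR(X)] diverges (theta >= mu / sigma2).
paf_gamma <- function(theta, mu, sigma2) {
  shape <- mu^2 / sigma2
  rate  <- mu / sigma2
  if (theta >= rate) return(1)
  1 - (1 - theta / rate)^shape
}

# Same quantity by numerical integration of exp(theta * x) * f(x) over (0, Inf)
paf_gamma_numeric <- function(theta, mu, sigma2) {
  shape <- mu^2 / sigma2
  rate  <- mu / sigma2
  ERR <- integrate(function(x) exp(theta * x) * dgamma(x, shape = shape, rate = rate),
                   lower = 0, upper = Inf)$value
  1 - 1 / ERR
}

mu <- 2; sigma2 <- 4                 # low mean, high variance: limit mu / sigma2 = 0.5
paf_gamma(0.2, mu, sigma2)           # closed form
paf_gamma_numeric(0.2, mu, sigma2)   # numerical check

# The PAF approaches 1 as theta approaches the convergence limit
sapply(c(0.3, 0.45, 0.49, 0.499), paf_gamma, mu = mu, sigma2 = sigma2)
```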

Check s2 = sum(w_i^2)

Check whether or not to multiply by s2, which comes from the delta-method approximation:

Var(ln(R0)) ≈ (1/R0^2) Var(sum_i w_i R(x_i))
  = (1/R0^2) sum_i Var(w_i R(x_i))       (by independence)
  = (1/R0^2) sum_i w_i^2 Var(R(x_i))
  = (1/R0^2) Var(R(X)) sum_i w_i^2       (by identical distribution)

Check wherever the variance of R(X) or of R(f(X)) is calculated.
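The steps marked "by independence" and "by identical distribution" amount to Var(sum_i w_i R(X_i)) = Var(R(X)) sum_i w_i^2 for i.i.d. X_i, which a quick simulation can confirm. A minimal sketch, assuming X ~ N(0, 1), R(x) = exp(theta * x) with theta = 0.5 (so Var(R(X)) = exp(2 theta^2) - exp(theta^2) exactly) and fixed random weights normalised to sum to 1; none of these choices come from the package:

```r
# Monte Carlo check of Var(sum_i w_i R(X_i)) = Var(R(X)) * sum_i w_i^2
# for i.i.d. X_i. Assumptions: X ~ N(0, 1), R(x) = exp(theta * x),
# fixed weights normalised to sum to 1.
set.seed(93)
n     <- 20
theta <- 0.5
w     <- runif(n)
w     <- w / sum(w)
s2    <- sum(w^2)

# Exact variance of R(X) = exp(theta * X) for standard normal X (log-normal)
var_R <- exp(2 * theta^2) - exp(theta^2)

reps <- 20000
Z    <- matrix(rnorm(reps * n), nrow = reps, ncol = n)
R0   <- as.vector(exp(theta * Z) %*% w)   # one weighted mean per replicate

c(simulated = var(R0), with_s2 = var_R * s2, without_s2 = var_R)
```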

PAF with infinite confidence

The following code returns -Inf as lower bound for PAF's CI:

set.seed(2374)
X <- rnorm(500, 3, 0.1)
RR <- function(X,theta){exp(X*theta)}
thetahat <- 0.2
paf.confidence(X, thetahat = 0.2, thetamin = 0.1, thetamax = 0.3, rr = RR, method = "one2one")

The real CI the function should return seems to be: c(0.2184727, 0.8270486)

Errors when deploying shiny app

Deploying the Shiny app (via PIFApp() and shiny::runGitHub("pif", "INSP-RH", subdir = "inst/shiny-examples/PIFapp")) returns the following errors:

Warning: Error in if: missing value where TRUE/FALSE needed
Stack trace (innermost first):
82: tagList
81: withMathJax
80: renderUI [/Library/Frameworks/R.framework/Versions/3.3/Resources/library/pif/shiny-examples/PIFapp/server.R#67]
79: func
78: origRenderFunc
77: output$theta
2: shiny::runApp
1: PIFApp

Warning in pif.confidence.inverse

The following code throws a warning in pif.confidence.inverse

#Example: Multidimensional example using approximate method
X1       <- rnorm(1000,3,.5)
X2       <- rnorm(1000,4,1)
X        <- as.matrix(cbind(X1,X2))
Xmean    <- colMeans(X)
Xvar     <- cov(X)
.Xmean   <- matrix(Xmean, ncol = length(Xmean))
.Xvar    <- matrix(Xvar, ncol = sqrt(length(Xvar)))
theta    <- c(0.12, 0.17)
thetasd  <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)
rr       <- function(X, theta){exp(theta[1]*X[,1] + theta[2]*X[,2])}
paf.confidence.inverse(Xmean, thetahat = theta, thetavar = thetasd, 
rr=rr, method = "approximate", Xvar = Xvar)

On problem 3

I agree the choice of distribution matters for calculating the PAF, and there is definitely an advantage to the empirical method, where you don't have to worry about that. No need to oversell it with a strange scenario where the researcher uses a Poisson distribution when the data are uniform (who would do such a thing? If they know what a Poisson distribution is, surely they would recognize that the data don't fit it!). I was actually surprised the bias was only 10%!

No need to do this if the point of the document is to clarify things for me, but I do think this exercise of checking how sensitive PAFs are to different exposure-distribution assumptions is interesting. It would be useful for comparing similar distributions that reasonable researchers can disagree on (for example, log-normal, gamma and Weibull, which are all skewed and positive, with a skewed beta as the true distribution). (You could also add the normal and/or truncated normal, since that would probably be the default distribution for the unthinking or rushed researcher.)

I remain somewhat unconvinced that mis-specifying distribution would create so much bias if, say, the true distribution was right-skewed beta and the assumed distribution is gamma, or if the true distribution is truncated log-normal and used distribution is gamma.
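The suggested sensitivity exercise can be sketched for one pair of similar distributions. Assumptions, all illustrative: RR(x) = exp(theta * x) with theta = 0.3, and an exposure with mean 2 and variance 1 modelled either as a gamma or as a Weibull matched to the same two moments; the PAF is obtained by numerical integration. The log-normal is left out because E[exp(theta X)] diverges for it whenever theta > 0. `paf_from_density` is a hypothetical helper, and the sketch makes no claim about which distribution is "true":

```r
# Sensitivity of the theoretical PAF to the exposure distribution,
# holding mean and variance fixed. Illustrative assumptions:
# RR(x) = exp(theta * x), mean mu = 2, variance sigma2 = 1, theta = 0.3.
mu     <- 2
sigma2 <- 1
theta  <- 0.3

# PAF = 1 - 1 / E[RR(X)] for an exposure with density `dens` on (0, Inf)
paf_from_density <- function(dens, theta) {
  ERR <- integrate(function(x) exp(theta * x) * dens(x), lower = 0, upper = Inf)$value
  1 - 1 / ERR
}

# Gamma matched by moments: shape = mu^2 / sigma2, rate = mu / sigma2
d_gamma <- function(x) dgamma(x, shape = mu^2 / sigma2, rate = mu / sigma2)

# Weibull matched by moments: solve for the shape k from the squared
# coefficient of variation, then set the scale so that the mean equals mu
cv2 <- sigma2 / mu^2
k <- uniroot(function(k) gamma(1 + 2 / k) / gamma(1 + 1 / k)^2 - 1 - cv2,
             interval = c(0.5, 20))$root
lambda <- mu / gamma(1 + 1 / k)
d_weibull <- function(x) dweibull(x, shape = k, scale = lambda)

c(gamma   = paf_from_density(d_gamma, theta),
  weibull = paf_from_density(d_weibull, theta))
```

The same helper accepts any other density with positive support, for example a right-skewed beta rescaled to the exposure range.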

On Problem 2: Truncation

sdlog<-sqrt(1.7)
meanlog<-.05

plnorm(q=0, meanlog=meanlog, sdlog=sdlog)
[1] 0
plnorm(q=2.5, meanlog=meanlog, sdlog=sdlog)
[1] 0.7467875
plnorm(q=5, meanlog=meanlog, sdlog=sdlog)
[1] 0.8841584
plnorm(q=10, meanlog=meanlog, sdlog=sdlog)
[1] 0.9579749
plnorm(q=100, meanlog=meanlog, sdlog=sdlog)
[1] 0.9997618

summary(rlnorm(n=100, meanlog=meanlog, sdlog=sdlog))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0161 0.4143 1.1530 3.3660 3.0470 84.8300

Sorry for the ugly code and output. The above shows the probability of observing values at or below 2.5, 5, 10 and 100 under the log-normal (one minus each value is the probability of exceeding that cutoff), plus summary statistics for 100 random draws from the distribution used in your example. I'm showing it because I think you can make the case that, assuming the parameters are based on data and that the researchers assume a truncated log-normal because they believe the exposure distribution is shaped approximately like a log-normal, they would not use a cutoff as low as 2.5 or 5. Even 10 may be low. However, once you increase the cutoff to 10, your PAF will be, for all practical purposes, very close to the "true" PAF for the log-normal. Based on this example, truncation is likely not a huge issue as long as the maximum is picked in a reasonable way that is consistent with the data.

I think the log-normal may not be the best example for illustrating how sensitive the PAF is to truncation, since the log-normal is fundamentally flawed (as you explain in 2.2.1) when R(X; theta) = exp(theta*X). It might be more interesting and helpful to look at gamma distributions that don't diverge and see how sensitive they are to truncation (though I have no idea whether dtrunc supports gamma).
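A truncated gamma can be handled without any extra package, because the truncated density on (0, m] is just dgamma(x) / pgamma(m). A sketch under illustrative assumptions (RR(x) = exp(theta * x), gamma shape 2 and rate 0.5, i.e. mean 4 and variance 8, theta = 0.2 below the rate so the untruncated integral converges; cutoffs taken from the plnorm example above). `paf_truncated_gamma` is a hypothetical helper:

```r
# PAF under a gamma exposure truncated at a maximum m, for several cutoffs.
# Illustrative assumptions: RR(x) = exp(theta * x), gamma shape 2 and rate 0.5
# (mean 4, variance 8), theta = 0.2 < rate, so the untruncated integral converges.
shape <- 2
rate  <- 0.5
theta <- 0.2

paf_truncated_gamma <- function(m) {
  # E[RR(X) | X <= m] = integral_0^m exp(theta x) f(x) dx / P(X <= m)
  num <- integrate(function(x) exp(theta * x) * dgamma(x, shape = shape, rate = rate),
                   lower = 0, upper = m)$value
  1 - 1 / (num / pgamma(m, shape = shape, rate = rate))
}

cutoffs   <- c(2.5, 5, 10, 100)
paf_trunc <- sapply(cutoffs, paf_truncated_gamma)

# Untruncated value from the gamma moment-generating function
paf_full <- 1 - (1 - theta / rate)^shape

rbind(cutoff = cutoffs, paf = paf_trunc)
paf_full
```

Because exp(theta * x) is increasing, the truncated PAF rises with the cutoff and approaches the untruncated value from below; how fast it gets there depends on theta relative to the rate.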

All pif confidence intervals crash when the true PAF is 1

Assume the following (solely theoretical) distribution:

X            <- rlnorm(100, 2.5, 7)  
thetahat <- 0.93
thetavar <- 0.00109
rr             <- function(X, theta){exp(X*theta)}

Then all confidence-interval functions return senseless intervals:

 pif.confidence.linear(X, thetahat, thetavar, rr)
 pif.confidence.bootstrap(X, thetahat, thetavar, rr)
 pif.confidence.loglinear(X, thetahat, thetavar, rr)

Error on installation from GitHub

When trying to

devtools::install_github("INSP-RH/pif", build_vignettes = TRUE)

the following error occurs:

Error: processing vignette 'Introduction_to_pif_package.Rmd' failed with diagnostics:
object 's2' not found
Execution halted

check_exposure

The check.exposure function checks that exposure levels are > 0, but we have shown that exposure levels can be negative. Add a boolean option to skip this check.

Check test

Need to check one test. When building, the following error appears:

Failed ---------------------------------------------------------------------------------------
1. Failure: Checking pif.approximate function errors (@test_pif_approximate.R#9) -------------
{
...
} did not throw an error.

Warning in pif.confidence when using kernel

Bootstrap is the only valid option for the kernel method, so there is no point in the warning:

In pif.confidence(X = X, thetahat = thetahat, rr = rr, cft = cft, :
Method of confidence interval estimation defaulted to bootstrap.

It happens in the following case:


X <- rnorm(100)
thetahat <- 1
thetavar <- 1
rr <- function(X, theta){exp(theta*X)}
cft <- function(X){X/2}
pif.confidence(X = X, thetahat = thetahat, rr = rr, cft = cft, method = "kernel", thetavar = thetavar)


paf.confidence.loglinear

Why does line 106 have .paf <- pif(.X, thetahat, rr, weights = weights)? Also, paf.confidence.approximate.loglinear and paf.confidence.loglinear estimate different things with .inverse; in particular, it seems two different methods are being used for the same loglinear case.

pif.heatmap construction NOTE

The following NOTE appears when building the package:

paf.confidence.inverse: possible error in risk.ratio.confidence(X = X,
  thetahat = thetahat, thetasd = .thetavar, rr = rr, weights = weights,
  nsim = nsim, confidence = confidence, force.min = force.min): unused
  argument (thetasd = .thetavar)
pif.counterfactual.heatmap: no visible global function definition for
  ‘heat.colors’
pif.counterfactual.heatmap: no visible binding for global variable ‘b’
Undefined global functions or variables:
  b heat.colors
Consider adding
  importFrom("grDevices", "heat.colors")
to your NAMESPACE file.

thetahat must be MLE

In the help pages we should state that thetahat must be a maximum likelihood estimator (we need its asymptotic normality and consistency).

if (is.na(Xvar)) { : the condition has length > 1 and only the first element will be used

The following code throws the warning:

X1       <- rnorm(1000,3,.5)
X2       <- rnorm(1000,4,1)
X        <- as.matrix(cbind(X1,X2))
Xmean    <- colMeans(X)
Xvar     <- cov(X)
theta    <- c(0.12, 0.17)
thetasd  <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)
rr       <- function(X, theta){exp(theta[1]*X[,1] + theta[2]*X[,2])}
paf.confidence(X = Xmean, thetahat = theta, thetavar = thetasd,
 rr = rr, Xvar = Xvar, est.method = "approximate")
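The message points at a check of the form if (is.na(Xvar)): when Xvar is a covariance matrix, is.na() returns one logical per entry, so if() receives four values. A sketch of a scalar-safe test, assuming Xvar defaults to NA when the user does not supply it; `xvar_missing` is a hypothetical helper, not the package's code:

```r
# Hypothetical helper: TRUE only when Xvar is the scalar default NA.
# `&&` short-circuits, so is.na() is never evaluated on a matrix.
xvar_missing <- function(Xvar) {
  length(Xvar) == 1L && is.na(Xvar)
}

X1   <- rnorm(1000, 3, .5)
X2   <- rnorm(1000, 4, 1)
Xvar <- cov(cbind(X1, X2))

xvar_missing(NA)    # TRUE: argument not supplied
xvar_missing(Xvar)  # FALSE: a 2 x 2 covariance matrix, no warning
```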

Delete Entry Mult

For two matrices X and Y, the function EntryMult can be replaced by sum(X*Y).
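The body of EntryMult is not shown here; assuming it sums the element-wise products of two equally sized matrices, the replacement is a one-liner, and the same number is also the trace of t(X) %*% Y. `entry_mult_loop` is a hypothetical stand-in for the existing function:

```r
# Hypothetical stand-in for EntryMult: explicit double loop over entries
entry_mult_loop <- function(X, Y) {
  total <- 0
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(ncol(X))) {
      total <- total + X[i, j] * Y[i, j]
    }
  }
  total
}

set.seed(1)
X <- matrix(rnorm(6), nrow = 2)
Y <- matrix(rnorm(6), nrow = 2)

entry_mult_loop(X, Y)
sum(X * Y)                  # vectorised replacement
sum(diag(crossprod(X, Y)))  # same value: trace of t(X) %*% Y
```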

Shiny app loading

When loading the Shiny app via
pif::dalia.rex()
it does not recognize the Warnings1 function.

Checking twice same conditions

Check function dependencies, as some functions such as pif.variance.approximate.linear check twice that X and theta are correctly specified (once in themselves and once in the functions they call).

mvrnorm

Check the usage of mvrnorm in the other functions to make sure it says "empirical" and that variance is specified.
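For reference, MASS::mvrnorm(n, mu, Sigma, tol = 1e-6, empirical = FALSE) draws from N(mu, Sigma); with empirical = TRUE, mu and Sigma become the exact sample mean and covariance of the draws rather than population values. A sketch with the covariance passed by name (the mean and covariance below are reused from the pif.confidence.inverse example purely for illustration); which behaviour each pifpaf function needs is what this issue asks, and it is not settled here:

```r
library(MASS)

set.seed(42)
mu    <- c(0.12, 0.17)
Sigma <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)

# Population-level draws: sample moments only approximate mu and Sigma
draws_pop <- mvrnorm(n = 500, mu = mu, Sigma = Sigma, empirical = FALSE)

# Empirical draws: sample mean and covariance reproduce mu and Sigma exactly
draws_emp <- mvrnorm(n = 500, mu = mu, Sigma = Sigma, empirical = TRUE)

colMeans(draws_pop)  # close to, but not exactly, mu
colMeans(draws_emp)  # equal to mu up to rounding
cov(draws_emp)       # equal to Sigma up to rounding
```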

Multivariate Counterfactual

The following multivariate counterfactual raises an error in every function:
cft <- function(X){0.5*X[,1] + 0.25*X[,2]}

traceback() shows that the error comes from check.cft.

On Approximate empirical method

Quick question on the blue text: can this be right? If sigma^2 is near 0, wouldn't sigma_hat^2 also be near 0, making the second term in the denominator unnecessary?

Upper CI < point estimate in "one2one"

The following code

set.seed(236478)
X1 <- rnorm(100, 3,.5)
X2 <- rnorm(100,3,.5)
X  <- as.matrix(cbind(X1,X2))
thetahat <- c(2, 19)
thetavar <- matrix(c(0.1, 0, 0, 0.05), byrow = TRUE, nrow = 2)
rr       <- function(X, theta){
  .X <- matrix(X, ncol = 2)
  exp(theta[1]*.X[,1] + theta[2]*.X[,2])
}
cft <- function(X){0.5*X}


# Approximate 
Xmean <- t(as.matrix(colMeans(X)))
Xvar  <- cov(X)
paf.confidence(X = X, thetahat = thetahat, thetalow = c(0.05, 0), 
               thetaup = c(0.15, 0.08), rr = rr, confidence_method = "one2one")

returns an upper bound for the CI that is smaller than the point estimate of 1

Different point estimates from paf and paf.confidence

Using the paf_confidence_dies dataset, these functions return different point estimates:

paf(X = wpr_mean, thetahat = thetahat, rr = rr, method = "approximate", Xvar = wpr_var, check_rr = FALSE)

paf.confidence(X = wpr_mean, thetahat = thetahat, rr = rr, method = "approximate", Xvar = wpr_var, check_rr = FALSE)

as.matrix(X) destroys data.frames

Consider a numeric exposure X in a data frame with string covariates, such as:
X <- data.frame(rnorm(100), sample(c("A","B"), 100, replace = TRUE))
The current code cannot evaluate such functions, because converting the data frame to a matrix turns the numeric column into character strings.
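The coercion is easy to reproduce: as.matrix() on a data frame with any character column yields a character matrix. A sketch of the problem and of one possible workaround (keeping only numeric columns before converting); the column names are added for readability, and the workaround is an assumption, not the package's current behaviour:

```r
set.seed(1)
X <- data.frame(exposure = rnorm(100),
                group    = sample(c("A", "B"), 100, replace = TRUE))

M <- as.matrix(X)
is.character(M)    # TRUE: every column, including the exposure, becomes text

# Possible workaround: keep only the numeric columns before converting
X_num <- as.matrix(Filter(is.numeric, X))
is.numeric(X_num)  # TRUE
dim(X_num)         # 100 rows, 1 column
```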

Remove latex2exp

Stop using latex2exp, as it requires the user to have LaTeX installed.

Error in pif when using kernel

The following code:

set.seed(2374)
X        <- rlnorm(100)
rr       <- function(X, theta){theta*X + 1}
cft      <- function(X){sqrt(X)}
thetahat <- 0.1943
pif(X, thetahat, rr, cft, method = "kernel")

throws the error:

Error in if (meancft < 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(X) : NaNs produced
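Since rlnorm() only returns positive values, the NaNs cannot come from sqrt() at the data points themselves. One plausible explanation, assuming the kernel method evaluates cft on the grid returned by stats::density() (an assumption about pifpaf internals, not confirmed here): density() extends its grid cut = 3 bandwidths beyond the data range by default, so for an exposure close to zero some grid points are negative, sqrt() returns NaN there, and a mean over them would be NA. A sketch:

```r
set.seed(2374)
X <- rlnorm(100)        # strictly positive exposure, as in the report

d <- density(X)         # default cut = 3: grid extends 3 bandwidths past the data
min(X) > 0              # TRUE
any(d$x < 0)            # TRUE: part of the evaluation grid lies below zero
any(is.nan(suppressWarnings(sqrt(d$x))))  # TRUE: sqrt() of those points is NaN

# Restricting the grid to the support avoids the NaNs
d0 <- density(X, from = 0)
all(d0$x >= 0)          # TRUE
```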

Variance in paf_confidence_linear

Check whether:
.var <- (1/.RO^2)*s2*( s / (s^2 - s2) ) * weighted.mean((rr(.X,.theta) - .RO)^2, weights)
or
.var <- (1/.RO^2)*( s / (s^2 - s2) ) * weighted.mean((rr(.X,.theta) - .RO)^2, weights)

s2 is sum(wi^2)
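One observation that may help decide: with weights normalised to sum to 1 (s = 1, assumed here), the factor (s / (s^2 - s2)) * weighted.mean((R - R0)^2, w) is the usual bias-corrected weighted variance of R(X) itself; with equal weights w_i = 1/n it reduces exactly to var(). Under the delta-method derivation in the "Check s2" issue, the variance of ln(R0) would then carry the extra factor s2 = sum(w_i^2). A sketch with illustrative data (R(x) = exp(0.3 x), standard normal x); `var_R_hat` and the two candidate names are hypothetical:

```r
set.seed(7)
n      <- 50
x      <- rnorm(n)
theta  <- 0.3
rr_val <- exp(theta * x)            # R(x_i), with R(x) = exp(theta * x)

# Equal weights normalised to sum to 1
w  <- rep(1 / n, n)
s  <- sum(w)                        # 1
s2 <- sum(w^2)                      # 1 / n
RO <- weighted.mean(rr_val, w)

var_R_hat <- (s / (s^2 - s2)) * weighted.mean((rr_val - RO)^2, w)
all.equal(var_R_hat, var(rr_val))   # TRUE: the factor estimates Var(R(X))

# Candidate variances of ln(R0), with and without the s2 factor
var_logRO_with_s2    <- (1 / RO^2) * s2 * var_R_hat
var_logRO_without_s2 <- (1 / RO^2) * var_R_hat
```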

bootstrap lower > upper ci!

Using the data in crashes_bootstrap.rda and running pif.confidence.bootstrap(X, thetahat, thetavar, rr), the lower bound of the CI is greater than the upper bound!
