insp-rh / pifpaf Goto Github PK

Estimation of the Population Impact Fraction and Potential Impact Fraction

License: GNU General Public License v3.0

R 14.25% HTML 85.75%

potential-impact-fraction population-attributable-fraction pif paf public-health counterfactual-analysis comparative-risk-assesment r package


pifpaf's Issues

Check thetas in pif.conditional.variance.linear

Check that the input values make sense, as in the other cases.

Incorrect warning

The following code throws the warning:
Warning messages:
1: In if (thetaup < thetahat) { :
the condition has length > 1 and only the first element will be used
2: In if (thetahat < thetalow) { :
the condition has length > 1 and only the first element will be used

#Clear all
rm(list = ls())

#Load the data that make it crash
load("Nosirve.rda")

#Create risk function
risk <- function(X,theta){

#Start empty vectors
r1 <- c()
r2 <- c()

#Transform SSB to BMI
Y <- X*theta[1]

#Scenario for BMI < 25
Y1 <- Y[which(Y < 25)]
if (length(Y1) != 0){ r1 <- exp(Y1*theta[2]/5) }

#Scenario for BMI >= 25
Y2 <- Y[which(Y >= 25)]
if (length(Y2) != 0){ r2 <- exp(Y2*theta[3]/5) }

return(c(r1,r2))
}

#Note that the RR evaluated at 0 equals 1
paf.confidence.one2one(X, thetavec, thetalow, thetaup, risk)

The data used:
thetavec <- c(1,2,3)
X <- c(1,2,3)
thetalow <- thetavec/2
thetaup <- thetavec + thetavec/2
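The warnings arise because if() receives a logical vector (one comparison per element of theta) instead of a single TRUE/FALSE; since R 4.2.0 this is an error rather than a warning. A minimal sketch of a vector-safe check follows. check_theta_bounds is a hypothetical helper, not a function in the package; it collapses the element-wise comparisons with any() before branching.

```r
# Hypothetical helper (not part of pifpaf): validate vector-valued theta bounds.
# if() needs a single TRUE/FALSE, so element-wise comparisons are collapsed with any().
check_theta_bounds <- function(thetahat, thetalow, thetaup) {
  if (length(thetalow) != length(thetahat) || length(thetaup) != length(thetahat)) {
    stop("thetahat, thetalow and thetaup must have the same length")
  }
  if (any(thetaup < thetahat)) stop("thetaup must be >= thetahat in every coordinate")
  if (any(thetahat < thetalow)) stop("thetalow must be <= thetahat in every coordinate")
  invisible(TRUE)
}

# The data from this issue pass the check without any warning
thetavec <- c(1, 2, 3)
check_theta_bounds(thetavec, thetalow = thetavec / 2, thetaup = thetavec + thetavec / 2)
```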

pif.approximate vs pif_approximate

What is the difference?

create pif.confidence

We are missing a pif.confidence function that includes both confidence methods as paf.confidence does in the case of the Population Attributable Fraction

paf.confidence not using forcing or kernel

It seems that paf.confidence uses neither the kernel nor the force.min option.

Confidence intervals of pif_plot

Include confidence intervals for X in pif_plot

On Problem 1: Computing PAF with diverging integrals 2.2.2 Gamma case

I think that by using a beta distribution for the true exposure distribution, you weaken your argument, since this is no longer strictly a diverging-integral situation; it is also a distribution-choice situation (from part 3). The example also seems a bit contrived in two ways:

If the mean is negative, one would not use a gamma distribution to describe the exposure distribution in the first place (especially this true distribution, where most observations would be negative). By making the true mean negative, you've manufactured a scenario in which we fail the theta < mu/sigma^2 requirement for convergence.
I still don't quite understand whether a negative PAF makes sense. PAF stands for population attributable fraction, and "fraction" implies that it is bounded between 0 and 1. The interpretation of the value is the proportion of the outcomes in the population that can be attributed to the exposure. On that level, I don't understand what a negative PAF means. I'm also wary of this scenario where intake can be negative. While in practice I've seen people (including us) integrate over the negative real line in the past, I believe the literature consistently writes out the PAF as an integral from 0 to m (0 being the minimum possible value and m the maximum possible value). So allowing for negative intakes is actually a huge paradigm shift, in my opinion. However, the negative true PAF is probably driven by the combination of a positive theta (weight gain is unhealthy) and the fact that most people in the true population are losing weight: the exposed population is overall healthier than a hypothetical unexposed population. Unless you assume that theta = 0 (as we do) when exposure is better than the "ideal" scenario (in this case, when exposure is negative), it may not be possible, or make sense, to estimate the proportion of outcomes attributable to exposure. It may be helpful to think of the PAF as a comparison between an exposed distribution and an ideal distribution. Given the RR function and distribution for this scenario, the ideal scenario would not be everyone at 0 BMI change, so it might not make sense to calculate the PIF here.
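For the gamma case, the convergence requirement in the first point can be made explicit. The derivation below is a standard moment-generating-function calculation, not taken from the document under review, and assumes a gamma exposure with mean mu > 0 and variance sigma^2 (shape k = mu^2/sigma^2, rate beta = mu/sigma^2), RR(x; theta) = exp(theta x), and a counterfactual of zero exposure (where RR = 1):

$$
\mathbb{E}\left[e^{\theta X}\right]
= \int_0^\infty e^{\theta x}\,\frac{\beta^{k}}{\Gamma(k)}\,x^{k-1}e^{-\beta x}\,dx
= \left(1-\frac{\theta}{\beta}\right)^{-k}
= \left(1-\frac{\theta\sigma^2}{\mu}\right)^{-\mu^2/\sigma^2},
\qquad \theta < \beta = \frac{\mu}{\sigma^2},
$$

and the integral diverges for theta >= mu/sigma^2. The PAF is then

$$
\mathrm{PAF} = 1 - \frac{1}{\mathbb{E}\left[e^{\theta X}\right]}
= 1 - \left(1-\frac{\theta\sigma^2}{\mu}\right)^{\mu^2/\sigma^2}.
$$

A negative mean gives a negative rate, so no gamma distribution exists at all, and as theta approaches mu/sigma^2 from below the PAF tends to 1.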

However, while I have problems with the example, the point is taken that certain scenarios (high exposure effect, low mean, high variance) will lead to a theoretical PAF of 1 (not good!).
Edit1: This is not necessarily a purely theoretical scenario either. Nuts in the US may actually hit these three points.
Edit2: On second thought, since we make some assumptions ourselves (the TMRED is normally distributed and the RRs do not change beyond the TMRED), we may have avoided this situation for nuts as well. That would explain why we didn't encounter near-1 PAFs for nuts. (P.S. Sorry for all these edits.)
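A minimal R sketch of the high-effect, low-mean, high-variance scenario, under illustrative assumptions only (gamma exposure with mean mu and variance sigma2, RR(x; theta) = exp(theta x), zero-exposure counterfactual). gamma_paf and gamma_paf_numeric are hypothetical helpers, not package functions: the first uses the gamma moment-generating function, and the second checks it with integrate().

```r
# Sketch under stated assumptions: X ~ Gamma with mean mu > 0 and variance sigma2,
# RR(x; theta) = exp(theta * x), counterfactual = everyone at zero exposure,
# so PAF = 1 - 1 / E[exp(theta * X)].
gamma_paf <- function(mu, sigma2, theta) {
  stopifnot(mu > 0, sigma2 > 0)
  shape <- mu^2 / sigma2
  rate  <- mu / sigma2
  # E[exp(theta * X)] is infinite when theta >= rate, so the theoretical PAF is 1
  if (theta >= rate) return(1)
  1 - (1 - theta / rate)^shape
}

# Numerical check of the closed form: integrate RR against the gamma density
gamma_paf_numeric <- function(mu, sigma2, theta) {
  shape <- mu^2 / sigma2
  rate  <- mu / sigma2
  mean_rr <- integrate(function(x) exp(theta * x) * dgamma(x, shape = shape, rate = rate),
                       lower = 0, upper = Inf)$value
  1 - 1 / mean_rr
}

# Low mean and high variance make the threshold mu / sigma2 small (here 0.25);
# the PAF increases toward 1 as theta approaches it
sapply(c(0.1, 0.2, 0.24, 0.249), function(th) gamma_paf(mu = 1, sigma2 = 4, theta = th))
```

For a well-behaved case, gamma_paf(2, 1, 0.5) and gamma_paf_numeric(2, 1, 0.5) should agree (both about 0.684).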

pif.confidence.linear not passing options to pif

Check s2= sum(w_i^2)

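Check whether or not to multiply by s2, which comes from

$$\operatorname{Var}(\ln R_0) \approx \frac{1}{R_0^2}\operatorname{Var}\Big(\sum_i w_i R(x_i)\Big)$$

by independence:

$$= \frac{1}{R_0^2}\sum_i \operatorname{Var}\big(w_i R(x_i)\big) = \frac{1}{R_0^2}\sum_i w_i^2\,\operatorname{Var}\big(R(x_i)\big)$$

by identical distribution:

$$= \frac{1}{R_0^2}\operatorname{Var}\big(R(X)\big)\sum_i w_i^2.$$

Check wherever the variance of R(X) or the variance of R(f(X)) is calculated.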
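A quick Monte Carlo sanity check of the last step of the derivation above, Var(sum w_i R(x_i)) = Var(R(X)) sum(w_i^2). The setup is purely illustrative: X_i iid N(0, 1), R(x) = exp(theta x) with theta = 0.5, and a hypothetical fixed weight vector summing to 1. For this choice Var(R(X)) has the closed form exp(theta^2)(exp(theta^2) - 1).

```r
# Monte Carlo check (illustrative assumptions: X_i iid N(0, 1), R(x) = exp(theta * x),
# fixed normalized weights w) that Var(sum w_i R(X_i)) = Var(R(X)) * sum(w_i^2).
set.seed(1)
theta <- 0.5
w     <- c(0.1, 0.2, 0.3, 0.2, 0.2)   # hypothetical weights summing to 1
s2    <- sum(w^2)
nsim  <- 1e5

x <- matrix(rnorm(nsim * length(w)), nrow = nsim)   # each row: one sample of size 5
weighted_sums <- as.vector(exp(theta * x) %*% w)    # sum_i w_i R(x_i) for each row

var_sim    <- var(weighted_sums)
var_R      <- exp(theta^2) * (exp(theta^2) - 1)     # exact Var(exp(theta X)) for X ~ N(0, 1)
var_theory <- s2 * var_R
c(simulated = var_sim, theory = var_theory)
```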

EntryMult returning non-numeric values?

Why is EntryMult returning values that are not numeric? (I.e., why do we need as.numeric on the results of EntryMult for them to match numbers?)

paf_confidence_one_to_one.R IS NOT a (1-alpha) x 100% interval

The article "On confidence intervals for nonmonotone parametric functions and an application to the squared mean of the normal distribution" requires a different lower bound for the CI. The one_to_one method here is only valid for monotonically increasing functions whose sum is also monotonically increasing, and it is not what we are writing in the paper.

PAF with infinite confidence bound

The following code returns -Inf as lower bound for PAF's CI:

set.seed(2374)
X <- rnorm(500, 3, 0.1)
RR <- function(X,theta){exp(X*theta)}
thetahat <- 0.2
paf.confidence(X, thetahat = 0.2, thetamin = 0.1, thetamax = 0.3, rr = RR, method = "one2one")

The real CI the function should return seems to be: c(0.2184727, 0.8270486)

paf.confidence and pif.confidence have different ordering of inputs

Errors when deploying shiny app

Deploying the Shiny app (via PIFApp() and shiny::runGitHub("pif", "INSP-RH", subdir = "inst/shiny-examples/PIFapp")) returns the following errors:

Warning: Error in if: missing value where TRUE/FALSE needed
Stack trace (innermost first):
82: tagList
81: withMathJax
80: renderUI [/Library/Frameworks/R.framework/Versions/3.3/Resources/library/pif/shiny-examples/PIFapp/server.R#67]
79: func
78: origRenderFunc
77: output$theta
2: shiny::runApp
1: PIFApp

Warning in pif.confidence.inverse

The following code throws a warning in pif.confidence.inverse

#Example: Multidimensional example using approximate method
X1       <- rnorm(1000,3,.5)
X2       <- rnorm(1000,4,1)
X        <- as.matrix(cbind(X1,X2))
Xmean    <- colMeans(X)
Xvar     <- cov(X)
.Xmean   <- matrix(Xmean, ncol = length(Xmean))
.Xvar    <- matrix(Xvar, ncol = sqrt(length(Xvar)))
theta    <- c(0.12, 0.17)
thetasd  <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)
rr       <- function(X, theta){exp(theta[1]*X[,1] + theta[2]*X[,2])}
paf.confidence.inverse(Xmean, thetahat = theta, thetavar = thetasd, 
rr=rr, method = "approximate", Xvar = Xvar)

On problem 3

I agree the choice of distribution matters for calculating the PAF, and there is definitely an advantage to the empirical method, where you don't have to worry about that. No need to oversell it with a strange scenario where a researcher uses a Poisson distribution when the data are uniform (who would do such a thing? If they know what a Poisson distribution is, surely they would recognize that the data don't fit the distribution!). I was actually surprised the bias was only 10%!

No need to do this if the point of the document is to clarify things for me, but I do think this exercise of seeing how sensitive PAFs are to different exposure-distribution assumptions is interesting. It would be useful when comparing similar distributions that reasonable researchers can disagree on (for example, log-normal, gamma, and Weibull, which are all skewed and positive, with a skewed beta as the true distribution). You could also add the normal and/or truncated normal, since that would probably be the default distribution for the unthinking/rushed researcher.

I remain somewhat unconvinced that misspecifying the distribution would create so much bias if, say, the true distribution were a right-skewed beta and the assumed distribution a gamma, or if the true distribution were a truncated log-normal and the assumed distribution a gamma.
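A minimal sketch of the sensitivity exercise suggested here, under illustrative assumptions: true exposure 10 x Beta(2, 5) (right-skewed, positive), RR = exp(theta x) with theta = 0.1, zero-exposure counterfactual, and parameters fitted by the method of moments. It compares the empirical PAF with the closed-form PAFs of a fitted gamma and a fitted normal. The log-normal is left out because E[exp(theta X)] is infinite for it (the point of 2.2.1), and the Weibull is omitted for brevity.

```r
set.seed(123)
theta <- 0.1
X <- 10 * rbeta(1e4, 2, 5)          # illustrative right-skewed "true" exposure
mu <- mean(X)
sigma2 <- var(X)

# Empirical PAF: 1 - 1 / mean(RR) with counterfactual zero exposure
paf_empirical <- 1 - 1 / mean(exp(theta * X))

# Gamma fitted by moments: E[exp(theta X)] = (1 - theta / rate)^(-shape), needs theta < rate
shape <- mu^2 / sigma2
rate  <- mu / sigma2
paf_gamma <- 1 - (1 - theta / rate)^shape

# Normal fitted by moments: E[exp(theta X)] = exp(theta * mu + theta^2 * sigma2 / 2)
paf_normal <- 1 - exp(-(theta * mu + theta^2 * sigma2 / 2))

round(c(empirical = paf_empirical, gamma = paf_gamma, normal = paf_normal), 4)
```

With a modest theta, the three estimates land close together here; larger theta values or heavier tails are where the distributional choice would start to matter.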

On Problem 2: Truncation

sdlog<-sqrt(1.7)
meanlog<-.05

plnorm(q=0, meanlog=meanlog, sdlog=sdlog)
[1] 0
plnorm(q=2.5, meanlog=meanlog, sdlog=sdlog)
[1] 0.7467875
plnorm(q=5, meanlog=meanlog, sdlog=sdlog)
[1] 0.8841584
plnorm(q=10, meanlog=meanlog, sdlog=sdlog)
[1] 0.9579749
plnorm(q=100, meanlog=meanlog, sdlog=sdlog)
[1] 0.9997618

summary(rlnorm(n=100, meanlog=meanlog, sdlog=sdlog))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0161 0.4143 1.1530 3.3660 3.0470 84.8300

Sorry for the ugly code and output. The above just shows the probability of observing values at or below 2.5, 5, 10, and 100 for the log-normal (one minus each value is the probability of exceeding that cutoff), plus summary statistics for 100 random draws from the distribution used in your example. I'm showing it because I think you can make the case that, assuming the parameters are based on data and that the researchers are assuming a truncated log-normal because they believe the exposure distribution is approximately shaped like a log-normal, the researchers would not use a cutoff as low as 2.5 or 5. Even 10 may be low. However, once you increase the cutoff to 10, your PAF will be, for all practical purposes, very close to the "true" PAF for the log-normal. Based on this example, it seems that truncation is likely not a huge issue as long as the maximum is picked in a reasonable way that is consistent with the data.

I think the log-normal may not be the best example for illustrating how sensitive the PAF is to truncation, since the log-normal is fundamentally flawed (as you explain in 2.2.1) when R(X; theta) = exp(theta*X). It might be more interesting/helpful to look at gamma distributions that don't diverge and see how sensitive they are to truncation (though I have no idea whether dtrunc supports gamma).
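Whatever dtrunc supports, a truncated gamma can be handled in base R by renormalizing with pgamma(). The sketch below assumes an illustrative gamma exposure (shape 2, rate 1), RR = exp(theta x) with theta = 0.5, and a zero-exposure counterfactual; truncated_gamma_paf is a hypothetical helper. The untruncated PAF is 1 - (1 - theta/rate)^shape = 0.75, and the truncated PAF approaches it as the cutoff grows.

```r
# PAF under a gamma exposure truncated to [0, b], RR = exp(theta * x), counterfactual 0.
truncated_gamma_paf <- function(b, shape, rate, theta) {
  num <- integrate(function(x) exp(theta * x) * dgamma(x, shape = shape, rate = rate),
                   lower = 0, upper = b)$value
  mean_rr <- num / pgamma(b, shape = shape, rate = rate)  # renormalize to the truncated density
  1 - 1 / mean_rr
}

shape <- 2; rate <- 1; theta <- 0.5   # untruncated PAF = 1 - (1 - 0.5)^2 = 0.75
cutoffs <- c(2.5, 5, 10, 50)
pafs <- sapply(cutoffs, truncated_gamma_paf, shape = shape, rate = rate, theta = theta)
round(setNames(pafs, cutoffs), 4)
```

Because RR is increasing in x, the truncated mean of RR (and hence the PAF) can only grow as the cutoff b increases.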

All pif confidence intervals crash with a true PAF of 1

Assume the following (solely theoretical) distribution:

X            <- rlnorm(100, 2.5, 7)  
thetahat <- 0.93
thetavar <- 0.00109
rr             <- function(X, theta){exp(X*theta)}

then all confidence-interval functions return senseless intervals:

 pif.confidence.linear(X, thetahat, thetavar, rr)
 pif.confidence.bootstrap(X, thetahat, thetavar, rr)
 pif.confidence.loglinear(X, thetahat, thetavar, rr)
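With X <- rlnorm(100, 2.5, 7), some values of 0.93 * X are far beyond 710, so exp() overflows to Inf in double precision. Whether overflow is what breaks these particular functions is an assumption, but quantities such as ln(R0), which the log-linear interval works with, can be computed stably with the log-sum-exp trick. log_mean_exp and paf_from_log are hypothetical helpers, not package functions.

```r
# Numerically stable log(mean(exp(z))) using the log-sum-exp trick:
# factor out the largest term before exponentiating.
log_mean_exp <- function(z) {
  zmax <- max(z)
  zmax + log(mean(exp(z - zmax)))
}

# Illustrative values of theta * X for very large exposures
z <- 0.93 * c(1000, 2000)
log(mean(exp(z)))   # Inf: exp() overflows double precision
log_mean_exp(z)     # finite, equal to max(z) - log(2)

# A PAF computed from the stable log-mean (counterfactual: zero exposure)
paf_from_log <- function(z) 1 - exp(-log_mean_exp(z))
```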

Delete paf_confidence approximate

paf_confidence_approximate is redundant with pif_confidence_approximate

Change plot3D for heatmap in README

The code has been changed, but the README still says 3D plot.

CovR0.RC

In pif.conditional.variance.linear, line 66: why is round used? @dalia1992

Error on installation from GitHub

When running

devtools::install_github("INSP-RH/pif", build_vignettes = TRUE)

the following error occurs:

Error: processing vignette 'Introduction_to_pif_package.Rmd' failed with diagnostics:
object 's2' not found
Execution halted

check_exposure

The check.exposure function checks that exposure levels are > 0, but we have shown that exposure levels can be negative. Add a boolean option to skip this check.
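A minimal sketch of what such an option could look like. The function name check_exposure_sketch and the argument check_positive are hypothetical, not the package's API, and the sketch only rejects negative values when the check is enabled.

```r
# Hypothetical sketch: an exposure check with an opt-out for negative values.
check_exposure_sketch <- function(X, check_positive = TRUE) {
  if (!is.numeric(X)) stop("X must be numeric")
  if (anyNA(X)) stop("X contains missing values")
  if (check_positive && any(X < 0)) {
    stop("Exposure values must be non-negative; set check_positive = FALSE to allow negative exposures")
  }
  invisible(TRUE)
}

check_exposure_sketch(c(1, 2, 3))                        # passes
check_exposure_sketch(c(-1, 2), check_positive = FALSE)  # passes: negatives allowed
```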

Check test

Need to check one test. When building, the following error appears:

Failed ---------------------------------------------------------------------------------------
1. Failure: Checking pif.approximate function errors (@test_pif_approximate.R#9) -------------
{
...
} did not throw an error.

Warning in pif.confidence when using kernel

Bootstrap is the only valid option for kernel, so there is no point in the warning:

In pif.confidence(X = X, thetahat = thetahat, rr = rr, cft = cft, :
Method of confidence interval estimation defaulted to bootstrap.

This happens in the following case:


X <- rnorm(100)
thetahat <- 1
thetavar <- 1
rr <- function(X, theta){exp(theta*X)}
cft <- function(X){X/2}
pif.confidence(X = X, thetahat = thetahat, rr = rr, cft = cft, method = "kernel", thetavar = thetavar)

paf.confidence.loglinear

Why does line 106 have .paf <- pif(.X, thetahat, rr, weights = weights)? On the other hand, paf.confidence.approximate.loglinear and paf.confidence.loglinear estimate different things with .inverse. In particular, it seems that two different methods are being used for the same loglinear case.

pif.heatmap construction NOTE

The following NOTE appears when building the package:

paf.confidence.inverse: possible error in risk.ratio.confidence(X = X,
  thetahat = thetahat, thetasd = .thetavar, rr = rr, weights = weights,
  nsim = nsim, confidence = confidence, force.min = force.min): unused
  argument (thetasd = .thetavar)
pif.counterfactual.heatmap: no visible global function definition for
  ‘heat.colors’
pif.counterfactual.heatmap: no visible binding for global variable ‘b’
Undefined global functions or variables:
  b heat.colors
Consider adding
  importFrom("grDevices", "heat.colors")
to your NAMESPACE file.
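The NOTE already names the NAMESPACE fix for heat.colors. If the NAMESPACE is generated with roxygen2 (an assumption), the equivalent could live in a package source file as sketched below. Declaring b with utils::globalVariables() assumes that b is a column name used through non-standard evaluation inside pif.counterfactual.heatmap, which the NOTE alone cannot confirm; the file name is hypothetical.

```r
# In a package source file, e.g. a hypothetical R/imports.R (roxygen2 syntax):
#' @importFrom grDevices heat.colors
NULL

# Tell R CMD check that `b` is a known name (assumed to be a column used via NSE)
utils::globalVariables(c("b"))
```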

Check the upper limits for the CI in bootstrap and linear

Check the applicability / adaptability of the methods described in

THE LOWER CONFIDENCE LIMIT FOR THE MEAN OF POSITIVE RANDOM VARIABLES

as it might not be that simple.

thetahat must be MLE

In the help we should add that thetahat must be a Maximum Likelihood Estimator (we need asymptotic normality and consistency)

if (is.na(Xvar)) { : the condition has length > 1 and only the first element will be used

The following code throws the warning:

X1       <- rnorm(1000,3,.5)
X2       <- rnorm(1000,4,1)
X        <- as.matrix(cbind(X1,X2))
Xmean    <- colMeans(X)
Xvar     <- cov(X)
theta    <- c(0.12, 0.17)
thetasd  <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)
rr       <- function(X, theta){exp(theta[1]*X[,1] + theta[2]*X[,2])}
paf.confidence(X = Xmean, thetahat = theta, thetavar = thetasd,
 rr = rr, Xvar = Xvar, est.method = "approximate")
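The warning suggests that Xvar defaults to NA and is tested with is.na(Xvar), which returns one value per matrix entry. A minimal sketch of a length-safe test, assuming that NA default; xvar_is_missing is a hypothetical helper. Because && short-circuits, is.na() is only evaluated on a length-one object, so if() always receives a single TRUE/FALSE.

```r
# Hypothetical helper (not the package's code): Xvar counts as "missing" only when
# it is a single NA; && short-circuits, so is.na() only ever sees a length-one object.
xvar_is_missing <- function(Xvar) {
  length(Xvar) == 1 && is.na(Xvar)
}

set.seed(1)
X    <- cbind(rnorm(1000, 3, 0.5), rnorm(1000, 4, 1))
Xvar <- cov(X)
is.na(Xvar)            # a 2 x 2 logical matrix: not a valid if() condition
xvar_is_missing(NA)    # TRUE
xvar_is_missing(Xvar)  # FALSE
```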

Delete EntryMult

For two matrices X and Y, the function EntryMult can be replaced by sum(X*Y).
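sum(X*Y) is the Frobenius inner product: multiply entry-wise, then add everything up. The sketch below checks it against the equivalent trace form on small random matrices (illustrative data) and shows that the result is a plain numeric scalar with no dim attribute, which may also make the as.numeric() workaround unnecessary.

```r
set.seed(1)
X <- matrix(rnorm(6), nrow = 2)
Y <- matrix(rnorm(6), nrow = 2)

frob       <- sum(X * Y)                  # entry-wise product, then sum
trace_form <- sum(diag(crossprod(X, Y)))  # crossprod(X, Y) = t(X) %*% Y; trace gives the same value

c(frob = frob, trace = trace_form)
```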

Shiny app loading

If the Shiny app is loaded via
pif::dalia.rex()
it does not recognize the Warnings1 function.

Checking the same conditions twice

Check function dependencies, as some functions such as pif.variance.approximate.linear check twice that X and theta are correctly specified (once in themselves and once in the functions they call).

mvrnorm

Check the usage of mvrnorm in the other functions to make sure it says "empirical" and that variance is specified.
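For reference, MASS::mvrnorm has an empirical argument: with empirical = TRUE, the draws are rescaled so that their sample mean and sample covariance equal mu and Sigma exactly. The mean vector and covariance matrix below are illustrative values only.

```r
library(MASS)

set.seed(1)
mu    <- c(3, 4)
Sigma <- matrix(c(0.25, 0.1,
                  0.1,  1), byrow = TRUE, nrow = 2)

# empirical = TRUE: the sample mean and covariance of the draws match mu and Sigma
draws <- mvrnorm(n = 500, mu = mu, Sigma = Sigma, empirical = TRUE)
colMeans(draws)
cov(draws)
```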

Multivariate Counterfactual

The following multivariate counterfactual raises an error in every function:
cft <- function(X){0.5*X[,1] + 0.25*X[,2]}

traceback() shows that the error comes from check.cft.
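The source of check.cft is not shown here, so this is only a guess: this counterfactual maps a two-column exposure matrix to a single vector. If check.cft compares the shape of cft(X) with the shape of X, it would reject such a function. The data below are illustrative.

```r
set.seed(1)
X   <- cbind(X1 = rnorm(100, 3, 0.5), X2 = rnorm(100, 4, 1))
cft <- function(X){0.5*X[,1] + 0.25*X[,2]}

dim(X)          # 100 rows, 2 columns
dim(cft(X))     # NULL: the counterfactual returns a plain vector
length(cft(X))  # 100: one combined value per row instead of one per entry
```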

Import numDeriv when using grad

Need to check the functions in order to import numDeriv every time grad is used

Warnings in pif.counterfactual.3D

The following code:

set.seed(242)
X <- rnorm(1000, 9, 3)
pif.counterfactual.3D(X, 0.11, rr)

returns more than 50 warnings.

Incorrect PAF in Markdown

The associated Markdown incorrectly estimates a PAF of 65000...

On Approximate empirical method

Quick question on blue text: Can this be right? If sigma^2 is near 0, wouldn't sigma_hat^2 also be near 0 and then the second term in the denominator wouldn't be necessary?

Upper CI < point estimate in "one2one"

The following code

set.seed(236478)
X1 <- rnorm(100, 3,.5)
X2 <- rnorm(100,3,.5)
X  <- as.matrix(cbind(X1,X2))
thetahat <- c(2, 19)
thetavar <- matrix(c(0.1, 0, 0, 0.05), byrow = TRUE, nrow = 2)
rr       <- function(X, theta){
  .X <- matrix(X, ncol = 2)
  exp(theta[1]*.X[,1] + theta[2]*.X[,2])
}
cft <- function(X){0.5*X}


# Approximate 
Xmean <- t(as.matrix(colMeans(X)))
Xvar  <- cov(X)
paf.confidence(X = X, thetahat = thetahat, thetalow = c(0.05, 0), 
               thetaup = c(0.15, 0.08), rr = rr, confidence_method = "one2one")

returns an upper bound for the CI that is smaller than the point estimate of 1

Different point estimates from paf and paf.confidence

Using the paf_confidence_dies dataset, these functions return different point estimates:

paf(X = wpr_mean, thetahat = thetahat, rr = rr, method = "approximate", Xvar = wpr_var, check_rr = FALSE)

paf.confidence(X = wpr_mean, thetahat = thetahat, rr = rr, method = "approximate", Xvar = wpr_var, check_rr = FALSE)

as.matrix(X) destroys data.frames

Consider a numeric exposure X in a data frame with string covariates, such as:
X <- data.frame(rnorm(100), sample(c("A","B"), 100, replace = TRUE))
The code we have now is unable to evaluate such inputs, because converting to a matrix coerces the numeric column to character.
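A small demonstration of the coercion, plus one possible workaround: keep only the numeric columns before calling as.matrix(). Column names are added for readability, and whether the package should drop non-numeric columns or ask the user to name the exposure columns is a design choice the issue leaves open.

```r
set.seed(1)
X <- data.frame(exposure = rnorm(100),
                group    = sample(c("A", "B"), 100, replace = TRUE))

M <- as.matrix(X)
typeof(M)      # "character": the numeric exposure column is coerced too

# Keep only numeric columns before converting
num_cols <- vapply(X, is.numeric, logical(1))
X_num    <- as.matrix(X[, num_cols, drop = FALSE])
typeof(X_num)  # "double"
```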

Remove latex2exp

Stop using latex2exp, as it requires the user to have LaTeX installed.

Correct covariance in pif_confidence_loglinear

Calculation of the covariance in pif_confidence_loglinear is incorrect. Need to recalculate it.

Change default method of CI to bootstrap

Approximate in paf

We need to add the approximate option to the paf function under method.

Error in pif when using kernel

The following code:

set.seed(2374)
X        <- rlnorm(100)
rr       <- function(X, theta){theta*X + 1}
cft      <- function(X){sqrt(X)}
thetahat <- 0.1943
pif(X, thetahat, rr, cft, method = "kernel")

throws the error:

Error in if (meancft < 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(X) : NaNs produced
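rlnorm() only returns positive values, so the NaNs must come from values generated inside the kernel method. One plausible explanation (an assumption about the implementation) is that cft is evaluated on a kernel-density grid or on draws from a smoothed density, both of which can extend below zero. This base-R sketch shows the effect with density(), whose default grid starts 3 bandwidths below min(X), and how the from argument keeps the grid inside the support.

```r
set.seed(2374)
X   <- rlnorm(100)
cft <- function(X){sqrt(X)}

# Default grid: from = min(X) - 3 * bw, which dips below zero for data near 0
d <- density(X)
min(d$x)                                  # negative for data like these
sum(is.nan(suppressWarnings(cft(d$x))))   # grid points where the counterfactual is NaN

# Restricting the grid to the support of the exposure avoids the NaNs
d0 <- density(X, from = 0)
any(is.nan(cft(d0$x)))                    # FALSE
```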

Square root of length?

In paf.confidence.approximate.loglinear line 70:

.Xvar <- matrix(Xvar, ncol = sqrt(length(Xvar)))

Why is the square root taken? @dalia1992
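A likely reason, though the surrounding code is not shown: a p x p covariance matrix flattened into a vector has length p^2, so sqrt(length(Xvar)) recovers p, and matrix(..., ncol = p) rebuilds the square matrix whether Xvar arrived as a matrix or as a plain vector. The covariance values below are illustrative.

```r
Xvar <- matrix(c(0.25, 0.1,
                 0.1,  1), byrow = TRUE, nrow = 2)  # a 2 x 2 covariance matrix

v <- as.vector(Xvar)          # flattened: length 4 = 2^2
p <- sqrt(length(v))          # recovers the dimension p = 2
Xvar_back <- matrix(v, ncol = p)
```

If length(Xvar) is not a perfect square, sqrt() is not an integer and matrix() cannot rebuild a square matrix, so a stricter version might check that first.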

Variance in paf_confidence_linear

Check whether:
.var <- (1/.RO^2)*s2*( s / (s^2 - s2) ) * weighted.mean((rr(.X,.theta) - .RO)^2, weights)
or
.var <- (1/.RO^2)*( s / (s^2 - s2) ) * weighted.mean((rr(.X,.theta) - .RO)^2, weights)

s2 is sum(wi^2)
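One way to see which form is intended, as a sketch assuming normalized weights (s = sum(w) = 1) and that .RO is the weighted mean of rr(.X, .theta): with equal weights w_i = 1/n, the factor s / (s^2 - s2) turns weighted.mean into the usual unbiased var(), and only the form with s2 reduces to the familiar delta-method variance var(y) / (n * R0^2) of the log of a sample mean. This suggests, but does not prove for the package's code, that the s2 factor belongs there. The data are illustrative.

```r
set.seed(1)
n  <- 50
y  <- exp(0.3 * rnorm(n))   # illustrative stand-in for rr(.X, .theta)
w  <- rep(1 / n, n)         # normalized, equal weights
s  <- sum(w)
s2 <- sum(w^2)
R0 <- weighted.mean(y, w)

# Unbiased estimate of Var(R(X)) using the factor s / (s^2 - s2) from the issue
sigma2_hat <- (s / (s^2 - s2)) * weighted.mean((y - R0)^2, w)

var_with_s2    <- (1 / R0^2) * s2 * sigma2_hat   # first candidate
var_without_s2 <- (1 / R0^2) * sigma2_hat        # second candidate
c(with_s2 = var_with_s2, without_s2 = var_without_s2)
```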

bootstrap lower > upper CI!

Using the data in crashes_bootstrap.rda and running pif.confidence.bootstrap(X, thetahat, thetavar, rr), the lower bound of the CI is greater than the upper bound!
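For comparison, a minimal percentile-bootstrap sketch for the PAF. The helpers paf_hat and paf_bootstrap_ci are hypothetical, and the resampling scheme (resample X, draw theta from a normal with mean thetahat and variance thetavar) is an assumption, not pif.confidence.bootstrap's actual algorithm. Both limits are quantiles of the same simulated vector at increasing probabilities, so lower <= upper by construction; an inverted interval could therefore point to limits computed on an order-reversing scale or to swapped outputs.

```r
# Plug-in PAF with a zero-exposure counterfactual
paf_hat <- function(X, theta, rr) 1 - 1 / mean(rr(X, theta))

paf_bootstrap_ci <- function(X, thetahat, thetavar, rr, nsim = 1000, confidence = 0.95) {
  sims <- replicate(nsim, {
    Xb     <- sample(X, replace = TRUE)            # resample the exposure
    thetab <- rnorm(1, thetahat, sqrt(thetavar))   # draw theta from its approximate distribution
    paf_hat(Xb, thetab, rr)
  })
  alpha <- 1 - confidence
  # Quantiles of one vector at increasing probabilities are non-decreasing
  quantile(sims, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(2374)
X  <- rnorm(500, 3, 0.1)
rr <- function(X, theta) exp(X * theta)
ci <- paf_bootstrap_ci(X, thetahat = 0.2, thetavar = 0.05^2, rr = rr)
ci
```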
