insp-rh / pifpaf
Estimation of the Population Impact Fraction and Potential Impact Fraction
License: GNU General Public License v3.0
Check that the input values make sense, as in the other cases
The following code throws the warning:
Warning messages:
1: In if (thetaup < thetahat) { :
the condition has length > 1 and only the first element will be used
2: In if (thetahat < thetalow) { :
the condition has length > 1 and only the first element will be used
# Clear all
rm(list = ls())
# Load the data that break it
load("Nosirve.rda")
# Create risk function
risk <- function(X, theta){
  # Start empty vectors
  r1 <- c()
  r2 <- c()
  # Transform SSB to BMI
  Y <- X*theta[1]
  # Scenario for BMI < 25
  Y1 <- Y[which(Y < 25)]
  if (length(Y1) != 0){ r1 <- exp(Y1*theta[2]/5) }
  # Scenario for BMI >= 25
  Y2 <- Y[which(Y >= 25)]
  if (length(Y2) != 0){ r2 <- exp(Y2*theta[3]/5) }
  return(c(r1, r2))
}
# Note that, evaluated at 0, the RR equals 1
paf.confidence.one2one(X, thetavec, thetalow, thetaup, risk)
The data used:
thetavec <- c(1,2,3)
X <- c(1,2,3)
thetalow <- thetavec/2
thetaup <- thetavec + thetavec/2
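The warning comes from passing a whole vector to if(): thetaup < thetahat returns one logical per parameter, and if() only looks at the first one (since R 4.2.0 this is an error rather than a warning). A minimal sketch of a vector-safe check, assuming the bounds should hold in every component; check_theta_bounds is a hypothetical helper, not a function of the package:

```r
# Hypothetical helper: validate elementwise bounds for a parameter vector.
# any() collapses the elementwise comparison to a single TRUE/FALSE,
# so if() always receives a length-one condition.
check_theta_bounds <- function(thetahat, thetalow, thetaup) {
  if (length(thetalow) != length(thetahat) || length(thetaup) != length(thetahat)) {
    stop("thetahat, thetalow and thetaup must have the same length")
  }
  if (any(thetaup < thetahat)) {
    stop("thetaup must be >= thetahat in every component")
  }
  if (any(thetahat < thetalow)) {
    stop("thetalow must be <= thetahat in every component")
  }
  invisible(TRUE)
}

thetavec <- c(1, 2, 3)
thetalow <- thetavec / 2
thetaup  <- thetavec + thetavec / 2
check_theta_bounds(thetavec, thetalow, thetaup)  # passes silently
```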
What is the difference?
We are missing a pif.confidence function that includes both confidence methods, as paf.confidence does in the case of the Population Attributable Fraction.
It seems that paf.confidence is not using either the kernel or the force.min option.
Include confidence intervals for X in pif_plot
I think that, by using a beta distribution for the true exposure distribution, you weaken your argument, since this is no longer strictly a diverging-integrals situation. It is also a distribution-choosing situation (from part 3). The example also seems a bit contrived in two ways:
If the mean is negative, one would not use a gamma distribution to describe the exposure distribution in the first place (especially for this true distribution, where most observations would be negative). By making the true mean negative, you've manufactured a scenario in which we fail the theta < mu/sigma^2 requirement for convergence.
I still don't quite understand whether a negative PAF makes sense. PAF stands for population attributable fraction; "fraction" implies that it is bounded between 0 and 1. The value is interpreted as the proportion of the outcomes in the population that can be attributed to the exposure. On that level, I don't understand what a negative PAF means. I'm also wary of this scenario where intake can be negative. While in practice I've seen people (including us) integrate over the negative real line in the past, I believe the literature consistently writes out the PAF as an integral from 0 to m (0 being the minimum possible value and m the maximum possible value). So allowing for negative intakes is actually a huge paradigm shift, in my opinion. However, the negative true PAF is probably driven by the combination of a positive theta (weight gain is unhealthy) and the fact that most people in the true population are losing weight. The exposed population is overall healthier than a hypothetical unexposed population. Unless you make the assumption that theta = 0 (as we do) when exposure is better than the "ideal" scenario (in this case, when exposure is negative), it may not be possible, or make sense, to estimate the proportion of outcomes attributable to exposure. It may be helpful to think of the PAF as a comparison between an exposed distribution and an ideal distribution. Given the RR function and distribution for this scenario, the ideal scenario would not be everyone at 0 BMI change, so it might not make sense to calculate the PIF here.
However, while I have problems with the example, the point is taken that certain scenarios (high exposure effect, low mean, high variance) will lead to theoretical PAF of 1 (not good!)
Edit1: This is not necessarily a purely theoretical scenario either. Nuts in the US may actually hit these three points.
Edit2: On second thought, since we make some assumptions ourselves (the TMRED is normally distributed and the RRs do not change after going beyond the TMRED), we may have avoided this situation for nuts as well. That would explain why we didn't encounter PAFs near 1 for nuts. (P.S. Sorry for all these edits.)
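The convergence requirement mentioned above can be made concrete. For a gamma exposure with mean mu and variance sigma^2 (shape mu^2/sigma^2, rate mu/sigma^2) and R(X; theta) = exp(theta*X), E[R(X)] is the gamma moment generating function, which is finite only when theta < mu/sigma^2; as theta approaches that bound the PAF tends to 1. A base-R sketch with a numerical check via integrate(); paf_gamma_exp is a hypothetical name and the parameter values are illustrative only:

```r
# Hypothetical helper: closed-form PAF for a gamma exposure with mean mu,
# variance sigma2 and relative risk exp(theta * x).
# E[exp(theta * X)] = (1 - theta / rate)^(-shape), finite only if theta < rate.
paf_gamma_exp <- function(theta, mu, sigma2) {
  shape <- mu^2 / sigma2
  rate  <- mu / sigma2
  if (theta >= rate) {
    return(NA_real_)  # E[R(X)] diverges; the PAF is not defined (its limit is 1)
  }
  1 - (1 - theta / rate)^shape   # PAF = 1 - 1 / E[R(X)]
}

# Numerical check (mu = 2, sigma2 = 1, so rate = 2 and shape = 4)
theta <- 0.5
ER <- integrate(function(x) exp(theta * x) * dgamma(x, shape = 4, rate = 2),
                lower = 0, upper = Inf)$value
c(closed_form = paf_gamma_exp(theta, mu = 2, sigma2 = 1),
  numerical   = 1 - 1 / ER)

paf_gamma_exp(1.99, mu = 2, sigma2 = 1)  # close to 1 near the bound
paf_gamma_exp(2.00, mu = 2, sigma2 = 1)  # NA: theta is not below mu/sigma^2
```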
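Check whether or not to multiply by s2, which comes from the delta-method approximation
$$\operatorname{Var}\big(\ln R_0\big) \approx \frac{1}{R_0^2}\operatorname{Var}\Big(\sum_i w_i R(x_i)\Big);$$
by independence,
$$\frac{1}{R_0^2}\sum_i \operatorname{Var}\big(w_i R(x_i)\big) = \frac{1}{R_0^2}\sum_i w_i^2 \operatorname{Var}\big(R(x_i)\big);$$
and by identical distribution,
$$\frac{1}{R_0^2}\operatorname{Var}\big(R(X)\big)\sum_i w_i^2.$$
Check wherever the variance of R(X) or the variance of R(f(X)) is calculated.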
Why is EntryMult returning values that are not numeric? (i.e., why do we need as.numeric on the results of EntryMult for them to match numbers?)
The article "On confidence intervals for nonmonotone parametric functions and an application to the squared mean of the normal distribution" requires a different lower bound for the CI. The one_to_one method here is only valid for monotonically increasing functions whose sum is also monotonically increasing, which is not what we are writing in the paper.
The following code returns -Inf as lower bound for PAF's CI:
set.seed(2374)
X <- rnorm(500, 3, 0.1)
RR <- function(X,theta){exp(X*theta)}
thetahat <- 0.2
paf.confidence(X, thetahat = 0.2, thetamin = 0.1, thetamax = 0.3, rr = RR, method = "one2one")
The real CI the function should return seems to be: c(0.2184727, 0.8270486)
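For reference, when R(X; theta) is increasing in theta and X > 0 (as in this example), the plug-in PAF 1 - 1/mean(R(X; theta)) is increasing in theta, so a one-to-one interval can be obtained by evaluating the PAF at the two theta bounds, and both ends are finite. A minimal sketch under that monotonicity assumption; paf_plugin is a hypothetical helper, not the package's paf, and the interval it yields need not match the one quoted above:

```r
# Hypothetical helper: plug-in PAF with equal weights
paf_plugin <- function(X, theta, rr) {
  1 - 1 / mean(rr(X, theta))
}

set.seed(2374)
X  <- rnorm(500, 3, 0.1)
RR <- function(X, theta) { exp(X * theta) }

# One-to-one mapping of the theta interval; valid only because the PAF
# is monotonically increasing in theta for this RR and positive X
ci <- c(lower = paf_plugin(X, 0.1, RR), upper = paf_plugin(X, 0.3, RR))
ci
paf_plugin(X, 0.2, RR)  # point estimate lies between the bounds
```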
Deploying the ShinyApp (via PIFApp() and shiny::runGitHub("pif", "INSP-RH", subdir = "inst/shiny-examples/PIFapp")) returns the following errors:
Warning: Error in if: missing value where TRUE/FALSE needed
Stack trace (innermost first):
82: tagList
81: withMathJax
80: renderUI [/Library/Frameworks/R.framework/Versions/3.3/Resources/library/pif/shiny-examples/PIFapp/server.R#67]
79: func
78: origRenderFunc
77: output$theta
2: shiny::runApp
1: PIFApp
The following code throws a warning in pif.confidence.inverse
#Example: Multidimensional example using approximate method
X1 <- rnorm(1000,3,.5)
X2 <- rnorm(1000,4,1)
X <- as.matrix(cbind(X1,X2))
Xmean <- colMeans(X)
Xvar <- cov(X)
.Xmean <- matrix(Xmean, ncol = length(Xmean))
.Xvar <- matrix(Xvar, ncol = sqrt(length(Xvar)))
theta <- c(0.12, 0.17)
thetasd <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)
rr <- function(X, theta){exp(theta[1]*X[,1] + theta[2]*X[,2])}
paf.confidence.inverse(Xmean, thetahat = theta, thetavar = thetasd,
rr=rr, method = "approximate", Xvar = Xvar)
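One common way to approximate E[R(X)] from only the mean vector and covariance matrix is a second-order Taylor expansion around the mean, E[R(X)] ≈ R(mu) + tr(H Sigma)/2, where H is the Hessian of R at mu. Whether this is exactly what method = "approximate" does is an assumption here. For the log-linear RR in the example, H = theta theta' exp(theta' mu), and for normal X the exact value exp(theta' mu + theta' Sigma theta / 2) is available for comparison:

```r
# Second-order (delta-method) approximation of E[exp(theta' X)]
# using only the mean vector and covariance matrix of X.
approx_mean_rr <- function(Xmean, Xvar, theta) {
  r_mu <- exp(sum(theta * Xmean))          # R evaluated at the mean
  H    <- r_mu * tcrossprod(theta)         # Hessian: theta theta' exp(theta' mu)
  r_mu + 0.5 * sum(diag(H %*% Xvar))       # R(mu) + tr(H Sigma) / 2
}

Xmean <- c(3, 4)
Xvar  <- diag(c(0.5^2, 1^2))
theta <- c(0.12, 0.17)

approx_ER <- approx_mean_rr(Xmean, Xvar, theta)
# Exact value for multivariate normal X (lognormal mean)
exact_ER  <- exp(sum(theta * Xmean) + 0.5 * drop(t(theta) %*% Xvar %*% theta))

c(paf_approx = 1 - 1 / approx_ER, paf_exact = 1 - 1 / exact_ER)
```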
I agree the choice of distribution matters for calculating the PAF, and there is definitely an advantage to the empirical method, where you don't have to worry about that. No need to oversell it with a strange scenario where the researcher uses a Poisson distribution when the data are uniform (who would do such a thing? If they know what a Poisson distribution is, surely they would recognize that the data don't fit the distribution!) I was actually surprised the bias was only 10%!
No need to do this if the point of the document is to clarify things for me, but I do think this exercise of seeing how sensitive PAFs are to different exposure distribution assumptions is interesting, and it would be useful when comparing similar distributions that reasonable researchers can disagree on (for example log-normal, gamma and Weibull, which are all skewed and positive, with a skewed beta as the true distribution). (Could also add normal and/or truncated normal, since that would probably be the default distribution for the unthinking/rushed researcher.)
I remain somewhat unconvinced that mis-specifying distribution would create so much bias if, say, the true distribution was right-skewed beta and the assumed distribution is gamma, or if the true distribution is truncated log-normal and used distribution is gamma.
sdlog <- sqrt(1.7)
meanlog <- .05
plnorm(q=0, meanlog=meanlog, sdlog=sdlog)
[1] 0
plnorm(q=2.5, meanlog=meanlog, sdlog=sdlog)
[1] 0.7467875
plnorm(q=5, meanlog=meanlog, sdlog=sdlog)
[1] 0.8841584
plnorm(q=10, meanlog=meanlog, sdlog=sdlog)
[1] 0.9579749
plnorm(q=100, meanlog=meanlog, sdlog=sdlog)
[1] 0.9997618
summary(rlnorm(n=100, meanlog=meanlog, sdlog=sdlog))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0161 0.4143 1.1530 3.3660 3.0470 84.8300
Sorry for the ugly code and output. Above I am just showing the probability of observing values at or below 2.5, 5, 10, and 100 for the log-normal (one minus each value is the probability of exceeding that cutoff), and summary statistics for 100 random draws from the distribution used in your example. I'm showing it because I think you can make the case that, assuming the parameters are based on data and that the researchers use a truncated log-normal because they believe the exposure distribution is approximately log-normal in shape, the researchers would not use a cutoff as low as 2.5 or 5. Even 10 may be low. However, once you increase the cutoff to 10, your PAF will be, for all practical purposes, very close to the "true" PAF for the log-normal. Based on this example, truncation is likely not a huge issue as long as the maximum is picked in a reasonable way that is consistent with the data.
I think the log-normal may not be the best example for illustrating how sensitive the PAF is to truncation, since the log-normal is fundamentally flawed (as you explain in 2.2.1) when R(X; theta) = exp(theta*X). It might be more interesting/helpful to look at gamma distributions that don't diverge and see how sensitive they are to truncation (though I have no idea whether dtrunc supports gamma).
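Following the suggestion above, sensitivity to truncation can be checked for a gamma exposure without a truncation package: the truncated mean of the RR is a ratio of integrals, E[R(X) | X <= m] = (integral from 0 to m of R(x) f(x) dx) / F(m). A sketch with illustrative parameters (shape 4, rate 2, theta = 0.5, chosen so that the untruncated PAF exists; these are not values from the document):

```r
# PAF when the gamma exposure distribution is truncated to [0, m]
paf_trunc_gamma <- function(m, theta, shape, rate) {
  num <- integrate(function(x) exp(theta * x) * dgamma(x, shape = shape, rate = rate),
                   lower = 0, upper = m)$value
  ER_m <- num / pgamma(m, shape = shape, rate = rate)  # E[R(X) | X <= m]
  1 - 1 / ER_m
}

theta <- 0.5; shape <- 4; rate <- 2
cutoffs <- c(2.5, 5, 10, 50)
paf_m <- sapply(cutoffs, paf_trunc_gamma, theta = theta, shape = shape, rate = rate)

# Untruncated PAF from the gamma moment generating function (theta < rate)
paf_full <- 1 - (1 - theta / rate)^shape
rbind(cutoff = cutoffs, paf = paf_m)
paf_full
```

Because exp(theta*x) is increasing, the truncated PAF can only grow with the cutoff m and approaches the untruncated value from below.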
Assume the following (solely theoretical) distribution:
X <- rlnorm(100, 2.5, 7)
thetahat <- 0.93
thetavar <- 0.00109
rr <- function(X, theta){exp(X*theta)}
then all confidence interval methods return nonsensical intervals:
pif.confidence.linear(X, thetahat, thetavar, rr)
pif.confidence.bootstrap(X, thetahat, thetavar, rr)
pif.confidence.loglinear(X, thetahat, thetavar, rr)
paf_confidence_approximate is redundant with pif_confidence_approximate
The code has been changed, but it still says 3D plot
In pif.conditional.variance.linear, line 66: why is round used? @dalia1992
When running
devtools::install_github("INSP-RH/pif", build_vignettes = TRUE)
the following error occurs:
Error: processing vignette 'Introduction_to_pif_package.Rmd' failed with diagnostics:
object 's2' not found
Execution halted
The check.exposure function checks that exposure levels are > 0, but we have shown that exposure levels can be negative. Add a boolean option to skip this check.
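A minimal sketch of what such an option could look like; check_exposure_sketch and its argument check_positive are hypothetical names, not the package's current check.exposure signature:

```r
# Hypothetical exposure check with an opt-out for the positivity requirement
check_exposure_sketch <- function(X, check_positive = TRUE) {
  X <- as.matrix(X)
  if (!is.numeric(X)) {
    stop("Exposure X must be numeric")
  }
  if (check_positive && any(X <= 0)) {
    stop("Exposure levels must be > 0; set check_positive = FALSE to allow ",
         "zero or negative exposures")
  }
  invisible(TRUE)
}

X <- c(-1.2, 0.5, 3)
check_exposure_sketch(X, check_positive = FALSE)  # accepted
# check_exposure_sketch(X)                        # would stop with an error
```

Defaulting check_positive to TRUE keeps the current behaviour for existing callers.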
Need to check one test. When building the following error appears:
Failed ---------------------------------------------------------------------------------------
1. Failure: Checking pif.approximate function errors (@test_pif_approximate.R#9) -------------
{
...
} did not throw an error.
Bootstrap is the only valid option for kernel, so there is no point in the warning
In pif.confidence(X = X, thetahat = thetahat, rr = rr, cft = cft, :
Method of confidence interval estimation defaulted to bootstrap.
which appears for the following case:
X <- rnorm(100)
thetahat <- 1
thetavar <- 1
rr <- function(X, theta){exp(theta*X)}
cft <- function(X){X/2}
pif.confidence(X = X, thetahat = thetahat, rr = rr, cft = cft, method = "kernel", thetavar = thetavar)
Why does line 106 have .paf <- pif(.X, thetahat, rr, weights = weights)? On the other hand, paf.confidence.approximate.loglinear and paf.confidence.loglinear estimate different things with .inverse. In particular, it seems two different methods are being used for the same loglinear case.
The following NOTE appears when building the package:
paf.confidence.inverse: possible error in risk.ratio.confidence(X = X,
thetahat = thetahat, thetasd = .thetavar, rr = rr, weights = weights,
nsim = nsim, confidence = confidence, force.min = force.min): unused
argument (thetasd = .thetavar)
pif.counterfactual.heatmap: no visible global function definition for
‘heat.colors’
pif.counterfactual.heatmap: no visible binding for global variable ‘b’
Undefined global functions or variables:
b heat.colors
Consider adding
importFrom("grDevices", "heat.colors")
to your NAMESPACE file.
Check the applicability/adaptability of the methods described in "The Lower Confidence Limit for the Mean of Positive Random Variables", as it might not be that simple.
In the help we should add that thetahat must be a Maximum Likelihood Estimator (we need asymptotic normality and consistency)
The following code throws the warning:
X1 <- rnorm(1000,3,.5)
X2 <- rnorm(1000,4,1)
X <- as.matrix(cbind(X1,X2))
Xmean <- colMeans(X)
Xvar <- cov(X)
theta <- c(0.12, 0.17)
thetasd <- matrix(c(0.001, 0.00001, 0.00001, 0.004), byrow = TRUE, nrow = 2)
rr <- function(X, theta){exp(theta[1]*X[,1] + theta[2]*X[,2])}
paf.confidence(X = Xmean, thetahat = theta, thetavar = thetasd,
rr = rr, Xvar = Xvar, est.method = "approximate")
For two matrices X and Y, the function EntryMult can be replaced by sum(X*Y).
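Assuming EntryMult computes the sum of the elementwise products of two matrices of equal dimension (the Frobenius inner product), base R gives the same number in several equivalent ways, each returning a plain numeric scalar, so as.numeric() should not be needed:

```r
set.seed(1)
X <- matrix(rnorm(6), nrow = 2)
Y <- matrix(rnorm(6), nrow = 2)

s1 <- sum(X * Y)                                   # elementwise product, then sum
s2 <- sum(diag(crossprod(X, Y)))                   # trace of t(X) %*% Y
s3 <- drop(crossprod(as.vector(X), as.vector(Y)))  # dot product of vectorized matrices

c(s1, s2, s3)
```

Note that crossprod() without drop() returns a 1 x 1 matrix rather than a bare number, which is one way a result can fail to compare equal to a scalar.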
If the shiny app is loaded via pif::dalia.rex(), it does not recognize the Warnings1 function.
Check function dependencies, as some functions, such as pif.variance.approximate.linear, check twice that X and theta are correctly specified (once in themselves and once in the functions they call).
Check the usage of mvrnorm in the other functions to make sure it says "empirical" and that variance is specified.
The following multivariate counterfactual raises an error in every function:
cft <- function(X){0.5*X[,1] + 0.25*X[,2]}
traceback() shows the error comes from check.cft.
Need to check the functions so that numDeriv is imported every time grad is used.
The following code:
set.seed(242)
X <- rnorm(1000, 9, 3)
pif.counterfactual.3D(X, 0.11, rr)
returns more than 50 warnings.
The associated Markdown incorrectly estimates a PAF of 65000...
Quick question on blue text: Can this be right? If sigma^2 is near 0, wouldn't sigma_hat^2 also be near 0 and then the second term in the denominator wouldn't be necessary?
The following code
set.seed(236478)
X1 <- rnorm(100, 3,.5)
X2 <- rnorm(100,3,.5)
X <- as.matrix(cbind(X1,X2))
thetahat <- c(2, 19)
thetavar <- matrix(c(0.1, 0, 0, 0.05), byrow = TRUE, nrow = 2)
rr <- function(X, theta){
.X <- matrix(X, ncol = 2)
exp(theta[1]*.X[,1] + theta[2]*.X[,2])
}
cft <- function(X){0.5*X}
# Approximate
Xmean <- t(as.matrix(colMeans(X)))
Xvar <- cov(X)
paf.confidence(X = X, thetahat = thetahat, thetalow = c(0.05, 0),
thetaup = c(0.15, 0.08), rr = rr, confidence_method = "one2one")
returns an upper bound for the CI that is smaller than the point estimate of 1
Using the paf_confidence_dies dataset, these functions return different point estimates:
paf(X = wpr_mean, thetahat = thetahat, rr = rr, method = "approximate", Xvar = wpr_var, check_rr = FALSE)
paf.confidence(X = wpr_mean, thetahat = thetahat, rr = rr, method = "approximate", Xvar = wpr_var, check_rr = FALSE)
Consider a numerical exposure X from a data frame with string covariates, such as:
X <- data.frame(rnorm(100), sample(c("A","B"), 100, replace = TRUE))
The code we have now is unable to evaluate such data, as converting to a matrix transforms the numeric first column into strings.
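as.matrix() on a data frame with any character (or factor) column returns a character matrix, which is why the numeric exposure stops being numeric. One possible approach, sketched below, is to keep only the numeric columns before converting; which columns the package should treat as exposure is an assumption here:

```r
set.seed(1)
X <- data.frame(exposure = rnorm(100),
                group    = sample(c("A", "B"), 100, replace = TRUE))

typeof(as.matrix(X))               # every column was coerced to character

num_cols <- vapply(X, is.numeric, logical(1))
Xnum <- as.matrix(X[, num_cols, drop = FALSE])
typeof(Xnum)                       # numeric exposure kept as double
```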
Stop using latex2exp, as it requires that the user has LaTeX installed.
Calculation of the covariance in pif_confidence_loglinear is incorrect. Need to recalculate it.
We need to add the approximate option to the paf function under method.
The following code:
set.seed(2374)
X <- rlnorm(100)
rr <- function(X, theta){theta*X + 1}
cft <- function(X){sqrt(X)}
thetahat <- 0.1943
pif(X, thetahat, rr, cft, method = "kernel")
throws the error:
Error in if (meancft < 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(X) : NaNs produced
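rlnorm() only produces positive values, so sqrt(X) applied to the data itself cannot produce NaN. One plausible source (an assumption, not confirmed against the package code) is that the kernel method evaluates cft on a density grid: by default stats::density() extends its grid cut = 3 bandwidths beyond the range of the data, so part of the grid is negative. The sketch below checks this and shows that starting the grid at 0 avoids it:

```r
set.seed(2374)
X <- rlnorm(100)

d <- density(X)                 # default grid: min(X) - 3*bw to max(X) + 3*bw
min(d$x) < 0                    # the grid reaches below zero
sum(is.nan(suppressWarnings(sqrt(d$x))))  # NaN values sqrt() would create

# Start the grid at 0 (density() accepts 'from' and 'to')
d0 <- density(X, from = 0)
any(is.nan(sqrt(d0$x)))
```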
In paf.confidence.approximate.loglinear, line 70:
.Xvar <- matrix(Xvar, ncol = sqrt(length(Xvar)))
why is the square root taken? @dalia1992
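A p x p covariance matrix has p^2 entries, so sqrt(length(Xvar)) recovers p. Presumably the line is there so that an Xvar supplied as a flattened vector (or as a single variance in the univariate case) is reshaped into a square matrix; for an input that is already a square matrix without dimnames, it changes nothing:

```r
Xvar <- matrix(c(0.25, 0.1,
                 0.1,  1.0), nrow = 2, byrow = TRUE)

# Square matrix in, same square matrix out
.Xvar <- matrix(Xvar, ncol = sqrt(length(Xvar)))
identical(.Xvar, Xvar)

# A flattened (column-major) vector is rebuilt into the 2 x 2 matrix
flat <- as.vector(Xvar)
identical(matrix(flat, ncol = sqrt(length(flat))), Xvar)

# Univariate case: a single variance becomes a 1 x 1 matrix
dim(matrix(0.3, ncol = sqrt(length(0.3))))
```

One side effect worth knowing: the reshaping drops any dimnames, such as those returned by cov().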
Check whether:
.var <- (1/.RO^2)*s2*( s / (s^2 - s2) ) * weighted.mean((rr(.X,.theta) - .RO)^2, weights)
or
.var <- (1/.RO^2)*( s / (s^2 - s2) ) * weighted.mean((rr(.X,.theta) - .RO)^2, weights)
is correct, where s2 is sum(wi^2).
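A quick numerical check of the two candidates, assuming normalized weights so that s = sum(w) = 1 (that s is the sum of the weights is itself an assumption). With equal weights w_i = 1/n, the version that multiplies by s2 reduces to var(R(X))/(n R0^2), which matches the delta-method derivation Var(ln R0) ≈ Var(R(X)) sum(wi^2) / R0^2 from the earlier issue; the version without s2 is n times larger:

```r
set.seed(1)
n  <- 200
X  <- rnorm(n, 3, 0.5)
theta <- 0.2
rr <- function(X, theta) { exp(theta * X) }
w  <- rep(1 / n, n)                     # equal, normalized weights

s   <- sum(w)                           # assumed: s is the sum of the weights
s2  <- sum(w^2)
r   <- rr(X, theta)
.RO <- weighted.mean(r, w)

# For normalized weights, s / (s^2 - s2) turns the weighted mean of squared
# deviations into the usual unbiased (reliability-weights) variance of R(X)
var_R <- (s / (s^2 - s2)) * weighted.mean((r - .RO)^2, w)

var_with_s2    <- (1 / .RO^2) * s2 * var_R   # first candidate
var_without_s2 <- (1 / .RO^2) * var_R        # second candidate

c(with_s2 = var_with_s2,
  delta_method = var(r) / (n * .RO^2),
  without_s2 = var_without_s2)
```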
With the data in crashes_bootstrap.rda, running pif.confidence.bootstrap(X, thetahat, thetavar, rr) gives a CI whose lower bound is greater than its upper bound!
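For comparison, a percentile bootstrap interval built with quantile() cannot have its bounds reversed, because quantile() is monotone in its probability argument. If the package's bootstrap interval is built differently (for example, by mapping bounds through a decreasing transformation, which swaps their order), that could be one place to look; this is a hypothesis, not a diagnosis of crashes_bootstrap.rda. A minimal percentile-bootstrap sketch for the PAF, assuming theta is drawn from a normal with mean thetahat and variance thetavar:

```r
set.seed(123)
X        <- rnorm(200, 3, 0.5)
thetahat <- 0.2
thetavar <- 0.001
rr       <- function(X, theta) { exp(theta * X) }
nsim     <- 1000

paf_boot <- replicate(nsim, {
  Xb    <- sample(X, replace = TRUE)            # resample exposures
  theta <- rnorm(1, thetahat, sqrt(thetavar))   # draw theta (assumed normal)
  1 - 1 / mean(rr(Xb, theta))
})

ci <- quantile(paf_boot, probs = c(0.025, 0.975), names = FALSE)
ci   # ci[1] <= ci[2] by construction
```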