xrobin / pROC
Display and analyze ROC curves in R and S+
Home Page: https://cran.r-project.org/web/packages/pROC/
License: GNU General Public License v3.0
The area under the ROC curve can be calculated directly from a vector of predictions and a vector of binary labels using the Mann-Whitney U statistic. Since this algorithm does not require calculating the ROC curve, it can provide a significant performance increase. My benchmarks show that, on 10,000 observations, this algorithm is roughly 1,000 times faster than calculating the AUROC with your package (2,100 milliseconds vs 2.3 milliseconds).
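For reference, here is a minimal sketch of the rank-based computation (my own illustration, not code from pROC):

```r
# AUC via the Mann-Whitney U statistic: no ROC curve is built.
# Mid-ranks from rank() handle ties in the scores correctly.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))  # 0.75
```

Sorting dominates the cost, so this runs in O(n log n) regardless of how many distinct thresholds the data would produce.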
Would you be interested in adding a C++ implementation of this algorithm to your package? The speedup that this algorithm provides would be valuable for users who need to evaluate hundreds to thousands of models (e.g. with a grid search over a feature / hyper-parameter space).
If you are interested in this contribution to your package, please let me know.
Any chance that you might add these?
Is there an easy way to have ggroc plot the false positive rate (1 - specificity) on the x-axis?
By default, ggroc plots specificity on a reversed x-axis over [1, 0], instead of the perhaps more familiar 1 - specificity over [0, 1].
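For what it's worth, ggroc() accepts a legacy.axes argument (like plot.roc) in recent pROC versions; a quick sketch using the bundled aSAH data:

```r
# legacy.axes = TRUE plots 1 - specificity (FPR) on a conventional
# [0, 1] x-axis instead of reversed specificity.
library(pROC)
library(ggplot2)
data(aSAH)
ggroc(roc(aSAH$outcome, aSAH$s100b), legacy.axes = TRUE) +
  labs(x = "1 - specificity", y = "sensitivity")
```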
Implement vertical, horizontal and threshold averaging, as described by Fawcett (2006).
Input: a list of ROC curves.
Output: an object that behaves like a ROC curve: it can be plotted and its AUC calculated, but no CI.
See also this Stack Overflow question for possible visualization.
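A sketch of the vertical-averaging case, following Fawcett's description: sample each curve's TPR at a fixed grid of FPR values, then average pointwise (vertical_average is a hypothetical helper, not a pROC function):

```r
# Vertical averaging: sample every curve at the same FPR grid, then
# average the TPRs pointwise across curves.
vertical_average <- function(fpr_list, tpr_list, grid = seq(0, 1, 0.01)) {
  sampled <- mapply(function(fpr, tpr) {
    # Linear interpolation of each curve at the common FPR grid;
    # rule = 2 clamps at the curve's endpoints.
    approx(fpr, tpr, xout = grid, ties = max, rule = 2)$y
  }, fpr_list, tpr_list)
  list(fpr = grid, tpr = rowMeans(sampled))
}

# Two toy curves given as (FPR, TPR) coordinate vectors:
avg <- vertical_average(fpr_list = list(c(0, 0.2, 1), c(0, 0.5, 1)),
                        tpr_list = list(c(0, 0.8, 1), c(0, 0.6, 1)))
```

Horizontal averaging is the transpose (average FPR at fixed TPR levels), and threshold averaging pairs points by threshold instead of by coordinate.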
Hi!
I am computing ROC curves to compare a new score to previously developed scores. Visually, the new score seems to outperform all the old scores. One of the old scores performs quite badly and resembles the letter S (AUC = 0.52, with half of the curve under the line of identity and the other half on top of it). I receive an error message when trying to analyze it with DeLong's test:
roc.test(score_new, score_old_3, method = "delong")
"Warning message:
In roc.test.roc(score_new, score_old_3, method = "delong") :
DeLong's test should not be applied to ROC curves with a different direction."
According to DeLong's test the new score is better than the other scores (p < 0.05), but against this badly performing score_old_3, the p-value is 0.08. The problem remains with the bootstrap and venkatraman methods. I do not trust the results. How would you recommend I analyze this?
Thanks for the help,
Oscar
Coords returns a matrix with thresholds in columns and the measurements in rows.
This has always been a bit weird, but it is becoming problematic with pipelines, where a data.frame in transposed form would be better suited.
Expected behavior
> library(dplyr)
> roc(aSAH, outcome, wfns) %>% coords()
Setting levels: control = Good, case = Poor
Setting direction: controls < cases
threshold specificity sensitivity
-Inf 0.0000000 1.0000000
1.5 0.5138889 0.9512195
2.5 0.7916667 0.6585366
3.5 0.8333333 0.6341463
4.5 0.9444444 0.4390244
Inf 1.0000000 0.0000000
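Until coords() itself changes, one possible workaround (a sketch against the matrix output current at the time of this issue) is to transpose and convert that matrix:

```r
library(pROC)
data(aSAH)
r <- roc(aSAH$outcome, aSAH$wfns)
# Transpose the thresholds-in-columns matrix and convert it, giving one
# row per threshold, which fits dplyr-style pipelines:
df <- as.data.frame(t(coords(r, x = r$thresholds, input = "threshold")))
head(df)
```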
Hi,
we use multiclass.roc in mlr here:
https://github.com/berndbischl/mlr
It seems that auc.roc etc. are S3 methods in your package, but you do not mark them as such in your NAMESPACE, which is probably incorrect.
In mlr this now triggers a bug: we requireNamespace("pROC"), then call multiclass.roc, which then does not find auc.roc, although that function lives in the same package.
Could this please be fixed?
Surfaced with issue #25.
r <- roc(c("A", "B"), c(0, 1))
ci(r)
r <- roc(c("A", "B", "A", "A"), c(0, 1, 0.5, 1.1))
var(r)
The problem happens when one group has only one observation.
I'm trying to calculate the AUC with the pROC package. I call:
auc(set_temp$def_woe,set_temp$total_pymnt_woe)
Unfortunately, for some variables, I am getting an error:
Error in if (thresholds[tie.idx] == unique.candidates[tie.idx -1]) { : argument is of length zero
Hi,
I am trying to plot two ROC curves in the same figure. I would like each to have a different colour as well as a different linetype. However, I can only set one at a time, not both simultaneously. That is, I can have different colours but the same linetype:
`ggroc(list(myrocglm, myrocrf), legacy.axes = T) + geom_abline(intercept = 0,slope = 1)`
or different linetypes but the same colour:
`ggroc(list(myrocglm, myrocrf), aes= "linetype", legacy.axes = T) + geom_abline(intercept = 0,slope = 1)`
And if I try to add the color parameter to the function above, it only works with a single value, i.e. color="red". With more values I get the following error:
"Error: Aesthetics must be either length 1 or the same as the data (353): colour"
Thanks,
John
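If the installed ggroc() accepts a vector for its aes argument (this is the case in recent pROC versions; treat it as an assumption otherwise), both aesthetics can be mapped to the curves at once, and the colours then set with a manual scale:

```r
library(pROC)
library(ggplot2)
data(aSAH)
# Stand-ins for the glm/rf curves from the question:
rocs <- list(glm = roc(aSAH$outcome, aSAH$s100b),
             rf  = roc(aSAH$outcome, aSAH$ndka))
# Map both colour and linetype to the curve names, then fix the colours:
ggroc(rocs, aes = c("colour", "linetype"), legacy.axes = TRUE) +
  geom_abline(intercept = 0, slope = 1) +
  scale_colour_manual(values = c("red", "blue"))
```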
They are set with par() only on new plots. Not sure if this is a bug, a feature, or has no effect.
It may or may not cause or be related to issue #9.
Something goes wrong when setting par(mar=...), then calling plot.roc, axis, and plot.roc again with add=TRUE. It is visible only when xlim/ylim are set (or maybe also with massive margins?).
Compare:
roc1 <- roc(aSAH$outcome, aSAH$wfns)
roc2 <- roc(aSAH$outcome, aSAH$ndka)
par(mar=c( 4, 4.5, 1, 1 ))
plot(roc1, xlim=c(0.96, 0.66), ylim=c(0.56,0.86), xaxt="n")
axis(side=1)
plot(roc2, add=T)
With:
roc1 <- roc(aSAH$outcome, aSAH$wfns)
roc2 <- roc(aSAH$outcome, aSAH$ndka)
par(mar=c( 4, 4.5, 1, 1 ))
plot(roc1, xlim=c(0.96, 0.66), ylim=c(0.56,0.86), xaxt="n")
plot(roc2, add=TRUE)
or:
roc1 <- roc(aSAH$outcome, aSAH$wfns)
roc2 <- roc(aSAH$outcome, aSAH$ndka)
par(mar=c( 4, 4.5, 1, 1 ))
plot(roc1, xlim=c(0.96, 0.66), ylim=c(0.56,0.86), xaxt="n")
plot(roc2, add=TRUE)
axis(side=1)
Describe the bug
While re-running ci.auc() (repeating a colleague's analysis), I received the following error message:
"pROC: error in calculating DeLong's theta: got 0.65441176470588235947 instead of 0.63622994652406417160", and was asked to report the bug. (Sorry if the report is not perfect: it is my first bug report, written under time pressure.)
To Reproduce
EDIT: posted the wrong list originally.
R version 3.5.0 (2018-04-23)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Leap 42.3
Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pROC_1.12.1 epiR_0.9-96 survival_2.41-3 tableone_0.9.3
[5] xtable_1.8-2 doBy_4.6-1 ggplot2_2.2.1 someR_1.5.1
What command did you run?
Command: ci.auc()
What data did you use? Use save(myData, file="data.RData")
or save.image("data.RData")
Error in delongPlacements(roc) :
pROC: error in calculating DeLong's theta: got 0.65441176470588235947 instead of 0.63622994652406417160. Diagnostic data saved in pROC_bug.RData. Please report this bug to https://github.com/xrobin/pROC/issues.
CI is broken for multiclass.roc:
data(aSAH)
multiclass.roc(aSAH$gos6, aSAH$s100b, ci=TRUE)
Error in roc.default(response, predictor, levels = X, percent = percent, :
formal argument "ci" matched by multiple actual arguments
It is also not possible to calculate a CI on an existing object:
ci(multiclass.roc(aSAH$gos6, aSAH$s100b))
Error in roc.default(response, predictor, ...) : No valid data provided.
This should work easily for the univariate multiclass.roc. The new mv.multiclass.roc might need a bit more work.
We are contacting you because you are the maintainer of pROC, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.
Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").
If you have any questions, let me know!
Thank you very much for your very useful pROC package.
I've noticed a curious result to which I'd like to draw your attention: when simulating samples with the same classifier score distribution for cases and controls, an AUC of 0.5 is expected on average. However, the auc function of pROC yields a slightly biased mean > 0.5, whereas the ROC and fbroc packages both yield an identical mean value closer to 0.5 than pROC's.
When comparing the individual AUCs estimated by the 3 packages, pROC yields in some cases the same AUCs as the 2 other packages, but in other cases the result is different. The 2 other packages always give the same AUC estimate.
The following code illustrates this:
##############################################
rm(list=ls())
library(pROC)
library(ROC)
library(fbroc)
nsim <- 1000
result <- matrix(ncol=3,nrow=nsim)
n.cases <- n.controls <- 150
for (i in 1:nsim){
response.cases <- rnorm(n.cases, 6,50)
response.controls <- rnorm(n.controls, 6,50)
#############################################
pROC <- roc(controls=response.controls, cases=response.cases)
#############################################
ROC <- rocdemo.sca(truth=c(rep(1, n.cases), rep(0, n.controls)), data=c(response.cases, response.controls))
#############################################
fbroc <- boot.roc(pred=c(response.cases, response.controls), true.class=c(rep(TRUE, n.cases), rep(FALSE, n.controls)))
#############################################
result[i,] <- c(auc(pROC), AUC(ROC), fbroc$auc)
}
apply(result, 2, mean)
##############################################
Many thanks if you can look into this issue.
Best regards,
Jacques
I believe the only supported use case of bootstrapping thresholds is with x = "best". For all other cases, pROC should produce a useful error message, not garbage like:
> ci.coords(roc1, x=0.8, input = "sensitivity", ret=c("specificity", "ppv", "tp", "thr"))
Error in apply(sapply(perfs, c), 1, quantile, probs = c(0 + (1 - conf.level)/2, :
dim(X) must have a positive length
In addition: Warning message:
In ci.coords.roc(roc1, x = 0.8, input = "sensitivity", ret = c("specificity", :
NA value(s) produced during bootstrap were ignored.
or
> ci.coords(roc1, x=0.9, input = "sensitivity", ret="t")
95% CI (2000 stratified bootstrap replicates):
2.5% 50% 97.5%
sensitivity 0.9: threshold NA NA NA
Warning message:
In ci.coords.roc(roc1, x = 0.9, input = "sensitivity", ret = "t") :
NA value(s) produced during bootstrap were ignored.
In the longer term, work should continue on the interpolate branch, which will ultimately support this feature by interpolating thresholds.
I'd like to compute the Generalized AUC for comparing methods that should separate 3 different categories (e.g. no, mild, severe disease).
https://stats.stackexchange.com/questions/112383/roc-for-more-than-2-outcome-categories
Does your package provide that kind of statistic?
Thanks
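pROC does provide this: multiclass.roc() computes the generalized AUC of Hand & Till (2001) for a single predictor and a response with more than two levels, for example on the bundled aSAH data:

```r
library(pROC)
data(aSAH)
# gos6 has several outcome levels; s100b is a single numeric predictor.
mroc <- multiclass.roc(aSAH$gos6, aSAH$s100b)
auc(mroc)  # multi-class AUC as defined by Hand and Till (2001)
```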
pROC generates an error whenever the list of case values contains Inf. I suspect this is related to issue #25 . I am using the most recent GitHub version of pROC (as of May 11).
To Reproduce
Steps to reproduce the behavior:
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4 yaml_2.1.18
Error in delongPlacements(roc) :
pROC: error in calculating DeLong's theta: got 0.73333333333333328152 instead of 0.65000000000000002220. Diagnostic data saved in pROC_bug.RData. Please report this bug to https://github.com/xrobin/pROC/issues.
Hi,
First, thank you for this wonderful package.
I am trying to use the function 'power.roc.test' from the development version of the package. I would like to compute the sample size needed to compare a single AUC to a theoretical value.
Is it possible to define the theoretical value of the AUC (for example, if the expected AUC is 0.9 and its theoretical value is 0.8)?
Best,
David
Hello,
I have a problem installing the pROC package on my Debian testing system:
install.packages("pROC")
Installing package into
‘/home/l/R/x86_64-pc-linux-gnu-library/3.0’
(as ‘lib’ is unspecified)
trying URL 'http://cran.mirror.garr.it/mirrors/CRAN/src/contrib/pROC_1.7.1.tar.gz'
Warning in download.file(url, destfile, method, mode = "wb", ...) :
connected to 'cran.mirror.garr.it' on port 80.
Warning in download.file(url, destfile, method, mode = "wb", ...) :
-> GET /mirrors/CRAN/src/contrib/pROC_1.7.1.tar.gz HTTP/1.0
Host: cran.mirror.garr.it
User-Agent: R (3.0.3 x86_64-pc-linux-gnu x86_64 linux-gnu)
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- HTTP/1.1 200 OK
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Server: nginx/1.4.7
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Date: Fri, 28 Mar 2014 16:54:04 GMT
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Content-Type: text/plain
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Content-Length: 91857
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Last-Modified: Fri, 21 Feb 2014 04:39:58 GMT
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Connection: close
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- ETag: "5306d89e-166d1"
Warning in download.file(url, destfile, method, mode = "wb", ...) :
<- Accept-Ranges: bytes
Warning in download.file(url, destfile, method, mode = "wb", ...) :
Code 200, content-type 'text/plain'
Content type 'text/plain' length 91857 bytes (89 Kb)
opened URL
downloaded 89 Kb
Loading required package: splines
Warning in library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
there is no package called ‘pROC’
Warning: package ‘yapomif’ in options("defaultPackages") was not found
tools:::.install_packages()
- installing source package ‘pROC’ ...
** package ‘pROC’ successfully unpacked and MD5 sums checked
Warning in writeLines(paste0(c(out[is_not_empty]), eor), file) :
invalid character string in output conversion
** libs
g++ -I/usr/share/R/include -DNDEBUG -I"/usr/lib/R/site-library/Rcpp/include" -fpic -O3 -pipe -g -c RcppExports.cpp -o RcppExports.o
g++ -I/usr/share/R/include -DNDEBUG -I"/usr/lib/R/site-library/Rcpp/include" -fpic -O3 -pipe -g -c delong.cpp -o delong.o
g++ -I/usr/share/R/include -DNDEBUG -I"/usr/lib/R/site-library/Rcpp/include" -fpic -O3 -pipe -g -c perfsAll.cpp -o perfsAll.o
Loading required package: splines
Error in library.dynam(lib, package, package.lib) :
shared object ‘pROC.so’ not found
Warning: package ‘yapomif’ in options("defaultPackages") was not found
g++ -shared -o pROC.so RcppExports.o delong.o perfsAll.o > Rcpp:::LdFlags() > > -L/usr/lib/R/lib -lR
Loading required package: splines
Error in library.dynam(lib, package, package.lib) :
shared object ‘pROC.so’ not found
Warning: package ‘yapomif’ in options("defaultPackages") was not found
g++: error: >: No such file or directory
g++: error: Rcpp:::LdFlags(): No such file or directory
g++: error: >: No such file or directory
g++: error: >: No such file or directory
make: *** [pROC.so] Error 1
ERROR: compilation failed for package ‘pROC’
- removing ‘/home/l/R/x86_64-pc-linux-gnu-library/3.0/pROC’
Warning in install.packages("pROC") :
installation of package ‘pROC’ had non-zero exit status
It seems to be a compilation problem, but the Rcpp version is greater than the one required (0.10.5):
packageVersion("Rcpp")
[1] ‘0.11.0’
A few details...
sysname
"Linux"
release
"3.10-2-amd64"
version
"#1 SMP Debian 3.10.7-1 (2013-08-17)"
nodename
"np350v5c"
machine
"x86_64"
login
"l"
user
"l"
effective_user
"l"
R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 0.3
year 2014
month 03
day 06
svn rev 65126
language R
version.string R version 3.0.3 (2014-03-06)
nickname Warm Puppy
Any hint to solve the problem?
thank you,
Luca
I'm computing multiple ROC-curves. Two out of 12 computations give me a warning message and won't compute the desired variables without throwing an error.
Output of the "roc" function with rm.remove=TRUE for the two problematic computations:
"True Positive Rate: from Inf to -Inf
False Positive Rate: from Inf to -Inf
Area under Curve:
Maximum F1 Score: -Inf
Warning messages:
1: In min(x$tp) : no non-missing arguments to min; returning Inf
2: In max(x$tp) : no non-missing arguments to max; returning -Inf
3: In min(x$fp) : no non-missing arguments to min; returning Inf
4: In max(x$fp) : no non-missing arguments to max; returning -Inf
5: In max(x$F1) : no non-missing arguments to max; returning -Inf"
problematicRoc$auc gives "NULL".
Everything works as expected with the two computations when I explicitly state that I want auc and ci computed, as in roc(....., auc=TRUE, ci=TRUE).
I have no conflicting packages installed, and to my knowledge the data I'm running roc on in the two problematic instances is not that different from the other ten instances where it works as expected.
I'm not sure how to reproduce the error but I'm glad to provide more detail.
(Thanks for this great package by the way!)
I think there is an issue when using coords with a smoothed curve. The format of the results differs between smoothed and unsmoothed curves, and I suspect that the threshold is being returned in place of the specificity when smoothing is used.
For example:
library(pROC)
data(aSAH)
roc_orig <- roc(aSAH$outcome, aSAH$s100b)
roc_smooth <- roc(aSAH$outcome, aSAH$s100b, smooth = TRUE)
## plots are not extremely different
plot(roc(aSAH$outcome, aSAH$s100b, smooth = TRUE))
plot(roc(aSAH$outcome, aSAH$s100b), add = TRUE, col = "red")
coord_orig <- t(coords(roc_orig, seq(0, 1, 0.01)))
coord_smooth <- t(coords(roc_smooth, seq(0, 1, 0.01)))
coord_smooth2 <- t(coords(smooth(roc_orig), seq(0, 1, 0.01)))
The results are very different:
> head(coord_orig)
threshold specificity sensitivity
0 0.00 0.00000000 1.0000000
0.01 0.01 0.00000000 1.0000000
0.02 0.02 0.00000000 1.0000000
0.03 0.03 0.00000000 1.0000000
0.04 0.04 0.00000000 0.9756098
0.05 0.05 0.06944444 0.9756098
> head(coord_smooth)
specificity sensitivity
0 0.00 1.0000000
0.01 0.01 0.9970265
0.02 0.02 0.9942254
0.03 0.03 0.9914151
0.04 0.04 0.9885741
0.05 0.05 0.9856905
Thanks,
Max
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pROC_1.8
loaded via a namespace (and not attached):
[1] plyr_1.8.4 tools_3.3.1 Rcpp_0.12.5
I tried to reproduce the sample size calculations in Table 4 of the Obuchowski paper (2004) for a single ROC curve. For a significance level of 0.05, an expected AUC of 0.7, a desired power of 0.9 and kappa = 1, the sample size calculation should result in 33 patients for each of the two groups.
However,
power.roc.test(auc=0.7, sig.level=0.05, power=0.9, kappa=1.0)
gives ncases = ncontrols = 40.21369 as a result.
Maybe the problem is that, inside the function, the z-value for the significance level is calculated by
zalpha <- qnorm(sig.level)
which gives the lower alpha percentile (-1.64 instead of 1.64), not the upper one. I think it should be:
zalpha <- qnorm(sig.level, lower.tail = F)
or, of course
zalpha <- qnorm(1 - sig.level)
Thank you very much for your work and for maintaining this great package!
Describe the bug
This is a regression due to the fix in #25
To Reproduce
> response <- rbinom(1E5, 1, .5)
> predictor <- rnorm(1E5)
> rocobj <- roc(response, predictor)
Error: cannot allocate vector of size 74.5 Gb
4: outer(thresholds, predictor, `==`) at roc.utils.R#119
3: roc.utils.thresholds(c(controls, cases), direction) at roc.R#316
2: roc.default(response, predictor) at roc.R#21
1: roc(response, predictor)
This is caused by the check for identical values:
if (any(o <- outer(thresholds, predictor, `==`))) {
There must be another way to safely test for exact equality between two vectors.
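One memory-frugal possibility (a sketch, not necessarily the fix that was adopted) is a match()-based membership test, which needs O(n + m) memory instead of an n-by-m logical matrix:

```r
# %in% (i.e. match()) checks exact equality without materialising
# the full outer() comparison matrix.
thresholds <- c(-Inf, 0.5, 1.5, Inf)
predictor  <- c(0.2, 0.5, 0.9, 1.7)
any(thresholds %in% predictor)    # TRUE: 0.5 occurs in both vectors
which(thresholds %in% predictor)  # 2, the index of the offending threshold
```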
Calculate FPR and TPR of ROC curve in coords.
Example usage: https://stackoverflow.com/questions/16643917/
This is trivially calculated as:
Returning with the "TPR" and "FPR" labels might require a bit more work.
Ways to deal with this include:
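For completeness, both values are simple transforms of what the roc object already stores (divide by 100 first when percent = TRUE):

```r
library(pROC)
data(aSAH)
r <- roc(aSAH$outcome, aSAH$s100b)
tpr <- r$sensitivities        # TPR = sensitivity
fpr <- 1 - r$specificities    # FPR = 1 - specificity
head(cbind(fpr, tpr))
```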
Hi,
I am trying to use pROC in RStudio Cloud. The data I'm dealing with can only be accessed in a secure "datalab" environment designed by Statistics New Zealand, so using R on my personal computer is not possible and I doubt that Stats NZ will be able to support a different implementation just for me.
Using the DeLong method to get a confidence interval works no problem, but when I try something like:
ci.auc(roc_object, method = "bootstrap")
I get the error:
Error in structure(.External(.C_dotTclObjv, objv), class = "tckObj") : [tcl] invalid command name "toplevel".
This appears to me to be similar to this issue in the old RStudio community. It appears that the tcltk package is the reason it doesn't work? Is that the case for the pROC package too?
Thanks for the great package, would love any feedback on whether I'm mistaken or whether a work around is possible!
The following piece of code in plot.roc, which handles legacy.axes, fails to rescale with xlim:
lab.at <- seq(1, 0, -0.2)
if (x$percent)
lab.at <- lab.at * 100
lab.labels <- lab.at
if (legacy.axes)
lab.labels <- rev(lab.labels)
Too much time is spent in roc.utils.perfs.all.fast (roc.utils.R:60):
dups.sesp <- duplicated(matrix(c(se, sp), ncol=2), MARGIN=1)
There must be a better way to do it. Here is some benchmarking code:
n <- 1e6
dat <- data.frame(x = rnorm(n), y = sample(c(0:1), size = n, replace = TRUE))
library(profvis)
profvis({
for (i in 1:10) {
pROC::roc(dat$y, dat$x, algorithm = 2)
}
})
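One possible speedup, assuming the se/sp pairs are generated in sorted threshold order so that duplicates are always adjacent (a sketch, not a tested replacement):

```r
# diff() on each coordinate flags consecutive duplicate pairs in O(n),
# avoiding duplicated() on a 2-column matrix.
se <- c(1, 1, 0.8, 0.8, 0.5)
sp <- c(0, 0, 0.3, 0.3, 0.7)
dups.sesp <- c(FALSE, diff(se) == 0 & diff(sp) == 0)
dups.sesp  # FALSE TRUE FALSE TRUE FALSE
```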
Followup of issue #40.
It might be useful to return every possible coordinate in coords. This could be done by adding a special ret value of "all" (verbatim).
Warning: this value cannot be abbreviated, as that would change the current behavior of ret="a", which is to return the accuracy. It cannot be mixed with any other value, so only an exact match with a vector of length 1 should be allowed.
I think there may be an issue with the DeLong confidence interval for AUC. When the sample size gets large, the CI goes to either 0-0 or 1-1. Here is an example:
predictor1 <- c(runif(12000,0,0.5), runif(14472-12000, 0.5,0.75))
response1 <- rbinom(14472, size=1, p=predictor1)
roc1 <- roc(response1, predictor1)
predictor2 <- c(runif(3 * 12000,0,0.5), runif(3 * (14472-12000), 0.5,0.75))
response2 <- rbinom(3 * 14472, size=1, p=predictor2)
roc2 <- roc(response2, predictor2)
predictor3 <- c(runif(10 * 12000,0,0.5), runif(10 * (14472-12000), 0.5,0.75))
response3 <- rbinom(10 * 14472, size=1, p=predictor3)
roc3 <- roc(response3, predictor3)
auc(roc1)
Area under the curve: 0.7586
ci.auc(roc1)
95% CI: 0.7506-0.7667 (DeLong)
auc(roc2)
Area under the curve: 0.7584
ci.auc(roc2)
95% CI: 0.7537-0.7631 (DeLong)
auc(roc3)
Area under the curve: 0.7561
ci.auc(roc3)
95% CI: 1-1 (DeLong)
See the confusion on SO. A similar behavior is featured in the new cutpointr package. It is time to be more explicit about automatic choices.
Todo:
This should start with simple operations like var and later ci.coords. The following sub-steps will have to be taken:
Additional considerations:
ggroc() does not show subtitle or caption label
To Reproduce
a <- 1:10
b <- rep(c(TRUE, FALSE), 5)
ggroc(roc(b ~ a)) + labs(title = "stairs", subtitle = "leading upstairs", caption = "from right to left leading downstairs")
Expected behavior
A graph of a step function displaying the contents of the subtitle and caption arguments somewhere.
For a future version of pROC:
Currently I am unable to use pROC::ci in my package without requiring the whole package.
# somewhere in function definition
#' @importFrom pROC ci ci.auc ci.roc roc
pROC::ci(factor(c(0, 1, 0, 1)), c(0.1, 0.2, 0.3, 0.4), of = 'auc')
# WARNING: Error in UseMethod("ci") :
# no applicable method for 'ci' applied to an object of class "factor"
library(pROC)
pROC::ci(factor(c(0, 1, 0, 1)), c(0.1, 0.2, 0.3, 0.4), of = 'auc')
#95% CI: 0.05705-1 (DeLong)
Probably an issue with namespacing within method dispatch.
Hello, I've been using pROC for the last few days. It is very nice and works well for getting the AUC out. However, I can't seem to find a way to extract the AUC values into txt or csv files. I was hoping to loop through several columns of an input data file in order to calculate the AUC for each variable.
Many thanks for your help
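A possible loop, illustrated with the bundled aSAH data (the column names stand in for your own variables): as.numeric() strips the auc class so plain numbers get written out.

```r
library(pROC)
data(aSAH)
vars <- c("s100b", "ndka", "wfns")  # predictor columns to evaluate
aucs <- sapply(vars, function(v) as.numeric(auc(aSAH$outcome, aSAH[[v]])))
# One row per variable, written to a csv file:
write.csv(data.frame(variable = vars, auc = aucs),
          "aucs.csv", row.names = FALSE)
```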
roc(aSAH$outcome, aSAH$ndka, smooth=TRUE, smooth.method="density")
Error in match.fun(paste("bw", bw, sep = "."))(roc$predictor) :
need at least 2 data points
This is because roc$predictor is not set at the time smooth.roc is called.
Make sure to re-enable tests by removing the skip_if call in test-roc.R once fixed.
Users seem confused by the auto-detection of the direction of a ROC curve. See this discussion and others. This issue discusses whether to change the default to '<'.
Pros:
Cons:
The plyr package is old, and newer, better options exist for parallel execution. The foreach package seems to be the way to go, with different backends available, and the doRNG package for reproducible parallel calculations.
The interface from the user perspective would look like:
cl <- makeCluster(2) # 2 cores
registerDoParallel(cl)
registerDoRNG(1234)
ci(...)
stopCluster(cl)
Internally we would simply have:
resampled.values <- foreach(i=1:boot.n) %dopar% { stratified.bootstrap.test(...) }
instead of
resampled.values <- laply(1:boot.n, stratified.bootstrap.test, ...)
Things to consider:
coords(r.s100b, c(0.51, 0.2), input = "threshold", ret = "specificity", drop = TRUE)
coords(r.s100b, "local maximas", input = "threshold", ret = "specificity", drop = TRUE)
Both return a matrix with 1 row. Note: this is tested in test-coords.R, but the test is skipped as it fails.
The documentation only mentions dropping over length(x), and it doesn't state that coords won't drop if length(ret) == 1. The doc should either be updated to mention that there is no dropping over ret, or updated to state that dropping happens over ret too, with the code changed accordingly.
This, however, is an API change and too close to the 1.14 release.
I've probably missed this, but is there an option in pROC for comparing more than two ROC curves at the same time? I see that DeLong et al. 1988 is a reference, but it seems pROC is missing this ability. If so, could this be considered a feature request? It would also be amazing to be able to test multiple (>2) pAUCs at the same time. Thanks for the great addition to R!
I want to calculate the AUC for many subgroups, one at a time (in a foreach loop).
From my understanding, the direction can change every time depending on the relation of 0 vs. 1 in the outcome, so I would not see whether the AUC falls below 0.5.
Is it possible to specify the "negative outcome" or the "positive outcome" in advance?
I am currently using a workaround like
direction_i <- if(mean(df_i[["outcome"]]) < 0.5) {">"} else {"<"}
in every step, which is ok for my use case.
However, in general I would find your package much more appealing, if this could be specified directly.
Am I missing something obvious?
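Yes: roc() takes levels (which response value is the control and which the case) and direction arguments, so both can be fixed in advance. A sketch on simulated data:

```r
library(pROC)
set.seed(1)
outcome   <- rbinom(200, 1, 0.5)
predictor <- rnorm(200) + outcome   # simulated scores
r <- roc(outcome, predictor,
         levels = c(0, 1),  # first level = controls, second = cases
         direction = "<")   # controls expected to have lower values
auc(r)  # with a fixed direction, the AUC can now fall below 0.5
```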
There seems to be a lot of confusion around this function, what it does and how to use it.
In particular, it seems people would like to pass a "multiclass predictor", a matrix containing probabilities of each datapoint belonging to a class. See for instance this question on StackOverflow
Not sure anything can be saved here.
I was running a piece of code and my code threw this error message:
Error in delongPlacements(roc) :
A problem occured while calculating DeLong's theta: got 0.50057161522129678399 instead of 0.50032663726931247972. This is a bug in pROC, please report it to the maintainer.
Does anyone know what I should do?
Thanks
Followup of issue #40.
These values are calculated but never returned to the user.
They would be "special" as they would be weighted, unlike all other returned coordinate values.
response <- rbinom(1E5, 1, .5)
predictor <- rnorm(1E5)
r <- roc(response, predictor)
system.time(coords(r, "a"))
user system elapsed
47.791 0.088 47.867
I would expect it to complete more or less instantly.
It should be possible to calculate the significance of a single ROC curve.
This would test H_0: AUC = 0.5.
For a full AUC this should correspond to the Wilcoxon Test. For partial AUC we need to use bootstrapping. Something like this:
roc.test(roc(aSAH$outcome, aSAH$ndka))
roc.test(roc(aSAH$outcome, aSAH$ndka, partial.auc = c(1, 0.9)))
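Until that lands, the full-AUC case is equivalent to a Wilcoxon rank-sum test on the predictor grouped by the response, which base R's stats package already provides:

```r
library(pROC)
data(aSAH)
# H0: AUC = 0.5, i.e. ndka does not discriminate between outcomes:
wilcox.test(ndka ~ outcome, data = aSAH)
```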