Comments (8)
Thanks for the report. It is indeed related to issue #25, but this is a really tricky one.
If you define the criterion for the positivity of an observation as x >= t (this is the case in pROC with direction = "<"), then the negative criterion is x < t. If x is Inf, no threshold can ever make it negative. If it is a control and direction = "<", then you have a control which will always be positive, so you cannot reach 100% specificity. (If it's a case then it is always classified as positive, which is actually fine.)
The opposite is true if you define positivity as x > t (strict) and negativity as x <= t: you can set up curves that never reach 100% sensitivity. In both cases you have a curve which is not fully defined. Curves not hitting 0 or 100% will create all sorts of issues down the line (with AUC etc.), and pROC should not create such curves.
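To make the two conventions concrete, here is a minimal base-R sketch that thresholds the data directly. It is not pROC's internal code: `se_sp` and its `strict` argument are hypothetical names introduced only for this illustration, and the threshold grid is simply chosen by hand.

```r
# Sensitivity/specificity by direct thresholding (illustration only).
# strict = FALSE: an observation is positive when x >= t.
# strict = TRUE:  an observation is positive when x >  t.
se_sp <- function(controls, cases, thresholds, strict = FALSE) {
  is_positive <- function(x, t) if (strict) x > t else x >= t
  data.frame(
    threshold   = thresholds,
    specificity = sapply(thresholds, function(t) mean(!is_positive(controls, t))),
    sensitivity = sapply(thresholds, function(t) mean(is_positive(cases, t)))
  )
}

thresholds <- c(-Inf, 1.5, 2.5, 3.5, 4.5, 5.5, Inf)

# Inf among the controls, positivity x >= t: Inf >= t holds for every t,
# so that control is never classified as negative.
inf_control <- se_sp(c(1, 2, 3, 4, 5, Inf), c(2, 3, 4, 5, 6), thresholds)
max(inf_control$specificity)   # 5/6: 100% specificity is out of reach

# -Inf among the cases, positivity x > t: -Inf > t fails for every t,
# so that case is never classified as positive.
neg_inf_case <- se_sp(c(1, 2, 3, 4, 5), c(-Inf, 2, 3, 4, 5, 6), thresholds,
                      strict = TRUE)
max(neg_inf_case$sensitivity)  # 5/6: 100% sensitivity is out of reach
```

In both cases one end of the curve is missing, whichever finite or infinite threshold is tried.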
This is definitely not handled properly by the DeLong Theta, which is agnostic to the actual threshold and only cares about the ordering. It is also not handled properly by algorithm 2, which actually works like the DeLong Theta, so this bit runs:
library(pROC)
controls <- c(1, 2, 3, 4, 5)
cases <- c(2, 3, 4, 5, 6, Inf)
roc_data <- roc(controls = controls,
                cases = cases,
                percent = TRUE, ci = TRUE, print.auc = TRUE,
                algorithm = 2)
plot(roc_data)
Also note how the curve is not actually complete if the infinity is in the controls (or I guess in the cases with direction = ">").
controls <- c(1, 2, 3, 4, 5, Inf)
cases <- c(2, 3, 4, 5, 6)
roc_data <- roc(controls = controls,
                cases = cases,
                percent = TRUE, ci = TRUE, print.auc = TRUE)
coords(roc_data, "all")
             all       all      all  all      all      all      all all
threshold   -Inf   1.50000  2.50000  3.5  4.50000  5.50000      Inf  NA
specificity    0  16.66667 33.33333 50.0 66.66667 83.33333 83.33333  NA
sensitivity  100 100.00000 80.00000 60.0 40.00000 20.00000  0.00000  NA
I am not sure how to handle this correctly yet. Most likely I'll have to return NA when infinities are present, possibly relaxing it in some selected circumstances. In any case, there is lots of cleanup to do.
from proc.
Here are two curves that will be useful for testing this bug and defining clear test cases.
~~This is one curve that can never reach 100% specificity:~~
controls <- c(-Inf, 1,2,3,4,5)
cases <- c(2,3,4,5,6)
roc_data <- roc(controls = controls,
                cases = cases, direction = "<")
This one actually does reach 100% sensitivity due to the positivity being x >= threshold, so if t=Inf the last case is positive:
controls <- c(1,2,3,4,5)
cases <- c(2,3,4,5,6, Inf)
roc_data <- roc(controls = controls,
                cases = cases, direction = "<")
Edit: The above is just plain wrong and shows how confusing thresholding infinities with infinities is.
from proc.
One more note to self: it is very likely that only algorithms 1 and 3 can deal with infinite values properly, so it will be important to remember to:
- Disable algorithm 2 and DeLong when any +-Inf values are present.
- Switch to algorithm 1 or 3 and bootstrap if appropriate, or throw an error message if the user specifically asked for DeLong or algorithm 2.
- Verify behavior of Obuchowski and Venkatraman
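The checklist above could be sketched roughly as follows. This is a hypothetical helper, not part of pROC: the name `pick_methods`, its arguments, and the fallback choices (algorithm 1 and bootstrap when infinities are present, algorithm 2 and DeLong otherwise) are assumptions made only for illustration.

```r
# Hypothetical guard (not pROC code): choose an algorithm and a CI method
# depending on whether the predictor contains +-Inf.
pick_methods <- function(predictor, algorithm = NULL, ci_method = NULL) {
  has_inf <- any(is.infinite(predictor))

  if (has_inf) {
    # The user explicitly asked for something that cannot handle infinities.
    if (isTRUE(algorithm == 2)) {
      stop("algorithm 2 cannot be used when the predictor contains infinite values")
    }
    if (isTRUE(ci_method == "delong")) {
      stop("DeLong cannot be used when the predictor contains infinite values")
    }
  }

  list(
    # Explicit requests win; otherwise fall back to an assumed default.
    algorithm = if (!is.null(algorithm)) algorithm else if (has_inf) 1 else 2,
    ci_method = if (!is.null(ci_method)) ci_method else if (has_inf) "bootstrap" else "delong"
  )
}

pick_methods(c(1, 2, 3, Inf))                 # falls back to algorithm 1 + bootstrap
pick_methods(c(1, 2, 3), ci_method = "delong")  # no infinities: request is honored
```

An explicit request that conflicts with the data raises an error instead of being silently overridden, which matches the second bullet.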
from proc.
The more I look at this issue, the more I think +-Inf should be just flat-out rejected. It is simply too confusing: even I need to think very hard to work out which conditions work and which ones don't. I fear that accepting some Inf but not others will be just far too confusing. Refusing to threshold infinities with infinity in all cases makes sense, and is much easier to understand.
This leaves the question: what should roc do when the predictor contains an infinite value? Possible options include:
1. return NA, probably with a warning
2. stop with an error
3. a mix of 1 and 2 depending on the status of na.rm
Feedback from users is welcome.
from proc.
Thanks for handling this issue so quickly! My personal preference would be option 2: stop with an error. A well-written error message will hopefully help users quickly isolate and understand the problem.
from proc.
Option 2 has my preference too; the previous commit implements it.
from proc.
After lots of thinking and hesitation, I don't think throwing errors is a good idea, as it is likely to break things. Instead I think returning NaN makes more sense, since the result of thresholding infinities with infinities is mathematically undefined rather than missing.
I need to perform more tests, especially of reverse dependencies before submitting to CRAN.
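As a rough illustration of the NaN idea (a sketch under stated assumptions, not what pROC does internally): `safe_auc` below is a hypothetical wrapper that computes a plain Mann-Whitney AUC for direction "<" and returns NaN with a warning as soon as any infinite value is present.

```r
# Hypothetical wrapper (not pROC code): AUC that is NaN for infinite input.
safe_auc <- function(controls, cases) {
  if (any(is.infinite(c(controls, cases)))) {
    warning("Infinite values in predictor; AUC is undefined, returning NaN")
    return(NaN)
  }
  # Mann-Whitney estimate: P(case > control) + 0.5 * P(case == control)
  diffs <- outer(cases, controls, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

safe_auc(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6))                         # 0.68
suppressWarnings(safe_auc(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6, Inf)))  # NaN

# NaN stays distinguishable from a missing value:
is.nan(NaN)       # TRUE
is.nan(NA_real_)  # FALSE
```

Because is.na(NaN) is also TRUE in R, downstream code that already filters on is.na keeps working, while is.nan still lets callers tell "undefined" apart from "missing".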
from proc.
I pushed pROC v1.13.0 to CRAN yesterday and the submission just went through. It should be available shortly.
from proc.