Infinite case value causes error (pROC issue, closed)

jir88 commented on July 1, 2024
Infinite case value causes error

from proc.

Comments (8)

xrobin commented on July 1, 2024

Thanks for the report. It is indeed related to issue #25, but this is a really tricky one.

If you define the criterion for the positivity of an observation as x >= t (this is the case in pROC with direction = "<"), then the criterion for negativity is x < t. If x is +Inf, no threshold can ever make it negative. If it is a control and direction = "<", you have a control that will always be positive, so you cannot reach 100% specificity. (If it is a case, it is just always classified as positive, which is actually fine.)

The opposite is true if you define positivity as x > t (strict) and negativity as x <= t: you can set up curves that never reach 100% sensitivity. In both cases the curve is not fully defined. Curves that do not reach 0 or 100% will create all sorts of issues down the line (with the AUC, etc.), and pROC should not create any.
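To make the two positivity rules concrete, here is a minimal base-R sketch. It does not use pROC; the helper sens_spec() and the threshold grid are illustrative assumptions. It scans thresholds that include -Inf and Inf and shows that under x >= t a +Inf control caps specificity, while under the strict x > t rule a -Inf case caps sensitivity.

```r
# Minimal sketch (base R, not pROC internals), assuming direction = "<",
# i.e. cases are expected to have higher predictor values than controls.
# sens_spec() is a hypothetical helper written for this illustration.
sens_spec <- function(controls, cases, t, strict = FALSE) {
  # strict = FALSE: positive if x >= t; strict = TRUE: positive if x > t
  is_positive <- function(x) if (strict) x > t else x >= t
  c(sensitivity = mean(is_positive(cases)),
    specificity = mean(!is_positive(controls)))
}

# Illustrative threshold grid, including both infinities
thresholds <- c(-Inf, seq(0.5, 6.5, by = 1), Inf)

# Rule x >= t with a +Inf control: that control is never classified negative
controls <- c(1, 2, 3, 4, 5, Inf)
cases <- c(2, 3, 4, 5, 6)
max_spec <- max(sapply(thresholds, function(t)
  sens_spec(controls, cases, t)[["specificity"]]))
print(max_spec)  # 5/6: 100% specificity is never reached

# Rule x > t with a -Inf case: that case is never classified positive
controls2 <- c(1, 2, 3, 4, 5)
cases2 <- c(-Inf, 2, 3, 4, 5, 6)
max_sens <- max(sapply(thresholds, function(t)
  sens_spec(controls2, cases2, t, strict = TRUE)[["sensitivity"]]))
print(max_sens)  # 5/6: 100% sensitivity is never reached
```

Under the non-strict rule the same -Inf case is positive at t = -Inf (since -Inf >= -Inf), which is exactly why the choice of rule matters.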

This is definitely not handled properly by the DeLong theta, which is agnostic to the actual thresholds and only cares about the ordering. It is also not handled properly by algorithm 2, which actually works like the DeLong theta, which is why this bit runs:

library(pROC)
controls <- c(1,2,3,4,5)
cases <- c(2,3,4,5,6,Inf)
roc_data <- roc(controls = controls,
                cases = cases,
                percent=TRUE, ci = TRUE, print.auc=TRUE,
                algorithm=2)
plot(roc_data)
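The point about the DeLong theta can be illustrated with the Mann-Whitney estimate of the AUC, the theta on which DeLong's method is built: it averages a pairwise comparison score over all (case, control) pairs, so it depends only on the ordering of the values and never on a threshold. The base-R sketch below (hypothetical helper mw_auc(), not pROC's code) therefore returns a finite value with an Inf case, even though the ROC curve itself is not well defined there.

```r
# Minimal sketch of the Mann-Whitney AUC estimate (DeLong's theta):
# psi(case, control) = 1 if case > control, 1/2 if tied, 0 otherwise.
# Only comparisons are used, so Inf is just "larger than every finite value".
mw_auc <- function(controls, cases) {
  psi <- outer(cases, controls, function(x, y) (x > y) + 0.5 * (x == y))
  mean(psi)
}

controls <- c(1, 2, 3, 4, 5)
cases <- c(2, 3, 4, 5, 6, Inf)
mw_auc(controls, cases)  # 22/30, about 0.733
```

Note that Inf == Inf is TRUE in R, so an infinite case paired with an infinite control simply counts as a tie (1/2).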

Also note that the curve is not actually complete if the infinity is in the controls (or, I guess, in the cases with direction = ">"):

controls <- c(1,2,3,4,5,Inf)
cases <- c(2,3,4,5,6)
roc_data <- roc(controls = controls,
                cases = cases,
                percent=TRUE, ci = TRUE, print.auc=TRUE)
coords(roc_data, "all")
             all       all      all  all      all      all      all all
threshold   -Inf   1.50000  2.50000  3.5  4.50000  5.50000      Inf  NA
specificity    0  16.66667 33.33333 50.0 66.66667 83.33333 83.33333  NA
sensitivity  100 100.00000 80.00000 60.0 40.00000 20.00000  0.00000  NA

I am not sure how to handle this correctly yet. Most likely I'll have to return NA when infinities are present, possibly relaxing this in some selected circumstances. In any case, there is a lot of cleanup to do.

xrobin commented on July 1, 2024

Two curves that will be interesting for testing this bug and defining clear test cases.

~~This is one curve that can never reach 100% specificity:~~

controls <- c(-Inf, 1,2,3,4,5)
cases <- c(2,3,4,5,6)
roc_data <- roc(controls = controls,
                cases = cases, direction = "<")

This one actually does reach 100% sensitivity because positivity is defined as x >= threshold, so at t = Inf the last case is positive:

controls <- c(1,2,3,4,5)
cases <- c(2,3,4,5,6, Inf)
roc_data <- roc(controls = controls,
                cases = cases, direction = "<")

Edit: The above is just plain wrong and shows how confusing thresholding infinities with infinities is.

xrobin commented on July 1, 2024

One more note to self: it is very likely that only algorithms 1 and 3 can deal with infinite values properly, so it will be important to remember to:

  • Disable algorithm 2 and DeLong when any +-Inf values are present.
  • Switch to algorithm 1 or 3 and bootstrap if appropriate, or throw an error message if the user specifically asked for DeLong or algorithm 2.
  • Verify the behavior of the Obuchowski and Venkatraman methods.
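The first two items of this plan could be sketched roughly as follows. Everything here is hypothetical: plan_for_infinite() is not a pROC function, NULL stands for "the user did not request anything specific", the strings "delong" and "bootstrap" are just labels, and the Obuchowski and Venkatraman checks are not covered.

```r
# Hypothetical helper (not part of pROC) sketching the to-do list above.
# NULL arguments mean "nothing explicitly requested by the user".
plan_for_infinite <- function(predictor, algorithm = NULL, ci_method = NULL) {
  if (!any(is.infinite(predictor))) {
    # Finite data: keep whatever the user asked for
    return(list(algorithm = algorithm, ci_method = ci_method))
  }
  # Explicit requests that cannot handle infinities become errors
  if (isTRUE(algorithm == 2)) {
    stop("algorithm 2 cannot handle infinite predictor values")
  }
  if (identical(ci_method, "delong")) {
    stop("DeLong cannot handle infinite predictor values; use bootstrap")
  }
  # Otherwise fall back to algorithm 1 (3 would also do) and bootstrap CIs
  list(algorithm = if (is.null(algorithm)) 1 else algorithm,
       ci_method = if (is.null(ci_method)) "bootstrap" else ci_method)
}

plan_for_infinite(c(1, 2, Inf))  # algorithm 1, ci_method "bootstrap"
```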

xrobin commented on July 1, 2024

The more I look at this issue, the more I think +-Inf should just be flat-out rejected. It is simply too confusing: even I need to think very hard to figure out which conditions work and which ones don't. I fear that accepting some infinities but not others would be far too confusing. Refusing to threshold infinities with infinity in all cases makes sense, and is much easier to understand.

This leaves the question: what should roc do when the predictor contains an infinite value? Possible options include:

  1. return NA, probably with a warning
  2. stop with an error
  3. A mix of 1 and 2 depending on the status of na.rm

Feedback from users is welcome.
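For illustration, options 1 and 2 could look like this in base R. check_infinite_predictor() is a hypothetical helper, not part of pROC; option 3 is left out because its exact interaction with na.rm is not specified here.

```r
# Hypothetical helper (not part of pROC): detect infinite predictor values,
# then either warn and return NA (option 1) or stop with an error (option 2).
check_infinite_predictor <- function(predictor, action = c("na", "error")) {
  action <- match.arg(action)
  if (!any(is.infinite(predictor))) {
    return(TRUE)  # all values are finite (or NA): nothing to do
  }
  msg <- "predictor contains infinite values; thresholds are undefined"
  if (action == "error") {
    stop(msg, call. = FALSE)    # option 2
  }
  warning(msg, call. = FALSE)   # option 1: warn, then signal NA
  NA
}

check_infinite_predictor(c(1, 2, NA))  # TRUE: is.infinite(NA) is FALSE
```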

jir88 commented on July 1, 2024

Thanks for handling this issue so quickly! My personal preference would be option 2: stop with an error. A well-written error message will hopefully help users quickly isolate and understand the problem.

xrobin commented on July 1, 2024

Option 2 is my preference too; the previous commit implements it.

xrobin commented on July 1, 2024

After a lot of thinking and hesitation, I don't think throwing errors is a good idea: it is likely to break things. Instead, I think returning NaN makes more sense, as thresholding infinities with infinities is mathematically undefined rather than missing.
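The distinction between an undefined result and a missing one maps directly onto R's NaN and NA. A short base-R illustration (general R behavior, not pROC code):

```r
# NaN ("not a number") comes from undefined arithmetic such as Inf - Inf;
# NA marks a value that is missing.
undefined <- Inf - Inf
is.nan(undefined)   # TRUE
is.na(undefined)    # TRUE: is.na() is also TRUE for NaN
is.nan(NA_real_)    # FALSE: a missing value is not NaN
```

A general R consequence, not something stated in this thread: code that already screens results with is.na() will also catch NaN, while is.nan() can still tell the two apart.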

I need to perform more tests, especially of reverse dependencies, before submitting to CRAN.

xrobin commented on July 1, 2024

I submitted pROC v1.13.0 to CRAN yesterday and the submission just went through. It should be available shortly.
