Comments (4)

achoum commented on May 17, 2024

Some quick remarks.

  • The number of unique values would be the same for the logits and for the probabilities, since the logit is a strictly increasing function of the probability.
  • Some hyper-parameters, as well as the diversity of the examples in the dataset, affect the number of unique prediction values. For example, increasing the number and depth of the trees will help (I see that you train the Random Forest with a maximum depth of only 4).
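As a quick check of the first remark, here is a minimal sketch in plain Python. The probability values are made up for illustration and do not come from any model; the point is only that applying the logit to a list of probabilities leaves the count of distinct values unchanged.

```python
import math

# Hypothetical predicted probabilities (illustrative only), with repeats.
probabilities = [0.1, 0.25, 0.25, 0.5, 0.5, 0.5, 0.9]

# Logit = inverse of the logistic function, valid for 0 < p < 1.
logits = [math.log(p / (1.0 - p)) for p in probabilities]

num_unique_probabilities = len(set(probabilities))
num_unique_logits = len(set(logits))

# Both counts are identical: the transform only relabels the values.
print(num_unique_probabilities, num_unique_logits)
```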

from decision-forests.

achoum commented on May 17, 2024

Unlike the Gradient Boosted Trees learning algorithm, the Random Forest algorithm works with a "voting mechanism". In the case of classification, each tree casts a vote for one class (or for multiple classes, depending on the hyper-parameters). Therefore, the algorithm does not rely on any link function or logits. This is why the argument apply_link_function does not exist for the Random Forest model.
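To make the voting idea concrete, here is a toy simulation in plain Python. It is not the TF-DF implementation: the "trees" are hypothetical coin flips, and it assumes each tree casts exactly one vote in a binary problem. Under that assumption the predicted probability is a fraction of the form votes / num_trees, so there can be at most num_trees + 1 distinct output values, which is one reason more trees can yield more unique predictions.

```python
import random

def vote_probabilities(num_trees, num_examples, seed=0):
    """Simulates per-example probabilities as the fraction of tree votes.

    Each simulated tree votes for class 1 with probability 0.5; this is a
    stand-in for real trees, used only to show the shape of the output.
    """
    rng = random.Random(seed)
    probabilities = []
    for _ in range(num_examples):
        votes_for_class_1 = sum(rng.random() < 0.5 for _ in range(num_trees))
        probabilities.append(votes_for_class_1 / num_trees)
    return probabilities

for num_trees in (5, 50):
    probs = vote_probabilities(num_trees, num_examples=1000)
    # The number of distinct values can never exceed num_trees + 1.
    print(num_trees, len(set(probs)))
```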

I have rarely seen logits used with Random Forests. Out of curiosity, would you mind detailing your setup :) ?

If a logit is what you need (i.e. the inverse of the logistic function), you can always compute it from the probabilities. Be careful with numerical precision and with the case where the probability is exactly 0 (or 1), for which the logit is infinite.
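A minimal sketch of that conversion in plain Python follows. The function name `logit_from_probability` and the clipping threshold `eps` are assumptions chosen for illustration, not part of TF-DF; clipping is one common way to keep the result finite when a probability is exactly 0 or 1.

```python
import math

def logit_from_probability(p, eps=1e-7):
    """Returns log(p / (1 - p)) after clipping p into [eps, 1 - eps].

    The clipping threshold eps is an arbitrary illustrative choice; it
    bounds the magnitude of the returned logit.
    """
    p = min(max(p, eps), 1.0 - eps)
    # log1p(-p) computes log(1 - p) accurately when p is small.
    return math.log(p) - math.log1p(-p)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, logit_from_probability(p))
```

Note that clipping maps every probability below eps to the same logit, so the choice of eps is a trade-off between finite outputs and resolution near 0 and 1.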

Howard-ll commented on May 17, 2024

Thanks for the answer. This request is about the number of unique output values.

I am trying to replace a library with tfdf. As you can see in the screen capture below, predict, predict_proba and predict_log_proba give me different output values. I am talking about the number of unique output values.
For my project, I need predict_proba or predict_log_proba. I understand that tfdf's predict is similar to predict_proba, which is good. However, it would be really great if I could get a larger number of unique output values. As you can see in pictures 2 and 3 below, predict_proba in sklearn produces a larger number of unique output values, while tfdf produces many 0s. If this feature could be supported, it would be great, because I need it for my tasks.

In terms of the number of unique output values, computing logits from probabilities would not help, because the number of unique output values stays the same.

** 1) screen capture **
Capture

** 2) Library S output range **
image

** 3) tfdf output range **
image

Howard-ll commented on May 17, 2024

I need to try more datasets, but Random Forest gives a good enough number of unique output values on my current test datasets. Thanks!
