Is there any requirement of minimum data set size for EBM? I have read papers where ot

How much data is required for EBM?

basnetpro3 commented on May 29, 2024

How much data is required for EBM?

from interpret.

Comments (2)

richcaruana commented on May 29, 2024

EBMs are a restricted model class, which is what keeps them intelligible, and that simplicity means they do not need as much data as some other model types such as neural nets or boosted decision trees. In practice their data requirements are closer to those of linear and logistic regression than to those of deep neural nets, though they often need, or at least benefit from, more data than a linear model would. The more features in the dataset, the more data you need to learn an accurate model, and the more complex the function needed for each feature, the more data is needed to shape those functions accurately, so it is difficult to give numbers without knowing more about the data and the problem.

Our experience is that useful models with a few dozen features can be trained on data with 500 or more cases if the data is not too imbalanced. I like to look at the size of the smallest important class when I think about data size. If there are 10k training cases but only 1% are positives, then there are only 100 positive cases, and the data no longer behaves like a large 10k sample. There is also a difference between classification and regression: regression can often work with fewer samples because the label of each sample carries more information than in Boolean classification, where the label is only 0 or 1.
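As a rough illustration of the "smallest important class" check, the sketch below counts cases per class and reports the rarest one. The helper name `smallest_class_size` is hypothetical, and it treats every class as important, which is an assumption; in practice you would restrict it to the classes you care about.

```python
from collections import Counter


def smallest_class_size(labels):
    """Return (label, count) for the rarest class in `labels`.

    Assumption: every class present in `labels` counts as "important".
    """
    counts = Counter(labels)
    # Pick the (label, count) pair with the smallest count.
    return min(counts.items(), key=lambda item: item[1])


# The example from the comment: 10k training cases with 1% positives.
labels = [1] * 100 + [0] * 9900
rare_label, rare_count = smallest_class_size(labels)
print(f"{len(labels)} cases, rarest class {rare_label} has {rare_count}")
```

Here the rarest class has 100 cases, so that is the number to keep in mind when judging whether the data is large enough, not the 10k total.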

In summary, EBMs are reasonably sample efficient: they need somewhat more data than linear methods, but usually not as much as more complex black-box methods such as neural nets and unrestricted boosted trees, and they often work well with sample sizes of about 1000 cases or more. If there are very few samples for training, it sometimes helps to adjust the EBM hyperparameters to use more outer bagging, fewer bins, and even shorter trees.
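A minimal sketch of what that adjustment might look like, assuming the three knobs map to the `outer_bags`, `max_bins`, and `max_leaves` arguments of `ExplainableBoostingClassifier` in the `interpret` package. The helper names and the specific values are illustrative assumptions, not settings recommended in this thread.

```python
def small_sample_ebm_params(outer_bags=25, max_bins=32, max_leaves=2):
    """Collect the small-data knobs as keyword arguments.

    Assumed mapping: more outer bagging -> outer_bags,
    fewer bins -> max_bins, shorter trees -> max_leaves.
    The default values here are illustrative, not tuned.
    """
    return {
        "outer_bags": outer_bags,
        "max_bins": max_bins,
        "max_leaves": max_leaves,
    }


def build_small_sample_ebm(**overrides):
    """Build an EBM classifier with the small-data settings applied."""
    # Imported lazily so the helper above works without interpret installed.
    from interpret.glassbox import ExplainableBoostingClassifier

    return ExplainableBoostingClassifier(**small_sample_ebm_params(**overrides))


params = small_sample_ebm_params()
print(params)
```

With `interpret` installed, `build_small_sample_ebm(max_bins=16).fit(X, y)` would train a model with these settings, where `X` and `y` are your own features and labels. Comparing cross-validated scores against the library defaults is the safest way to tell whether the change actually helps on a given dataset.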


basnetpro3 commented on May 29, 2024

Thank you very much, sir. I really like your EBM model. So if we have 3 or 4 features, we can still get insights from an EBM with less data. Some research papers using EBMs have used around 300 samples or fewer. From reading research papers alone, we can't really tell how much data a particular model actually requires, because the data size varies from paper to paper, from 50-100 samples upward.

