Hello.
- I've also tried both. I think using the probabilities usually gives you some extra signal. For example, let's say you're explaining the second highest predicted class - it may be that no perturbations quite move it to the highest predicted class, but many perturbations almost make it.
- That seems like a good idea, I'll have to look into it. I expect it will not work so well for high-dimensional data, but the distance functions also degrade with more dimensions.
- I'm not sure how useful the score is anyway, since what it means is highly dependent on the kernel (and distance). If one is using the score, it makes sense to compute it using held-out data like you did.
- Yes, this is true. When doing text, this is not much of an issue, as we can just find a sigma that works reasonably well on cosine distance for most sentences, but I don't know how to select a good sigma for arbitrary tabular data. The distance function used also makes a lot of difference. This is one of the reasons why I made discretization the default for tabular data - it's easier to find more reasonable default values for discrete data.
- This is an excellent insight, and maybe undergirds most of your other comments. I have been working on different explanation methods (e.g. this preprint) that produce explanations that are more clear to the user, and also on perturbation distributions that are more natural. I think it's hard for a user to understand what the neighborhood is in the current version of LIME, especially for tabular data (for text it's more intuitive I think).
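The sigma sensitivity described above can be seen directly. Below is a minimal sketch (not LIME's actual code) of an exponential kernel over cosine distance; `kernel`, `x`, and `z` are made-up illustrations of a sample and one perturbation of it:

```python
import numpy as np
from scipy.spatial.distance import cosine

def kernel(d, sigma):
    """Exponential kernel used to weight perturbed samples by distance."""
    return np.exp(-(d ** 2) / sigma ** 2)

x = np.array([1.0, 0.0, 1.0, 1.0])
z = np.array([1.0, 0.0, 0.0, 1.0])  # a perturbation of x
d = cosine(x, z)

# The same perturbation gets a very different weight depending on sigma,
# which is why a default that works for cosine distance on text may be a
# poor fit for an arbitrary tabular distance function.
for sigma in (0.05, 0.25, 1.0):
    print(sigma, kernel(d, sigma))
```

With a small sigma the perturbation's weight collapses toward zero; with a large one, almost everything counts as "local".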
Thanks for the insightful comments. You guys hit many of the problems that I've been thinking about after writing the original paper, and things I'm working on. I'll take a closer look at ELI5.
from lime.
Hey,
- Yeah, that's true, but it doesn't have to be a regressor to use the probabilities: you can train a classifier that learns to approximate the given probability distribution, not just 1/0 scores. Unfortunately scikit-learn doesn't support non-1/0 labels for the cross-entropy loss, but for Logistic Regression one can emulate it by oversampling the training examples according to the probabilities (that's how it is solved in eli5). I'm not sure, but this scikit-learn limitation looks artificial; it is easy to implement a cross-entropy loss without requiring target probabilities to be 1/0, e.g. in Theano.
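A minimal sketch of the emulation described above, using scikit-learn's `sample_weight` instead of literally replicating rows (duplicating each example once per class and weighting it by that class's probability is equivalent to oversampling in expectation). The data and the probabilities `p` are toy stand-ins for the perturbed neighborhood and the black-box model's output:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Toy neighborhood samples and a pretend black-box P(y=1 | x).
X = rng.normal(size=(200, 5))
p = 1.0 / (1.0 + np.exp(-X[:, 0]))

# Emulate cross-entropy against soft targets: each sample appears once
# with label 0 (weight 1 - p) and once with label 1 (weight p).
X_soft = np.vstack([X, X])
y_soft = np.concatenate([np.zeros(len(X)), np.ones(len(X))])
w_soft = np.concatenate([1.0 - p, p])

clf = LogisticRegression().fit(X_soft, y_soft, sample_weight=w_soft)
probs = clf.predict_proba(X)[:, 1]
```

The fitted `probs` should track the soft targets `p` closely, which is exactly the behavior a hard 0/1 relabeling would throw away.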
- In the scikit-learn example for high-dimensional data (images) they applied PCA before running KDE; maybe that's an option.
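A minimal sketch of that option with scikit-learn's `PCA` and `KernelDensity`; the data, the number of components, and the bandwidth are arbitrary stand-ins, not tuned values:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 64))  # stand-in for high-dimensional data

# Project to a few components first, then fit the density estimate
# there; KDE in the reduced space suffers less from dimensionality.
kde = make_pipeline(PCA(n_components=5, random_state=0),
                    KernelDensity(bandwidth=0.5))
kde.fit(X)

# Draw perturbations in PCA space, then map back to the original space.
low_dim = kde.named_steps['kerneldensity'].sample(10, random_state=0)
new_X = kde.named_steps['pca'].inverse_transform(low_dim)
```

This gives a perturbation distribution that at least respects the data's principal directions instead of sampling each raw dimension independently.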
- ...
- Hm, it seems there is some misunderstanding on my side regarding discretization.
- Thanks for the link! I was thinking about something similar (using a classifier which aims to produce interpretable rules, e.g. https://arxiv.org/pdf/1511.01644.pdf). eli5 has support for decision trees and can show them as a list of rules; by limiting the tree depth it is possible to keep it under control, e.g.
visits <= 1.500  (66.7%)
    visits <= 0.500  (33.3%)  ---> [0.500, 0.500]
    visits > 0.500  (33.3%)  ---> [1.000, 0.000]
visits > 1.500  (33.3%)  ---> [0.000, 1.000]
which can be transformed into a list of 3 (or 2) rules, for 3 (2) outcomes. The example above shows probabilities, but it could show top target classes as well. In your paper it seems you've found a nice rule-generation algorithm for that (decision trees are not explicitly optimized for interpretability). Kudos for the evaluation section as well.
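For reference, a shallow tree like the one above can be produced and printed as rules with scikit-learn alone; this is a toy sketch on the iris data, not eli5's actual formatting:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Keeping the tree shallow keeps the rule list short and interpretable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=load_iris().feature_names)
print(rules)
```

Each root-to-leaf path in the printed output corresponds to one rule, so `max_depth=2` bounds every rule at two conditions.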
1 - I guess if you're training a classifier to approximate a probability distribution, I call that a regressor : ) (a smarter one which constrains outputs to be between 0 and 1). I should actually have done what you describe (logreg with cross-entropy) instead of what I did. Oh well.
5 - You may also want to take a look at interpretable decision sets - they compare against rule lists and seem to be more interpretable (I certainly prefer their representation).
Yeah, that version of the paper is just a preliminary one. I'm working on an evaluation with humans; hopefully I'll submit the full version soon (and I'll release the code).
Best,
@marcotcr is the dev effort for https://arxiv.org/abs/1611.05817 happening on another branch?
Will it be possible to share any info about it?
Nice effort around eli5 and LIME, guys. @kmike @marcotcr
It's happening in a private repository that I plan to release once I think it's good enough (most likely after I write a full paper about it, which I am in the middle of).
@marcotcr Interesting to hear that you are working on new implementation. I have really enjoyed using the initial project to evaluate my deep learning models. Out of curiosity, is there an ETA you have in mind now for publishing the new paper and LIME implementation?
The paper is under review; I'll probably put it on arxiv soon.
I'm not sure about when I'll make the code available, I'm still changing it too much as I have new ideas and do more research : )
@marcotcr Thank you so much for your research and open source efforts on this! I'll keep my eyes open for publications and the next gen of LIME.