
Comments (4)

den-run-ai commented on May 3, 2024

related:

#105


slundberg commented on May 3, 2024

Hey!

It is important to note that while KernelExplainer assumes feature independence when estimating the conditional expectations, it still captures the importance of n-grams if the model depends on them. Assuming feature independence means you toggle the words independently. I would make the reference value for a word be that the word is not there. That might require a wrapper function around the model that takes a binary vector and maps it to a token sequence with specific words missing.
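
For instance, a minimal sketch of such a wrapper (model_predict, tokens, and MISSING are illustrative names, not shap API; here the "missing" word is replaced by a padding id):

import numpy as np

MISSING = 0  # illustrative id for an absent word (e.g. the padding token)

def make_wrapped_model(model_predict, tokens):
    # map each binary row z to the token sequence with the 'off' words removed
    def f(Z):
        sequences = np.array(
            [[t if on else MISSING for t, on in zip(tokens, z)] for z in Z]
        )
        return model_predict(sequences)
    return f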

It might be worth putting together an example notebook for a text processing RNN at some point.

Another option is the integrated gradients method, which is faster but is restricted to a comparison with a reference value (which is not a big deal here where we are using a single reference value anyway).
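
For reference, the integrated gradients attribution for feature $i$ relative to a reference $x'$ is (Sundararajan et al., 2017):

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i}\, d\alpha$$

which in practice is approximated with a small Riemann sum over $\alpha$.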


lipka-clazzpl commented on May 3, 2024

Hi Scott,
Thank you for publishing the code along with the paper on model interpretability.

I found the notebooks especially helpful while working on my own problem: explaining a black-box model whose input is word vectors.
The embedding is part of my pipeline, and the actual input to the model as a whole is a list of word tokens.

Let me start by describing what I was able to achieve, and then ask for a nudge ;)
In the current setting, I've enriched each token list with its position within the sequence. Additionally, I've assumed the reference values need to be position-dependent in the text; only then could I reliably (if indirectly) steer the toggling of reference-value replacement in KernelExplainer->explain() based on the position in the text being explained.

Steering is done below with a vectorized version of replace_index_with_word; index_word is the index-to-word mapping produced by Keras after Tokenizer.fit_on_texts().

class SpecialToken(Enum):
    EMPTY = 0

def replace_index_with_word(self, _elem):
    if _elem == SpecialToken.EMPTY.value:  # check the padding sentinel first -- order matters
        return SpecialToken.EMPTY
    if isinstance(_elem, (int, np.integer)):  # handle both Python and NumPy ints
        return self.index_word[_elem]
    if isinstance(_elem, float):  # handles both Python and NumPy floats
        return self.index_word[int(_elem)]
    return None

def f(self, X: np.ndarray):
    # vectorize the bound method so it maps element-wise over the index matrix
    vreplace_index_with_word = np.vectorize(self.replace_index_with_word)

    return pipe.predict_proba(vreplace_index_with_word(X))

Converting a token list to a data frame for SHAP:

def list_to_named_columns(tokens):
    return {'pos_{:d}'.format(i + 1): [x] for i, x in enumerate(tokens)}

def tokens_to_data_frame(tokens):
    tokens_with_seq: dict = list_to_named_columns(tokens)

    return pd.DataFrame.from_dict(tokens_with_seq, orient='columns')
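
For example (with illustrative tokens), a single token list becomes a one-row frame with positional column names:

df = tokens_to_data_frame(['taki', 'bzdura', '.'])
print(df)
#   pos_1   pos_2 pos_3
# 0  taki  bzdura     .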

For multiple examples, I simply concatenate the data frames as below:

def examples_to_data_frame(examples):
    frames = [tokens_to_data_frame(token_list) for token_list in examples]

    return pd.concat(frames).reset_index(drop=True).fillna(SpecialToken.EMPTY)
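
A sketch of how this could then be handed to KernelExplainer (assuming helper is an instance of the unpublished class holding index_word and the two methods above, and examples is the list of token lists; names are illustrative):

import shap

# all-EMPTY background matching the sequence length (7 in my example)
background = examples_to_data_frame([[SpecialToken.EMPTY.value] * 7])

explainer = shap.KernelExplainer(helper.f, background)
shap_values = explainer.shap_values(examples_to_data_frame(examples))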

SpecialToken is a type I've created to skip the word-vector embedding later on (that part of the pipeline is not published here).

The output for a single-example explanation shows the attributions as below:
[screenshot: per-position attribution plot for a single example]

More debug information

Explaining ['taki' 'bzdura' 'musieć' 'napisać' 'jakiś' 'abderyta' '.'] with reference being [<SpecialToken.EMPTY: 0>, <SpecialToken.EMPTY: 0>, <SpecialToken.EMPTY: 0>, <SpecialToken.EMPTY: 0>, <SpecialToken.EMPTY: 0>, <SpecialToken.EMPTY: 0>, <SpecialToken.EMPTY: 0>]

It's clear which words contributed the most and which ones lower the overall score, but for multiple examples the token position (which is the group/feature name) doesn't draw attention to the word itself at all, i.e.:
[screenshot: multi-example plot where features are labelled pos_1, pos_2, ... instead of by word]

This leads me to my main idea: to me, the word itself carries more information, so I was thinking about replacing the token sequences with a one-hot encoding, while maintaining underneath the ability to express proper synthetic data (i.e. a bidirectional mapping based on masking).
Do you think this would undermine the logic behind your library, or could you suggest a different route if anything comes to mind?
Any comment would be appreciated.


slundberg commented on May 3, 2024

Just saw this. That's an interesting question. I can see that by treating the inputs by position you lose the ability to see the importance of a single word. It would not break SHAP to use a one-hot encoding for the features, and then just mask those words in the sequence that are not "on" before sending it through the model. Hope that helps.
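
A minimal sketch of that masking idea (model_predict, sequence, vocab, and mask_id are illustrative names, not shap API): one binary feature per distinct word, and every occurrence of an "off" word is masked before prediction.

import numpy as np

def make_word_level_model(model_predict, sequence, vocab, mask_id=0):
    # one binary feature per entry of vocab; 'off' words are masked everywhere
    def f(Z):
        rows = []
        for z in Z:
            keep = {w for w, on in zip(vocab, z) if on}
            rows.append([t if t in keep else mask_id for t in sequence])
        return model_predict(np.array(rows))
    return f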

I should also mention that we are working on a deep-learning-specific DeepExplainer that will make Keras models much faster to explain. I'll post here once the first version is finished.

