
Comments (3)

iancovert commented on August 26, 2024

Hi Louise, thanks for posting. And this is a really cool idea, I'd love to see if we can get this to work. Just to be sure, am I understanding correctly that every image in your dataset has been segmented into a set of regions, and that both the number and the biological/medical meaning of the regions are consistent across images?

If so, I think that's a really interesting use-case for this type of feature importance technique. I can see how you wouldn't expect specific pixels to be informative across scans because of small differences in brain size/position, but you would expect certain regions to be informative. It will take a bit of extra work to run SAGE in this setting, but I'd be happy to help you figure it out.

If I understand the data and segmentations correctly, there will be two main tasks:

  1. Figuring out how to evaluate your model with held-out brain regions. SAGE, just like SHAP, LIME, permutation tests, etc., is based on evaluating your model with parts of the input data removed. There are several options for how to do this, and I would be a bit concerned about simply setting image regions to all zeros (black pixels), because your model may not know what to make of that. Zeros could be a good starting point for prototyping (see the rough sketch after this list), but we might want to consider whether there's a better option.
  2. Writing a modified algorithm to estimate SAGE values. This will involve writing some new code, which I'm happy to do, but basically we need to tell SAGE that the number of image regions is consistent even though their positions differ for each input. (SAGE supports feature grouping right now, but the groups are consistent across inputs.)
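To make item 1 concrete, here is a minimal sketch of what zero imputation of held-out regions could look like, assuming the image and segmentation are numpy arrays of the same shape; the function and argument names (and the `model.predict` call) are illustrative, not part of SAGE:

```python
import numpy as np

def zero_out_regions(image, segmentation, inactive_regions):
    """Return a copy of `image` with the given brain regions set to zero.

    image: array of pixel intensities, e.g. shape (H, W) or (H, W, D).
    segmentation: integer array of the same shape, where each voxel holds
        the (consistent) label of the brain region it belongs to.
    inactive_regions: iterable of region labels to remove / "hold out".
    """
    imputed = image.copy()
    mask = np.isin(segmentation, list(inactive_regions))
    imputed[mask] = 0.0  # zero imputation: crude, but a reasonable starting point
    return imputed

# Hypothetical usage: evaluate the model with regions 3 and 7 removed.
# held_out_pred = model.predict(zero_out_regions(img, seg, [3, 7])[None, ...])
```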

Let me know what you think, and I'm happy to chat more either here or over email.


LouiseBloch commented on August 26, 2024

Hi Ian, thank you for your response. Yes, you understood it correctly. The number and meaning of the regions are consistent across images, whereas the number of pixels in each region and their locations differ slightly. To be honest, I think it might also be possible to use a standardized brain atlas as an alternative solution; I will try this in the coming weeks. I thought of this different approach because I have recently worked with SHAP values, but I understand that SAGE is not originally a local explanation algorithm.
The tasks you identified seem correct to me. For now, I am not sure about the best imputation strategy, and I will have to look into this topic further before I can come up with an idea more specific than using zeros.
Regarding the feature grouping: at the moment I have a segmentation file with the same dimensions as the input images, containing a different (but consistent) value for each brain region. I think I would need to pass both the image and the segmentation file into the SAGE algorithm, and that one would also need a kind of "activation/deactivation" vector. One tricky aspect is that the dimensions of the inputs and outputs differ in my idea: the input is an image, but the output would be a feature importance score for each region. Best regards


iancovert commented on August 26, 2024

The imputation question is a tricky one, but if you've used SHAP with this dataset it actually has the same issue: under the hood, SHAP also requires a way to evaluate the model with held-out features. Besides using zeros, a couple of reasonable options could be:

  1. Replace missing features with samples drawn uniformly from the dataset. This is the most common approach with SHAP, but it can produce some undesirable imputations that don't make sense given the retained features.
  2. Make a version of your model that accommodates missing features. This would require changing how your model is trained, but it's not a difficult modification, at least for neural networks trained with SGD. For example, if it's a CNN, you would want to introduce dropout at the input layer, where you're dropping individual pixels or entire brain regions at random (there's a rough sketch of this after the list). The best reference I have for this is our paper, which discusses why this approach is reasonable in sections 4.1 and 8.2, where it's called "missingness during training."
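As a rough illustration of option 2, here is a minimal sketch of a training-time augmentation that drops whole brain regions at random; all names here are hypothetical and not part of SAGE or your existing training code:

```python
import numpy as np

def random_region_dropout(image, segmentation, region_labels, p=0.5, rng=None):
    """Randomly zero out whole brain regions, as a training-time augmentation.

    Dropping entire regions (rather than individual pixels) matches how
    features will later be removed when estimating importance scores.
    """
    rng = np.random.default_rng() if rng is None else rng
    dropped = [r for r in region_labels if rng.random() < p]
    augmented = image.copy()
    augmented[np.isin(segmentation, dropped)] = 0.0
    return augmented

# Hypothetical training-loop usage (names illustrative):
# for img, seg, label in training_data:
#     x = random_region_dropout(img, seg, region_labels=range(1, n_regions + 1))
#     loss = criterion(model(x[None, ...]), label)
```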

As for the feature grouping, what you've described is correct: you would want to pass both the images and the segmentations to the SAGE algorithm, which would internally keep a vector of size d (if you have d regions) indicating which regions to turn on/off. Implementing this may be tricky because you would want to familiarize yourself with how the SAGE algorithm works (specifically, the permutation sampling algorithm in this file), but in the end it may not require changing too many lines: you basically just need to pair the images and their segmentation maps so you can iterate through them jointly, use the segmentation maps to turn the vector of size d into a mask that's the same size as the image, then evaluate the model using that mask.
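To sketch what that might look like (again a rough illustration using numpy and a hypothetical `model_fn` callable, not the actual SAGE internals), the key step is expanding the length-d on/off vector into a pixel-level mask via the segmentation map:

```python
import numpy as np

def region_mask_to_pixel_mask(on_off, segmentation, region_labels):
    """Expand a length-d on/off vector over regions into a pixel-level mask."""
    active = [r for r, keep in zip(region_labels, on_off) if keep]
    return np.isin(segmentation, active)

def evaluate_with_subset(model_fn, image, segmentation, on_off, region_labels):
    """Evaluate the model with only the regions flagged in `on_off` retained.

    Pixels belonging to "off" regions are zeroed here; a different imputation
    strategy could be swapped in without changing the rest of the logic.
    """
    pixel_mask = region_mask_to_pixel_mask(on_off, segmentation, region_labels)
    return model_fn(np.where(pixel_mask, image, 0.0))

# Inside a permutation-sampling style loop (illustrative only), each image
# would be iterated jointly with its segmentation map, and `on_off` would be
# updated one region at a time while tracking the change in the loss.
```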

Anyway, I still find this to be a cool idea, so feel free to reach out if I can be of any help.

