mahmoodlab / mcat
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
License: GNU General Public License v3.0
Hello authors, have you ever encountered the following problem: the loss decreases on both the training and validation sets, and the C-index increases on the training set but stays flat or decreases on the validation set? If so, how did you resolve it? Looking forward to your reply!
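One common way to handle this pattern (training C-index rising while validation C-index stalls) is to choose the checkpoint by validation C-index and stop once it no longer improves. The sketch below assumes a validation C-index is already computed each epoch; the class name EarlyStoppingOnCIndex and the patience value are hypothetical and not part of the MCAT code.

```python
class EarlyStoppingOnCIndex:
    """Hypothetical helper: stop when the validation C-index stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum increase that counts as an improvement
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_cindex):
        """Record one epoch; return True if training should stop."""
        if val_cindex > self.best + self.min_delta:
            self.best = val_cindex
            self.best_epoch = epoch   # a real training loop would save a checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up validation C-index values (illustration only).
stopper = EarlyStoppingOnCIndex(patience=2)
for epoch, c in enumerate([0.60, 0.63, 0.62, 0.61, 0.65]):
    if stopper.step(epoch, c):
        break
print(stopper.best_epoch, stopper.best)
```

Selecting by validation C-index rather than validation loss matters here because, as described above, the two can disagree.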
In the class SubsetSequentialSampler, indices should be iterable, but I get a variable of type numpy.int64. My torch version is 2.0.1 and my Python version is 3.9. How can I solve this problem?
Here is the full traceback:
Traceback (most recent call last):
  File "/mnt/data0/LI_jihao/mydata/MCAT-master/main.py", line 199, in <module>
    dataset = Generic_MIL_Survival_Dataset(csv_path = '/home/jupyter-ljh/data/mydata/MCAT-master/dataset_csv/tcga_brca_all_clean.csv',
  File "/mnt/data0/LI_jihao/mydata/MCAT-master/main.py", line 71, in main
    val_latest, cindex_latest = train(datasets, i, args)
  File "/mnt/data0/LI_jihao/mydata/MCAT-master/utils/core_utils.py", line 181, in train
    for i,data in enumerate(train_loader):
  File "/opt/tljh/user/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/opt/tljh/user/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 676, in _next_data
    index = self._next_index() # may raise StopIteration
  File "/opt/tljh/user/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 623, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/opt/tljh/user/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 254, in __iter__
    for idx in self.sampler:
  File "/mnt/data0/LI_jihao/mydata/MCAT-master/utils/utils.py", line 36, in __iter__
    return iter(self.indices)
TypeError: 'numpy.int64' object is not iterable
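One way this error can arise is when np.random.choice is called without a size argument: it then returns a single scalar instead of an array, and iter() on that scalar fails. The sketch below is a torch-free, hypothetical variant of a sequential subset sampler (in the repo it would subclass torch.utils.data.Sampler) that coerces a scalar index into a one-element list so iteration and len() always work.

```python
import numpy as np


class SafeSubsetSequentialSampler:
    """Hypothetical sampler that tolerates a scalar index.

    np.atleast_1d turns a scalar (e.g. numpy.int64 from np.random.choice
    called without `size`) into a 1-element array, so __iter__ and __len__
    behave the same for scalars and arrays.
    """

    def __init__(self, indices):
        self.indices = np.atleast_1d(np.asarray(indices)).tolist()

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


rng = np.random.default_rng(0)
scalar_idx = rng.choice(np.arange(10))                   # no size -> numpy integer scalar
array_idx = rng.choice(np.arange(10), 3, replace=False)  # size given -> 1-D array

print(list(SafeSubsetSequentialSampler(scalar_idx)))
print(len(SafeSubsetSequentialSampler(array_idx)))
```

The more direct fix is usually at the call site: pass an explicit size to np.random.choice so it always returns an array.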
Dear authors,
Please help check the following line:
MCAT/datasets/dataset_survival.py
Line 63 in b9cca63
I have tested the code as follows:
import pandas as pd
import numpy as np
csv_path = 'MCAT_master/datasets_sig_csv/tcga_brca_all_clean.csv.zip'
slide_data = pd.read_csv(csv_path, low_memory=False)
if "IDC" in slide_data['oncotree_code']: # must be BRCA (and if so, use only IDCs)
    print('Yes, IDC is in there')
else:
    print('No, IDC is not in there')
if "IDC" in slide_data['oncotree_code'].values: # must be BRCA (and if so, use only IDCs)
    print('Yes, IDC is in there')
else:
    print('No, IDC is not in there')
And the output is as follows:
No, IDC is not in there
Yes, IDC is in there
Is this a bug? My pandas version is 1.4.1. Could you please help check it? Thanks a lot!
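The output above matches documented pandas behaviour: the in operator on a Series tests membership in the index (dict-like), not in the values, so a condition written as "IDC" in series is False whenever the index is 0..N-1. A small self-contained demonstration with toy data (the column name mirrors the CSV above; the values are made up):

```python
import pandas as pd

# Toy stand-in for slide_data; values are illustrative, not from the real CSV.
slide_data = pd.DataFrame({"oncotree_code": ["IDC", "ILC", "IDC", "MDLC"]})
codes = slide_data["oncotree_code"]

in_index = "IDC" in codes              # checks the index labels 0..3 -> False
in_values = "IDC" in codes.values      # checks the underlying array -> True
via_isin = codes.isin(["IDC"]).any()   # idiomatic value test -> True

# Keeping only IDC rows, which is what the membership test is meant to guard.
idc_only = slide_data[codes == "IDC"].reset_index(drop=True)

print(in_index, in_values, via_isin, len(idc_only))
```

Using .values, .isin(...).any(), or (series == "IDC").any() all test the values as intended.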
Hi, the hyperparameter alpha_surv seems important when the censoring rate is high.
Do you have any recommendations for choosing an appropriate alpha_surv in that case?
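To see what alpha trades off, here is a minimal NumPy sketch of a discrete-time survival negative log-likelihood in which alpha mixes the full likelihood with an extra uncensored-only term. It assumes c = 1 marks a censored case and Y is a 0-based time-bin index; it is written to resemble a common form of this loss, not copied from the repository, so the exact version in the repo's utilities should be checked.

```python
import numpy as np


def nll_surv_loss(hazards, Y, c, alpha=0.4, eps=1e-7):
    """Discrete-time survival NLL with an alpha-weighted uncensored term (sketch).

    hazards: (B, K) per-bin hazard probabilities in (0, 1)
    Y:       (B,) 0-based index of the bin of the event or censoring time
    c:       (B,) 1 if censored, 0 if the event was observed
    """
    hazards = np.asarray(hazards, dtype=float)
    Y = np.asarray(Y, dtype=int).reshape(-1, 1)
    c = np.asarray(c, dtype=float).reshape(-1, 1)

    S = np.cumprod(1.0 - hazards, axis=1)                    # S[:, k] = P(T > bin k)
    S_padded = np.concatenate([np.ones_like(c), S], axis=1)  # prepend S = 1 before bin 0

    s_before = np.take_along_axis(S_padded, Y, axis=1)       # survived up to bin Y
    s_after = np.take_along_axis(S_padded, Y + 1, axis=1)    # survived past bin Y
    h_at_y = np.take_along_axis(hazards, Y, axis=1)          # event inside bin Y

    uncensored = -(1 - c) * (np.log(np.clip(s_before, eps, None))
                             + np.log(np.clip(h_at_y, eps, None)))
    censored = -c * np.log(np.clip(s_after, eps, None))

    neg_l = censored + uncensored
    loss = (1 - alpha) * neg_l + alpha * uncensored  # alpha adds weight on uncensored cases
    return float(loss.mean())


# Toy check: one observed event in bin 0 and one case censored in bin 0.
h = [[0.5, 0.5], [0.5, 0.5]]
loss_a0 = nll_surv_loss(h, [0, 0], [0, 1], alpha=0.0)
loss_a4 = nll_surv_loss(h, [0, 0], [0, 1], alpha=0.4)
print(loss_a0, loss_a4)
```

Expanding the last line of the loss gives uncensored + (1 - alpha) * censored, so in this form alpha effectively scales down the censored contribution. With a high censoring rate that term dominates each batch, which is one plausible reason the value matters so much there.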
Hello,
Could you provide the extracted WSI features stored as .pt files? When I use CLAM to extract the features myself, I get some differences, so having your files would let me match your setup exactly.
Hello, thank you very much for your paper and project.
Could you please provide the exact code used for the "Processing Whole Slide Images" step?
Looking forward to your reply.
Hello! Thanks for your contributions. Why don't you add a test mode to the model? I noticed that you only use the validation set to evaluate the model's performance, but your other repo, HistoFL, has a test mode.
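If a held-out test set is wanted in addition to validation, one option is to split at the patient (case_id) level so that slides from the same patient never end up in different splits. A minimal sketch assuming a list of case IDs; the function name split_cases and the 70/15/15 ratios are hypothetical choices, not part of MCAT or HistoFL.

```python
import numpy as np


def split_cases(case_ids, val_frac=0.15, test_frac=0.15, seed=0):
    """Hypothetical patient-level train/val/test split."""
    cases = np.array(sorted(set(case_ids)))  # dedupe so each patient lands in one split
    rng = np.random.default_rng(seed)
    rng.shuffle(cases)
    n = len(cases)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = cases[:n_test]
    val = cases[n_test:n_test + n_val]
    train = cases[n_test + n_val:]
    return train.tolist(), val.tolist(), test.tolist()


# Toy IDs, each repeated to mimic several slides per patient; illustrative only.
ids = [f"TCGA-XX-{i:04d}" for i in range(20)] * 2
train, val, test = split_cases(ids)
print(len(train), len(val), len(test))
```

Model selection (early stopping, checkpoint choice) would then use only the validation split, and the test split would be evaluated once at the end.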
In the following lines:
Lines 109 to 110 in b9cca63
np.random.choice(np.arange(0, len(split_dataset)), int(len(split_dataset)*0.1), replace = False)
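For reference, the quoted expression draws a random 10% subset of dataset positions without replacement. A small sketch with a stand-in length (n = 57 is arbitrary) shows the properties of the result:

```python
import numpy as np

n = 57                                   # stand-in for len(split_dataset)
k = int(n * 0.1)                         # int() truncates: 5.7 -> 5
ids = np.random.choice(np.arange(0, n), k, replace=False)

print(ids.shape)                         # one index per sampled item
print(len(set(ids.tolist())) == k)       # replace=False -> no duplicates
print(ids.min() >= 0 and ids.max() < n)  # every index is a valid position
```

Because int() truncates, any split with fewer than 10 items yields an empty selection. The size argument also matters: without it, np.random.choice returns a single scalar rather than an array.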
Hi, I am trying to reproduce your results but am having trouble with MI-FCN, as I do not have the required fast_cluster_ids.pkl file. I saw quite a bit of discussion in various issues; the closest answer to what I was looking for was this:
Hi @genhao3 - the fast_cluster_ids.pkl is a dictionary that I load for each cancer type, which maps case_id to [M x 1]-dim array of cluster assignments 1⋯C, where M is the number of patches and the indices correspond to the cluster assignment of a given patch embedding. You can use packages like faiss to generate these cluster assignments for running MI-FCN / DeepAttnMISL comparisons.
However, it is somewhat vague, and the library it mentions is not straightforward for me to use. Could you provide either the code used to create the file or the .pkl file itself? Any help is much appreciated :)
I know it is a baseline rather than your proposed model, but I would be interested in running it as well.
Best
Valentin
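Based on the description quoted above (a dictionary mapping each case_id to an [M x 1] array of per-patch cluster assignments), a file of that shape could be produced roughly as follows. This is a sketch under several assumptions: it uses a plain NumPy k-means instead of faiss, random toy features instead of real patch embeddings, 0-based labels (the quote mentions 1 to C, so add 1 if the loader expects that), per-case clustering (the quote does not say whether clustering was per case or cohort-wide), and a hypothetical num_clusters. It is not the authors' script; the exact format expected should be checked against dataset_survival.py.

```python
import pickle

import numpy as np


def nearest_center(x, centers):
    # squared Euclidean distance from every patch to every center: shape (M, C)
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans_assign(x, num_clusters, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one 0-based label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), num_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(x, centers)
        for k in range(num_clusters):
            members = x[labels == k]
            if len(members):  # keep the previous center if a cluster becomes empty
                centers[k] = members.mean(axis=0)
    return nearest_center(x, centers)


def build_cluster_dict(features_by_case, num_clusters=10):
    """Map case_id -> (M, 1) int array of per-patch cluster labels."""
    return {case_id: kmeans_assign(np.asarray(feats, dtype=float), num_clusters).reshape(-1, 1)
            for case_id, feats in features_by_case.items()}


# Toy embeddings standing in for each case's patch features; the IDs are fake.
rng = np.random.default_rng(1)
toy = {"TCGA-AA-0001": rng.normal(size=(50, 16)),
       "TCGA-AA-0002": rng.normal(size=(30, 16))}
cluster_ids = build_cluster_dict(toy, num_clusters=4)

# With real data you would pickle.dump this dict to a .pkl file; here we round-trip in memory.
restored = pickle.loads(pickle.dumps(cluster_ids))
print({k: v.shape for k, v in restored.items()})
```

Each case needs at least num_clusters patches for this initialisation to work.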
Hi,
Thanks for publishing the nice code. I can't find code in this repo that makes slide attention visualizations similar to those in Figures 2 and 3 of the paper. Is this available somewhere?
Thanks!
Ben
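Until official visualization code is available, a rough heatmap can be built by writing each patch's attention score into a downsampled grid using the patch coordinates. A minimal sketch, assuming level-0 top-left (x, y) coordinates per patch (as CLAM-style .h5 files typically store them), a fixed patch size of 256, and one attention score per patch; these assumptions and the function name attention_grid are hypothetical, and this is not how Figures 2 and 3 were produced.

```python
import numpy as np


def attention_grid(coords, scores, patch_size=256):
    """Place one score per patch into a 2-D grid (one cell per patch).

    coords: (N, 2) level-0 top-left (x, y) of each patch
    scores: (N,) attention score per patch
    Cells with no patch stay NaN so background can be masked when plotting.
    """
    coords = np.asarray(coords)
    scores = np.asarray(scores, dtype=float)
    cols = coords[:, 0] // patch_size
    rows = coords[:, 1] // patch_size
    grid = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    # min-max normalise so different slides share a 0..1 colour scale
    span = scores.max() - scores.min()
    norm = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    grid[rows, cols] = norm
    return grid


# Toy coordinates and scores for four patches (illustrative only).
coords = np.array([[0, 0], [256, 0], [0, 512], [512, 256]])
scores = np.array([0.1, 0.4, 0.3, 0.9])
grid = attention_grid(coords, scores)
print(grid.shape)
```

The grid can then be upsampled to a slide thumbnail's size and overlaid with a colormap, for example with matplotlib's imshow and an alpha value.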
https://github.com/mahmoodlab/MCAT/blob/b9cca63be83c67de7f95308d54a58f80b78b0da1/main.py#L106C1-L107C1 #16
The default fusion setting is now concat, which causes a "float division by zero" error for the WSI-only baselines.
MCAT/utils/coattn_train_utils.py
Lines 19 to 21 in b9cca63
Have you updated the code? I still have this problem on my side, which was already mentioned in #4.
Thank you for sharing your code! I'm interested in your research; it has given me a lot of inspiration.
While trying to run the code, I was confused by the file 'fast_cluster_ids.pkl' in dataset_survival.py. I can't find a description of it.
Could you please tell me what this file contains? Thank you very much!
MCAT/datasets/dataset_survival.py
Line 126 in 4fae60a
Hi there,
Thank you for sharing your nice work!
I ran into a problem when training your model: it returned NaN loss and risk values, as shown below:
batch 99, loss: nan, label: 1, event_time: 14.6800, risk: nan, bag_size:
batch 199, loss: nan, label: 1, event_time: 20.1700, risk: nan, bag_size:
batch 299, loss: nan, label: 2, event_time: 29.3000, risk: nan, bag_size:
The error message is:
  File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 214, in concordance_index_censored
    event_indicator, event_time, estimate = _check_inputs(event_indicator, event_time, estimate)
  File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 47, in _check_inputs
    estimate = _check_estimate_1d(estimate, event_time)
  File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sksurv/metrics.py", line 36, in _check_estimate_1d
    estimate = check_array(estimate, ensure_2d=False, input_name="estimate")
  File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sklearn/utils/validation.py", line 921, in check_array
    _assert_all_finite(
  File "/opt/anaconda3/envs/mcat/lib/python3.11/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input estimate contains NaN.
I checked the input and output of the model and found many NaN values in both the WSI and omic features, which lead to NaN outputs for the hazards and S. I strictly followed the instructions you provided and am really confused about why these NaN values appear. If you have met this problem before, could you tell me how to solve it?
Thank you!
Best.
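To narrow down where the NaNs come from, it can help to check every input before it reaches the model and before the C-index is computed. A minimal NumPy sketch; the helper name report_non_finite and the key names are hypothetical, and in a PyTorch loop you would pass tensors converted with .cpu().numpy().

```python
import numpy as np


def report_non_finite(named_arrays):
    """Return {name: count of NaN/inf values} for every array that has any."""
    bad = {}
    for name, arr in named_arrays.items():
        arr = np.asarray(arr, dtype=float)
        n_bad = int((~np.isfinite(arr)).sum())
        if n_bad:
            bad[name] = n_bad
    return bad


# Toy inputs: a clean WSI bag and an omic vector with one NaN (e.g. a missing CSV value).
inputs = {
    "x_path": np.random.default_rng(0).normal(size=(100, 1024)),
    "x_omic": np.array([0.3, np.nan, 1.2, -0.4]),
}
print(report_non_finite(inputs))
```

If the inputs are already non-finite, the source is upstream (feature files or missing values in the omic columns are possible candidates); if they are clean but the hazards are NaN, the problem lies in training itself.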
It seems that the computation of the survival layer in MCAT_Surv (link) is wrong: logits = self.classifier(h).unsqueeze(0) should be logits = self.classifier(h). With the current version, assuming batch_size=6 and n_classes=4, logits will have shape (1,6,4), hazards will have shape (1,6,4), and Y_hat will have shape (1,1,4), which clearly does not contain a Y_hat for each of the 6 samples in the batch. In addition, S will be the cumulative product of the survival terms (i.e. 1-hazards) along the batch dimension; what does that mean? Since S has shape (1,6,4), len(S) in CoxSurvLoss (link) will be 1, not the batch size as expected.
Finally, could you provide a reference for the equations you used to write this Cox loss?
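The shape argument can be checked without the model. The NumPy sketch below uses the batch_size=6, n_classes=4 example from the issue with random logits and a sigmoid for the hazards; it mirrors the operations described above rather than copying MCAT_Surv, and uses argmax along an axis in place of the model's top-k call.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


batch_size, n_classes = 6, 4
logits = np.random.default_rng(0).normal(size=(batch_size, n_classes))

# Behaviour described in the issue: an extra leading axis, then ops along axis 1.
buggy = logits[None, ...]                               # (1, 6, 4), like .unsqueeze(0)
S_buggy = np.cumprod(1 - sigmoid(buggy), axis=1)        # product runs over the 6 samples
Y_hat_buggy = np.argmax(buggy, axis=1)[:, None, :]      # (1, 1, 4), like topk(..., dim=1)

# Proposed fix: keep (B, K) so axis 1 is the time-bin axis.
S_fixed = np.cumprod(1 - sigmoid(logits), axis=1)       # (6, 4), survival per sample
Y_hat_fixed = np.argmax(logits, axis=1)[:, None]        # (6, 1), one bin per sample

print(S_buggy.shape, Y_hat_buggy.shape, len(S_buggy))   # len(S_buggy) is 1, not 6
print(S_fixed.shape, Y_hat_fixed.shape, len(S_fixed))
```

Note that if h is 1-D (one bag per step), unsqueeze(0) is what produces a (1, n_classes) shape; the mismatch only appears once h already carries a batch axis.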
Hi Chen,
I appreciate your excellent repository as usual :)
I encountered an issue while attempting to read the tcga_luad_all_clean.csv.zip file, both with pandas and by extracting it manually. When I try to unzip the file, a popup window says "The archive is either unknown or damaged", and when I try to read it with pandas, I get this error: "BadZipFile: File is not a zip file".
I'm unsure whether the file is still valid or if the problem lies on my end. Could you please advise if this is an issue with the file on GitHub or if there might be another problem?
Thanks a lot.
Omnia
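A quick way to tell whether a downloaded file is a real archive is to inspect it directly. One possible cause of "File is not a zip file" for files in a GitHub repo is that a Git LFS pointer (a small text file) or an HTML page was saved instead of the data; that is an assumption here, not a confirmed diagnosis. A standard-library sketch, with a hypothetical helper name:

```python
import os
import tempfile
import zipfile

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def diagnose_archive(path):
    """Classify a file as a zip archive, a Git LFS pointer, an HTML page, or unknown."""
    with open(path, "rb") as f:
        head = f.read(64)
    if zipfile.is_zipfile(path):
        return "zip"
    if head.startswith(LFS_HEADER):
        return "git-lfs pointer"
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html page"
    return "unknown"


# Demonstration on two temporary files: a real zip and a fake LFS pointer.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real.csv.zip")
    with zipfile.ZipFile(real, "w") as zf:
        zf.writestr("real.csv", "case_id,oncotree_code\nA,IDC\n")
    pointer = os.path.join(tmp, "pointer.csv.zip")
    with open(pointer, "wb") as f:
        f.write(LFS_HEADER + b"\noid sha256:0000\nsize 123\n")
    results = {name: diagnose_archive(p) for name, p in [("real", real), ("pointer", pointer)]}
print(results)
```

If the file turns out to be an LFS pointer, running git lfs pull inside a clone of the repository fetches the actual content.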
Line 333 in b9cca63
for batch_idx, (data_WSI, data_omic, label, event_time, c,a,b,d,e,f) in enumerate(loader):
I still can't run the program successfully because of a new error: KeyError: 'x_omic1'.
I noticed that the failing slide_id is 'TCGA-5T-A9QA', the first one in the validation list, and it only has 'x_omic' instead of 'x_omic1'.
How can I solve this problem?
May I ask which GPUs everyone is using to run MCAT, and how many in total? I am running the code on a server with 2080 Ti cards using the following command: CUDA_VISIBLE_DEVICES=0,1 python3 /usr/CI/MCAT-master/main.py. However, the run always uses only one GPU and reports a CUDA out-of-memory error. Has anyone else encountered this problem?