floodsung / learningtocompare_fsl
PyTorch code for CVPR 2018 paper: Learning to Compare: Relation Network for Few-Shot Learning (Few-Shot Learning part)
License: MIT License
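import numpy as np
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    # half-width of the two-sided Student-t confidence interval
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, h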
Hello, could you tell me what this h means? Is it the variability (the ± value) reported with the results in the paper?
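In the function above, h is the standard error multiplied by a Student-t quantile, i.e. the half-width of the confidence interval, so a result reads as m ± h. The sketch below reproduces the idea using only the standard library. Note that it substitutes the normal quantile for the Student-t quantile (an approximation that is only close when many test episodes are averaged), and the accuracy values are made up for illustration.

```python
import math
import statistics

def mean_confidence_interval_normal(data, confidence=0.95):
    """Return (mean, half-width) of a two-sided confidence interval.

    Assumption: uses the normal quantile instead of the Student-t
    quantile, which is a close approximation for large samples only.
    """
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return m, se * z

# Hypothetical per-episode accuracies, for illustration only.
accuracies = [0.5, 0.6, 0.7]
m, h = mean_confidence_interval_normal(accuracies)
print("accuracy: %.4f +- %.4f" % (m, h))
```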
Do model.train() and model.eval() need to be added during training? Looking forward to your reply!
I wanted to check whether contrastive loss can be used here. I tried it but ran into some errors. Can anyone confirm and help?
When I start running the code, CPU utilization is about 2000% and the training speed is acceptable. Over time, however, it falls to about 200% or 300%, and GPU utilization stays low, never above 30%. So I think training runs mainly on the CPU rather than the GPU, and the CPU limits the training speed. How can I deal with this and speed up training?
Thank you for providing the code!
I have a concern about the model selection in your miniimagenet_train_few_shot.py.
Line 260: It seems that the best model is selected as the one with the best testing accuracy (not validation accuracy)?
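One common way to avoid selecting on test accuracy is to choose the checkpoint by validation accuracy and only then read off its test accuracy. The sketch below illustrates that bookkeeping with made-up numbers; `select_checkpoint` and the history records are hypothetical and not part of the repository.

```python
def select_checkpoint(history):
    """Pick the record with the highest validation accuracy.

    `history` is a list of dicts with the hypothetical keys
    'episode', 'val_acc' and 'test_acc'.
    """
    return max(history, key=lambda record: record["val_acc"])

# Made-up numbers for illustration only.
history = [
    {"episode": 5000, "val_acc": 0.61, "test_acc": 0.60},
    {"episode": 10000, "val_acc": 0.66, "test_acc": 0.63},
    {"episode": 15000, "val_acc": 0.64, "test_acc": 0.65},
]
best = select_checkpoint(history)
print(best["episode"], best["test_acc"])
```

Here the third checkpoint has the highest test accuracy, but it is the second one that gets reported, because the test numbers play no part in the choice.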
Spent a while working on this to test which torch versions you need to keep things from breaking.
Python2.7
torch==0.3.1 ##pip or conda install may fail for this, see https://pytorch.org/get-started/previous-versions/ to get a wheel file.
torchvision==0.2.0 ##These next two must be explicitly stated or else training fails.
torchtext==0.2.3
Pin your requirements, people!
(Using GPU) Running one episode takes 2 or 3 seconds. Do I have to train for all 1,000,000 (EPISODE) episodes? That may take many days...
Have you trained for 1,000,000 (EPISODE) episodes to get the best parameters?
Thanks for your awesome code, which is very clear. But I can't find the module named task_generator_test, so I can't test the code myself. Could you tell me where I can find it? Thanks again!
The .DS_Store problem can be solved by running the following in Terminal:
cd LearningToCompare_FSL-master
find . -name '*.DS_Store' -type f -delete
When I run python miniimagenet_train_few_shot.py, it does not work. I don't know how to fix it.
Does this work if training is done on miniImagenet or Omniglot and testing on a custom dataset? I wonder how it "learns to compare" in this situation. Many implementations use miniImagenet or Omniglot to demonstrate the concept of FSL, but they test on the same dataset (with new classes). I wonder what happens if the new classes come from a totally different dataset with a different intensity/feature distribution.
Thanks for your code.
As the title mentions, the code at lines 170/171 in miniimagenet_train_few_shot.py and miniimagenet_train_one_shot.py:
sample_dataloader = tg.get_mini_imagenet_data_loader(task,num_per_class=SAMPLE_NUM_PER_CLASS,split="train",shuffle=False)
batch_dataloader = tg.get_mini_imagenet_data_loader(task,num_per_class=BATCH_NUM_PER_CLASS,split="test",shuffle=True)
confuses me a lot.
It seems that this code does not guarantee that the categories in the two sets are the same?
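In case it helps, one way such a guarantee can arise is if the class sampling happens once, when the task object is built, and the split argument only chooses between two disjoint sets of images from those same classes. The sketch below illustrates that pattern. `ToyTask` and its attribute names are hypothetical, and whether tg.MiniImagenetTask works this way should be checked against task_generator.py.

```python
import random

class ToyTask:
    """Hypothetical episode: classes are sampled once, then each class's
    images are split into a support ("train") and a query ("test") part."""

    def __init__(self, images_by_class, num_classes, train_num, test_num):
        classes = random.sample(sorted(images_by_class), num_classes)
        self.train_roots, self.test_roots = [], []
        for label, name in enumerate(classes):
            picked = random.sample(images_by_class[name], train_num + test_num)
            self.train_roots += [(path, label) for path in picked[:train_num]]
            self.test_roots += [(path, label) for path in picked[train_num:]]

    def roots(self, split):
        return self.train_roots if split == "train" else self.test_roots

# Made-up file names for illustration.
data = {"class%d" % c: ["class%d/img%d.jpg" % (c, i) for i in range(20)]
        for c in range(10)}
task = ToyTask(data, num_classes=5, train_num=1, test_num=15)
support_labels = {label for _, label in task.roots("train")}
query_labels = {label for _, label in task.roots("test")}
print(support_labels == query_labels)
```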
Hi, I wonder if you ran your code on a Mac? I ran this code on Ubuntu 16.04 following your instructions, but it fails with OSError: [Errno 20] Not a directory: '../datas/omniglot_resized/Alphabet_of_the_Magi/.DS_Store'. Evidently there is a hidden file in the dataset directory. How should I fix this problem? Did you test your code on Ubuntu? Thanks for your kind help.
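A possible workaround, besides deleting the .DS_Store files, is to skip anything that is not a directory when collecting the character folders. The sketch below assumes the alphabet/character directory layout implied by the path in the error message; `list_character_folders` is a hypothetical helper, not the repository's function.

```python
import os

def list_character_folders(data_folder):
    """Collect alphabet/character directories, ignoring stray files
    such as .DS_Store at either level."""
    folders = []
    for family in sorted(os.listdir(data_folder)):
        family_path = os.path.join(data_folder, family)
        if not os.path.isdir(family_path):
            continue  # e.g. a .DS_Store next to the alphabets
        for character in sorted(os.listdir(family_path)):
            character_path = os.path.join(family_path, character)
            if os.path.isdir(character_path):
                folders.append(character_path)
    return folders
```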
I added an embedding layer (28 dim) to the encoder for text classification.
https://github.com/laohur/LearningToCompare_FSL
But the model does not converge. Why?
Nice work, but I found a problem that really confuses me.
As shown in omniglot_train_few_shot.py, in both the training and testing phases the support set (i.e. sample_images) and the evaluation set (i.e. test_images) are drawn from the same 5 classes (called one task). With your way of calculating accuracy, it is easy to reach ~99% during training.
Here is what I don't understand: when I draw the support set from one task and the evaluation set from another task, the two tasks contain 5 different classes.
So I presumed that I would get low confidences after feeding them into the network, but that is not what happens.
Here is the test case I used:
** [TESTING set] CLASS_NUM=5, SAMPLE_NUM_PER_CLASS=5
the character classes are:
['Angelic/character11',
'Angelic/character11',
'Angelic/character11',
'Angelic/character11',
'Angelic/character11',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Gujarati/character27',
'Gujarati/character27',
'Gujarati/character27',
'Gujarati/character27',
'Gujarati/character27',
'Glagolitic/character09',
'Glagolitic/character09',
'Glagolitic/character09',
'Glagolitic/character09',
'Glagolitic/character09']
** [Support set] CLASS_NUM=5, SAMPLE_NUM_PER_CLASS=5
the character classes are:
['N_Ko/character27',
'N_Ko/character27',
'N_Ko/character27',
'N_Ko/character27',
'N_Ko/character27',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Oriya/character33',
'Oriya/character33',
'Oriya/character33',
'Oriya/character33',
'Oriya/character33',
'Tibetan/character14',
'Tibetan/character14',
'Tibetan/character14',
'Tibetan/character14',
'Tibetan/character14',
'Tifinagh/character45',
'Tifinagh/character45',
'Tifinagh/character45',
'Tifinagh/character45',
'Tifinagh/character45']
** And I got the output confidences via probs, predict_labels = torch.max(relations.data, 1), as below:
0.9999995, 0.9999894, 0.00067013013, 1.0, 0.9999995,
0.0013619913, 0.45683807, 0.003507328, 0.99994755, 0.20433362,
0.9999981, 0.76437086, 0.4761213, 0.99345946, 0.25436476,
0.0002244339, 0.00026010931, 0.87288016, 1.8067769e-05, 0.00053879694,
1.0, 1.0, 1.0, 1.0, 1.0
It's really weird: the support and testing sets have different classes, yet the output confidences are very high.
If I randomly pick an image from the entire Omniglot dataset and (assume I) don't know its class, and I compare it with all possible support sets, how can I recognize its class when the output confidences are barely discriminative?
Am I missing something important or misunderstanding something?
I found that there is data leakage in testing, which inflates the accuracy of the model.
The model contains batch normalization, which is supposed to behave differently in training and testing. During testing the model should be put in model.eval() mode so that batch normalization uses its running statistics, but since this mode is never set, batch normalization behaves as in training and normalizes with the statistics of the test batch. This amounts to data leakage and increases the accuracy (for example, the Omniglot 5-way 1-shot accuracy increases from ~90% to 99.6%). Could you please check this?
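To illustrate the effect being described: in training mode a batch-normalization layer normalizes with the statistics of the current batch, so the output for one query depends on the other images in the batch, whereas in eval mode it uses the stored running statistics. The sketch below uses a small stand-in network, not the repository's encoder, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in network (assumption): any module containing BatchNorm behaves alike.
net = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3), nn.BatchNorm2d(4), nn.ReLU())
images = torch.randn(8, 1, 28, 28)

net.train()
with torch.no_grad():
    # Batch statistics are used (and running statistics are updated).
    train_alone = net(images[:2])[0]
    train_in_batch = net(images)[0]

net.eval()
with torch.no_grad():
    # Running statistics are used, so batch composition no longer matters.
    eval_alone = net(images[:2])[0]
    eval_in_batch = net(images)[0]

print(torch.allclose(train_alone, train_in_batch, atol=1e-5))
print(torch.allclose(eval_alone, eval_in_batch, atol=1e-5))
```

The same image is pushed through the network alongside different companions; only in eval mode is its output independent of them.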
I am trying to run the following command:
python miniimagenet_train_one_shot.py -w 5 -s 1 -b 15
However, I get the following error:
Traceback (most recent call last):
File "miniimagenet_train_one_shot.py", line 269, in <module>
main()
File "miniimagenet_train_one_shot.py", line 169, in main
task = tg.MiniImagenetTask(metatrain_folders,CLASS_NUM,SAMPLE_NUM_PER_CLASS,BATCH_NUM_PER_CLASS)
File "/content/gdrive1/My Drive/learn_to_learn/LearningToCompare_FSL/miniimagenet/task_generator.py", line 55, in __init__
class_folders = random.sample(self.character_folders,self.num_classes)
File "/usr/lib/python2.7/random.py", line 325, in sample
raise ValueError("sample larger than population")
ValueError: sample larger than population
Can anyone help me with this issue, please?
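This ValueError is raised by random.sample whenever it is asked for more items than the list contains. A likely cause (an assumption, since the data directory is not shown) is that fewer than CLASS_NUM class folders were found, for example because the dataset path is empty or wrong. A hypothetical guard such as the one below makes that failure easier to diagnose:

```python
import random

def sample_class_folders(character_folders, num_classes):
    """Sample class folders, failing with a clearer message when the
    dataset directory yielded too few of them (hypothetical helper)."""
    if len(character_folders) < num_classes:
        raise ValueError(
            "found %d class folders but need %d; check the dataset path"
            % (len(character_folders), num_classes))
    return random.sample(character_folders, num_classes)

try:
    sample_class_folders([], 5)  # an empty folder list reproduces the problem
except ValueError as err:
    print(err)
```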
Thank you for your awesome code.
There is a normalization step for both the miniImagenet and Omniglot data:
"normalize = transforms.Normalize(mean=[0.92206, 0.92206, 0.92206], std=[0.08426, 0.08426, 0.08426])"
But why are the mean and std values always the same?
sample_features = feature_encoder(Variable(samples)) # 25x64x5x5
sample_features = sample_features.view(CLASS_NUM,SAMPLE_NUM_PER_CLASS,FEATURE_DIM,5,5)
sample_features = torch.sum(sample_features,1).squeeze(1)
How do I get the csv files? I only got the image.zip file through Google Drive.
self.train_labels = [labels[self.get_class(x)] for x in self.train_roots]
KeyError: '..\datas\omniglot_28x28'
Thanks for this code. Every time I run it, my index idx goes out of range for the list self.image_roots. How can I solve this problem?
Hi, Floodsung! First of all, thanks for your great work. Recently I have been reading your code for relation networks. In your code, the dataset mean and variance values used for Omniglot are the same as those used for miniImageNet. In my opinion, these values should be computed separately for each dataset, especially since Omniglot images are greyscale whereas miniImageNet images are color.
I wonder whether this unsuitable normalization affects the final classification results.
Thanks!
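If dataset-specific statistics are wanted, they can be computed per channel from the training images. The sketch below assumes the images are already loaded as a float array in [0, 1] with shape (N, H, W, C); the random array only stands in for real data. It also shows why a greyscale image replicated to three channels gives three identical means and standard deviations, as in the Normalize call quoted earlier on this page.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std for an array of shape (N, H, W, C)."""
    return images.mean(axis=(0, 1, 2)), images.std(axis=(0, 1, 2))

rng = np.random.default_rng(0)
grey = rng.random((16, 28, 28, 1))        # stand-in greyscale images
grey_as_rgb = np.repeat(grey, 3, axis=3)  # same values copied to 3 channels
mean, std = channel_stats(grey_as_rgb)
print(mean, std)
```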
As we know, the original version was implemented using Python 2 and PyTorch 0.3.
So I replaced the deprecated calls to fix errors and warnings.
Furthermore, to understand how few-shot learning works, I tested using ipython.
I hope it is helpful to someone who just started to learn few-shot learning like me.
I implemented omniglot first and miniimagenet will be added soon.
If you liked it, please give a star ⭐ !
Thank you for your code.
I noticed that before feature concatenation, you sum the features of the support images for each class.
Could you please explain why you don't concatenate every single support image feature with the query image feature?
I have tried that, but the results are not as good as before.
Your paper doesn't mention that sum operation, so I'm puzzled.
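For readers following along, the summation in question collapses the shot dimension so that each class is represented by a single feature map before it is paired with a query. The sketch below only demonstrates the shapes involved, with random tensors standing in for encoder outputs, 5-way 5-shot values chosen as an example, and the assumption that the support images arrive ordered by class.

```python
import torch

CLASS_NUM, SAMPLE_NUM_PER_CLASS, FEATURE_DIM = 5, 5, 64

# Random stand-in for the encoder output on 25 support images.
sample_features = torch.randn(CLASS_NUM * SAMPLE_NUM_PER_CLASS, FEATURE_DIM, 5, 5)

# Group by class, then sum the shots of each class element-wise.
grouped = sample_features.view(CLASS_NUM, SAMPLE_NUM_PER_CLASS, FEATURE_DIM, 5, 5)
class_features = torch.sum(grouped, 1)

print(grouped.shape, class_features.shape)
```

Summing keeps the number of relation pairs per query equal to the number of classes, independent of the number of shots.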
I think 'python omniglot_train_one_shot.py -w 5 -s 1 -b 19' can be replaced by 'python omniglot_train_few_shot.py -w 5 -s 1 -b 19', is that right?
If so, maybe there is no need to use omniglot_train_one_shot.py.
How should I train it on my own dataset?
Thank you for your awesome code.
There were some errors when I ran this code and I didn't know what to do. I hope to get some tips from you, thanks. The following is the problem record:
init data folders
init neural networks
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.cu line=58 error=30 : unknown error
Traceback (most recent call last):
File "/home/jesse/workspace/jisheng/LearningToCompare_FSL-master/omniglot/omniglot_train_one_shot.py", line 257, in <module>
main()
File "/home/jesse/workspace/jisheng/LearningToCompare_FSL-master/omniglot/omniglot_train_one_shot.py", line 133, in main
feature_encoder.cuda(GPU)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 152, in _apply
param.data = fn(param.data)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 69, in cuda
return new_type(self.size()).copy(self, async)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 361, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.cu:58
System Configuration: gtx1080ti, cuda8.0, cudnn v5.1
Torch and Torchvision installed by the conda command
Torch:0.3.0
Torchvision:0.2.0
error.txt
thanks very much
In calculating accuracy of test dataset:
https://github.com/floodsung/LearningToCompare_FSL/blob/master/omniglot/omniglot_train_one_shot.py#L237
sample_images,sample_labels = sample_dataloader.__iter__().next()
test_images,test_labels = test_dataloader.__iter__().next()
sample_features = feature_encoder(Variable(sample_images).cuda(GPU)) # 5x64
test_features = feature_encoder(Variable(test_images).cuda(GPU)) # 20x64
sample_features_ext = sample_features.unsqueeze(0).repeat(SAMPLE_NUM_PER_CLASS*CLASS_NUM,1,1,1,1)
test_features_ext = test_features.unsqueeze(0).repeat(SAMPLE_NUM_PER_CLASS*CLASS_NUM,1,1,1,1)
test_features_ext = torch.transpose(test_features_ext,0,1)
relation_pairs = torch.cat((sample_features_ext,test_features_ext),2).view(-1,FEATURE_DIM*2,5,5)
relations = relation_network(relation_pairs).view(-1,CLASS_NUM)
_,predict_labels = torch.max(relations.data,1)
rewards = [1 if predict_labels[j]==test_labels[j] else 0 for j in range(CLASS_NUM)]
I think the reward should be summed over all images in the batch, so j in the last line should run over range(len(test_labels)).
Why is the sum taken over j in range(CLASS_NUM)?
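The difference between the two loop bounds can be seen in a small sketch. The label lists are made up for illustration, and whether the distinction matters in practice depends on how many test images the loader returns per episode.

```python
CLASS_NUM = 5

def count_correct(predict_labels, test_labels, limit=None):
    """Count matches over the first `limit` queries (all of them if None)."""
    limit = len(test_labels) if limit is None else limit
    return sum(1 for j in range(limit) if predict_labels[j] == test_labels[j])

# Made-up episode: 10 queries, the last 4 predicted wrongly.
test_labels = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
predict_labels = [0, 1, 2, 3, 4, 0, 2, 3, 4, 0]

print(count_correct(predict_labels, test_labels, limit=CLASS_NUM))
print(count_correct(predict_labels, test_labels))
```

With the shorter bound, the four wrong predictions at the end of the list are never looked at.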
I have no idea how to solve the problem.
Can you help me?
Thanks
Hi, I read both the old and new versions of the paper and found that the "shallow" RN in the new version has lower accuracy than the one in the older version, even though the number of layers and the number of filters in each layer are exactly the same. What makes the accuracy lower?
Also, the older version of the paper reports a "deeper" RN with much better performance and states that the model can benefit from a deeper structure. But when I added just 2 additional conv layers to the embedding part of the model and trained it on mini-ImageNet, it overfitted after about 80000 episodes and reached a poor test accuracy below 0.60. Am I missing a trick for making it deeper?
I have run the miniimagenet_train_few_shot.py, but the training process does not converge.
(base) C:\Users\B4-410\lpthw\LearningToCompare_FSL-master\miniimagenet>python miniimagenet_train_one_shot.py -w 5 -s 1 -b 15
init data folders
init neural networks
Traceback (most recent call last):
File "miniimagenet_train_one_shot.py", line 269, in <module>
main()
File "miniimagenet_train_one_shot.py", line 150, in main
feature_encoder.load_state_dict(torch.load(str("./models/miniimagenet_feature_encoder_" + str(CLASS_NUM) +"way_" + str(SAMPLE_NUM_PER_CLASS) +"shot.pkl")))
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 613, in _load
result = unpickler.load()
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 576, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 155, in default_restore_location
result = fn(storage, location)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 131, in _cuda_deserialize
device = validate_cuda_device(location)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 125, in validate_cuda_device
device, torch.cuda.device_count()))
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
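As the error message itself suggests, passing map_location to torch.load remaps the saved storages to a device that exists on the current machine. The sketch below round-trips a small stand-in state dict through an in-memory buffer rather than the repository's .pkl files; mapping to the CPU first and moving the module to the GPU afterwards is one option among several.

```python
import io
import torch
import torch.nn as nn

# Stand-in module (assumption); the repository's encoder would be used instead.
feature_encoder = nn.Linear(4, 2)

buffer = io.BytesIO()
torch.save(feature_encoder.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" loads every tensor onto the CPU regardless of the
# device it was saved from.
state_dict = torch.load(buffer, map_location="cpu")
feature_encoder.load_state_dict(state_dict)
print(sorted(state_dict))
```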
Thank you for your awesome code.
However, I cannot find the csv files for the miniImagenet dataset. Could you give me the links or the files?
Thanks.
Hello, I am trying to run the miniimagenet code on a different (color) dataset. I have set the batch number per class to 5 and resized the input images to 84*84. Besides these, I have not made any changes to the code.
Can you please help me understand why I am getting this error and how it can be fixed?
Traceback (most recent call last):
File "miniimagenet_train_one_shot.py", line 275, in <module>
main()
File "miniimagenet_train_one_shot.py", line 192, in main
relation_pairs = torch.cat((sample_features_ext,batch_features_ext),2).view(-1,FEATURE_DIM*2,19,19)
RuntimeError: shape '[-1, 128, 19, 19]' is invalid for input of size 46656000
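A quick size check may help locate the problem. The hard-coded 19 matches an 84x84 input only under a specific encoder layout. The sketch below assumes two unpadded 3x3 convolutions, each followed by 2x2 max-pooling, then two padded 3x3 convolutions that keep the size (an assumption to verify against the CNNEncoder in the script). Under that assumption, and with 5-way 1-shot and 5 query images per class (125 pairs), the reported 46656000 elements correspond to 54x54 feature maps, which is what a 224x224 input would produce. That would suggest the images reaching the network were not actually 84x84.

```python
def feature_map_size(input_size):
    """Spatial size after the assumed encoder layout."""
    size = input_size
    for _ in range(2):
        size = size - 2   # 3x3 convolution without padding
        size = size // 2  # 2x2 max-pooling
    return size           # the padded 3x3 convolutions keep the size

FEATURE_DIM = 64
pairs = (5 * 5) * 5  # 25 query images, each paired with 5 classes

for input_size in (84, 224):
    side = feature_map_size(input_size)
    print(input_size, side, pairs * FEATURE_DIM * 2 * side * side)
```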
I have already tried my best to solve this problem.
But I failed, can anyone give me some suggestions?
(py36) E:\GitHub\LearningToCompare_FSL>F:/Anaconda3/envs/py36/python.exe e:/GitHub/LearningToCompare_FSL/omniglot/omniglot_train_few_shot.py
init data folders
init neural networks
Training...
F:\Anaconda3\envs\py36\lib\site-packages\torch\nn\functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "e:/GitHub/LearningToCompare_FSL/omniglot/omniglot_train_few_shot.py", line 264, in <module>
main()
File "e:/GitHub/LearningToCompare_FSL/omniglot/omniglot_train_few_shot.py", line 188, in main
one_hot_labels = Variable(torch.zeros(BATCH_NUM_PER_CLASS*CLASS_NUM, CLASS_NUM).scatter_(1, batch_labels.view(-1,1), 1).cuda(GPU))
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index'
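scatter_ requires its index tensor to have dtype int64 (Long). A likely fix, assuming batch_labels arrives as an int32 tensor as the error message suggests, is to cast it with .long() before building the one-hot labels. The label values below are made up for illustration.

```python
import torch

CLASS_NUM = 5
# Made-up int32 labels standing in for what the data loader returned.
batch_labels = torch.tensor([0, 3, 1, 4, 2], dtype=torch.int32)

index = batch_labels.long().view(-1, 1)  # cast to int64 before scatter_
one_hot_labels = torch.zeros(len(batch_labels), CLASS_NUM).scatter_(1, index, 1)
print(one_hot_labels)
```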
Hi, Floodsung!
Your work is great. I want to know whether the network can be applied to multiple objects of the same class, or multiple objects of different classes, in one image. Thank you!
conda create -n py27 python=2.7
conda deactivate
conda activate py27
pip install https://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl
pip install torchvision==0.2.1
pip install matplotlib scipy
git clone https://github.com/brendenlake/omniglot.git
cd /LearningToCompare_FSL/datas
unzip omniglot_28x28.zip
cd /LearningToCompare_FSL/omniglot
python omniglot_train_one_shot.py -w 5 -s 1 -b 19
I reproduced the LearningToCompare_FSL environment on Matpool (矩池云), choosing the Tensorflow 1.4 image because it ships with CUDA 8.
bash /public/script/switch_conda_source.sh
One-click scripts are available at: https://github.com/matpool/matools
conda create -n py27 python=2.7
conda deactivate
conda activate py27
The next task is to find the wheel (whl) package for torch 0.3; I found it at the link below:
https://download.pytorch.org/whl/cu80/torch_stable.html
Here I install it directly with pip; just copy the command below.
pip install https://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchvision==0.2.1
pip install matplotlib scipy
pip list
git clone https://github.com/brendenlake/omniglot.git
Here I used a GitHub mirror to do this:
git clone https://hub.fastgit.org/floodsung/LearningToCompare_FSL.git
cd LearningToCompare_FSL/
ls
cd /LearningToCompare_FSL/datas
unzip omniglot_28x28.zip
cd /LearningToCompare_FSL/omniglot
python omniglot_train_one_shot.py -w 5 -s 1 -b 19
nvidia-smi -l 5
Really nice work. In the paper, omniglot_test_one_shot.py is evidently used to get the test results. I am confused that the test set is drawn again from the whole dataset without a fixed random seed. That means the test set contains some examples that were already used in the training phase. Is this right? Can you give me some references? Thanks.
Converting the images to RGB doesn't solve this problem.
Thanks for your code!
But one thing confuses me: why didn't you use feature_encoder.eval() and relation_network.eval() in your test code? It actually has an impact on the results.
In miniImagenet_train_few_shot.py and miniImagenet_test_few_shot.py line 15:
import task_generator_test as tg
in task_generator_test.py line 28 and 29:
train_folder = '../datas/miniImagenet/train'
test_folder = '../datas/miniImagenet/test'
It seems that you are training and testing on the same dataset, without using the 'val' split.
Is it right to train and test in this way?
If I train on the train set and the val set, can I get the same accuracy on the test set?