floodsung / learningtocompare_fsl

PyTorch code for CVPR 2018 paper: Learning to Compare: Relation Network for Few-Shot Learning (Few-Shot Learning part)

License: MIT License

Python 100.00%

learningtocompare_fsl's Issues

result of paper

import numpy as np
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, h

Hello, could you tell me what this h means? Is it the variability (the ± value) of the results reported in the paper?
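For context on the question above: in the snippet, h is the standard error multiplied by a quantile, i.e. the half-width of a confidence interval, so a result would be read as m ± h. Below is a minimal sketch using only the Python standard library. It swaps the Student t quantile of the original for a normal quantile, an assumption that is only reasonable when the number of test episodes is large. The function name and the accuracies are made up for illustration.

```python
import math
import statistics


def mean_and_half_width(data, confidence=0.95):
    """Return (mean, h), where mean +/- h is an approximate confidence interval."""
    n = len(data)
    m = statistics.mean(data)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(data) / math.sqrt(n)
    # Normal quantile instead of the t quantile (assumption: n is large).
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return m, se * z


# Hypothetical per-episode accuracies, not results from the paper.
accuracies = [0.96, 0.98, 1.00, 0.94, 0.98, 0.96, 1.00, 0.98]
m, h = mean_and_half_width(accuracies)
print(f"{m:.4f} +/- {h:.4f}")
```

A larger h means the mean accuracy is less certain; averaging over more test episodes shrinks it.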

Do model.train and model.eval need to be added？

Do model.train() and model.eval() need to be called during training? Looking forward to your reply!

Can I use contrastive loss here?

I wanted to check whether we can use contrastive loss here. I tried, but I am running into some errors. Can anyone confirm and help?

When I start running the code, CPU utilization is about 2000% and the training speed is acceptable. As time goes on, though, it falls to about 200% or 300%. GPU utilization is always low, never higher than 30%. So I think the code runs mainly on the CPU rather than the GPU, and the CPU limits the training speed. How can I deal with this and speed up training?

Select the model based on testing accuracy？

Thank you for providing the code!
I have a concern about the model selection in your miniimagenet_train_few_shot.py.
Line 260: It seems that the best trained model is selected as the one with the best testing accuracy (not validation accuracy)?
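To make the concern concrete, here is a minimal sketch of selecting a checkpoint by validation accuracy and only then reading off its test accuracy. The record layout and all numbers are hypothetical; this is not the repository's training loop.

```python
def select_checkpoint(history):
    """Pick the checkpoint with the highest validation accuracy."""
    return max(history, key=lambda record: record["val_acc"])


# Hypothetical evaluation log, one record per saved checkpoint.
history = [
    {"episode": 5000, "val_acc": 0.61, "test_acc": 0.60},
    {"episode": 10000, "val_acc": 0.64, "test_acc": 0.62},
    {"episode": 15000, "val_acc": 0.63, "test_acc": 0.65},
]

chosen = select_checkpoint(history)
print(chosen["episode"], chosen["test_acc"])
```

In this made-up log, choosing by test accuracy would report 0.65, while choosing by validation accuracy reports 0.62 for the selected checkpoint; the gap is the optimism that test-based selection can introduce.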

Versioning

Spent a while working out which torch versions you need to keep things from breaking.
Python 2.7
torch==0.3.1  ## pip or conda install may fail for this; see https://pytorch.org/get-started/previous-versions/ to get a wheel file.
torchvision==0.2.0  ## These next two must be pinned explicitly or else training fails.
torchtext==0.2.3

Pin your requirements, people!

Time cost when running omniglot_train_one_shot

(Using GPU) Running an episode takes 2 or 3 seconds. Do I have to wait for training to run 1,000,000 (EPISODE) episodes? It may take many days...
Did you train for 1,000,000 (EPISODE) episodes to get the best parameters?
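As a rough back-of-the-envelope check of the figures in the question (1,000,000 episodes at 2 to 3 seconds each), assuming a constant time per episode and no time spent on periodic testing:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(episodes, seconds_per_episode):
    """Wall-clock days needed if every episode takes the same time."""
    return episodes * seconds_per_episode / SECONDS_PER_DAY


EPISODES = 1_000_000  # value taken from the question above
for seconds in (2, 3):
    print(f"{seconds} s/episode -> {training_days(EPISODES, seconds):.1f} days")
```

At those speeds a full run would take roughly 23 to 35 days.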

How can I find the task_generator_test.py ?

Thanks for your awesome code, which is very clear. But I can't find the module named task_generator_test, so I can't test the code myself. Could you tell me where I can find it? Thanks again!

DS_Store Problem Solution

The .DS_Store problem can be solved by running cd LearningToCompare_FSL-master and then
find . -name '*.DS_Store' -type f -delete
in a terminal.

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 1 and 50 in dimension 0 at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:111

When I run python miniimagenet_train_few_shot.py, it does not work. I don't know how to fix it.

Test on different dataset, not miniimagenet, not omniglot

Does this work if training is done on miniimagenet or omniglot and testing on a custom dataset? I wonder how it "learns to compare" in this situation. Many implementations use miniimagenet or omniglot to demonstrate the concept of FSL, but they use the same dataset for testing (with new classes). I wonder what happens if the new classes come from a totally different dataset with a different intensity/feature distribution.

How to ensure each category in support set is the same as that in query set?

Thanks for your code.
As the title says, the code at lines 170/171 in miniimagenet_train_few_shot.py and miniimagenet_train_one_shot.py:

sample_dataloader = tg.get_mini_imagenet_data_loader(task,num_per_class=SAMPLE_NUM_PER_CLASS,split="train",shuffle=False)
batch_dataloader = tg.get_mini_imagenet_data_loader(task,num_per_class=BATCH_NUM_PER_CLASS,split="test",shuffle=True)

confuses me a lot.
It seems that this code does not guarantee that the categories in the two sets are the same?
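One possible reading, stated as an assumption rather than a confirmed description of task_generator.py: both loaders receive the same task object, so if the task samples its classes once and keeps a support list and a query list for those classes, then split="train" and split="test" would only choose between the two lists. A minimal sketch of that idea with a hypothetical ToyTask class and made-up file names:

```python
import random


class ToyTask:
    """Hypothetical task: classes are sampled once and shared by both splits."""

    def __init__(self, images_by_class, num_classes, support_num, query_num, seed=None):
        rng = random.Random(seed)
        classes = rng.sample(sorted(images_by_class), num_classes)
        # One label mapping, used for the support and the query images alike.
        self.labels = {name: index for index, name in enumerate(classes)}
        self.support, self.query = [], []
        for name in classes:
            picked = rng.sample(images_by_class[name], support_num + query_num)
            label = self.labels[name]
            self.support += [(image, label) for image in picked[:support_num]]
            self.query += [(image, label) for image in picked[support_num:]]


# Made-up dataset: 10 classes with 20 image paths each.
images_by_class = {
    f"class{c}": [f"class{c}/img{i}.jpg" for i in range(20)] for c in range(10)
}
task = ToyTask(images_by_class, num_classes=5, support_num=1, query_num=15, seed=0)
print(len(task.support), len(task.query))
```

Under this design the two sets always cover the same classes, and the slicing keeps support and query images disjoint.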

Problem about .DS_Store

Hi, I wonder if you ran your code on a Mac? I ran this code on Ubuntu 16.04 following your instructions, but it reports OSError: [Errno 20] Not a directory: '../datas/omniglot_resized/Alphabet_of_the_Magi/.DS_Store'. Obviously, there is something hidden in the file system. How should I fix this problem? Did you test your code on Ubuntu? Thanks for your kind help.
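Besides deleting the .DS_Store files, another option is to skip anything that is not a directory when listing class folders. The sketch below uses a hypothetical helper name and a temporary directory; it is not the repository's loader.

```python
import os
import tempfile


def list_class_dirs(root):
    """Return sorted sub-directories of root, skipping files such as .DS_Store."""
    return sorted(
        os.path.join(root, name)
        for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


# Demo on a throwaway folder containing one class directory and one stray file.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "character01"))
    open(os.path.join(root, ".DS_Store"), "w").close()
    found = [os.path.basename(path) for path in list_class_dirs(root)]

print(found)
```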

for text classification?

I added an embedding layer (28 dim) to the encoder for text classification.
https://github.com/laohur/LearningToCompare_FSL

But the model does not converge. Why?

About the testing problem

Nice work, but I found a problem that really confuses me.

As shown in omniglot_train_few_shot.py, in both the training and testing phases, the support set (i.e. sample_images) and the evaluation set (i.e. test_images) are drawn from the same 5 classes (called one task). With your way of calculating accuracy, it's easy to achieve ~99% during training.

Here is the problem, and I don't know why it happens: I draw the support set from one task and the evaluation set from another task, so the two tasks contain different sets of 5 classes.

I presumed that I would get low confidences after feeding them into the network, but that is not what happens.
Here is the test case I used:

** [TESTING set] CLASS_NUM=5, SAMPLE_NUM_PER_CLASS=5
the character classes are:
['Angelic/character11',
'Angelic/character11',
'Angelic/character11',
'Angelic/character11',
'Angelic/character11',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Syriac_(Serto)/character08',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Japanese_(hiragana)/character42',
'Gujarati/character27',
'Gujarati/character27',
'Gujarati/character27',
'Gujarati/character27',
'Gujarati/character27',
'Glagolitic/character09',
'Glagolitic/character09',
'Glagolitic/character09',
'Glagolitic/character09',
'Glagolitic/character09']
** [Support set] CLASS_NUM=5, SAMPLE_NUM_PER_CLASS=5
the character classes are:
['N_Ko/character27',
'N_Ko/character27',
'N_Ko/character27',
'N_Ko/character27',
'N_Ko/character27',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Japanese_(katakana)/character18',
'Oriya/character33',
'Oriya/character33',
'Oriya/character33',
'Oriya/character33',
'Oriya/character33',
'Tibetan/character14',
'Tibetan/character14',
'Tibetan/character14',
'Tibetan/character14',
'Tibetan/character14',
'Tifinagh/character45',
'Tifinagh/character45',
'Tifinagh/character45',
'Tifinagh/character45',
'Tifinagh/character45']

** and I got the output confidences from probs, predict_labels = torch.max(relations.data, 1), as below:
0.9999995, 0.9999894, 0.00067013013, 1.0, 0.9999995,
0.0013619913, 0.45683807, 0.003507328, 0.99994755, 0.20433362,
0.9999981, 0.76437086, 0.4761213, 0.99345946, 0.25436476,
0.0002244339, 0.00026010931, 0.87288016, 1.8067769e-05, 0.00053879694,
1.0, 1.0, 1.0, 1.0, 1.0

It's really weird: the support and testing sets have different classes, but the output confidences are so high.

If I randomly pick an image from the entire Omniglot dataset and (let's assume) I don't know its class, and I compare it with all possible support sets, how can I recognize its class when the output confidences are barely discriminative?

Am I missing or misunderstanding anything important?

Data Leakage!

I found that there is data leakage in testing, which inflates the accuracy of the model.

The model contains batch normalization, which is supposed to behave differently in training and testing. During testing we are supposed to put the model in model.eval() mode so that batch normalization behaves as in testing, but since this mode is never set, batch normalization behaves as in training. This counts as data leakage, and it increases the accuracy (for example, Omniglot 5-way 1-shot accuracy increases from ~90% to 99.6%). Could you please check this error?
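The train/eval difference described above can be illustrated with a toy one-dimensional batch norm in plain Python. This is a sketch, not PyTorch's implementation: the function names, the running statistics (0.0 and 1.0) and the inputs are made up. In training mode the output for one query depends on the other items in the batch; in evaluation mode it does not.

```python
import statistics


def batch_norm_train(batch, eps=1e-5):
    """Normalize with the statistics of the current batch (training mode)."""
    mean = statistics.mean(batch)
    var = statistics.pvariance(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


def batch_norm_eval(batch, running_mean, running_var, eps=1e-5):
    """Normalize with fixed, previously accumulated statistics (eval mode)."""
    return [(x - running_mean) / (running_var + eps) ** 0.5 for x in batch]


# The same first query (1.0) placed in two different test batches.
batch_a = [1.0, 2.0, 3.0]
batch_b = [1.0, 20.0, 30.0]

train_a = batch_norm_train(batch_a)[0]
train_b = batch_norm_train(batch_b)[0]
eval_a = batch_norm_eval(batch_a, running_mean=0.0, running_var=1.0)[0]
eval_b = batch_norm_eval(batch_b, running_mean=0.0, running_var=1.0)[0]

print(train_a, train_b)  # differ: the output depends on the rest of the batch
print(eval_a, eval_b)    # identical: the output depends only on the query itself
```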

"sample larger than population"

I am trying to run the following command:

python miniimagenet_train_one_shot.py -w 5 -s 1 -b 15

However, I get the following error:

Traceback (most recent call last):
  File "miniimagenet_train_one_shot.py", line 269, in <module>
    main()
  File "miniimagenet_train_one_shot.py", line 169, in main
    task = tg.MiniImagenetTask(metatrain_folders,CLASS_NUM,SAMPLE_NUM_PER_CLASS,BATCH_NUM_PER_CLASS)
  File "/content/gdrive1/My Drive/learn_to_learn/LearningToCompare_FSL/miniimagenet/task_generator.py", line 55, in __init__
    class_folders = random.sample(self.character_folders,self.num_classes)
  File "/usr/lib/python2.7/random.py", line 325, in sample
    raise ValueError("sample larger than population")
ValueError: sample larger than population

Can anyone help me with this issue, please?
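For reference, random.sample raises this ValueError whenever more items are requested than the list contains. In the traceback above that would mean self.character_folders holds fewer than 5 entries, for example because the training folder was not found or is empty (an assumption, since the folder contents are not shown). A small sketch with a hypothetical helper that turns the failure into a more descriptive message:

```python
import random


def sample_classes(class_folders, num_classes):
    """Sample class folders, failing early with a descriptive message."""
    if len(class_folders) < num_classes:
        raise ValueError(
            f"found {len(class_folders)} class folders but need {num_classes}; "
            "check that the dataset path exists and is not empty"
        )
    return random.sample(class_folders, num_classes)


try:
    sample_classes([], 5)  # an empty folder list reproduces the situation
except ValueError as err:
    message = str(err)

print(message)
```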

Problem about the normalization part

Thank you for your awesome code.
There is a normalization step for both the miniimagenet and omniglot data,
"normalize = transforms.Normalize(mean=[0.92206, 0.92206, 0.92206], std=[0.08426, 0.08426, 0.08426])"
but why are the mean and std values always the same?

What does this line of code mean? Can you tell me the reason behind it? I can't understand why torch.sum() is used.

sample_features = feature_encoder(Variable(samples)) # 25x64x5x5
sample_features = sample_features.view(CLASS_NUM,SAMPLE_NUM_PER_CLASS,FEATURE_DIM,5,5)
sample_features = torch.sum(sample_features,1).squeeze(1)
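For what it is worth, torch.sum(sample_features, 1) adds up the values along dimension 1, which after the view is the shot dimension, so each class ends up with a single combined feature map no matter how many shots it has. A plain-Python sketch of the same reduction on tiny made-up vectors (the function name is hypothetical):

```python
def sum_over_shots(features):
    """features[c][k] is the feature vector of shot k of class c.

    Returns one element-wise summed vector per class.
    """
    return [[sum(values) for values in zip(*shots)] for shots in features]


# Two classes, two shots each, two feature values per shot.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # class 0
    [[0.5, 0.5], [1.5, 2.5]],  # class 1
]

class_features = sum_over_shots(features)
print(class_features)  # [[4.0, 6.0], [2.0, 3.0]]
```

Because the result has one entry per class, the later pairing with query features has the same shape for 1-shot and 5-shot.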

Question about mini imagenet dataset

how to get the csv file of mini-imagenet

How can I get the csv files? I only got the image.zip file through Google Drive.

self.train_labels = [labels[self.get_class(x)] for x in self.train_roots] KeyError: '..\\datas\\omniglot_28x28'

self.train_labels = [labels[self.get_class(x)] for x in self.train_roots]
KeyError: '..\datas\omniglot_28x28'
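A possible cause, offered as a guess since the surrounding code is not shown: if get_class derives the class folder by splitting the image path on "/" only, then on Windows, where os.path.join inserts backslashes, the split stops at the dataset root and yields exactly the key in the error. The sketch below uses ntpath to reproduce Windows path handling on any OS; both helper functions and the sample file name are hypothetical.

```python
import ntpath  # Windows path rules, importable on every platform


def get_class_by_split(sample):
    """Suspected behaviour: treat only "/" as a separator."""
    return ntpath.join(*sample.split("/")[:-1])


def get_class_by_dirname(sample):
    """Alternative: let the path module find the parent folder."""
    return ntpath.dirname(sample)


# A root written with "/" joined to sub-folders the Windows way.
sample = "../datas/omniglot_28x28/" + ntpath.join("Angelic", "character11", "0001.png")

print(get_class_by_split(sample))
print(get_class_by_dirname(sample))
```

The first call returns the dataset root, which is not a key of the labels dictionary; the second returns the character folder.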

Running help

Thanks for this code. Every time I run it, the index idx goes out of range for the list self.image_roots. How can I solve this problem?

About the Data Normalize

Hi, Floodsung! First of all, thanks for your great work. Recently I have been reading your code for relation networks. In your code, the dataset mean and variance values for the Omniglot dataset are the same as for the miniImageNet dataset. In my opinion, these values should be fitted to each dataset separately, especially since Omniglot images are greyscale while miniImageNet images are color.
I wonder whether this unsuitable normalization would affect the final classification results.

Thanks!

Corrected deprecated functions and tested using ipython

As we know, the original version was implemented using Python 2 and PyTorch 0.3.

So I corrected the deprecated calls to fix errors and warnings.

Furthermore, to understand how few-shot learning works, I tested using ipython.

I hope it is helpful to someone who just started to learn few-shot learning like me.

I implemented omniglot first and miniimagenet will be added soon.

Omniglot

If you liked it, please give a star ⭐ !

Problem about the feature concatenation

Thank you for your code.
I noticed that before feature concatenation, you sum the features of the support images for each class.
Could you please explain why you don't concatenate every single support image feature with the query image feature?
I have tried that, but the results are not as good as before.
Your paper didn't mention that sum operation, so I'm puzzled.

MiniImageNet

Can omniglot_train_few_shot.py also do one-shot?

I think 'python omniglot_train_one_shot.py -w 5 -s 1 -b 19' can be replaced by 'python omniglot_train_few_shot.py -w 5 -s 1 -b 19'. Is that right?

If so, maybe there is no need for omniglot_train_one_shot.py.

Train on my own dataset

How should I train it on my own dataset?

There were some errors when I ran this code and I hope to get help

Thank you for your awesome code.
There were some errors when I ran this code and I didn't know what to do. I hope to get some tips from you, thanks. The following is the error log:

init data folders
init neural networks
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.cu line=58 error=30 : unknown error
Traceback (most recent call last):
File "/home/jesse/workspace/jisheng/LearningToCompare_FSL-master/omniglot/omniglot_train_one_shot.py", line 257, in <module>
main()
File "/home/jesse/workspace/jisheng/LearningToCompare_FSL-master/omniglot/omniglot_train_one_shot.py", line 133, in main
feature_encoder.cuda(GPU)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 152, in _apply
param.data = fn(param.data)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 69, in cuda
return new_type(self.size()).copy(self, async)
File "/home/jesse/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 361, in _lazy_new
return super(_CudaBase, cls).new(cls, *args, **kwargs)
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.cu:58

System Configuration: gtx1080ti, cuda8.0, cudnn v5.1
Torch and Torchvision installed by the conda command
Torch:0.3.0
Torchvision:0.2.0
thanks very much

Question about calculating accuracy

In calculating accuracy of test dataset:
https://github.com/floodsung/LearningToCompare_FSL/blob/master/omniglot/omniglot_train_one_shot.py#L237

sample_images,sample_labels = sample_dataloader.__iter__().next()
test_images,test_labels = test_dataloader.__iter__().next()

sample_features = feature_encoder(Variable(sample_images).cuda(GPU)) # 5x64
test_features = feature_encoder(Variable(test_images).cuda(GPU)) # 20x64

sample_features_ext = sample_features.unsqueeze(0).repeat(SAMPLE_NUM_PER_CLASS*CLASS_NUM,1,1,1,1)
test_features_ext = test_features.unsqueeze(0).repeat(SAMPLE_NUM_PER_CLASS*CLASS_NUM,1,1,1,1)
test_features_ext = torch.transpose(test_features_ext,0,1)
relation_pairs = torch.cat((sample_features_ext,test_features_ext),2).view(-1,FEATURE_DIM*2,5,5)
relations = relation_network(relation_pairs).view(-1,CLASS_NUM)
_,predict_labels = torch.max(relations.data,1)
rewards = [1 if predict_labels[j]==test_labels[j] else 0 for j in range(CLASS_NUM)]

I think the reward must be summed over all images in the batch size, so the j in the last line should be in range(len(test_labels))

Why is it summed over j in range(CLASS_NUM)?
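The difference between the two loop bounds can be seen in a small plain-Python sketch with made-up predictions (the helper name is hypothetical). When the test batch holds exactly CLASS_NUM images the two counts coincide; when it holds more, range(CLASS_NUM) scores only the first CLASS_NUM of them.

```python
def count_correct(predicted, target):
    """Count matches over every query in the batch."""
    return sum(1 for p, t in zip(predicted, target) if p == t)


CLASS_NUM = 5
# Ten hypothetical queries (two per class); only the last prediction is wrong.
predicted = [0, 1, 2, 3, 4, 0, 1, 2, 3, 0]
target = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

first_only = sum(1 for j in range(CLASS_NUM) if predicted[j] == target[j])
all_queries = count_correct(predicted, target)

print(first_only, all_queries)  # 5 9
```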

IndexError: scatter_(): Expected dtype int64 for index.

I have no idea how to solve this problem.
Can you help me?
Thanks

some questions regarding the accuracies

Hi, I read both the old and new versions of the paper and found that the "shallow" RN in the new version has lower accuracy than the one in the older version, even though the number of layers and the number of filters in each layer are exactly the same. What makes the accuracy lower?

Also, the older version of the paper reports a "deeper" RN with much better performance and states that the model can benefit from a deeper structure. But when I added only 2 additional conv layers to the embedding part of the model and trained it on mini-imagenet, it overfitted after about 80000 episodes and got poor test accuracy, below 0.60. Am I missing any trick to make it deeper?

The training process does not converge?

I have run the miniimagenet_train_few_shot.py, but the training process does not converge.

RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

(base) C:\Users\B4-410\lpthw\LearningToCompare_FSL-master\miniimagenet>python miniimagenet_train_one_shot.py -w 5 -s 1 -b 15
init data folders
init neural networks
Traceback (most recent call last):
File "miniimagenet_train_one_shot.py", line 269, in <module>
main()
File "miniimagenet_train_one_shot.py", line 150, in main
feature_encoder.load_state_dict(torch.load(str("./models/miniimagenet_feature_encoder_" + str(CLASS_NUM) +"way_" + str(SAMPLE_NUM_PER_CLASS) +"shot.pkl")))
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 613, in _load
result = unpickler.load()
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 576, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 155, in default_restore_location
result = fn(storage, location)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 131, in _cuda_deserialize
device = validate_cuda_device(location)
File "D:\Anaconda3\lib\site-packages\torch\serialization.py", line 125, in validate_cuda_device
device, torch.cuda.device_count()))
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

Where to find the csv files of miniImagenet?

Thank you for your awesome code.
However, I cannot find the csv files for the miniImagenet dataset. Could you give me the links or the files?
Thanks.

RuntimeError: shape '[-1, 128, 19, 19]' is invalid for input of size 46656000 on miniimagenet code

Hello, I am trying to run the miniimagenet code on a different (color) dataset. I have set the batch number per class to 5 and resized the input images to 84*84. Besides these, I have not made any changes to the code.
Can you please help me understand why I am getting this error and how it can be fixed?

Traceback (most recent call last):
File "miniimagenet_train_one_shot.py", line 275, in <module>
main()
File "miniimagenet_train_one_shot.py", line 192, in main
relation_pairs = torch.cat((sample_features_ext,batch_features_ext),2).view(-1,FEATURE_DIM*2,19,19)
RuntimeError: shape '[-1, 128, 19, 19]' is invalid for input of size 46656000
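The numbers in this error can be checked with a little arithmetic. The sketch below assumes the encoder consists of two 3x3 convolutions without padding, each followed by 2x2 max pooling, and then two 3x3 convolutions with padding 1; it also assumes a 5-way, 1-shot task with 5 query images per class (125 pairs) and FEATURE_DIM = 64. All of these are assumptions about the setup, and the helper names are hypothetical.

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial size after max pooling with stride equal to the kernel."""
    return size // kernel


def encoder_out(size):
    """Spatial size after the assumed four-layer encoder."""
    size = pool_out(conv_out(size, padding=0))
    size = pool_out(conv_out(size, padding=0))
    size = conv_out(size, padding=1)
    size = conv_out(size, padding=1)
    return size


FEATURE_DIM = 64
pairs = 5 * 5 * 5  # (5 classes x 5 queries) compared against 5 classes

print(encoder_out(84))   # 19, the size the view() call expects
print(encoder_out(224))  # 54
print(pairs * FEATURE_DIM * 2 * 54 * 54)  # 46656000, the size in the error
```

Under these assumptions the tensor in the error has 54x54 feature maps, which corresponds to inputs of roughly 224x224 rather than 84x84, so it may be worth checking whether the resize is actually applied to the images that reach the network.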

RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index'

I have already tried my best to solve this problem.
But I failed, can anyone give me some suggestions?

(py36) E:\GitHub\LearningToCompare_FSL>F:/Anaconda3/envs/py36/python.exe e:/GitHub/LearningToCompare_FSL/omniglot/omniglot_train_few_shot.py
init data folders
init neural networks
Training...
F:\Anaconda3\envs\py36\lib\site-packages\torch\nn\functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "e:/GitHub/LearningToCompare_FSL/omniglot/omniglot_train_few_shot.py", line 264, in <module>
    main()
  File "e:/GitHub/LearningToCompare_FSL/omniglot/omniglot_train_few_shot.py", line 188, in main
    one_hot_labels = Variable(torch.zeros(BATCH_NUM_PER_CLASS*CLASS_NUM, CLASS_NUM).scatter_(1, batch_labels.view(-1,1), 1).cuda(GPU))
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index'

multiple classification?

Hi, Floodsung!
Your work is great. I would like to know whether the network can be applied to images that contain multiple objects of the same class or multiple objects of different classes. Thank you!

💪 Reproduce LearningToCompare_FSL environment on Ubuntu 16.04 CUDA 8.0

English

conda create -n py27 python=2.7
conda deactivate
conda activate py27

pip install https://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl
pip install torchvision==0.2.1
pip install matplotlib scipy

git clone https://github.com/brendenlake/omniglot.git
cd /LearningToCompare_FSL/datas
unzip omniglot_28x28.zip
cd /LearningToCompare_FSL/omniglot
python omniglot_train_one_shot.py -w 5 -s 1 -b 19

Chinese

I reproduced the LearningToCompare_FSL environment on Matpool (矩池云), choosing the Tensorflow 1.4 image because it uses CUDA 8.

Switch the conda source

bash /public/script/switch_conda_source.sh

Get the one-click script: https://github.com/matpool/matools

Create a virtual Python environment

conda create -n py27 python=2.7

conda deactivate
conda activate py27

Install torch 0.3

The next task is to find the whl package for torch 0.3; I found it via the link below.

https://download.pytorch.org/whl/cu80/torch_stable.html

I installed it directly with pip; just copy the commands below.

pip install https://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchvision==0.2.1
pip install matplotlib scipy

pip list

Clone the GitHub repositories

git clone https://github.com/brendenlake/omniglot.git

I used a GitHub mirror here to do this

git clone https://hub.fastgit.org/floodsung/LearningToCompare_FSL.git
cd LearningToCompare_FSL/
ls

Unzip the files and run a test

cd /LearningToCompare_FSL/datas
unzip omniglot_28x28.zip
cd /LearningToCompare_FSL/omniglot
python omniglot_train_one_shot.py -w 5 -s 1 -b 19

Check whether the GPU is being used

nvidia-smi -l 5

Related articles

How to speed up GitHub downloads on Matpool?

What to do when conda install downloads are very slow on Matpool? How to switch the source?

https://pytorch.org/get-started/previous-versions/

Hi, I have a problem with the test results.

Really nice work. In the paper, apparently, omniglot_test_one_shot.py is used to get the test results. I am confused that the test set is drawn from the whole dataset again without a fixed random seed. It means that the test set contains some examples that were used in the training phase. Is this right? Can you give me some references? Thanks.

It seems that you are training and testing on the same dataset, without using the 'val' dataset.

Is it right to train and test in this way?

If I train with train_set and val_set, can I get the same accuracy score on test_set?