bingykang / fewshot_detection
Few-shot Object Detection via Feature Reweighting
Home Page: https://arxiv.org/abs/1812.01866
What should weightfile be? I tried to evaluate the model with a single weight file, for example 000080.weights, but the result for every class is 0. I also tried to use all 8 weight files via toweightfile, which fails with the error "shape '[32, 3, 3, 3]' is invalid for input of size 85". I have not found a solution.
I am trying to run the code in Google Colab and I am getting the error below. I had a similar issue in cfg.py, but I solved it.
Below is the output and error that I get after running train_meta.py:
!python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 4 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
6 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
7 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
8 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
9 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
10 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
11 max 2 x 2 / 2 13 x 13 x1024 -> 6 x 6 x1024
12 conv 1024 3 x 3 / 1 6 x 6 x1024 -> 6 x 6 x1024
13 glomax 6 x 6 / 1 6 x 6 x1024 -> 1 x 1 x1024
1 14554 80200 32
10
===> Number of samples (before filtring): 4952
===> Number of samples (after filtring): 4952
('num classes: ', 15)
factor: 3.0
===> Number of samples (before filtring): 14554
===> Number of samples (after filtring): 14554
('num classes: ', 15)
2020-07-03 08:55:33 epoch 0/177, processed 0 samples, lr 0.000033
/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py:1351: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "train_meta.py", line 328, in
train(epoch)
File "train_meta.py", line 223, in train
loss = region_loss(output, target)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/content/Fewshot_Detection/region_loss.py", line 294, in forward
pred_boxes[0] = x.data + grid_x
RuntimeError: The size of tensor a (13) must match the size of tensor b (70135) at non-singleton dimension 3
I can see that the repository code has been trained on 448x448 inputs, and this works fine on Pascal VOC.
Now, if I want to adapt the code to another dataset with a 16:9 aspect ratio, I can change the input width and height to roughly 768 x 448.
Because of this, the final output layer of the reweighting network becomes 1 x 0 x 1024:
layer filters size input output
0 conv 32 3 x 3 / 1 768 x 448 x 4 -> 768 x 448 x 32
1 max 2 x 2 / 2 768 x 448 x 32 -> 384 x 224 x 32
2 conv 64 3 x 3 / 1 384 x 224 x 32 -> 384 x 224 x 64
3 max 2 x 2 / 2 384 x 224 x 64 -> 192 x 112 x 64
4 conv 128 3 x 3 / 1 192 x 112 x 64 -> 192 x 112 x 128
5 max 2 x 2 / 2 192 x 112 x 128 -> 96 x 56 x 128
6 conv 256 3 x 3 / 1 96 x 56 x 128 -> 96 x 56 x 256
7 max 2 x 2 / 2 96 x 56 x 256 -> 48 x 28 x 256
8 conv 512 3 x 3 / 1 48 x 28 x 256 -> 48 x 28 x 512
9 max 2 x 2 / 2 48 x 28 x 512 -> 24 x 14 x 512
10 conv 1024 3 x 3 / 1 24 x 14 x 512 -> 24 x 14 x1024
11 max 2 x 2 / 2 24 x 14 x1024 -> 12 x 7 x1024
12 conv 1024 3 x 3 / 1 12 x 7 x1024 -> 12 x 7 x1024
13 glomax 12 x 12 / 1 12 x 7 x1024 -> 1 x 0 x1024
What is the right way to handle this change?
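The layer table above can be reproduced with a small size calculation. The sketch below is an illustration, not the repository's code: it assumes the 3 x 3 convolutions keep the spatial size, that there are six 2 x 2 stride-2 max pools, and that the final pooling step uses one fixed square kernel. The function names `pooled` and `trace` are hypothetical.

```python
def pooled(size, kernel, stride):
    """Output length of an unpadded pooling window along one axis, clamped at 0."""
    return max(0, (size - kernel) // stride + 1)


def trace(width, height, num_pools=6, global_kernel=None):
    """Follow (width, height) through the stride-2 pools and the final pooling layer.

    global_kernel=None means "pool over whatever is left", i.e. a per-axis kernel.
    Returns the size before and after the final pooling layer.
    """
    for _ in range(num_pools):
        width, height = pooled(width, 2, 2), pooled(height, 2, 2)
    before = (width, height)
    if global_kernel is None:
        kernel_w, kernel_h = width, height
    else:
        kernel_w = kernel_h = global_kernel
    after = (pooled(width, kernel_w, 1), pooled(height, kernel_h, 1))
    return before, after


square = trace(416, 416, global_kernel=6)      # the default square input
wide = trace(768, 448, global_kernel=12)       # the 16:9 input from the table
wide_per_axis = trace(768, 448)                # kernel matched to each axis
```

Under these assumptions a 768 x 448 input leaves a 12 x 7 map, so a square 12 x 12 window does not fit on the short axis and the output collapses to 1 x 0. Pooling with a kernel equal to the remaining size on each axis, or keeping a square input, avoids the empty output. Whether the repository's glomax layer can be configured that way is not something this sketch establishes.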
Traceback (most recent call last):
File "train_meta.py", line 87, in
model.load_weights(weightfile)
File "/home/zjp/Fewshot_Detection/darknet_meta.py", line 381, in load_weights
start = load_conv_bn(buf, start, model[0], model[1])
File "/home/zjp/Fewshot_Detection/cfg.py", line 461, in load_conv_bn
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
RuntimeError: The size of tensor a (3) must match the size of tensor b (864) at non-singleton dimension 3
Sorry, when I load your pretrained model darknet19_448.conv.23, I get the error above.
Thank you!
Nice released code. Thanks.
Could you share a weight file for the proposed model?
Thanks for this released code and nice instructions for training on VOC datasets. I am wondering what should be modified and preprocessed for reproducing your results on COCO datasets, like the preprocessing of labels and any critical hyperparameters. Thanks and waiting for your response!
Hi,
I observed that, for most classes, the number of samples in the support (context) set is less than the nominal number of shots. Is there a reason for this?
e.g. https://github.com/bingykang/Fewshot_Detection/blob/master/data/vocsplit/box_10shot_boat_train.txt contains 6 samples.
This pattern repeats throughout the data folder.
Thanks!
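One possible explanation, not confirmed by the author on this page: the paper text quoted in another thread here speaks of "k labeled bounding boxes", so k may count boxes rather than images, and an image with several boats would contribute more than one. The sketch below only illustrates that reading. The image names and box counts are made up, and `pick_k_boxes` is a hypothetical helper, not how the split files were actually generated.

```python
def pick_k_boxes(images, k):
    """Greedily pick images until exactly k boxes of the class are collected.

    `images` is a list of (image_name, num_boxes_of_class) pairs.
    Images that would push the total above k are skipped.
    """
    chosen, total = [], 0
    for name, boxes in images:
        if boxes <= 0 or total + boxes > k:
            continue
        chosen.append(name)
        total += boxes
        if total == k:
            break
    return chosen, total


# Hypothetical candidates: (image name, number of boat boxes in it).
candidates = [
    ("a.jpg", 3), ("b.jpg", 1), ("c.jpg", 2), ("d.jpg", 1),
    ("e.jpg", 2), ("f.jpg", 1), ("g.jpg", 4),
]
chosen, total = pick_k_boxes(candidates, k=10)
```

With these made-up counts, 10 boxes are reached after 6 images, which is the kind of mismatch between shots and listed samples described above.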
Hi, thanks for sharing, but I have two questions.
In the training phase, masks and images are concatenated before being fed to the meta model. However, the mask comes from the ground-truth label, which is not available in the testing phase. So at test time the inputs to the meta model would be quite different. How is this handled, or have I misunderstood something?
In the testing phase, after obtaining N sets of reweighting coefficients, how do we know which of the N sets should be used for a given test sample?
I hope you can help me clarify these questions, thanks.
Sorry for troubling you. When I run train_meta.py and load the weightfile, a RuntimeError occurs:
RuntimeErrorTraceback (most recent call last)
in ()
14 region_loss = model.loss
15
---> 16 model.load_weights(weightfile)
17 model.print_network()
~/lkj项目/FSD_yolo/darknet_meta.py in load_weights(self, weightfile)
376 batch_normalize = int(block['batch_normalize'])
377 if batch_normalize:
--> 378 start = load_conv_bn(buf, start, model[0], model[1])
379 else:
380
~/lkj项目/FSD_yolo/cfg.py in load_conv_bn(buf, start, conv_model, bn_model)
453 bn_model.running_mean.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
454 bn_model.running_var.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
--> 455 conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
456 return start
457
RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3. Target sizes: [32, 3, 3, 3]. Tensor sizes: [864]
Do you know what's wrong with this? Thank you so much.
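The error message itself gives a hint: the target has shape [32, 3, 3, 3] and the source is a flat tensor of 864 values, and 32 x 3 x 3 x 3 = 864. The element count matches, only the shape differs. The sketch below uses plain Python lists as a stand-in for the weight buffer to show that bookkeeping; `numel` and `take_layer_weights` are hypothetical helpers, not functions from the repository.

```python
from functools import reduce
from operator import mul


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


def take_layer_weights(buf, start, shape):
    """Slice the flat buffer for one layer and return (chunk, new_start).

    The chunk is still flat; it has to be reshaped to `shape` before it can be
    copied into a tensor of that shape.
    """
    n = numel(shape)
    chunk = buf[start:start + n]
    if len(chunk) != n:
        raise ValueError(
            "need %d values for shape %s, buffer only has %d left"
            % (n, list(shape), len(chunk))
        )
    return chunk, start + n


conv_shape = (32, 3, 3, 3)      # target size from the error message
buf = [0.0] * 1000              # stand-in for the flat weight buffer
chunk, start = take_layer_weights(buf, 0, conv_shape)
```

If the installed PyTorch refuses to copy a flat source into a 4-D target, reshaping the source first is the analogous step, for example `torch.from_numpy(buf[start:start+num_w]).view_as(conv_model.weight)` in place of the unshaped slice. That is an assumption about the fix and has not been tested against this repository. A buffer that is too short for the layer (the `ValueError` branch above) points at a different problem, namely that the file being loaded is not the expected weight file.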
Looking forward to the code release of "Few-shot Object Detection via Feature Reweighting"
I use two GPUs, and the other configurations are the same as the author's. Why is my performance poor?
The following figure is the eval of 500 epochs for base model.
The following figure is the eval of 5 epochs for the fine-tune.
The following figure is the eval of 10 epochs for the fine-tune.
@bingykang
I think FT performance could be dependent on samples of novel class at Fine-tuning phase.
Did you select the samples totally randomly to produce the paper's result?
I have been trying to train a base model for some time now.
I have had issues with the version of PyTorch the code was built on. 0.3.1 would not work with CUDA versions past 8.0, but my GeForce RTX 2080 would not work with CUDA versions below 9.0.
I managed to have the code base work with PyTorch 0.4.0 and 0.4.1, with CUDA 10.1.
I have two GPUs, each with 10986 MB. I managed to have the base training run for many epochs, but then my whole machine would shut down all of a sudden, partway through training. I suspect this is because of my RAM.
I did have to reduce the batch size and subdivisions, to get the training to start.
But this is all to say that I am not able to get a base model, and I am wondering if there is anyone who has a model to share?
I will commit my code for PyTorch >= 0.4.0 soon, on my fork, but it would be so nice to have weights I could use.
The paper says: "The second phase is few-shot fine-tuning. In this phase, we train the model on both base and novel classes. As only k labeled bounding boxes are available for the novel classes, to balance between samples from the base and novel classes, we also include k boxes for each base class." (Section 3.2, Learning Scheme)
But in the code, during the few-shot fine-tuning phase, the feature extractor still appears to be trained on the large dataset (train = /home/bykang/voc/voc_train.txt). Do you have any suggestion or solution? Can you help me? Thanks.
I hope to see the code soon!
Does anyone have a backup directory in their copy of the program?
python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
/home/myh/Documents/program/few-shot-learning/Fewshot_Detection-master/data/coco.names
save_interval 10
['bird', 'bus', 'cow', 'motorbike', 'sofa']
base_ids [0, 1, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19]
logging to backup/metayolo_novel0_neg1
Traceback (most recent call last):
File "train_meta.py", line 79, in <module>
os.mkdir(backupdir)
FileNotFoundError: [Errno 2] No such file or directory: 'backup/metayolo_novel0_neg1'
Excuse me, does the file '/home/bykang/voc/voc_train.tx' contain all 20 classes? It includes all the novel samples. Does it need to contain only 15 classes? Should all the 'train' samples be the same as the 'meta' samples? @bingykang
hello, @bingykang ,
I am very interested in this work. I have run the voc_label_1c.py and got the support set for test. However, I wonder how I can get the query set for test. You have mentioned that you used 3 splits for meta training and test. I want to be consistent with you in terms of experimental settings. So can you help me to solve this problem? Thanks very much!
Best regards,
Yukuan Yang
I've converted the COCO dataset into the VOC style, created the flag txt in ImageSets for each class, and rewritten the label and label_1c scripts for COCO, which generate the label txt files.
I don't think it is as easy as this reply suggests:
Actually, the data split for coco is also released in folder "data". Just change the dataset config and number of classes, everything should be good.
Although you provide process_coco.py, it doesn't work, and I think it is missing the flag txt for each class.
As for the batch size setting:
batch=64
subdivisions=8
fills the GPU memory and raises an out of memory error.
So I changed the batch size setting to:
batch=8
subdivisions=8
and even tried the smallest batch size:
batch=4
subdivisions=4
The forward pass succeeds, but I hit the same out of memory error in the backward pass:
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train_meta.py", line 344, in <module>
train(epoch)
File "train_meta.py", line 242, in train
loss.backward()
File "/home/aringsan/anaconda2/envs/pytorch2/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/aringsan/anaconda2/envs/pytorch2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
I would like to know your COCO settings and your hardware. I use 4 Titan XP GPUs with 12 GB of memory each.
Thanks
Hello, after fine-tuning, how do I run detection on a picture using the weight file?
When I run
python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
A FileNotFoundError occurred:
FileNotFoundError: [Errno 2] No such file or directory: 'backup/metayolo_novel0_neg1'
Do you know what's wrong? Thank you so much.
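The missing path is nested (backup/metayolo_novel0_neg1), and `os.mkdir` creates only the last component, so it fails when the parent directory does not exist yet. The sketch below reproduces that in a temporary directory and shows `os.makedirs` as one way around it. `ensure_dir` is a hypothetical helper, and `exist_ok` needs Python 3.2 or newer; the `FileNotFoundError` in the message suggests the script was run under Python 3.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` and any missing parent directories; do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


root = tempfile.mkdtemp()
backupdir = os.path.join(root, "backup", "metayolo_novel0_neg1")

# os.mkdir fails because the parent directory "backup" is missing.
try:
    os.mkdir(backupdir)
    mkdir_failed = False
except FileNotFoundError:
    mkdir_failed = True

# os.makedirs creates "backup" and the run directory in one call.
ensure_dir(backupdir)
```

Creating the backup directory by hand before starting the training script should have the same effect without touching the code.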
Excuse me, thanks for your code. When I run the base training, I get the following error:
===> Number of samples (before filtring): 4952
===> Number of samples (after filtring): 4952
('num classes: ', 15)
factor: 3.0
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "train_meta.py", line 142, in
model = torch.nn.DataParallel(model).cuda()
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 152, in _apply
param.data = fn(param.data)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in
return self._apply(lambda t: t.cuda(device))
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/_utils.py", line 69, in cuda
return new_type(self.size()).copy(self, async)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/cuda/init.py", line 384, in _lazy_new
_lazy_init()
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/cuda/init.py", line 142, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/torch/lib/THC/THCGeneral.c:70
My CUDA version is 9.1. Is it incompatible with Python 2.7 and torch 0.3.1? Looking forward to your reply. Thank you.
Sorry for troubling you, but I don't understand how the loss is computed for the meta-model and the feature extractor.
As I understand it, after feature reweighting we have a prediction tensor of shape (B, N, 13, 13, A, (5+N)), where B is the batch size, N is the number of classes and A is the number of anchors. If so, should I split the ground truth into N tensors, one per class, and compute the loss for each of the N channels of the prediction?
My second question: do the meta-model and the feature extractor share the same loss?
I look forward to your reply, thanks.
Sorry for troubling you. When I run python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23, a RuntimeError occurs:
Traceback (most recent call last):
File "train_meta.py", line 325, in
train(epoch)
File "train_meta.py", line 218, in train
output = model(data, metax, mask)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/m/Fewshot_Detection-master/darknet_meta.py", line 199, in forward
dynamic_weights = self.meta_forward(metax, mask)
File "/home/m/Fewshot_Detection-master/darknet_meta.py", line 122, in meta_forward
metax = model(metax)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
I am reproducing the result using the instruction provided in the README file.
I am training the base model on 1 GeForce GTX 1080 Ti with 12GB of memory. I changed batch_size to 32.
After about 20 epochs, a CUDA runtime error occurs.
2020-05-29 13:14:00 epoch 20/177, processed 291080 samples, lr 0.000333
291112: nGT 77, recall 66, proposals 235, loss: x 2.222131, y 2.640358, w 2.185382, h 1.743314, conf 52.697956, cls 99.193832, total 160.682968
291144: nGT 77, recall 62, proposals 243, loss: x 1.478266, y 1.245305, w 2.208532, h 0.684470, conf 43.636837, cls 76.594849, total 125.848259
291176: nGT 70, recall 63, proposals 243, loss: x 1.873798, y 1.179447, w 1.839549, h 1.049649, conf 52.927620, cls 101.017876, total 159.887939
291208: nGT 75, recall 67, proposals 175, loss: x 1.820341, y 1.697263, w 1.052775, h 0.799489, conf 50.626663, cls 113.858749, total 169.855286
291240: nGT 105, recall 93, proposals 253, loss: x 3.521058, y 2.495901, w 3.214825, h 2.059216, conf 74.303398, cls 172.366638, total 257.961029
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train_meta.py", line 325, in <module>
train(epoch)
File "train_meta.py", line 223, in train
loss.backward()
File "/home/super/anaconda3/envs/torch0.3.1/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/super/anaconda3/envs/torch0.3.1/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:5
How can I solve this problem?
Thanks:)
When I trained the base model, I observed that the memory usage continued to rise until it reached the maximum limit and the process was killed. My RAM is 250GB and I used four 16GB Tesla V100 GPUs. I tried to decrease the batch size but it did not make any difference. I wonder how much memory you used during the base training and whether I did something wrong. Thank you.
@bingykang Would you mind elaborating a little more on the results, in terms of how to interpret them?
For instance in the second row what do "0.5:0.95", "S M L", and the "1 10 100" refer to? Sorry I couldn't seem to find related explanation of these notations in the paper. Let me know if I missed something. Thanks.
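These labels look like the standard COCO evaluation notation; assuming that is what the table uses, "0.5:0.95" is the range of IoU thresholds over which AP is averaged (step 0.05), "S M L" are small, medium and large objects split by area at 32 x 32 and 96 x 96 pixels, and "1 10 100" are the maximum number of detections per image used for average recall. The sketch below spells that out. The helper names are hypothetical, box width times height is used as a stand-in for the annotation area, and how an area exactly on a boundary is binned is a simplification here.

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """IoU thresholds behind the "0.5:0.95" label."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def size_bucket(width, height):
    """Return "S", "M" or "L" for an object of the given size in pixels."""
    area = width * height
    if area < 32 ** 2:
        return "S"
    if area < 96 ** 2:
        return "M"
    return "L"


# Caps on detections per image behind the "1 10 100" recall columns.
MAX_DETECTIONS = (1, 10, 100)

thresholds = iou_thresholds()
```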
Hi,
I've trained the model following the instructions, after finetuning on novel classes, the AP results are as follows:
AP for aeroplane = 0.6602
AP for bicycle = 0.4766
AP for bird = 0.3573
AP for boat = 0.4778
AP for bottle = 0.3153
AP for bus = 0.2155
AP for car = 0.6892
AP for cat = 0.8144
AP for chair = 0.3652
AP for cow = 0.3928
AP for diningtable = 0.5538
AP for dog = 0.6887
AP for horse = 0.6958
AP for motorbike = 0.4364
AP for person = 0.6416
AP for pottedplant = 0.2839
AP for sheep = 0.5383
AP for sofa = 0.3712
AP for train = 0.7394
AP for tvmonitor = 0.6614
~~~~~~~~
Mean AP = 0.5187
Mean Base AP = 0.5734
Mean Novel AP = 0.3546
Then I wanted to make predictions using the pre-trained weights, but I found that darknet_dynamic.cfg sets classes=1. I changed it to classes=20 during inference in order to make predictions on VOC, but the result is worse.
How can I use the pre-trained weights to make correct predictions?
Thanks :)
I have finished training the model, but when I try to evaluate it, it prints 'CUDA out of memory'. I have 2 GPUs and 32GB. However, I can't use both GPUs to evaluate: it runs on 1 GPU every time, even though I changed gpu to "0,1" in valid_ensemble.py. Do you have any suggestion or solution?
Hi, sorry to bother you! I ran into the following error when trying to train the base model. I am using pytorch 0.3.1 and python 2.7. I attached the full log of stdout and the modified code to print out the size.
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 4 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
6 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
7 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
8 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
9 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
10 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
11 max 2 x 2 / 2 13 x 13 x1024 -> 6 x 6 x1024
12 conv 1024 3 x 3 / 1 6 x 6 x1024 -> 6 x 6 x1024
13 glomax 6 x 6 / 1 6 x 6 x1024 -> 1 x 1 x1024
1 14554 80200 64
10
===> Number of samples (before filtring): 4952
===> Number of samples (after filtring): 4952
('num classes: ', 15)
factor: 3.0
===> Number of samples (before filtring): 14554
===> Number of samples (after filtring): 14554
('num classes: ', 15)
2019-12-02 20:30:00 epoch 0/353, processed 0 samples, lr 0.000033
('nA', 5)
('nC', 1)
('nH', 13)
('nW', 13)
('bs', 64)
('cs', 15)
('output.shape', (1280L, 30L, 13L, 13L))
('cls.shape', (1280L, 5L, 6L, 13L, 13L))
('cls.shape', (1280L, 5L, 13L, 13L))
Traceback (most recent call last):
File "train_meta.py", line 325, in
train(epoch)
File "train_meta.py", line 221, in train
loss = region_loss(output, target)
File "/home/quan/miniconda3/envs/featurereweight/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/quan/few_shot/Fewshot_Detection/region_loss.py", line 277, in forward
cls = cls.view(bs, cs, nA * nC * nH * nW).transpose(1, 2).contiguous().view(bs * nA * nC * nH * nW, cs)
RuntimeError: invalid argument 2: size '[64 x 15 x 845]' is invalid for input with 1081600 elements at /opt/conda/conda-bld/pytorch_1518238581238/work/torch/lib/TH/THStorage.c:41
Below is the sequence of print statements I added:
print('nA', nA)
print('nC', nC)
print('nH', nH)
print('nW', nW)
print('bs', bs)
print('cs', cs)
print('output.shape', output.shape)
cls = output.view(output.size(0), nA, (5 + nC), nH, nW)
print('cls.shape', cls.shape)
cls = cls.index_select(2, Variable(torch.linspace(5, 5 + nC - 1, nC).long().cuda())).squeeze()
print('cls.shape', cls.shape)
cls = cls.view(bs, cs, nA * nC * nH * nW).transpose(1, 2).contiguous().view(bs * nA * nC * nH * nW, cs)
print('cls.shape', cls.shape)
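The numbers printed in the log are enough to see where the reshape goes wrong. The arithmetic below uses only those printed values; the interpretation in the closing note is a guess and has not been checked against the repository.

```python
# Values printed in the log above.
nA, nC, nH, nW = 5, 1, 13, 13
bs, cs = 64, 15
rows = 1280                      # first dimension of output.shape

# Elements in cls after index_select and squeeze: (1280, 5, 13, 13).
actual = rows * nA * nH * nW

# Elements that view(bs, cs, nA * nC * nH * nW) requires.
expected = bs * cs * nA * nC * nH * nW

# Per-class outputs the network actually produced for each image.
classes_per_image = rows // bs
```

So the network emits 20 class-wise outputs per image (1280 = 64 x 20), while the loss reshapes for 15. One possible cause is that the support (meta) input still covers all 20 VOC classes while the loss is configured for the 15 base classes; checking that the class lists used by the meta dataset and by the loss agree would be a reasonable first step.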
I couldn't find the metayolo.data file in the data directory, but I found it in the cfg directory. Should I copy this file from cfg to data?
Hi, thanks for your sharing,
Because I want to evaluate the training results after each epoch, I uncommented test(epoch) in train_meta.py.
But I got an error like this.
I found that the output of model(data, metax, mask).data has shape (480,30,13,13) while the target has shape (32,15,250), so an index error occurs. (I changed batch_size to 32.)
Do you know how to solve this error? Thanks
I've prepared the COCO dataset (it was not easy: I generated the image2class_flag txt for each class and changed many things, and I spent a long time modifying label_voc/_1.py to fit the COCO dataset), and it may work.
But now I have run into a problem just loading the weights.
I run python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23, but get:
/mnt/Disk1/liangzh/code/Fewshot_Detection_coco/data/coco.names
('save_interval', 2)
['airplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'dining table', 'dog', 'horse', 'motorcycle', 'person', 'potted plant', 'sheep', 'couch', 'train', 'tv']
('base_ids', [7, 9, 10, 11, 12, 13, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 59, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
logging to backup/metayolo_novel0_neg1
('class_scale', 1)
Traceback (most recent call last):
File "train_meta.py", line 87, in <module>
model.load_weights(weightfile)
File "/mnt/Disk1/liangzh/code/Fewshot_Detection_coco/darknet_meta.py", line 378, in load_weights
start = load_conv_bn(buf, start, model[0], model[1])
File "/mnt/Disk1/liangzh/code/Fewshot_Detection_coco/cfg.py", line 461, in load_conv_bn
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
RuntimeError: The size of tensor a (3) must match the size of tensor b (864) at non-singleton dimension 3
Thanks
I am reproducing the result using the instruction provided in the README file.
I was able to train the base model and obtain AP of 0.6862, which matches what the paper reports. However, when I tried to run the fine-tuning command, the process exits with an out of memory error for the backward pass.
I am training with 4 GeForce GTX 1080 Ti GPUs with roughly 12 GB of memory each. Did you use GPUs with more memory, or is something weird happening?
I wonder if the results on paper are single value (from single few-sampled set) or averaged value.
If it is average, how many times did you iterate it with shuffle?
Best regards,
For those who face GPU memory issues: training this model on VOC with no adjustments needs approximately 8GB of memory on each GPU.
Could you please let me know how much memory your model needs for training?
My GPU has 12GB, but training fails to start even though I set nw = 0 or 1 and reduced the YOLO batch size to 16 or 8.
When I run
python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
A runtime error occurred:
2019-10-18 23:20:07 epoch 0/353, processed 0 samples, lr 0.000033
Traceback (most recent call last):
File "train_meta.py", line 325, in <module>
train(epoch)
File "train_meta.py", line 218, in train
output = model(data, metax, mask)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home1/ubuntu/project/Fewshot_Detection/darknet_meta.py", line 199, in forward
dynamic_weights = self.meta_forward(metax, mask)
File "/home1/ubuntu/project/Fewshot_Detection/darknet_meta.py", line 122, in meta_forward
metax = model(metax)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 89, in conv2d
torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
RuntimeError: argument 1 (padding) must be tuple of int but got tuple of (float, float)
Do you know what's wrong? Thanks
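The traceback shows the script running under Python 3.6, while other reports on this page use Python 2.7. In Python 3 the `/` operator always returns a float, so a padding computed as (kernel_size - 1) / 2 becomes 1.0 instead of 1, which matches the "tuple of (float, float)" complaint. The sketch below assumes the padding is derived from the kernel size in that way somewhere in the config-parsing code; the exact line has not been checked, and `same_padding` is a hypothetical helper.

```python
def same_padding(kernel_size, pad=True):
    """Integer padding that keeps the spatial size for an odd kernel at stride 1."""
    return (kernel_size - 1) // 2 if pad else 0


# True division: yields a float on Python 3, which conv2d rejects as padding.
float_pad = (3 - 1) / 2

# Floor division: yields an int on both Python 2 and Python 3.
int_pad = same_padding(3)
```

Replacing `/` with `//` wherever a padding, kernel size or stride is computed is the usual adjustment when running Python 2 era code under Python 3; running the code under Python 2.7 as the other reports do sidesteps the issue.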