bingykang / fewshot_detection

Few-shot Object Detection via Feature Reweighting

Home Page: https://arxiv.org/abs/1812.01866

Python 81.83% Makefile 0.34% C 9.35% C++ 0.41% Cuda 8.07%

fewshot_detection's Issues

result is 0

What should weightfile be? I tried using a single weight file, for example 000080.weights, to evaluate the model, but the result for every class is 0. I then tried using all 8 weight files via toweightfile and got the error "shape '[32, 3, 3, 3]' is invalid for input of size 85". I have not found a solution.

ValueError: 'data/voc_novels.txt // file contains novel splits' is not in list

Run: python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23

RuntimeError: The size of tensor a (13) must match the size of tensor b (70135) at non-singleton dimension 3

I am trying to run the code in Google Colab and am getting this error. I had a similar issue in cfg.py, which I solved.
Below is the output and the error I get after running train_meta.py:
!python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23

/content/Fewshot_Detection/data/coco.names
('save_interval', 10)
['bird', 'bus', 'cow', 'motorbike', 'sofa']
('base_ids', [0, 1, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19])
logging to backup/metayolofix_novel0_neg1
('class_scale', 1)
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 dconv 1024 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x1024
31 conv 30 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 30
32 detection

layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 4 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
6 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
7 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
8 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
9 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
10 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
11 max 2 x 2 / 2 13 x 13 x1024 -> 6 x 6 x1024
12 conv 1024 3 x 3 / 1 6 x 6 x1024 -> 6 x 6 x1024
13 glomax 6 x 6 / 1 6 x 6 x1024 -> 1 x 1 x1024
1 14554 80200 32
10
===> Number of samples (before filtring): 4952
===> Number of samples (after filtring): 4952
('num classes: ', 15)
factor: 3.0
===> Number of samples (before filtring): 14554
===> Number of samples (after filtring): 14554
('num classes: ', 15)
2020-07-03 08:55:33 epoch 0/177, processed 0 samples, lr 0.000033
/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py:1351: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "train_meta.py", line 328, in <module>
train(epoch)
File "train_meta.py", line 223, in train
loss = region_loss(output, target)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/Fewshot_Detection/region_loss.py", line 294, in forward
pred_boxes[0] = x.data + grid_x
RuntimeError: The size of tensor a (13) must match the size of tensor b (70135) at non-singleton dimension 3

The model does not start training?

I tried to start training following your instructions. However, after the model is loaded, the process just stays at this point and does nothing, like this:

and the code is here:

Do you know this problem? Thank you so much!

Training for 16:9 aspect ratio

The repository code was trained at 448x448, which works fine on Pascal VOC.

To adapt the code to another dataset with a 16:9 aspect ratio, I can change the input width and height to roughly 768 x 448.

With this change, the final output layer of the reweighting network becomes 1 x 0 x 1024:

layer filters size input output
0 conv 32 3 x 3 / 1 768 x 448 x 4 -> 768 x 448 x 32
1 max 2 x 2 / 2 768 x 448 x 32 -> 384 x 224 x 32
2 conv 64 3 x 3 / 1 384 x 224 x 32 -> 384 x 224 x 64
3 max 2 x 2 / 2 384 x 224 x 64 -> 192 x 112 x 64
4 conv 128 3 x 3 / 1 192 x 112 x 64 -> 192 x 112 x 128
5 max 2 x 2 / 2 192 x 112 x 128 -> 96 x 56 x 128
6 conv 256 3 x 3 / 1 96 x 56 x 128 -> 96 x 56 x 256
7 max 2 x 2 / 2 96 x 56 x 256 -> 48 x 28 x 256
8 conv 512 3 x 3 / 1 48 x 28 x 256 -> 48 x 28 x 512
9 max 2 x 2 / 2 48 x 28 x 512 -> 24 x 14 x 512
10 conv 1024 3 x 3 / 1 24 x 14 x 512 -> 24 x 14 x1024
11 max 2 x 2 / 2 24 x 14 x1024 -> 12 x 7 x1024
12 conv 1024 3 x 3 / 1 12 x 7 x1024 -> 12 x 7 x1024
13 glomax 12 x 12 / 1 12 x 7 x1024 -> 1 x 0 x1024

What is the right way to handle this change?
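
A quick way to see where the 1 x 0 comes from is to trace the spatial size through the six 2x2/2 max-pool layers in the log above. The sketch below is plain Python arithmetic, not code from the repository; the function name is hypothetical, and it assumes (as the log shows) that the stride-1 convolutions keep the spatial size. It suggests that the final map is 12 x 7, so a square 12 x 12 global-pool kernel no longer fits it.

```python
# Sketch only: reproduces the sizes printed in the layer table above.
def trace_spatial_size(width, height, num_maxpools=6):
    """Return (width, height) after the 2x2 stride-2 max-pool layers.

    Assumes the 3x3 stride-1 convolutions leave the size unchanged,
    which is what the printed layer table shows.
    """
    for _ in range(num_maxpools):
        width, height = width // 2, height // 2  # each pool halves, rounding down
    return width, height


square = trace_spatial_size(416, 416)   # default square input
wide = trace_spatial_size(768, 448)     # roughly 16:9 input

print("416 x 416 ->", square)
print("768 x 448 ->", wide)
# A global pool has to cover the whole final map, so its kernel
# would need to equal that map's size instead of being square.
print("kernel that covers the 16:9 map:", wide)
```

If this reading is right, the global max-pool size would have to follow the traced map size. Whether the cfg parser in this repository accepts a non-square size is not confirmed here.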

Could you please regroup your code?

About the loading model

Traceback (most recent call last):
File "train_meta.py", line 87, in <module>
model.load_weights(weightfile)
File "/home/zjp/Fewshot_Detection/darknet_meta.py", line 381, in load_weights
start = load_conv_bn(buf, start, model[0], model[1])
File "/home/zjp/Fewshot_Detection/cfg.py", line 461, in load_conv_bn
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
RuntimeError: The size of tensor a (3) must match the size of tensor b (864) at non-singleton dimension 3

Sorry, when I loaded your pretrained model darknet19_448.conv.23, the above problem occurred.
Thank you!

Could you share a weight file?

Nice released code. Thanks.
Could you share a weight file for the proposed model?

Could you release instructions on reproducing your results on COCO dataset as well?

Thanks for this released code and nice instructions for training on VOC datasets. I am wondering what should be modified and preprocessed for reproducing your results on COCO datasets, like the preprocessing of labels and any critical hyperparameters. Thanks and waiting for your response!

Support (context) set cardinality less than the expected number of shots

Hi,

I observed that, for most classes, the number of samples in the support (context) set is less than the expected number of shots. Is there a reason for this?

e.g. https://github.com/bingykang/Fewshot_Detection/blob/master/data/vocsplit/box_10shot_boat_train.txt contains 6 samples.

This pattern repeats throughout the data folder.

Thanks!
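
One possible explanation, offered here as an assumption and not confirmed in this thread: the file lists images, while the shot count refers to bounding boxes (the file name starts with box_, and the paper passage quoted elsewhere on this page speaks of "k labeled bounding boxes"). The sketch below counts boxes across a set of label files. It assumes darknet-style labels with one "class x y w h" line per box; the function name and the sample data are hypothetical.

```python
# Sketch only: the label format and the sample data below are assumptions.
def count_class_boxes(label_texts, class_id):
    """Count boxes of one class across the contents of several label files."""
    total = 0
    for text in label_texts:
        for line in text.splitlines():
            fields = line.split()
            if fields and int(fields[0]) == class_id:
                total += 1
    return total


# Hypothetical example: 6 images of class 3 that together hold 10 boxes.
box = "3 0.5 0.5 0.2 0.2\n"
labels = [box * 3, box * 2, box * 2, box, box, box]

num_images = len(labels)
num_boxes = count_class_boxes(labels, class_id=3)
print(num_images, "images,", num_boxes, "boxes")
```

Running the same count over the real label files for the images listed in a box_10shot file would show whether the box total, not the image total, equals the shot count.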

train error like this:

I get a training error like this. I trained with Python 2.7. Can you help me fix it?

Questions for implement.

Hi, thanks for your sharing, but I have two questions.

In the training phase, we concatenate masks and images before feeding them to the meta model. However, the mask information comes from the ground-truth labels, which we won't have in the testing phase. So at test time the inputs to the meta model are completely different. How can we solve this, or am I misunderstanding something?
In the testing phase, after we have obtained N sets of reweighting coefficients, how do we know which of the N sets to use for a given test sample?

Hope you can help me to clarify this question, thanks.

RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3. Target sizes: [32, 3, 3, 3]. Tensor sizes: [864]

Sorry for troubling you. When I run train_meta.py and load the weight file, a RuntimeError occurs:

logging to backup/metayolo_novel0_neg1
class_scale 1

RuntimeErrorTraceback (most recent call last)
in ()
14 region_loss = model.loss
15
---> 16 model.load_weights(weightfile)
17 model.print_network()

~/lkj项目/FSD_yolo/darknet_meta.py in load_weights(self, weightfile)
376 batch_normalize = int(block['batch_normalize'])
377 if batch_normalize:
--> 378 start = load_conv_bn(buf, start, model[0], model[1])
379 else:
380

~/lkj项目/FSD_yolo/cfg.py in load_conv_bn(buf, start, conv_model, bn_model)
453 bn_model.running_mean.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
454 bn_model.running_var.copy_(torch.from_numpy(buf[start:start+num_b])); start = start + num_b
--> 455 conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
456 return start
457

RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3. Target sizes: [32, 3, 3, 3]. Tensor sizes: [864]

Do you know what's wrong with this? Thank you so much.
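
The numbers in the message are consistent with each other: 32 x 3 x 3 x 3 = 864, so the flat slice of the weight buffer holds exactly as many values as the convolution weight expects, and only its shape differs. The pure-Python sketch below illustrates reshaping a flat buffer to a target shape before copying. The helper names are hypothetical. In PyTorch the analogous step would presumably be to reshape the slice (for example with `view_as` on the destination weight) before `copy_`, but that is an assumption and has not been tested against this repository.

```python
from functools import reduce
from operator import mul


def num_elements(shape):
    """Number of values a tensor of this shape holds."""
    return reduce(mul, shape, 1)


def reshape_flat(flat, shape):
    """Reshape a flat list into nested lists of the given shape (row-major)."""
    if len(flat) != num_elements(shape):
        raise ValueError("cannot reshape %d values to %r" % (len(flat), shape))
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape_flat(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


target_shape = (32, 3, 3, 3)           # "Target sizes" in the error message
flat_buffer = list(range(864))         # stands in for buf[start:start+num_w]

print(num_elements(target_shape))      # same count as the flat buffer
weights = reshape_flat(flat_buffer, target_shape)
print(len(weights), len(weights[0]), len(weights[0][0]), len(weights[0][0][0]))
```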

When will the code be released?

Looking forward to the code release of "Few-shot Object Detection via Feature Reweighting"

details about the base model and the fine-tuned

I use two GPUs; the rest of the configuration is the same as the author's. Why is my performance poor?

The following figure is the eval of 500 epochs for base model.
The following figure is the eval of 5 epochs for the fine-tune.
The following figure is the eval of 10 epochs for the fine-tune.

@bingykang

How did you select the samples of novel classes?

I think FT performance could be dependent on samples of novel class at Fine-tuning phase.
Did you select the samples totally randomly to produce the paper's result?

Issue training base model

I have been trying to train a base model for some time now.

I have had issues with the version of PyTorch the code was built on. 0.3.1 would not work with CUDA versions past 8.0, but my GeForce RTX 2080 would not work with CUDA versions below 9.0.

I managed to have the code base work with PyTorch 0.4.0 and 0.4.1, with CUDA 10.1.

I have two GPUs, each with 10986MB. I managed to have the base training run for many epochs, but then my whole machine would suddenly shut down in the middle of training. I suspect this is because of my RAM.

I did have to reduce the batch size and subdivisions, to get the training to start.

But this is all to say that I am not able to get a base model, and I am wondering if there is anyone who has a model to share?

I will commit my code for PyTorch >= 0.4.0 soon, on my fork, but it would be so nice to have weights I could use.

After a period of training the proposals become zero

In the few-shot fine-tuning phase, only k labeled bounding boxes are available for the novel classes, so k boxes are also included for each base class.

Paper (Section 3.2, Learning Scheme): "The second phase is few-shot fine-tuning. In this phase, we train the model on both base and novel classes. As only k labeled bounding boxes are available for the novel classes, to balance between samples from the base and novel classes, we also include k boxes for each base class."
In the code, however, the feature extractor still appears to be trained on the large dataset (train = /home/bykang/voc/voc_train.txt) during the few-shot fine-tuning phase. Do you have any suggestion or solution? Can you help me? Thanks.

About training on your own data set

I ran into this problem in region_loss.py when training on my own dataset. I don't know what flags and ratio stand for. I hope I can get some help from you. Thank you.

Waiting for your code release

I hope to see the code soon!

FileNotFoundError: [Errno 2] No such file or directory: 'backup/metayolo_novel0_neg1'

Anyone have a backup file in your program?
python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
/home/myh/Documents/program/few-shot-learning/Fewshot_Detection-master/data/coco.names
save_interval 10
['bird', 'bus', 'cow', 'motorbike', 'sofa']
base_ids [0, 1, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19]
logging to backup/metayolo_novel0_neg1
Traceback (most recent call last):
  File "train_meta.py", line 79, in <module>
    os.mkdir(backupdir)
FileNotFoundError: [Errno 2] No such file or directory: 'backup/metayolo_novel0_neg1'
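
The traceback shows `os.mkdir` failing on a nested path, which is what happens when the parent directory (`backup` here) does not exist yet. A minimal sketch of a workaround, assuming Python 3 (the `exist_ok` argument is not available in Python 2.7) and using a hypothetical helper name, is to create the whole path in one call. Creating the `backup` directory by hand before running the script should have the same effect.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


# Demo inside a throwaway directory so nothing is written to the working tree.
root = tempfile.mkdtemp()
backupdir = os.path.join(root, "backup", "metayolo_novel0_neg1")

created = ensure_dir(backupdir)        # parent "backup" is created as well
created_again = ensure_dir(backupdir)  # safe to call a second time
print(created, created_again)
```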

Hi, few shot tuning

Thank you very much for your code, which is very interesting. But I have a question.
This is the result of my base training:

This is the result of my few-shot tuning:

The mAP of the novel classes is better, but that of the base classes is worse. Is there something wrong?

The classes used in base training

Excuse me, does the file '/home/bykang/voc/voc_train.tx' contain 20 classes? It includes all the novel samples. Does it need to contain only 15 classes? Should all the 'train' samples be the same as the 'meta' samples? @bingykang

Where is the query set?

Hello @bingykang,

I am very interested in this work. I have run the voc_label_1c.py and got the support set for test. However, I wonder how I can get the query set for test. You have mentioned that you used 3 splits for meta training and test. I want to be consistent with you in terms of experimental settings. So can you help me to solve this problem? Thanks very much!

Best regards,
Yukuan Yang

Training about COCO

I've converted the COCO dataset into the VOC style: I created the per-class flag txt files in ImageSets and rewrote the label and label_1c scripts for COCO to generate the label txt files.

I don't think it is as easy as this:

Actually, the data split for coco is also released in folder "data". Just change the dataset config and number of classes, everything should be good.

Although you provide process_coco.py, it doesn't work, and I think it is missing the flag txt for each class.

And the batchsize setting:

batch=64
subdivisions=8

fills the GPU memory and raises an out-of-memory error.

Now I change the batch_size setting:

batch=8
subdivisions=8

and I even tried the smallest batch size:

batch=4
subdivisions=4

The forward pass succeeds, but I hit the same out-of-memory error in the backward pass:

THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_meta.py", line 344, in <module>
    train(epoch)
  File "train_meta.py", line 242, in train
    loss.backward()
  File "/home/aringsan/anaconda2/envs/pytorch2/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/aringsan/anaconda2/envs/pytorch2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

I would like to know your COCO settings and your hardware. I use 4 Titan XP GPUs with 12G of memory each.

Thanks

some questions about detection

Hello, after fine-tuning, how can I run detection on a picture using the weight file?

FileNotFoundError: [Errno 2] No such file or directory: 'backup/metayolo_novel0_neg1'

When I run
python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23

A FileNotFoundError occurred:
FileNotFoundError: [Errno 2] No such file or directory: 'backup/metayolo_novel0_neg1'

Do you know what's wrong? Thank you so much.

Does CUDA have specific requirements?

Excuse me, and thanks for your code. When I run the base training, I get an error like this:
===> Number of samples (before filtring): 4952
===> Number of samples (after filtring): 4952
('num classes: ', 15)
factor: 3.0
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "train_meta.py", line 142, in <module>
model = torch.nn.DataParallel(model).cuda()
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 152, in _apply
param.data = fn(param.data)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/_utils.py", line 69, in cuda
return new_type(self.size()).copy(self, async)
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/cuda/__init__.py", line 384, in _lazy_new
_lazy_init()
File "/home/wangning/anaconda3/envs/python27/lib/python2.7/site-packages/torch/cuda/__init__.py", line 142, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/torch/lib/THC/THCGeneral.c:70

My CUDA version is 9.1. Is it incompatible with Python 2.7 and torch 0.3.1? Looking forward to your reply. Thank you.
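
"No CUDA-capable device is detected" can have several causes, and this thread does not settle which one applies. Two things that are cheap to rule out are an empty `CUDA_VISIBLE_DEVICES` variable (which hides every GPU from the process) and a missing NVIDIA driver tool on the PATH. The sketch below is a hypothetical helper for that check, written for Python 3 (`shutil.which` does not exist in Python 2.7); it does not call PyTorch.

```python
import os
import shutil


def cuda_visibility_report(env=None):
    """Summarise two environment facts relevant to GPU visibility."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        visible = "all (variable not set)"
    elif raw.strip() == "":
        visible = "none (variable is set but empty)"
    else:
        visible = raw
    return {
        "CUDA_VISIBLE_DEVICES": visible,
        # True when the nvidia-smi executable can be found on PATH.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


report = cuda_visibility_report()
for key, value in report.items():
    print(key, "=", value)
```

If both checks look fine, the mismatch between the installed driver, the CUDA toolkit, and the PyTorch build would be the next thing to examine.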

Questions about implementation details

Sorry for troubling you, but I don't know how to compute the loss for the meta-model and the feature extractor.

As I understand it, after feature reweighting we have a prediction tensor of shape (B, N, 13, 13, A, (5+N)), where B is the batch size, N is the number of classes, and A is the number of anchors. If so, should I split my ground truth into N tensors by class and compute the loss for each of the N channels of the prediction tensor?
My second question: do the meta-model and the feature extractor share the same loss?

I look forward to your reply. Thanks.

Loss becomes nan after few batches

Hello, the number of proposals drops to zero after a few iterations, and later the loss becomes nan. The VOC dataset is used for training. Has anyone run into the same problem?

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Sorry for troubling you. When I run python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23, a RuntimeError occurs:
Traceback (most recent call last):
File "train_meta.py", line 325, in <module>
train(epoch)
File "train_meta.py", line 218, in train
output = model(data, metax, mask)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/m/Fewshot_Detection-master/darknet_meta.py", line 199, in forward
dynamic_weights = self.meta_forward(metax, mask)
File "/home/m/Fewshot_Detection-master/darknet_meta.py", line 122, in meta_forward
metax = model(metax)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/home/m/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

out of memory for base training

I am reproducing the result using the instruction provided in the README file.

I am training the base model on 1 GeForce GTX 1080 Ti with 12GB of memory. I changed batch_size to 32.

After about 20 epochs, a CUDA runtime error occurs.

2020-05-29 13:14:00 epoch 20/177, processed 291080 samples, lr 0.000333
291112: nGT 77, recall 66, proposals 235, loss: x 2.222131, y 2.640358, w 2.185382, h 1.743314, conf 52.697956, cls 99.193832, total 160.682968
291144: nGT 77, recall 62, proposals 243, loss: x 1.478266, y 1.245305, w 2.208532, h 0.684470, conf 43.636837, cls 76.594849, total 125.848259
291176: nGT 70, recall 63, proposals 243, loss: x 1.873798, y 1.179447, w 1.839549, h 1.049649, conf 52.927620, cls 101.017876, total 159.887939
291208: nGT 75, recall 67, proposals 175, loss: x 1.820341, y 1.697263, w 1.052775, h 0.799489, conf 50.626663, cls 113.858749, total 169.855286
291240: nGT 105, recall 93, proposals 253, loss: x 3.521058, y 2.495901, w 3.214825, h 2.059216, conf 74.303398, cls 172.366638, total 257.961029
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_meta.py", line 325, in <module>
    train(epoch)
  File "train_meta.py", line 223, in train
    loss.backward()
  File "/home/super/anaconda3/envs/torch0.3.1/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/super/anaconda3/envs/torch0.3.1/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:5

How can I solve this problem?
Thanks:)

Base Training error

Sorry for troubling you. I followed your instructions, and when I run train_meta.py, a RuntimeError occurs as follows:

Could you tell me how to solve it?

Some questions about memory usage when training base model

When I trained the base model, I observed that the memory usage kept rising until it reached the limit and the process was killed. My RAM is 250GB and I used four 16GB Tesla V100 GPUs. I tried decreasing the batch size, but it did not make any difference. I wonder how much memory you used during the base training and whether I did something wrong. Thank you.

Question about the denotations in experimental results

@bingykang Would you mind elaborating a little more on how to interpret the results?
For instance in the second row what do "0.5:0.95", "S M L", and the "1 10 100" refer to? Sorry I couldn't seem to find related explanation of these notations in the paper. Let me know if I missed something. Thanks.

Has anyone run detect.py?

Hi,
I've trained the model following the instructions, after finetuning on novel classes, the AP results are as follows:

AP for aeroplane = 0.6602
AP for bicycle = 0.4766
AP for bird = 0.3573
AP for boat = 0.4778
AP for bottle = 0.3153
AP for bus = 0.2155
AP for car = 0.6892
AP for cat = 0.8144
AP for chair = 0.3652
AP for cow = 0.3928
AP for diningtable = 0.5538
AP for dog = 0.6887
AP for horse = 0.6958
AP for motorbike = 0.4364
AP for person = 0.6416
AP for pottedplant = 0.2839
AP for sheep = 0.5383
AP for sofa = 0.3712
AP for train = 0.7394
AP for tvmonitor = 0.6614
~~~~~~~~
Mean AP = 0.5187
Mean Base AP = 0.5734
Mean Novel AP = 0.3546
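
The three summary numbers above can be reproduced from the per-class values. The sketch below is not the repository's evaluation code; it assumes the novel classes are bird, bus, cow, motorbike, and sofa, which is the split printed in the training logs elsewhere on this page, and treats the remaining 15 classes as base classes.

```python
# Per-class AP values copied from the list above.
ap = {
    "aeroplane": 0.6602, "bicycle": 0.4766, "bird": 0.3573, "boat": 0.4778,
    "bottle": 0.3153, "bus": 0.2155, "car": 0.6892, "cat": 0.8144,
    "chair": 0.3652, "cow": 0.3928, "diningtable": 0.5538, "dog": 0.6887,
    "horse": 0.6958, "motorbike": 0.4364, "person": 0.6416,
    "pottedplant": 0.2839, "sheep": 0.5383, "sofa": 0.3712,
    "train": 0.7394, "tvmonitor": 0.6614,
}
# Assumed novel split (taken from the training logs on this page).
novel = ["bird", "bus", "cow", "motorbike", "sofa"]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


mean_all = mean(ap.values())
mean_base = mean(v for name, v in ap.items() if name not in novel)
mean_novel = mean(ap[name] for name in novel)

print("Mean AP = %.4f" % mean_all)
print("Mean Base AP = %.4f" % mean_base)
print("Mean Novel AP = %.4f" % mean_novel)
```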

Next, I wanted to make predictions using the pre-trained weights, but I found that darknet_dynamic.cfg sets classes=1. I changed it to classes=20 at inference time in order to make predictions on VOC,
but the result is worse.

How can I use the pre-trained weights to make correct predictions?

thanks:)

CUDA out of memory

I have finished training the model, but when I tried to evaluate it, it printed 'CUDA out of memory'. I have 2 GPUs and 32GB, but I can't use both GPUs for evaluation. It evaluates on 1 GPU every time, even though I changed gpu to "0,1" in valid_ensemble.py. Do you have any suggestion or solution?

Size mismatch, unable to train base model

Hi, sorry to bother you! I ran into the following error when trying to train the base model. I am using pytorch 0.3.1 and python 2.7. I attached the full log of stdout and the modified code to print out the size.

(featurereweight) quan@Bayes:~/few_shot/Fewshot_Detection$ python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
/home/quan/few_shot/Fewshot_Detection/data/coco.names
('save_interval', 10)
['bird', 'bus', 'cow', 'motorbike', 'sofa']
('base_ids', [0, 1, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19])
logging to backup/metayolo_novel0_neg1
('class_scale', 1)
/home/quan/few_shot/Fewshot_Detection/cfg.py:455: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 dconv 1024 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x1024
31 conv 30 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 30
32 detection

layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 4 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
6 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
7 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
8 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
9 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
10 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
11 max 2 x 2 / 2 13 x 13 x1024 -> 6 x 6 x1024
12 conv 1024 3 x 3 / 1 6 x 6 x1024 -> 6 x 6 x1024
13 glomax 6 x 6 / 1 6 x 6 x1024 -> 1 x 1 x1024
1 14554 80200 64
10
===> Number of samples (before filtring): 4952
===> Number of samples (after filtring): 4952
('num classes: ', 15)
factor: 3.0
===> Number of samples (before filtring): 14554
===> Number of samples (after filtring): 14554
('num classes: ', 15)
2019-12-02 20:30:00 epoch 0/353, processed 0 samples, lr 0.000033
('nA', 5)
('nC', 1)
('nH', 13)
('nW', 13)
('bs', 64)
('cs', 15)
('output.shape', (1280L, 30L, 13L, 13L))
('cls.shape', (1280L, 5L, 6L, 13L, 13L))
('cls.shape', (1280L, 5L, 13L, 13L))
Traceback (most recent call last):
File "train_meta.py", line 325, in <module>
train(epoch)
File "train_meta.py", line 221, in train
loss = region_loss(output, target)
File "/home/quan/miniconda3/envs/featurereweight/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/quan/few_shot/Fewshot_Detection/region_loss.py", line 277, in forward
cls = cls.view(bs, cs, nA * nC * nH * nW).transpose(1, 2).contiguous().view(bs * nA * nC * nH * nW, cs)
RuntimeError: invalid argument 2: size '[64 x 15 x 845]' is invalid for input with 1081600 elements at /opt/conda/conda-bld/pytorch_1518238581238/work/torch/lib/TH/THStorage.c:41

--- Please find below the sequence of print statements:

    print('nA', nA)
    print('nC', nC)
    print('nH', nH)
    print('nW', nW)
    print('bs', bs)
    print('cs', cs)

    print('output.shape', output.shape)
    cls = output.view(output.size(0), nA, (5 + nC), nH, nW)

    print('cls.shape', cls.shape)
    cls = cls.index_select(2, Variable(torch.linspace(5, 5 + nC - 1, nC).long().cuda())).squeeze()

    print('cls.shape', cls.shape)
    cls = cls.view(bs, cs, nA * nC * nH * nW).transpose(1, 2).contiguous().view(bs * nA * nC * nH * nW, cs)

    print('cls.shape', cls.shape)
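
The sizes printed above can be checked by hand. The tensor being reshaped has 1280 x 5 x 13 x 13 = 1081600 elements, while the requested view [64 x 15 x 845] needs 811200. The sketch below (a hypothetical helper, plain arithmetic only) works out how many per-class entries the element count implies. One reading, which is an assumption and not confirmed in this thread, is that the reweighting branch was fed 20 classes while the loss was configured for cs = 15.

```python
def implied_num_classes(num_elements, batch_size, num_anchors, height, width):
    """Number of per-class slices implied by a flat element count."""
    per_class = batch_size * num_anchors * height * width
    if num_elements % per_class:
        raise ValueError("element count is not a multiple of one class slice")
    return num_elements // per_class


bs, nA, nH, nW, cs = 64, 5, 13, 13, 15    # values printed in the log above

actual = 1280 * nA * nH * nW               # from cls.shape (1280, 5, 13, 13)
expected = bs * cs * (nA * nH * nW)        # from the view [64 x 15 x 845]

print("actual elements:  ", actual)
print("expected elements:", expected)
print("implied classes:  ", implied_num_classes(actual, bs, nA, nH, nW))
```

If the implied class count differs from cs, comparing the class list used to build the support (meta) inputs with the list of base classes would be a reasonable next step.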

Can't find data/metayolo.data file

I couldn't find the metayolo.data file in the data directory, but I found it in the cfg directory. Should I copy this file from cfg to data?

CUDA out of memory

Dimension problem in test()

Hi, thanks for your sharing,
Because I want to evaluate the training results after each epoch, I uncommented test(epoch) in train_meta.py.
But I got an error like this.

I found that the shape of the output from model(data, metax, mask).data is (480,30,13,13) and the shape of target is (32,15,250), so an index error occurs. (I changed batch_size to 32.)
Do you know how to solve this error? Thanks

Problem about loading weights in coco training

I've prepared the COCO dataset, and it may work. (It was not easy: I generated the image2class_flag txt for each image and changed many things, and I spent a long time modifying label_voc/_1.py to fit the COCO dataset.)

But now I have run into a problem when loading the weights.
I run python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23, but get:

/mnt/Disk1/liangzh/code/Fewshot_Detection_coco/data/coco.names
('save_interval', 2)
['airplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'dining table', 'dog', 'horse', 'motorcycle', 'person', 'potted plant', 'sheep', 'couch', 'train', 'tv']
('base_ids', [7, 9, 10, 11, 12, 13, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 59, 61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
logging to backup/metayolo_novel0_neg1
('class_scale', 1)
Traceback (most recent call last):
  File "train_meta.py", line 87, in <module>
    model.load_weights(weightfile)
  File "/mnt/Disk1/liangzh/code/Fewshot_Detection_coco/darknet_meta.py", line 378, in load_weights
    start = load_conv_bn(buf, start, model[0], model[1])
  File "/mnt/Disk1/liangzh/code/Fewshot_Detection_coco/cfg.py", line 461, in load_conv_bn
    conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
RuntimeError: The size of tensor a (3) must match the size of tensor b (864) at non-singleton dimension 3

Thanks
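The number 864 in the error is consistent with a flat weight buffer being copied into a 4-D convolution weight. The sketch below only verifies that arithmetic, assuming the first convolution has 32 filters of size 3 x 3 over 3 input channels; the helper `numel` is hypothetical.

```python
from functools import reduce
from operator import mul

def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)

# Assumption: first conv layer weight shape (out_channels, in_channels, kH, kW).
conv_weight_shape = (32, 3, 3, 3)
num_w = numel(conv_weight_shape)

# buf[start:start + num_w] is a flat 1-D slice with num_w elements,
# while the destination weight has four dimensions.
flat_shape = (num_w,)

print("flat buffer length:", num_w)
print("shapes equal:", flat_shape == conv_weight_shape)
```

Newer PyTorch versions require the source of copy_ to be broadcastable to the destination, so a flat tensor of 864 values no longer fits a [32, 3, 3, 3] weight. A workaround that is often used in this situation is to reshape the slice first, for example with `.view_as(conv_model.weight.data)` before calling copy_. Whether that is the right fix for this code base is an assumption.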

out of memory for fine tuning

I am reproducing the result using the instruction provided in the README file.

I was able to train the base model and obtain AP of 0.6862, which matches what the paper reports. However, when I tried to run the fine-tuning command, the process exits with an out of memory error for the backward pass.

I am training with 4 GeForce GTX 1080 Ti GPUs, each with roughly 12 GB of memory. Did you use GPUs with more memory, or is something weird happening?

python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23

A runtime error occurred

2019-10-18 23:20:07 epoch 0/353, processed 0 samples, lr 0.000033
Traceback (most recent call last):
  File "train_meta.py", line 325, in <module>
    train(epoch)
  File "train_meta.py", line 218, in train
    output = model(data, metax, mask)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home1/ubuntu/project/Fewshot_Detection/darknet_meta.py", line 199, in forward
    dynamic_weights = self.meta_forward(metax, mask)
  File "/home1/ubuntu/project/Fewshot_Detection/darknet_meta.py", line 122, in meta_forward
    metax = model(metax)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/home/ubuntu/home1/software/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 89, in conv2d
    torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
RuntimeError: argument 1 (padding) must be tuple of int but got tuple of (float, float)

Do you know what's wrong? Thanks
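The traceback shows Python 3.6, where `/` always returns a float. The following sketch assumes the padding is derived from the kernel size as (kernel_size - 1) / 2 somewhere in the model-building code, which the traceback does not show; the function names are hypothetical.

```python
def padding_true_division(kernel_size):
    # In Python 3 this yields a float (1.0 for a 3x3 kernel),
    # which conv2d rejects as a padding value.
    return (kernel_size - 1) / 2

def padding_floor_division(kernel_size):
    # Floor division keeps the result an int in both Python 2 and 3.
    return (kernel_size - 1) // 2

for k in (1, 3):
    bad = padding_true_division(k)
    good = padding_floor_division(k)
    print(k, type(bad).__name__, type(good).__name__)
```

If the code was written for Python 2, replacing `/` with `//` (or wrapping the result in int()) wherever padding is computed is one likely way to avoid the "tuple of (float, float)" error.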

logging to backup/metayolo_novel0_neg1 class_scale 1

logging to backup/metayolo_novel0_neg1
class_scale 1