
haomood / bilinear-cnn

PyTorch implementation of bilinear CNN for fine-grained image recognition

License: GNU General Public License v3.0


bilinear-cnn's Introduction

               Bilinear CNN (B-CNN) for Fine-Grained Recognition


DESCRIPTIONS
    After getting the deep descriptors of an image, bilinear pooling computes
    the sum of the outer products of those deep descriptors. Bilinear pooling
    captures all pairwise descriptor interactions, i.e., interactions between
    different parts, in a translation-invariant manner.
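
    As an illustration (a minimal sketch, not code from this repo), bilinear
    pooling of deep descriptors X of shape (N, C, H*W), followed by the
    signed square root and l2 normalization used in the paper, can be written
    in modern PyTorch as:

        import torch
        import torch.nn.functional as F

        def bilinear_pool(X):
            """X: (N, C, H*W) deep descriptors -> (N, C*C) bilinear features."""
            N, C, HW = X.size()
            # Sum of outer products over all H*W locations, scaled to an average.
            phi = torch.bmm(X, X.transpose(1, 2)) / HW   # (N, C, C)
            phi = phi.view(N, C * C)
            # Elementwise signed square root, then l2 normalization.
            phi = torch.sign(phi) * torch.sqrt(torch.abs(phi) + 1e-5)
            return F.normalize(phi)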

    B-CNN provides richer representations than linear models, and B-CNN achieves
    better performance than part-based fine-grained models with no need for
    further part annotation.
    
    Please note that this repo is relatively old and written in PyTorch 0.3.0.
    If you are using a newer version of PyTorch (say, >=0.4.0), consider using
    https://github.com/HaoMood/blinear-cnn-faster instead.


REFERENCE
    T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for
    fine-grained visual recognition. In Proceedings of the IEEE International
    Conference on Computer Vision, pages 1449--1457, 2015.


PREREQUISITES
    Python 3.6 with NumPy
    PyTorch


LAYOUT
    ./data/                 # Datasets
    ./doc/                  # Automatically generated documents
    ./src/                  # Source code


USAGE
    Step 1. Fine-tune the fc layer only. It gives 76.77% test set accuracy.
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_fc.py --base_lr 1.0 \
          --batch_size 64 --epochs 55 --weight_decay 1e-8 \
          | tee "[fc-] base_lr_1.0-weight_decay_1e-8-epoch_.log"

    Step 2. Fine-tune all layers. It gives 84.17% test set accuracy.
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_all.py --base_lr 1e-2 \
          --batch_size 64 --epochs 25 --weight_decay 1e-5 \
          --model "model.pth" \
          | tee "[all-] base_lr_1e-2-weight_decay_1e-5-epoch_.log"


AUTHOR
    Hao Zhang: [email protected]


LICENSE
    CC BY-SA 3.0

bilinear-cnn's People

Contributors

haomood, zhosteven


bilinear-cnn's Issues

Hi

Hi, I just used your BCNN class as a module, but I get the same classification result for different images.
Is there something wrong?

Here is the output of the predicted class and the true class:

predict_class: tensor([61, 61, 61, 61, 61, 61, 61, 61], device='cuda:0')
truth_class: tensor([180, 151, 187, 33, 70, 36, 109, 54], device='cuda:0')

The training loop:

            # Move inputs and labels to the training device.
            data = data.to(opt.device)
            label = label.to(opt.device)
            # Forward pass, loss, backward pass, parameter update.
            optimizer.zero_grad()
            score = bcnn_model(data)
            loss = criterion(score, label)
            loss.backward()
            optimizer.step()

How to get the mean and variance of the data normalize transform in your code?

Hi, I am confused about something in your code. The mean and std of the data normalize transform in your code are [(0.485, 0.456, 0.406), (0.229, 0.224, 0.225)]. But when I computed the mean and std of the training data, I got [(0.4856, 0.4994, 0.4324), (0.1817, 0.1811, 0.1927)]. When I used the values I computed, the test accuracy was lower than yours. At first I thought you might have used the mean and std of the whole dataset, but computing that gave a result very close to what I had before. So can you tell me how you got the mean and std of the data normalize transform in your code? Thank you!
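
For context: (0.485, 0.456, 0.406) and (0.229, 0.224, 0.225) are the standard ImageNet per-channel mean and std that torchvision's pretrained models expect, and the backbone here is an ImageNet-pretrained VGG-16, which is presumably why those statistics are used instead of dataset-specific ones. If you still want to compute dataset statistics, a minimal sketch, assuming all images are resized to a fixed size, is:

    import torch
    from torch.utils.data import DataLoader

    def channel_stats(dataset):
        # Per-channel mean/std over every pixel of a dataset of (3, H, W) tensors.
        loader = DataLoader(dataset, batch_size=64)
        n = 0
        mean = torch.zeros(3)
        sq = torch.zeros(3)
        for X, _ in loader:
            n += X.size(0) * X.size(2) * X.size(3)  # pixels per channel in this batch
            mean += X.sum(dim=(0, 2, 3))
            sq += (X ** 2).sum(dim=(0, 2, 3))
        mean /= n
        std = (sq / n - mean ** 2).sqrt()           # sqrt(E[x^2] - E[x]^2)
        return mean, std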

out of memory

Hi! After 2 epochs, the backward pass runs out of memory :( The first epoch is okay, but it crashes on the second one. It seems that it stores the graph or something; I changed some things and it still crashes:

    for X, y in self._train_loader:
        # Move the batch to the GPU.
        X = X.cuda()
        y = y.cuda()

        # Forward pass.
        score = self._net(X)
        loss = self._criterion(score, y.long())

        with torch.no_grad():
            epoch_loss += loss.item()
            # Prediction.
            prediction = torch.argmax(score, dim=1)
            num_total += y.size(0)
            num_correct += torch.sum(prediction == y.long()).item()

        # Backward pass.
        self._optimizer.zero_grad()
        loss.backward()
        self._optimizer.step()

        total_batches += 1
        del X, y, score, loss, prediction

download the model.pth

Thank you very much for your code! But where can I find the model for fine-tuning? Or does it need to be trained by myself?

About step1 and step2

Hi, is step 2's network based on step 1's fc parameters, or does it train a vgg16 net from scratch?

About step 2

Hi @HaoMood, thank you very much for your work.
When I run your code, the test acc in step 1 is 76%, and the best saved model is vgg_16_epoch_21.pth.
But in step 2, loading this epoch-21 model gives a train acc of 1% and a test acc of 0.
What could be causing this?

confusion about the parameters of Normalize

Regarding lines 129-130 in bilinear_cnn_fc.py, I'm confused about the magic numbers in

Normalize(mean=(0.485, 0.456, 0.406),
          std=(0.229, 0.224, 0.225))

Where are these numbers from?

Cannot reproduce accuracy 84% (after step2)

Hi Hao,

Thank you for a neat implementation.

I wonder whether training with the hyperparameters written in the README

 --base_lr 1e-2 \
 --batch_size 64 --epochs 25 --weight_decay 1e-5 \
 --model "model.pth" 

gives 84.17% test accuracy?

I used exactly the commands you provide in the README:

    Step 1.
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_fc.py --base_lr 1.0 \
          --batch_size 64 --epochs 55 --weight_decay 1e-8 \
          | tee "[fc-] base_lr_1.0-weight_decay_1e-8-epoch_.log"

    Step 2. 
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_all.py --base_lr 1e-2 \
          --batch_size 64 --epochs 25 --weight_decay 1e-5 \
          --model "model.pth" \
          | tee "[all-] base_lr_1e-2-weight_decay_1e-5-epoch_.log"

I trained the step-1 model and got 76.67% accuracy on the test set. I used this as initialization for the step-2 model and fine-tuned all the layers further. But the accuracy saturates at 76.61% and doesn't grow.

Are there any extra tricks to get the desired performance?

Outer product question

X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28**2) # Bilinear

Feature map A has size (C, M) and B has size (C, N).
The paper says:
If fA and fB extract features of size C ×M and
C ×N respectively, then Φ(I) is of size M × N.
But with your implementation, the result is C × C.
However, the experiments section of the paper also seems to report your 512*512, i.e., C × C.
I am confused and hope you can clarify this. Thank you.
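
For context: in this repo both streams are the same VGG-16, so X holds descriptors of shape (batch, C, H*W) with C = 512, and the bmm of X with its own transpose yields a C × C = 512 × 512 matrix per image; in the quoted notation the two feature dimensions coincide here (M = N = 512). A quick shape check:

    import torch

    X = torch.rand(8, 512, 28 * 28)        # (batch, C, H*W) descriptors
    phi = torch.bmm(X, X.transpose(1, 2))  # outer products summed over locations
    print(phi.shape)                       # torch.Size([8, 512, 512])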

About memory

In your README, I see you used 4 GPUs. So how much GPU memory was used in total in your step 1?

question about the bilinear pooling operation

In the forward function of the BCNN class, the bilinear operation is

X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28**2) # Bilinear

Why is the result of the matrix multiplication divided by (28 ** 2)?
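
For context, a likely explanation: for the 448 × 448 inputs used in the B-CNN paper, the VGG-16 relu5_3 feature map has spatial size 28 × 28 (with 512 channels), so X is reshaped to (batch, 512, 28*28) before the bmm. The bmm sums the outer products over all 28*28 = 784 locations, and dividing by 28**2 turns that sum into an average:

    # X: (batch, 512, 28*28) descriptors from relu5_3 on a 448 x 448 input.
    # bmm sums outer products over the 784 locations; / (28**2) averages them.
    X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28 ** 2)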

bilinear sqrt with sign

Hi, this is a concise and useful implementation of bilinear CNNs. However, in the paper I read about the
"elementwise signed square-root (x ← sign(x)√|x|) and l2 normalization is applied to the matrix A",
which means the result should be multiplied by the sign. But this code just does "X = torch.sqrt(X + 1e-5)".

Am I missing something? And even though this is not exactly the same, I got the same result (84.2%), which suggests it should be correct?

zombie process when using multiple gpu

Hi, thanks a lot for your code! Everything works well when I only use one GPU by setting CUDA_VISIBLE_DEVICES=0 (for example), but when I use multiple GPUs by setting CUDA_VISIBLE_DEVICES=0,1 (for example), the process becomes a zombie process, meaning it is not actually training, yet it still holds the GPU and CPU resources. What's worse, you cannot even kill it with "kill -9 PID"; the only option is a reboot. Have you come across this issue before? Thanks a lot!

Signed square root

Hi Hao

First of all thanks for the excellent implementation. I have used the code here as a reference for my own implementations.

In the original paper (http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf), the authors used a signed square-root operation, something like:

X = torch.mul(torch.sign(X), torch.sqrt(torch.abs(X) + 1e-5))

instead of the plain square root you used, X = torch.sqrt(X + 1e-5).

Was there a particular reason for this?
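
One plausible explanation, not from the thread: the descriptors here come out of a ReLU layer, so every entry of the pooled matrix is a sum of products of non-negative activations. X is therefore non-negative elementwise, and the signed square root reduces to the plain one:

    # With X >= 0 elementwise (ReLU features), sign(X) is 0 or 1, so the two
    # forms agree up to the 1e-5 stabilizer:
    X1 = torch.sqrt(X + 1e-5)
    X2 = torch.mul(torch.sign(X), torch.sqrt(torch.abs(X) + 1e-5))

The sign only matters for backbones whose descriptors can take negative values.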
