
haomood / bilinear-cnn

PyTorch implementation of bilinear CNN for fine-grained image recognition

License: GNU General Public License v3.0


bilinear-cnn's Introduction

               Bilinear CNN (B-CNN) for Fine-Grained Recognition


DESCRIPTIONS
    After getting the deep descriptors of an image, bilinear pooling computes
    the sum of the outer products of those deep descriptors. Bilinear pooling
    captures all pairwise descriptor interactions, i.e., interactions between
    different parts, in a translation-invariant manner.
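
    As an illustration (a minimal sketch, not code from this repo), bilinear
    pooling of deep descriptors X of shape (N, C, H*W), followed by the
    signed square root and l2 normalization used in the paper, can be written
    in modern PyTorch as:

        import torch
        import torch.nn.functional as F

        def bilinear_pool(X):
            """X: (N, C, H*W) deep descriptors -> (N, C*C) bilinear features."""
            N, C, HW = X.size()
            # Sum of outer products over all H*W locations, scaled to an average.
            phi = torch.bmm(X, X.transpose(1, 2)) / HW   # (N, C, C)
            phi = phi.view(N, C * C)
            # Elementwise signed square root, then l2 normalization.
            phi = torch.sign(phi) * torch.sqrt(torch.abs(phi) + 1e-5)
            return F.normalize(phi)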

    B-CNN provides richer representations than linear models, and B-CNN achieves
    better performance than part-based fine-grained models with no need for
    further part annotation.
    
    Please note that this repo is relatively old and written in PyTorch 0.3.0.
    If you are using a newer version of PyTorch (say, >=0.4.0), consider using
    https://github.com/HaoMood/blinear-cnn-faster instead.


REFERENCE
    T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for
    fine-grained visual recognition. In Proceedings of the IEEE International
    Conference on Computer Vision, pages 1449--1457, 2015.


PREREQUISITES
    Python 3.6 with NumPy
    PyTorch


LAYOUT
    ./data/                 # Datasets
    ./doc/                  # Automatically generated documents
    ./src/                  # Source code


USAGE
    Step 1. Fine-tune the fc layer only. It gives 76.77% test set accuracy.
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_fc.py --base_lr 1.0 \
          --batch_size 64 --epochs 55 --weight_decay 1e-8 \
          | tee "[fc-] base_lr_1.0-weight_decay_1e-8-epoch_.log"

    Step 2. Fine-tune all layers. It gives 84.17% test set accuracy.
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_all.py --base_lr 1e-2 \
          --batch_size 64 --epochs 25 --weight_decay 1e-5 \
          --model "model.pth" \
          | tee "[all-] base_lr_1e-2-weight_decay_1e-5-epoch_.log"


AUTHOR
    Hao Zhang: [email protected]


LICENSE
    CC BY-SA 3.0

bilinear-cnn's People

Contributors

haomood, zhosteven


bilinear-cnn's Issues

Hi

Hi, I just used your BCNN class as a module, but I get the same classification result for different images.
Is there something wrong?

Here is the output of the predicted class and the true class:

predict_class: tensor([61, 61, 61, 61, 61, 61, 61, 61], device='cuda:0')
truth_class: tensor([180, 151, 187, 33, 70, 36, 109, 54], device='cuda:0')

The training loop:

            # Move inputs and labels to the training device.
            data = data.to(opt.device)
            label = label.to(opt.device)
            # Forward pass, loss, backward pass, parameter update.
            optimizer.zero_grad()
            score = bcnn_model(data)
            loss = criterion(score, label)
            loss.backward()
            optimizer.step()

How to get the mean and variance of the data normalize transform in your code?

Hi, I am confused about something in your code. The mean and std of the data normalize transform in your code are [(0.485, 0.456, 0.406), (0.229, 0.224, 0.225)]. But when I computed the mean and std of the training data, I got [(0.4856, 0.4994, 0.4324), (0.1817, 0.1811, 0.1927)]. When I used the values I computed, the test accuracy was lower than yours. At first I thought you might have used the mean and std of the whole dataset, but computing that gave a result very close to what I had before. So can you tell me how you got the mean and std of the data normalize transform in your code? Thank you!
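
For context: (0.485, 0.456, 0.406) and (0.229, 0.224, 0.225) are the standard ImageNet per-channel mean and std that torchvision's pretrained models expect, and the backbone here is an ImageNet-pretrained VGG-16, which is presumably why those statistics are used instead of dataset-specific ones. If you still want to compute dataset statistics, a minimal sketch, assuming all images are resized to a fixed size, is:

    import torch
    from torch.utils.data import DataLoader

    def channel_stats(dataset):
        # Per-channel mean/std over every pixel of a dataset of (3, H, W) tensors.
        loader = DataLoader(dataset, batch_size=64)
        n = 0
        mean = torch.zeros(3)
        sq = torch.zeros(3)
        for X, _ in loader:
            n += X.size(0) * X.size(2) * X.size(3)  # pixels per channel in this batch
            mean += X.sum(dim=(0, 2, 3))
            sq += (X ** 2).sum(dim=(0, 2, 3))
        mean /= n
        std = (sq / n - mean ** 2).sqrt()           # sqrt(E[x^2] - E[x]^2)
        return mean, std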

out of memory

Hi! After 2 epochs, the backward pass runs out of memory :( The first epoch is okay, but it crashes on the second one. It seems that it stores the graph or something; I changed some things and it still crashes:

    for X, y in self._train_loader:
        # Move the batch to the GPU.
        X = X.cuda()
        y = y.cuda()

        # Forward pass.
        score = self._net(X)
        loss = self._criterion(score, y.long())

        with torch.no_grad():
            epoch_loss += loss.item()
            # Prediction.
            prediction = torch.argmax(score, dim=1)
            num_total += y.size(0)
            num_correct += torch.sum(prediction == y.long()).item()

        # Backward pass.
        self._optimizer.zero_grad()
        loss.backward()
        self._optimizer.step()

        total_batches += 1
        del X, y, score, loss, prediction

download the model.pth

Thank you very much for your code! But where can I find the model for fine-tuning? Or does it need to be trained by myself?

About step1 and step2

Hi, is step 2's network based on step 1's fc parameters, or does it train a vgg16 net from scratch?

About step 2

Hi @HaoMood, thank you very much for your work.
When I run your code, the test acc in step 1 is 76%, and the best saved model is vgg_16_epoch_21.pth.
But in step 2, loading this epoch-21 model gives a train acc of 1% and a test acc of 0.
What could be causing this?

confusion about the parameters of Normalize

Regarding lines 129-130 in bilinear_cnn_fc.py, I'm confused about the magic numbers in

Normalize(mean=(0.485, 0.456, 0.406),
          std=(0.229, 0.224, 0.225))

Where are these numbers from?

Cannot reproduce accuracy 84% (after step2)

Hi Hao,

Thank you for a neat implementation.

I wonder whether training with the hyperparameters written in the README

 --base_lr 1e-2 \
 --batch_size 64 --epochs 25 --weight_decay 1e-5 \
 --model "model.pth" 

gives 84.17% test accuracy?

I used exactly the commands you provide in the README:

    Step 1.
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_fc.py --base_lr 1.0 \
          --batch_size 64 --epochs 55 --weight_decay 1e-8 \
          | tee "[fc-] base_lr_1.0-weight_decay_1e-8-epoch_.log"

    Step 2. 
    $ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_all.py --base_lr 1e-2 \
          --batch_size 64 --epochs 25 --weight_decay 1e-5 \
          --model "model.pth" \
          | tee "[all-] base_lr_1e-2-weight_decay_1e-5-epoch_.log"

I trained the step-1 model and got 76.67% accuracy on the test set. I used this as initialization for the step-2 model and fine-tuned all the layers further. But the accuracy saturates at 76.61% and doesn't grow.

Are there any extra tricks to get the desired performance?

Outer product question

X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28**2) # Bilinear

Feature map A has size (C, M) and B has size (C, N).
The paper says:
If fA and fB extract features of size C ×M and
C ×N respectively, then Φ(I) is of size M × N.
But with your implementation, the result is C × C.
However, the experiments section of the paper also seems to report your 512*512, i.e., C × C.
I am confused and hope you can clarify this. Thank you.
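
For context: in this repo both streams are the same VGG-16, so X holds descriptors of shape (batch, C, H*W) with C = 512, and the bmm of X with its own transpose yields a C × C = 512 × 512 matrix per image; in the quoted notation the two feature dimensions coincide here (M = N = 512). A quick shape check:

    import torch

    X = torch.rand(8, 512, 28 * 28)        # (batch, C, H*W) descriptors
    phi = torch.bmm(X, X.transpose(1, 2))  # outer products summed over locations
    print(phi.shape)                       # torch.Size([8, 512, 512])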

About memory

In your README, I see you used 4 GPUs. So how much GPU memory was used in total in your step 1?

question about the bilinear pooling operation

In the forward function of the BCNN class, the bilinear operation is

X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28**2) # Bilinear

Why is the result of the matrix multiplication divided by (28 ** 2)?
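
For context, a likely explanation: for the 448 × 448 inputs used in the B-CNN paper, the VGG-16 relu5_3 feature map has spatial size 28 × 28 (with 512 channels), so X is reshaped to (batch, 512, 28*28) before the bmm. The bmm sums the outer products over all 28*28 = 784 locations, and dividing by 28**2 turns that sum into an average:

    # X: (batch, 512, 28*28) descriptors from relu5_3 on a 448 x 448 input.
    # bmm sums outer products over the 784 locations; / (28**2) averages them.
    X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28 ** 2)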

bilinear sqrt with sign

Hi, this is a concise and useful implementation of bilinear CNNs. However, in the paper I read about the
"elementwise signed square-root (x ← sign(x)√|x|) and l2 normalization is applied to the matrix A",
which means the result should be multiplied by the sign. But this code just does "X = torch.sqrt(X + 1e-5)".

Am I missing something? And even though this is not exactly the same, I got the same result (84.2%), which suggests it should be correct?

zombie process when using multiple gpu

Hi, thanks a lot for your code! Everything works well when I only use one GPU by setting CUDA_VISIBLE_DEVICES=0 (for example), but when I use multiple GPUs by setting CUDA_VISIBLE_DEVICES=0,1 (for example), the process becomes a zombie process, meaning it is not actually training, yet it still holds the GPU and CPU resources. What's worse, you cannot even kill it with "kill -9 PID"; the only option is a reboot. Have you come across this issue before? Thanks a lot!

Signed square root

Hi Hao

First of all thanks for the excellent implementation. I have used the code here as a reference for my own implementations.

In the original paper (http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf), the authors used a signed square-root operation, something like:

X = torch.mul(torch.sign(X), torch.sqrt(torch.abs(X) + 1e-5))

instead of the plain square root you used, X = torch.sqrt(X + 1e-5).

Was there a particular reason for this?
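
One plausible explanation, not from the thread: the descriptors here come out of a ReLU layer, so every entry of the pooled matrix is a sum of products of non-negative activations. X is therefore non-negative elementwise, and the signed square root reduces to the plain one:

    # With X >= 0 elementwise (ReLU features), sign(X) is 0 or 1, so the two
    # forms agree up to the 1e-5 stabilizer:
    X1 = torch.sqrt(X + 1e-5)
    X2 = torch.mul(torch.sign(X), torch.sqrt(torch.abs(X) + 1e-5))

The sign only matters for backbones whose descriptors can take negative values.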
