
gengdavid / pytorch-cpn


A PyTorch re-implementation of CPN (Cascaded Pyramid Network for Multi-Person Pose Estimation)

License: GNU General Public License v3.0

Python 100.00%
pytorch deep-learning deep-neural-networks keypoint-estimation keypoint-localization pose-estimation computer-vision

pytorch-cpn's Introduction

PyTorch CPN(Cascaded Pyramid Network)

This is a PyTorch re-implementation of CPN (Cascaded Pyramid Network), winner of the MSCOCO keypoints 2017 challenge. The TensorFlow version, implemented by the paper's authors, can be found here.

Evaluation results on COCO minival dataset

| Method | Base Model | Input Size | BBox | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPN | ResNet-50 | 256x192 | Ground Truth | 71.2 | 91.4 | 78.3 | 68.6 | 75.2 |
| CPN | ResNet-50 | 256x192 | Detection Result | 69.2 | 88.0 | 76.2 | 65.8 | 75.6 |
| CPN | ResNet-50 | 384x288 | Ground Truth | 74.1 | 92.5 | 80.6 | 70.6 | 79.5 |
| CPN | ResNet-50 | 384x288 | Detection Result | 72.2 | 89.2 | 78.6 | 68.1 | 79.3 |
| CPN | ResNet-101* | 384x288 | Ground Truth | 74.0 | 92.3 | 80.6 | 71.1 | 78.7 |
| CPN | ResNet-101* | 384x288 | Detection Result | 72.3 | 89.2 | 78.9 | 68.7 | 79.1 |

Thanks to Tiamo666 and mingloo for training and testing the ResNet-50-384x288 CPN model, and to Tiamo666 for training and testing the ResNet-101-384x288 CPN model.
If you are interested in this repo, you are welcome to help test other model configurations.

* The CPN-ResNet-101-384x288 model is fine-tuned from the previous pre-trained model. If you train it from scratch, it should achieve a higher result.

Usage

For training

  1. Clone the repository
git clone https://github.com/GengDavid/pytorch-cpn

We'll call the directory that you cloned ROOT_DIR.

  2. Download MSCOCO2017 images and annotations from http://cocodataset.org/#download, and put the images and annotation files following the structure shown in data/README.md.
    After placing the data and annotation files, run label_transform.py at ROOT_DIR to transform the annotation format, as shown below.
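Assuming the script takes no arguments (the README does not mention any):

cd ROOT_DIR
python3 label_transform.py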

  3. Initialize cocoapi

git submodule init
git submodule update
cd cocoapi/PythonAPI
make

It will build cocoapi tools automatically.

  4. Install requirements
    This repo requires the following dependencies (a pip command covering most of them is sketched after the list).
  • PyTorch >= 0.4.1
  • numpy >= 1.7.1
  • scipy >= 0.13.2
  • python-opencv >= 3.3.1
  • tqdm > 4.11.1
  • skimage >= 0.13.1
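Note that two of the PyPI package names differ from the list above: python-opencv is published as opencv-python, and skimage as scikit-image. A pip sketch for everything except PyTorch, which is best installed following the official instructions for your CUDA setup:

pip install "numpy>=1.7.1" "scipy>=0.13.2" "opencv-python>=3.3.1" "tqdm>4.11.1" "scikit-image>=0.13.1"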
  5. Training
cd ROOT_DIR/MODEL_DIR/
python3 train.py

For example, to train CPN with input resolution 256x192, just change directory into ROOT_DIR/256.192.model, and run the script.

For more args, run

python train.py --help

For Validation

cd ROOT_DIR/MODEL_DIR/
python3 test.py -t PRE-TRAINED_MODEL_NAME

-t specifies which pre-trained model to test.
For more args, run

python test.py --help

If you want to test a pre-trained model, place it in the ROOT_DIR/MODEL_DIR/checkpoint directory, and make sure the model corresponds to the MODEL_DIR you are testing from.

For example, to run pre-trained CPN model with input resolution 256x192,

python3 test.py -t 'CPN256x192'

This pre-trained model is provided below.

Pre-trained models:

COCO.res50.256x192.CPN (updated!)
COCO.res50.384x288.CPN (updated!)
COCO.res101.384x288.CPN* (new)
* CPN-ResNet-101-384x288 model is fine-tuned from the previous pre-trained model. If you train it from scratch, it should get a higher result.

Detection results on Minival dataset

The detection results are transformed from the results in the tf version of cpn.
detection_minival

Acknowledgements

Thanks to chenyilun95, bearpaw and last-one for sharing their code, which helped me a lot in building this repo.
Thanks to Tiamo666 for testing the ResNet-50-384x288 CPN and ResNet-101-384x288 CPN models.
Thanks to mingloo for contributing.
Thanks to mkocabas for helping me test other configurations.

Others

If you have any questions or find any mistakes in this re-implementation, please open an issue to let me know.
If you want to know more details about the original implementation, you can check the tf version of cpn.

Troubleshooting

  1. Thanks to Tiamo666 for pointing out that the refineNet is implemented in a different way from the original paper (this can reach higher results, but costs more memory).
  2. See issue #10 and issue #7.
    Code and results have been updated! (2018/9/6)

Reference

[1] Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. CVPR (2018)

pytorch-cpn's People

Contributors

gengdavid, mingloo


pytorch-cpn's Issues

pre-trained model

Can a pre-trained model be loaded during training? I don't see pre-trained model loading code in train.py.

Refine Net

I guess there may be a problem in the implementation of the refine net.
In your refineNet.py, you define the forward pass as follows:

    def forward(self, x):
        refine_fms = []
        for i in range(4):
            refine_fms.append(self.cascade[i](x[i]))
        out = torch.cat(refine_fms, dim=1)
        out = self.final_predict(out)
        return out

I think you should reverse x, e.g. x = x[::-1], because x[0] is the smallest feature map and x[3] is the biggest feature map, and according to the paper there are 3 bottlenecks after the smallest feature map and zero bottlenecks after the biggest one.

nn.Upsampling( ) and pytorch version

I really appreciate your excellent work. I get the following warning during training
(my PyTorch version is 0.4.1):

/root/anaconda3/envs/CPNs/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")

I did some research and found that nn.Upsample() is deprecated in PyTorch >= 0.4.1,
which contradicts the install requirement.
And it seems the model performance will be reduced if I ignore this warning.

How can I deal with it?

Looking forward to your reply!
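A minimal sketch of the replacement the warning suggests: the functional call below matches the nn.Upsample(..., align_corners=True) module used in globalNet.py, without emitting the deprecation warning (the tensor shape here is only illustrative).

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 256, 12, 9)  # illustrative feature map
    # Equivalent to nn.Upsample(scale_factor=2, mode='bilinear',
    # align_corners=True)(x), but via the non-deprecated functional API:
    out = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
    print(out.shape)  # torch.Size([1, 256, 24, 18])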

Color Normalization Issue

Hi @GengDavid,

I've found that self.pixel_means is changed on every iteration when __getitem__ is called, due to modification of the mean variable inside the color_normalize function. As a result, the expected color normalization stops taking effect after a few sample iterations, because the mean value decays to [0, 0, 0].

This issue affects both the training and testing phases.

See detailed printed log below:

checking pixel means init: [122.7717 115.9465 102.9801]
=> loaded checkpoint 'checkpoint/epoch32checkpoint.pth.tar' (epoch 32)
testing...
checking pixel means getitem: [122.7717 115.9465 102.9801]
checking pixel means getitem: [0.48145765 0.45469216 0.40384353]
checking pixel means getitem: [0.00188807 0.00178311 0.0015837 ]
checking pixel means getitem: [7.40419296e-06 6.99257450e-06 6.21058869e-06]
checking pixel means getitem: [2.90360508e-08 2.74218608e-08 2.43552498e-08]

Please help to double check it.
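A sketch of one way to avoid the in-place mutation the log shows, under the assumption (suggested by the /255 decay in the log, not confirmed) that color_normalize scales both the image and the mean by 255:

    import numpy as np

    def color_normalize(img, mean):
        # `/` returns a new array; avoid `mean /= 255`, which would keep
        # shrinking the shared self.pixel_means on every __getitem__ call.
        mean = np.asarray(mean, dtype=np.float32) / 255.0
        img = img.astype(np.float32) / 255.0
        return img - mean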

Train.py

I ran train.py and got this error: RuntimeError: CUDA error: out of memory. I haven't made any changes so far. Why does this happen, and how can I solve it?

gydx@gydx-HP-Z6-G4-Workstation:~/A-YFT/pytorch-cpn/256.192.model$ python3 train.py
Initialize with pre-trained ResNet
successfully load 318 keys
/home/gydx/.local/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Total params: 104.55MB

Epoch: 1 | LR: 0.00050000
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
.....................
File "/home/gydx/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py", line 123, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
File "/home/gydx/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1985, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: CUDA error: out of memory
gydx@gydx-HP-Z6-G4-Workstation:~/A-YFT/pytorch-cpn/256.192.model$

Config.py

Hello, I want to ask about the use of the "symmetry" parameter in the config file.

About the utils/imutils.py line:41

I think the following code is confusing or incorrect:

    heatmap /= am / 255

Because batchnorm is the last layer of the predict net, each element of the heatmap should be within the range 0-1, so I think the code should be corrected as follows:

    heatmap /= am

But I'm not totally sure I am right; can you explain it?

an unexpected keyword argument

I ran train.py but got an error: TypeError: __init__() got an unexpected keyword argument 'align_corners'. It occurred at globalNet.py, line 56.
I don't know how to fix it. Can you help me? Thank you.
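For context, as a likely (not confirmed) diagnosis: align_corners was only added to nn.Upsample around PyTorch 0.4, so this TypeError usually indicates a PyTorch older than the required >= 0.4.1. Upgrading is the clean fix; a crude compatibility guard would be:

    import torch.nn as nn

    try:
        up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
    except TypeError:
        # Older PyTorch: no align_corners keyword; its bilinear upsampling
        # already behaved like align_corners=True.
        up = nn.Upsample(scale_factor=2, mode='bilinear')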

Results is

I used the weights from 10 epochs of training to test an image. Why is the accuracy so low? Did I train too little?
[screenshot: 2020-10-15 16-23-46]

about the test data

Hello, I trained using your method and then tested with test.py, but found the test accuracy is very low. I downloaded the pre-trained model and tested it; still very low:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.188.
I suspect my test data is wrong. I am using COCO2017/val2017, COCO_2017_val.json, and person_keypoints_val2017.json,
with the data stored according to data/README.md. I want to ask what your test data is, since your test results are so good. Thank you.

questions about test.py

Hi, thanks for your PyTorch code!
I have seen your code for testing a model, but I don't understand why 4x and 4y should have 2 added (lines 115 and 116).
I have also seen that the tf version adds 2 as well.
Thanks!

    if ln > 1e-3:
        x += delta * px / ln
        y += delta * py / ln
    x = max(0, min(x, cfg.output_shape[1] - 1))
    y = max(0, min(y, cfg.output_shape[0] - 1))
    resy = float((4 * y + 2) / cfg.data_shape[0] * (details[b][3] - details[b][1]) + details[b][1])
    resx = float((4 * x + 2) / cfg.data_shape[1] * (details[b][2] - details[b][0]) + details[b][0])
    v_score[p] = float(r0[p, int(round(y)+1e-10), int(round(x)+1e-10)])
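A plausible reading, not confirmed by the author: the heatmap is 4x smaller than data_shape, so heatmap cell x covers input columns [4x, 4x+4), and 4x + 2 picks the center of that bin before rescaling into the original detection box. A tiny illustration:

    # With an output stride of 4, heatmap cell x maps to input columns
    # [4*x, 4*x + 4); adding 2 selects the center of that 4-pixel bin.
    stride = 4
    for x in (0, 1, 2):
        print(x, '->', stride * x + stride // 2)  # 0 -> 2, 1 -> 6, 2 -> 10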

Image test demo

How can I use the pre-trained network model to visualize image test results? Is there a demo?

How can I get a high score?

I finished my CPN network like this, but it only reaches 0.553 mAP. Could someone give me advice about it?

about the train.py file line:119

    for global_output, label in zip(global_outputs, targets):
        num_points = global_output.size()[1]
        global_label = label * (valid > 1.1).type(torch.FloatTensor).view(-1, num_points, 1, 1)
        global_loss = criterion1(global_output,
                                 torch.autograd.Variable(global_label.cuda(async=True))) / 2.0
        loss += global_loss
        global_loss_record += global_loss.data.item()

Is there an error in the code above: should global_outputs be reversed?

About using my own val_det

If I want to use my own detection results, should I transform the annotations into some particular format? I have read your code, and it doesn't seem to be mentioned.

test.py gets stuck when computing output

Hi,

I've followed the instructions in the README thoroughly and have double checked all the steps. However, when running test.py, this line of code:

global_outputs, refine_output = model(input_var)

seems to never finish. In case it simply takes a very long time to run, I also made a small test subset of the val2017 folder and the annotations file with just 5 images, yet this line still seems to run forever. Upon a keyboard interrupt, the error message seems to have to do with threads acquiring a lock (not sure if this is useful to know). Any idea why? Thanks.

About mscocoMulti.py

There are 4 variables: target15, target11, target9, target7.
What do they mean?

about output result

Hi David, sorry to bother you.
I am a little confused about the output of the key points.
In pytorch-cpn/256.192.model/test.py, line 117, you write the output as follows:

    v_score[p] = float(r0[p, int(round(y)+1e-10), int(round(x)+1e-10)])
    single_result.append(resx)
    single_result.append(resy)
    single_result.append(1)

I guess the last 1 stands for the confidence or probability, and it does not seem reasonable that the probability is always 1. I guess v_score[p] has a similar meaning, so why not use v_score[p] instead?

half of the output predictions are wrong

I am running custom images through this model. All the images have been pre-processed and cropped to include just the human, I've removed all annotation files, and I've hardcoded the information that the annotation files used to provide. Also, I'm only processing images with a single person in-frame.

For some reason, half of the output predictions are just wrong; they are a mess. The other half look perfect. The wrong outputs are almost all identical, with only slight, barely noticeable differences in joint positions. Also, if I feed in, say, a folder of 1000 images, the predictions on images 1-64 will be perfect, 65-128 will be wrong, 129-192 will be perfect, 193-256 will be wrong, and this pattern continues. The pattern remains the same regardless of the input data.

Any idea why this is happening? I'm happy to provide more info about the issue. Thanks.

code can only detect one person in one image?

When I print inputs.size(0) at test.py line 83, I find that the output is 128. But the test batch size is 128, so does that mean each of the 128 images from val2017 contains only one person? I think the code may only detect one person's keypoints per picture.
When I test my own pictures, I also find that only one person is detected even though my picture has three people (I don't have ground-truth bboxes for my own pictures, so I had to reshape them myself to fit the code).
So I want to know: does the code only detect one person's keypoints per picture, without supporting multiple persons?

Why is the bias at the FPN upsample conv set to 'True'?

globalNet.py

    def _upsample(self):
        layers = []
        layers.append(torch.nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True))
        layers.append(torch.nn.Conv2d(256, 256,
            kernel_size=1, stride=1, bias=True))
        layers.append(nn.BatchNorm2d(256))

        return nn.Sequential(*layers)

Different data augmentation from tf-cpn

Thanks for your code.
I find that you use different data augmentation from tf-cpn.
Did you compare your data augmentation with tf-cpn's?
What was the result?
Thanks again.

The name of pretrained model is misspelled

Hi.

I found that one of your pretrained models has the wrong name:
the parameter file of COCO.res101.384x288.CPN on Google Drive is named CPN101_385x288.pth.tar.

Thanks for your nice work.

RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1

While training, I am getting this error:

<ipython-input-67-b0b9c64f728a> in forward(self, x)
     94         print("")
     95 
---> 96         out += residual
     97 
     98         out = self.relu(out)

RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1

This code block is the origin of the error:

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample 
        self.stride = stride
 
    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        
        out = self.relu(out)

        return out

When I printed the sizes of out and residual, I got:

torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])

torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])

torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])

torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])

torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])

torch.Size([12, 256, 12, 9])
torch.Size([12, 512, 12, 9])

How can I solve this issue?
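Not a confirmed diagnosis, but a hint: the last printed pair shows a 256-channel tensor being added to a 512-channel one, i.e. a Bottleneck was built without a downsample projection at a point where the channel count changes. In the standard torchvision ResNet pattern, _make_layer attaches such a projection whenever shapes would mismatch; a sketch:

    import torch.nn as nn

    # Standard ResNet pattern: when stride or channel count changes, the
    # identity branch needs a 1x1 projection so `out += residual` matches.
    def make_downsample(inplanes, planes, stride, expansion=4):
        if stride != 1 or inplanes != planes * expansion:
            return nn.Sequential(
                nn.Conv2d(inplanes, planes * expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * expansion),
            )
        return None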

Unable to extract pretrained model archive

Hi,

I'm trying to use the pretrained model but none of the tar files seem to be in working order. I get the following error:
An error occurred while extracting files.

I'm using Ubuntu 16.04.

a question about test.py

Thanks for your work. But I have a question about line 115 in test.py:
why do you use 4y+2 rather than 4y? Can you explain the meaning of the +2? Thanks.

Using just GlobalNet

Hello! I want to speed up the testing process, so I'm thinking of using just GlobalNet, without RefineNet. Do you think this could work without losing too much AP? Also, how should I do it? Thank you very much!

target.cuda(async=True) raises SyntaxError: invalid syntax at async=True

The documentation on this isn't very clear, so I'd like to ask: can I directly change

        refine_target_var = torch.autograd.Variable(target7.cuda(async=True))
        valid_var = torch.autograd.Variable(valid.cuda(async=True))

to the following?

        refine_target_var = torch.autograd.Variable(target7.cuda())
        valid_var = torch.autograd.Variable(valid.cuda())
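For context, as background rather than the author's reply: async became a reserved keyword in Python 3.7, and PyTorch 0.4.0 renamed the tensor.cuda() argument to non_blocking, so the direct replacement that preserves the asynchronous copy is:

        # `async` is a reserved keyword in Python 3.7+; PyTorch 0.4.0
        # renamed the argument to non_blocking with the same behaviour.
        refine_target_var = torch.autograd.Variable(target7.cuda(non_blocking=True))
        valid_var = torch.autograd.Variable(valid.cuda(non_blocking=True))

Dropping the argument entirely, as proposed above, also runs; the host-to-device copy just becomes synchronous.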

about the number of the output keypoints?

Hi, thanks for your PyTorch code!
I have seen your code for testing a model, but I don't know why the number of output keypoints is always 51 (17 keypoints). The way I print it is:

    if len(single_result) != 0:
        single_result_dict['image_id'] = int(ids)
        single_result_dict['category_id'] = 1
        single_result_dict['keypoints'] = single_result
        print(len(single_result_dict['keypoints']))
        single_result_dict['score'] = float(det_scores[b]) * v_score.mean()
        full_result.append(single_result_dict)

And I notice that the keypoint coordinates in the resulting file (result.json) don't have zero values.

Training with other configurations.

Hi @GengDavid,

Thanks for the great implementation. I'm eager to collaborate with you to test other configurations. I have 2 x 1080 and 2 x 1080 Ti GPUs, and I can borrow more if needed. Looking forward to your response!

mobilenet is not fast

Thanks for your code.
When I replace ResNet with MobileNet, I find that the model is actually slower.
I'm confused; I run the model on a GPU (Titan X).
Do you know the reason?
The following is my code:

    def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
        return nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding,
                      dilation=dilation, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),

            nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

Other joints

Awesome repo!

I've trained using ground-truth boxes and it works okay, but the AP is just lower.

Do you think the detection box should affect training, seeing as it's enlarged so significantly in the image preprocessing?

Do you think this network architecture needs to be improved if I'm using more joints than just COCO?
