
gengdavid / pytorch-cpn


A PyTorch re-implementation of CPN (Cascaded Pyramid Network for Multi-Person Pose Estimation)

License: GNU General Public License v3.0

Python 100.00%
pytorch deep-learning deep-neural-networks keypoint-estimation keypoint-localization pose-estimation computer-vision

pytorch-cpn's Introduction

PyTorch CPN(Cascaded Pyramid Network)

This is a PyTorch re-implementation of CPN (Cascaded Pyramid Network), winner of the MSCOCO keypoints 2017 challenge. The TensorFlow version, implemented by the paper's authors, can be found here.

Evaluation results on COCO minival dataset

| Method | Base Model | Input Size | BBox | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPN | ResNet-50 | 256x192 | Ground Truth | 71.2 | 91.4 | 78.3 | 68.6 | 75.2 |
| CPN | ResNet-50 | 256x192 | Detection Result | 69.2 | 88.0 | 76.2 | 65.8 | 75.6 |
| CPN | ResNet-50 | 384x288 | Ground Truth | 74.1 | 92.5 | 80.6 | 70.6 | 79.5 |
| CPN | ResNet-50 | 384x288 | Detection Result | 72.2 | 89.2 | 78.6 | 68.1 | 79.3 |
| CPN | ResNet-101* | 384x288 | Ground Truth | 74.0 | 92.3 | 80.6 | 71.1 | 78.7 |
| CPN | ResNet-101* | 384x288 | Detection Result | 72.3 | 89.2 | 78.9 | 68.7 | 79.1 |

Thanks to Tiamo666 and mingloo for training and testing the ResNet-50-384x288 CPN model, and to Tiamo666 for training and testing the ResNet-101-384x288 CPN model.
If you are interested in this repo, you are welcome to help test other model configurations.

* The CPN-ResNet-101-384x288 model is fine-tuned from the previous pre-trained model. If you train it from scratch, it should achieve a higher result.

Usage

For training

  1. Clone the repository
git clone https://github.com/GengDavid/pytorch-cpn

We'll call the directory that you cloned ROOT_DIR.

  2. Download MSCOCO2017 images and annotations from http://cocodataset.org/#download, and put the images and annotation files following the structure shown in data/README.md.
    After placing the data and annotation files, run label_transform.py at ROOT_DIR to transform the annotation format, as shown below.
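Assuming the script takes no arguments (the README does not mention any):

cd ROOT_DIR
python3 label_transform.py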

  3. Initialize cocoapi

git submodule init
git submodule update
cd cocoapi/PythonAPI
make

It will build cocoapi tools automatically.

  4. Install requirements
    This repo requires the following dependencies (a pip command covering most of them is sketched after the list).
  • PyTorch >= 0.4.1
  • numpy >= 1.7.1
  • scipy >= 0.13.2
  • python-opencv >= 3.3.1
  • tqdm > 4.11.1
  • skimage >= 0.13.1
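Note that two of the PyPI package names differ from the list above: python-opencv is published as opencv-python, and skimage as scikit-image. A pip sketch for everything except PyTorch, which is best installed following the official instructions for your CUDA setup:

pip install "numpy>=1.7.1" "scipy>=0.13.2" "opencv-python>=3.3.1" "tqdm>4.11.1" "scikit-image>=0.13.1"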
  5. Training
cd ROOT_DIR/MODEL_DIR/
python3 train.py

For example, to train CPN with input resolution 256x192, just change directory into ROOT_DIR/256.192.model, and run the script.

For more args, run

python train.py --help

For Validation

cd ROOT_DIR/MODEL_DIR/
python3 test.py -t PRE-TRAINED_MODEL_NAME

-t specifies which pre-trained model to test.
For more args, run

python test.py --help

If you want to test a pre-trained model, place it in the ROOT_DIR/MODEL_DIR/checkpoint directory, and make sure the model corresponds to the MODEL_DIR you are testing from.

For example, to run pre-trained CPN model with input resolution 256x192,

python3 test.py -t 'CPN256x192'

This pre-trained model is provided below.

Pre-trained models:

COCO.res50.256x192.CPN (updated!)
COCO.res50.384x288.CPN (updated!)
COCO.res101.384x288.CPN* (new)
* CPN-ResNet-101-384x288 model is fine-tuned from the previous pre-trained model. If you train it from scratch, it should get a higher result.

Detection results on Minival dataset

The detection results are transformed from the results in the tf version of cpn.
detection_minival

Acknowledgements

Thanks to chenyilun95, bearpaw and last-one for sharing their code, which helped me a lot in building this repo.
Thanks to Tiamo666 for testing the ResNet-50-384x288 CPN and ResNet-101-384x288 CPN models.
Thanks to mingloo for contributing.
Thanks to mkocabas for helping me test other configurations.

Others

If you have any questions or find any mistakes in this re-implementation, please open an issue to let me know.
If you want to know more details about the original implementation, you can check the tf version of cpn.

Troubleshooting

  1. Thanks to Tiamo666 for pointing out that the refineNet is implemented in a different way from the original paper (this can reach higher results, but costs more memory).
  2. See issue #10 and issue #7.
    Code and results have been updated! (2018/9/6)

Reference

[1] Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. CVPR (2018)

pytorch-cpn's People

Contributors

gengdavid, mingloo


pytorch-cpn's Issues

pre-trained model

Can a pre-trained model be loaded during training? I don't see pre-trained model loading code in train.py.

Refine Net

I guess there may be a problem in the implementation of the refine net.
In your refineNet.py, you define the forward pass as follows:

    def forward(self, x):
        refine_fms = []
        for i in range(4):
            refine_fms.append(self.cascade[i](x[i]))
        out = torch.cat(refine_fms, dim=1)
        out = self.final_predict(out)
        return out

I think you should reverse x, e.g. x = x[::-1], because x[0] is the smallest feature map and x[3] is the biggest feature map, and according to the paper there are 3 bottlenecks after the smallest feature map and zero bottlenecks after the biggest one.

nn.Upsampling( ) and pytorch version

I really appreciate your excellent work. I get the following warning during training
(my PyTorch version is 0.4.1):

/root/anaconda3/envs/CPNs/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")

I did some research and found that nn.Upsample() is deprecated in PyTorch >= 0.4.1,
which contradicts the install requirement.
And it seems the model performance will be reduced if I ignore this warning.

How can I deal with it?

Looking forward to your reply!
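A minimal sketch of the replacement the warning suggests: the functional call below matches the nn.Upsample(..., align_corners=True) module used in globalNet.py, without emitting the deprecation warning (the tensor shape here is only illustrative).

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 256, 12, 9)  # illustrative feature map
    # Equivalent to nn.Upsample(scale_factor=2, mode='bilinear',
    # align_corners=True)(x), but via the non-deprecated functional API:
    out = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
    print(out.shape)  # torch.Size([1, 256, 24, 18])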

Color Normalization Issue

Hi @GengDavid,

I've found that self.pixel_means is changed on every iteration when __getitem__ is called, due to modification of the mean variable inside the color_normalize function. As a result, the expected color normalization stops taking effect after a few sample iterations, because the mean value decays to [0, 0, 0].

This issue affects both the training and testing phases.

See detailed printed log below:

checking pixel means init: [122.7717 115.9465 102.9801]
=> loaded checkpoint 'checkpoint/epoch32checkpoint.pth.tar' (epoch 32)
testing...
checking pixel means getitem: [122.7717 115.9465 102.9801]
checking pixel means getitem: [0.48145765 0.45469216 0.40384353]
checking pixel means getitem: [0.00188807 0.00178311 0.0015837 ]
checking pixel means getitem: [7.40419296e-06 6.99257450e-06 6.21058869e-06]
checking pixel means getitem: [2.90360508e-08 2.74218608e-08 2.43552498e-08]

Please help to double check it.
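A sketch of one way to avoid the in-place mutation the log shows, under the assumption (suggested by the /255 decay in the log, not confirmed) that color_normalize scales both the image and the mean by 255:

    import numpy as np

    def color_normalize(img, mean):
        # `/` returns a new array; avoid `mean /= 255`, which would keep
        # shrinking the shared self.pixel_means on every __getitem__ call.
        mean = np.asarray(mean, dtype=np.float32) / 255.0
        img = img.astype(np.float32) / 255.0
        return img - mean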

Train.py

I ran train.py and got this error: RuntimeError: CUDA error: out of memory. I haven't made any changes so far. Why does this happen, and how can I solve it?

gydx@gydx-HP-Z6-G4-Workstation:~/A-YFT/pytorch-cpn/256.192.model$ python3 train.py
Initialize with pre-trained ResNet
successfully load 318 keys
/home/gydx/.local/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Total params: 104.55MB

Epoch: 1 | LR: 0.00050000
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
.....................
File "/home/gydx/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py", line 123, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
File "/home/gydx/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1985, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: CUDA error: out of memory
gydx@gydx-HP-Z6-G4-Workstation:~/A-YFT/pytorch-cpn/256.192.model$

Config.py

Hello, I want to ask about the use of the "symmetry" parameter in the config file.

About the utils/imutils.py line:41

I think the following code is confusing or incorrect:

    heatmap /= am / 255

Because batchnorm is the last layer of the predict net, each element of the heatmap should be within the range 0-1, so I think the code should be corrected as follows:

    heatmap /= am

But I'm not totally sure I am right; can you explain it?

an unexpected keyword argument

I ran train.py but got an error: TypeError: __init__() got an unexpected keyword argument 'align_corners'. It occurred at globalNet.py, line 56.
I don't know how to fix it. Can you help me? Thank you.
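For context, as a likely (not confirmed) diagnosis: align_corners was only added to nn.Upsample around PyTorch 0.4, so this TypeError usually indicates a PyTorch older than the required >= 0.4.1. Upgrading is the clean fix; a crude compatibility guard would be:

    import torch.nn as nn

    try:
        up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
    except TypeError:
        # Older PyTorch: no align_corners keyword; its bilinear upsampling
        # already behaved like align_corners=True.
        up = nn.Upsample(scale_factor=2, mode='bilinear')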

Results is

I used the weights from 10 epochs of training to test an image. Why is the accuracy so low? Did I train too little?
[screenshot: 2020-10-15 16-23-46]

about the test data

Hello, I trained using your method and then tested with test.py, but found the test accuracy is very low. I downloaded the pre-trained model and tested it; still very low:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.188.
I suspect my test data is wrong. I am using COCO2017/val2017, COCO_2017_val.json, and person_keypoints_val2017.json,
with the data stored according to data/README.md. I want to ask what your test data is, since your test results are so good. Thank you.

questions about test.py

Hi, thanks for your PyTorch code!
I have seen your code for testing a model, but I don't understand why 4x and 4y should have 2 added (lines 115 and 116).
I have also seen that the tf version adds 2 as well.
Thanks!

    if ln > 1e-3:
        x += delta * px / ln
        y += delta * py / ln
    x = max(0, min(x, cfg.output_shape[1] - 1))
    y = max(0, min(y, cfg.output_shape[0] - 1))
    resy = float((4 * y + 2) / cfg.data_shape[0] * (details[b][3] - details[b][1]) + details[b][1])
    resx = float((4 * x + 2) / cfg.data_shape[1] * (details[b][2] - details[b][0]) + details[b][0])
    v_score[p] = float(r0[p, int(round(y)+1e-10), int(round(x)+1e-10)])
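A plausible reading, not confirmed by the author: the heatmap is 4x smaller than data_shape, so heatmap cell x covers input columns [4x, 4x+4), and 4x + 2 picks the center of that bin before rescaling into the original detection box. A tiny illustration:

    # With an output stride of 4, heatmap cell x maps to input columns
    # [4*x, 4*x + 4); adding 2 selects the center of that 4-pixel bin.
    stride = 4
    for x in (0, 1, 2):
        print(x, '->', stride * x + stride // 2)  # 0 -> 2, 1 -> 6, 2 -> 10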

Image test demo

How can I use the pre-trained network model to visualize image test results? Is there a demo?

How can I get a high score?

I finished my CPN network like this, but it only reaches 0.553 mAP. Could someone give me advice about it?

about the train.py file line:119

    for global_output, label in zip(global_outputs, targets):
        num_points = global_output.size()[1]
        global_label = label * (valid > 1.1).type(torch.FloatTensor).view(-1, num_points, 1, 1)
        global_loss = criterion1(global_output,
                                 torch.autograd.Variable(global_label.cuda(async=True))) / 2.0
        loss += global_loss
        global_loss_record += global_loss.data.item()

Is there an error in the code above: should global_outputs be reversed?

About using my own val_det

If I want to use my own detection results, should I transform the annotations into some particular format? I have read your code, and it doesn't seem to be mentioned.

test.py gets stuck when computing output

Hi,

I've followed the instructions in the README thoroughly and have double checked all the steps. However, when running test.py, this line of code:

global_outputs, refine_output = model(input_var)

seems to never finish. In case it simply takes a very long time to run, I also made a small test subset of the val2017 folder and the annotations file with just 5 images, yet this line still seems to run forever. Upon a keyboard interrupt, the error message seems to have to do with threads acquiring a lock (not sure if this is useful to know). Any idea why? Thanks.

About mscocoMulti.py

There are 4 variables: target15, target11, target9, target7.
What do they mean?

about output result

Hi David, sorry to bother you.
I am a little confused about the output of the key points.
In pytorch-cpn/256.192.model/test.py, line 117, you write the output as follows:

    v_score[p] = float(r0[p, int(round(y)+1e-10), int(round(x)+1e-10)])
    single_result.append(resx)
    single_result.append(resy)
    single_result.append(1)

I guess the last 1 stands for the confidence or probability, and it does not seem reasonable that the probability is always 1. I guess v_score[p] has a similar meaning, so why not use v_score[p] instead?

half of the output predictions are wrong

I am running custom images through this model. All the images have been pre-processed and cropped to include just the human, I've removed all annotation files, and I've hardcoded the information that the annotation files used to provide. Also, I'm only processing images with a single person in-frame.

For some reason, half of the output predictions are just wrong; they are a mess. The other half look perfect. The wrong outputs are almost all identical, with only slight, barely noticeable differences in joint positions. Also, if I feed in, say, a folder of 1000 images, the predictions on images 1-64 will be perfect, 65-128 will be wrong, 129-192 will be perfect, 193-256 will be wrong, and this pattern continues. The pattern remains the same regardless of the input data.

Any idea why this is happening? I'm happy to provide more info about the issue. Thanks.

code can only detect one person in one image?

When I print inputs.size(0) at test.py line 83, I find that the output is 128. But the test batch size is 128, so does that mean each of the 128 images from val2017 contains only one person? I think the code may only detect one person's keypoints per picture.
When I test my own pictures, I also find that only one person is detected even though my picture has three people (I don't have ground-truth bboxes for my own pictures, so I had to reshape them myself to fit the code).
So I want to know: does the code only detect one person's keypoints per picture, without supporting multiple persons?

Why is the bias at the FPN upsample conv set to 'True'?

globalNet.py

    def _upsample(self):
        layers = []
        layers.append(torch.nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True))
        layers.append(torch.nn.Conv2d(256, 256,
            kernel_size=1, stride=1, bias=True))
        layers.append(nn.BatchNorm2d(256))

        return nn.Sequential(*layers)

Different data augmentation from tf-cpn

Thanks for your code.
I find that you use different data augmentation from tf-cpn.
Did you compare your data augmentation with tf-cpn's?
What was the result?
Thanks again.

The name of pretrained model is misspelled

Hi.

I found that one of your pretrained models has the wrong name:
the parameter file of COCO.res101.384x288.CPN on Google Drive is named CPN101_385x288.pth.tar.

Thanks for your nice work.

RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1

While training, I am getting this error:

<ipython-input-67-b0b9c64f728a> in forward(self, x)
     94         print("")
     95 
---> 96         out += residual
     97 
     98         out = self.relu(out)

RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1

This code block is the origin of the error:

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample 
        self.stride = stride
 
    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        
        out = self.relu(out)

        return out

When I printed the sizes of out and residual, I got:

torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])

torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])

torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])

torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])

torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])

torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])

torch.Size([12, 256, 12, 9])
torch.Size([12, 512, 12, 9])

How can I solve this issue?
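Not a confirmed diagnosis, but a hint: the last printed pair shows a 256-channel tensor being added to a 512-channel one, i.e. a Bottleneck was built without a downsample projection at a point where the channel count changes. In the standard torchvision ResNet pattern, _make_layer attaches such a projection whenever shapes would mismatch; a sketch:

    import torch.nn as nn

    # Standard ResNet pattern: when stride or channel count changes, the
    # identity branch needs a 1x1 projection so `out += residual` matches.
    def make_downsample(inplanes, planes, stride, expansion=4):
        if stride != 1 or inplanes != planes * expansion:
            return nn.Sequential(
                nn.Conv2d(inplanes, planes * expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * expansion),
            )
        return None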

Unable to extract pretrained model archive

Hi,

I'm trying to use the pretrained model but none of the tar files seem to be in working order. I get the following error:
An error occurred while extracting files.

I'm using Ubuntu 16.04.

a question about test.py

Thanks for your work. But I have a question about line 115 in test.py:
why do you use 4y+2 rather than 4y? Can you explain the meaning of the +2? Thanks.

Using just GlobalNet

Hello! I want to speed up the testing process, so I'm thinking of using just GlobalNet, without RefineNet. Do you think this could work without losing too much AP? Also, how should I do it? Thank you very much!

target.cuda(async=True) raises SyntaxError: invalid syntax at async=True

The documentation on this isn't very clear, so I'd like to ask: can I directly change

        refine_target_var = torch.autograd.Variable(target7.cuda(async=True))
        valid_var = torch.autograd.Variable(valid.cuda(async=True))

to the following?

        refine_target_var = torch.autograd.Variable(target7.cuda())
        valid_var = torch.autograd.Variable(valid.cuda())
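For context, as background rather than the author's reply: async became a reserved keyword in Python 3.7, and PyTorch 0.4.0 renamed the tensor.cuda() argument to non_blocking, so the direct replacement that preserves the asynchronous copy is:

        # `async` is a reserved keyword in Python 3.7+; PyTorch 0.4.0
        # renamed the argument to non_blocking with the same behaviour.
        refine_target_var = torch.autograd.Variable(target7.cuda(non_blocking=True))
        valid_var = torch.autograd.Variable(valid.cuda(non_blocking=True))

Dropping the argument entirely, as proposed above, also runs; the host-to-device copy just becomes synchronous.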

about the number of the output keypoints?

Hi, thanks for your PyTorch code!
I have seen your code for testing a model, but I don't know why the number of output keypoints is always 51 (17 keypoints). The way I print it is:

    if len(single_result) != 0:
        single_result_dict['image_id'] = int(ids)
        single_result_dict['category_id'] = 1
        single_result_dict['keypoints'] = single_result
        print(len(single_result_dict['keypoints']))
        single_result_dict['score'] = float(det_scores[b]) * v_score.mean()
        full_result.append(single_result_dict)

And I notice that the keypoint coordinates in the resulting file (result.json) don't have zero values.

Training with other configurations.

Hi @GengDavid,

Thanks for the great implementation. I'm eager to collaborate with you to test other configurations. I have 2 x 1080 and 2 x 1080 Ti GPUs, and I can borrow more if needed. Looking forward to your response!

mobilenet is not fast

Thanks for your code.
When I replace ResNet with MobileNet, I find that the model is actually slower.
I'm confused; I run the model on a GPU (Titan X).
Do you know the reason?
The following is my code:

    def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
        return nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding,
                      dilation=dilation, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),

            nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

Other joints

Awesome repo!

I've trained using ground-truth boxes and it works okay, but the AP is just lower.

Do you think the detection box should affect training, seeing as it's enlarged so significantly in the image preprocessing?

Do you think this network architecture needs to be improved if I'm using more joints than just COCO?
