gengdavid / pytorch-cpn
A PyTorch re-implementation of CPN (Cascaded Pyramid Network for Multi-Person Pose Estimation)
License: GNU General Public License v3.0
When I print(inputs.size(0)) at line 83 of test.py, the output is 128.
The test batch size is also 128. Does that mean each of the 128 images from val2017 contains only one person? I suspect the code detects only one person's keypoints per picture.
When I test my own picture, I can also detect only one person, even though the picture contains three people. (I don't have a gt_bbox for my own picture, so I had to reshape the picture myself to fit the code.)
So, does the code detect only one person's keypoints per picture, without supporting multiple people?
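For context, CPN is a top-down method: the network sees one cropped person at a time, so a batch of 128 is most likely 128 person boxes rather than 128 whole images. Handling three people in one picture then means producing three boxes and three crops. A minimal sketch of that cropping step follows; the helper name `person_crops`, the padding factor, and the example boxes are all hypothetical, and the repository's own preprocessing may differ.

```python
def person_crops(boxes, aspect=192 / 256, pad=1.2):
    """Return one (x0, y0, x1, y1) crop per person box.

    boxes: list of (x, y, w, h) boxes, e.g. from a person detector.
    aspect: width / height of the network input (192 x 256 assumed here).
    pad: hypothetical enlargement factor around each box.
    """
    crops = []
    for x, y, w, h in boxes:
        cx, cy = x + w / 2.0, y + h / 2.0
        # Grow the shorter side so the crop matches the input aspect ratio.
        if w / h > aspect:
            h = w / aspect
        else:
            w = h * aspect
        w, h = w * pad, h * pad
        crops.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return crops


# Three people in one picture give three separate network inputs.
boxes = [(10, 20, 50, 120), (200, 40, 60, 150), (320, 30, 90, 100)]
crops = person_crops(boxes)
```

A crop may extend past the image border, so a real pipeline would also pad or clip before resizing.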
globalNet.py
    def _upsample(self):
        layers = []
        layers.append(torch.nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True))
        layers.append(torch.nn.Conv2d(256, 256,
            kernel_size=1, stride=1, bias=True))
        layers.append(nn.BatchNorm2d(256))
        return nn.Sequential(*layers)
for global_output, label in zip(global_outputs, targets):
    num_points = global_output.size()[1]
    global_label = label * (valid > 1.1).type(torch.FloatTensor).view(-1, num_points, 1, 1)
    global_loss = criterion1(global_output,
        torch.autograd.Variable(global_label.cuda(async=True))) / 2.0
    loss += global_loss
    global_loss_record += global_loss.data.item()
Is there an error in the code above?
Should global_outputs be reversed?
I finished my CPN network like this, but it only reaches 0.553 mAP. Could someone give me advice about it?
Hello! I want to speed up the testing process, so I'm thinking of using just GlobalNet, without RefineNet. Do you think this could work without losing too much AP? Also, how should I do it? Thank you very much!
Hi @GengDavid,
I've found that self.pixel_means changes on every call to __getitem__, because the color_normalize function modifies its mean argument in place. As a result, the expected color normalization stops taking effect after a few samples, because the mean value decays towards [0, 0, 0].
This issue affects both the training and testing phases.
See detailed printed log below:
checking pixel means init: [122.7717 115.9465 102.9801]
=> loaded checkpoint 'checkpoint/epoch32checkpoint.pth.tar' (epoch 32)
testing...
checking pixel means getitem: [122.7717 115.9465 102.9801]
checking pixel means getitem: [0.48145765 0.45469216 0.40384353]
checking pixel means getitem: [0.00188807 0.00178311 0.0015837 ]
checking pixel means getitem: [7.40419296e-06 6.99257450e-06 6.21058869e-06]
checking pixel means getitem: [2.90360508e-08 2.74218608e-08 2.43552498e-08]
Please help to double check it.
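The log above shows the mean shrinking by a factor of 255 on every call, which is the typical signature of an in-place update on a shared object. The sketch below reproduces the pattern with plain Python lists. It is not the repository's code: the function names are hypothetical, and the real `color_normalize` presumably operates on a NumPy array, where `mean /= 255` has the same aliasing effect.

```python
PIXEL_MEANS = [122.7717, 115.9465, 102.9801]


def normalize_buggy(pixel, mean):
    # In-place update: `mean` is the caller's list, so it shrinks on every call.
    for c in range(3):
        mean[c] /= 255.0
    return [p - m for p, m in zip(pixel, mean)]


def normalize_fixed(pixel, mean):
    # Build a scaled copy; the caller's list is left untouched.
    scaled = [m / 255.0 for m in mean]
    return [p - m for p, m in zip(pixel, scaled)]


shared = list(PIXEL_MEANS)
for _ in range(3):
    normalize_buggy([0.5, 0.5, 0.5], shared)
buggy_after = shared[0]   # divided by 255 three times

shared = list(PIXEL_MEANS)
for _ in range(3):
    normalize_fixed([0.5, 0.5, 0.5], shared)
fixed_after = shared[0]   # still the original mean
```

Computing the scaled mean into a new variable (or copying it first) keeps the dataset attribute stable across samples.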
Good job.
After retraining, can the model reach the accuracy reported in the paper?
How to use the pre-training network model to visualize the image test results? Is there a demo?
I ran train.py but got an error: TypeError: __init__() got an unexpected keyword argument 'align_corners'. It occurred at globalNet.py, line 56.
I don't know how to fix it. Can you help me? Thank you.
I ran train.py and got the error RuntimeError: CUDA error: out of memory. I haven't made any changes to the code so far. Why does this happen, and how can I solve it?
gydx@gydx-HP-Z6-G4-Workstation:~/A-YFT/pytorch-cpn/256.192.model$ python3 train.py
Initialize with pre-trained ResNet
successfully load 318 keys
/home/gydx/.local/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Total params: 104.55MB
Epoch: 1 | LR: 0.00050000
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/usr/local/lib/python3.5/dist-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
.....................
File "/home/gydx/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py", line 123, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
File "/home/gydx/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1985, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: CUDA error: out of memory
gydx@gydx-HP-Z6-G4-Workstation:~/A-YFT/pytorch-cpn/256.192.model$
Hi.
I found that one of your pretrained models has a wrong file name.
The parameter file for COCO.res101.384x288.CPN on Google Drive is named CPN101_385x288.pth.tar (385 instead of 384).
Thanks for your nice work.
Hello, I trained with your method and then ran test.py, but the test accuracy is very low. I also downloaded the pre-trained model and tested it, and the accuracy is still very low:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.188.
I suspect that my test data is wrong. I am using COCO2017/val2017, COCO_2017_val.json, and person_keypoints_val2017.json.
The data is stored as described in data/README.md. Could you tell me which test data you use to get such good results? Thank you.
Thanks for your work. I have a question about line 115 in test.py.
Why do you use 4y+2 rather than 4y? Can you explain the meaning of the +2? Thanks.
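One common reading of the +2, offered here as an assumption rather than the author's stated reason: the heatmap is 4 times smaller than the network input, so heatmap cell y covers input rows 4y to 4y+3, and adding half the stride moves the estimate from the top edge of that cell to roughly its centre. The sketch below uses a hypothetical helper `heatmap_to_input` to show the mapping.

```python
STRIDE = 4  # assumed ratio between input resolution and heatmap resolution


def heatmap_to_input(x, y, stride=STRIDE):
    # Each heatmap cell covers `stride` input pixels per axis. Adding
    # stride // 2 shifts the point from the cell's corner towards its centre.
    return stride * x + stride // 2, stride * y + stride // 2


corner_cell = heatmap_to_input(0, 0)
inner_cell = heatmap_to_input(10, 20)
```

Without the offset, every prediction would be biased towards the top-left corner of its cell by about two input pixels.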
I am running custom images through this model. All the images have been pre-processed and cropped to include just the human, and I've removed all annotation files and hardcoded the information that the annotation files used to provide. Also, I'm only processing images where there is a single person in frame.
For some reason, half of the output predictions are just wrong; they are a mess. The other half look perfect. The wrong outputs are almost all identical, with only slight, barely noticeable differences in joint positions. Also, if I feed in, say, a folder of 1000 images, the predictions on images 1-64 will be perfect, 65-128 will be wrong, 129-192 will be perfect, 193-256 will be wrong, and so on. This pattern remains the same regardless of the input data.
Any idea why this is happening? I'm happy to provide more info about the issue. Thanks.
Can a pre-trained model be loaded during training? I don't see any code for loading a pretrained model in train.py.
Thanks for your code.
I see that you use a different data augmentation from tf-cpn.
Did you compare your data augmentation with tf-cpn's?
What was the result?
Thanks again.
When I run the make command, the process reports an error:
gcc: error: pycocotools/_mask.c: No such file or directory
How can I solve this problem?
Hi, David, Sorry to bother you.
I am a little confused about the output result of key points.
In the pytorch-cpn/256.192.model/test.py file, at line 117, you write the output as follows:
v_score[p] = float(r0[p, int(round(y)+1e-10), int(round(x)+1e-10)])
single_result.append(resx)
single_result.append(resy)
single_result.append(1)
I guess the last 1 stands for the confidence or probability. It does not seem reasonable for the probability to always be 1.
I guess v_score[p] has a similar meaning, so why not use v_score[p] instead?
Hi, thanks for your PyTorch code!
I have read your code for testing a model, but I don't understand why the number of output keypoint values is always 51 (17 keypoints). This is how I print it:
if len(single_result) != 0:
    single_result_dict['image_id'] = int(ids)
    single_result_dict['category_id'] = 1
    single_result_dict['keypoints'] = single_result
    print(len(single_result_dict['keypoints']))
    single_result_dict['score'] = float(det_scores[b]) * v_score.mean()
    full_result.append(single_result_dict)
I also notice that the keypoint coordinates in the resulting file (result.json) contain no zero values.
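The count of 51 follows from the COCO keypoint format, in which each of the 17 keypoints is stored as a flat (x, y, v) triple, so 17 x 3 = 51 values. In the test code quoted in an earlier issue, the third value is always appended as 1. The sketch below regroups such a flat list; `group_keypoints` and the dummy values are hypothetical.

```python
def group_keypoints(flat):
    """Split a flat COCO-style keypoint list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Dummy result: 17 keypoints, each written as x, y, 1.
flat = []
for k in range(17):
    flat.extend([10.0 * k, 5.0 * k, 1])

triples = group_keypoints(flat)
num_values = len(flat)        # length of the 'keypoints' field
num_keypoints = len(triples)  # number of joints
```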
FileNotFoundError: [Errno 2] No such file or directory: '/media/agent/data/xcj/pytorch-cpn/256.192.model/../data/COCO2017/annotations/COCO_2017_train.json'
What is the difference between Detection Result and Detection Result?
Hi, thanks for your PyTorch code!
I have read your code for testing a model, but I don't understand why 2 is added to 4x and 4y (lines 115 and 116).
I have also seen that the TensorFlow version adds 2.
Thanks!
pytorch-cpn/256.192.model/test.py
Lines 110 to 117 in 48696a9
I think the following code is confusing or incorrect:
heatmap /= am / 255
Because batch norm is the last layer of the prediction net, each element of the heatmap should be within the range 0-1. I think the code should be corrected as follows:
heatmap /= am
But I'm not totally sure I am right. Can you explain it?
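To make the two variants concrete: assuming `am` is the maximum of the heatmap (as the name suggests), `heatmap /= am / 255` is the same as `heatmap * 255 / am`, so the peak becomes 255, while `heatmap /= am` makes the peak 1. Both are monotonic rescalings, so the location of the maximum is unchanged. The sketch below uses plain lists and a hypothetical `rescale` helper to show this; it illustrates what the expressions compute, not why the author chose one.

```python
def rescale(heatmap, peak):
    """Scale a 2-D heatmap so that its maximum equals `peak`."""
    am = max(max(row) for row in heatmap)
    scale = am / peak           # peak=255 mirrors `heatmap /= am / 255`
    return [[v / scale for v in row] for row in heatmap]


def argmax2d(heatmap):
    best = max((v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row))
    return best[1], best[2]


heatmap = [[0.1, 0.2], [0.4, 0.05]]
to_255 = rescale(heatmap, 255.0)   # peak becomes 255
to_one = rescale(heatmap, 1.0)     # peak becomes 1, i.e. `heatmap /= am`
```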
Hi,
I'm trying to use the pretrained model but none of the tar files seem to be in working order. I get the following error:
An error occurred while extracting files.
I'm using Ubuntu 16.04.
While training I am getting this error
<ipython-input-67-b0b9c64f728a> in forward(self, x)
94 print("")
95
---> 96 out += residual
97
98 out = self.relu(out)
RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1
The following code block is the origin of the error:
class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride
    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out
When I printed the sizes of out and residual, I got the following:
torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])
torch.Size([12, 256, 96, 72])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 512, 48, 36])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 1024, 24, 18])
torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])
torch.Size([12, 2048, 12, 9])
torch.Size([12, 256, 12, 9])
torch.Size([12, 512, 12, 9])
How can I solve this issue?
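In the Bottleneck above, conv3 emits planes * 4 channels while the residual keeps the inplanes channels of x, so `out += residual` only works when the two match or when `downsample` projects x to the right shape. The last two printed sizes (256 versus 512 channels) suggest a block built with inplanes=512 and planes=64 but without a downsample module, although that is a guess from the shapes alone. The sketch below states the usual ResNet rule as a hypothetical helper, `needs_downsample`.

```python
EXPANSION = 4  # matches Bottleneck.expansion above


def needs_downsample(inplanes, planes, stride=1, expansion=EXPANSION):
    # A projection on the shortcut is required whenever the block changes
    # the spatial size (stride != 1) or the channel count.
    return stride != 1 or inplanes != planes * expansion


same_shape = needs_downsample(256, 64)   # 256 in, 64 * 4 = 256 out
mismatch = needs_downsample(512, 64)     # 512 in, 256 out, as in the error
```

When the check is true, the block would typically receive a 1x1 convolution plus batch norm as `downsample`, mapping inplanes to planes * 4 channels.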
pytorch-cpn/networks/refineNet.py
Line 65 in 48696a9
Why does the final layer append a BN layer? Why is the output normalized?
Could you give me a hint? Thank you!
There are 4 variables: target15, target11, target9, and target7.
What do they mean?
Hello, I want to ask about the use of the "symmetry" parameter in the config file.
I think there may be a problem in the implementation of the refine net.
In your refineNet.py, you define the forward pass as follows:
    def forward(self, x):
        refine_fms = []
        for i in range(4):
            refine_fms.append(self.cascade[i](x[i]))
        out = torch.cat(refine_fms, dim=1)
        out = self.final_predict(out)
        return out
I think you should reverse x, e.g. x = x[::-1], because x[0] is the smallest feature map and x[3] is the biggest. According to the paper, there are 3 bottlenecks after the smallest feature map and 0 bottlenecks after the biggest.
When I run train.py, CPU utilization is more than 300%. How can I solve this problem?
If I want to use my own detection results, should I convert the annotations to some particular format? I have read your code, and this does not seem to be mentioned.
Thanks for your code.
When I replace ResNet with MobileNet, I find that the model is slower.
I'm confused. I run the model on a GPU (Titan X).
Do you know the reason?
The following is my code:
def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )
Hi @GengDavid,
Thanks for the great implementation. I'm eager to collaborate with you to test other configurations. I have 2 x 1080 and 2 x 1080ti. I can borrow more if needed. Looking forward to your response!
In the paper, I see that CPN takes as input the image cropped from the raw image using the bounding boxes from FPN. Where is the FPN in your re-implementation?
I really appreciate your excellent work. I get a warning during training
(my PyTorch version is 0.4.1):
/root/anaconda3/envs/CPNs/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
I did some research and found that nn.Upsample is deprecated in PyTorch >= 0.4.1.
This contradicts the install requirements.
It also seems that model performance will be reduced if I ignore this warning.
How can I deal with it?
Looking forward to your reply!
The related documentation is not very clear. Can I simply change
refine_target_var = torch.autograd.Variable(target7.cuda(async=True))
valid_var = torch.autograd.Variable(valid.cuda(async=True))
to the following?
refine_target_var = torch.autograd.Variable(target7.cuda())
valid_var = torch.autograd.Variable(valid.cuda())
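Some background on why the original lines fail on newer setups: `async` became a reserved keyword in Python 3.7, so `cuda(async=True)` no longer parses, and PyTorch 0.4 renamed the argument to `non_blocking`. Dropping the argument, as proposed above, should work and gives an ordinary synchronous copy; `non_blocking=True` is the closer equivalent of the old call. The sketch below only checks the syntax of both spellings with the standard library and does not need PyTorch; the `parses` helper is hypothetical.

```python
import keyword


def parses(source):
    """Return True if `source` is syntactically valid on this interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


old_call = "y = x.cuda(async=True)"
new_call = "y = x.cuda(non_blocking=True)"

old_ok = parses(old_call)   # False on Python 3.7 and later
new_ok = parses(new_call)   # valid on every Python 3 version
async_reserved = keyword.iskeyword("async")
```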
Hi,
I've followed the instructions in the README thoroughly and have double checked all the steps. However, when running test.py, this line of code:
global_outputs, refine_output = model(input_var)
seems to never finish. In case it simply takes a very long time to run, I've also made a small testing subset of the val2017 folder and the annotations file with just 5 images, yet this line of code still seems to run forever. Upon a keyboard interrupt, the error message seems to have to do with threads acquiring lock (not sure if this is useful to know). Any idea why? Thanks.
Awesome repo!
I've trained using ground truth boxes and it works okay, but the AP is lower.
Do you think the detection box should affect training, seeing as the box is enlarged so significantly during image preprocessing?
Do you think this network architecture needs to be improved if I'm using more joints than COCO has?
@YoungZiyu
Sure, you can find training log here
Originally posted by @GengDavid in #3 (comment)