
bmartacho / unipose


We propose UniPose, a unified framework for human pose estimation, based on our “Waterfall” Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. Current pose estimation methods utilizing standard CNN architectures heavily rely on statistical postprocessing or predefined anchor poses for joint localization. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.

License: Other

Python 100.00%

unipose's Introduction

UniPose

UniPose: Unified Human Pose Estimation in Single Images and Videos.


NEW!: BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations

Our novel framework for bottom-up multi-person pose estimation achieves state-of-the-art results on several datasets. The pre-print of our new method, BAPose, can be found at the following link: BAPose pre-print. Full code for the BAPose framework is scheduled to be released in the near future.


NEW!: UniPose+: A unified framework for 2D and 3D human pose estimation in images and videos

Our novel and improved UniPose+ framework for pose estimation achieves state-of-the-art results on several datasets. UniPose+ can be found at the following link: UniPose+ at PAMI. Full code for the UniPose+ framework is scheduled to be released in the near future.


NEW!: OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation

Our novel framework for multi-person pose estimation achieves state-of-the-art results on several datasets. The pre-print of our new method, OmniPose, can be found at the following link: OmniPose pre-print. Full code for the OmniPose framework is scheduled to be released in the near future. GitHub: https://github.com/bmartacho/OmniPose.


Figure 1: UniPose architecture for single frame pose detection. The input color image of dimensions (HxW) is fed through the ResNet backbone and WASP module to obtain 256 feature channels at reduced resolution by a factor of 8. The decoder module generates K heatmaps, one per joint, at the original resolution, and the locations of the joints are determined by a local max operation.
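The caption above describes joint localization as a local max over the K output heatmaps. As a rough illustration of that step (a minimal sketch, not the repository's own utility code), assuming heatmaps is a tensor of shape (K, H, W):

import torch

def heatmaps_to_joints(heatmaps, threshold=0.0):
    # Return one (x, y, score) tuple per heatmap channel, or None when the peak is weak.
    K, H, W = heatmaps.shape
    joints = []
    for k in range(K):
        score, idx = heatmaps[k].view(-1).max(dim=0)   # location of this channel's maximum
        y, x = divmod(idx.item(), W)
        joints.append((x, y, score.item()) if score.item() > threshold else None)
    return joints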


Figure 2: UniPose-LSTM architecture for pose estimation in videos. The joint heatmaps from the decoder of UniPose are fed into the LSTM along with the final heatmaps from the previous LSTM state. The convolutional layers following the LSTM reorganize the outputs into the final heatmaps used for joint localization.


We propose UniPose, a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.

We propose the “Waterfall Atrous Spatial Pyramid” module, shown in Figure 3. WASP is a novel architecture with Atrous Convolutions that is able to leverage both the larger Field-of-View of the Atrous Spatial Pyramid Pooling configuration and the reduced size of the cascade approach.
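To make the waterfall idea concrete, here is a simplified sketch of how such a block can be wired in PyTorch; the dilation rates, channel counts, and pooling branch are illustrative assumptions, not the repository's exact WASP implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WaterfallBlock(nn.Module):
    # Cascade of atrous convolutions: each stage feeds the next (efficiency of the cascade),
    # while every stage's output is kept and concatenated (multi-scale fields-of-view).
    def __init__(self, in_ch=2048, mid_ch=256, rates=(6, 12, 18, 24)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, mid_ch, kernel_size=3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(mid_ch),
                nn.ReLU(inplace=True)))
            ch = mid_ch                        # later stages consume the previous stage's output
        self.pool = nn.Sequential(             # global-context branch, as in pyramid-style designs
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.ReLU(inplace=True))
        self.project = nn.Conv2d(mid_ch * len(rates) + mid_ch, mid_ch, kernel_size=1)

    def forward(self, x):
        outs, h = [], x
        for stage in self.stages:
            h = stage(h)                       # waterfall: pass the features through the cascade
            outs.append(h)                     # but keep each intermediate field-of-view
        pooled = F.interpolate(self.pool(x), size=x.shape[2:], mode='bilinear', align_corners=False)
        return self.project(torch.cat(outs + [pooled], dim=1))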


Figure 3: WASP Module.


Examples of the UniPose architecture for pose estimation are shown in Figure 4 for single images and videos.


Figure 4: Pose estimation samples for UniPose in images and videos.

Link to the published article at CVPR 2020.


Datasets:

Datasets used in this paper and required for training, validation, and testing can be downloaded directly from the dataset websites below:
LSP Dataset: https://sam.johnson.io/research/lsp.html
MPII Dataset: http://human-pose.mpi-inf.mpg.de/
PennAction Dataset: http://dreamdragon.github.io/PennAction/
BBC Pose Dataset: https://www.robots.ox.ac.uk/~vgg/data/pose/


Pre-trained Models:

The pre-trained weights can be downloaded here.


Contact:

Bruno Artacho:
E-mail: [email protected]
Website: https://www.brunoartacho.com

Andreas Savakis:
E-mail: [email protected]
Website: https://www.rit.edu/directory/axseec-andreas-savakis

Citation:

Artacho, B.; Savakis, A. UniPose: Unified Human Pose Estimation in Single Images and Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Latex:
@InProceedings{Artacho_2020_CVPR,
title = {UniPose: Unified Human Pose Estimation in Single Images and Videos},
author = {Artacho, Bruno and Savakis, Andreas},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020},
}

unipose's People

Contributors

bmartacho


unipose's Issues

Labels for BBox in MPII dataset

Thanks for your amazing work!
I have a problem with running the code on the MPII dataset.
As mentioned in the code, "BBox was added to the labels by the authors to perform additional training and testing, as referred in the paper."
Where can I get these labels? Can you provide the annotation file with the BBox labels?
I would appreciate a reply.

Where are the dataset files for MPII?

img_path = self.images_dir + variable['img_paths'] 
segmented = cv2.imread(self.labels_dir + "segmented/" + variable['img_paths'][:-4] + '.png')
bbox = np.load(self.labels_dir + "BBOX/" + variable['img_paths'][:-4] + '.npy')

points = torch.Tensor(variable['joint_self'])
center = torch.Tensor(variable['objpos'])  # 594, 257
scale = variable['scale_provided']  # 3.021

Does anyone know where the corresponding files (bbox, segmented) can be downloaded?

Error in loading state_dict for unipose

Hello,
I'm trying to test the lstm_unipose model using the pretrained UniPose_LSTM_PennAction weights. I'm getting this error:

Error(s) in loading state_dict for unipose:
size mismatch for backbone.conv1.weight: copying a param with shape torch.Size([64, 4, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 7, 7]).
size mismatch for decoder.last_conv.8.weight: copying a param with shape torch.Size([14, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([19, 256, 1, 1]).
size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([14]) from checkpoint, the shape in current model is torch.Size([19]).

Number of classes in last_conv of decoder

Hi,

Thanks for this code, it's really useful.

I had a quick question about the last convolution in the decoder (labelled below). An extra 5 is added to num_classes; what does this do? Removing it allows the LSP and MPII models to be loaded correctly, while leaving it there gives the same error that issue #6 highlighted.

self.last_conv = nn.Sequential(nn.Conv2d(304, 256, kernel_size=3, stride=1, padding=1, bias=False),
                                       BatchNorm(256),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=False),
                                       BatchNorm(256),
                                       nn.ReLU(),
                                       nn.Dropout(0.1),
(THIS LINE ---->)                      nn.Conv2d(256, num_classes+1+5, kernel_size=1, stride=1))
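The extra 5 matches the commented-out bounding-box variant of this layer quoted later in these issues (nn.Conv2d(256, num_classes+5+1, ...) "Use in case of extracting the bounding box"), so a checkpoint and the current model definition can disagree on the final convolution's shape. A common, unofficial workaround is to copy only the checkpoint tensors whose shapes match the current model and skip the rest; this is a generic PyTorch pattern, not a fix from the authors, and it assumes the unipose constructor as quoted in the issues below:

import torch

model = unipose(dataset="MPII", backbone='resnet', output_stride=16)  # constructor as quoted in the issues below
checkpoint = torch.load("UniPose_MPII.pth", map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)                 # assumption: some .tar checkpoints wrap the weights

model_sd = model.state_dict()
filtered = {k: v for k, v in state_dict.items()
            if k in model_sd and v.shape == model_sd[k].shape}
print(f"Skipping {len(model_sd) - len(filtered)} parameter(s) with mismatched or missing shapes")
model.load_state_dict(filtered, strict=False)                         # skipped layers keep their random init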

Wrong code uploaded

Hi,
I think there are some mistakes in the code.
I read your paper and the code and solved some problems, but now I'm facing this error:
it seems the model output and the heatmap_var dimensions are not compatible.
The error is below:

Epoch 0:
0% 0/195 [00:00<?, ?it/s]torch.Size([8, 4, 46, 46]) heatmap_var shape
torch.Size([8, 15, 46, 46]) heat shape
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:446: UserWarning: Using a target size (torch.Size([8, 4, 46, 46])) that is different to the input size (torch.Size([8, 15, 46, 46])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.mse_loss(input, target, reduction=self.reduction)
Traceback (most recent call last):
File "unipose.py", line 280, in
trainer.training(epoch)
File "unipose.py", line 122, in training
loss_heat = self.criterion(heat, heatmap_var)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 446, in forward
return F.mse_loss(input, target, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2659, in mse_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 71, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore
RuntimeError: The size of tensor a (15) must match the size of tensor b (4) at non-singleton dimension 1
0% 0/195 [00:02<?, ?it/s]

This error happens when training unipose.py on the LSP dataset.
As you can see in the second and third lines, the dimensions are not the same.
I even tried to use interpolation to resize them to the same size, but I could not.

Thanks and regards.
I'm waiting for your answer.
Good luck.

The structure of wasp.


In the figure, the input is first fed to the AtrousModule with dilation_rate = 6, but in your code the input is first fed to the AtrousModule with dilation_rate = 24.


Output joints format

Hi, thanks for open sourcing your code.

Really nice work and super useful and necessary. I see that you use several datasets and deviate from using typical COCO. I was wondering which joints are detected and outputted by the model? Are they in LSP/MPII or even COCO format? Are there 15 joints?

Thanks!

Code for multi-person pose?

Hello, thank you for your excellent work.
Will your code for multi-person pose estimation be published soon?
Is it just cropping the image according to RetinaNet and then applying your model? That doesn't seem fair.

Error in loading state_dict for unipose_LSTM

Hello,
I'm trying to test the lstm_unipose model using the pretrained UniPose_LSTM_PennAction weights. I'm getting this error:

Error(s) in loading state_dict for unipose:
size mismatch for backbone.conv1.weight: copying a param with shape torch.Size([64, 4, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 7, 7]).

I don't know what the origin of the fourth dimension in the model checkpoint is. The UniPose-LSTM first backbone conv layer has 3 input channels in the model.

I trained to reproduce the results on MPII

Hi, thank you for doing this great work.
I cloned the project, trained on the MPII dataset, and obtained the following results:
(screenshot of the obtained results)

It did not reach the level you described in your paper; may I ask what caused this?

Pre-trained unipose_mpii weights: predictions look bad

I loaded the pre-trained weights and made some predictions with the unipose model, and the predictions look off:

import cv2
import torch
from torchvision import transforms
# `unipose` is the model class from this repository (import path omitted here)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

model = unipose(dataset="MPII", backbone='resnet', output_stride=16)

# load the pre-trained weights
model.load_state_dict(torch.load("/content/UniPose_MPII.pth", map_location='cpu'))
model = model.to(DEVICE)
model.eval()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # not needed for inference

trfm = transforms.Compose([transforms.ToTensor()])

img = cv2.resize(cv2.imread("/content/fed.jpg"), (513, 513))
img2 = trfm(img)[None].to(DEVICE, dtype=torch.float)

with torch.no_grad():
    hmp = model(img2)
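Note that the heatmaps come out at reduced resolution (by a factor of 8, per the architecture description above), so for visualization they are typically upsampled back to the input size before reading off the per-channel maxima, e.g. as in the sketch after Figure 1. A hedged follow-up, assuming the model returns a single tensor of heatmaps:

import torch.nn.functional as F

hmp_up = F.interpolate(hmp, size=img2.shape[2:], mode='bilinear', align_corners=False)  # back to 513x513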

(screenshot of the resulting prediction)

The performance when frequent parts occlusion occurrences

Thank you for this great work. It is impressive that your framework does not require separate branches for bounding box and joint detection. As stated in your paper, human pose estimation is challenging due to the frequent occurrence of part occlusion, so I would really like to know how UniPose performs in such cases. Is there an example or a comparison with other methods on sequential video or single images?

Thank you.

Questions about the augmentation on PennAction dataset

Hi, thanks for your excellent research.
I'm confused about the data preprocessing for PennAction. The TestResized() function is only used in utils.py, right? And why are no other augmentations used?
Looking forward to your reply, thanks very much.

Inference or demo is missing

Hi, I understand this is new work, and you have published your pretrained models, which is great.

But as an end user who wants to check your model's quality for their own purposes, there is no simple code snippet to run inference and no demo. It looks like you have already written the test function and have all the components for this.

Yet there is no README section or basic indication of how to run the model on an image.

Test pretrained model

Thanks for your work.

I'm using your pretrained models to test on my own dataset and I encounter an error.
When I use the model for the LSP dataset, I run:
CUDA_VISIBLE_DEVICES=1 python3.7 test.py --dataset LSP --pretrained UniPose_LSP.tar --img_folder my_dataset
I get this error:

Traceback (most recent call last):
  File "test_hand.py", line 166, in <module>
    trainer = Trainer(args)
  File "test_hand.py", line 89, in __init__
    self.model.load_state_dict(state_dict)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for unipose:
	size mismatch for decoder.last_conv.8.weight: copying a param with shape torch.Size([15, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([20, 256, 1, 1]).
	size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([15]) from checkpoint, the shape in current model is torch.Size([20]).

When I change to the MPII dataset with:
CUDA_VISIBLE_DEVICES=1 python3.7 test.py --dataset MPII --pretrained UniPose_MPII.tar --img_folder my_dataset
I get:

Traceback (most recent call last):
  File "test_hand.py", line 167, in <module>
    trainer = Trainer(args)
  File "test_hand.py", line 89, in __init__
    self.model.load_state_dict(state_dict)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for unipose:
	size mismatch for decoder.last_conv.8.weight: copying a param with shape torch.Size([17, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([22, 256, 1, 1]).
	size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([22]).

My test.py is just unipose.py with new args and the training/validation code removed.

Can you check your pretrained models, or tell me what I did wrong?

The results of validation are 0.

Dear bmartacho:
Hello. Thanks for your excellent research.
I have some questions about the source code. I ran it successfully on the Penn_Action dataset, but the validation results are all 0 and I don't know why. Could you give me some suggestions, please? I changed the code as follows:
1. When I ran on the Penn_Action dataset, I found that the heatmap size is 368×368 but the model output is 46×46, so I changed the heatmap size to 46. I don't know if this is right.
2. In penn_action_data.py, I deleted lines 83 to 93.
3. When I loaded the pre-trained model (UniPose_LSTM_PennAction.tar), the conv1 weights in resnet.py (in model/module/backbone) had shape [3, 64, 7, 7], but the pre-trained model has the shape [4, 64, 7, 7], so I changed the shape of the conv1 weights.
Looking forward to your reply, thanks a lot.
The results are as follows:
(screenshot of the validation results)

COCO pretrained model dimension

Thank you so much for sharing your great work.
When I tested this network on the COCO dataset with your COCO pretrained model, a dimension issue occurred.

"size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([18])."

The error message above appears because the decoder adds 1 to num_classes, so the dimensions don't match the pretrained model.

I would be very grateful if you could reply.

Pretrained Penn Action model

Thanks for your work.
I downloaded the pretrained Penn Action model but can't uncompress it successfully; it says the file is broken. Would you upload a new one, or a .pth file?

How does Unipose perform object detection?

May I ask: one of the outputs of UniPose is the bounding box detection of the human body; where is this mainly reflected? How is the heatmap output by that extra single channel translated into the bounding box of the target?
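For what it's worth, one generic way to turn a single-channel box heatmap into box corners is to threshold the map and take the extreme activated coordinates; this is only an illustration of the idea, not necessarily how UniPose decodes its extra channels:

import numpy as np

def heatmap_to_bbox(heatmap, rel_threshold=0.5):
    # heatmap: 2-D array; returns (x_min, y_min, x_max, y_max) or None if nothing fires.
    mask = heatmap > rel_threshold * heatmap.max()
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()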

MPII annotation file?

Does anyone have the mpii_annotation.json file for this project? The .json file converted from the official MPII download does not seem to match the one used in the project, and I don't know how to do the conversion. Please help!

Running Error: uniposeLSTM on Penn_Action

Error in the code below (model/uniposeLSTM.py):

if iter == 0:
    x = torch.cat((input[:,iter,:,:,:], centermap[:,iter,:,:,:]), dim=1)
    x, low_level_feat = self.backbone(x)

Printed shapes:
input shape: torch.Size([5, 5, 3, 368, 368])
centermap: torch.Size([5, 5, 1, 368, 368])
x : torch.Size([5, 4, 368, 368])

Traceback (most recent call last):
File "uniposeLSTM.py", line 301, in
trainer.training(epoch)
File "uniposeLSTM.py", line 126, in training
heat, cell, hide = self.model(input_var, centermap_var, j, heat, hide, cell)
File "/Users/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/james/work/PySpace/UniPose/model/uniposeLSTM.py", line 109, in forward
x, low_level_feat = self.backbone(x)
File "/Users/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/james/work/PySpace/UniPose/model/modules/backbone/resnet.py", line 114, in forward
x = self.conv1(input)
File "/Users/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/Users/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[5, 4, 368, 368] to have 3 channels, but got 4 channels instead
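The checkpoint discussed in the earlier issues stores backbone.conv1.weight with shape [64, 4, 7, 7] (apparently RGB plus the centermap channel), while the default ResNet backbone builds conv1 for 3 channels. A hedged workaround, assuming the uniposeLSTM model object from this repository, is to rebuild conv1 with 4 input channels before loading the weights:

import torch.nn as nn

old = model.backbone.conv1                     # 3-channel conv built by the default backbone
model.backbone.conv1 = nn.Conv2d(4, old.out_channels,
                                 kernel_size=old.kernel_size, stride=old.stride,
                                 padding=old.padding, bias=False)
# after this, the [64, 4, 7, 7] checkpoint tensor matches and load_state_dict should succeed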

How to start?

Thank you for your work.
I want to run the pretrained model on Penn Action data, but I don't know how to start.
Do you have a manual?

Questions about OmniPose

What is the approach of OmniPose?
Top-down (bbox detection -> keypoint detection) or bottom-up (keypoint detection -> associative embedding)?

Inference error

Dear Authors,

Thank you very much for your research. It is a very interesting research area!
I have recently adapted your code to run a simple test on my own images; however, I think I have made some errors, as my predictions are all wrong.

I have used the test function in unipose.py and uncommented line 165 of utils/utils.py in the draw_paint function so that it could find the source image.

I have used the MPII args and downloaded the MPII pretrained weights for the inference.

This is the source image I used; as you can see, it is a very simple human pose:

(source image)

and this is the output I get from the human pose estimation:

(output image)

Can you please advise if you think there are any steps I might have done wrong?
I am happy to share my code if you want, but it is mainly adapted to look for the correct files and that's it.

I look forward to hearing from you

Many thanks

About evaluation of PCKh in evaluate.py

Hi, I used your code to train the UniPose model.

However, I was a little confused by the part of your evaluation code that calculates PCKh.

In line 70 in evaluate.py,

(screenshot of the referenced code)

I think you declared the "norm" variable to use for the distance.

Actually, in the calc_dists function at line 5, you use the "norm" variable to normalize the distance from the prediction to the target.

(screenshot of the referenced code)

Here I have 2 questions.

First, when normalizing the distance, I don't understand why you divided the norm by 10.
As far as I know, "np.ones((pred.shape[0], 2)) * np.array([h,w])" should be the norm.

Second, about the PCKh calculation starting from line 91:

(screenshot of the referenced code)

When defining the headLength variable, which is used as the standard when calculating PCKh, I think you are computing the head length of the subject using the original target points (which are not normalized).

I think this leads to comparing the normalized distance between the target and the prediction against an unnormalized head length.

Thank you.
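As context for the question above, a textbook PCKh@0.5 computation normalizes every prediction-to-target distance by the same person's head-segment length and counts a joint as correct when that ratio is below 0.5. A self-contained reference sketch (independent of this repository's evaluate.py):

import numpy as np

def pckh(pred, target, head_lengths, alpha=0.5):
    # pred, target: (N, K, 2) joint coordinates; head_lengths: (N,) head-segment lengths.
    dists = np.linalg.norm(pred - target, axis=-1)      # (N, K) pixel distances
    thresholds = alpha * head_lengths[:, None]          # one threshold per person
    visible = (target > 0).all(axis=-1)                 # skip unannotated joints
    correct = (dists <= thresholds) & visible
    return correct.sum() / max(visible.sum(), 1)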

Person detection branch?

Dear authors,
Thank you for your work. I have some questions:

  1. I don't see where the person detection branch is trained. How can you predict the bbox and the pose at the same time, as you claim?
  2. For MPII, you use the "segmented" folder. Where can that annotation be found?

I'm looking forward to hearing from you.

MPII weight is not compatible

Thanks for your work.
By the way, I tried to use the pretrained weight file (MPII version) and it failed.
It seems to me that the reason is:
the weight file has a 16-channel output as the final result of decoder.last_conv(), but
the actual model structure has a 17-channel output, denoted as num_classes + 1.

class Decoder(nn.Module):
    def __init__(self, dataset, num_classes, backbone, BatchNorm):
        super(Decoder, self).__init__()
        if backbone == 'resnet':
            low_level_inplanes = 256

        if dataset == "NTID":
            limbsNum = 18
        else:
            limbsNum = 13

        self.conv1 = nn.Conv2d(low_level_inplanes, 48, 1, bias=False)
        self.bn1 = BatchNorm(48)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(2048, 256, 1, bias=False)
        self.bn2 = BatchNorm(256)
        self.last_conv = nn.Sequential(nn.Conv2d(304, 256, kernel_size=3, stride=1, padding=1, bias=False),
                                       BatchNorm(256),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=False),
                                       BatchNorm(256),
                                       nn.ReLU(),
                                       nn.Dropout(0.1),
                                       nn.Conv2d(256, num_classes + 1, kernel_size=1, stride=1)) # HERE!
#                                      nn.Conv2d(256, num_classes+5+1, kernel_size=1, stride=1)) # Use in case of extracting the bounding box

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self._init_weight()

As I see it, the final output should have 16 channels, one for each joint.
I would appreciate it if you could help me get through this problem.
Thanks in advance!

  • Some may say I can simply replace num_classes + 1 with num_classes, but that causes an error in get_kpts().
    I'm not quite sure why the for loop in get_kpts() starts at index 1 rather than 0.
