princeton-vl / pose-ae-train

Training code for "Associative Embedding: End-to-End Learning for Joint Detection and Grouping"

License: BSD 3-Clause "New" or "Revised" License

Python 82.08% C 16.81% C++ 0.50% Cuda 0.61%

pose-ae-train's Introduction

Associative Embedding: Training Code

Multi-person pose estimation with PyTorch based on:

Associative Embedding: End-to-end Learning for Joint Detection and Grouping. Alejandro Newell, Zhiao Huang, and Jia Deng. Neural Information Processing Systems (NIPS), 2017.

(A pretrained model in TensorFlow is also available here: https://github.com/umich-vl/pose-ae-demo)

Getting Started

This repository provides everything necessary to train and evaluate a multi-person pose estimation model on COCO keypoints. If you plan on training your own model from scratch, we highly recommend using multiple GPUs. We also provide a pretrained model.

Requirements:

  • Python 3 (code has been tested on Python 3.6)
  • PyTorch
  • CUDA and cuDNN
  • Python packages (not exhaustive): opencv-python, cffi, munkres, tqdm, json

Before using the repository there are a couple of setup steps:

First, you must compile the C implementation of the associative embedding loss. Go to extensions/AE/ and call python build.py install. If you run into errors with missing include files for CUDA, this can be addressed by first calling export CPATH=/path/to/cuda/include.

Next, set up the COCO dataset. You can download it from here, and update the paths in data/coco_pose/ref.py to the correct directories for both images and annotations. After that, make sure to install the COCO PythonAPI from here.
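If it helps, the edit might look roughly like the sketch below; the variable names here are hypothetical, so check data/coco_pose/ref.py for the ones it actually uses:

    # data/coco_pose/ref.py -- sketch with hypothetical variable names
    import os

    coco_dir = '/path/to/coco'                        # root of your COCO download
    img_dir = os.path.join(coco_dir, 'images')        # directory containing the image folders
    ann_dir = os.path.join(coco_dir, 'annotations')   # directory containing person_keypoints_*.json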

You should be all set after that! For reference, the code is organized as follows:

  • data/: data loading and data augmentation code
  • models/: network architecture definitions
  • task/: task-specific functions and training configuration
  • utils/: image processing code and miscellaneous helper functions
  • extensions/: custom C code that needs to be compiled
  • train.py: code for model training
  • test.py: code for model evaluation

Training and Testing

To train a network, call:

python train.py -e test_run_001 (-e,--exp allows you to specify an experiment name)

To continue an experiment where it left off, you can call:

python train.py -c test_run_001

All training hyperparameters are defined in task/pose.py, and you can modify __config__ to test different options. It is likely you will have to change the batchsize to accommodate the number of GPUs you have available.
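As a rough illustration, adjusting the batch size (and related settings) amounts to editing the 'train' section of __config__, along these lines; the keys shown are taken from settings quoted in the issues below, and the values are placeholders, so check task/pose.py for the actual defaults:

    # task/pose.py -- sketch only; check the file for the real keys and defaults.
    __config__ = {
        # ...
        'train': {
            'batchsize': 16,      # lower this if you train on fewer GPUs / less memory
            'input_res': 512,     # network input resolution
            'output_res': 128,    # heatmap / tag map resolution
            # ...
        },
    }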

Once a model has been trained, you can evaluate it with:

python test.py -c test_run_001 -m [single|multi]

The argument -m,--mode indicates whether to do single- or multi-scale evaluation. Single scale evaluation is faster, but multiscale evaluation is responsible for large gains in performance. You can edit test.py to evaluate at more scales for further improvements.
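For reference, multi-scale evaluation generally follows the pattern below: run the network on several resized copies of the image, resize the heatmaps back to a common resolution, and average them before grouping. This is a generic sketch rather than the exact logic in test.py:

    import torch
    import torch.nn.functional as F

    def multi_scale_heatmaps(net, img, scales=(0.5, 1.0, 2.0)):
        """Average heatmaps predicted at several input scales (generic sketch)."""
        h, w = img.shape[2:]
        acc = None
        for s in scales:
            scaled = F.interpolate(img, scale_factor=s, mode='bilinear', align_corners=False)
            heatmaps = net(scaled)   # assumes net returns a (B, C, h', w') tensor
            heatmaps = F.interpolate(heatmaps, size=(h, w), mode='bilinear', align_corners=False)
            acc = heatmaps if acc is None else acc + heatmaps
        return acc / len(scales)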

Training/Validation split

This repository includes the predefined training/validation split used in our experiments; data/coco_pose/valid_id lists all images used for validation.

Pretrained model

To evaluate on the pretrained model, you can download it from here and unpack the file into exp/. Then call:

python test.py -c pretrained -m single

That should return a mAP of about 0.59 for single-scale evaluation, and 0.66 for multi-scale (performance can be improved further by evaluating at more than the default 3 scales). Results will not necessarily be the same on the COCO test sets.

To use this model for your own images, you can set up code to pass your own data to the multiperson function in test.py.

pose-ae-train's People

Contributors

anewell


pose-ae-train's Issues

Compilation error when building extensions

Hi, when I compile the extensions, the following errors occur. Does anyone know how to fix this? Thanks.

src/pose-ae-train/extensions/AE/src/my_lib.c: In function ‘my_lib_loss_forward’:
src/pose-ae-train/extensions/AE/src/my_lib.c:15:30: error: dereferencing pointer to incomplete type
const int batchsize = Tag->size[0];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:16:28: error: dereferencing pointer to incomplete type
const int tag_dim = Tag->size[2];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:17:37: error: dereferencing pointer to incomplete type
const int num_people = keypoints->size[1];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:18:36: error: dereferencing pointer to incomplete type
const int num_joint = keypoints->size[2];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:21:42: error: dereferencing pointer to incomplete type
const int kpt_strideBatch = keypoints->stride[0];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:22:43: error: dereferencing pointer to incomplete type
const int kpt_stridePeople = keypoints->stride[1];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:23:42: error: dereferencing pointer to incomplete type
const int kpt_strideJoint = keypoints->stride[2];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:25:36: error: dereferencing pointer to incomplete type
const int tag_strideBatch = Tag->stride[0];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:26:36: error: dereferencing pointer to incomplete type
const int tag_stridePoint = Tag->stride[1];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:28:42: error: dereferencing pointer to incomplete type
const int output_strideBatch = output->stride[0];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:29:43: error: dereferencing pointer to incomplete type
const int mean_strideBatch = mean_tags->stride[0];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:30:38: error: dereferencing pointer to incomplete type
const int mean_stride = mean_tags->stride[1];
^
src/pose-ae-train/extensions/AE/src/my_lib.c: In function ‘my_lib_loss_backward’:
src/pose-ae-train/extensions/AE/src/my_lib.c:113:30: error: dereferencing pointer to incomplete type
const int batchsize = Tag->size[0];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:114:28: error: dereferencing pointer to incomplete type
const int tag_dim = Tag->size[2];
^
src/pose-ae-train/extensions/AE/src/my_lib.c:115:37: error: dereferencing pointer to incomplete type
const int num_people = keypoints->size[1];
^
..........
Traceback (most recent call last):
File "Python-3.7/lib/python3.7/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "Python-3.7/lib/python3.7/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "Python-3.7/lib/python3.7/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "Python-3.7/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

out_dim

Why does out_dim use 68 rather than 34? With num_parts = 17, should out_dim be two times or four times that? Thank you!

Pretrained model cannot be downloaded

The pretrained model cannot be downloaded in China.
Thank you for your work. I can't download the pretrained model from the link you provided; could you give other download links, such as Google Drive or Baidu Wangpan? Thank you very much.

[Compile Error] Cannot build

When I try to compile the AE loss, I encounter the following problem:

Including CUDA code.
generating /tmp/tmpwtp46zqq/_my_lib.c
setting the current directory to '/tmp/tmpwtp46zqq'
running build_ext
building '_my_lib' extension
creating home
creating home/likewise-open
creating home/likewise-open/SENSETIME
creating home/likewise-open/SENSETIME/liuyu1
creating home/likewise-open/SENSETIME/liuyu1/WorkShop
creating home/likewise-open/SENSETIME/liuyu1/WorkShop/pose-ae-train
creating home/likewise-open/SENSETIME/liuyu1/WorkShop/pose-ae-train/extensions
creating home/likewise-open/SENSETIME/liuyu1/WorkShop/pose-ae-train/extensions/AE
creating home/likewise-open/SENSETIME/liuyu1/WorkShop/pose-ae-train/extensions/AE/src
gcc -pthread -B /home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/include/python3.6m -c _my_lib.c -o ./_my_lib.o
In file included from /home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THVector.h:5:0,
from /home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/TH.h:12,
from _my_lib.c:492:
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h: In function ‘TH_polevl’:
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h:134:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (size_t i = 0; i <= len; i++) {
^
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h:134:3: note: use option -std=c99 or -std=gnu99 to compile your code
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h: In function ‘TH_polevlf’:
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h:142:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (size_t i = 0; i <= len; i++) {
^
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h: In function ‘TH_trigamma’:
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h:260:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < 6; ++i) {
^
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h: In function ‘TH_trigammaf’:
/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH/THMath.h:278:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < 6; ++i) {
^
Traceback (most recent call last):
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/cffi/ffiplatform.py", line 51, in _build
dist.run_command('build_ext')
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "build.py", line 34, in
ffi.build()
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/init.py", line 184, in build
_build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/utils/ffi/init.py", line 108, in _build_extension
outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/cffi/api.py", line 697, in compile
compiler_verbose=verbose, debug=debug, **kwds)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/cffi/recompiler.py", line 1520, in recompile
compiler_verbose, debug)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/cffi/ffiplatform.py", line 22, in compile
outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
File "/home/likewise-open/SENSETIME/liuyu1/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/cffi/ffiplatform.py", line 58, in _build
raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.error.VerificationError: CompileError: command 'gcc' failed with exit status 1

Does anyone know how to fix this?

Why does the value of 'detection_threshold' not affect the number of keypoints?

https://github.com/umich-vl/pose-ae-train/blob/454d4ba113bbb9775d4dc259ef5e6c07c2ceed54/utils/group.py#L34
Thanks for your code!
I ran some experiments with the value of 'detection_threshold' and found that it had no impact on the final visualized keypoints, even when I set it to 10. From my understanding, fewer keypoints should be visualized if I set a higher detection threshold. Could you explain this? Thanks!

strange behaviours on validation data

I found that the loss keeps decreasing while the network is passing through the validation data. So I checked the code and found:

# train.py
for phase in ['train', 'val']:
    num_step = config['train']['{}_iters'.format(phase)]
    generator = data_func(phase)
    print('start', phase, config['opt'].exp)

    show_range = range(num_step)
    show_range = tqdm.tqdm(show_range, total = num_step, ascii=True)
    batch_id = num_step * config['train']['epoch']
    for i in show_range:
        datas = next(generator)
        # phase is 'train' or 'val'
        outs = train_func(batch_id + i, config, phase, **datas)

The trainer is built in the task path: task/pose.py, where

def make_train(batch_id, config, phase, **inputs):
    for i in inputs:
        inputs[i] = make_input(inputs[i])

    net = config['inference']['net']
    config['batch_id'] = batch_id

    if phase != 'inference':
        # ... training branch elided; note that `phase` is not used again here,
        # so 'train' and 'val' batches are handled the same way.
        pass
    else:
        out = {}
        net = net.eval()
        result = net(**inputs)
        if type(result) != list and type(result) != tuple:
            result = [result]
        out['preds'] = [make_output(i) for i in result]
        return out

So it seems to me that data in the validation phase is also being trained on, just like data in the training phase.
Could you explain this to me, please? Thanks.
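For readers hitting the same question, here is a minimal sketch of the kind of gating being asked about: compute the loss for both phases but only backpropagate in the training phase. The structure (net, optimizer, the loss reduction) is assumed for illustration and is not the repository's actual code:

    import torch

    def training_step(net, optimizer, inputs, phase):
        """Sketch only: compute losses for both 'train' and 'val' batches,
        but run backward/step only in the 'train' phase."""
        result = net(**inputs)
        losses = result if isinstance(result, (list, tuple)) else [result]
        total = sum(l.mean() for l in losses)      # placeholder reduction of the loss terms
        if phase == 'train':
            optimizer.zero_grad()
            total.backward()
            optimizer.step()
        return float(total.detach())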

refine process

Hi, thanks for releasing the code. I have a question about the implementation of the refine function in test.py.
It identifies all missing keypoints, so it ends up producing estimates for all keypoints (i.e., 17 joints). But not all keypoints are visible or even present in some cases, so this will produce false estimates, and I don't know why it still improves performance by a margin. I am now using the code to train my own model on coco2017 and it gets 53% mAP on minival2017 without the refinement; the pretrained TensorFlow model gets 56% mAP. I wonder if I am missing some training details.

Unable to replicate the results with the pretrained model

Hi @anewell,

The evaluation of the pre-trained model is not producing the expected results.
The mAP results for the single and multi-scale settings on the pretrained model are 0.107 and 0.154, which is far less than the expected output of 0.59 and 0.66.
The mAP evaluation for a model trained from scratch (for 100 epochs with batch size 8) similarly produces poor results: an mAP of 0.137 in the multi-scale evaluation setting.

Thanks!

PS: Adding Arjun and Rishabh to the thread. @stencilman @rishabhdabral

Questions regarding tagging code

Hello,

I have a couple of questions regarding the tagging method that I hope you will be able to address.

  • For the file group.py, class Params:
  1. For self.partOrder = [i-1 for i in [1,2,3,4,5,6,7,12,13,8,9,10,11,14,15,16,17]], would you be able to explain where exactly this order comes from?
  2. self.max_num_people = 30: is there a specific reason why this is set to 30, or is it just a number based on empirical evaluation, chosen to cover the maximum number of people that might appear in COCO images? If so, would reducing/increasing this number for one's own dataset have any effect on the output?
  3. Why is pooling used for NMS? And is there a specific reason for kernel=3, padding=1? I believe its purpose is to preserve the shape of the input map; if so, would kernel=5, padding=2 also be a valid choice? (See the sketch after this post.)
  4. What is the reasoning behind the choice of the threshold values self.detection_threshold = 0.2 and self.tag_threshold = 1.?
  5. Would you be able to briefly explain how match_by_tag(*) works, and what its purpose is?
  • For the file group.py, class HeatmapParser:
  1. Would you be able to briefly explain the purpose of adjust(*)?

Sorry for the long post; please bear in mind that I have limited experience in the area (just getting into keypoint prediction), so some of the questions might be absolutely trivial to answer (I hope). Anyway, thank you in advance for your help; it's a great piece of work with some really awesome insights!
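Regarding the pooling-based NMS question above, here is a minimal, generic sketch of max-pooling NMS on a heatmap with kernel=3, padding=1 (not the exact code in group.py): a peak survives only if it equals the maximum of its local window.

    import torch
    import torch.nn.functional as F

    def heatmap_nms(heatmaps, kernel=3):
        """Keep only local maxima: a peak survives if it equals the max of its k x k window."""
        pad = (kernel - 1) // 2                      # padding=1 for kernel=3 keeps the spatial size
        pooled = F.max_pool2d(heatmaps, kernel, stride=1, padding=pad)
        keep = (pooled == heatmaps).float()
        return heatmaps * keep

Under this formulation, kernel=5 with padding=2 also preserves the spatial size; it simply suppresses peaks over a larger neighborhood.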

Error about 'match_by_tag' in group.py

According to the processing in the function parse of class HeatmapParser, the function top_k has three outputs:
tag_k has the shape of (num_images, num_joints, max_num_people, 2);
loc_k has the same shape with tag_k;
val_k has the shape of (num_images, num_joints, max_num_people).
The outputs of top_k are passed directly as inputs to match_by_tag; however, according to the processing inside match_by_tag, the first dimension of the three outputs becomes num_joints rather than num_images. The evidence is as follows:

    for i in range(params.num_joints):
        idx = params.joint_order[i]
        tags = tag_k[idx]

Some predicted results contain more people than max_num_people

Problem

I found that some predicted results contain more people than the max_num_people defined in the task.

Cause

Firstly, although the function calc only extracts the top-k activations from each joint heatmap, these activations still need to be matched by tag to obtain the final number of predicted people.

However, the number of people can become larger than the number of activations in any single joint heatmap. For example, say the nose heatmap has 27 activations above the detection threshold, while the eye heatmap has 28. If only 20 of them can be matched (activations only match when their tags are close enough), 7 activations remain in the nose heatmap and 8 in the eye heatmap. So we end up with 20+7+8=35 people after the eye iteration, as dic and dic2 keep growing.

Solution

I notice that this line tries to skip joint matching once the number of tags reaches max_num_people. But it is a mistake to use len(actualTags) == params.max_num_people, because len(actualTags) may increase by more than 1 in a single joint iteration (27 -> 35 as in the example above).

What's more, when there are many people in the image, this condition misses all the keypoints in the lower half of the body for everyone: once max_num_people is reached at the eye joint, no more joints are appended to dic.

So I don't think it's a good fix to keep skipping joint matching after the tags reach max_num_people by simply changing len(actualTags) == params.max_num_people to len(actualTags) >= params.max_num_people or something similar; that modification would still produce some results with slightly more people than max_num_people.

As a result, I suggest placing this check before dic and dic2 are modified to add people:

if row<diff2.shape[0] and col < diff2.shape[1] and diff2[row][col] < params.tag_threshold:
    dic[actualTags_key[col]][ptIdx] = joints[row]
    dic2[actualTags_key[col]].append(tags[row])
else:
    if params.ignore_too_much and len(list(dic.keys())) == params.max_num_people:
        continue
    key = tags[row][0]
    dic.setdefault(key, np.copy(default_))[ptIdx] = joints[row]
    dic2[key] = [tags[row]]

So I have created a PR to fix this problem.

Question about the result of the forwardNet

Thanks for the excellent work, but I have some questions about the output of forwardNet.
I think the output of forwardNet should be batch_size * channel * output_res * output_res, and we should use result[:,:17,:,:] for the detection loss and result[:,17:34,:,:] for the embedding loss.
However, in models/posenet.py line 52, the author uses dets = preds[:,:,:17] and tags = preds[:,:,17:34].
It seems like the output of forwardNet is output_res * output_res * channel.
Why? Does this have anything to do with line 49 in posenet.py?
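One generic way a (B, C, H, W) tensor ends up being indexed with the channel axis last is a permute plus a flatten of the spatial dimensions. Whether posenet.py actually does this (or instead stacks per-stage outputs along dim 1) cannot be confirmed from the snippet alone, so treat this purely as an illustration of the channel-last layout:

    import torch

    B, C, H, W = 2, 68, 128, 128
    out = torch.randn(B, C, H, W)

    # Channel-last, flattened-spatial layout: (B, H*W, C)
    preds = out.permute(0, 2, 3, 1).contiguous().view(B, H * W, C)
    dets, tags = preds[:, :, :17], preds[:, :, 17:34]   # the last axis now indexes the 68 channels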

test time

Hi, I would like to know how long it takes to test a single image.

training loss curve and weights for AEloss

hi @anewell,
Thank you for releasing the training code.
I tried to train the network on coco2014 with the original settings except {'batchsize': 8, 'input_res': 256, 'output_res': 64}. The training loss curve looks very weird, but the output maps look right. I'm quite confused. Is this normal? Could you please share your training log?
[attached image: training loss curve]

In line 102 of my_lib.c, in the push_loss calculation, why do you need to multiply the push_loss by 0.5?

    if (current_people > 1)
        output_tmp[0] /= current_people*(current_people-1)/2;
    output_tmp[0] *= 0.5;

Why are the weights for pull_loss and push_loss so small (1e-3)? How did you choose the weights for pull_loss, push_loss and detection_loss?

How many epochs do you run?

As epoch_num is not specified in the task/pose.py config, we need to specify the epoch number ourselves or interrupt training manually.

I ran the model for 60 epochs with this code but got poor results: AP = 0.397 in multi-scale mode.

How many epochs do you run? Can anyone reach the AP reported in the paper?

Any way to convert the model to tensorflow?

Hello:

I have changed pose-ae-train to detect keypoints that are not human-related. This results in a different number of outputs (not 17) and different weights in the layers. Does anybody know how to convert the PyTorch model to TensorFlow (so I can run inference with pose-ae-demo)?

I tried pytorch2keras, onnx and mmdnn, but none of them worked.

Thanks in advance.

TypeError: dist must be a Distribution instance

pytorch: 1.2.0, torchvision: 0.4.0
When I execute the command python build.py install, errors occur (create_extension is deprecated), so I edited the code as below:

# changed:
#   from torch.utils.ffi import create_extension
# to:
from torch.utils.cpp_extension import BuildExtension

extra_objects = ['src/my_lib_kernel.o']  # there is no my_lib_kernel.o file in the project's src/ subdirectory
extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]

# changed:
#   ffi = create_extension(
# to:
ffi = BuildExtension(
    '_ext.my_lib',
    headers=headers,
    sources=sources,
    define_macros=defines,
    relative_to=__file__,
    with_cuda=with_cuda,
    extra_objects=extra_objects  # the error happens on this call
)

TypeError happened as below:
Including CUDA code.
Traceback (most recent call last):
File "build.py", line 31, in
extra_objects=extra_objects
File "/home/jiapy/virtualEnv/py3torch1.2/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 233, in init
super(BuildExtension, self).init(*args, **kwargs)
File "/home/jiapy/virtualEnv/py3torch1.2/lib/python3.6/site-packages/setuptools/init.py", line 163, in init
_Command.init(self, dist)
File "/usr/lib/python3.6/distutils/cmd.py", line 57, in init
raise TypeError("dist must be a Distribution instance")
TypeError: dist must be a Distribution instance

Has anyone had a similar experience? Any advice for solving the issue will be appreciated. Thanks.
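For context on the error above: torch.utils.ffi.create_extension was a function returning an object with a build() method, while torch.utils.cpp_extension.BuildExtension is a setuptools command class, so it cannot be called with the same arguments; that is why setuptools complains that "dist must be a Distribution instance". With PyTorch >= 1.0 an extension is normally built through setuptools instead, roughly as sketched below. This is only a sketch: the TH/THC-based C sources in this repository also predate the newer ATen API and would need porting, not just a new build script, and the source file names are assumed from the ones mentioned in the issue.

    # setup.py-style sketch for PyTorch >= 1.0 (not a drop-in replacement for this repo's build.py)
    from setuptools import setup
    from torch.utils.cpp_extension import CUDAExtension, BuildExtension

    setup(
        name='my_lib',
        ext_modules=[
            CUDAExtension(
                name='my_lib',
                sources=['src/my_lib.c', 'src/my_lib_kernel.cu'],  # assumed from the .o mentioned in the issue
            ),
        ],
        cmdclass={'build_ext': BuildExtension},
    )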

Which gcc version is suitable?

Hello @anewell, first of all, thanks for your work.
When I cd to extensions/AE and run python build.py install I hit this error:
cffi.VerificationError: CompileError: command 'gcc' failed with exit status 1

I'm using Python 3.6 and PyTorch 0.4; gcc is 4.8.5 (I also tried 4.9.1).
So I wonder which gcc version is suitable?

About pose:out_dim

I found that 'inference': 'oup_dim' in the config (pose.py) is 68; however, the loss is only applied to the first 34 dimensions of the network output. I would appreciate a reply about the use of the remaining 34 dimensions.

bug at 200k iters

In task/pose.py, lines 132-135 should come after lines 137-141 so that the optimizer is defined.

Could you tell me why you only evaluate images that contain at least 1 person?

In your test.py file:

    for idx, (a, b) in enumerate(zip(gt, dt)):
        if len(a) > 0:  # <-----------------------

Why do you only evaluate the images which have at least 1 person?

I evaluated the test2017 set, which contains images with zero people, and the AP is 58.xx, but with your eval code this weight file shows an AP of 66.

And how can I get the 63 AP on the COCO test-dev set reported in your paper?

I tested your pretrained model on COCO test_dev_2017 but got 58.xx.

Thank you for your reply.

pretrained model & model trained from scratch

Thanks for releasing the training code. I have some questions about the pretrained model. I trained my own model on coco train2017 from scratch with one GPU (batchsize = 4), evaluated it on data/coco_pose/valid_id, and got a mAP of 0.592 for single-scale evaluation. When I evaluated on val2017, I got a mAP of about 0.53 for single scale, while the pretrained model got about 0.625. What's more, when evaluated on test_dev2017, my own model got about 0.522 for single scale, and the pretrained model got about 0.565.

I don't know why the mAP is so different on val2017 and test_dev2017. Was the pretrained model trained on coco train2017+val2017 except for the 500 images in data/coco_pose/valid_id? Or is it because I am not using multiple GPUs?

How can I output AE heatmap?

I want to output the AE predictions and visualize them. Where can I get them in the code? I mean, according to your paper you have heatmaps of tags at every pixel position, and you visualize those heatmaps in the paper. I want to output those heatmaps for my own picture; what should I do? Or where in the code can I intercept such "heatmaps of tags" (the AE predictions)?

unable to get the pretrained model

Thank you for your work, but unfortunately something went wrong with the pretrained model. Could you upload it again? Thank you so much.

How to select GPU in code?

Hi, I would like to ask how to select specific GPUs for training in the scripts. I also see that the memory utilization of the multiple GPUs is not the same. Does anyone know how to choose GPUs and how to balance the memory utilization? Thanks.

Model runs too slowly

It takes up to 20 seconds to obtain results with my own model. Has anybody faced this problem?

fatal error: TH/TH.h: No such file or directory

When I run python build.py install I encounter this issue on Ubuntu 16.04. I am studying this project; could anyone tell me how to solve it? Thanks ahead.
#issue
Including CUDA code.
/home/xgx/anaconda3/envs/py36/lib/python3.6/distutils/extension.py:131: UserWarning: Unknown Extension options: 'headers', 'relative_to', 'with_cuda'
warnings.warn(msg)
running install
running build
running build_ext
building 'my_lib' extension
gcc -pthread -B /home/jinyu/anaconda3/envs/py36/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/jinyu/anaconda3/envs/py36/include/python3.6m -c src/my_lib.c -o build/temp.linux-x86_64-3.6/src/my_lib.o
src/my_lib.c:1:19: fatal error: TH/TH.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

py36 environment

conda list

Name Version Build Channel
blas 1.0 mkl
bzip2 1.0.6 h14c3975_5
ca-certificates 2019.3.9 hecc5488_0 conda-forge
cairo 1.14.12 h8948797_3
certifi 2019.3.9 py36_0 conda-forge
cffi 1.12.2 py36h2e261b9_1
cycler 0.10.0 py_1 conda-forge
cython 0.29.7 py36he1b5a44_0 conda-forge
dbus 1.13.2 h714fa37_1
expat 2.2.5 hf484d3e_1002 conda-forge
ffmpeg 3.4 h7985aa0_0
fontconfig 2.13.1 he4413a7_1000 conda-forge
freeglut 3.0.0 hf484d3e_5
freetype 2.10.0 he983fc9_0 conda-forge
glib 2.56.2 hd408876_0
graphite2 1.3.13 h23475e2_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
h5py 2.8.0 py36h39dcb92_0
harfbuzz 1.9.0 he243708_1001 conda-forge
hdf5 1.8.18 h6792536_1
icu 58.2 h9c2bf20_1
intel-openmp 2019.3 199
jasper 1.900.1 4 conda-forge
jpeg 9b h024ee3a_2
kiwisolver 1.0.1 py36h6bb024c_1002 conda-forge
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran-ng 7.3.0 hdf63c60_0
libglu 9.0.0 hf484d3e_1
libopus 1.3 h7b6447c_0
libpng 1.6.36 hbc83047_0
libprotobuf 3.5.2 h6f1eeef_0
libstdcxx-ng 8.2.0 hdf63c60_1
libtiff 4.0.10 h2733197_2
libuuid 2.32.1 h14c3975_1000 conda-forge
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.9 he19cac6_0
matplotlib 3.0.2 py36h8a2030e_1001 conda-forge
matplotlib-base 3.0.2 py36h167e16e_1001 conda-forge
mkl 2019.3 199
mkl_fft 1.0.10 py36ha843d7b_0
mkl_random 1.0.2 py36hd81dba3_0
munkres 1.0.7 py36_0 omnia
ncurses 6.1 he6710b0_1
ninja 1.9.0 py36hfd86e86_0
numpy 1.16.2 py36h7e9f1db_0
numpy-base 1.16.2 py36hde5b4d6_0
olefile 0.46 py36_0
opencv 3.3.1 py36h0f6f1c3_0
openssl 1.1.1b h14c3975_1 conda-forge
pcre 8.43 he6710b0_0
pillow 5.4.1 py36h34e0f95_0
pip 19.0.3 py36_0
pixman 0.38.0 h7b6447c_0
pycocotools 2.0.0 py36h14c3975_1000 conda-forge
pycparser 2.19 py36_0
pyparsing 2.4.0 py_0 conda-forge
pyqt 5.6.0 py36h13b7fb3_1008 conda-forge
python 3.6.8 h0371630_0
python-dateutil 2.8.0 py_0 conda-forge
pytorch 1.0.0 py3.6_cuda9.0.176_cudnn7.4.1_1 soumith
qt 5.6.3 h8bf5577_3
readline 7.0 h7b6447c_5
scipy 1.1.0 py36h7c811a0_2
setuptools 41.0.0 py36_0
sip 4.18.1 py36hf484d3e_1000 conda-forge
six 1.12.0 py36_0
sqlite 3.27.2 h7b6447c_0
tk 8.6.8 hbc83047_0
torchvision 0.2.1 py_2 soumith
tornado 6.0.2 py36h516909a_0 conda-forge
tqdm 4.31.1 py36_1
wheel 0.33.1 py36_0
xz 5.2.4 h14c3975_4
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0

#cuda version
nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

#variable path
gedit .bashrc

#cuda things
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CPATH=/usr/local/cuda-9.0/include
#cuda end

The tags I got have vastly different values from the paper. How can I get the tag values shown in the paper?

This is just a little experiment of mine on the source code. I might have gotten the tag generation all wrong; please point out my errors. Thanks!
I experimented on the image with image_id 262145, namely the one from Figure 4 of the paper. The model I'm using was downloaded from the link given by this project.
I output the tag information from the multiperson function in test.py. In that function flip augmentation is applied, so I added the two tag values (one from the image, one from the flipped image). I output the values at all ground-truth keypoints:

[1.260961, 1.3058516, 0, 1.2563938, 0, 1.2727276, 1.3023847, 1.2349962, 1.2527169, 1.2959831, 1.2270368, 1.3404604, 1.7944227, 1.3047302, 1.3632604, 1.2319796, 1.2049041]
[0.5672045, 0.5553081, 0, 0.5342066, 0, 0.62853205, 0.57086825, 0.6060959, 0, 0, 0, 0.44562602, 0.5234276, 0.58281267, 0.5969374, 0.3715278, 0]
[0, 0, 0, 1.2847638, 1.2794614, 1.3218927, 1.3253775, 1.390451, 1.4409075, 0, 0, 1.4085245, 1.4261274, 0, 0, 0, 0]
[0, 0, 0, 1.216414, 1.2051077, 1.1880434, 1.214577, 1.2288826, 1.2320774, 0, 0, 1.1940722, 1.2197711, 0, 0, 0, 0]
[0.71830493, 0.77619904, 0, 0.69648445, 0, 0.8377314, 0, 0.7768618, 0, 0.83856195, 0, 0.85833573, 0, 0, 0, 0, 0]
[0.29368687, 0.29288244, 0.2850952, 0, 0.3033538, 0.35456848, 0.31444836, 0, 0.27472353, 0, 0.32352066, 0.3529215, 0.31695318, 0, 0.2855525, 0, 0]
[0.38019133, 0.39934206, 0, 0.32967758, 0, 0.3586445, 0.40299988, 0.2993703, 0.304348, 0.5085192, 0.27281237, 0.3268919, 0, 0.3919878, 0, 0.38223648, 0]
[0, 0, 0, 0.5740683, 0.57219386, 0.5397749, 0.58586, 0.5749419, 0.58152056, 0.4648273, 0.2812636, 0.5862839, 0.55253434, 0, 0, 0, 0]

There are 8 people annotated with keypoints; 0 means the keypoint information is not available. Clearly the tag values of the same person vary by less than 0.1 for most keypoints, and the maximum variation is less than 0.3. I can see why each person's keypoints fall onto an almost straight line.
Unfortunately, the difference between two people can also be less than 0.1. This means not all people can be differentiated by their tag values; at least in the data I generated, two people could fall onto the same line.
The most confusing part is that the tag values I generated are between 0.2 and 1.8, not -6 to 10 as in Figure 4 of the paper. Where did I go wrong?

Do you have a segmentation part?

In the paper you say you use NMS to get the identifiers for each object instance, but I don't know how to get the identifiers!
"We calculate a histogram of the tags and perform non-maximum suppression to determine a set of values to use as identifiers for each object instance."

training not saving model or logging when continuing training

When trying to continue training, the experiment stops saving checkpoints and logging. This is because there is no handler for an experiment name that comes after '-c'.

Meaning, python train.py -e test_run_001 saves and logs, but python train.py -c test_run_001 does not.

To handle saving, I suggest adding something like the following after line 64 of train.py, in the 'save' function:

if config['opt'].exp == 'pose' and config['opt'].continue_exp is not None:
    resume = os.path.join('exp', config['opt'].continue_exp)

To handle logging, I suggest a variant of the same code from above after line 88 of task/pose.py in the make_network function:

if configs['opt'].exp == 'pose' and configs['opt'].continue_exp is not None:
    exp_path = os.path.join('exp', configs['opt'].continue_exp)

One stage of hourglass network

Hi~ I have tried to estimate poses with only one stage of the hourglass network, but the performance is very poor. How can I improve it? The input size of the training images is set to 384 pixels in my case.

Generating ground truth Tag References

Hello,

Question about the generation of keypoint references for the tag loss. In https://github.com/princeton-vl/pose-ae-train/blob/master/data/coco_pose/dp.py#L43, in the __call__ function, the reference value for each tag map is computed as

visible_nodes[i][tot] = (idx * output_res * output_res + y * output_res + x, 1)

Initially I thought the reference value should reflect the ground-truth position of the keypoint in a flattened map, but it seems this is not true. This is a bit confusing, since the predicted tag value is retrieved from the flattened tag map using that reference value, so I wonder why the reference tag values are computed as above. Why do those values not reflect the actual position of the keypoint? I was under the impression that when computing the loss you would want to retrieve the predicted tag value at the exact location of the keypoint.

Cheers,
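For what it's worth, one reading of the quoted expression (an assumption based only on the line above, not on the rest of dp.py): if the tag maps for all joints are laid out as a single (num_joints, output_res, output_res) array, then idx * output_res * output_res + y * output_res + x is exactly the row-major flat index of pixel (y, x) in joint map idx, so the reference does encode the keypoint position. A small self-contained check:

    import numpy as np

    num_joints, output_res = 17, 128       # sizes assumed for illustration
    tags = np.arange(num_joints * output_res * output_res).reshape(num_joints, output_res, output_res)

    idx, y, x = 3, 40, 25                  # joint index and keypoint location
    flat = idx * output_res * output_res + y * output_res + x
    assert tags.reshape(-1)[flat] == tags[idx, y, x]   # the flat index picks out exactly (idx, y, x)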

About train: index -3 is out of bounds for dimension 0 with size 2

When training, the following problem arises:

start train test_run_002
0%| | 0/1000 [00:00<?, ?it/s]/home/ylp/anaconda3/envs/pytorch0.4/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:173: UserWarning: nn.UpsamplingNearest2d is deprecated. Use nn.Upsample instead.
warnings.warn("nn.UpsamplingNearest2d is deprecated. Use nn.Upsample instead.")
100%|###################################################################################################################################################################| 1000/1000 [04:18<00:00, 3.87it/s]
start valid test_run_002
0%| | 0/10 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 130, in
main()
File "train.py", line 127, in main
train(func, data_func, config)
File "train.py", line 94, in train
outs = train_func(batch_id + i, config, phase, **datas)
File "/home/ylp/sunyubo/github/pose-ae-train/task/pose.py", line 113, in make_train
losses = {i[0]: result[-num_loss + idx]*i[1] for idx, i in enumerate(config['train']['loss'])}
File "/home/ylp/sunyubo/github/pose-ae-train/task/pose.py", line 113, in
losses = {i[0]: result[-num_loss + idx]*i[1] for idx, i in enumerate(config['train']['loss'])}
IndexError: index -3 is out of bounds for dimension 0 with size 2

Can you tell me how to handle this? Thank you.

Some questions about the x and y coordinates!

  1. In the calc() function in group.py, x and y are defined as:
    x = ind % w
    y = (ind / w).long()
    ind_k = torch.stack((x, y), dim=3)
  2. But in the adjust() function in group.py, the order of x and y is:
    y, x = joint[0:2]
  3. And in the refine() function in test.py, the order of x and y is:
    y, x = keypoints[i][:2].astype(np.int32)
  4. But in the genDtByPred() function in test.py, the order of x and y is:
    tmp["keypoints"] += [float(j[0]), float(j[1]), 1]

About data loading

Hi, thanks for your great work!
I want to train on coco2017, but the load function seems unable to parse the data format. What is the annotation format difference between coco2017 and coco2014?

Refine predictions

In the paper it says 'we apply a single-person pose model [40] trained on the same dataset to further refine predictions.'
Does this mean that during testing the code first estimates the initial joint locations, then crops the approximate person area, and afterwards estimates the cropped person with a single-person pose model?
It would be very nice if you could reply.

The pretrained model is unavailable

Dear author, thanks for your great work; it is very helpful to our own work. Could you please provide the pretrained model again? The pretrained model at the given link seems to be unavailable now. Thanks very much; looking forward to your reply.

Multi Person Evaluation on MPII

In Section 4 of the paper,

"The groups for MPII Multi-Person are usually a subset of the total people in a particular image, so some information is provided to make sure predictions are made on the correct targets. This includes a general bounding box and scale term used to indicate the occupied region".

However, the MPII test set has no bounding box information; only objpos (person center) and scale (person size, though I can't figure out how it is computed) are provided. Can you explain more about the MPII multi-person evaluation? How are these values used to make sure predictions are made on the correct targets?
