chenyilun95 / tf-cpn

Cascaded Pyramid Network for Multi-Person Pose Estimation (CVPR 2018)

License: MIT License

Python 93.62% Shell 0.04% C++ 2.67% Cuda 3.63% Makefile 0.03%

tf-cpn's Introduction

Cascaded Pyramid Network (CPN)

This repo is also linked to megvii-cpn

This is a TensorFlow re-implementation of CPN (Cascaded Pyramid Network), which won the 2017 COCO Keypoints Challenge. The original repo is based on Megvii Inc.'s internal deep learning framework (MegBrain).

Results on COCO minival dataset (Single Model)

Note that our testing code relies on an external human detector. On the COCO minival dataset, the detector used here achieves an overall AP of 41.1 and a human AP of 55.3.

Method  Base Model  Input Size  AP @0.5:0.95  AP @0.5  AP @0.75  AP medium  AP large
CPN     ResNet-50   256x192     69.7          88.3     77.0      66.2       76.1
CPN     ResNet-50   384x288     72.3          89.1     78.8      68.4       79.1
CPN     ResNet-101  384x288     72.9          89.2     79.4      69.1       79.9

Results on COCO test-dev dataset (Single Model)

Here we use a stronger detector that achieves an overall AP of 44.5 and a human AP of 57.2 on the COCO test-dev dataset.

Method                     AP @0.5:0.95  AP @0.5  AP @0.75  AP medium  AP large
Detectron (Mask R-CNN)     67.0          88.0     73.1      62.2       75.6
CPN (ResNet-101, 384x288)  72.0          90.4     79.5      68.3       78.6

For reference, using the detection results from MegDet, which achieves an overall AP of 52.1 and a human AP of 62.9, the human pose result is as follows.

Method                              AP @0.5:0.95  AP @0.5  AP @0.75  AP medium  AP large
MegDet + CPN (ResNet-101, 384x288)  73.0          91.8     80.8      69.1       78.7

Usage

Train on MSCOCO dataset

  1. Clone the repository
git clone https://github.com/chenyilun95/tf-cpn.git

We'll call the directory that you cloned $CPN_ROOT.

  2. Download MSCOCO images from http://cocodataset.org/#download. We train on the COCO trainvalminusminival split and validate on the minival split. Put the data and the evaluation PythonAPI in $CPN_ROOT/data/COCO/MSCOCO. All paths are defined in config.py, and you can modify them as you wish.

  3. Download the base model (ResNet) weights from the slim model_zoo and put them in $CPN_ROOT/data/imagenet_weights/.

  4. Set up your environment by first running

pip3 install -r requirement.txt
  5. To train a CPN model, run network.py in the model folder (the -d flag selects the GPUs to use; here, devices 0-1):
python3 network.py -d 0-1

After training finishes, output is written under $CPN_ROOT/log/, which looks like this:

log/
       |->model_dump/
       |    |->snapshot_1.ckpt.data-00000-of-00001
       |    |->snapshot_1.ckpt.index
       |    |->snapshot_1.ckpt.meta
       |    |->...
       |->train_logs.txt

Validation

Run the testing code in the model folder.

python3 mptest.py -d 0-1 -r 350

This assumes a model has been trained for 350 epochs. If you instead want to specify a pre-trained model path directly, run

python3 mptest.py -d 0-1 -m log/model_dump/snapshot_350.ckpt

Here we provide the human detection box results:

Person detection results in COCO Minival

Person detection results in COCO test-dev

Pre-trained models:

COCO.res50.256x192.CPN

COCO.res50.384x288.CPN

COCO.res101.384x288.CPN

Citing CPN

If you find CPN useful in your research, please consider citing:

@inproceedings{Chen2018CPN,
    Author = {Chen, Yilun and Wang, Zhicheng and Peng, Yuxiang and Zhang, Zhiqiang and Yu, Gang and Sun, Jian},
    Title = {{Cascaded Pyramid Network for Multi-Person Pose Estimation}},
    Booktitle = {CVPR},
    Year = {2018}
}

You may also be interested in the following papers:

MSPN:

   @article{li2019rethinking,
     title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
     author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
     journal={arXiv preprint arXiv:1901.00148},
     year={2019}
   }

RSN:

   @misc{cai2020learning,
       title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
       author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
       year={2020},
       eprint={2003.04030},
       archivePrefix={arXiv},
       primaryClass={cs.CV}
   }

Third party implementation

Thanks to Geng David for his PyTorch re-implementation of CPN.

Troubleshooting

  1. If mptest.py hangs, it may be caused by the Python multiprocessing queue blocking. As a workaround, data transfer is also implemented via temporary files: call MultiProc with the extra parameter dump_method=1 and the test code will run fine with multiple processes, as in the sketch below.
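
A sketch of the workaround. MultiProc lives in lib/tfflat/mp_utils.py; apart from dump_method=1 (confirmed above), the constructor arguments and worker function below are hypothetical placeholders for whatever mptest.py already passes:

from tfflat.mp_utils import MultiProc  # assumes lib/ is on sys.path, as in mptest.py

def test_worker(worker_id):
    # hypothetical per-process test function
    return worker_id

# dump_method=1 switches inter-process data transfer from multiprocessing
# queues to temporary files, avoiding the hang described above.
runner = MultiProc(2, test_worker, dump_method=1)  # "2" = hypothetical process count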

Contact

If you have any questions about this repo, please feel free to contact [email protected].

tf-cpn's People

Contributors

chenyilun95, chgg, megvii-wzc


tf-cpn's Issues

About last_fm = None

Hi, thanks for sharing this nice work.
In the create_global_net function of network.py, last_fm is defined as None. However, that would prevent the upsampling and pixel-wise summation of feature maps described in your paper (Fig. 1).

Which is correct, the code or Fig. 1 of the paper?

A little confused about "epoch_size".

Thanks for your great work!
But I'm a little confused by epoch_size = 60000 # include flip * 2, aug * 4, batch * 16 in config.py.
Could you explain how the 60000 is obtained? We know that COCO has about 150k person instances for training. Is there any connection between these two figures?

error "local variable 'label' referenced before assignment" shows if vis=True

Set vis=True and run python mptest.py -d 0-1 -r 350; the following error appears:

05-29 15:47:43 Current epoch is 350.
ran 0s >> << left 0s
Process Worker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/../../lib/tfflat/mp_utils.py", line 34, in run
    msg = self._func(self.id, *self.args, **self.kwargs)
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/mptest.py", line 220, in func
    return test_net(tester, logger, dets, range)
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/mptest.py", line 90, in test_net
    test_img, detail = Preprocessing(test_data[i], stage='test')
  File "/media/projects/tf-cpn2/models/COCO.res101.384x288.CPN/dataset.py", line 241, in Preprocessing
    draw_skeleton(tmpimg, label.astype(int))
UnboundLocalError: local variable 'label' referenced before assignment

Performing Inference

How do I perform inference on an image? I want to run the model on an image and get the locations of the human joints. Can you please help me with this?

How about generating the label heatmap at 384x288 resolution and resizing it to 96x72?

Hi,
I find that you generate the label heatmap at 96x72 resolution, so [int(x/4.), int(y/4.)] is the center used to generate the Gaussian blur. But it seems this may cause a mismatch with the original coordinates, e.g. int(17/4) = 4, but 4*4 = 16. So I wonder, could I generate the label heatmap at 384x288 resolution and then resize it to 96x72? This method would be much slower than your implementation, but would it be more accurate?
Thanks in advance!
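
A minimal sketch contrasting the two label-generation strategies under discussion, assuming a single keypoint, a 384x288 input, a 96x72 output, and illustrative Gaussian kernel sizes (the repo's actual label code differs in its details):

import cv2
import numpy as np

def heatmap_lowres(x, y, out_h=96, out_w=72):
    # Strategy described above: truncate the coordinate to the 4x-downsampled
    # grid, then blur -- fast, but with up to 4 px of quantization error.
    hm = np.zeros((out_h, out_w), dtype=np.float32)
    hm[int(y / 4.), int(x / 4.)] = 1.0
    return cv2.GaussianBlur(hm, (7, 7), 0)

def heatmap_fullres_resized(x, y, in_h=384, in_w=288, out_h=96, out_w=72):
    # Strategy proposed in the question: place the peak at full resolution,
    # blur, then resize down -- slower, but the peak stays sub-pixel accurate.
    hm = np.zeros((in_h, in_w), dtype=np.float32)
    hm[int(y), int(x)] = 1.0
    hm = cv2.GaussianBlur(hm, (25, 25), 0)
    return cv2.resize(hm, (out_w, out_h))  # cv2.resize takes (width, height)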

Evaluation results

First of all, thanks for sharing the work. I quickly ran an AP test with the following results; do you know why it is so low?

python3 models/COCO.res50.256x192.CPN/mptest.py -d 0-1 -r 350
loading annotations into memory...
Done (t=2.09s)
creating index...
index created!
loading the precalcuated json files
Loading and preparing results...
4581
4581
DONE (t=2.98s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
there are 40504 unique images
DONE (t=14.41s).
Accumulating evaluation results...
DONE (t=0.53s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.093
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.116
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.102
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.089
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.099
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.097
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.117
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.104
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.092
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.103
AP50
ap50 is 0.141489
ap is 0.099431

I added the AP calculation and saved the json file already

Ensemble in the paper?

Hi, what ensemble method did you use in your paper? Which models did you use, and how did you ensemble them? Thanks.

Batch size

How do you think a batch size of 8, versus the batch size you used, would affect the final output? Have you experimented with batch size at all to see its effect on the results?

Comparison with the paper's result

The paper reports 73.0 on test-dev using an ensemble of models, while this code achieves it with a single model?

Training loss fluctuates

Hello, I am training the 384x288 ResNet-101 model with batch size 16 and learning rate 1.5625e-05, and the loss fluctuates between 60 and 100. Is this normal?

The influence of the detector

Raising the detector's human AP from 57.2 to 62.9 yields an extra 1% in human pose AP. Will you release the 62.9 human detection results so the importance of the detector can be analyzed?

Error: no default __reduce__ due to non-trivial __cinit__

Hello, I cloned this project recently and tried to evaluate the pretrained model's performance. But when I ran 'python3 mptest.py -d 0-1 -r 350', I got this error message:
ran 2337s >> << left 0s
Traceback (most recent call last):
  File "/share1/home/chunyang/anaconda3/envs/cpn/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/share1/home/chunyang/anaconda3/envs/cpn/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "stringsource", line 2, in pyarrow.lib.Buffer.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
Can you please tell me how to solve this problem?

P.S. This error didn't come up until the evaluation process was almost done.

How about just detecting persons during training?

Hi, I've been reading your paper recently and it's really cool. It is mentioned that you used all eighty categories in the dataset to train the detector but kept only the person boxes for the follow-up work. I wonder whether it is possible to detect persons only, ignoring the other categories. What is the benefit of detecting so many categories?

A doubt about global loss and refine loss

I find that the global loss and the refine loss are calculated differently. The refine loss ignores keypoints with valid < 0.1, which generate no loss. But in the global loss, when valid < 1.1 the label is changed to 0 as global_label while global_out is unchanged, which means the global loss only focuses on the visible points. Is my understanding correct?

How many FPS?

Hi,
Can this run in real time? Do you have any benchmark data (FPS) on given hardware?

I'd like to know whether real-time performance is possible on a Jetson TX2.

Why pad a border around the image during preprocessing?

Hi, I'm quite new to human pose estimation, and your work helps me a lot. I'm confused about why you add a border to the image during preprocessing, as in this code:

bimg = cv2.copyMakeBorder(img, add, add, add, add, borderType=cv2.BORDER_CONSTANT,
                              value=cfg.pixel_means.reshape(-1))

It seems to handle regions containing a human that extend beyond the image bounds (e.g. xmin < 0): you pad the image before cropping the region around the human. Did I understand correctly? If so, would cropping first and then padding the cropped image be another option?
Thanks in advance.
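
A minimal sketch of the pad-then-crop idea being asked about, with illustrative sizes (the repo computes the pad width and crop box from the detection):

import cv2
import numpy as np

img = np.zeros((100, 100, 3), dtype=np.uint8)  # illustrative image
add = 50                                       # illustrative pad width
bimg = cv2.copyMakeBorder(img, add, add, add, add,
                          borderType=cv2.BORDER_CONSTANT, value=(0, 0, 0))

# A crop box that extends past the original image (xmin < 0) becomes a valid
# window once every coordinate is shifted by the pad width.
xmin, ymin, xmax, ymax = -20, 10, 80, 90
crop = bimg[ymin + add:ymax + add, xmin + add:xmax + add]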

Doubt about the pre-trained model's precision and recall

Hi, I used the released COCO.res50.384x288.CPN model (snapshot_350.ckpt) with test_subset = True (the first 1000 images), and the results are very low:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.111
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.131
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.119
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.108
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.117
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.113
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.132
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.120
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.108
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.121

Is this because the released model is underfitting? I tried drawing the keypoints the model predicted, and the result is not correct.

FileNotFoundError: [Errno 2] No such file or directory: '.../data/COCO/dets/person_detection_minival411_human553.json'

Hello, recently I forked this project and tried to evaluate its performance, but I get an error when I run 'python3 mptest.py -d 0-1 -r 350': "FileNotFoundError: [Errno 2] No such file or directory: '.../data/COCO/dets/person_detection_minival411_human553.json'". I can't find this file in the Google Drive. Where can I download this json file?
Thank you very much ^_^

Loss evaluations

Very interesting work.

During your training, what were your refine loss, global loss, and total loss values like in the final epochs? I've had to modify the repo due to different graphics cards, so I am wondering whether my values are similar.

Thanks again for all your hard work in this and other repos.

Training details

First of all, thanks for sharing the work. You used 8 Titan X GPUs for training. Now I want to retrain the keypoint detection model on the COCO dataset as you did, but I only have one GPU (a GTX 1080 with 8 GB of memory). Can I complete the retraining with this GPU, and how much GPU memory does retraining need at minimum?
Thank you

About BN location

Hi! Thank you for making this project open to the world!
I have run into some confusion because I am new to TensorFlow.
I wonder whether you add Batch Normalization (BN) to your added layers, e.g. the 1x1 conv kernels and the bottlenecks in the RefineNet.
Could you please give me a hint about where you add BN?

About heatmap size

Hi @chenyilun95, great work!
What about generating the heatmap at the same size as the original image (image: 256x192, heatmap: 256x192)? Would it increase AP thanks to the pixel-to-pixel match?
Thanks.

JSON Training File

What parameters per image are needed in the JSON training file if we want to train with our own data? I'm currently trying to write a script that automates the JSON file generation for my own data. Thanks!
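
For reference, a minimal sketch of a COCO-style keypoints annotation, which this repo's data loading is modeled on; every field value below is illustrative, and the exact keys the loader consumes should be checked against dataset.py and the provided annotation files:

import json

coco_style = {
    "images": [
        {"id": 1, "file_name": "my_img_0001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,                     # 1 = person in COCO
            "bbox": [120.0, 80.0, 160.0, 320.0],  # [x, y, width, height]
            # 17 keypoints, each an (x, y, visibility) triplet, flattened
            # into one list of 51 numbers; one dummy triplet is repeated here.
            "keypoints": [200, 150, 2] * 17,
            "num_keypoints": 17,
            "area": 160.0 * 320.0,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "person"}],
}

with open("my_train_annotations.json", "w") as f:
    json.dump(coco_style, f)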

Why are keypoints whose coordinates are outside the input shape kept when generating the label heatmaps?

Hi,
I find that when you generate the heatmaps, you throw away points whose coordinates are less than 0, but points whose coordinates are larger than the input shape are clamped to the boundary coordinates:
label[i][j << 1 | 1] = min(label[i][j << 1 | 1], ori_size[0] - 1)
label[i][j << 1] = min(label[i][j << 1], ori_size[1] - 1)
I wonder why you keep these keypoints and generate heatmaps at locations different from the keypoints' original positions.
Thanks!

Calculate BN on multiple GPUs?

Hi, thanks for your great work. I have a question about BN calculation. In your code, BN is calculated only on a single GPU. What if I want to calculate BN across multiple GPUs to make the normalization statistics more accurate? It might help the network converge better. Thanks.

Windows

Great repo! Do you have any plans to make this compatible with Windows?

End-to-end training or two-stage training?

I am wondering whether end-to-end training is the best way.

If I train the global net to a steady state, then freeze it and start training the refine net, what would the result be like? Have you tried this approach, and what were the results?

My results are lower than what is indicated

Congratulations on your COCO Challenge result, and thank you for sharing your code.

I'm testing your code, and the problem is that I get these results on the 2014 validation set:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.430
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.682
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.458
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.385
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.511
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.534
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.780
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.463
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.631

  • I'm using ResNet-101 with an input size of 384x288.
  • I have changed almost nothing in the code (except the config file; in mptest I pass dump_method=1 as an argument to the MultiProc function).
  • I'm using the pretrained model you have trained.
  • For the dataset, I downloaded the 2014 version (train and val).
  • I'm using the annotation file and also the bounding boxes you've provided.
  • I don't understand why there is such a huge difference between your results and mine. Have I done something wrong?

Test on my own data

I want to use the pre-trained model on my own dataset (not the COCO dataset). Which Python files do I need to modify?

In Preprocessing() in dataset.py, does objcenter mean the center point of the bounding box?

I saw that bbox is read from the json file, and the value of the key 'bbox' is (start_x, start_y, width, height), so the code adds the latter two values to the first two to get end_x and end_y.
Here is the part I'm confused about:
objcenter = np.array([bbox[0] + bbox[2] / 2., bbox[1] + bbox[3] / 2.])
If this computes the center point, why wasn't it written as (bbox[0] + bbox[2]) / 2. instead? Division comes before addition, right? Am I getting the wrong idea?
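
A worked check of the precedence question, using the (start_x, start_y, width, height) layout stated above with illustrative numbers:

# Python evaluates bbox[2] / 2. before the addition, so this expression is
# start_x + width/2 -- exactly the horizontal center of the box.
bbox = (10.0, 20.0, 40.0, 60.0)   # (start_x, start_y, width, height), illustrative
cx = bbox[0] + bbox[2] / 2.       # 10 + 40/2 = 30.0
# (bbox[0] + bbox[2]) / 2. would be (10 + 40)/2 = 25.0, the midpoint of
# start_x and width; since bbox[2] is a width rather than an end coordinate,
# the original expression is the correct one.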

How about training from scratch?

Hi! Thanks for providing such wonderful work.
I wonder whether you have tried a ResNet backbone without ImageNet pretraining.
Is it possible that the pre-trained model is one of the keys to the performance improvement?

Annotation files not complete?

Hi, I want to evaluate your pretrained model. I downloaded the "minival annotation" file and the "Person detection results in COCO Minival" file from your links, but it seems that some images are missing. Here is my test code:

val = json.load(open('./MSCOCO/annotations/person_keypoints_minival2014.json', 'r'))
train = json.load(open('./MSCOCO/annotations/person_keypoints_trainvalminusminival2014.json', 'r'))
det = json.load(open('./dets/person_detection_minival411_human553.json', 'r'))

det_image_ids = set([i['image_id'] for i in det])
val_image_ids = set([i['id'] for i in val['images']])
val_annot_ids = set([i['id'] for i in val['annotations']])
val_annot_img_ids = set([i['image_id'] for i in val['annotations']])

print(len(val_image_ids & val_annot_img_ids))
print(len(det_image_ids & val_annot_img_ids))

The output is:

2693
2692

It seems that many annotated images (ground truth) do not exist in your detection result file or in your minival. What is the reason? Thanks.

How about the training details?

Thank you very much for your work!
Could you tell me the details of your training?

  • What batch size did you use in training? Unfortunately, my GPU can only fit 16 (24 is out of memory), and I get 72.5 AP (yours is 72.9) on the COCO minival dataset. I think a larger batch size could give better performance.
  • How many GPUs did you use, and how much time did training take? I used 4 GPUs and spent 3.x days.
  • Also, how much memory does each of your GPU cards have?

Thanks!

About saving the model during training

The original code saves a model after every epoch. What if I want to save the model only when the validation loss improves? How can I add evaluation code during training? Thanks.
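
A minimal sketch of a save-on-improvement loop, written in TF1 style to match the repo; train_one_epoch and evaluate are hypothetical stand-ins for the repo's own training and validation routines, and the snapshot path is illustrative:

import tensorflow as tf

def train_one_epoch(sess):   # stand-in for the repo's training step
    pass

def evaluate(sess):          # stand-in for a validation pass returning a loss
    return 0.0

_ = tf.Variable(0.0)         # dummy variable so the Saver has something to save
saver = tf.train.Saver(max_to_keep=1)
best_val_loss = float('inf')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(350):
        train_one_epoch(sess)
        val_loss = evaluate(sess)
        if val_loss < best_val_loss:   # save only when validation improves
            best_val_loss = val_loss
            saver.save(sess, 'log/model_dump/best_snapshot.ckpt')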

More training details

How many epochs did you use to train the models? Did you train the different models for different numbers of epochs? I read your paper; you said the learning rate is decreased by a factor of 2 every 10 epochs. Is this schedule used for every model, or is there some difference between training the different models?

Something about continue_train

Hi, if I want to finetune the 350-epoch snapshot.ckpt model on my own keypoint data, do I need to set the cfg variable "continue_train" to True?

How about the training details?

Thank you very much for your wonderful work.
I trained the resnet50.256x192 model with the default settings. Here is my performance on the COCO minival dataset:

[results screenshot not preserved]

I don't know why the gap is so large. I trained the model with 4 GTX 1080 GPUs.
