
HuManiFlow

Code repository for the paper:
HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
Akash Sengupta, Ignas Budvytis, Roberto Cipolla
CVPR 2023
[paper+supplementary][video]

This paper presents a probabilistic approach to 3D human shape and pose estimation, which aims to improve sample-input consistency and sample diversity over contemporary methods.

[Figures: teaser and method overview]

This repository contains inference, training and evaluation code. A few weaknesses of this approach, and future research directions, are listed below. If you find this code useful in your research, please cite the following publication:

@InProceedings{sengupta2023humaniflow,
               author = {Sengupta, Akash and Budvytis, Ignas and Cipolla, Roberto},
               title = {{HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation}},
               booktitle = {CVPR},
               month = {June},
               year = {2023}                         
}

Installation

Requirements

  • Linux or macOS
  • Python ≥ 3.7

Instructions

First clone the repo:

git clone https://github.com/akashsengupta1997/HuManiFlow.git

We recommend using a virtual environment to install relevant dependencies:

python3.8 -m venv HuManiFlow_env
source HuManiFlow_env/bin/activate

Install torch and torchvision (the code has been tested with v1.9.0 of torch), as well as other dependencies:

pip install torch==1.9.0 torchvision==0.10.0
pip install -r requirements.txt

Finally, install pytorch3d, which we use for data generation during training and visualisation during inference. To do so, you will need to first install the CUB library following the instructions here. Then you may install pytorch3d - note that the code has been tested with v0.7.1 of pytorch3d, and we recommend installing this version using:

pip install "git+https://github.com/facebookresearch/[email protected]"

Model files

You will need to download SMPL model files from here. The neutral model is required for training and running the demo code. If you want to evaluate the model on datasets with gendered SMPL labels (such as 3DPW and SSP-3D), you should also download the male and female models. You may need to convert the SMPL model files to be compatible with Python 3 by removing any chumpy objects. To do so, please follow the instructions here.
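
For reference, a rough sketch of the kind of chumpy-removal conversion those instructions describe (the linked instructions may differ in detail; chumpy must be installed to unpickle the original files):

import pickle
import numpy as np

# Load the original (chumpy-based) SMPL model file, replace chumpy arrays with
# plain numpy arrays, and re-save it. This is an illustrative sketch only.
with open('model_files/smpl/SMPL_NEUTRAL.pkl', 'rb') as f:
    smpl_data = pickle.load(f, encoding='latin1')

cleaned = {k: np.array(v) if 'chumpy' in str(type(v)) else v
           for k, v in smpl_data.items()}

with open('model_files/smpl/SMPL_NEUTRAL.pkl', 'wb') as f:
    pickle.dump(cleaned, f)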

Download pre-trained model checkpoints for our 3D Shape/Pose network, as well as for 2D Pose HRNet-W48 from here.

Place the SMPL model files and network checkpoints in the model_files directory, which should have the following structure. If the files are placed elsewhere, you will need to update configs/paths.py accordingly.

HuManiFlow
├── model_files                           # Folder with model files
│   ├── smpl
│   │   ├── SMPL_NEUTRAL.pkl              # Gender-neutral SMPL model
│   │   ├── SMPL_MALE.pkl                 # Male SMPL model
│   │   ├── SMPL_FEMALE.pkl               # Female SMPL model
│   ├── humaniflow_weights.tar            # HuManiFlow checkpoint
│   ├── pose_hrnet_w48_384x288.pth        # Pose2D HRNet checkpoint
│   ├── cocoplus_regressor.npy            # Cocoplus joints regressor
│   ├── J_regressor_h36m.npy              # Human3.6M joints regressor
│   ├── J_regressor_extra.npy             # Extra joints regressor
│   └── UV_Processed.mat                  # DensePose UV coordinates for SMPL mesh
└── ...

Inference

scripts/run_predict.py is used to run inference on a given folder of input images. For example, to run inference on the demo folder, do:

python scripts/run_predict.py --image_dir assets/demo_images/ --save_dir pred_output/ -VS -VU -VXYZ

This will first detect human bounding boxes in the input images using Mask-RCNN. If your input images are already cropped and centred around the subject of interest, you may skip this step by passing the --cropped_images option. To greatly increase inference speed, remove the -VS -VXYZ options to skip directional variance and sample visualisation, and only render the predicted pose and shape point estimate.
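
If you need to prepare cropped inputs yourself, a minimal centre-crop sketch is given below (the crop size of 512 is illustrative, not necessarily the resolution the network expects):

import os
import cv2

# Rough sketch: centre-crop each image to a square and resize it, so the result
# can be passed to run_predict.py with --cropped_images. Sizes are illustrative.
def centre_crop_and_resize(in_dir, out_dir, size=512):
    os.makedirs(out_dir, exist_ok=True)
    for fname in sorted(os.listdir(in_dir)):
        image = cv2.imread(os.path.join(in_dir, fname))
        if image is None:
            continue
        h, w = image.shape[:2]
        side = min(h, w)
        top, left = (h - side) // 2, (w - side) // 2
        crop = image[top:top + side, left:left + side]
        cv2.imwrite(os.path.join(out_dir, fname), cv2.resize(crop, (size, size)))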

scripts/run_optimise.py is used to run post-inference optimisation, with:

python scripts/run_optimise.py --pred_image_dir assets/demo_images/ --pred_output_dir pred_output/ --save_dir opt_output/

This minimises reprojection error between 2D keypoints and the 3D point estimate. The predicted pose and shape distribution is used as an image-conditioned prior, to guide the optimisation process.
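
Conceptually, the objective combines a keypoint reprojection term with the negative log-probability of the predicted distribution. The sketch below illustrates this; the names (project, pred_dist, target_joints2d, joint_conf) are placeholders, not functions from this repository:

# Illustrative sketch of a reprojection + distribution-prior objective.
# All argument names are placeholders, not the repository's actual API.
def optimisation_loss(pose, shape, cam, pred_dist, project, target_joints2d,
                      joint_conf, prior_weight=1e-3):
    joints2d = project(pose, shape, cam)                 # reproject 3D joints into the image
    reproj = (joint_conf * ((joints2d - target_joints2d) ** 2).sum(dim=-1)).sum()
    prior = -pred_dist.log_prob(pose, shape)             # image-conditioned prior term
    return reproj + prior_weight * prior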

Evaluation

scripts/run_evaluate.py is used to evaluate our method on the 3DPW and SSP-3D datasets. A description of the metrics used to evaluate predicted distributions is given in metrics/eval_metrics_tracker.py, as well as in the paper.

Download SSP-3D from here. Update configs/paths.py with the path to the unzipped SSP-3D directory. Evaluate on SSP-3D with:

python scripts/run_evaluate.py -D ssp3d -B 32 -N 100

To change the number of samples used for sample-based distribution evaluation metrics, update the -N argument. Using more samples will give better measures of sample-input consistency and sample diversity, but will slow down evaluation.
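
As an illustration of what a sample-based metric can look like (a generic example, not necessarily one of the exact metrics implemented in metrics/eval_metrics_tracker.py):

import torch

# Generic sample-input consistency measure: per-vertex error of the closest of
# N mesh samples to the ground-truth mesh, averaged over the batch.
def min_sample_vertex_error(sample_vertices, gt_vertices):
    # sample_vertices: (B, N, V, 3), gt_vertices: (B, V, 3)
    per_sample = (sample_vertices - gt_vertices[:, None]).norm(dim=-1).mean(dim=-1)  # (B, N)
    return per_sample.min(dim=1).values.mean()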

Download 3DPW from here. You will need to preprocess the dataset first, to extract centred+cropped images and SMPL labels (adapted from SPIN):

python data/pw3d_preprocess.py --dataset_path $3DPW_DIR_PATH

This should create a subdirectory with preprocessed files, such that the 3DPW directory has the following structure:

$3DPW_DIR_PATH
      ├── test                                  
      │   ├── 3dpw_test.npz    
      │   ├── cropped_frames   
      ├── imageFiles
      └── sequenceFiles

Additionally, download HRNet 2D joint detections on 3DPW from here, and place this in $3DPW_DIR_PATH/test. Update configs/paths.py with the path pointing to $3DPW_DIR_PATH/test. Evaluate on 3DPW with:

python scripts/run_evaluate.py -D 3dpw -B 32 -N 10

Training

scripts/run_train.py is used to train our method using random synthetic training data (rendered on-the-fly during training).

Download .npz files containing SMPL training/validation body poses and textures from here. Place these files in a ./train_files directory, or update the appropriate variables in configs/paths.py with paths pointing to these files. Note that the SMPL textures are from SURREAL and MultiGarmentNet.
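
To verify the downloaded files, the stored arrays can be inspected with numpy (a quick check; the array names and shapes depend on the files provided):

import numpy as np

# Quick inspection of a downloaded training-data file: list the stored arrays
# and their shapes. Array names depend on the files provided by the authors.
train_poses = np.load('train_files/smpl_train_poses.npz')
for name in train_poses.files:
    print(name, train_poses[name].shape)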

We use images from LSUN as random backgrounds for our synthetic training data. Specifically, images from the 10 scene categories are used. Instructions to download and extract these images are provided here. The data/copy_lsun_images_to_train_files_dir.py script can be used to copy LSUN background images to the ./train_files directory, which should have the following structure:

train_files
      ├── lsun_backgrounds
          ├── train
          ├── val
      ├── smpl_train_poses.npz
      ├── smpl_train_textures.npz                                  
      ├── smpl_val_poses.npz                                  
      └── smpl_val_textures.npz                                  
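
If the LSUN export leaves nested .webp files rather than flat directories of .jpg images (see the related issue below), a rough conversion sketch is given here (paths are illustrative; the provided copy script expects .jpg files):

from pathlib import Path
from PIL import Image

# Rough sketch (not the repository's script): recursively gather exported LSUN
# .webp images and save them as .jpg files in the expected backgrounds folder.
src = Path('lsun_exported_images')                 # illustrative path to the LSUN export
dst = Path('train_files/lsun_backgrounds/train')
dst.mkdir(parents=True, exist_ok=True)

for i, webp_path in enumerate(sorted(src.rglob('*.webp'))):
    Image.open(webp_path).convert('RGB').save(dst / f'{i:08d}.jpg', quality=95)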

Finally, start training with:

python scripts/run_train.py -E experiments/exp_001

As a sanity check, the script should find 91106 training poses, 125 + 792 training textures, 397582 training backgrounds, 33347 validation poses, 32 + 76 validation textures and 3000 validation backgrounds.

Weaknesses and Future Research

The following aspects of our method may be the subject of future research:

  • Sample diversity: 3D samples do not always cover the range of solutions that match an input image - in particular, samples involving limbs bending backwards are rarely obtained.
  • Mesh interpenetrations: these occasionally occur amongst 3D mesh samples drawn from predicted shape and pose distributions. A sample inter-penetration penalty may be useful.
  • Non-tight clothing: body shape prediction accuracy suffers when subjects are wearing non-tight clothing, since the synthetic training data does not model clothing in 3D (only uses clothing textures). Perhaps better synthetic data (e.g. AGORA) will alleviate this issue.

TODO

  • Gendered pre-trained models for improved shape estimation

Acknowledgments

Code was adapted from/influenced by the following repos - thanks to the authors!


humaniflow's Issues

Using model with multiple images or video

What is the best way to update the code to work on multiple images or a video? I attempted to use VideoCapture on a GIF file to read each frame. However, I am having difficulty appending the images and heatmaps together so that they can be fed into the model.

This is in the predict_humaniflow.py script:

for image_fname in tqdm(sorted([f for f in os.listdir(image_dir)])):
        with torch.no_grad():
            # Capture video from file
            cap = cv2.VideoCapture(os.path.join(image_dir, image_fname))
            # Capture frame-by-frame
            ret, frame = cap.read()
            frames = []
            while ret:
                # ------------------------- INPUT LOADING AND PROXY REPRESENTATION GENERATION -------------------------
                image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

                # ... proxy representation generation (omitted in the issue) ...

                frames.append(torch.cat([proxy_rep_img, proxy_rep_heatmaps], dim=1))
                ret, frame = cap.read()
                if not ret:
                    break
            
            cap.release()
            cv2.destroyAllWindows()
            proxy_rep_input = torch.cat([x.float() for x in frames], dim=1).float()  # (1, 18, img_wh, img_wh)

Background images for training

Hi! Thanks for releasing the code!
I'm trying to reproduce the training and thus need to gather all the training and validation backgrounds. Following your description, I used the lsun repo to download and extract the backgrounds. However, I'm now struggling with 1) selecting the "correct" background images and 2) converting them to the right format and moving them to the right location.
The provided script data/copy_lsun_images_to_train_files_dir.py doesn't work for me, since I guess the directory structure isn't correct after extracting the images. For example, "bedroom_train_lmdb" extracts images to paths like ./f/8/8/1/2/2/*.webp, which isn't compatible with your script. Your script also only looks for .jpg files. Furthermore, I don't know how to select the mentioned 397582 training backgrounds, since "bedroom_train_lmdb" alone has over 3 million images. I would be grateful for any help!

Model performance

Hi, thank you for your interesting work.

I would like to ask about the performance of your model, such as processing time per image or FPS, since I don't see it mentioned in your paper.
Is your model capable of running in real time?
Thank you for your time!

Performance gap

Hi @akashsengupta1997,

I tried the evaluation code with your released default checkpoint but found a performance gap between what I got from the code and the performance reported in the paper.

This is the SSP-3D performance I got:
[screenshot: SSP-3D evaluation results]

This is the 3DPW performance I got:
[screenshot: 3DPW evaluation results]

The results show a significant difference from Tables 3-5 in the paper. Could you help provide some clarification?

Using more than 10 betas

In the humaniflow_config.py file, the number of betas is set to 10. If I adjust this value, the program returns an error. Is it possible to update other parameters to use more than 10 betas, or does the model need to be retrained for this?
