vistracker's Introduction

VisTracker (CVPR'23)

Official implementation for the CVPR 2023 paper: Visibility Aware Human-Object Interaction Tracking from Single RGB Camera

[ArXiv] [Project Page]

teaser

Please also check our old ECCV'22 work CHORE here.

Contents

  1. Dependencies
  2. Dataset preparation
  3. Run demo
  4. Training
  5. Evaluation
  6. Citation
  7. Acknowledgements
  8. License

Dependencies

The code is tested with torch 1.6, CUDA 10.1, and Debian 11. The environment setup is the same as for our ECCV'22 work CHORE; please follow the instructions here.

Dataset preparation

We work on the extended BEHAVE dataset. To get the dataset ready, you need to download some files and run a few processing scripts; all files are provided on this webpage. A minimal sanity-check sketch for the prepared data follows the list below.

  1. Download the video files: color videos of the test sequences and the frame time information.
  2. Extract RGB images: follow this script from the BEHAVE dataset repo to extract the RGB images. Please add the -nodepth flag to extract RGB images only. Example: python tools/video2images.py /BS/xxie-3/static00/rawvideo/Date03/Date03_Sub03_chairwood_hand.0.color.mp4 /BS/xxie-4/static00/behave-fps30/ -nodepth
  3. Download human and object masks: masks for all test sequences. Download and unzip them into one folder.
  4. Rename the mask files to follow the BEHAVE dataset structure: python tools/rename_masks.py -s SEQ_FOLDER -m MASK_ROOT. Example: python tools/rename_masks.py -s /BS/xxie-4/static00/behave-fps30/Date03_Sub03_chairwood_hand -m /BS/xxie-5/static00/behave_release/30fps-masks-new/
  5. Download OpenPose and FrankMocap detections: packed data for the test sequences.
  6. Process the packed data into the BEHAVE dataset format: python tools/pack2separate.py -s SEQ_FOLDER -p PACKED_ROOT. Example: python tools/pack2separate.py -s /BS/xxie-4/static00/behave-fps30/Date03_Sub03_chairwood_hand -p /scratch/inf0/user/xxie/behave-packed
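
If you want to verify that a sequence was prepared correctly, the following minimal Python sketch walks one sequence folder and reports frames with missing files. The frame-folder and file-name patterns (folders starting with t, files such as k1.color.jpg, k1.person_mask.jpg, k1.obj_rend_mask.jpg) are assumptions about the BEHAVE layout rather than code from this repository; adjust them to whatever your extraction and renaming steps actually produce.

# check_sequence.py: minimal sanity check for one prepared sequence (hypothetical helper).
# The file names in EXPECTED are assumptions about the BEHAVE layout; edit them to match your data.
import argparse
from pathlib import Path

EXPECTED = ["k1.color.jpg", "k1.person_mask.jpg", "k1.obj_rend_mask.jpg"]  # assumed names

def check_sequence(seq_folder):
    seq = Path(seq_folder)
    frames = sorted(d for d in seq.iterdir() if d.is_dir() and d.name.startswith("t"))
    print(f"{seq.name}: {len(frames)} frame folders found")
    incomplete = 0
    for frame in frames:
        missing = [name for name in EXPECTED if not (frame / name).exists()]
        if missing:
            incomplete += 1
            print(f"  {frame.name}: missing {', '.join(missing)}")
    print(f"{incomplete} frame folder(s) with missing files")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--seq_folder", required=True, help="path to one sequence folder")
    check_sequence(parser.parse_args().seq_folder)

For example: python check_sequence.py -s /BS/xxie-4/static00/behave-fps30/Date03_Sub03_chairwood_hand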

Run demo

You can find all the commands of the pipeline in scripts/demo.sh. To run it, you need to download the pretrained models from here and unzip them into the experiments folder.

Also, the dataset files should be prepared as described above. For convenience, we provide example data for one sequence in this file. You can download it, extract it to some directory, and then modify EXTTENDED_PATH and GT_PACKED in PATHS.yml accordingly.
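
As an optional quick check that these paths are set correctly, the sketch below loads PATHS.yml and verifies that the two keys mentioned above point to existing directories. It assumes PATHS.yml is plain YAML readable with PyYAML; any other keys in the file are ignored.

# check_paths.py: optional sanity check for the two PATHS.yml entries mentioned above.
import os
import yaml

with open("PATHS.yml") as f:
    paths = yaml.safe_load(f)

for key in ("EXTTENDED_PATH", "GT_PACKED"):  # keys as named in the text above
    value = paths.get(key)
    status = "exists" if value and os.path.isdir(str(value)) else "MISSING"
    print(f"{key}: {value} ({status})")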

You also need to download the SMPL-H model from the official website. We use the MANO v1.2 version of the SMPL-H model.

Once done, you can run the demo for one sequence simply by:

bash scripts/demo.sh SEQ_FOLDER 

Example: bash scripts/demo.sh /BS/xxie-4/static00/test-seq/Date03_Sub03_chairwood_hand

It will take around 6~8 hours to finish a sequence of 1500 frames (50s).

Tip: the runtime bottlenecks are the SMPL-T pre-fitting (steps 1-2) and the joint optimization (step 6) in scripts/demo.sh. If you have a cluster with multiple GPU machines, you can run multiple sequences/jobs in parallel by specifying the --start and --end options for these commands. This splits one long sequence into several chunks, and each job optimizes only the chunk specified by the start and end frames; a sketch of this chunking is given below.
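
As an illustration of the chunking, here is a sketch that splits a 1500-frame sequence into fixed-size chunks and launches one job per chunk via --start/--end. The script name recon/opt_step.py is a hypothetical placeholder, not a file from this repository; substitute the actual pre-fitting or joint-optimization command from scripts/demo.sh, and submit to your cluster scheduler instead of launching local processes if appropriate.

# launch_chunks.py: sketch of running one long sequence as several chunked jobs.
# "recon/opt_step.py" is a hypothetical placeholder; replace it with the actual
# pre-fitting or joint-optimization command from scripts/demo.sh.
import subprocess

SEQ_FOLDER = "/BS/xxie-4/static00/test-seq/Date03_Sub03_chairwood_hand"
TOTAL_FRAMES = 1500   # total number of frames in the sequence
CHUNK_SIZE = 300      # frames handled by each job

jobs = []
for start in range(0, TOTAL_FRAMES, CHUNK_SIZE):
    end = min(start + CHUNK_SIZE, TOTAL_FRAMES)
    cmd = ["python", "recon/opt_step.py", "-s", SEQ_FOLDER,
           "--start", str(start), "--end", str(end)]
    print("launching:", " ".join(cmd))
    jobs.append(subprocess.Popen(cmd))  # or submit each command to a cluster scheduler

for job in jobs:
    job.wait()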

Training

Train a SIF-Net model:

python -m torch.distributed.launch --nproc_per_node=NUM_GPU --master_port 6789 --use_env train_launch.py -en tri-vis-l2

Note that to train this model, you also need to prepare the GT registrations (meshes) in order to run online boundary sampling during training. We provide an example script that saves SMPL and object meshes from the packed parameters: python tools/pack2separate_params.py -s SEQ_FOLDER -p PACKED_PATH, similar to tools/pack2separate.py. The packed training data for this can be downloaded from here (part 1) and here (part 2).
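
If you need to unpack GT registrations for many training sequences, a simple loop can save manual work. The sketch below calls tools/pack2separate_params.py once per sequence folder; the two root paths are examples only and should point to your own extracted sequences and downloaded packed files.

# unpack_all.py: sketch that runs tools/pack2separate_params.py for every sequence folder.
# The two root paths are illustrative; point them at your own data locations.
import subprocess
from pathlib import Path

SEQ_ROOT = Path("/BS/xxie-4/static00/behave-fps30")     # extracted RGB sequences
PACKED_ROOT = "/scratch/inf0/user/xxie/behave-packed"   # downloaded packed GT files

for seq in sorted(p for p in SEQ_ROOT.iterdir() if p.is_dir()):
    subprocess.run(["python", "tools/pack2separate_params.py",
                    "-s", str(seq), "-p", PACKED_ROOT], check=True)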

In addition, the split files, frame times, and visibility information should also be downloaded from here and extracted into the subfolder splits.

Train motion infill model:

python -m torch.distributed.launch --nproc_per_node=NUM_GPU --master_port 6787 --use_env train_mfiller.py -en cmf-k4-lrot

For this, you need to specify the paths to all the packed GT files downloaded from the links mentioned above, i.e. train part 1, train part 2, and the test sequences.

Evaluation

python recon/eval/evalvideo_packed.py -split splits/behave-test-30fps.json -sn RECON_NAME -m ours -w WINDOW_SIZE

where RECON_NAME is your own save name for the reconstruction, and WINDOW_SIZE is the alignment window size (main paper Sec. 4). WINDOW_SIZE=1 is equivalent to the evaluation used by CHORE.
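
As a conceptual illustration of window-based alignment (not the code used by recon/eval/evalvideo_packed.py), the sketch below estimates one rigid transform per window of WINDOW_SIZE frames with a Procrustes/Kabsch fit, applies it to the reconstruction, and averages a simple per-vertex error. It assumes reconstructed and ground-truth vertices are in correspondence; with WINDOW_SIZE=1 this reduces to per-frame alignment.

# window_align.py: conceptual sketch of window-based alignment for evaluation.
# This is not the repository's evaluation code; it only illustrates the idea of
# aligning reconstruction and ground truth jointly over a window of frames.
import numpy as np

def kabsch_rigid(src, tgt):
    """Best rigid (R, t) aligning Nx3 source points to Nx3 target points."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    u, _, vt = np.linalg.svd((src - mu_s).T @ (tgt - mu_t))
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, mu_t - rot @ mu_s

def window_aligned_error(recon_frames, gt_frames, window_size):
    """Align recon to GT with one transform per window, then average the vertex error."""
    errors = []
    for i in range(0, len(recon_frames), window_size):
        rec = np.concatenate(recon_frames[i:i + window_size], axis=0)
        gt = np.concatenate(gt_frames[i:i + window_size], axis=0)
        rot, trans = kabsch_rigid(rec, gt)   # a single transform for the whole window
        aligned = rec @ rot.T + trans
        errors.append(np.linalg.norm(aligned - gt, axis=1).mean())
    return float(np.mean(errors))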

Citation

If you use our code, please cite:

@inproceedings{xie2023vistracker,
    title = {Visibility Aware Human-Object Interaction Tracking from Single RGB Camera},
    author = {Xie, Xianghui and Bhatnagar, Bharat Lal and Pons-Moll, Gerard},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2023}
}

If you use BEHAVE dataset, please also cite:

@inproceedings{bhatnagar22behave,
    title = {BEHAVE: Dataset and Method for Tracking Human Object Interactions},
    author = {Bhatnagar, Bharat Lal and Xie, Xianghui and Petrov, Ilya and Sminchisescu, Cristian and Theobalt, Christian and Pons-Moll, Gerard},
    booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {jun},
    organization = {{IEEE}},
    year = {2022}
}

Acknowledgements

This project leverages the following excellent works; we thank the authors for open-sourcing their code:

  • FrankMocap
  • OpenPose
  • SmoothNet
  • Conditional motion infilling
  • Interactive segmentation
  • Video segmentation
  • Detectron2

License

Copyright (c) 2023 Xianghui Xie, Max-Planck-Gesellschaft

Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").

The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects.

Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. You agree to cite the Visibility Aware Human-Object Interaction Tracking from Single RGB Camera paper in documents and papers that report on research using this Software.

vistracker's People

Contributors

xiexh20


vistracker's Issues

Using different videos

I am trying to use this model on footage other than the provided sequences and was curious what steps would be required to prepare the video for the model to process it. In the demo data provided, there is a folder "calibs". Does this contain the calibrations of the Kinects used? What would I need to replace it with if I recorded myself with just one camera? Thank you for your time.

About implementation details on InterCap dataset.

Thanks for your great work!

In your paper, "Visibility Aware Human-Object Interaction Tracking from Single RGB Camera", you train the model on InterCap dataset.

We train our model on sequences from subject 01-08 (173 sequences) and test on sequences from subject 09-10 (38 sequences).

After downloading the InterCap dataset, I found that there are 40 sequences from subjects 09-10. Did you skip some sequences? If so, could you tell me why you skipped them?

Some test sequences cannot be found

Thanks for your great work!

I'm currently running your code for evaluation on the BEHAVE dataset.
I downloaded all the files following the instructions (https://github.com/xiexh20/VisTracker#dataset-preparation).

However, I cannot find the following test sequences from 'behave-test-30fps.json' in the provided data:
"Date03_Sub04_boxtiny_part2", "Date03_Sub04_yogaball_play2", "Date03_Sub05_chairwood_part2"

Is there additional data to download? Or do I need some other pre-processing?

Here are the test sequences of 'behave-test-30fps.json'.

{
  "seqs": [
    "Date03_Sub03_backpack_back",
    "Date03_Sub03_backpack_hand",
    "Date03_Sub03_backpack_hug",
    "Date03_Sub03_boxlarge",
    "Date03_Sub03_boxlong",
    "Date03_Sub03_boxmedium",
    "Date03_Sub03_boxsmall",
    "Date03_Sub03_boxtiny",
    "Date03_Sub03_chairblack_hand",
    "Date03_Sub03_chairblack_lift",
    "Date03_Sub03_chairblack_sit",
    "Date03_Sub03_chairblack_sitstand",
    "Date03_Sub03_chairwood_hand",
    "Date03_Sub03_chairwood_lift",
    "Date03_Sub03_chairwood_sit",
    "Date03_Sub03_monitor_move",
    "Date03_Sub03_plasticcontainer",
    "Date03_Sub03_stool_lift",
    "Date03_Sub03_stool_sit",
    "Date03_Sub03_suitcase_lift",
    "Date03_Sub03_suitcase_move",
    "Date03_Sub03_tablesmall_lean",
    "Date03_Sub03_tablesmall_lift",
    "Date03_Sub03_tablesmall_move",
    "Date03_Sub03_tablesquare_lift",
    "Date03_Sub03_tablesquare_move",
    "Date03_Sub03_tablesquare_sit",
    "Date03_Sub03_toolbox",
    "Date03_Sub03_trashbin",
    "Date03_Sub03_yogaball_play",
    "Date03_Sub03_yogaball_sit",
    "Date03_Sub03_yogamat",
    "Date03_Sub04_backpack_back",
    "Date03_Sub04_backpack_hand",
    "Date03_Sub04_backpack_hug",
    "Date03_Sub04_boxlarge",
    "Date03_Sub04_boxlong",
    "Date03_Sub04_boxmedium",
    "Date03_Sub04_boxsmall",
    "Date03_Sub04_boxtiny",
    "Date03_Sub04_boxtiny_part2",
    "Date03_Sub04_chairblack_hand",
    "Date03_Sub04_chairblack_liftreal",
    "Date03_Sub04_chairblack_sit",
    "Date03_Sub04_chairwood_hand",
    "Date03_Sub04_chairwood_lift",
    "Date03_Sub04_chairwood_sit",
    "Date03_Sub04_monitor_hand",
    "Date03_Sub04_monitor_move",
    "Date03_Sub04_plasticcontainer_lift",
    "Date03_Sub04_stool_move",
    "Date03_Sub04_stool_sit",
    "Date03_Sub04_suitcase_ground",
    "Date03_Sub04_suitcase_lift",
    "Date03_Sub04_tablesmall_hand",
    "Date03_Sub04_tablesmall_lean",
    "Date03_Sub04_tablesmall_lift",
    "Date03_Sub04_tablesquare_hand",
    "Date03_Sub04_tablesquare_lift",
    "Date03_Sub04_tablesquare_sit",
    "Date03_Sub04_toolbox",
    "Date03_Sub04_trashbin",
    "Date03_Sub04_yogaball_play",
    "Date03_Sub04_yogaball_play2",
    "Date03_Sub04_yogaball_sit",
    "Date03_Sub04_yogamat",
    "Date03_Sub05_backpack",
    "Date03_Sub05_boxlarge",
    "Date03_Sub05_boxlong",
    "Date03_Sub05_boxmedium",
    "Date03_Sub05_boxsmall",
    "Date03_Sub05_boxtiny",
    "Date03_Sub05_chairblack",
    "Date03_Sub05_chairwood",
    "Date03_Sub05_chairwood_part2",
    "Date03_Sub05_monitor",
    "Date03_Sub05_plasticcontainer",
    "Date03_Sub05_stool",
    "Date03_Sub05_suitcase",
    "Date03_Sub05_tablesmall",
    "Date03_Sub05_tablesquare",
    "Date03_Sub05_toolbox",
    "Date03_Sub05_trashbin",
    "Date03_Sub05_yogaball",
    "Date03_Sub05_yogamat"
  ]
}

Object template mesh

Great work!

Maybe I am missing something, but how/where is the object template loaded? It's not specified in the dataset preparation section.

Also, does it need to be textured? Are there any other conventions that need to be followed?

Issue about hand pose

Many thanks for your great work!
I've run the demo scripts successfully and generated the video. However, I found that the human hand pose is not predicted correctly; it seems that the hand pose stays at its initial value all the time. I also noticed that you predict only 25 body keypoints with OpenPose, without hand keypoints. I wonder whether this is caused by FrankMocap or whether you simply do not fit the hand pose.
Thanks again and looking forward to your reply!

psbody library

Hello everyone!

I'm trying to install all the dependencies, but when following the psbody-mesh library README for the requirements, this error appears:

"Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libboost-all-dev is already the newest version (1.74.0.3ubuntu7).
0 upgraded, 0 newly installed, 0 to remove and 42 not upgraded.
The virtual environment was not created successfully because ensurepip is not
available. On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

apt install python3.10-venv
You may need to use sudo with that command. After installing the python3-venv
package, recreate your virtual environment.

Failing command: /content/mesh/psbody-mesh-namespace/my_venv/bin/python3

/bin/bash: line 1: my_venv/bin/activate: No such file or directory
Cloning into 'psbody.mesh'...
fatal: could not read Username for 'https://github.com/': No such device or address
[Errno 2] No such file or directory: 'psbody.mesh'
/content/mesh/psbody-mesh-namespace
make: *** No rule to make target 'all'. Stop."

Can anybody help me?
