yixin-shen-1218 / language2pose

This project forked from chahuja/language2pose

Language2Pose: Natural Language Grounded Pose Forecasting

Home Page: http://chahuja.com/language2pose

License: MIT License

Python 100.00%

Language2Pose: Natural Language Grounded Pose Forecasting

Running this code involves five steps:

  • Python Virtual Environment and dependencies
  • Data download and preprocessing
  • Training
  • Sampling
  • Rendering

PS: The implementation of one of the baselines, proposed by Lin et al. [1], was not publicly available, so we use our own implementation of their model to generate all results and animations marked as Lin et al. Because of differences in training hyperparameters, dataset and experiments, the numbers reported for Lin et al. in our paper differ from those in the original paper [1].

PS: This repo, at the moment, is functional at best. Feel free to create issues/pull requests however you see fit.


Python Virtual Environment

Anaconda is recommended for creating the virtual environment:

conda env create -f env.yaml
conda activate torch

pycasper is used to handle the logistics of saving and loading models. Clone it into the repository root and symlink the package into src:

git clone https://github.com/chahuja/pycasper
cd src 
ln -s ../pycasper/pycasper .
cd ..

Data

Download

We use the KIT Motion-Language Dataset, which can be downloaded here:

wget https://motion-annotation.humanoids.kit.edu/downloads/4/2017-06-22.zip
mkdir -p dataset/kit-mocap
unzip 2017-06-22.zip -d dataset/kit-mocap
rm 2017-06-22.zip

Download Word2Vec binaries

Download the binary file here and place it in src/s2v

Pre-trained Models

Download the pre-trained models here and place them in src/save.

Preprocessing

python data/data.py -dataset KITMocap -path2data ../dataset/kit-mocap

Rendering Ground Truths

python render.py -dataset KITMocap -path2data ../dataset/kit-mocap/new_fke -feats_kind fke

Calculating mean+variance for Z-Normalization

python dataProcessing/meanVariance.py -mask '[0]' -feats_kind rifke -dataset KITMocap -path2data ../dataset/kit-mocap -f_new 8
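The statistics computed above feed the zNorm transform that the training commands enable with -transforms "['zNorm']". As an illustration only, here is a minimal NumPy sketch of per-feature z-normalization; it is not a copy of dataProcessing/meanVariance.py. It assumes each sequence is a (num_frames, num_features) array and that statistics are pooled over all frames, and it does not model what -mask or -f_new do. The names fit_znorm, znorm and inverse_znorm are hypothetical.

```python
import numpy as np


def fit_znorm(sequences, eps=1e-8):
    """Per-feature mean and standard deviation pooled over every frame.

    Assumption: `sequences` is a list of (num_frames, num_features) arrays,
    possibly of different lengths.
    """
    frames = np.concatenate(sequences, axis=0)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    # Constant features would otherwise cause a division by zero.
    std = np.where(std < eps, 1.0, std)
    return mean, std


def znorm(x, mean, std):
    """Map raw features to zero mean and unit variance."""
    return (x - mean) / std


def inverse_znorm(x, mean, std):
    """Undo znorm, e.g. to bring model outputs back to the original scale."""
    return x * std + mean


# Usage with random stand-in data: 3 sequences of different lengths, 5 features.
rng = np.random.default_rng(0)
train = [rng.normal(loc=2.0, scale=3.0, size=(t, 5)) for t in (40, 55, 32)]
mean, std = fit_znorm(train)
normalized = [znorm(seq, mean, std) for seq in train]
pooled = np.concatenate(normalized, axis=0)
print(pooled.mean(axis=0).round(6), pooled.std(axis=0).round(6))
```

Fitting the statistics once on the training data and reusing them for every split is the usual design choice, since it keeps the normalization identical at training and sampling time.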

Training

We train the models using the script train_wordConditioned.py. (Pardon the misnomer: it was initially meant for word-conditioned pose forecasting, but I ended up adding sentence-conditioned pose forecasting as well and was too lazy to rename the file.)

All the arguments used for training, together with their help texts, can be found in src/argsUtils.py. (PS: Some of them may be deprecated, but I have not removed them in case doing so breaks other code I wrote during the experimentation phase. Please raise an issue or send me an email if you have any questions about the arguments.) If you want to play with the models from the paper, it is best to stick to the arguments used in the examples.
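The example commands pass list- and dict-valued arguments (-losses, -mask, -modelKwargs, -transforms) as quoted Python literals. If you script many runs, a hypothetical helper such as build_train_args can generate that quoting for you. This is a sketch, not part of the repository: we have not checked how src/argsUtils.py parses these strings, and literal_flag merely assumes ast.literal_eval-style parsing. The values reproduce the JL2P command given in the list that follows.

```python
import ast
import shlex


def build_train_args(script, **options):
    """Build an argv list for a training script (hypothetical helper).

    Lists and dicts are rendered with repr() so they arrive as Python
    literals, like the quoted "['SmoothL1Loss']" and "{...}" arguments in
    the README commands; every other value is passed through str().
    """
    args = ["python", script]
    for name, value in options.items():
        rendered = repr(value) if isinstance(value, (list, dict)) else str(value)
        args += [f"-{name}", rendered]
    return args


def literal_flag(args, flag):
    """Read a list/dict flag back, assuming ast.literal_eval-style parsing."""
    return ast.literal_eval(args[args.index(flag) + 1])


# The JL2P example command, expressed as Python values.
jl2p = build_train_args(
    "train_wordConditioned.py",
    batch_size=100, cpk="jl2p", curriculum=1, dataset="KITMocap",
    early_stopping=1, exp=1, f_new=8, feats_kind="rifke",
    losses=["SmoothL1Loss"], lr=0.001, mask=[0],
    model="Seq2SeqConditioned9",
    modelKwargs={"hidden_size": 1024, "use_tp": False, "s2v": "lstm"},
    num_epochs=1000, path2data="../dataset/kit-mocap",
    render_list="subsets/render_list", s2v=1, save_dir="save/model/",
    tb=1, time=16, transforms=["zNorm"],
)
print(shlex.join(jl2p))
# To launch it, assuming (as the ../dataset paths suggest) that the
# scripts are run from src/:
#   import subprocess; subprocess.run(jl2p, cwd="src", check=True)
```

Passing an argv list rather than a single shell string avoids having to nest quotes around the dict and list literals.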

  • JL2P
python train_wordConditioned.py -batch_size 100 -cpk jl2p -curriculum 1 -dataset KITMocap -early_stopping 1 -exp 1 -f_new 8 -feats_kind rifke -losses "['SmoothL1Loss']" -lr 0.001 -mask "[0]" -model Seq2SeqConditioned9 -modelKwargs "{'hidden_size':1024, 'use_tp':False, 's2v':'lstm'}" -num_epochs 1000 -path2data ../dataset/kit-mocap -render_list subsets/render_list -s2v 1 -save_dir save/model/ -tb 1 -time 16 -transforms "['zNorm']" 

The -modelKwargs argument needs some explanation, as its contents vary with the model:

hidden_size: size of the joint embedding
use_tp: whether to use a trajectory predictor [1]; False for JL2P models
s2v: sentence-to-vector model ('lstm' or 'bert')
  • Our implementation of Lin et al. [1]
python train_seq2seq.py -batch_size 100 -cpk lin -curriculum 0 -dataset KITMocap -early_stopping 1 -exp 1 -f_new 8 -feats_kind rifke -losses "['MSELoss']" -lr 0.001 -mask "[0]" -model Seq2Seq -modelKwargs "{'hidden_size':1024, 'use_tp':True, 's2v':'lstm'}" -num_epochs 1000 -path2data ../dataset/kit-mocap -render_list subsets/render_list -s2v 1 -save_dir save/model -tb 1 -time 16 -transforms "['zNorm']"

This model has two training stages. First, train_seq2seq.py trains a seq2seq model to learn an embedding for pose sequences. Once that training is complete, train_wordConditioned.py is called to learn a mapping from language embeddings to those pose embeddings.
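To make the two-stage recipe concrete, here is a toy NumPy sketch. It is not the repository's code and uses none of its models: PCA (a linear autoencoder) stands in for the stage-1 seq2seq pose model, a least-squares linear map with a bias stands in for the stage-2 language-to-pose network, and the data are synthetic. It only illustrates the structure: the pose embedding is learned from poses alone and then frozen, and stage 2 regresses sentence vectors onto it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 flattened "pose sequences" (48-dim) and 200
# matching "sentence vectors" (16-dim), both driven by 8 hidden factors.
n, pose_dim, latent_dim, lang_dim = 200, 48, 8, 16
latent = rng.normal(size=(n, latent_dim))
poses = latent @ rng.normal(size=(latent_dim, pose_dim))
sentences = latent @ rng.normal(size=(latent_dim, lang_dim))

# Stage 1 (stand-in for the seq2seq pose model): learn a pose embedding
# from poses alone, here with PCA computed via the SVD.
mean = poses.mean(axis=0)
_, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
components = vt[:latent_dim]


def encode(p):
    return (p - mean) @ components.T


def decode(z):
    return z @ components + mean


pose_codes = encode(poses)  # frozen targets for stage 2


# Stage 2: map sentence vectors onto the frozen pose codes
# (a linear least-squares fit with a bias column in this toy version).
def with_bias(s):
    return np.hstack([s, np.ones((len(s), 1))])


W, *_ = np.linalg.lstsq(with_bias(sentences), pose_codes, rcond=None)


def language_to_pose(s):
    return decode(with_bias(s) @ W)


predicted = language_to_pose(sentences)
rel_err = np.linalg.norm(predicted - poses) / np.linalg.norm(poses)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Because the toy data are exactly linear, the error is at floating-point level; with real motion data and neural networks the two stages are trained by gradient descent and the fit is approximate.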


Sampling

Sampling from trained Models

The training scripts sample automatically once the stopping criterion has been reached, but if you would like to sample manually, run the following script:

python sample_wordConditioned.py -load <path-to-weights.p>

The file name of <path-to-weights.p> ends in _weights.p.

Using Pretrained Models

Make sure you have downloaded the pre-trained models as described here.

  • JL2P
python sample_wordConditioned.py -load save/jl2p/exp_726_cpk_jointSampleStart_model_Seq2SeqConditioned9_time_16_chunks_1_weights.p
  • Our implementation of Lin et al. [1]
python sample_wordConditioned.py -load save/lin-et-al/exp_700_cpk_mooney_model_Seq2SeqConditioned10_time_16_chunks_1_weights.p 
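The pre-trained file names above suggest that checkpoint names encode some run settings, apparently exp_<exp>_cpk_<cpk>_model_<model>_time_<time>_chunks_<chunks>_weights.p. That pattern is inferred only from these two names and has not been checked against pycasper, so treat it as an assumption. The hypothetical helpers find_weights and parse_weights_name list such files under a save directory and split a name into those fields, which can help pick the right path for -load.

```python
import re
from pathlib import Path

# Pattern inferred only from the two pre-trained checkpoint names in this
# README; it is an assumption, not pycasper's documented format.
WEIGHTS_RE = re.compile(
    r"exp_(?P<exp>\d+)_cpk_(?P<cpk>.+?)_model_(?P<model>.+?)"
    r"_time_(?P<time>\d+)_chunks_(?P<chunks>\d+)_weights\.p"
)


def find_weights(save_dir="save"):
    """Return every *_weights.p file below save_dir, sorted by path."""
    return sorted(Path(save_dir).rglob("*_weights.p"))


def parse_weights_name(path):
    """Split a checkpoint file name into its apparent run settings."""
    match = WEIGHTS_RE.fullmatch(Path(path).name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {path}")
    fields = match.groupdict()
    for key in ("exp", "time", "chunks"):
        fields[key] = int(fields[key])
    return fields


jl2p = parse_weights_name(
    "save/jl2p/exp_726_cpk_jointSampleStart_model_"
    "Seq2SeqConditioned9_time_16_chunks_1_weights.p"
)
print(jl2p)
```

For example, `for path in find_weights("save"): print(path, parse_weights_name(path))` run from src lists every checkpoint together with its experiment number and model class.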

Rendering

After sampling, you will probably want to see the animations the model generates. Only the test samples are used for rendering.

If possible, use a machine with many CPU cores, as rendering animations with matplotlib is painfully slow; render.py uses all available cores for parallel processing.

Using your trained model

python render.py -dataset KITMocap -load <path-to-weights.p> -feats_kind fke -render_list subsets/render_list

Using pre-trained Models

  • JL2P
python render.py -dataset KITMocap -load save/jl2p/exp_726_cpk_jointSampleStart_model_Seq2SeqConditioned9_time_16_chunks_1_weights.p -feats_kind fke -render_list subsets/render_list
  • Our implementation of Lin et al. [1]
python render.py -dataset KITMocap -load save/lin-et-al/exp_700_cpk_mooney_model_Seq2SeqConditioned10_time_16_chunks_1_weights.p -feats_kind fke -render_list subsets/render_list

References

[1]: Lin, Angela S., et al. "Generating Animated Videos of Human Activities from Natural Language Descriptions." Learning 2018 (2018).

Contributors

chahuja
