vamosc / caphuman

[CVPR2024] CapHuman: Capture Your Moments in Parallel Universes

Home Page: https://caphuman.github.io

License: Other

Languages: Python 98.23%, C++ 0.16%, Cuda 1.17%, Shell 0.45%

Topics: cvpr2024, diffusion, diffusion-models, image-synthesis, personalization, portrait-generation

CapHuman: Capture Your Moments in Parallel Universes

[Paper] [Project Page]

This is the repository for the paper CapHuman: Capture Your Moments in Parallel Universes.

Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

We concentrate on a novel human-centric image synthesis task: given only one reference facial photograph, generate images of that specific individual with diverse head positions, poses, and facial expressions in different contexts. To accomplish this goal, we argue that our generative model should have the following characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation; (2) generalizable identity preservation; (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. Building on this foundation, we aim to unleash the latter two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the "encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce a 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate that CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines.

🎏 News

  • [2024/04/26] We released the code and checkpoint.
  • [2024/02/27] Our paper was accepted to CVPR 2024.
  • [2024/02/01] We released the Project Page.

🔨 Installation

Dependency

conda create -n caphuman python=3.7
conda activate caphuman
pip install -r requirements.txt

Follow INSTALL to install pytorch3d (e.g., 0.7.4 or 0.7.6). We provide a whl file.

We provide a script that downloads the data and models:

bash tools/setup.sh

Otherwise, follow adobe-research/diffusion-rig for the DECA setup and place the files under data/ as follows:

data/
  deca_model.tar
  generic_model.pkl
  FLAME_texture.npz
  fixed_displacement_256.npy
  head_template.obj
  landmark_embedding.npy
  mean_texture.jpg
  texture_data_256.npy
  uv_face_eye_mask.png
  uv_face_mask.png

Then download our checkpoint caphuman.ckpt, along with vae-ft-mse-840000-ema-pruned.ckpt, Realistic_Vision_V3.0.ckpt, and 79999_iter.pth, and put them into ckpts/ as follows:

ckpts/
  face-parsing/
    79999_iter.pth
  caphuman.ckpt
  Realistic_Vision_V3.0.ckpt
  vae-ft-mse-840000-ema-pruned.ckpt
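
Before running inference, it can help to confirm that the layout matches the two listings above. The following is a minimal sketch, not part of the repository: the helper name find_missing and the EXPECTED_FILES table are hypothetical, and the file names are simply copied from the data/ and ckpts/ listings. It assumes the script is run from the repository root.

```python
from pathlib import Path

# File names copied from the data/ and ckpts/ listings in this README.
# The table and helper below are illustrative, not part of the repository.
EXPECTED_FILES = {
    "data": [
        "deca_model.tar",
        "generic_model.pkl",
        "FLAME_texture.npz",
        "fixed_displacement_256.npy",
        "head_template.obj",
        "landmark_embedding.npy",
        "mean_texture.jpg",
        "texture_data_256.npy",
        "uv_face_eye_mask.png",
        "uv_face_mask.png",
    ],
    "ckpts": [
        "face-parsing/79999_iter.pth",
        "caphuman.ckpt",
        "Realistic_Vision_V3.0.ckpt",
        "vae-ft-mse-840000-ema-pruned.ckpt",
    ],
}


def find_missing(root="."):
    """Return the expected files that are not present under root."""
    root = Path(root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected files are in place.")
```

The check only looks at file presence; it does not validate file contents or checksums.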

Note: for other styles, you can also download comic-babes, disney-pixar-cartoon-type-a, or toonyou.

📸 Inference

python inference.py --ckpt ckpts/caphuman.ckpt --vae_ckpt ckpts/vae-ft-mse-840000-ema-pruned.ckpt --model models/cldm_v15.yaml --sd_ckpt ckpts/Realistic_Vision_V3.0.ckpt --input_image examples/input_images/196251.png --pose_image examples/pose_images/pose1.png --prompt "a photo of a man wearing a suit in front of Space Needle"

Note: you can swap the Stable Diffusion backbone for a different style, e.g. --sd_ckpt disneyPixarCartoon_v10.safetensors.
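
When trying several prompts or backbones, assembling the command in a small script avoids retyping the long argument list. The sketch below is an assumption-laden convenience wrapper, not part of the repository: build_command and DEFAULTS are hypothetical names, the flags and default paths are copied from the inference command above, and the second prompt is made up for illustration. It only prints the commands (a dry run) and is written to work on Python 3.7.

```python
import shlex
import sys

# Default paths copied from the inference command in this README.
DEFAULTS = {
    "ckpt": "ckpts/caphuman.ckpt",
    "vae_ckpt": "ckpts/vae-ft-mse-840000-ema-pruned.ckpt",
    "model": "models/cldm_v15.yaml",
    "sd_ckpt": "ckpts/Realistic_Vision_V3.0.ckpt",
}


def build_command(input_image, pose_image, prompt, **overrides):
    """Build the argument list for one inference.py run.

    Keyword overrides replace entries in DEFAULTS, e.g. sd_ckpt=... to
    switch the backbone. Keys are turned into --flags verbatim, so only
    flags that inference.py accepts should be passed.
    """
    options = dict(DEFAULTS, **overrides)
    options.update(input_image=input_image, pose_image=pose_image, prompt=prompt)
    cmd = [sys.executable, "inference.py"]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    prompts = [
        "a photo of a man wearing a suit in front of Space Needle",
        "a photo of a man reading a book in a library",  # hypothetical prompt
    ]
    for prompt in prompts:
        cmd = build_command(
            "examples/input_images/196251.png",
            "examples/pose_images/pose1.png",
            prompt,
        )
        # Dry run: print a shell-safe version of each command.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually launch a run, the returned list can be passed to subprocess.run(cmd, check=True) from the repository root.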

If you prefer Gradio, try the following command:

python -m gradios.gradio_visualization --ckpt ckpts/caphuman.ckpt --vae_ckpt ckpts/vae-ft-mse-840000-ema-pruned.ckpt --model models/cldm_v15.yaml --sd_ckpt ckpts/Realistic_Vision_V3.0.ckpt

If you are familiar with stable-diffusion-webui, please refer to the extension sd-webui-controlnet. Note: we make some modifications to support CapHuman.

📎 Citation

@inproceedings{liang2024caphuman,
  author={Liang, Chao and Ma, Fan and Zhu, Linchao and Deng, Yingying and Yang, Yi},
  title={CapHuman: Capture Your Moments in Parallel Universes},
  booktitle={CVPR},
  year={2024}
}

⚠️ License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

🙏 Acknowledgements

We sincerely thank Zongxin Yang for valuable discussions.
