dclm / instantid

This project forked from instantx-research/instantid

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Home Page: https://instantid.github.io/

License: Apache License 2.0

InstantID

InstantID : Zero-shot Identity-Preserving Generation in Seconds

We are currently organizing the code and pre-trained checkpoints, which will be available soon! Please don't hesitate to star our work.

Abstract

There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.

Release

  • [2024/1/15] 🔥 We release the technical report.
  • [2023/12/11] 🔥 We launch the project page.

Demos

Stylized Synthesis

Comparison with Previous Works

Comparison with existing tuning-free state-of-the-art techniques. Specifically, we compare with IP-Adapter (IPA), IP-Adapter-FaceID, and the recent PhotoMaker. Among them, PhotoMaker needs to train the LoRA parameters of the UNet. Both PhotoMaker and IP-Adapter-FaceID achieve good fidelity, but their text control capabilities degrade noticeably. In contrast, InstantID achieves better fidelity and retains good text editability (faces and styles blend better).

Comparison of InstantID with pre-trained character LoRAs. We achieve results competitive with LoRAs without any training.

Comparison of InstantID with InsightFace Swapper (also known as ROOP or Refactor). In non-realistic styles, our work integrates the face and background more flexibly.

Code

We are working with the diffusers team and will release the code before the end of January. Starring our work will definitely speed up the process. No kidding!

Cite

If you find InstantID useful for your research and applications, please cite us using this BibTeX:

@misc{wang2024instantid,
        title={InstantID: Zero-shot Identity-Preserving Generation in Seconds}, 
        author={Qixun Wang and Xu Bai and Haofan Wang and Zekui Qin and Anthony Chen},
        year={2024},
        eprint={2401.07519},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
}
