
mlpc-ucsd / tokencompose

Stars: 81 · Watchers: 3 · Forks: 2 · Size: 137.3 MB

(CVPR 2024) 🧩 TokenCompose: Grounding Diffusion with Token-level Supervision

Home Page: https://mlpc-ucsd.github.io/TokenCompose/

License: Other

Python 10.51% Shell 0.57% Jupyter Notebook 88.92%
Topics: computer-vision, diffusion-models, generative-ai, machine-learning, text-to-image, artificial-intelligence, latent-diffusion, multimodal, stable-diffusion, image-generation

tokencompose's People

Contributors

jamessand, zwcolin



Forkers

sorokinvld, whuhxb

tokencompose's Issues

GPU memory usage

Hi, thank you for this great work. I wonder which GPU you used in your experiments? I'm hitting OOM on a V100 when the UNet is equipped with the controller.

Why can the batch size only be 1?

Hi, congrats on your great work!

I wonder why the batch size can only be set to 1. It seems to me that the controller could store attention maps for a larger batch. Is this due to memory cost, or is there another reason?
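For a rough sense of how the stored cross-attention maps scale with batch size, here is a back-of-envelope estimate. The shapes below (latent resolutions, layer counts per resolution, 8 heads, 77 CLIP text tokens, fp32) are my own assumptions about an SD 1.4-style UNet, not values taken from this repo's code:

```python
# Rough, assumption-laden estimate of cross-attention map memory per batch
# for an SD 1.4-style UNet. All shape choices below are guesses for
# illustration, not TokenCompose's actual configuration.

TEXT_TOKENS = 77      # CLIP context length
HEADS = 8             # assumed heads per cross-attention layer
BYTES = 4             # fp32
# (latent side length -> assumed number of cross-attention layers there)
LAYERS = {64: 2, 32: 4, 16: 4, 8: 6}

def attn_map_bytes(batch: int) -> int:
    """Total bytes to hold one attention map per cross-attn layer."""
    total = 0
    for side, n_layers in LAYERS.items():
        spatial = side * side                          # query (image) tokens
        per_map = spatial * TEXT_TOKENS * HEADS * BYTES
        total += n_layers * per_map
    return batch * total

for b in (1, 4):
    print(f"batch={b}: ~{attn_map_bytes(b) / 2**20:.0f} MiB of stored maps")
```

The cost is linear in batch size, and in practice these maps are kept alive alongside activations for the backward pass, which is where the OOM pressure would come from.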

SDXL Training

Well, congrats! I'm really impressed that the generations are so well grounded without any extra network or anything!
And you only finetuned SD 1.4 for 24K steps? That was pretty fast training, right?

Have you considered training SDXL? I didn't see anything about it in your paper.
Do you have recommendations on the parameters to use for training SDXL?

Compatibility with LoRA

Hi, I wonder if you have tried using LoRA to finetune the model? It should need less GPU memory than the current full-finetuning strategy.
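To illustrate why LoRA would cut trainable-parameter memory, here is a generic parameter-count comparison (not this repo's code; the projection width and rank below are illustrative assumptions):

```python
# Generic LoRA vs full-finetune trainable-parameter comparison.
# Dimensions are illustrative (roughly SD-scale attention projections),
# not taken from TokenCompose's actual configuration.

def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in                  # every weight entry is trainable

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # The base weight W is frozen; only the low-rank factors
    # A (rank x d_in) and B (d_out x rank) are trained, giving the
    # effective update W + B @ A.
    return rank * d_in + d_out * rank

d = 768   # assumed projection width
r = 4     # a typical small LoRA rank
print(f"full finetune: {full_params(d, d):,} trainable params")
print(f"LoRA (r={r}):  {lora_params(d, d, r):,} trainable params")
```

Optimizer state (e.g. Adam's two moments per parameter) scales with the trainable count, so the savings compound beyond the weights themselves.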
