Comments (1)
Hi @Taited
Thanks for your interest in our work!!
As stated in #15:
In Figure 2 of the paper, you can see that the textual prompt q is a simple, predefined prompt like "a photo of a model wearing a dress," "a photo of a model wearing a lower body garment," or "a photo of a model wearing an upper body garment." This prompt serves as a starting point for the diffusion process. It is not tailored to each specific image in the dataset; rather, it provides a general direction for the model to follow during the virtual try-on task.
We then use the textual inversion adapter $F_{\theta}$ to predict the pseudo-word embeddings associated with that specific garment. Finally, we condition the denoising network using the features extracted from the concatenation of the generic prompt plus the predicted pseudo-word embeddings.
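To make the flow above concrete, here is a minimal, framework-free sketch of how the conditioning sequence could be assembled. It is only an illustration under stated assumptions, not the repository's code: the category keys, `toy_token_embeddings`, `toy_inversion_adapter`, the embedding size, and the number of pseudo-words are all hypothetical stand-ins. In the real pipeline, a text encoder would then extract features from the concatenated sequence, and that step is omitted here.

```python
# Generic prompts quoted in the reply above; the dictionary keys are hypothetical.
GENERIC_PROMPTS = {
    "dresses": "a photo of a model wearing a dress",
    "lower_body": "a photo of a model wearing a lower body garment",
    "upper_body": "a photo of a model wearing an upper body garment",
}


def toy_token_embeddings(prompt, dim=4):
    """Hypothetical stand-in for a text-encoder embedding table.

    Returns one `dim`-sized vector per whitespace-separated word.
    """
    return [[float(len(word))] * dim for word in prompt.split()]


def toy_inversion_adapter(garment_features, num_pseudo_words=2, dim=4):
    """Hypothetical stand-in for F_theta.

    Maps garment features to `num_pseudo_words` pseudo-word embeddings.
    """
    mean = sum(garment_features) / len(garment_features)
    return [[mean + i] * dim for i in range(num_pseudo_words)]


def build_conditioning(category, garment_features):
    """Concatenate generic-prompt embeddings with predicted pseudo-word embeddings."""
    # The prompt depends only on the garment category, not on the specific image.
    prompt = GENERIC_PROMPTS[category]
    prompt_embeddings = toy_token_embeddings(prompt)
    # The garment-specific information enters only through the pseudo-words.
    pseudo_word_embeddings = toy_inversion_adapter(garment_features)
    # Concatenate along the sequence (token) axis.
    return prompt_embeddings + pseudo_word_embeddings


conditioning = build_conditioning("dresses", garment_features=[0.5, 1.5])
print(len(conditioning))  # number of token vectors in the conditioning sequence
```

The point of the sketch is that the prompt part is identical for every garment in a category, while the appended pseudo-word vectors change per garment.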
However, in the 2nd row of Table 4, we also provide an experiment that does not use the textual inversion technique and instead uses a textual description of the in-shop garment. You can find the textual description of each garment in the data/noun_chunks folder.
To extract these textual descriptions, we follow the approach described in https://arxiv.org/abs/2304.02051
Alberto
from ladi-vton.