tongxin-wang / texfit

Code release for TexFit: Text-Driven Fashion Image Editing with Diffusion Models (AAAI 2024)

Home Page: https://texfit.github.io/

License: MIT License

Python 99.52% Shell 0.48%

TexFit: Text-Driven Fashion Image Editing with Diffusion Models

Abstract: Fashion image editing aims to edit an input image to obtain richer or distinct visual clothing matching effects. Existing global fashion image editing methods struggle to achieve rich outfit combination effects, while local fashion image editing better meets the need for diverse and personalized outfit matching. Local editing techniques typically depend on text and auxiliary modalities (e.g., human poses, human keypoints, garment sketches) for image manipulation, where the auxiliary modalities essentially help locate the editing region. Since these auxiliary modalities usually require additional effort in practical application scenarios, text-driven fashion image editing offers high flexibility. In this paper, we propose TexFit, a Text-driven Fashion image Editing method using diffusion models, which performs local image editing using only easily accessible text. Our approach employs a text-based editing region location module to predict a precise editing region in the fashion image. We then take the predicted region, together with the text prompt, as the generation condition of the diffusion model to achieve precise local editing of fashion images while keeping the rest of the image intact. In addition, previous fashion datasets usually focus on global descriptions and lack the local descriptive information needed to guide precise local editing. Therefore, we develop a new DFMM-Spotlight dataset using region extraction and attribute combination strategies. It focuses locally on clothes and accessories, enabling local editing with text input. Experimental results on the DFMM-Spotlight dataset demonstrate the effectiveness of our model.

Setup

Initialize a conda environment named texfit by running:

conda env create -f environment.yaml
conda activate texfit

# install mmcv and mmsegmentation
pip install -U openmim
mim install mmcv==1.2.1
mim install mmsegmentation==0.9.0

Then initialize a 🤗 Accelerate environment with:

accelerate config
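After the setup steps, a short script along the following lines can confirm that the pinned versions were actually installed in the active environment. This is only a sketch and not part of the TexFit code: it assumes Python 3.8 or newer (for importlib.metadata) and that mmcv may be registered under either the mmcv or mmcv-full distribution name. The helper names installed_version and check_versions are hypothetical.

```python
from importlib import metadata

# Versions pinned in the setup commands of this README.
EXPECTED = {"mmcv": "1.2.1", "mmsegmentation": "0.9.0"}

# Assumption: mmcv may be installed under either of these distribution names.
ALIASES = {"mmcv": ("mmcv", "mmcv-full")}


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    for name in ALIASES.get(package, (package,)):
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Map each mismatching package to a (wanted, found) pair."""
    problems = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            problems[package] = (wanted, found)
    return problems


if __name__ == "__main__":
    # Prints one line per mismatch and nothing when everything matches.
    for package, (wanted, found) in check_versions().items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

Run it inside the activated texfit environment; no output means both packages match the pinned versions.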

Data Preparation

Download the DFMM-Spotlight dataset from Google Drive and unzip it to a path of your choice, referred to here as /path/to/DFMM-Spotlight. The dataset folder structure should be as follows:

DFMM-Spotlight
├── train_images
│   ├── MEN-Denim-id_00000080-01_7_additional.png
│   ├── .......
│   └── WOMEN-Tees_Tanks-id_00007979-04_4_full.png
├── test_images
│   ├── MEN-Denim-id_00000089-03_7_additional.png
│   ├── .......
│   └── WOMEN-Tees_Tanks-id_00007970-01_7_additional.png
├── mask
│   ├── MEN-Denim-id_00000080-01_7_additional_mask_0.png
│   ├── .......
│   └── WOMEN-Tees_Tanks-id_00007979-04_4_full_mask_0.png
└── mask_ann
    ├── train_ann_file.jsonl
    └── test_ann_file.jsonl
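
Before training, it can help to check that the unzipped folder matches this layout. The following is a minimal sketch, not part of the TexFit code: the directory and file names are taken from the structure listed here, while the helper name missing_entries is hypothetical.

```python
from pathlib import Path

# Entries taken from the folder structure listed in this README.
REQUIRED_DIRS = ("train_images", "test_images", "mask", "mask_ann")
REQUIRED_FILES = (
    "mask_ann/train_ann_file.jsonl",
    "mask_ann/test_ann_file.jsonl",
)


def missing_entries(root):
    """Return the expected directories and files that are absent under `root`."""
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Replace the placeholder with the path the dataset was unzipped to.
    gaps = missing_entries("/path/to/DFMM-Spotlight")
    print("layout OK" if not gaps else f"missing: {gaps}")
```

The check only looks at top-level folders and the two annotation files; it does not validate image counts or the contents of the annotations.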

Training and Inference

Important note: Replace all the /path/to paths in the code and configuration files with real paths.

These /path/to placeholders appear in every configuration file under the configs folder and in dataset/dfmm_spotlight_hf/dfmm_spotlight_hf.py.
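
To make sure no placeholder is left behind, a small scan like the one below can list every line that still contains /path/to. It is a sketch under stated assumptions rather than a tool shipped with the repository: it assumes it is run from the repository root, and the helper names find_placeholders and candidate_files are hypothetical.

```python
from pathlib import Path

PLACEHOLDER = "/path/to"


def find_placeholders(paths, placeholder=PLACEHOLDER):
    """Yield (file, line_number, line) for each line still containing the placeholder."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if placeholder in line:
                yield str(path), number, line.strip()


def candidate_files(repo_root="."):
    """Collect the files named in this README: configs/* and the dataset script."""
    root = Path(repo_root)
    configs = root / "configs"
    files = []
    if configs.is_dir():
        files = sorted(p for p in configs.rglob("*") if p.is_file())
    dataset_script = root / "dataset" / "dfmm_spotlight_hf" / "dfmm_spotlight_hf.py"
    if dataset_script.is_file():
        files.append(dataset_script)
    return files


if __name__ == "__main__":
    # Prints nothing once every placeholder has been replaced.
    for file, number, line in find_placeholders(candidate_files()):
        print(f"{file}:{number}: {line}")
```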

Train the ERLM (Stage I)

Train the editing region location module (ERLM) with the following command:

CUDA_VISIBLE_DEVICES=0 python train_erlm.py --opt ./configs/region_gen.yml

Train TexFit (Stage II)

Train the local fashion image editing model TexFit with the following command:

bash train_texfit.sh

Local Fashion Image Editing

Once ERLM and TexFit are trained, you can edit a fashion image locally by running the following command:

CUDA_VISIBLE_DEVICES=0 python pipeline.py \
  --opt ./configs/region_gen.yml \
  --img_path /path/to/your_fashion_image_path \
  --output_path /path/to/edited_image_saving_path \
  --text_prompt the_editing_text_prompt \
  --erlm_model_path /path/to/trained_erlm_model_path \
  --texfit_model_path /path/to/trained_texfit_model_path

For example:

CUDA_VISIBLE_DEVICES=0 python pipeline.py \
  --opt ./configs/region_gen.yml \
  --img_path examples/MEN-Denim-id_00000089-03_7_additional.png \
  --output_path ./example_output.png \
  --text_prompt 'denim blue lower clothing' \
  --erlm_model_path experiments/region_gen/models/region_generation_epoch55.pth \
  --texfit_model_path sd-model-finetuned/texfit-model
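
When editing many images from a script, the same command can be assembled in Python. The sketch below is one possible wrapper and not part of the TexFit code: it uses only the flags shown in the commands above, passes arguments as a list so that a multi-word prompt needs no shell quoting, and assumes it is run from the repository root. The helper names build_edit_command and run_edit are hypothetical.

```python
import os
import shlex
import subprocess
import sys


def build_edit_command(img_path, output_path, text_prompt,
                       erlm_model_path, texfit_model_path,
                       opt="./configs/region_gen.yml"):
    """Return the argument list for pipeline.py with the flags from this README."""
    return [
        sys.executable, "pipeline.py",
        "--opt", opt,
        "--img_path", img_path,
        "--output_path", output_path,
        "--text_prompt", text_prompt,
        "--erlm_model_path", erlm_model_path,
        "--texfit_model_path", texfit_model_path,
    ]


def run_edit(command, gpu="0"):
    """Run the command with CUDA_VISIBLE_DEVICES set; raises if pipeline.py fails."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    subprocess.run(command, check=True, env=env)


if __name__ == "__main__":
    command = build_edit_command(
        img_path="examples/MEN-Denim-id_00000089-03_7_additional.png",
        output_path="./example_output.png",
        text_prompt="denim blue lower clothing",
        erlm_model_path="experiments/region_gen/models/region_generation_epoch55.pth",
        texfit_model_path="sd-model-finetuned/texfit-model",
    )
    # Only prints the command; call run_edit(command) to execute it.
    print(shlex.join(command))
```

Note that each call starts a new process and reloads both models, so this suits small batches; for large batches, importing the pipeline code directly would likely be faster.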

Citation

If you find this paper or the code useful for your research, please consider citing:

@inproceedings{wang2024texfit,
  title={TexFit: Text-Driven Fashion Image Editing with Diffusion Models},
  author={Wang, Tongxin and Ye, Mang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={9},
  pages={10198--10206},
  year={2024}
}

Acknowledgments

Our code is developed based on 🤗 Diffusers and Text2Human. We thank the authors for their open-source contributions.
