
Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Home Page: https://www.mmlab-ntu.com/project/talkedit/


talk-to-edit's Introduction

Talk-to-Edit (ICCV2021)


This repository contains the implementation of the following paper:

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021

[Paper] [Project Page] [CelebA-Dialog Dataset] [Poster] [Video]

You can try our Colab demos here. Enjoy!

  1. Editing with dialog
  2. Editing without dialog

Overview

[Figure: overall structure of Talk-to-Edit]

Dependencies and Installation

  1. Clone Repo

    git clone git@github.com:yumingj/Talk-to-Edit.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate talk_edit
    • Python >= 3.7
    • PyTorch >= 1.6
    • CUDA 10.1
    • GCC 5.4.0
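As a quick sanity check, the version requirements above can be verified programmatically. This is a minimal sketch: the `meets_min` helper is not part of the repo, and it parses dotted version strings naively (assuming `major.minor.patch` formatting).

```python
import platform

def meets_min(version_str, minimum):
    """Return True if a dotted version string is >= the (major, minor) tuple."""
    parts = tuple(int(p) for p in version_str.split(".")[:2])
    return parts >= minimum

# Python >= 3.7
assert meets_min(platform.python_version(), (3, 7)), "Python >= 3.7 required"

# PyTorch >= 1.6 (uncomment if torch is installed)
# import torch
# assert meets_min(torch.__version__.split("+")[0], (1, 6)), "PyTorch >= 1.6 required"
```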

Get Started

Editing

We provide scripts for editing using our pretrained models.

  1. First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:

    ./download/pretrained_models
    ├── 1024_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── 128_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── arcface_resnet18_110.pth
    ├── language_encoder.pth.tar
    ├── predictor_1024.pth.tar
    ├── predictor_128.pth.tar
    ├── stylegan2_1024.pth
    ├── stylegan2_128.pt
    ├── StyleGAN2_FFHQ1024_discriminator.pth
    └── eval_predictor.pth.tar
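To catch a missing or misplaced checkpoint early, a small helper can report which of the expected files are absent. This is an illustrative sketch, not part of the repo; the file list simply mirrors the directory tree above.

```python
import os

# Expected pretrained files, mirroring the tree above.
EXPECTED = [
    "arcface_resnet18_110.pth",
    "language_encoder.pth.tar",
    "predictor_1024.pth.tar",
    "predictor_128.pth.tar",
    "stylegan2_1024.pth",
    "stylegan2_128.pt",
    "StyleGAN2_FFHQ1024_discriminator.pth",
    "eval_predictor.pth.tar",
] + [
    os.path.join(res, attr + ".pth")
    for res in ("1024_field", "128_field")
    for attr in ("Bangs", "Eyeglasses", "No_Beard", "Smiling", "Young")
]

def missing_files(root, names=EXPECTED):
    """Return the relative paths in `names` that do not exist under `root`."""
    return [n for n in names if not os.path.isfile(os.path.join(root, n))]

if __name__ == "__main__":
    missing = missing_files("./download/pretrained_models")
    print("All pretrained models found." if not missing
          else "Missing: " + ", ".join(missing))
```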
    
  2. You can try pure image editing without dialog instructions:

    python editing_wo_dialog.py \
       --opt ./configs/editing/editing_wo_dialog.yml \
       --attr 'Bangs' \
       --target_val 5

    The editing results will be saved in ./results.

    You can change attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, and Young (i.e., Age). The target_val can be 0, 1, 2, 3, 4, or 5.
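Note that the configs internally refer to the beard attribute as No_Beard. The attribute-to-index mapping below is copied from the repo's config files; the `validate_request` helper around it is a hypothetical convenience, not repo code.

```python
# Attribute-to-index mapping as it appears in the repo's config files.
# Note: the beard attribute is named No_Beard internally.
ATTR_TO_IDX = {
    "Bangs": 0,
    "Eyeglasses": 1,
    "No_Beard": 2,
    "Smiling": 3,
    "Young": 4,
}

VALID_TARGET_VALS = range(6)  # target_val can be 0..5

def validate_request(attr, target_val):
    """Raise if the requested attribute or degree is unsupported; return its index."""
    if attr not in ATTR_TO_IDX:
        raise ValueError(f"Unknown attribute {attr!r}; choose from {sorted(ATTR_TO_IDX)}")
    if target_val not in VALID_TARGET_VALS:
        raise ValueError("target_val must be an integer in [0, 5]")
    return ATTR_TO_IDX[attr]
```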

  3. You can also try dialog-based editing, where you talk to the system through the command prompt:

    python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml

    The editing results will be saved in ./results.

    How to talk to the system:

    • Our system is able to edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young (i.e., Age).
    • When prompted with "Enter your request (Press enter when you finish):", you can enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
    • To respond to the system's feedback, just talk as if you were talking to a real person. For example, if the system asks "Is the length of the bangs just right?" after one round of editing, you can say things like "Yes." / "No." / "Yes, and I also want her to smile more happily."
    • To end the conversation, just tell the system things like "That's all" / "Nothing else, thank you."
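The stop condition of the dialog loop can be sketched as a simple phrase match. This is a toy illustration only; the actual system uses a trained language encoder, and the `is_done` helper and phrase list here are invented for exposition.

```python
# Illustrative end-of-conversation phrases, per the examples above.
END_PHRASES = ("that's all", "nothing else")

def is_done(reply):
    """Heuristically decide whether the user wants to end the conversation."""
    reply = reply.lower().strip()
    return any(p in reply for p in END_PHRASES)
```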
  4. By default, the above editing is performed on the teaser image. You may change the image to be edited in two ways: 1) change line 11: latent_code_index to another value in the range 0 to 99; 2) set line 10: latent_code_path to ~, so that an image is randomly generated.

  5. If you want to try editing on real images, you may download the real images from this link and put them under ./download/real_images. You can also provide other real images of your choice. You need to change line 12: img_path in editing_with_dialog.yml or editing_wo_dialog.yml to the path of the real image and set line 11: is_real_image to True.

  6. You can switch the default image size to 128 x 128 by setting line 3: img_res to 128 in config files.
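Putting steps 4-6 together, the relevant fields of editing_wo_dialog.yml / editing_with_dialog.yml look roughly like the excerpt below. This is a reconstruction from the line references above, not a verbatim copy of the shipped configs; the path values are placeholders.

```yaml
img_res: 1024                # set to 128 for 128 x 128 editing
latent_code_path: ./xxx.npz  # placeholder; set to ~ to generate a random image
latent_code_index: 0         # any value in 0-99 selects a provided latent code
is_real_image: False         # set to True when editing a real image
img_path: ./download/real_images/xxx.png  # placeholder path to the real image
```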

Train the Semantic Field

  1. To train the Semantic Field, a number of sampled latent codes should first be prepared; we then use the attribute predictor to predict the facial attributes of their corresponding images. The attribute predictor is trained using the fine-grained annotations in the CelebA-Dialog dataset. Here, we provide the latent codes we used. You can download the training data from this link and put it under ./download/train_data as follows:

    ./download/train_data
    ├── 1024
    │   ├── Bangs
    │   ├── Eyeglasses
    │   ├── No_Beard
    │   ├── Smiling
    │   └── Young
    └── 128
        ├── Bangs
        ├── Eyeglasses
        ├── No_Beard
        ├── Smiling
        └── Young
    
  2. We also use some editing latent codes to monitor the training phase. You can download them from this link and put them under ./download/editing_data as follows:

    ./download/editing_data
    ├── 1024
    │   ├── Bangs.npz.npy
    │   ├── Eyeglasses.npz.npy
    │   ├── No_Beard.npz.npy
    │   ├── Smiling.npz.npy
    │   └── Young.npz.npy
    └── 128
        ├── Bangs.npz.npy
        ├── Eyeglasses.npz.npy
        ├── No_Beard.npz.npy
        ├── Smiling.npz.npy
        └── Young.npz.npy
    
  3. All logging files produced during training, e.g., log messages, checkpoints, and snapshots, will be saved to the ./experiments and ./tb_logger directories.

  4. There are 10 configuration files under ./configs/train, named in the format of field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>. Choose the corresponding configuration file for the attribute and resolution you want.

  5. For example, to train the semantic field which edits the attribute Bangs in 128x128 image resolution, simply run:

    python train.py --opt ./configs/train/field_128_Bangs.yml
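Since the config names follow the field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME> pattern, all 10 training runs can be enumerated and launched in sequence. A sketch, assuming the train.py invocation documented above; the loop itself is just an illustrative convenience.

```python
import subprocess

ATTRIBUTES = ("Bangs", "Eyeglasses", "No_Beard", "Smiling", "Young")
RESOLUTIONS = (128, 1024)

def config_paths():
    """Return the 10 training config paths, field_<RES>_<ATTR>.yml."""
    return [
        f"./configs/train/field_{res}_{attr}.yml"
        for res in RESOLUTIONS
        for attr in ATTRIBUTES
    ]

if __name__ == "__main__":
    for cfg in config_paths():
        subprocess.run(["python", "train.py", "--opt", cfg], check=True)
```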

Quantitative Results

We provide code for the quantitative results shown in Table 1. Here we use Bangs at 128x128 resolution as an example.

  1. Use the trained semantic field to edit images.

    python editing_quantitative.py \
    --opt ./configs/train/field_128_bangs.yml \
    --pretrained_path ./download/pretrained_models/128_field/Bangs.pth
  2. Evaluate the edited images using quantitative metrics. Change image_num for each attribute accordingly: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.

    python quantitative_results.py \
    --attribute Bangs \
    --work_dir ./results/field_128_bangs \
    --image_dir ./results/field_128_bangs/visualization \
    --image_num 148
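To evaluate all five attributes in one go, the per-attribute image counts above can be tabulated and the command line built per attribute. A sketch: the quantitative_results.py flags are exactly those shown in step 2, while the `eval_command` helper and the loop are illustrative conveniences, not repo code.

```python
import subprocess

# Number of edited images per attribute, as listed in step 2.
IMAGE_NUM = {"Bangs": 148, "Eyeglasses": 82, "Beard": 129, "Smiling": 140, "Young": 61}

def eval_command(attribute):
    """Build the quantitative_results.py command line for one attribute."""
    work_dir = f"./results/field_128_{attribute.lower()}"
    return [
        "python", "quantitative_results.py",
        "--attribute", attribute,
        "--work_dir", work_dir,
        "--image_dir", f"{work_dir}/visualization",
        "--image_num", str(IMAGE_NUM[attribute]),
    ]

if __name__ == "__main__":
    for attr in IMAGE_NUM:
        subprocess.run(eval_command(attr), check=True)
```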

Qualitative Results

[Figure: qualitative editing results]

CelebA-Dialog Dataset

[Figure: CelebA-Dialog dataset samples]

Our CelebA-Dialog dataset is available for download.

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

  • Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
  • Accompanied with each image, there are captions describing the attributes and a user request sample.

[Figure: fine-grained labels, captions, and user requests in CelebA-Dialog]

The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{jiang2021talk,
  title={Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  author={Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13799--13808},
  year={2021}
}

@article{jiang2023talk,
  title={Talk-to-edit: Fine-grained 2d and 3d facial editing via dialog},
  author={Jiang, Yuming and Huang, Ziqi and Wu, Tianxing and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023},
  publisher={IEEE}
}

Contact

If you have any questions, please feel free to contact us via [email protected] or [email protected].

Acknowledgement

The codebase is maintained by Yuming Jiang and Ziqi Huang.

Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.


talk-to-edit's Issues

Why can't I get the expected results?

Hi,
Thanks for your wonderful work. However, when I try to run the demo using your pretrained models and default config parameters, that is:

python editing_wo_dialog.py \
   --opt ./configs/editing/editing_wo_dialog.yml \
   --attr 'Bangs' \
   --target_val 5

I always get the following results:

This attribute is already at the degree that you want. Let's try a different attribute degree or another attribute.

or

Sorry, we are unable to edit this attribute. Perhaps we can try something else.

I can only find the cropped face image and a simple start_image.png in my results folder.

I have also tried some other attr and target_val combinations and got the same output.

I don't know what the problem is, and I'm also not sure about the exact meaning of target_val.

BTW, in your README you mention a Beard attribute, but I found only No_Beard in your config files.

attr_to_idx:
  Bangs: 0
  Eyeglasses: 1
  No_Beard: 2
  Smiling: 3
  Young: 4

Hope you can offer some help, thanks in advance.

About downloading dataset

Hi,
The link provided in the GitHub readme redirects to a webpage titled "CelebA-Dialog Dataset".
But the link (zip icon) on that webpage redirects back to this GitHub repo.
Is the dataset link still in progress, or am I missing something? How can I download the dataset?
Thanks for great work.

How to edit attribute in positive or negative direction

Thanks for your excellent work!
I have 2 questions, could you tell me:

  1. Is there a variable to control the positive or negative direction of an attribute, e.g., add a beard vs. remove the beard?
  2. Is there a variable to control the editing step size of an attribute, e.g., add a little beard vs. add a lot of beard?

Question on attribute predictor (classifiers' outputs and meaning of `attributes_5.json` dictionary)

Hi,

I'm trying to use your pre-trained classifier for the five CelebA attributes that you use (Bangs, Eyeglasses, No_Beard, Smiling, Young). I'm building the model that you provide (the modified ResNet) using attributes_5.json and I load the weights given in eval_predictor.pth.tar.

As far as I can tell, for each of the above five attributes, you have a classification head. For instance, classifier32Smiling, which has at its top a linear layer with 6 outputs. This is determined by the sub-dictionary

"32": {
            "name": "Smiling",
            "value":[0, 1, 2, 3, 4, 5],
            "idx_scale": 1,
            "idx_bias": 0
        }

found in attributes_5.json. Similarly, you build the rest of the classifiers. My question is: why do you use these value lists (i.e., "value": [0, 1, 2, 3, 4, 5])? What do those classes represent?

I'd like to use this model to predict a score for each of the five attributes for a batch of images. Do you think this is possible?

As a side note, the function you use for post-processing the predictions, i.e., output_to_label, gives NaNs in many cases. This is due to high prediction values (in my case), which cause exp(.) to reach Inf, and thus the softmax to be NaN. I just want to note that you could shift the maximum prediction to zero before calculating the softmax.
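The shift described above is the standard numerically stable softmax trick; a minimal sketch (not the repo's actual output_to_label code):

```python
import math

def stable_softmax(logits):
    """Softmax with the max subtracted first, so exp() never overflows."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```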

Thank you!

Pretrained identity recognition model

Could you provide a link to the pretrained arcface_resnet18_110.pth model? I couldn't seem to find a trained model on the GitHub page of "ArcFace: Additive Angular Margin Loss for Deep Face Recognition".

An error occurs when upfirdn2d() is called

error msg: upfirdn2d(): incompatible function arguments. The following argument types are supported:
1. (arg0: at::Tensor, arg1: at::Tensor, arg2: int, arg3: int, arg4: int, arg5: int, arg6: int, arg7: int, arg8: int, arg9: int) -> at::Tensor

editing_wo_dialog

When I edit a 1024 image (the image is basically just a face, with no other background), the Smiling attribute can be edited, but editing the other attributes raises "Sorry, we are unable to edit this attribute. Perhaps we can try something else". Is this a problem with the detector, or with some other part? Also, if I resize the image to 128, basically every attribute can be edited.

Training on custom data

Hello, thanks for your excellent work. If I want to train a model from scratch on my own dataset, what do I need to do? 1. Train a StyleGAN2 model; 2. Train a predictor; 3. Train Talk-to-Edit. Is that the right order?

Not working on real images

Thank you for the great work. I'm currently trying to use editing_wo_dialog on real images, but the algorithm doesn't produce the results! I use exactly the same settings and models you provided and only change the input image. I get the following message at the end:

2021-10-25 12:42:50,212.212 - INFO: Sample 000 is already at the target class, skip.
2021-10-25 12:42:50,212.212 - INFO: This attribute is already at the degree that you want. Let's try a different attribute degree or another attribute.

sometimes it also gives something like this:

total: 0.2573; perceptual: 0.2314; mse: 0.0259; lr: 0.0000: 100%|█| 600/600 [00:59<00:00, 10
2021-10-25 12:40:12,199.199 - INFO: Sorry, we are unable to edit this attribute. Perhaps we can try something else.

Here I upload the input and output for your reference.

How can I run the code properly on real images?

ImportError: No module named 'fused'

Hi, I am trying to set up this repo on my own local machine but I am getting this error. I searched online but couldn't find a solution. Any help will be appreciated. Thanks.

ImportError: No module named 'fused'

scores and labels

The training code you provide imports latent_codes, labels, and scores. It seems that labels represent the degree of each facial attribute in the face image (is that right?), while scores don't seem to be used in the computation. What are the scores for? Finally, could you provide your trained attribute predictor (the model that takes an image and outputs labels)?

"Training dataset"

Hi, I want to train on my own dataset (256x256 resolution), but the 128x128 images in this work seem to have no forehead. Did you detect faces in larger images with a face detector and then resize them to 128?
