ankanbhunia / handwriting-transformers

Handwriting-Transformers (ICCV21)

License: MIT License

Python 88.20% Jupyter Notebook 11.80%
handwriting generative-ai handwriting-generation handwriting-synthesis mimic style-transfer

handwriting-transformers's Introduction

⚡ Handwriting Transformers Open in Colab

Project | ArXiv | Paper | Huggingface-demo | Colab-demo

Abstract

Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan & Mubarak Shah

Abstract: We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns. The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT comprises an encoder-decoder attention that enables style-content entanglement by gathering the style representation of each query character. To the best of our knowledge, we are the first to introduce a transformer-based generative network for styled handwritten text generation. Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state-of-the-art demonstrated through extensive qualitative, quantitative and human-based evaluations. The proposed HWT can handle arbitrary length of text and any desired writing style in a few-shot setting. Further, our HWT generalizes well to the challenging scenario where both words and writing style are unseen during training, generating realistic styled handwritten text images.

Software environment

  • Python 3.7
  • PyTorch >=1.4

Setup & Training

Please see INSTALL.md for installing required libraries. You can change the content in the file mytext.txt to visualize generated handwriting while training.

Download the dataset files and models from https://drive.google.com/file/d/16g9zgysQnWk7-353_tMig92KsZsrcM6k/view?usp=sharing and unzip them inside the files folder. In short, run the following lines in a bash terminal.

git clone https://github.com/ankanbhunia/Handwriting-Transformers
cd Handwriting-Transformers
pip install --upgrade --no-cache-dir gdown
gdown --id 16g9zgysQnWk7-353_tMig92KsZsrcM6k && unzip files.zip && rm files.zip

To start training the model, run:

python train.py

If you want to use wandb, please install it and change your auth_key in the train.py file (line 4).

You can change different parameters in the params.py file.

You can train the model on any custom dataset other than IAM and CVL. The process involves creating a dataset_name.pickle file and placing it inside the files folder. The structure of dataset_name.pickle is a simple Python dictionary:

{
'train': [{writer_1:[{'img': <PIL.IMAGE>, 'label':<str_label>},...]}, {writer_2:[{'img': <PIL.IMAGE>, 'label':<str_label>},...]},...], 
'test': [{writer_3:[{'img': <PIL.IMAGE>, 'label':<str_label>},...]}, {writer_4:[{'img': <PIL.IMAGE>, 'label':<str_label>},...]},...], 
}
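As a minimal sketch of building such a pickle, the following creates a dictionary with the structure above and saves it to disk. The file name mydataset.pickle, the writer IDs, and the image sizes are illustrative, and blank placeholder images stand in for real cropped word images:

```python
# Sketch of building a custom dataset pickle in the structure shown above.
# The file name, writer IDs, labels, and image sizes are all illustrative;
# in practice each 'img' would be a real cropped handwritten-word image.
import pickle
from PIL import Image

def make_entry(width, height, label):
    # Each sample pairs a grayscale word image with its transcription.
    return {'img': Image.new('L', (width, height), color=255), 'label': label}

dataset = {
    'train': [
        {'writer_1': [make_entry(96, 32, 'hello'), make_entry(120, 32, 'world')]},
        {'writer_2': [make_entry(80, 32, 'sample')]},
    ],
    'test': [
        {'writer_3': [make_entry(64, 32, 'unseen')]},
    ],
}

with open('mydataset.pickle', 'wb') as f:
    pickle.dump(dataset, f)
```

The resulting dataset_name.pickle then goes inside the files folder as described above.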

Run Demo using Docker

 docker run -it -p 7860:7860 --platform=linux/amd64 \
	registry.hf.space/ankankbhunia-hwt:latest python app.py

Handwriting synthesis results

Please check the results folder in the repository for more qualitative analysis. Also, please check out the Colab demo to try it with your own custom text and writing style: Colab Notebook

Handwriting reconstruction results

Reconstruction results using the proposed HWT in comparison to GANwriting and Davis et al. We use the same text as in the style examples to generate handwritten images.

Citation

If you use the code for your research, please cite our paper:

@InProceedings{Bhunia_2021_ICCV,
    author    = {Bhunia, Ankan Kumar and Khan, Salman and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Fahad Shahbaz and Shah, Mubarak},
    title     = {Handwriting Transformers},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {1086-1094}
}

handwriting-transformers's People

Contributors

ankanbhunia

handwriting-transformers's Issues

Missing license

Love this work! Any chance you could add an MIT license or something? I've started my own fork, but wanted to make sure it remains open so people can keep building off of it. Thanks!

plan for code release?

Hi, many thanks for your innovative work. Just wondering, is there a plan for a code release? Thanks!

Cycle loss

I went through the codebase, and it seems the cycle loss has been commented out. Is there any particular reason why it has been commented out?

Having a problem loading the pretrained weights in torchvision\models\_utils.py

C:\Users\Asus\anaconda3\envs\pytorch\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)

Generate custom handwriting?

Could this be used to reproduce one particular handwriting style? (That is, one not in the IAM database -- such as my own handwriting, or the handwriting of a famous historical figure.)

If so, could you please walk me through how to do that? I would be very appreciative.

‘base_path = '../IAM_32.pickle'’

Hello, what is in the file referenced by base_path = '../IAM_32.pickle' in your code? Can this file be published? If so, could you please send it to me privately? Thanks for reading!

Model results

Hi!

I've been playing around with this model locally following the instructions in the README, and my results don't seem to be nearly as good as yours. I'm following your instructions in #11 (comment) and then running prepare.py in my fork.

For instance, even with different style prompts, the model seems to generate very similar results for me (real on the left, generated on the right).

IAM style 1
image-9-IAM

IAM style 2
image-6_1

Secondly, the CVL and IAM models give very different results from each other, but quite consistent results within each model across different styles.

CVL style 1
image-9-CVL

CVL style2
image-6

Is there something obvious I'm missing, or do I need to train it with these writers in the dataset to get better results? Does the Google Drive contain the fully trained models that were used to generate the results in the paper?

Very cool project though - congrats!!

Tool for text cropping?

I'm trying to generate something based on my own handwriting, what text cropping tool did you use? Wondering if I can skip the work of cropping every single word by hand.

Thanks!!

Color images

Hello!

I need to generate color images for my project. I want to try changing your architecture accordingly. I guess this should be easy to do. It is enough to change the number of channels at input and output, right? I'm wondering what you think of this possibility. Any thoughts welcome!

When is this code complete?

Hello, thanks for the impressive work.
It seems some files (the lexicon file and txt) have not been uploaded, and INSTALL.md is not finished.
When will this code be updated?
@ankanbhunia

About pre-trained model.

Sir, the method proposed in your paper is an effective improvement.
I am very curious how you obtained the "FID: four scenes" results in your paper.
In addition, would you be willing to share the pre-trained model?
I am sorry for my recklessness.
Good luck.

The Fake and Real folders are empty after training completes

Hello author, thank you for your excellent paper. I have recently been looking at your code and have encountered a problem: no Fake or Real images are generated in the saved_images folder after training completes; these two folders are empty. I would appreciate your guidance and help. Looking forward to your reply, thank you!

How to properly fine-tune the model for another language's handwriting generation?

@ankanbhunia

While you mention:
"You can train the model in any custom dataset other than IAM and CVL. The process involves creating a dataset_name.pickle file and placing it inside files folder. The structure of dataset_name.pickle is a simple python dictionary."

I went through the code, and the way you suggest seems to retrain the model on another dataset from scratch rather than fine-tuning your model. Since your paper does not discuss this much, I would like your opinion: if I want to apply the model to generate handwriting in another language, e.g. Japanese, is there a way to quickly fine-tune your model on new Japanese handwriting data, or do I need to retrain it from scratch?

No such file or directory: '../CVL_32.pickle'

Thank you for publishing IAM_32.pickle in issue #4.
I cloned this project to my personal Colab notebook and executed train.py. Now the file CVL_32.pickle is required.

Traceback (most recent call last):
  File "./Handwriting-Transformers/train.py", line 42, in <module>
    TextDatasetObjval = TextDatasetval()
  File "/content/Handwriting-Transformers/data/dataset.py", line 109, in __init__
    file_to_store = open(base_path, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../CVL_32.pickle'

Would you please publish this file also?

Thank you very much. 🙇

Regarding the issue of loading the dataset

Hello, I have been puzzled by some questions for a long time and hope to get answers. Regarding the parameters in prepare_data.py: TRAIN_IDX = 'gan.iam.tr_va.gt.filter27' and TEST_IDX = 'gan.iam.test.gt.filter27' — how were these obtained? Are the IAM_WORD_DATASET_PATH and XMLS_PATH directories, used to store the word images and related XML files respectively, downloaded from the IAM dataset's official website?
I hope to receive your answer; thank you very much.

RIMES weights?

Great project. I saw the paper had experiments with RIMES, any chance you still have those weights? Thanks!

params.py file

Hi,

Thanks for sharing the great work!

I am training this model from scratch, but the training parameters mentioned in the paper, such as IMG_HEIGHT and LR, are different from the ones given in the params.py file.

Can you please share the parameters that were used to train the network?
Thanks.

CVL_32.pickle

Hello, what is in the file referenced by base_path = './CVL_32.pickle' in your code? Can this file be published? If so, could you please send it to me privately? Thanks for reading!

Missing the positional encodings in the encoder

Hi, sir, thanks for the impressive work. The paper mentions: "To retain information regarding the order of input sequences being supplied, we add the positional encodings [23] to the input of each attention layer". However, the released code does not add positional encodings to the multi-head attention of the encoder; it only adds them to the multi-head attention of the decoder. Is it better not to apply positional encodings in the encoder?
