bartzi / kiss

Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"

License: GNU General Public License v3.0

Python 100.00%
deep-learning paper chainer scene-text-recognition spatial-transformer-networks transformer

kiss's Introduction

KISS

Code for the paper KISS: Keeping it Simple for Scene Text Recognition.

This repository contains the code you can use to train a model based on our paper. You will also find instructions on how to obtain our pretrained model and how to evaluate it.

Pretrained Model

You can find the pretrained model here. Download the zip and put it into any directory. We will refer to this directory as <model_dir>.

Prepare for using the Code

  • make sure you have at least Python 3.7 installed on your system
  • create a new virtual environment (or whatever you like to use)
  • install all requirements with pip install -r requirements.txt (if you do not have a CUDA-capable device in your PC, you should remove the package cupy from the file requirements.txt). A small sanity check of the resulting environment is sketched after this list.
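
A minimal sanity check of the installed environment could look like this (just a sketch; chainer.print_runtime_info() prints the detected Chainer/NumPy/CuPy versions):

    import chainer

    # Shows the detected Chainer, NumPy and (if installed) CuPy/CUDA versions.
    chainer.print_runtime_info()

    # False is expected on machines without a CUDA-capable GPU.
    print("CUDA available:", chainer.backends.cuda.available)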

Datasets

If you want to train your model on the same datasets as we did, you'll need to get the training data first. You can then get the training annotations we used from here.

Image Data

You can find the image data for each dataset using the following links:

Once you've downloaded all the images, you can get the gt-files we've prepared for the MJSynth and SynthAdd datasets here.

For the SynthText dataset, you'll have to create them yourself. You can do so by following these steps:

  1. Get the data and put it into a directory (let's assume we put the data into the directory /data/oxford).
  2. Run the script crop_words_from_oxford.py (you can find it in datasets/text_recognition) with the following command line: python crop_words_from_oxford.py /data/oxford/gt.mat /data/oxford_words. This will crop all words, based on their axis-aligned bounding boxes, from the original Oxford ground truth.
  3. Create the train and validation split with the script create_train_val_splits.py: python create_train_val_splits.py /data/oxford_words/gt.json.
  4. Run the script json_to_npz.py with the following command line: python json_to_npz.py /data/oxford_words/train.json ../../train_utils/char-map-bos.json. This will create a file called train.npz in the same directory as gt.json (a sketch of the idea behind this conversion follows this list).
  5. Repeat the last step with the file validation.json.
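
The JSON-to-npz conversion in step 4 essentially packs the ground truth into numpy arrays. Below is a minimal sketch of that idea; the real json_to_npz.py also uses the supplied character map, and the actual field and array names may differ.

    import json
    import numpy as np

    # Load the ground-truth JSON (assumed to be a list of entries with a file name
    # and the word shown in the cropped image; the real field names may differ).
    with open("/data/oxford_words/train.json") as handle:
        ground_truth = json.load(handle)

    file_names = np.array([entry["file_name"] for entry in ground_truth])
    texts = np.array([entry["text"] for entry in ground_truth])

    # Save both arrays into one .npz file next to the JSON file.
    np.savez_compressed("/data/oxford_words/train.npz", file_name=file_names, text=texts)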

Once you are done with this, you'll need to combine all npz files into one large npz file. You can use the script combine_npz_datasets.py for this. Assume you saved the MJSynth dataset and its npz file in /data/mjsynth, and the SynthAdd dataset and its npz file in /data/SynthAdd; then you'll need to run the script in the following way: python combine_npz_datasets.py /data/mjsynth/annotation_train.npz /data/oxford_words/train.npz /data/SynthAdd/gt.npz --destination /data/datasets_combined.npz.
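
Conceptually, combining the datasets is just a concatenation of the arrays stored in the individual npz files. A rough sketch of the idea (assuming each file holds parallel arrays such as file_name and text; the real combine_npz_datasets.py may use different keys and also take care of adjusting image paths):

    import numpy as np

    npz_paths = [
        "/data/mjsynth/annotation_train.npz",
        "/data/oxford_words/train.npz",
        "/data/SynthAdd/gt.npz",
    ]

    datasets = [np.load(path, allow_pickle=True) for path in npz_paths]

    # Concatenate the per-dataset arrays into one large array each.
    combined_file_names = np.concatenate([d["file_name"] for d in datasets])
    combined_texts = np.concatenate([d["text"] for d in datasets])

    np.savez_compressed("/data/datasets_combined.npz",
                        file_name=combined_file_names, text=combined_texts)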

Since the datasets may contain words longer than N characters (we always set N to 23), we need to remove all such words. You can use the script filter_word_length.py for this. Use it like so: python filter_word_length.py 23 /data/datasets_combined.npz --npz. Do the same thing with the file validation.npz you obtained from splitting the SynthText dataset.
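
The length filter boils down to keeping only those samples whose label has at most N characters. A hedged sketch of the idea (the real filter_word_length.py writes its result to a file with a _filtered_23 suffix and may store additional arrays):

    import numpy as np

    MAX_LENGTH = 23
    data = np.load("/data/datasets_combined.npz", allow_pickle=True)

    # Boolean mask selecting all samples whose label is short enough.
    keep = np.array([len(word) <= MAX_LENGTH for word in data["text"]])

    np.savez_compressed(
        "/data/datasets_combined_filtered_23.npz",
        file_name=data["file_name"][keep],
        text=data["text"][keep],
    )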

If you want to follow our experiments with the balanced dataset, you can create a balanced dataset with the script balance_dataset.py. For example: python balance_dataset.py /data/datasets_combined_filtered_23.npz datasets_combined_balanced_23.npz -m 200000. If you do not use the -m switch, the script will show you dataset statistics so you can choose your own value.
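
One simple balancing strategy, shown below purely as an illustration (it is not necessarily what balance_dataset.py implements), is to cap the number of samples kept per label:

    from collections import defaultdict

    import numpy as np

    MAX_SAMPLES_PER_LABEL = 200_000  # illustrative; corresponds loosely to the -m switch
    data = np.load("/data/datasets_combined_filtered_23.npz", allow_pickle=True)

    counts = defaultdict(int)
    keep_indices = []
    for index, word in enumerate(data["text"]):
        if counts[word] < MAX_SAMPLES_PER_LABEL:
            counts[word] += 1
            keep_indices.append(index)

    np.savez_compressed(
        "/data/datasets_combined_balanced_23.npz",
        file_name=data["file_name"][keep_indices],
        text=data["text"][keep_indices],
    )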

Evaluation Data

In this section we explain how you can get the evaluation data and annotations. For each dataset you just need to do two steps:

  1. Clone the repository.
  2. Download the npz annotation file and place it in the directory where you cloned the git repository.
Dataset | Git Repo | NPZ | Note
ICDAR2013 | https://github.com/ocr-algorithm-and-data/ICDAR2013 | download | Rename the directory test to Challenge2_Test_Task3_Images
ICDAR2015 | https://github.com/ocr-algorithm-and-data/ICDAR2015 | download | Rename the directory TestSet to ch4_test_word_images_gt
CUTE80 | https://github.com/ocr-algorithm-and-data/CUTE80 | download | -
IIIT5K | https://github.com/ocr-algorithm-and-data/IIIT5K | download | -
SVT | https://github.com/ocr-algorithm-and-data/SVT | download | Remove all subdirectories except test_crop, then rename that directory to img
SVTP | https://github.com/ocr-algorithm-and-data/SVT-Perspective | download | -

Training

Now you should be ready for training 🎉. You can use the script train_text_recognition.py, which is in the root directory of this repo.

Before you can start your training, you'll need to adapt the config in config.cfg. Set the values following this list:

  • train_file: Set this to the file /data/datasets_combined_filtered_23.npz
  • val_file: Set this to /data/oxford_words/validation.npz
  • keys in TEST_DATASETS: Set these to the corresponding npz files you got here and set up in the last step (a small sanity-check sketch follows this list).
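
If you are unsure whether all paths are picked up correctly, a quick sanity check with Python's configparser can help (a sketch; the exact section and key names in config.cfg may differ from what is assumed here):

    import configparser
    import os

    config = configparser.ConfigParser()
    config.read("config.cfg")

    # Print every configured value that looks like a dataset file and flag missing ones.
    for section in config.sections():
        for key, value in config[section].items():
            if value.endswith(".npz") or value.endswith(".json"):
                status = "ok" if os.path.exists(value) else "MISSING"
                print(f"[{section}] {key} = {value} ({status})")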

You can now run the training with, e.g., python train_text_recognition.py <name for the log> -g 0 -l tests --image-mode RGB --rdr 0.95. This will start the training and create a new directory with log entries in logs/tests. Get some coffee and sleep, because the training will take some time!

You can inspect the training progress with Tensorboard. Just start Tensorboard in the root directory like so: tensorboard --logdir logs.

Evaluation

Once you've trained a model, or if you just downloaded the model we provide, you can run the evaluation script on it.

If you want to know how the model performs on all datasets, you can use the script run_eval_on_all_datasets.py. Let's assume you trained a model and logs/tests/train is the path to the log dir. Now, you can run the evaluation with this command: python run_eval_on_all_datasets.py config.cfg 0 -b 16 --snapshot-dir logs/tests/train. You can also render the predictions of the model for each evaluation image by changing the command as follows: python run_eval_on_all_datasets.py config.cfg 0 -b 1 --snapshot-dir logs/tests/train --render. You will then find the results for each image in the directory logs/tests/train/eval_bboxes.

Questions?

Feel free to open an issue! You want to contribute? Just open a PR 😄!

License

This code is licensed under GPLv3; see the file LICENSE for more information.

Citation

If you find this code useful, please cite our paper:

@misc{bartz2019kiss,
    title={KISS: Keeping It Simple for Scene Text Recognition},
    author={Christian Bartz and Joseph Bethge and Haojin Yang and Christoph Meinel},
    year={2019},
    eprint={1911.08400},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

kiss's People

Contributors

bartzi, dependabot[bot]


kiss's Issues

MultiGPU

Is it possible to train on multiple GPUs? Thanks!

The data link is broken

We can't find the original data, so we don't know their original appearance and filenames. Therefore, we don't know how to rename them or how to arrange the paths.

what is gt.mat?

I have labels in ground-truth text files as well as images of real-world data. How can I get the gt.mat file, or crop the words using crop_words_from_oxford.py?

Use pretrained model and continue training on own data

Hi, thanks for this great project. Is it possible to use the pretrained model and continue training on a custom training set containing 5000 images (greyscale images with a dotted font)? Do you think the results will be good?

SVT Evaluation

The SVT test_crop folder (renamed to img) contains 0-byte images, and I am getting an error because of that.

mjsynth.npz only has first letter of each word in "text"

I would like to start by saying I really enjoyed reading your paper and I am currently porting it to Pytorch.

I was going through the steps to download the synthetic data (outlined on your GitHub) and while running filter_word_length I noticed that the "text" array in mjsynth.npz only contains the first letter of each word in the dataset. Is there a reason for this? Thank you.
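
For anyone hitting the same thing, a quick way to inspect what such an npz annotation file actually stores is shown below (the array names are assumptions):

    import numpy as np

    data = np.load("mjsynth.npz", allow_pickle=True)
    print(data.files)          # names of the arrays stored in the archive
    print(data["text"].dtype)  # a too-short unicode dtype such as '<U1' would truncate every word
    print(data["text"][:10])   # inspect the first few labels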

How do you generate the mask in the transformer model and process text labels into "class_id"?

As the project is really huge, I don't understand how you process the text labels. Usually, in an attention-based text recognizer, there are "[GO]" and "[EOS]" labels, and the text is converted into numeric "class_id" values. But I don't understand how, in the transformer model, the code converts the labels into "class_id" values and generates the mask as below:

self.mask = subsequent_mask(self.transformer_size)

Also, your code generates the mask somewhat differently from
https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/76762bb08225014fb3055a9d07f0043aba972d68/transformer/Models.py#L169

Do you use "pad_idx", and where can I find it? What is the difference between using "pad_idx" and not using it? I'm really confused by "pad_idx", "GO_idx", and "EOS_idx"; how do you process that part?
I don't quite know how to work with it. Could you give me some advice?
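
For reference, the standard subsequent mask from the original Transformer (as used in the linked PyTorch repository) can be written as below; this is a generic numpy sketch and not necessarily identical to the implementation in this repository:

    import numpy as np

    def subsequent_mask(size):
        # Upper-triangular matrix (excluding the diagonal) marks positions
        # a query is NOT allowed to attend to; inverting it yields the mask
        # that permits attention only to current and previous positions.
        upper = np.triu(np.ones((1, size, size), dtype=np.uint8), k=1)
        return upper == 0

    print(subsequent_mask(4)[0].astype(int))
    # [[1 0 0 0]
    #  [1 1 0 0]
    #  [1 1 1 0]
    #  [1 1 1 1]]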

Change the num_words_per_image without training again

Can we predict multiple words from a single image by changing the num_words_per_image?
I tried changing it in recognizer_class in the evaluate.py file, but I am facing this error.

InvalidType: 
Invalid operation is performed in: Reshape (Forward)

Expect: prod(x.shape) % known_size(=3072) == 0
Actual: 1536 != 0

Also, can I ask why spaces are not in the char_map? (This might help with predicting multiple words in an image.)

Link to download the SynthAdd dataset?

I cannot register a Baidu account, so I am not able to download this dataset.
Could anyone who has downloaded it send me another download link?
Thanks

cannot install without a GPU

Hi. It looks like there is a problem when I try to do the install on my MacBook with no GPU:

$ pip install -r requirements.txt
Collecting chainer==6.5.0
  Downloading https://files.pythonhosted.org/packages/1d/59/aa63339001ca8e15ebb560d0c33333ef465c479e165d967e64c7611b6e67/chainer-6.5.0.tar.gz (876kB)
     |████████████████████████████████| 880kB 508kB/s
Collecting chainercv==0.13.1
  Downloading https://files.pythonhosted.org/packages/e8/1c/1f267ccf5ebdf1f63f1812fa0d2d0e6e35f0d08f63d2dcdb1351b0e77d85/chainercv-0.13.1.tar.gz (260kB)
     |████████████████████████████████| 266kB 676kB/s
Collecting cupy==6.5.0
  Downloading https://files.pythonhosted.org/packages/67/4b/6960cdfeee8bbfa12450da6b83206b57f6d6951a74043f055905449bb657/cupy-6.5.0.tar.gz (3.1MB)
     |████████████████████████████████| 3.1MB 959kB/s
    ERROR: Command errored out with exit status 1:
     command: /Users/sebastienvincent/.virtualenvs/kiss/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py'"'"'; __file__='"'"'/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/pip-egg-info
         cwd: /private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/
    Complete output (46 lines):
    Options: {'package_name': 'cupy', 'long_description': None, 'wheel_libs': [], 'wheel_includes': [], 'no_rpath': False, 'profile': False, 'linetrace': False, 'annotate': False, 'no_cuda': False}

    -------- Configuring Module: cuda --------
    /var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/tmpde2uw1h1/a.cpp:1:10: fatal error: 'cublas_v2.h' file not found
    #include <cublas_v2.h>
             ^~~~~~~~~~~~~
    1 error generated.
    command 'gcc' failed with exit status 1

    ************************************************************
    * CuPy Configuration Summary                               *
    ************************************************************

    Build Environment:
      Include directories: []
      Library directories: []
      nvcc command       : (not found)

    Environment Variables:
      CFLAGS          : (none)
      LDFLAGS         : (none)
      LIBRARY_PATH    : (none)
      CUDA_PATH       : (none)
      NVTOOLSEXT_PATH : (none)
      NVCC            : (none)

    Modules:
      cuda      : No
        -> Include files not found: ['cublas_v2.h', 'cuda.h', 'cuda_profiler_api.h', 'cuda_runtime.h', 'cufft.h', 'curand.h', 'cusparse.h', 'nvrtc.h']
        -> Check your CFLAGS environment variable.

    ERROR: CUDA could not be found on your system.
    Please refer to the Installation Guide for details:
    https://docs-cupy.chainer.org/en/stable/install.html

    ************************************************************

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py", line 132, in <module>
        ext_modules = cupy_setup_build.get_ext_modules()
      File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/cupy_setup_build.py", line 632, in get_ext_modules
        extensions = make_extensions(arg_options, compiler, use_cython)
      File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/cupy_setup_build.py", line 387, in make_extensions
        raise Exception('Your CUDA environment is invalid. '
    Exception: Your CUDA environment is invalid. Please check above error log.
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Is it possible to use kiss with CPU only?

Windows fatal exception: Access violation

Hello,

When I try to train the network with train_text_recognition.py using my own images,
I get a Windows fatal exception: access violation.
This is followed by stack traces from several threads, mostly coming from chainer, tensorboard and multi_node_mean.
Do you have any idea where that could come from?

Thank you in advance,
Clément

Loss Functions

Hey again,
I had a few questions about the loss functions you used for the Localization net during training.

  • In the Out Of Image loss calculation you apply +/- 1.5 to the bbox instead of +/- 1 (as in your paper); why do you do this?

  • Also why are you using corner coordinates for loss calculations?

  • Was the DirectionLoss used in your paper?

[Help] Using Pretrained Model

Thanks for sharing the model. I just want to test the pretrained model that you provided. Do I still need to download the image data (SynthText/MJSynth) if I'm using the pretrained model? And if not, how can I run the pretrained model on test datasets like CUTE80, ICDAR, etc.? I have already downloaded the datasets (CUTE80, ICDAR2013, ICDAR2015, IIIT5K, SVT, SVTP) and their respective npz files. How can I run the evaluation on these datasets?
