
deeptextspotter's Introduction

DeepTextSpotter

DeepTextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework

This is the implementation of DeepTextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework (ICCV 2017).

Requirements

  • python2.7
  • opencv 3.x with python bindings

Installation

  1. Get the required version of Caffe (the darknet branch of this fork):
git clone https://github.com/MichalBusta/caffe.git
cd caffe
git checkout darknet

  2. Build Caffe:
mkdir Release 
cd Release 
cmake -D CMAKE_BUILD_TYPE=Release -D BLAS=Open -D BUILD_SHARED_LIBS=Off ..
make 
make install    # optional
  3. Build the project:
cd "SOURCE dir" 
mkdir build
cd build
cmake ..
make 

Download models

RPN: https://drive.google.com/open?id=0B8SUcdkLTcuTZjRHeUpjdzhmbFE
OCR: https://drive.google.com/open?id=0B8SUcdkLTcuTMmI0TS1uNDJaZGs

The paths are hard-coded: the models must be placed in the models subdirectory.
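Because the paths are hard-coded, a missing file only shows up when the demo fails. The sketch below is a hypothetical pre-flight check, not part of the repository: it assumes the RPN files are named tiny.prototxt and tiny.caffemodel (the names that appear in a demo.py log further down this page). The OCR model file names are not stated here, so add them to the list yourself.

```python
import os

# Hypothetical list: only the RPN file names seen in a demo.py log are included.
# Append the OCR model files that your checkout of demo.py actually loads.
EXPECTED_MODEL_FILES = ["tiny.prototxt", "tiny.caffemodel"]


def missing_model_files(repo_root, names=EXPECTED_MODEL_FILES):
    """Return the expected model files that are absent from <repo_root>/models."""
    models_dir = os.path.join(repo_root, "models")
    return [n for n in names if not os.path.isfile(os.path.join(models_dir, n))]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing from ./models: " + ", ".join(missing))
    else:
        print("All expected model files found.")
```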

Run webcam demo

python2 demo.py

Notes:

  • The provided RPN model is a tiny version of the full YOLOv2 detector (the demo runs at 7 fps on an NVIDIA GPU with 1 GB of memory).
  • For decoding the final output, we provide only greedy and dictionary-based prefix decoding.
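The repository's decoder is not shown on this page. As a generic illustration of greedy (best-path) CTC decoding, here is a minimal sketch that assumes the recognizer emits one score vector per horizontal position and that class index 0 is the CTC blank; both are assumptions, not confirmed details of this model.

```python
def greedy_ctc_decode(probs, alphabet, blank=0):
    """Best-path (greedy) CTC decoding.

    probs:    sequence of frames; each frame holds one score per class.
    alphabet: maps class index -> character (index `blank` is the CTC blank,
              an assumption for this sketch).
    """
    out = []
    prev = None
    for frame in probs:
        # Take the most probable class in this frame.
        k = max(range(len(frame)), key=lambda i: frame[i])
        # CTC collapsing: drop repeats of the previous label, then drop blanks.
        if k != prev and k != blank:
            out.append(alphabet[k])
        prev = k
    return "".join(out)
```

Because every frame is decided independently, greedy decoding can produce strings that are not real words; a dictionary-constrained decoder trades that freedom for robustness.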


deeptextspotter's Issues

no module named cmp_trie

Hi, thanks for your code! I have two problems:
1. I modified train.py
[screenshot]
and added the path to my data in train.py. When I run python train.py, it reports "No module named cmp_trie". I used the standard Caffe; is that the cause?
2. My train_list and valid_list are as follows:
[screenshot]
[screenshot]
Are they right? @MichalBusta

Error while running train.py

After converting SynthText to the x, y, w, h, angle format, I ran into a problem while training.

Traceback (most recent call last):
  File "train_v0.py", line 431, in <module>
    train_dir(nets, sgd, yolonet, dataloader, args)
  File "train_v0.py", line 406, in train_dir
    process_batch(nets, optim, optim2, image_size, args)
  File "train_v0.py", line 126, in process_batch
    cx = net.layers[0].get_crop(bid, 0)
AttributeError: 'Layer' object has no attribute 'get_crop'

Apparently OnDiskData is registered (I think you named it CMPData), and Python has access to basic things like the 'type' function, but 'get_img_file_name' and 'get_crop' are not accessible. Has this training code been verified to run? Have you encountered this behavior?

TEST and TRAIN phases

Hey,
What are the differences between those 2 phases?
I noticed that the TEST phase disables dropout and cropping, and enables a softmax layer that makes it possible to extract the CTC probabilities.
Are there any other differences? I assume there are, because I get significantly (15%) worse results when using the TEST phase, which is strange.
I'm using a validation script I adapted from train.py, because I ran into many problems with the validation/demo files (their pre-processing differs from train.py).
Many thanks,
Lee

How to pre-train?

Hi, I encountered some problems in pre-training. First of all, I want to confirm a few questions.

  1. optim2.step(1): in this step, are you training on the region proposals or on the best proposals?
  2. When pre-training the recognition network alone, is the ground-truth format similar to this?
     cls x y w h alpha txt

Finally, it would be very helpful if you could explain how to pre-train the detection network and the recognition network on different datasets.
Please help.
Thanks a lot
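The field order "cls x y w h alpha txt" matches the read_txt_gt excerpt quoted in another issue on this page (class, x, y, w, h, angle, then the transcription, with at least 7 fields). A minimal parser sketch under that assumption, using the standard csv module so that quoted multi-word text survives the space separator:

```python
import csv
from collections import namedtuple

GtBox = namedtuple("GtBox", "cls x y w h angle text")


def parse_gt_line(line, separator=" "):
    """Parse one 'cls x y w h angle text' line; return None for comments or short lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = next(csv.reader([line], delimiter=separator, quotechar='"',
                             skipinitialspace=True))
    if len(fields) < 7:
        return None  # mirrors the 'len(splitLine) < 7' check in read_txt_gt
    x, y, w, h, angle = (float(f) for f in fields[1:6])
    return GtBox(fields[0].strip(), x, y, w, h, angle, fields[6])
```

For example, parse_gt_line('0 1 2 3 4 0.5 "two words"').text gives 'two words'.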

run demo.py error

Traceback (most recent call last):
  File "DeepTextSpotter-master/demo.py", line 21, in <module>
    from utils import get_normalized_image, print_seq_ext, print_seq2, get_obox, process_splits
  File "/home/chase/caffe/DeepTextSpotter-master/utils.py", line 12, in <module>
    import cmp_trie
ImportError: /home/chase/caffe/DeepTextSpotter-master/build/cmp_trie.so: undefined symbol: _ZN2cv28rotatedRectangleIntersectionERKNS_11RotatedRectES2_RKNS_12_OutputArrayE
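The undefined symbol demangles to cv::rotatedRectangleIntersection(cv::RotatedRect const&, cv::RotatedRect const&, cv::_OutputArray const&), an OpenCV imgproc function. One plausible cause (not confirmed) is that cmp_trie.so was compiled against OpenCV 3.x headers but is loaded alongside a different or older OpenCV library that lacks this function. A quick sanity check of which OpenCV the Python interpreter actually picks up:

```python
def opencv_major_version(version_string):
    """Return the major version from a string such as '3.4.2' or '4.5.5-dev'."""
    return int(version_string.split(".")[0])


def check_opencv(min_major=3):
    """Report which OpenCV this interpreter imports; the README asks for 3.x."""
    try:
        import cv2
    except ImportError:
        return "cv2 is not importable from this interpreter"
    major = opencv_major_version(cv2.__version__)
    status = "OK" if major >= min_major else "too old"
    return "cv2 %s from %s: %s" % (cv2.__version__, getattr(cv2, "__file__", "?"), status)


if __name__ == "__main__":
    print(check_opencv())
```

If the reported location differs from the OpenCV that CMake found when building the project, rebuilding against a single installation is a reasonable next step.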

Issues while running demo.py : cannot connect to X server

demo.py:148: RuntimeWarning: invalid value encountered in divide
mean_conf = np.sum(masked) / masked.shape[0]
: cannot connect to X server

I've built Caffe and DeepTextSpotter following the README.

I'm using a headless Unix server with no display for showing images; I only want to save an .avi file.
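A sketch of two guards, assuming (not verified here) that demo.py opens windows with cv2.imshow and that masked is a 1-D array. On Linux, OpenCV's highgui windows need an X server named by $DISPLAY, so window calls can be skipped when it is unset (frames can still go to a cv2.VideoWriter). The RuntimeWarning on line 148 is 0/0 when masked is empty; a guarded mean avoids the NaN.

```python
import os
import sys


def has_display():
    """Heuristic: on Linux, GUI windows need an X server named by $DISPLAY."""
    if sys.platform.startswith("linux"):
        return bool(os.environ.get("DISPLAY"))
    return True  # assumption: other platforms run inside a GUI session


def safe_mean(values):
    """Mean that returns 0.0 for an empty selection instead of NaN (0/0)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(values) / float(len(values))
```

In the demo loop one would then call cv2.imshow only when has_display() is true, and replace np.sum(masked) / masked.shape[0] with safe_mean(masked).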

Caffe Compiling Error "__shfl_down" undefined

Hey,
I tried to compile Caffe following the instructions and got this error:
/caffe/src/caffe/layers/reduce.cu(44) error : identifier "__shfl_down" is undefined
The function "__shfl_down" should be provided by CUDA. However, in reduce.cu the CUDA includes are commented out:
// Includes, system
// #include <stdio.h>
// #include <stdlib.h>

// Includes, cuda
// #include <cuda_runtime.h>
// #include <cublas_v2.h>

// Includes, cuda helper functions
// #include <helper_cuda.h>

// For the functors
#include "caffe/util/ctc/detail/ctc_helper.h"
#include "caffe/util/ctc/ctc.h"

Caffe Make Error

When I follow the instructions that build the caffe, I have this error:

/deeplearning/Dropbox/decisionengines/invoice_text_bbox/caffe/src/caffe/layers/ondisk_data_layer.cpp:84:65: error: 'rotatedRectangleIntersection' was not declared in this scope int ret = rotatedRectangleIntersection(rect1, rect2, vertices); ^

I couldn't find any function named "rotatedRectangleIntersection" in the source tree.

Compile error of caffe

Hello,

When I build Caffe following the README, I get this error:

caffe/src/caffe/layers/ondisk_data_layer.cpp:84:65: error: ‘rotatedRectangleIntersection’ was not declared in this scope

Maybe the darknet branch is missing some code.

Single Characters are not recognized

Is it possible to detect single characters? For example, in " T Segments" the word "Segments" is recognized but "T" is not (there is more than a single space between "T" and "Segments"). A bounding box is drawn around "T", but it does not appear on stdout like "Segments" does. Is it possible to change this behavior?

How to organize the code of data.py?

I read the code of data.py, but there are parts I cannot understand, and I don't know how to organize the code so I can use it for training. In other words, how do I get training data to train the network? Thanks very much! @MichalBusta

What is "cls"?

Hello, I am a student and I am very new to deep learning. I really want to know how the training process works.
I see a "cls" field in the annotation. What is it? Could you please explain it to me?

Thanks a lot.

Input as image

As the title says: can I use an image as the input, and if so, how?
Thanks 🥇
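The README describes demo.py only as a webcam demo; assuming it pulls frames from cv2.VideoCapture, one approach is to feed it frames from image files instead. This is a sketch with hypothetical helper names; the loader is injected (for example cv2.imread, which returns None when a file cannot be read) so the same loop body can process each image.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(path):
    """Return the file itself, or the sorted image files inside a directory."""
    if os.path.isdir(path):
        return sorted(os.path.join(path, f) for f in os.listdir(path)
                      if f.lower().endswith(IMAGE_EXTENSIONS))
    return [path]


def iter_frames(path, imread):
    """Yield (file name, image) pairs, skipping files the loader cannot read."""
    for name in list_images(path):
        img = imread(name)
        if img is not None:
            yield name, img
```

Usage would look like `for name, frame in iter_frames("samples/", cv2.imread):` in place of the webcam read loop.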

Issues while running demo.py

Hi Michal,
I installed Caffe from here and made another installation from the darknet branch, as mentioned in the README. However, the darknet-branch installation does not satisfy the import caffe dependency.
I have put both the darknet and the non-darknet Caffe paths in PYTHONPATH. However, when running python demo.py, I get the following error:

W0207 17:05:14.716331 25778 _caffe.cpp:142] Net('DeepTextSpotter/models/tiny.prototxt', 1, weights='DeepTextSpotter/models/tiny.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 513:14: Message type "caffe.LayerParameter" has no field named "reorg_param".
F0207 17:05:14.717484 25778 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: DeepTextSpotter/models/tiny.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

Please tell me where I am going wrong.

Specs :


nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 8.0, V8.0.61

NVIDIA-SMI 390.12 Driver Version: 390.12

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.6)

Can you give an example of an image in the train list file?

Thanks for sharing the codes! It is a nice work!
I want to fine-tune the model on my own dataset, but I am confused about preparing the train list. I have read the code but couldn't figure it out. I would really appreciate it if you could give some examples of the train lists.
Thanks a lot!
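The train list format is not documented on this page. Another issue here describes one .txt ground-truth file per .jpeg image, so a plausible layout (an assumption, not confirmed by the repository) is a list with one image path per line, where each image has a sibling .txt file with the same stem. A sketch that writes such a list and skips images without annotations:

```python
import os


def write_train_list(image_dir, list_path, exts=(".jpg", ".jpeg", ".png")):
    """Write one absolute image path per line for images that have a sibling .txt file.

    Assumed layout (hypothetical): img_001.jpg next to img_001.txt.
    Returns the number of images written.
    """
    entries = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in exts:
            continue
        if os.path.isfile(os.path.join(image_dir, stem + ".txt")):
            entries.append(os.path.abspath(os.path.join(image_dir, name)))
    with open(list_path, "w") as f:
        for path in entries:
            f.write(path + "\n")
    return len(entries)
```

Check the result against whatever train.py and data.py actually read before training on it.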

I keep getting this error during cmake

During cmake of caffe:

CMake Warning at /usr/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version
Call Stack (most recent call first):
/usr/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.7/Modules/FindBoost.cmake:1470 (_Boost_MISSING_DEPENDENCIES)
cmake/Dependencies.cmake:8 (find_package)
CMakeLists.txt:46 (include)

-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
CMake Error at /usr/share/cmake-3.7/Modules/FindBoost.cmake:1831 (message):
Unable to find the requested Boost libraries.

Unable to find the Boost header files. Please set BOOST_ROOT to the root
directory containing Boost or BOOST_INCLUDEDIR to the directory containing
Boost's headers.
Call Stack (most recent call first):
cmake/Dependencies.cmake:8 (find_package)
CMakeLists.txt:46 (include)

-- Found GFlags: /usr/include
-- Found gflags (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Found Glog: /usr/include
-- Found glog (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread (found version "3.0.0")
-- Found PROTOBUF Compiler: /usr/bin/protoc
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5: Using hdf5 compiler wrapper to determine CXX configuration
-- Found HDF5: /usr/lib/x86_64-linux-gnu/libhdf5_cpp.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.10.0.1") found components: HL
CMake Error at /usr/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
Could NOT find LMDB (missing: LMDB_INCLUDE_DIR LMDB_LIBRARIES)
Call Stack (most recent call first):
/usr/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
cmake/Modules/FindLMDB.cmake:19 (find_package_handle_standard_args)
cmake/Dependencies.cmake:52 (find_package)
CMakeLists.txt:46 (include)

-- Configuring incomplete, errors occurred!
See also "/home/test/Desktop/DeepTextSpotter/caffe/Release/CMakeFiles/CMakeOutput.log".
See also "/home/test/Desktop/DeepTextSpotter/caffe/Release/CMakeFiles/CMakeError.log".

How did you get the angle from gt box?

I'm trying to train this model and port the code to TensorFlow, so I'm trying to understand what the code and the paper mean.

def read_txt_gt(gt_file, multi_script = False, separator = ' ', skip_slash = False):
     
  f = codecs.open(gt_file, "rb", "utf-8-sig")
  lines = f.readlines()
  
  gt_rectangles = []
  for line in lines:
    if line[0] == '#':
      continue
    splitLine = line.split(separator);
    if len(splitLine) < 7:
      continue
    xline = u'{0}'.format(line.strip())
    xline = xline.encode('utf-8') 
    
    for splitLine in unicodecsv.reader([xline], skipinitialspace=True, quotechar='"', delimiter=separator, encoding='utf-8'):
      break
    
    language = ""
    cls = splitLine[0].strip()
    x =  float(splitLine[1].strip())
    y = float(splitLine[2].strip())
    w = float(splitLine[3].strip())
    h = float(splitLine[4].strip())
    angle = float(splitLine[5].strip())

It seems the code uses a GT format different from the ICDAR and SynthText formats. Is that right?
If so, could you release the GT-generation code for ICDAR 2013, ICDAR 2015 and SynthText? Thank you.
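ICDAR 2015 annotations store four corner points clockwise from the top-left, while read_txt_gt expects a center, width, height and angle. The conversion below is a sketch of one possible mapping, not the author's GT-generation code. It assumes the first edge p0 to p1 runs along the reading direction and that the angle is in radians in image coordinates (y pointing down). Sample lines elsewhere on this page (angles such as 3.03) suggest radians, but that is unconfirmed, as is the sign convention.

```python
import math


def quad_to_rbox(pts):
    """Convert 4 corner points (clockwise from top-left) to (cx, cy, w, h, angle)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = pts
    cx = (x0 + x1 + x2 + x3) / 4.0
    cy = (y0 + y1 + y2 + y3) / 4.0
    # Average the top/bottom edges and the left/right edges so that slightly
    # non-rectangular quadrilaterals still give a sensible box.
    w = (math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x3, y2 - y3)) / 2.0
    h = (math.hypot(x3 - x0, y3 - y0) + math.hypot(x2 - x1, y2 - y1)) / 2.0
    angle = math.atan2(y1 - y0, x1 - x0)  # direction of the top edge
    return cx, cy, w, h, angle
```

For an axis-aligned box, quad_to_rbox([(0, 0), (10, 0), (10, 4), (0, 4)]) gives (5.0, 2.0, 10.0, 4.0, 0.0).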

formayolo_mobile_iter_0t error

The error is AttributeError: 'unicode' object has no attribute 'formayolo_mobile_iter_0t', and the code is:
print(u'{0} - {1}'.formayolo_mobile_iter_0t(out, txt[0]) )

System: Ubuntu 16.04, Python 2.7.12
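The method name looks like "format" with the text "yolo_mobile_iter_0" spliced into it, probably by a stray find-and-replace (an inference from the name, not confirmed). The intended call is presumably str.format, shown here wrapped in a small hypothetical helper:

```python
def format_result(out, txt):
    """Build the '<out> - <first transcription>' line that the corrupted call meant to print."""
    return u'{0} - {1}'.format(out, txt[0])
```

In the script itself the fixed line would read print(u'{0} - {1}'.format(out, txt[0])).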

Reproducing ICDAR Results

Hi,

Thanks for the code.

I've tried to reproduce the reported results for ICDAR 2013 and ICDAR 2015.

Using the weights you've shared, I get .7 F1 on ICDAR 2013 E2E. In the paper you reported .77.
I am not using the 90K dictionary for the recognition.
Is this drop in performance expected? Are the weights you have shared tuned to some other dataset?

Regards,
John

How can I train the network for another language?

Hi,
I don't know how to get started with text recognition for another language.
I have tried changing the 'codec' variable (adding more non-English characters), and I think I should also change the 'num_output' value of 'caffe.SpatialConvolution_36' in 'model_cz.prototxt'. But I don't know how to choose that value.

Could you please give me some direction?
Thank you very much :)
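For a CTC-trained recognizer the output layer usually has one channel per character plus one for the blank, so a common choice is num_output = len(codec) + 1. Whether this model uses exactly that convention (and places the blank at index 0) is an assumption; verify it first by checking that the shipped model_cz.prototxt value equals len(codec) + 1 for the original codec.

```python
def ctc_num_output(codec):
    """Output channels for a CTC layer: one per codec character plus the blank."""
    if len(set(codec)) != len(codec):
        raise ValueError("codec contains duplicate characters")
    return len(codec) + 1


def char_to_label(codec):
    """Assumed label mapping: blank = 0, codec[i] -> i + 1."""
    return {ch: i + 1 for i, ch in enumerate(codec)}
```

Changing num_output also changes the shape of that layer's weights, so the pretrained weights for it cannot be loaded as-is; that layer would need to be re-initialized and trained.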

Bad gt line error

I am trying to run train.py with my own data. However, I get the "bad gt line" error for every line when the program reads my ground-truth label files.

The format in my file is like this:
x, y, w, h, words
x2, y2, w2, h2, words

I changed the value of data_param.source in tiny.prototxt to my own path.

I couldn't find where the "bad gt line" error is printed in the Python code, so it is hard for me to debug. Could you point me to the data loading code used by train.py? Or could you make it modifiable from the Python side? I really don't know where to look for the error.

Thanks.
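The read_txt_gt excerpt quoted in another issue skips lines with fewer than 7 space-separated fields (cls, x, y, w, h, angle, text), and a comma-separated "x, y, w, h, words" line has fewer. A hypothetical converter, assuming axis-aligned boxes (angle 0), class "0", and that the target x, y is the box center; whether the loader expects pixel or normalized coordinates is not stated, and samples on this page use both.

```python
def convert_line(line, cls="0", top_left=True):
    """Convert 'x, y, w, h, words' into 'cls x y w h angle words' with angle 0.

    Assumptions: boxes are axis-aligned, and x, y is the top-left corner
    (pass top_left=False if it is already the center).
    """
    parts = [p.strip() for p in line.split(",", 4)]
    if len(parts) != 5:
        raise ValueError("expected 'x, y, w, h, words': %r" % line)
    x, y, w, h = (float(p) for p in parts[:4])
    if top_left:
        x, y = x + w / 2.0, y + h / 2.0  # move to the box center
    words = parts[4]
    if " " in words:
        words = '"%s"' % words  # quote so the space separator does not split the text
    return "{} {} {} {} {} 0 {}".format(cls, x, y, w, h, words)
```

For example, convert_line("10, 20, 30, 40, hello") gives "0 25.0 40.0 30.0 40.0 0 hello".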

INTERSECT_NONE is not a member of cv when making project

Hi @MichalBusta,

When I go to make the project I'm getting this error:

Scanning dependencies of target trie_py
[ 25%] Building CXX object src/CMakeFiles/trie_py.dir/trie.cpp.o
/workspace/DeepTextSpotter/src/trie.cpp: In function 'PyArrayObject* assign_lines(PyArrayObject*, int, npy_intp*, PyArrayObject*, npy_intp*)':
/workspace/DeepTextSpotter/src/trie.cpp:300:65: error: 'rotatedRectangleIntersection' was not declared in this scope
    int ret = rotatedRectangleIntersection(rect1, rect2, vertices);
                                                                 ^
/workspace/DeepTextSpotter/src/trie.cpp:301:14: error: 'INTERSECT_NONE' is not a member of 'cv'
    if(ret != cv::INTERSECT_NONE){
              ^
src/CMakeFiles/trie_py.dir/build.make:62: recipe for target 'src/CMakeFiles/trie_py.dir/trie.cpp.o' failed
make[2]: *** [src/CMakeFiles/trie_py.dir/trie.cpp.o] Error 1
CMakeFiles/Makefile2:124: recipe for target 'src/CMakeFiles/trie_py.dir/all' failed
make[1]: *** [src/CMakeFiles/trie_py.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
root@0825c99f78d2:/workspace/DeepTextSpotter/build#

Any suggestions? I'm running this inside the official caffe docker image found here:
https://github.com/BVLC/caffe/tree/master/docker

Also, do you know of any TensorFlow implementation of Deep TextSpotter? I've searched around and can't seem to find one.

Thanks!
Joel

where is bilinear sampling used in the code?

Thanks for sharing the code. I am trying to understand the paper: where is bilinear sampling used in the code? From the code, it seems that after getting the bounding-box proposals you just crop the original image?
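To make the question concrete, here is a generic sketch of bilinear sampling of a rotated region into a fixed-height patch whose width follows the box aspect ratio. It is not the repository's code; out_h=32 and the plain-list grayscale image are assumptions chosen for illustration.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a 2-D grayscale image (list of rows) at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_rotated_region(img, cx, cy, w, h, angle, out_h=32):
    """Resample the rotated box (cx, cy, w, h, angle) to height out_h, keeping the aspect ratio."""
    out_w = max(1, int(round(w * out_h / float(h))))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    patch = []
    for r in range(out_h):
        v = ((r + 0.5) / out_h - 0.5) * h          # offset across the text line
        row = []
        for c in range(out_w):
            u = ((c + 0.5) / out_w - 0.5) * w      # offset along the text line
            row.append(bilinear_sample(img, cx + u * cos_a - v * sin_a,
                                       cy + u * sin_a + v * cos_a))
        patch.append(row)
    return patch
```

Unlike an axis-aligned crop, each output pixel is interpolated from its four neighbours along the rotated box, which is what makes the operation differentiable with respect to the sampling positions.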

Understanding cropping process (net.layers[0].get_crop())

Hi,
I'm trying to adapt 'train.py' for evaluation.
In the 'process_batch' function there's a cropping process, using:

net.layers[0].get_crop()

which returns random coordinates for cropping the image.
I understand that during the train process, these crops are serving as augmentations of the data.
I would like to disable this augmentation, as it disrupts the forward pass during validation.
Just removing these lines ruins the results totally.
The function is a getter ('get_crop'), so I assume the crop happens somewhere inside the architecture, but I can't locate it.

The relevant code section is in train.py -> process_batch:
[screenshot]

Thanks,
Lee.

gcc version for caffe compiling

@MichalBusta Which version of gcc did your project use? An error occurred when compiling:
/usr/include/x86_64-linux-gnu/c++/4.7/./bits/c++config.h(177): error: identifier "nullptr" is undefined

/usr/include/x86_64-linux-gnu/c++/4.7/./bits/c++config.h(177): error: expected a ";"

/usr/include/c++/4.7/exception(65): error: expected a ";"

Classes in published weight file, (w,h,\theta) ambiguity

I noticed that the published model has 6 classes in the region layer (so the shape of conv_22 is 1024xxx168). But as far as I know, the datasets mentioned in the paper contain only one language. Did you use an additional training set?

Also, there is inherent ambiguity in (w,h, \theta) in rotated boxes. It seems like you handle this by asserting w > h, but it is not 100% clear from the paper or the code.

Thanks and regards,
John
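One common way to remove the (w, h, θ) ambiguity, sketched here as an assumption rather than what this repository does, is to enforce w ≥ h by swapping the sides and rotating by 90°, then reduce the angle modulo π into [-π/2, π/2). This treats the box as an unoriented rectangle: reducing modulo π discards reading direction, which matters if upside-down text must be distinguished.

```python
import math


def canonical_rbox(w, h, angle):
    """Map (w, h, angle) to an equivalent rectangle with w >= h and angle in [-pi/2, pi/2)."""
    if h > w:
        # Swapping the sides is compensated by a 90-degree rotation.
        w, h = h, w
        angle += math.pi / 2.0
    # Rotating a rectangle by pi gives the same set of points.
    angle = (angle + math.pi / 2.0) % math.pi - math.pi / 2.0
    return w, h, angle
```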

failure in training

I'm trying to reproduce your experiment on the ICDAR dataset but ran into the following problem:

[screenshots]
It seems that the data is not imported successfully, but I think the data is organized in the right way as required.

issue while running train.py

Hi, we tried train.py but are getting the following error:

I0213 04:00:44.575316 17258 net.cpp:242] This network produces output softmax
I0213 04:00:44.575346 17258 net.cpp:255] Network initialization done.
I0213 04:00:44.575435 17258 solver.cpp:56] Solver scaffolding done.
[416, 416]
I0213 04:00:44.581261 17258 solver.cpp:330] Iteration 0, Testing net (#0)
Floating point exception (core dumped)

We are creating a separate .txt file for each .jpeg file. For example, one of the .txt files looks like this:

0 86.544325 113.70325 61.5000988376 10.8000627304 3.03217953614 calliberation

Our image size is 426 x 240. Should we resize the images to some standard size?

Please help us resolve the error.

cmp_trie undefined symbol error on running demo.py

I want to run DeepTextSpotter on the CPU. Is it possible to install and run DeepTextSpotter on the CPU? After installing, I get this error when running demo.py:

import cmp_trie
ImportError: /home/vivek/DeepTextSpotter/cmp_trie.so: undefined symbol: _ZN2cv28rotatedRectangleIntersectionERKNS_11RotatedRectES2_RKNS_12_OutputArrayE

What do you mean by greedy prefix decoding?

Hello @MichalBusta ,
At the main page it says that
"For decoding final output, we provide just greedy and dictionary based prefix decoding"
What do you mean by greedy prefix decoding? Is it basically to recognize out of vocabulary words?

Thanks for your answer.
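The page does not define "dictionary-based prefix decoding". One plausible reading, sketched under that assumption, is greedy CTC decoding where each newly emitted character must extend a prefix of some dictionary word, tracked with a trie. The repository builds a cmp_trie module, but its API is not shown here, so this sketch uses a plain Python trie and assumes class 0 is the blank. Plain greedy decoding can emit out-of-vocabulary strings; this constrained variant cannot.

```python
class TrieNode(object):
    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def prefix_greedy_decode(probs, alphabet, trie, blank=0):
    """Greedy CTC decoding restricted to prefixes of dictionary words.

    Per frame, the allowed labels are: the blank, a repeat of the previous label
    (collapsed by CTC), or a character that extends the current prefix.
    Returns (decoded_text, is_complete_word).
    """
    node, prev, out = trie, blank, []
    for frame in probs:
        best = blank
        for k, p in enumerate(frame):
            if k == blank:
                continue
            allowed = (k == prev) or (alphabet[k] in node.children)
            if allowed and p > frame[best]:
                best = k
        if best != blank and best != prev:
            node = node.children[alphabet[best]]
            out.append(alphabet[best])
        prev = best
    return "".join(out), node.is_word
```

With the dictionary ["cat", "car"], frames whose unconstrained argmax spells "crt" decode to "cat", because "r" cannot follow "c" in the trie.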

Dropout source layer is ignored

Hey,
After the initialization of the network, between the many red lines, I'm also notified that the dropout layer is being ignored:
[screenshot of the log]

I get this line before every test run.
How can I fix it ?
Thanks,
Lee

different cropped-image normalization in validation and train

Hey,
I'm wondering why the cropped image that enters the second network (CTC) is being normalized differently in train and in validation. The pixels values are in different range, thus I get different predictions on the same picture using the same weights.

validation:
[screenshot]

train:
[screenshot]

Thanks,
Lee

caffe make error

collect2: error: ld returned 1 exit status
tools/CMakeFiles/finetune_net.dir/build.make:131: recipe for target 'tools/finetune_net' failed
make[2]: *** [tools/finetune_net] Error 1
CMakeFiles/Makefile2:467: recipe for target 'tools/CMakeFiles/finetune_net.dir/all' failed
make[1]: *** [tools/CMakeFiles/finetune_net.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

How can I run demo.py in CPU-only mode?

Hi
I tried to run demo.py on the CPU, so I ran cmake like this:
cmake -D CMAKE_BUILD_TYPE=Release -D BLAS=Open -D BUILD_SHARED_LIBS=OFF -D CPU_ONLY=ON ..
and set CPU_ONLY := 1 in Makefile.config, but I got this error:
F0417 16:57:45.895947 6787 conv_layer.cpp:76] Cannot use GPU in CPU-only Caffe: check mode.

How can I solve this problem? Thank you!
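pycaffe provides caffe.set_mode_cpu(), caffe.set_mode_gpu() and caffe.set_device(); the error suggests GPU mode is still being requested on a CPU_ONLY build. Whether demo.py calls set_mode_gpu() is an assumption. The error can also come from a layer that calls GPU code directly (the log points at conv_layer.cpp), in which case switching the mode will not help. A sketch that makes the mode explicit before any net is created:

```python
def configure_caffe_mode(caffe_module, use_gpu, device_id=0):
    """Select Caffe's compute mode; call this before constructing caffe.Net."""
    if use_gpu:
        caffe_module.set_device(device_id)
        caffe_module.set_mode_gpu()
        return "gpu"
    caffe_module.set_mode_cpu()
    return "cpu"
```

Usage: `import caffe` followed by `configure_caffe_mode(caffe, use_gpu=False)`, replacing any unconditional set_mode_gpu() call in the script.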

Floating point exception (core dumped) for own training data

Hi,

I've managed to create a training dataset. The text file for the attached image contains the following line:

0 0.10345672607421875 0.06218933477634337 0.18472142537434896 0.07758427723548365 0.0404388306745 YK-201-SB

However, I'm still getting the following error:
I0817 11:00:51.780251 36032 solver.cpp:330] Iteration 0, Testing net (#0) Floating point exception (core dumped)

Can you please tell me if it is a dataset problem or a code problem? Can I debug the solver.cpp file?
Thank you.

stage_91.jpg
stage_91.txt

ristretto problem?

I get the error below when I try to compile the required caffe version. I compiled the pmgysel/caffe repo from which this one is forked, and didn't get this error.

CXX src/caffe/ristretto/quantization.cpp
src/caffe/ristretto/quantization.cpp: In member function ‘void Quantization::Quantize2DynamicFixedPoint()’:
src/caffe/ristretto/quantization.cpp:188:42: error: no matching function for call to ‘caffe::Net<float>::Net(caffe::NetParameter&, NULL)’
     net_test = new Net<float>(param, NULL);
