
emedvedev / attention-ocr


A TensorFlow model for text recognition (CNN + seq2seq with visual attention), available as a Python package and compatible with Google Cloud ML Engine.

License: MIT License

Python 100.00%
tensorflow ocr ocr-recognition machine-learning ml cnn seq2seq google-cloud-ml google-cloud image-recognition

attention-ocr's Introduction

Attention-based OCR

Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the trained model with weights as a SavedModel or a frozen graph.

Acknowledgements

This project is based on a model by Qi Guo and Yuntian Deng. You can find the original model in the da03/Attention-OCR repository.

The model

Authors: Qi Guo and Yuntian Deng.

The model first runs a sliding CNN on the image (images are resized to height 32 while preserving aspect ratio). Then an LSTM is stacked on top of the CNN. Finally, an attention model is used as a decoder for producing the final outputs.
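As a rough illustration of the resizing step, the target size of an image can be computed as follows. This is a sketch, not the project's code: the helper name resized_size is hypothetical, and rounding to the nearest pixel with a minimum width of 1 is an assumption.

```python
def resized_size(width, height, target_height=32):
    """Return (new_width, new_height) for a height-normalized image.

    Hypothetical helper: the width is scaled by the same factor as the
    height, so the aspect ratio is preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_height / float(height)
    new_width = max(1, int(round(width * scale)))  # assumed rounding rule
    return new_width, target_height


# A 200x64 image is scaled by 0.5 in both dimensions.
print(resized_size(200, 64))
```

Wide images therefore stay wide after resizing, which is why the width-related parameters described later matter.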

OCR example

Installation

pip install aocr

Note: TensorFlow and NumPy will be installed as dependencies. Additional dependencies are PIL/Pillow, distance, and six.

Note #2: this project works with TensorFlow 1.x. An upgrade to TensorFlow 2 is planned; if you want to help, please feel free to create a PR.

Usage

Create a dataset

To build a TFRecords dataset, you need a collection of images and an annotation file with their respective labels.

aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords
aocr dataset ./datasets/annotations-testing.txt ./datasets/testing.tfrecords

Annotations are simple text files containing the image paths (either absolute or relative to your working dir) and their corresponding labels:

datasets/images/hello.jpg hello
datasets/images/world.jpg world
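To check an annotation file before building a dataset, the lines can be parsed with a few lines of Python. This is a minimal sketch under two assumptions: image paths contain no whitespace, and everything after the first run of whitespace is the label. The function name parse_annotations is hypothetical and is not part of the aocr package.

```python
def parse_annotations(lines):
    """Yield (image_path, label) pairs from annotation lines.

    Assumes the path contains no whitespace and the rest of the line
    is the label, so labels themselves may contain spaces.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("line %d has no label: %r" % (number, line))
        yield parts[0], parts[1]


sample = [
    "datasets/images/hello.jpg hello\n",
    "datasets/images/world.jpg world\n",
]
for path, label in parse_annotations(sample):
    print(path, "->", label)
```

Running a check like this first surfaces malformed lines with their line numbers instead of failing partway through dataset creation.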

Train

aocr train ./datasets/training.tfrecords

A new model will be created, and the training will start. Note that it takes quite a long time to reach convergence, since we are training the CNN and attention model simultaneously.

The --steps-per-checkpoint parameter determines how often the model checkpoints will be saved (the default output dir is checkpoints/).

Important: there are many training options available. See the CLI help or the parameters section of this README.

Test and visualize

aocr test ./datasets/testing.tfrecords

Additionally, you can visualize the attention results during testing (saved to out/ by default):

aocr test --visualize ./datasets/testing.tfrecords

Example output images in results/correct:

Image 0 (j/j):

example image 0

Image 1 (u/u):

example image 1

Image 2 (n/n):

example image 2

Image 3 (g/g):

example image 3

Image 4 (l/l):

example image 4

Image 5 (e/e):

example image 5

Export

After the model is trained and a checkpoint is available, it can be exported as either a frozen graph or a SavedModel.

# SavedModel (default):
aocr export ./exported-model

# Frozen graph:
aocr export --format=frozengraph ./exported-model

Both commands load weights from the latest checkpoint and export the model into the ./exported-model directory.

Note: During training, it is possible to pass parameters describing the dimensions of the input images (--max-width, --max-height, etc.). If you used them during training, make sure to also pass them to the export command. Otherwise the exported model will not work properly when serving (next section).

Serving

An exported SavedModel can be served as an HTTP REST API using TensorFlow Serving. You can start the server by running the following command:

tensorflow_model_server --port=9000 --rest_api_port=9001 --model_name=yourmodelname --model_base_path=./exported-model

Note: tensorflow_model_server requires the model base path to contain a sub-directory named after the version number, with the files exported in the previous step inside it. So you need to manually move the contents of exported-model into exported-model/1.
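The manual move can also be scripted. The following is a sketch only: move_into_version_dir is a hypothetical helper, not something the aocr CLI provides, and it assumes the version directory does not exist yet.

```python
import shutil
from pathlib import Path


def move_into_version_dir(export_dir, version=1):
    """Move everything in export_dir into export_dir/<version>/.

    Hypothetical helper; it refuses to run if the version directory
    already exists, to avoid nesting versions by accident.
    """
    export_dir = Path(export_dir)
    version_dir = export_dir / str(version)
    entries = list(export_dir.iterdir())  # snapshot before creating the subdir
    version_dir.mkdir()                   # raises FileExistsError if present
    for entry in entries:
        shutil.move(str(entry), str(version_dir / entry.name))
    return version_dir
```

For the layout used in this README, the call would be move_into_version_dir("./exported-model").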

Now you can send a prediction request to the running server, for example:

curl -X POST \
  http://localhost:9001/v1/models/yourmodelname:predict \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
  "signature_name": "serving_default",
  "inputs": {
    "input": { "b64": "<your image encoded as base64>" }
  }
}'

The REST API requires binary inputs to be encoded as Base64 and wrapped in an object containing a b64 key. See 'Encoding binary values' in the TensorFlow Serving documentation.
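The request body from the curl example can be built in Python as well. This is a minimal sketch: build_predict_request is a hypothetical helper name, the placeholder bytes stand in for a real image file, and sending the body with an HTTP client is left out.

```python
import base64
import json


def build_predict_request(image_bytes, signature_name="serving_default"):
    """Return the JSON body for a TensorFlow Serving REST predict call."""
    # Binary values must be Base64-encoded and wrapped in a {"b64": ...} object.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "signature_name": signature_name,
        "inputs": {"input": {"b64": encoded}},
    }
    return json.dumps(body)


# Placeholder bytes; in practice, read them with open(path, "rb").read().
print(build_predict_request(b"not a real image"))
```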

Google Cloud ML Engine

To train the model in the Google Cloud Machine Learning Engine, upload the training dataset into a Google Cloud Storage bucket and start a training job with the gcloud tool.

  1. Set the environment variables:
# Prefix for the job name.
export JOB_PREFIX="aocr"

# Region to launch the training job in.
# Should be the same as the storage bucket region.
export REGION="us-central1"

# Your storage bucket.
export GS_BUCKET="gs://aocr-bucket"

# Path to store your training dataset in the bucket.
export DATASET_UPLOAD_PATH="training.tfrecords"
  2. Upload the training dataset:
gsutil cp ./datasets/training.tfrecords $GS_BUCKET/$DATASET_UPLOAD_PATH
  3. Launch the ML Engine job:
export NOW=$(date +"%Y%m%d_%H%M%S")
export JOB_NAME="$JOB_PREFIX$NOW"
export JOB_DIR="$GS_BUCKET/$JOB_NAME"

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir=$JOB_DIR \
    --module-name=aocr \
    --package-path=aocr \
    --region=$REGION \
    --scale-tier=BASIC_GPU \
    --runtime-version=1.2 \
    -- \
    train $GS_BUCKET/$DATASET_UPLOAD_PATH \
    --steps-per-checkpoint=500 \
    --batch-size=512 \
    --num-epoch=20

Parameters

Global

  • log-path: Path for the log file.

Testing

  • visualize: Output the attention maps on the original image.

Exporting

  • format: Format for the export (either savedmodel or frozengraph).

Training

  • steps-per-checkpoint: Number of steps between checkpoints (at each checkpoint, perplexity is printed and the model is saved).
  • num-epoch: The number of whole data passes.
  • batch-size: Batch size.
  • initial-learning-rate: Initial learning rate. Note that we use AdaDelta, so the initial value does not matter much.
  • target-embedding-size: Embedding dimension for each target.
  • attn-num-hidden: Number of hidden units in attention decoder cell.
  • attn-num-layers: Number of layers in attention decoder cell. (Encoder number of hidden units will be attn-num-hidden*attn-num-layers).
  • no-resume: Create new weights even if there are checkpoints present.
  • max-gradient-norm: Clip gradients to this norm.
  • no-gradient-clipping: Do not perform gradient clipping.
  • gpu-id: GPU to use.
  • use-gru: Use GRU cells instead of LSTM.
  • max-width: Maximum width for the input images. WARNING: images wider than the maximum will be discarded.
  • max-height: Maximum height for the input images.
  • max-prediction: Maximum length of the predicted word/phrase.

References

Convert a formula to its LaTeX source

What You Get Is What You See: A Visual Markup Decompiler

Torch attention OCR

attention-ocr's People

Contributors

adamwp, alpexjava, anotherpopoua, asimov876, brishtiteveja, ckirmse, da03, dos1in, emedvedev, gammasts, harmonicahappy, imoonkey, linjm, mariusmez, mattfeury, mgaitan, mjpieters, nektor211, pokonski, rmoe, rrtaylor, sivanke, stickler-ci


attention-ocr's Issues

Optimizing for Inference

I'm trying to run my graph on an Android device, but I'm having some issues optimizing the graph. Has anyone managed to get attention-ocr to work on an Android device? Here's what I've tried so far.

I froze the graph using aocr export --format=frozengraph ./exported-model

Then I tried optimizing for inference by running:

$ python optimize_for_inference.py --input frozen_graph.pb --output graph_optimized.pb --input_names=input_image_as_bytes --output_names=prediction,probability

The resulting .pb file gave me an error.

Caused by: java.io.IOException: Not a valid TensorFlow Graph serialization: Node 'cond_1/strided_slice/stack': Control dependencies must come after regular dependencies

I then tried optimizing for deployment with the newer graph_transforms by running:

$ bazel build tensorflow/tools/graph_transforms:transform_graph

$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=frozen_graph.pb --out_graph=optimized.pb --inputs='input_image_as_bytes' --outputs='prediction,probability' --transforms=' strip_unused_nodes(type=float, shape="1,299,299,3") fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms'

That gave me another error.

Node 'model_with_buckets/embedding_attention_decoder/embedding_lookup' expects to be colocated with unknown node 'embedding_attention_decoder/embedding'

Return multiple guesses and probabilities for a given image

As briefly discussed in #19, it would be great to return a list of "guesses" with their probabilities. e.g. for an image like "JABRONI" we could get a response a la

[
{
    "output": "JABRONI",
    "probability": 0.998
},
{
    "output": "JABR0NI",
    "probability": 0.968
},
{
    "output": "JABR0N1",
    "probability": 0.942
}
]

Adding this here to track it. I may find some time to get to it, but not in the short term. I think we just need a clever way to go through the probabilities and determine which guesses we should consider. Somewhere around here:

for l in xrange(len(self.attention_decoder_model.output)):
    guess = tf.argmax(self.attention_decoder_model.output[l], axis=1)
    num_feed.append(guess)
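One possible way to turn per-position probabilities into several ranked guesses is a small beam search. The sketch below is plain Python rather than TensorFlow and is not the project's implementation: top_guesses and its inputs are hypothetical, and it treats output positions as independent, which a real attention decoder is not, since each step is conditioned on the previous prediction.

```python
import heapq


def top_guesses(step_probs, charset, beam_width=3):
    """Return the beam_width most probable strings with their probabilities.

    step_probs: one list of per-character probabilities per output position.
    charset: the character for each index in those lists.
    Assumes positions are independent (a simplification).
    """
    beams = [("", 1.0)]
    for probs in step_probs:
        candidates = [
            (text + charset[i], score * p)
            for text, score in beams
            for i, p in enumerate(probs)
        ]
        # Keep only the best partial strings before moving on.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beams


charset = "O0I1"
step_probs = [
    [0.90, 0.10, 0.00, 0.00],   # position 0: "O" vs "0"
    [0.00, 0.00, 0.80, 0.20],   # position 1: "I" vs "1"
]
for text, probability in top_guesses(step_probs, charset):
    print(text, round(probability, 3))
```

Inside the graph, the equivalent would be to take the top few entries per step instead of a single argmax, but how that interacts with feeding predictions back into the decoder is left open here.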

Any way to get a "confidence" metric?

Hello,

I'm interested in knowing a "confidence" for a given prediction. Does anyone have any ideas on the best way to tackle this? I assume there is some output in the graph (potentially for each character?) that I could tap into to calculate this. Hope to play with this later this week but wanted to see if anyone had any ideas first.
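If per-character probabilities can be read from the graph, one common way to combine them is shown in the sketch below. It is an illustration under that assumption, not an existing aocr feature; sequence_confidence is a hypothetical name, and the choice between the joint probability and the per-character geometric mean is a design decision left to the reader.

```python
import math


def sequence_confidence(step_probs):
    """Combine per-character probabilities into two confidence scores.

    step_probs: probability of the chosen character at each output position.
    Returns (joint, per_char): the product of the probabilities and their
    geometric mean, which does not shrink just because a word is longer.
    """
    if not step_probs:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(max(p, 1e-12)) for p in step_probs)  # avoid log(0)
    joint = math.exp(log_sum)
    per_char = math.exp(log_sum / len(step_probs))
    return joint, per_char


joint, per_char = sequence_confidence([0.99, 0.98, 0.60, 0.97])
print("joint=%.3f per_char=%.3f" % (joint, per_char))
```

A single uncertain character pulls both scores down, which is usually the behaviour wanted from a confidence value.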

Feature extraction using CNN

Hi,
I would like to extract feature sequence of a text line image using CNN.
How can I perform this using cnn.py or model.py ?
Thank you in advance for your help.

Problem while training with datasets.tfrecords

Traceback (most recent call last):
  File "/home/phani/Documents/AOCR/attention-ocr/aocr/__main__.py", line 277, in <module>
    main()
  File "/home/phani/Documents/AOCR/attention-ocr/aocr/__main__.py", line 252, in main
    num_epoch=parameters.num_epoch )
  File "/home/phani/Documents/AOCR/attention-ocr/aocr/model/model.py", line 348, in train
    for batch in s_gen.gen(self.batch_size):
  File "/home/phani/Documents/AOCR/attention-ocr/aocr/util/data_gen.py", line 71, in gen
    word = self.convert_lex(lex)
  File "/home/phani/Documents/AOCR/attention-ocr/aocr/util/data_gen.py", line 97, in convert_lex
    assert len(lex) <= self.bucket_specs[-1][1]
AssertionError

self.bucket_specs[-1][1] always gives me a value of 10, because bucket_specs = [(22,10)].
len(lex) is the length of the label. The longest label is 19 characters, so the assertion fails.

What does the parameter bucket_specs refer to and how do I fix this error?
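The assertion compares each label's length with the second element of the last bucket spec, so a quick check over the labels shows which ones would trip it. The sketch below is only a diagnostic aid: labels_exceeding is a hypothetical helper, and because this thread does not say how the limit is derived from the CLI parameters, the limit is passed in explicitly.

```python
def labels_exceeding(labels, max_length):
    """Return the labels longer than max_length characters."""
    return [label for label in labels if len(label) > max_length]


labels = ["hello", "world", "averyveryverylonglabel"]
limit = 10  # hypothetical value, standing in for bucket_specs[-1][1]
too_long = labels_exceeding(labels, limit)
print("longest label:", max(len(label) for label in labels))
print("labels over the limit:", too_long)
```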

`max-width` only works with widths divisible by 40

Hi @emedvedev

aocr train datasets/training.tfrecords command is throwing the following error:

2017-09-27 17:35:18.730416: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 17:35:18.730433: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 17:35:18.730437: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 17:35:18.730440: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 17:35:18.730443: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-27 17:35:18.924995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8095
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.80GiB
2017-09-27 17:35:18.925019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:927] DMA: 0 
2017-09-27 17:35:18.925023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:937] 0:   Y 
2017-09-27 17:35:18.925032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0
2017-09-27 17:35:18.957293: I tensorflow/core/common_runtime/direct_session.cc:265] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0

2017-09-27 17:35:18,957 root  INFO     loading data
2017-09-27 17:35:18,964 root  INFO     phase: train
2017-09-27 17:35:18,964 root  INFO     model_dir: checkpoints
2017-09-27 17:35:18,964 root  INFO     load_model: True
2017-09-27 17:35:18,964 root  INFO     output_dir: results
2017-09-27 17:35:18,964 root  INFO     steps_per_checkpoint: 100
2017-09-27 17:35:18,964 root  INFO     batch_size: 64
2017-09-27 17:35:18,964 root  INFO     num_epoch: 1000
2017-09-27 17:35:18,964 root  INFO     learning_rate: 1
2017-09-27 17:35:18,964 root  INFO     reg_val: 0
2017-09-27 17:35:18,964 root  INFO     max_gradient_norm: 5.000000
2017-09-27 17:35:18,964 root  INFO     clip_gradients: True
2017-09-27 17:35:18,964 root  INFO     max_image_width 300.000000
2017-09-27 17:35:18,964 root  INFO     max_prediction_length 30.000000
2017-09-27 17:35:18,964 root  INFO     target_vocab_size: 39
2017-09-27 17:35:18,964 root  INFO     target_embedding_size: 10.000000
2017-09-27 17:35:18,964 root  INFO     attn_num_hidden: 128
2017-09-27 17:35:18,964 root  INFO     attn_num_layers: 2
2017-09-27 17:35:18,965 root  INFO     visualize: True
Traceback (most recent call last):
  File "launcher.py", line 258, in <module>
    main()
  File "launcher.py", line 247, in main
    max_prediction_length=parameters.max_prediction,
  File "/home/sudheer/Flipkart/Research/maneesh/tensor_flow/models/model_2.0/models/emedvedev_attention-ocr/attention-ocr/aocr/model/model.py", line 165, in __init__
    use_gru=use_gru)
  File "/home/sudheer/Flipkart/Research/maneesh/tensor_flow/models/model_2.0/models/emedvedev_attention-ocr/attention-ocr/aocr/model/seq2seq_model.py", line 137, in __init__
    softmax_loss_function=softmax_loss_function)
  File "/home/sudheer/Flipkart/Research/maneesh/tensor_flow/models/model_2.0/models/emedvedev_attention-ocr/attention-ocr/aocr/model/seq2seq.py", line 993, in model_with_buckets
    encoder_inputs = tf.split(encoder_inputs_tensor, bucket[0], 0)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1214, in split
    split_dim=axis, num_split=num_or_size_splits, value=value, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3261, in _split
    num_split=num_split, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2508, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1873, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1823, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimension size must be evenly divisible by 20 but is 21
        Number of ways to split should evenly divide the split dimension for 'model_with_buckets/split' (op: 'Split') with input shapes: [], [21,?,512] and with computed input tensors: input[0] = <0>.

Am I doing anything wrong?

Thanks in advance !!!
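Going only by the title of this issue, a width such as the 300 in the log above could be replaced with a nearby multiple of 40. The helper below is a hypothetical sketch for picking one; that a multiple of 40 actually avoids the error is taken from the issue title and is not verified against the model code.

```python
def nearest_multiples(value, step=40):
    """Return the closest multiples of step at or below and at or above value."""
    lower = (value // step) * step
    upper = lower if lower == value else lower + step
    return lower, upper


print(nearest_multiples(300))  # 300 itself is not a multiple of 40
```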

optionally save filename as comment field in dataset

Right now, if you have a large corpus for training, many of which have the exact same text, it is painful to figure out which one is responsible for a low value in output of the 'test' command.

Proposed solution: save the original filename with the record and output that in testing.

Thoughts?

testing results poor after model converging

Hi,

I've trained the model on the Synth 90k dataset. After ~442800 iterations, the reported perplexity is ~1.05, precision is ~90%, and loss is in the 0.01-0.1 range. But when I run the test command on the same dataset, the numbers are much worse. Here are the statistics after 1000 test steps: accuracy ~35%, perplexity ~170528.

This is the train command:

aocr train datasets/synth90k_train.tfrecord --max-width 200 --max-height 31 --max-prediction 30

And this is the test command:

aocr test datasets/synth90k_train.tfrecord --max-width 200 --max-height 31 --max-prediction 30

Any idea why the results are so poor?

missing filename gives a difficult to understand error

A typo in a command line argument with a parameter leads to a hard-to-understand error:

Better error detection around data_gen.py line 40 would save me multiple sad attempts at figuring out my own mistake.

I'll make a PR for this soon if no one else does.
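The kind of check proposed here could look like the sketch below. It is an illustration only: require_file is a hypothetical helper, and where exactly it would be called in data_gen.py is not specified in this issue.

```python
import os


def require_file(path, description="input file"):
    """Return path unchanged, or raise a readable error if it is not a file."""
    if not path:
        raise ValueError("no %s was given" % description)
    if not os.path.isfile(path):
        raise IOError("%s not found: %r (check the argument for typos)"
                      % (description, path))
    return path
```

Calling require_file(dataset_path, "dataset") before opening the records would turn a mistyped path into a one-line message that names the offending argument.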

OSError in visualizing results

When I run aocr test --visualize test.tfrecords, it throws the error OSError: cannot write mode RGBA as JPEG.

I have tried converting all my test images to JPGs and removing the alpha channels, but this doesn't seem to work. Any tips?

Detecting multiple words?

I've read the original paper and it seems that it detects multiple words in an image. I would like to confirm whether this is true? By multiple words I mean detecting text "The Mistake You" in the image below:

box_0

Also, is there a saved model available that is trained on http://www.robots.ox.ac.uk/~vgg/data/text/ ? If not, and I train the model on that dataset, which contains single words (not multiple words), will the model work on images like the above example?

I've tried the above image in CRNN (https://arxiv.org/abs/1507.05717) but it doesn't recognize the text well because of the spaces.

some string issues with python3, especially --visualize

Hi--this is a great repo, thanks to the original authors and the great work you have done to make it more usable.

With the very latest code (I pulled on morning of oct-7-2017), there are a few string issues. The 'test' command is reporting incorrect results due to some byte / string issue, even if the prediction is correct. I'm seeing things like b"B'test'" vs b'test (where the correct word is test), and the extra b and quotes are reducing the reported quality of the results. I have a hack to work around it but I'm just learning my way around this code so you might have a better answer.

Similarly, --visualize doesn't work at all because the "filename" being passed in isn't actually a string, but a byte literal.

Separately I tried to run this under python2 at first and there was some issue with that too.

Anyway, if you have fixes, please go ahead, or if you have suggestions on the best way you want it fixed, I'll proceed and send in a PR.
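A frequent cause of output like b'test' on Python 3 is calling str() on a bytes value instead of decoding it. The sketch below shows a small normalization helper; to_text is a hypothetical name, and whether this matches the exact code path in aocr is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


predicted = b"test"   # e.g. a value fetched from a TensorFlow string tensor
expected = "test"
print(to_text(predicted) == to_text(expected))  # compares text with text
print(str(predicted))                           # str() keeps the b'...' wrapper
```

Normalizing both the prediction and the ground truth this way before comparing them keeps the stray b prefix and quotes out of the accuracy computation, and the same helper can be applied to the filename before it is used for visualization.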

NotImplementedError on training

Hi! First off thanks for this repo. It's been immensely helpful. I've hit a few snags but I've been able to work around them (can submit a PR if wanted). But on occasion during training I see this error:

Traceback (most recent call last):
  File "/usr/local/bin/aocr", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/aocr/__main__.py", line 238, in main
    model.train()
  File "/usr/local/lib/python2.7/site-packages/aocr/model/model.py", line 301, in train
    for batch in self.s_gen.gen(self.batch_size):
  File "/usr/local/lib/python2.7/site-packages/aocr/util/data_gen.py", line 65, in gen
    go_shift=1)
  File "/usr/local/lib/python2.7/site-packages/aocr/util/bucketdata.py", line 42, in flush_out
    raise NotImplementedError

I'm able to dig into the code and see where this is obviously raised for probably a good reason, but am not sure what the core reason is. Any thoughts as to why this might be happening? i'm assuming something with my dataset?

thanks!

UTF-8 and Special Characters

I'm trying to train with special characters such as currency symbols, hyphens, slashes, and apostrophes. I see that data_gen.py sets the character map to ascii. Is this project limited to ascii characters for any particular reason or should it be possible to use UTF-8 characters?

Error when training on Synth 90k

2017-10-22 23:07:17.471187: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Invalid JPEG data, size 1024
In
image = tf.image.decode_png(img, channels=1)

question of pad

Should the labels be padded to the same length, just like start?

show AssertionError when I run "aocr train training.tfrecords"

I have a sample.txt with text such as "./1391/4/361_Kindest_42517.jpg kindest" and the corresponding jpg in that directory, and I succeeded in building training.tfrecords. But when I run "aocr train training.tfrecords", it shows the error below:
Traceback (most recent call last):
  File "/usr/local/bin/aocr", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/aocr/__main__.py", line 199, in main
    model.train()
  File "/usr/local/lib/python2.7/dist-packages/aocr/model/model.py", line 298, in train
    for batch in self.s_gen.gen(self.batch_size):
  File "/usr/local/lib/python2.7/dist-packages/aocr/util/data_gen.py", line 68, in gen
    word = self.convert_lex(lex)
  File "/usr/local/lib/python2.7/dist-packages/aocr/util/data_gen.py", line 94, in convert_lex
    assert lex and len(lex) < self.bucket_specs[-1][1]
AssertionError
How can I deal with this problem?

Text line recognition

Hello,
I would like to use this toolkit to recognize text line images. The ground truth looks like the following example:
aaAlaBeeE maBalMaaEtaA maBnaE aaAlaBseMnaM..........................
aaAlaBeeE is a word composed of these forms/letters: aaA laB eeE
How can I adapt this annotation to the format used in this framework, like the following:
datasets/images/hello.jpg hello
datasets/images/world.jpg world

Synth 90k training error

Hi,

I'm trying to train the model on Synth90k and I run into this error during training:

InvalidArgumentError (see above for traceback): TensorArray has inconsistent shapes. Index 0 has shape: [31,135,1] but index 1 has shape: [31,81,1] [[Node: map/TensorArrayStack/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:@map/TensorArray_1"], dtype=DT_UINT8, element_shape=[?,?,1], _device="/job:localhost/replica:0/task:0/cpu:0"](map/TensorArray_1, map/TensorArrayStack/range, map/while/Exit_1/_115)]] [[Node: cond_52/strided_slice/_409 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_44518_cond_52/strided_slice", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Does the training script expect all input images to have the same width?

Questions about spaces

The datasets I've seen referenced for this project all contain single strings of text with no spaces. My use case will require identifying strings of text that contain spaces. How well does attention-ocr work if it is provided an image that contains multiple strings of text separated by spaces? Will it work, or does it require images with continuous strings of text only?

No module named tensorflow

I installed attention-ocr in my home directory on Ubuntu 14.04 LTS, but I get the error message below.
I followed https://www.tensorflow.org/install/install_linux
1. After following the Python 3.n Virtualenv steps, I installed attention-ocr, but it failed with the message below.
2. After following the Python 2.7 Virtualenv steps, I installed attention-ocr, but it failed with the message below.

aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords
Traceback (most recent call last):
  File "/usr/local/bin/aocr", line 9, in <module>
    load_entry_point('aocr==0.6.2', 'console_scripts', 'aocr')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 351, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2363, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2088, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/local/lib/python2.7/dist-packages/aocr/__main__.py", line 13, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow

unrecognized arguments: --attn-use-lstm=False

I have the same problem with gpu-id. Can you please explain to me how to set these parameters? I didn't have problems with --max-width, --max-height, --max-prediction, --full-ascii, and --color. Thanks!

No bias terms in the CNN

Is there a reason why the feature extraction is only using convolutional kernels and no bias term?

No such process

This error occurs with TensorFlow 1.5 on Windows 10.
Should the ./datasets/annotations-training.txt file already be there?
I can't find annotations-training.txt anywhere.

C:\Users\kimduknam>aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords
2018-02-05 13:14:41.835884: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-05 13:14:41,826 root INFO Building a dataset from ./datasets/annotations-training.txt.
2018-02-05 13:14:41,826 root INFO Output file: ./datasets/training.tfrecords
Traceback (most recent call last):
  File "c:\users\kimduknam\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\kimduknam\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\kimduknam\AppData\Local\Programs\Python\Python36\Scripts\aocr.exe\__main__.py", line 9, in <module>
  File "c:\users\kimduknam\appdata\local\programs\python\python36\lib\site-packages\aocr\__main__.py", line 219, in main
    dataset.generate(parameters.annotations_path, parameters.output_path, parameters.log_step, parameters.force_uppercase, parameters.save_filename)
  File "c:\users\kimduknam\appdata\local\programs\python\python36\lib\site-packages\aocr\util\dataset.py", line 19, in generate
    writer = tf.python_io.TFRecordWriter(output_path)
  File "c:\users\kimduknam\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\lib\io\tf_record.py", line 106, in __init__
    compat.as_bytes(path), compat.as_bytes(compression_type), status)
  File "c:\users\kimduknam\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile: ./datasets/training.tfrecords : \udcc1\udcf6\udcc1\udca4\udcb5\udcc8 \udcb0\udce6\udcb7θ\udca6 ã\udcc0\udcbb \udcbc\udcf6 \udcbe\udcf8\udcbd\udcc0\udcb4ϴ\udcd9.
; No such process

test --visualize doesn't work

Entering this as a to-do item. It looks like it broke in the conversion to use TFRecordDataset

To fix, I think the proper solution is to:

  • add filename into the record in dataset.py:generate
  • read and set filename in data_gen.py:_parse_record
  • pass filename into bucket_data.append and save it in there
  • grab filename from batch data in model.py:test and pass it to visualize_attention()

I'm not planning to do this in the next few days so I wanted to put the plan here. If no one else fixes this, hopefully I will soon.

Visualizing attention: magic line

Hi,

Thank you for making the code so complete and for including the attention visualization mechanism!
2 questions and 1 small issue 😃

Issue first:
There is a small typo in this resize. It should use (mw, mh) in order to keep the original ratio of the image. A PR for such a small thing is unnecessary, right?


Q1:

I was trying to understand why for some of the last letters I don't get an attention map, and I came across this "magic" line:

attention_orig = np.convolve(attention_orig, [0.199547, 0.200226, 0.200454, 0.200226, 0.199547], mode='same') 

I think its purpose is to keep only the most relevant columns of the attention map. Is that correct?
I'm curious how you chose those numbers :).
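Whatever the intent, convolving with five positive weights that sum to 1 is a weighted moving average, so the line smooths the attention vector along the width. Numerically, the weights agree to within about 1e-6 with a normalized 5-tap Gaussian with sigma = 21; whether that is how they were chosen is an assumption. The sketch below reproduces both observations in plain Python. gaussian_kernel and smooth_same are hypothetical helpers, and smooth_same is only meant to mimic np.convolve(..., mode='same') for odd-length kernels and signals at least as long as the kernel.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a normalized Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_same(signal, kernel):
    """Convolve with zero padding, keeping the input length (odd kernels)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            index = i + half - j          # convolution flips the kernel
            if 0 <= index < len(signal):
                acc += signal[index] * weight
        out.append(acc)
    return out


magic = [0.199547, 0.200226, 0.200454, 0.200226, 0.199547]
print([round(w, 6) for w in gaussian_kernel(2, 21.0)])
# A single attention peak gets spread over its four neighbours.
print(smooth_same([0.0, 0.0, 1.0, 0.0, 0.0], magic))
```

Because the kernel is nearly flat, the effect is close to a 5-column box blur: each attention value is spread over its neighbours instead of columns being filtered out.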


Q2:

Also, for me, the attention maps usually come after the letter. Here are the maps for 1 and 5:
image_0 image_1

Is this normal? Is it because the LSTM can only decide "at the end" of the digit which one it saw?

UnboundLocalError

@emedvedev , @brishtiteveja
Hi,
I followed the steps above. When I run aocr train datasets/training.tfrecords, I get this error:

  File "/home/shengcheng/anaconda2/lib/python2.7/site-packages/aocr/model/model.py", line 351, in train
    % (self.sess.run(self.global_step), step_time, loss, perplexity))
UnboundLocalError: local variable 'perplexity' referenced before assignment

I set perplexity=100, and then got:
2017-09-05 09:26:58.269298: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /device:GPU:0 for node 'map_1/while/foldr/while/TensorArrayReadV3/Enter' because the input edge from 'map_1/while/foldr/TensorArray' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:CPU:0

I'm a beginner. How should I change it? If I want to train on Chinese, do I also need to modify other files?
Thank you very much indeed.

AssertionError

I'm using TF 1.4.0 and am having an issue when training. When I run "aocr train datasets/training.tfrecords" I get the following error:

  File "/usr/local/bin/aocr", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/aocr/__main__.py", line 306, in main
    num_epoch=parameters.num_epoch
  File "/usr/local/lib/python2.7/dist-packages/aocr/model/model.py", line 346, in train
    for batch in s_gen.gen(self.batch_size):
  File "/usr/local/lib/python2.7/dist-packages/aocr/util/data_gen.py", line 62, in gen
    word = self.convert_lex(lex)
  File "/usr/local/lib/python2.7/dist-packages/aocr/util/data_gen.py", line 80, in convert_lex
    assert len(lex) < self.bucket_specs[-1][1]
AssertionError

Anyone have any idea what could be causing this? Do I need to downgrade my TF install?
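
The failing assertion compares the length of a label with the size of the last bucket, so one thing worth checking is whether any label in the annotation file is longer than the model expects. The sketch below scans annotation lines in the "path label" format shown in the README and lists labels above a given limit. The helper name is hypothetical, and the exact relationship between the limit and the --max-prediction option (the model may add an offset for special tokens) is an assumption.

```python
def find_long_labels(annotation_lines, max_length):
    """Return (line_number, path, label) for labels longer than max_length."""
    too_long = []
    for line_number, line in enumerate(annotation_lines, start=1):
        # Split into path and label on the first run of whitespace.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # malformed line: no path or no label
        path, label = parts
        if len(label) > max_length:
            too_long.append((line_number, path, label))
    return too_long


# Hypothetical annotation lines for illustration.
lines = [
    'datasets/images/hello.jpg hello',
    'datasets/images/long.jpg internationalization',
]
print(find_long_labels(lines, 8))
```

If long labels turn up, either remove them from the dataset or retrain with a larger maximum prediction length.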

Error assertion failed: [width must be <= target - offset]

While training, the following error occurred: assertion failed: [width must be <= target - offset]
The detailed log follows.

aocr train datasets/training.tfrecords

2017-09-14 16:57:11.201375: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-14 16:57:11.201493: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-14 16:57:11,201 root INFO loading data
2017-09-14 16:57:11,244 root INFO phase: train
2017-09-14 16:57:11,245 root INFO model_dir: checkpoints
2017-09-14 16:57:11,245 root INFO load_model: True
2017-09-14 16:57:11,245 root INFO output_dir: results
2017-09-14 16:57:11,245 root INFO steps_per_checkpoint: 100
2017-09-14 16:57:11,245 root INFO batch_size: 65
2017-09-14 16:57:11,245 root INFO num_epoch: 1000
2017-09-14 16:57:11,246 root INFO learning_rate: 1
2017-09-14 16:57:11,246 root INFO reg_val: 0
2017-09-14 16:57:11,246 root INFO max_gradient_norm: 5.000000
2017-09-14 16:57:11,246 root INFO clip_gradients: True
2017-09-14 16:57:11,246 root INFO max_image_width 160.000000
2017-09-14 16:57:11,246 root INFO max_prediction_length 8.000000
2017-09-14 16:57:11,247 root INFO target_vocab_size: 39
2017-09-14 16:57:11,247 root INFO target_embedding_size: 10.000000
2017-09-14 16:57:11,247 root INFO attn_num_hidden: 128
2017-09-14 16:57:11,247 root INFO attn_num_layers: 2
2017-09-14 16:57:11,247 root INFO visualize: False
2017-09-14 16:57:19,646 root INFO Created model with fresh parameters.
2017-09-14 16:57:22,677 root INFO Starting the training process.
2017-09-14 16:57:24.235933: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /device:GPU:0 for node 'map_1/while/foldr/while/TensorArrayReadV3/Enter' because the input edge from 'map_1/while/foldr/TensorArray' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:CPU:0
2017-09-14 16:57:24.236082: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /device:GPU:0 for node 'map_1/while/TensorArrayReadV3/Enter' because the input edge from 'map_1/TensorArray' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:CPU:0
2017-09-14 16:57:24.236550: I tensorflow/core/common_runtime/simple_placer.cc:697] Ignoring device specification /device:GPU:0 for node 'map/while/TensorArrayReadV3/Enter' because the input edge from 'map/TensorArray' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:CPU:0
Traceback (most recent call last):
File "/home/subhadeep/.local/bin/aocr", line 11, in <module>
sys.exit(main())
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/__main__.py", line 238, in main
model.train()
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/model/model.py", line 306, in train
result = self.step(batch, self.forward_only)
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/model/model.py", line 391, in step
outputs = self.sess.run(output_feed, input_feed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [width must be <= target - offset]
[[Node: map/while/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](map/while/GreaterEqual, map/while/Assert/Assert/data_0)]]

Caused by op u'map/while/Assert/Assert', defined at:
File "/home/subhadeep/.local/bin/aocr", line 11, in <module>
sys.exit(main())
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/__main__.py", line 235, in main
max_prediction_length=parameters.max_prediction,
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/model/model.py", line 125, in __init__
self.img_data = tf.map_fn(self._prepare_image, self.img_data, dtype=tf.float32)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/functional_ops.py", line 389, in map_fn
swap_memory=swap_memory)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2775, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2604, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2554, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/functional_ops.py", line 379, in compute
packed_fn_values = fn(packed_values)
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/model/model.py", line 456, in _prepare_image
padded = tf.image.pad_to_bounding_box(resized, 0, 0, self.height, self.max_width)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/image_ops_impl.py", line 472, in pad_to_bounding_box
'width must be <= target - offset')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/image_ops_impl.py", line 75, in _assert
return [control_flow_ops.Assert(cond, [msg])]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 124, in Assert
condition, data, summarize, name="Assert")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 35, in _assert
summarize=summarize, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [width must be <= target - offset]
[[Node: map/while/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](map/while/GreaterEqual, map/while/Assert/Assert/data_0)]]
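
The traceback shows the assertion coming from tf.image.pad_to_bounding_box(resized, 0, 0, self.height, self.max_width), which requires the resized image to be no wider than the target width (160 in the log above). Since the README says images are resized to height 32 while preserving aspect ratio, a rough pre-check on the dataset could look like the sketch below. The helper names and the rounding rule are assumptions, and the image sizes are made up.

```python
def resized_width(width, height, target_height=32):
    """Width after scaling to target_height with the aspect ratio kept."""
    return int(round(width * target_height / float(height)))


def find_too_wide(sizes, max_width=160, target_height=32):
    """Return (name, resized_width) for images that would exceed max_width."""
    too_wide = []
    for name, (width, height) in sizes:
        new_width = resized_width(width, height, target_height)
        if new_width > max_width:
            too_wide.append((name, new_width))
    return too_wide


# Hypothetical (width, height) pairs, e.g. collected with PIL's Image.size.
sizes = [
    ('ok.jpg', (100, 32)),
    ('wide.jpg', (400, 40)),
]
print(find_too_wide(sizes))
```

Images flagged this way would need to be cropped, scaled down, or handled by training with a larger maximum width.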

Non-deterministic results on GPU

Hi @emedvedev ,

I ran the test on the same image multiple times using the command from the README:
aocr test ./datasets/testing.tfrecords

Every time I run the command, I get the same predicted word, but the inference probability and the loss change between runs.

Run1:
Step 1 (1.096s). Accuracy: 100.00%, loss: 0.000364, perplexity: 1.00036, probability: 93.33% 100%

Run2:
Step 1 (0.988s). Accuracy: 100.00%, loss: 0.000260, perplexity: 1.00026, probability: 92.58% 100%

I've observed the same behavior with a frozen checkpoint as well (the probabilities change for the same image). Is there a reason for this? I would expect identical results for identical input. Please let me know how to fix it.

aocr test does not work for real picture?

I used 80% of http://www.cs.cmu.edu/~yuntiand/sample.tgz for training and the rest for testing.

I trained a model (model.ckpt-9000.data-00000-of-00001, model.ckpt-9000.index, model.ckpt-9000.meta) and ran aocr test on the held-out part of the sample data.

The result was great, so I decided to run aocr test on my own picture.

I took a picture of a word ('months'), resized it to 187 x 31, and set the background color to match the picture in Adobe Photoshop. (I set the background color because aocr test did not work on the image when I had only resized it, so I tried this as well.)

Anyway, it did not work at all. I ran aocr dataset to create the testing.tfrecords file and then ran the commands below. No results folder is created and no step output is printed.

C:\Users\kimduknam>aocr dataset ./datasets/annotations-testing.txt ./datasets/testing.tfrecords
2018-02-07 10:37:33.179317: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-07 10:37:33,182 root INFO Building a dataset from ./datasets/annotations-testing.txt.
2018-02-07 10:37:33,182 root INFO Output file: ./datasets/testing.tfrecords
2018-02-07 10:37:33,182 root INFO Processed 1 pairs.
2018-02-07 10:37:33,182 root INFO Dataset is ready: 1 pairs.
2018-02-07 10:37:33,182 root INFO Longest label (6): MONTHS

C:\Users\kimduknam>aocr test --max-prediction 30 --visualize ./datasets/testing.tfrecords
2018-02-07 10:37:39.292842: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-07 10:37:39,294 root INFO phase: test
2018-02-07 10:37:39,294 root INFO model_dir: ./checkpoints
2018-02-07 10:37:39,294 root INFO load_model: True
2018-02-07 10:37:39,294 root INFO output_dir: ./results
2018-02-07 10:37:39,294 root INFO steps_per_checkpoint: 0
2018-02-07 10:37:39,294 root INFO batch_size: 1
2018-02-07 10:37:39,294 root INFO learning_rate: 1
2018-02-07 10:37:39,294 root INFO reg_val: 0
2018-02-07 10:37:39,294 root INFO max_gradient_norm: 5.000000
2018-02-07 10:37:39,294 root INFO clip_gradients: True
2018-02-07 10:37:39,294 root INFO max_image_width 160.000000
2018-02-07 10:37:39,294 root INFO max_prediction_length 30.000000
2018-02-07 10:37:39,294 root INFO channels: 1
2018-02-07 10:37:39,294 root INFO target_embedding_size: 10.000000
2018-02-07 10:37:39,294 root INFO attn_num_hidden: 128
2018-02-07 10:37:39,294 root INFO attn_num_layers: 2
2018-02-07 10:37:39,294 root INFO visualize: True
2018-02-07 10:37:42,850 root INFO Reading model parameters from ./checkpoints\model.ckpt-9000

Recognize arbitrary length string

Hi,
I followed the steps and successfully trained a model to recognize the characters 0 to 9.
Currently it recognizes images containing a single character, but not strings of arbitrary length.
Please suggest a way to make the OCR recognize strings of arbitrary length and output the characters in the order they appear in the image.

TypeError: descriptor 'encode' requires a 'unicode' object but received a 'str'

Hi,

I am getting the "TypeError: descriptor 'encode' requires a 'unicode' object but received a 'str'" when I build a TFRecords dataset by running the "aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords" command.

Following is the error traceback:

2017-10-09 23:49:00.678817: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-09 23:49:00.678867: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-09 23:49:00.678880: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-09 23:49:00,680 root INFO Building a dataset from ./datasets/annotations-training.txt.
2017-10-09 23:49:00,680 root INFO Output file: ./datasets/training.tfrecords
Traceback (most recent call last):
File "/usr/local/bin/aocr", line 11, in <module>
load_entry_point('aocr==0.3.0', 'console_scripts', 'aocr')()
File "/usr/local/lib/python2.7/dist-packages/aocr/__main__.py", line 209, in main
dataset.generate(parameters.annotations_path, parameters.output_path, parameters.log_step)
File "/usr/local/lib/python2.7/dist-packages/aocr/util/dataset.py", line 29, in generate
'label': _bytes_feature(text_type.encode(label))}))
TypeError: descriptor 'encode' requires a 'unicode' object but received a 'str'
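
The message indicates that the unbound method text_type.encode (unicode.encode on Python 2) was called with a byte string. A minimal sketch of a conversion that accepts both byte strings and text is shown below. The helper name is hypothetical and this is not the project's actual fix; it only illustrates the type handling.

```python
def label_to_bytes(label, encoding='utf-8'):
    """Return the label as bytes, whether it arrives as bytes or as text."""
    if isinstance(label, bytes):
        # On Python 2 a plain 'str' is already bytes, so pass it through.
        return label
    # Text (unicode on Python 2, str on Python 3) is encoded explicitly.
    return label.encode(encoding)


print(label_to_bytes('hello'))
print(label_to_bytes(b'world'))
```

Calling the method on the value itself, after the type check, avoids passing an object of the wrong type to an unbound method.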

ERROR missing filename or label

This is on Windows 10.
I tried many files, but the error below occurred each time. With only one file, the result was the same.

I also tried 'aocr dataset datasets/annotations-training.txt datasets/training.tfrecords', but the result was the same. (C:\Users\kimduknam\datasets\images contains jungle.jpg.)

The 'annotations-training.txt' file contains:
./datasets/images/jungle.jpg jungle

C:\Users\kimduknam>aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords
2018-02-05 17:39:28.312678: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-02-05 17:39:28,310 root INFO Building a dataset from ./datasets/annotations-training.txt.
2018-02-05 17:39:28,310 root INFO Output file: ./datasets/training.tfrecords
2018-02-05 17:39:28,310 root ERROR missing filename or label, ignoring line 1: ./datasets/images/jungle.jpg jungle
2018-02-05 17:39:28,310 root INFO Dataset is ready: 1 pairs.
2018-02-05 17:39:28,310 root INFO Longest label (0):

AttributeError: module 'tensorflow.python.ops.rnn_cell_impl' has no attribute '_linear'

C:\Users\username>aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords
Traceback (most recent call last):
File "C:\Python\Scripts\aocr-script.py", line 11, in <module>
load_entry_point('aocr==0.6.0', 'console_scripts', 'aocr')()
File "C:\Python\lib\site-packages\pkg_resources\__init__.py", line 572, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "C:\Python\lib\site-packages\pkg_resources\__init__.py", line 2755, in load_entry_point
return ep.load()
File "C:\Python\lib\site-packages\pkg_resources\__init__.py", line 2408, in load
return self.resolve()
File "C:\Python\lib\site-packages\pkg_resources\__init__.py", line 2414, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "C:\Python\lib\site-packages\aocr-0.6.0-py3.6.egg\aocr\__main__.py", line 15, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "C:\Python\lib\site-packages\aocr-0.6.0-py3.6.egg\aocr\model\model.py", line 20, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "C:\Python\lib\site-packages\aocr-0.6.0-py3.6.egg\aocr\model\seq2seq_model.py", line 27, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "C:\Python\lib\site-packages\aocr-0.6.0-py3.6.egg\aocr\model\seq2seq.py", line 79, in <module>
AttributeError: module 'tensorflow.python.ops.rnn_cell_impl' has no attribute '_linear'

What should I do?

Error when Building TFRecords dataset.

I'm using TensorFlow 1.4r, and when I attempt to build TFRecords I get the following error:

File "/usr/local/lib/python2.7/dist-packages/aocr/model/seq2seq.py", line 74, in <module>
linear = rnn_cell._linear # pylint: disable=protected-access
AttributeError: 'module' object has no attribute '_linear'

I've tried editing seq2seq.py, changing line 71 from

from tensorflow.contrib.rnn.python.ops import rnn, rnn_cell

to

from tensorflow.contrib.rnn.python.ops import rnn
from tensorflow.python.ops import rnn_cell

It didn't work, though. Is there a solution to this error?

Poor results using exported model in some cases

I'm wondering if anyone else is seeing this.

I have a model trained with 20,000 synthesized images of between 1 and 50 characters, including spaces. Using the 'test' function, I'm getting good results--test images that have short text are usually 100%, and longer ones are usually off by just a character or two. So far so good.

I used the export function and ran the tensorflow_model_server, and then used a little python client to connect to it. With the same test images that I know the model can predict well, I'm seeing terrible results--the first 2-6 characters are usually right, but then almost complete gibberish.

Is anyone else using TensorFlow Serving, and if so, can you report how well it's working? If you're not using it, what are you doing instead? I figure I'll just make a little python wrapper around the "test" function (essentially) that I can communicate with from the rest of my system, but I'd prefer to use TensorFlow Serving if possible because it's one less thing for me to worry about.

error running aocr

After installing aocr (pip install aocr) both on my OS and in the official TensorFlow Docker container (gcr.io/tensorflow/tensorflow), I get this error even when I run aocr --help:

Traceback (most recent call last):
  File "/usr/local/bin/aocr", line 7, in <module>
    from aocr.__main__ import main
  File "/usr/local/lib/python2.7/dist-packages/aocr/__main__.py", line 15, in <module>
    from .model.model import Model
  File "/usr/local/lib/python2.7/dist-packages/aocr/model/model.py", line 20, in <module>
    from .seq2seq_model import Seq2SeqModel
  File "/usr/local/lib/python2.7/dist-packages/aocr/model/seq2seq_model.py", line 27, in <module>
    from .seq2seq import model_with_buckets
  File "/usr/local/lib/python2.7/dist-packages/aocr/model/seq2seq.py", line 74, in <module>
    linear = rnn_cell._linear  # pylint: disable=protected-access
AttributeError: 'module' object has no attribute '_linear'
# uname -a
Linux 71e18aa0cfc4 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

how to use this module?

How can I use this package in my code?
When I run aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords, it says "command not found".
What should I do after pip install aocr?
Is there a tutorial available?
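
A "command not found" message after pip install usually means the directory where pip placed the aocr script is not on PATH. The sketch below checks for the script and otherwise falls back to running the package as a module with python -m. The fallback relies on the package shipping a __main__.py, which the tracebacks elsewhere on this page suggest (aocr/__main__.py) but which is not verified here for every version. The helper name is hypothetical.

```python
import shutil
import sys


def cli_command(name='aocr'):
    """Return an argv prefix that should start the command-line tool."""
    path = shutil.which(name)
    if path is not None:
        return [path]
    # Not on PATH: run the package as a module with the current interpreter.
    return [sys.executable, '-m', name]


print(cli_command())
```

The returned list can be passed to subprocess.run together with the usual arguments, for example the dataset subcommand and the two paths.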

Unable to export model

While exporting the model with the command
aocr export exported-model
the following error is shown:

subhadeep@sd-vm:~/Desktop/tensorflow_ocr/new_ocr$ aocr export exported-model
2017-08-31 18:06:08.565366: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-31 18:06:08.565580: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "/home/subhadeep/.local/bin/aocr", line 11, in <module>
sys.exit(main())
File "/home/subhadeep/.local/lib/python2.7/site-packages/aocr/__main__.py", line 215, in main
data_path=parameters.dataset_path,
AttributeError: 'Namespace' object has no attribute 'dataset_path'
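
This kind of error is typical when an argparse option is defined only for some subcommands but read unconditionally afterwards. The sketch below reproduces the pattern with a toy parser. The parser layout and argument names are assumptions for illustration and are not copied from the aocr source.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='demo')
    subparsers = parser.add_subparsers(dest='phase')
    # 'dataset_path' exists only for the train subcommand.
    train = subparsers.add_parser('train')
    train.add_argument('dataset_path')
    # The export subcommand takes a different positional argument.
    export = subparsers.add_parser('export')
    export.add_argument('export_path')
    return parser


args = build_parser().parse_args(['export', 'exported-model'])

# Reading args.dataset_path directly would raise AttributeError here,
# because the chosen subcommand never defined it.
dataset_path = getattr(args, 'dataset_path', None)
print(args.phase, args.export_path, dataset_path)
```

Reading optional attributes with getattr and a default, or branching on the selected subcommand first, avoids the error.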
