holmeyoung / crnn-pytorch

PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for all-language OCR.

License: MIT License

Python 93.56% Lua 6.44%
cnn rnn ctc-loss ocr

crnn-pytorch's People

Contributors

holmeyoung, mineshmathew


crnn-pytorch's Issues

How to recognize blanks, and recognize English and Chinese in one model

Firstly, your code is great. I trained with the SynthText90k dataset and achieved very good performance on English words.

I have several questions; hopefully you can give me a hand. Thanks for your time.

  1. How to recognize a blank within a sentence?
    For example, I want to recognize "I love python". There is a blank between "I" and "love". How should this be handled?
    Do I just add a blank to the alphabet, like this, and prepare the training data accordingly?
    alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ """

  2. Can we recognize English and Chinese in one model?
    If we want to recognize English and Chinese in one model, how do we do it? Just make the alphabet contain all English and Chinese characters, like this?
    alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ是不我一有大在人了中到資..."""

  3. What if we want to recognize very long sentences?
    Do you think it would be better to train with very long sentences, or can we just train with short sentences? Because the current model only supports text lengths of less than 26, the network has to be modified to support training with long sentences.

Why was crnn.zero_grad() removed in training?

cost = criterion(preds, text, preds_size, length) / batch_size
# crnn.zero_grad()
cost.backward()

Why did you remove crnn.zero_grad()? We need to set the gradients to zero before starting backpropagation, because PyTorch accumulates gradients across successive backward calls. So crnn.zero_grad() seems necessary.
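For reference, a minimal sketch of the usual pattern (whether crnn.zero_grad() is redundant here depends on whether optimizer.zero_grad() is already called on the same parameters elsewhere in the training loop; this is an illustration, not the repo's exact code):

# Zero the gradients once per iteration, either on the model or on the optimizer
# that holds the model's parameters -- both clear the same .grad buffers.
optimizer.zero_grad()                 # or crnn.zero_grad()
cost = criterion(preds, text, preds_size, length) / batch_size
cost.backward()
optimizer.step()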

Test images of different lengths, but the predicted character count is always fixed

During training I give the images a fixed width, imgW = 160; each image contains 4 characters including a blank (so the label is 4 characters), or 5 characters with no blank (so the label is 5 characters).
Why is it that when I take the trained model and feed it images of a different length, say a width of about 300 containing roughly 10 characters, the prediction still comes out as 4-5 characters (all wrong, naturally)?

  1. At prediction time I scale the image proportionally: the height is fixed at 32 and the width is scaled to keep the ratio.
  2. During training and prediction I noticed the raw output looks like:
    零健里改狹------------------------------------ => 零健里改狹 , gt: 腎健員步狹
    rather than the usual:
    零-----健---里---改---------------------狹 ----=> 零健里改狹 , gt: 腎健員步狹

Could the problem be here? Or is it related to keep_ratio = True?

Loss turns into 'nan' when cuda is True

Hi,

I have used train.py many times and had no issues. However, now when I use train.py, the loss is always nan if cuda is True. I think the problem is on my laptop, so any idea how to solve this issue?
Thanks

Number of epochs

Do we have to add more epochs so it's able to recognize better in the demo phase?

I reached 260 epochs and it gave good results, but as the number of epochs increases it deviates from the right result and makes wrong guesses. After that it gets better, and later it deviates again.

Does it have to reach 1000 epochs so that it gives the best results and never guesses wrong?

What do you think?

P.S.: the program is good. It reached 95% accuracy, but when I want it to learn on noisy images it sometimes takes a while to guess well. I shall be patient as you said ^_^

Why use Variable to wrap a Tensor?

Hello, thanks for your solution, but I'm a little confused about Variable. For example:

preds_size = Variable(torch.LongTensor([preds.size(0)] * batch_size))

I think PyTorch has deprecated Variable in versions greater than 1.0.0.
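For reference, a hedged sketch of the same line without the Variable wrapper (since PyTorch 0.4, tensors carry autograd state themselves, so Variable is only a no-op wrapper):

preds_size = torch.LongTensor([preds.size(0)] * batch_size)   # no Variable needed on modern PyTorch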

File name of training images

Hi,

Is it okay to name the image files as follows:
image: /Volumes/EXTERNAL/Models/crnn_chinese_characters_rec-master/to_lmdb/train_images/almofdal_11.jpg
label: المفضل

I wrote the file name of the image in English, and I wrote the label of the image in the .txt data file in Arabic. I was able to convert the files to lmdb, but when I train the model, it does not print the epoch number and loss. It just shows the info below for a few minutes and then stops.
CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu0): ReLU(inplace)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(inplace)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu3): ReLU(inplace)
(pooling2): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu5): ReLU(inplace)
(pooling3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace)
)
(rnn): Sequential(
(0): BidirectionalLSTM(
(rnn): LSTM(512, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=256, bias=True)
)
(1): BidirectionalLSTM(
(rnn): LSTM(256, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=71, bias=True)
)
)
)

Dataset image size question

Are the images you trained on all the same size? If the images in my dataset have different sizes, should I first pad them while keeping the aspect ratio and then resize them to the same size (32*100)?
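For what it's worth, a hedged sketch of one way to keep the aspect ratio and pad onto a 100x32 canvas before building the lmdb (this is an assumption about preprocessing, not the repo's own pipeline; the helper name and fill value are made up):

from PIL import Image

def resize_keep_ratio(path, target_w=100, target_h=32, fill=255):
    img = Image.open(path).convert('L')                     # grayscale, matching the 1-channel CRNN input
    w, h = img.size
    new_w = min(target_w, max(1, int(w * target_h / h)))    # scale the width by the height ratio
    img = img.resize((new_w, target_h), Image.BILINEAR)
    canvas = Image.new('L', (target_w, target_h), fill)     # pad the remaining width with background
    canvas.paste(img, (0, 0))
    return canvas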

Best tuning for certain applications

I have a dataset composed of English, Japanese and Korean characters (3340 characters in total because of the Japanese kanji).

I can't seem to find the right parameters for such a problem; the accuracy is mostly 0.000.

I tried lr = 0.0001, 900 epochs and a batch size of 2.
However, the accuracy is still not very good.

I'm wondering: when there is a large number of classes, what's the best way to train the model and adjust the parameters? Do we take it easy and use small values?

lmdb.Error when creating the dataset

Traceback (most recent call last):
  File "/crnn-pytorch-master/tool/create_dataset.py", line 135, in <module>
    createDataset(args.out, image_path_list, label_list)
  File "/crnn-pytorch-master/tool/create_dataset.py", line 55, in createDataset
    env = lmdb.open(outputPath, map_size=1099511627776)
lmdb.Error: ~/fake/lmdb: (garbled non-ASCII system error message)
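The garbled message after the path is most likely a localized OS error about disk space: lmdb.open reserves map_size bytes up front (1 TB here), which commonly fails on Windows or on small disks. A hedged sketch with a smaller map_size (the 8 GB figure is an arbitrary example, not the repo's recommended value):

import lmdb

outputPath = './lmdb_out'                              # example path
env = lmdb.open(outputPath, map_size=8 * 1024 ** 3)    # reserve ~8 GB instead of 1 TB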

Incorrect number of classes

Hey, thanks for sharing the code, but I found a possible issue while training the network. While editing the characters in the alphabet.py file, I followed the guide and replaced the Chinese characters with English ones, and my network trained fine. But while reading and debugging the code, I found that the nClass output dimension of the CRNN was 72 while the number of unique characters in alphabet.py was only 36. I eventually realized that the code splits the characters incorrectly and also counts the \n newline as a character, which is why the output dimension was [26x1x72] instead of [26x1x37]; this can cause an issue in training. I can raise a PR fixing this if you want. Thanks.
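A hedged illustration of the symptom: when the alphabet is a multi-line triple-quoted string, every embedded newline becomes an extra class (the snippet below is a standalone example, not the repo's loader):

alphabet = """0123456789abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ"""                       # triple-quoted strings keep the embedded '\n'

clean = alphabet.replace('\n', '').replace('\r', '')
print(len(alphabet) - len(clean))                    # number of phantom "newline classes"
nclass = len(clean) + 1                              # +1 for the CTC blank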

crnn acc = 0

Hi, sorry for bothering you, but I'm facing a problem.
All my accuracy values during validation are 0, which is really weird.
Have you ever faced this problem during training?
I just used your pre-trained model with only 3000 training pictures, because I only want a quick check of whether this model works for me. If it does, I will add more data for my own training.
I tried both lr = 0.00005 and 0.0001.
Image size = 120*32 with four Chinese characters.
keep_ratio = False

Why is dealwith_lossnan = False by default?

@Holmeyoung
dealwith_lossnan = False # whether to replace all nan/inf in gradients to zero
In params.py, why do you set dealwith_lossnan = False?
To handle the problem "Just don't know why, but when i train the net, the loss always become nan after several epoch.", shouldn't dealwith_lossnan be set to True?
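A minimal sketch of what "replace all nan/inf in gradients with zero" can look like when enabled, applied after cost.backward() and before optimizer.step() (an illustration of the idea, not necessarily the repo's exact implementation):

import torch

def zero_bad_grads(model):
    # Replace nan/inf entries in every parameter gradient with 0 so optimizer.step() stays finite.
    for p in model.parameters():
        if p.grad is not None:
            bad = torch.isnan(p.grad) | torch.isinf(p.grad)
            p.grad[bad] = 0.0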

When I activate 4 GPUs at the same time, I get the following error

Hello, I'm using an AWS instance with 4 GPUs, and when multi-GPU is activated (in the params.py file: True for multigpu and 4 for the number of GPUs) I get the following error (P.S.: for 4, 3, 2, and even 1, which is incomprehensible even for 1):

CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu0): ReLU(inplace=True)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(inplace=True)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace=True)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu3): ReLU(inplace=True)
(pooling2): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace=True)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu5): ReLU(inplace=True)
(pooling3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace=True)
)
(rnn): Sequential(
(0): BidirectionalLSTM(
(rnn): LSTM(512, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=256, bias=True)
)
(1): BidirectionalLSTM(
(rnn): LSTM(256, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=7116, bias=True)
)
)
)
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
(the warning above is printed six times)
Traceback (most recent call last):
  File "train.py", line 253, in <module>
    cost = train(crnn, criterion, optimizer, train_iter)
  File "train.py", line 241, in train
    cost = criterion(preds, text, preds_size, length) / batch_size
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1295, in forward
    self.zero_infinity)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1767, in ctc_loss
    zero_infinity)
RuntimeError: input_lengths must be of size batch_size
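One possible cause (an assumption, not a confirmed diagnosis): preds_size is built from the configured batch_size while the actual batch, e.g. the last one of an epoch or a per-GPU slice under DataParallel, is smaller, so its length no longer matches the batch dimension of preds. A sketch that derives both numbers from the tensors themselves, reusing the training-loop variables shown in the traceback:

batch_size = cpu_images.size(0)                          # actual number of images in this batch
preds = crnn(image)                                      # preds: (T, batch_size, nclass)
preds_size = torch.LongTensor([preds.size(0)] * batch_size)
cost = criterion(preds, text, preds_size, length) / batch_size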

The loss was very low at the beginning

Firstly, thank you for your code.
I have 20,000 images for training and 2,000 for testing (all generated by code). When I started training, the loss was always very small (e.g. 0.09). Can you tell me how to deal with this?

How do we add more classes to the pretrained model?

I wanted to use your pretrained model and add some more characters to it.

But I get a size mismatch error because the number of classes is not the same.

How am I supposed to change the last layer? I want to add more classes to the pretrained model (state).
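A hedged sketch of one common workaround (not the author's confirmed method): build the new CRNN with the larger class count, drop the final classifier weights from the checkpoint, and load the rest, so only the last layer starts from a fresh initialization.

import torch

state = torch.load('crnn.pth', map_location='cpu')
# Drop the final embedding (classifier) weights, whose shape depends on the old nclass.
state = {k: v for k, v in state.items() if not k.startswith('rnn.1.embedding')}
crnn.load_state_dict(state, strict=False)    # everything except the new last layer is restored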

About the loss becoming NaN after a few training iterations

Hi, when I trained with PyTorch's built-in CTC_loss I kept getting NaN; after switching this CTC_loss to warp-ctc it was instantly fine. I wasted two days on this, very frustrating. This is the second time PyTorch has burned me. Everyone, please don't fall into the same pit.

loss is inf

Why is my loss always like the following?
| [806/1000][400/410] Loss: inf
0|train | [807/1000][100/410] Loss: inf
0|train | [807/1000][200/410] Loss: inf
0|train | [807/1000][300/410] Loss: inf
0|train | [807/1000][400/410] Loss: inf
0|train | [808/1000][100/410] Loss: inf
0|train | [808/1000][200/410] Loss: inf
0|train | [808/1000][300/410] Loss: inf
0|train | [808/1000][400/410] Loss: inf
0|train | [809/1000][100/410] Loss: inf
0|train | [809/1000][200/410] Loss: inf
0|train | [809/1000][300/410] Loss: inf
0|train | [809/1000][400/410] Loss: inf
0|train | [810/1000][100/410] Loss: inf
0|train | [810/1000][200/410] Loss: inf

Error when encoding cpu_texts with a custom dataset

Hi Holmeyoung,
I get this error when running train.py with a custom dataset:
(screenshot: Annotation 2019-07-22 062747)

I tried text = b''.join(text) and it turned into another problem:
(screenshot: Annotation 2019-07-22 063226)

My question is: what is the proper type of cpu_texts (tuple of str or tuple of bytes)?
I think my custom lmdb dataset might be the problem, because cpu_images, cpu_texts = data returns a tuple of bytes.
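A minimal sketch, assuming the labels come out of the lmdb dataset as bytes: decode them to str before passing them to converter.encode (the decode step is an assumption about the custom dataset, not the repo's code):

# lmdb stores values as bytes; the label converter expects str.
cpu_texts = tuple(t.decode('utf-8') if isinstance(t, bytes) else t for t in cpu_texts)
t, l = converter.encode(cpu_texts)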

When loading a pretrained model with multigpu True, I get the following error

loading pretrained model from netCRNN_0_9000_1.pth
Traceback (most recent call last):
  File "train.py", line 91, in <module>
    crnn = net_init()
  File "train.py", line 88, in net_init
    crnn.load_state_dict(torch.load(params.pretrained))
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.cnn.conv0.weight", "module.cnn.conv0.bias", "module.cnn.conv1.weight", "module.cnn.conv1.bias", "module.cnn.conv2.weight", "module.cnn.conv2.bias", "module.cnn.batchnorm2.weight", "module.cnn.batchnorm2.bias", "module.cnn.batchnorm2.running_mean", "module.cnn.batchnorm2.running_var", "module.cnn.conv3.weight", "module.cnn.conv3.bias", "module.cnn.conv4.weight", "module.cnn.conv4.bias", "module.cnn.batchnorm4.weight", "module.cnn.batchnorm4.bias", "module.cnn.batchnorm4.running_mean", "module.cnn.batchnorm4.running_var", "module.cnn.conv5.weight", "module.cnn.conv5.bias", "module.cnn.conv6.weight", "module.cnn.conv6.bias", "module.cnn.batchnorm6.weight", "module.cnn.batchnorm6.bias", "module.cnn.batchnorm6.running_mean", "module.cnn.batchnorm6.running_var", "module.rnn.0.rnn.weight_ih_l0", "module.rnn.0.rnn.weight_hh_l0", "module.rnn.0.rnn.bias_ih_l0", "module.rnn.0.rnn.bias_hh_l0", "module.rnn.0.rnn.weight_ih_l0_reverse", "module.rnn.0.rnn.weight_hh_l0_reverse", "module.rnn.0.rnn.bias_ih_l0_reverse", "module.rnn.0.rnn.bias_hh_l0_reverse", "module.rnn.0.embedding.weight", "module.rnn.0.embedding.bias", "module.rnn.1.rnn.weight_ih_l0", "module.rnn.1.rnn.weight_hh_l0", "module.rnn.1.rnn.bias_ih_l0", "module.rnn.1.rnn.bias_hh_l0", "module.rnn.1.rnn.weight_ih_l0_reverse", "module.rnn.1.rnn.weight_hh_l0_reverse", "module.rnn.1.rnn.bias_ih_l0_reverse", "module.rnn.1.rnn.bias_hh_l0_reverse", "module.rnn.1.embedding.weight", "module.rnn.1.embedding.bias".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked", "rnn.0.rnn.weight_ih_l0", "rnn.0.rnn.weight_hh_l0", "rnn.0.rnn.bias_ih_l0", "rnn.0.rnn.bias_hh_l0", "rnn.0.rnn.weight_ih_l0_reverse", "rnn.0.rnn.weight_hh_l0_reverse", "rnn.0.rnn.bias_ih_l0_reverse", "rnn.0.rnn.bias_hh_l0_reverse", "rnn.0.embedding.weight", "rnn.0.embedding.bias", "rnn.1.rnn.weight_ih_l0", "rnn.1.rnn.weight_hh_l0", "rnn.1.rnn.bias_ih_l0", "rnn.1.rnn.bias_hh_l0", "rnn.1.rnn.weight_ih_l0_reverse", "rnn.1.rnn.weight_hh_l0_reverse", "rnn.1.rnn.bias_ih_l0_reverse", "rnn.1.rnn.bias_hh_l0_reverse", "rnn.1.embedding.weight", "rnn.1.embedding.bias".
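The missing and unexpected keys differ only by the "module." prefix: the checkpoint was saved from a plain CRNN, but it is being loaded into a DataParallel wrapper. A hedged sketch of one common remedy (load before wrapping, or add the prefix to the checkpoint keys as below):

import torch
from collections import OrderedDict

state = torch.load(params.pretrained, map_location='cpu')
# Add the 'module.' prefix that a DataParallel-wrapped model expects, if it is missing.
if not any(k.startswith('module.') for k in state):
    state = OrderedDict(('module.' + k, v) for k, v in state.items())
crnn.load_state_dict(state)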

Maybe the code itself supports training with text length > 26

@Holmeyoung
In #17 you mentioned that your code only supports training with text length <= 26. I found that:

(1) when the images are resized to 100x32, the length of the raw character output is 26, so we cannot train with text length > 26.

(2) when keep_ratio = True, only the height of the image is resized to 32; the width is not fixed and varies across images, so the length of the raw character output is not fixed and depends on the width of the image. Maybe we can train with any text length.

Conclusion: we can train with any text length when we set keep_ratio = True during training.

Thank you so much.
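For reference, a quick sketch of how the number of output time steps follows from the image width, using the layer parameters printed elsewhere on this page (two stride-2 poolings, two poolings with width stride 1 and padding 1, and a final 2x2 convolution); for a width of 100 it gives exactly 26:

def output_steps(w):
    # Width through the CNN: pooling0 and pooling1 halve it; pooling2 and pooling3
    # (kernel 2, width stride 1, width padding 1) each add 1; conv6 (kernel 2, no padding) subtracts 1.
    w = w // 2      # pooling0
    w = w // 2      # pooling1
    w = w + 1       # pooling2
    w = w + 1       # pooling3
    w = w - 1       # conv6
    return w

print(output_steps(100))   # 26
print(output_steps(160))   # 41 -- wider images give more time steps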

Fine-tune for longer text > 26

Hello, long time no see :))

I would like to ask how to fine-tune the model (change the CRNN inputs) for text longer than 26 characters, or for text with two or more lines?

Training the model using Arabic characters

Hi Holmeyoung,

I have built my own dataset, which consists of Arabic characters. I followed your steps for building the dataset and was able to convert it to lmdb successfully. However, when I tried to train the model, I got this error:
CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu0): ReLU(inplace)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(inplace)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu3): ReLU(inplace)
(pooling2): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu5): ReLU(inplace)
(pooling3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace)
)
(rnn): Sequential(
(0): BidirectionalLSTM(
(rnn): LSTM(512, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=256, bias=True)
)
(1): BidirectionalLSTM(
(rnn): LSTM(256, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=69, bias=True)
)
)
)
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    cost = trainBatch(crnn, criterion, optimizer)
  File "train.py", line 154, in trainBatch
    t, l = converter.encode(cpu_texts)
  File "/Volumes/EXTERNAL/Models/crnn_pytorch-master/utils.py", line 61, in encode
    index = self.dict[char]
KeyError: '٩'

It seems like the model can't recognize Arabic characters. Any suggestions?

Thanks
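The KeyError means the character '٩' (the Arabic-Indic digit nine) is simply missing from the alphabet the converter was built with. A hedged sketch of building the alphabet directly from the training labels so every character is covered (the label-file layout assumed below is an example, not necessarily this dataset's format):

# Collect every character that appears in the labels and use the result as the alphabet,
# so converter.encode never hits a KeyError.
chars = set()
with open('train_labels.txt', encoding='utf-8') as f:
    for line in f:
        _, label = line.rstrip('\n').split(maxsplit=1)   # assumed "image_path label" format
        chars.update(label)
alphabet = ''.join(sorted(chars))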

Pretrained model inaccessible

Hello,

I hope you're doing well.

Is it okay if you put your pretrained model "crnn.pth" on Google Drive?

Because pan.baidu demands a Chinese phone number, which is inaccessible in some countries.
^_^
Thank you.

Finding the best parameters for training

I have 304 training samples and 60 validation samples. I have been trying to train the model, but the accuracy started at 0.008333 and stays at 0.000000. What would be the best parameters for my data?
(screenshot)

Test accuracy

Hi,

I have 70M training samples and 1M validation samples. The test loss is decreasing and the accuracy has reached 0.83, but it never exceeds 0.83. I am now at epoch 55; should I wait, or will it never get better?

Training with variable-length images and text

I have two questions:

  1. Does your code currently support training with variable text lengths?

  2. Does "keep_ratio = True" work for training? If I want to train the model with variable-length images, do I also have to modify create_dataset.py?

On this site (https://github.com/meijieru/crnn.pytorch) the author mentions that "If you want to train with variable length images (keep the origin ratio for example), please modify the tool/create_dataset.py and sort the image according to the text length".

Thanks a lot.

Cuda gpu

Hello,

I have 13 GB of RAM. I activated cuda in the params.py file, but training and testing are still slow compared to the capacity of my machine.

It's supposed to run quickly, I mean.

I'm wondering whether cuda is actually being used.
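A quick, hedged way to check whether the GPU is really in play (assuming crnn is the constructed model; RAM size by itself says nothing about CUDA):

import torch

print(torch.cuda.is_available())           # False means training silently falls back to the CPU
print(next(crnn.parameters()).device)      # should print 'cuda:0' if the model was actually moved to the GPU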

Create illegal symbol in File mode

As you said, there are limitations (illegal characters) when using Folder mode. But in File mode these characters are also not allowed in file names, so how does File mode solve this problem?
Thank you.

Training accuracy

Hi Holmeyoung,

I have 20,000 training samples and 30 characters. I have been trying to train the model, but the accuracy does not add up. How should I set the parameters?

Accuracy and epochs

Hello, long time no see :D

I want to ask (I probably asked the same question before, but I forgot the answer, sorry ^^" lol):

When I train the model (on about 7,000+ Japanese and English characters, with 10M training samples and 1M test samples),

the accuracy gets high (about 50% while still in epoch 0, say after 5k images into the epoch) and the loss is low (0.03) and still decreasing.
However, when I give it a real-life image (of the same kind as the test samples), it makes grave guesses (lol).

What do you think is the problem? Should I kill the process, I mean? Or wait for the epochs to finish?

Methodology for training

Hello,
Sorry for asking so many questions 😅
I was thinking of a way to help the program grasp all the characters.

What if we start by teaching it a small set of characters and then add the others little by little?

For example, we give it a dataset of 50 characters only and see how it performs, and then next time we add a dataset of new ones and see if it's able to differentiate the features.

Do you think an approach like this is fine? Or is it better to give it everything from the beginning?

CTC outputs only blanks (0) after a few training iterations

Hi, when I train a cnn-rnn-ctc network with torch.nn.CTCLoss, after some iterations the loss gets smaller but the network's predictions are all the blank character 0:

(screenshot)

When I try lowering the lr and training for longer, the network still outputs only 0. What could be the cause of this problem?

Batch: N == 2
Seq: T == 9
Input lengths: all 9
Target lengths: all 4

create_dataset.py

I am trying to use MJSynth 90K, which contains millions of images. However, when I try to create the dataset, only about 500K images are created. Is there a way to increase this number?

Demo phase

After training the model and all, I would like to ask about giving it a new image to recognize.

Can it recognize any composition of characters? I mean, even text the model has never seen (been trained on) before?

For example:
We trained the model only on something like this:

AAA
BBB
CCC

And in the demo phase we give it an image that contains ABC in it: will it be able to recognize it, or can it only recognize something it has seen before?

Does the order of the characters in the alphabet matter?

I ran into something very strange: with exactly the same alphabet content, placing the digits before versus after the Chinese characters gives completely different results. When the digits are placed after the Chinese characters and I train, the digit recognition rate is 0; when the digits are placed before the Chinese characters, the digits can be recognized. Could someone explain this? @Holmeyoung

Assertion error arises

Hi sir, I have created the lmdb dataset from the train and val txt files and got data.mdb and lock.mdb for both, but an assertion error is raised at the assert train line. Please help me out with this.

warp-ctc and torch.nn.CTCLoss

I'd like to ask: with warp-ctc, is it fine to feed the RNN output in directly? But nn.CTCLoss needs a softmax layer before the input goes into CTC, and I see your code doesn't add a softmax. Is that OK? Also, it seems the official PyTorch CTCLoss has been updated.
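For reference: torch.nn.CTCLoss expects log-probabilities, so the usual pattern is to apply log_softmax to the network output before the loss (warp-ctc, by contrast, applies the softmax internally). A minimal sketch, reusing the training-loop names that appear elsewhere on this page:

import torch
import torch.nn.functional as F

criterion = torch.nn.CTCLoss(zero_infinity=True)         # zero_infinity is optional

preds = crnn(image)                                      # (T, N, nclass) raw scores from the RNN head
log_probs = F.log_softmax(preds, dim=2)                  # nn.CTCLoss wants log-probabilities
cost = criterion(log_probs, text, preds_size, length) / batch_size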

input to train.py

I want to train with my own dataset. First I used create_dataset.py and it created two files: data.mdb and lock.mdb.
Then I gave the same path (to data.mdb) for both train and validate. If this is not right, how do I split into train and validate?

And when I run train.py, I get the following error:

File "train.py", line 60, in
train_dataset = dataset.lmdbDataset(root=opt.trainRoot)
File "/home/ramu_yarru/ctpn/crnn.pytorch/dataset.py", line 25, in init
meminit=False)
lmdb.Error: /home/ramu_yarru/ctpn/crnn_train/data/train/data.mdb: Not a directory

What exactly is the input to train.py?
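The lmdb error itself hints at the answer: lmdb.open expects the directory that contains data.mdb and lock.mdb, not the data.mdb file, and train/val normally point at two separate lmdb directories built from two separate label lists. A hedged sketch of the intended call (paths are examples only):

# Pass the directories produced by create_dataset.py, not data.mdb itself, e.g.:
#   data/train_lmdb/{data.mdb, lock.mdb}   and   data/val_lmdb/{data.mdb, lock.mdb}
train_dataset = dataset.lmdbDataset(root='data/train_lmdb')
val_dataset = dataset.lmdbDataset(root='data/val_lmdb')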

Fine-tuning

Hi Holmeyoung,

When I retrain a pre-trained model, it seems like the model forgets what it has learned. I mean, if I train a model on synthetic images and then fine-tune it with real-world images, the model's accuracy on the synthetic images decreases. Please correct me if I am wrong: to use a pretrained model, should I freeze the last layers? If so, how can I freeze the last layers?

Thanks
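A hedged sketch of freezing part of this model (which sub-modules to freeze is a judgment call; here the CNN backbone named in the printouts above is frozen and only the RNN head keeps training, with an assumed learning rate):

import torch

# Freeze the convolutional backbone; fine-tune only the recurrent head.
for p in crnn.cnn.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in crnn.parameters() if p.requires_grad), lr=1e-4)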
