holmeyoung / crnn-pytorch
PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR.
License: MIT License
Firstly, your code is great. I trained with the SynthText90k dataset and achieved very good performance on English words.
I have several questions and hope you can give me a hand.
Thanks for your time.
How to recognize a space within a sentence?
For example, I want to recognize "I love python".
There is a space between "I" and "love". How should this be handled?
Is it enough to add a space to the alphabet, like this, and prepare training data that contains spaces?
alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ """
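A minimal sketch of the idea in the question above, assuming the label converter reserves index 0 for the CTC blank and maps alphabet characters to 1..N (an assumption about the converter, not confirmed by this thread). `build_char_map` and `encode` are hypothetical helpers. The point is that the space character is an ordinary class, distinct from the CTC blank.

```python
alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "

def build_char_map(alphabet):
    # Index 0 is kept free for the CTC blank, so real characters start at 1.
    return {ch: i + 1 for i, ch in enumerate(alphabet)}

def encode(text, char_map):
    # Raises KeyError if the text contains a character missing from the alphabet.
    return [char_map[ch] for ch in text]

char_map = build_char_map(alphabet)
encoded = encode("I love python", char_map)
n_class = len(alphabet) + 1  # +1 for the CTC blank
```

With this layout the space is class 63 and the network needs 64 outputs; the training labels must of course contain spaces for the model to learn them.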
Can we recognize English and Chinese in one model?
If we want to recognize English and Chinese in one model, how should we do it?
Is it enough to make the alphabet contain all English and Chinese characters, like this?
alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ是不我一有大在人了中到資..."""
What if we want to recognize very long sentences?
Do you think it would be better to train with very long sentences, or can we just train with short sentences?
Your current model only supports a text length of less than 26, so I would have to modify the network to support training with long sentences.
cost = criterion(preds, text, preds_size, length) / batch_size
# crnn.zero_grad()
cost.backward()
Why did you remove crnn.zero_grad()? We need to set the gradients to zero before backpropagation, because PyTorch accumulates gradients across successive backward calls.
So crnn.zero_grad() is necessary.
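The accumulation behaviour described above can be seen in a tiny sketch that is independent of this repository (the tensor `w` is purely illustrative). Note that calling `optimizer.zero_grad()` has the same effect as `crnn.zero_grad()` when the optimizer holds all of the model's parameters, so whether the commented-out line matters depends on what else the training step calls.

```python
import torch

w = torch.ones(1, requires_grad=True)

# Two backward passes without clearing: the gradients add up.
(2 * w).sum().backward()
(2 * w).sum().backward()
accumulated = w.grad.item()

# Clearing first gives the gradient of the latest pass only.
w.grad.zero_()
(2 * w).sum().backward()
fresh = w.grad.item()
```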
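A question: during training I use a fixed image width, imgw = 160. Each image contains either 4 characters including a space (the label is those 4 characters) or 5 characters with no space (the label is 5 characters).
Why is it that when I run the trained model on images of a different length, for example imgh = 300 with about 10 characters, it still predicts only 4-5 characters, which are of course all wrong?
Could the problem be here? Or is it related to keep-ratio=True?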
Hi,
I have used train.py many times without issues. However, now when I use train.py, the loss is always nan if cuda is True. I think the problem is with my laptop. Any idea how to solve this issue?
Thanks
Do we have to add more epochs so it's able to recognize better in the demo phase?
I reached 260 and it gave good results, but as the number of epochs increases it deviates from the right result and makes wrong guesses. After that it gets better, and later it deviates again.
Does it have to reach 1000 so that it gives the best results and never guesses wrong?
What do you think?
P.S.: the program is good. It reached 95% accuracy, but when I want it to learn on noisy images it just takes some time to guess well. I shall be patient as you said ^_^
Hi, I have used the same code but for a different language. The issue is I'm getting a zero accuracy for all the epochs and the network is barely learning anything.
Please help me out.
I've attached my training output file.
slurm-synthetic_train-output-23-01-20_new.txt
Hello, thanks for your solution, but I'm a little confused about Variable, for example:
preds_size = Variable(torch.LongTensor([preds.size(0)] * batch_size))
I think PyTorch has abandoned Variable in versions greater than 1.0.0.
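For reference, `Variable` was merged into `Tensor` in PyTorch 0.4.0, so a plain tensor can be used directly. A minimal sketch of the same `preds_size` without the wrapper, where `T` stands in for `preds.size(0)` and both values are made up for illustration:

```python
import torch

T, batch_size = 26, 4  # illustrative values; T plays the role of preds.size(0)

# One entry per sample, each equal to the number of time steps.
preds_size = torch.full((batch_size,), T, dtype=torch.long)
```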
Hi,
Is it okay to name image files as follows:
image: /Volumes/EXTERNAL/Models/crnn_chinese_characters_rec-master/to_lmdb/train_images/almofdal_11.jpg
label: المفضل
I wrote the image file name in English and the label in the data .txt file in Arabic. I was able to convert the files to lmdb, but when I train the model, it does not print the epoch number or the loss. It just shows the info below for some minutes and then stops.
CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu0): ReLU(inplace)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(inplace)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu3): ReLU(inplace)
(pooling2): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu5): ReLU(inplace)
(pooling3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace)
)
(rnn): Sequential(
(0): BidirectionalLSTM(
(rnn): LSTM(512, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=256, bias=True)
)
(1): BidirectionalLSTM(
(rnn): LSTM(256, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=71, bias=True)
)
)
)
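Are all of your training images the same size? If the images in my dataset have different sizes, should I first pad them while keeping the aspect ratio and then resize them to the same size (32*100)?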
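One possible way to handle mixed image sizes, sketched as plain arithmetic: scale every image to height 32 while keeping its aspect ratio, then pad each batch to its widest image. `keep_ratio_size` and `batch_width` are hypothetical helpers, and this is not necessarily how the repository's keep_ratio option is implemented.

```python
def keep_ratio_size(width, height, target_height=32):
    """Return (new_width, new_height) after scaling the height to target_height."""
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height

def batch_width(widths):
    """Width that every image in a batch would be padded to."""
    return max(widths)

sizes = [keep_ratio_size(300, 64), keep_ratio_size(120, 32), keep_ratio_size(200, 64)]
padded_width = batch_width([w for w, _ in sizes])
```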
Can it be trained on any text length?
I have a dataset composed of English, japanese and korean characters (3340 characters in sum because of japanese kanji).
I can't seem to find the perfect parameters for such a problem, the accuracy is 0.000 mostly.
I tried an lr = 0.0001, epochs = 900 and batch size = 2.
However the accuracy is still not very good.
I'm wondering, when you have a large number of classes, what's the best way to train the model and changing the parameters? --> do we take it easy and give small values?
Traceback (most recent call last):
File "/crnn-pytorch-master/tool/create_dataset.py", line 135, in <module>
createDataset(args.out, image_path_list, label_list)
File "/crnn-pytorch-master/tool/create_dataset.py", line 55, in createDataset
env = lmdb.open(outputPath, map_size=1099511627776)
lmdb.Error: ~/fake/lmdb: There is not enough space on the disk.
Hey, thanks for sharing the code, but I found a possible issue while training the network. While editing the characters in the alphabet.py file, I followed the guide and replaced the Chinese characters with English ones, and my network trained fine. But while reading the code and debugging, I found that the nClass output dimension of the CRNN was 72, while the number of unique characters in alphabet.py was only 36. I eventually realized that the code is splitting the characters wrongly and counting the \n newline as a character as well. That is why the output dimension was [26x1x72] instead of [26x1x37]. This can cause an issue in training. I can raise a PR fixing this if you want. Thanks.
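The numbers in this report can be reproduced with a small sketch. It assumes the 36 characters were written one per line inside the alphabet string, which is only one plausible layout and is not confirmed by the thread; the variable names are illustrative.

```python
chars = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 unique characters

# Hypothetical alphabet string with one character per line.
raw = "\n".join(chars)
naive_nclass = len(raw) + 1  # every "\n" is counted as a class, +1 for the CTC blank

# Dropping line breaks (and any duplicates) before counting.
alphabet = "".join(dict.fromkeys(raw.replace("\r", "").replace("\n", "")))
n_class = len(alphabet) + 1
```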
When I run the demo, the output is all garbled characters. Why?
Hi, sorry to bother you, but I'm facing a problem.
My accuracy during validation is always 0, which is really weird.
Have you ever faced this problem during training?
I used your pre-trained model with a training set of only 3000 pictures, because I just want to quickly check whether this model works for me. If it works, I will add more data for my own training.
I tried both lr = 0.00005 and lr = 0.0001.
Image size = 120*32 with four Chinese characters.
keep_ratio = False
@Holmeyoung
dealwith_lossnan = False # whether to replace all nan/inf in gradients to zero
In params.py, why did you set dealwith_lossnan = False?
To handle the problem "Just don't know why, but when i train the net, the loss always become nan after several epoch.", should dealwith_lossnan be set to True?
Hello, I'm using an AWS instance with 4 GPUs. When multi-GPU is activated (in the params.py file: multigpu set to True and 4 for the number), I get the following error. (P.S.: it happens for 4, 3, 2 and even 1, which is incomprehensible.)
CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu0): ReLU(inplace=True)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(inplace=True)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace=True)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu3): ReLU(inplace=True)
(pooling2): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace=True)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu5): ReLU(inplace=True)
(pooling3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace=True)
)
(rnn): Sequential(
(0): BidirectionalLSTM(
(rnn): LSTM(512, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=256, bias=True)
)
(1): BidirectionalLSTM(
(rnn): LSTM(256, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=7116, bias=True)
)
)
)
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Traceback (most recent call last):
File "train.py", line 253, in <module>
cost = train(crnn, criterion, optimizer, train_iter)
File "train.py", line 241, in train
cost = criterion(preds, text, preds_size, length) / batch_size
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1295, in forward
self.zero_infinity)
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1767, in ctc_loss
zero_infinity)
RuntimeError: input_lengths must be of size batch_size
Firstly, thank you for your code.
I have 20000 images for training and 2000 for testing (all generated by code). When I started training, the loss was always very small (e.g. 0.09). Can you tell me how to deal with it?
I wanted to use your pretrained model and add some characters to its alphabet.
But I get a size mismatch error because the number of classes is not the same.
How am I supposed to change the last layer? I want to add more classes to the pretrained model (state).
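One possible approach, sketched on a standalone `nn.Linear` rather than the real CRNN: build a larger output layer and copy the old rows into it, so that existing classes keep their weights. This assumes the new characters are appended at the end of the alphabet, so existing class indices do not move. The sizes and the names `old_fc` and `new_fc` are made up for illustration.

```python
import torch
import torch.nn as nn

hidden, old_nclass, new_nclass = 512, 37, 40  # illustrative sizes

old_fc = nn.Linear(hidden, old_nclass)  # stands in for the pretrained output layer
new_fc = nn.Linear(hidden, new_nclass)  # enlarged layer for the extended alphabet

with torch.no_grad():
    # Rows for the existing classes are copied; the extra rows keep their random init.
    new_fc.weight[:old_nclass] = old_fc.weight
    new_fc.bias[:old_nclass] = old_fc.bias
```

The remaining layers can then be loaded from the checkpoint as usual, skipping the keys that belong to the output layer.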
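Hello, when I trained with PyTorch's built-in CTC_loss I kept getting NaN. After replacing CTC_loss with Warp-ctc, the problem disappeared immediately. I wasted two days on this, which is frustrating. This is the second time PyTorch has tripped me up. I hope nobody else falls into this trap.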
why is it that my loss is always like below:
| [806/1000][400/410] Loss: inf
0|train | [807/1000][100/410] Loss: inf
0|train | [807/1000][200/410] Loss: inf
0|train | [807/1000][300/410] Loss: inf
0|train | [807/1000][400/410] Loss: inf
0|train | [808/1000][100/410] Loss: inf
0|train | [808/1000][200/410] Loss: inf
0|train | [808/1000][300/410] Loss: inf
0|train | [808/1000][400/410] Loss: inf
0|train | [809/1000][100/410] Loss: inf
0|train | [809/1000][200/410] Loss: inf
0|train | [809/1000][300/410] Loss: inf
0|train | [809/1000][400/410] Loss: inf
0|train | [810/1000][100/410] Loss: inf
0|train | [810/1000][200/410] Loss: inf
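One common cause of an infinite CTC loss, though not necessarily the cause here, is a label that needs more time steps than the network emits for that image. A minimal check, with `min_ctc_steps` and `is_feasible` as hypothetical helpers:

```python
def min_ctc_steps(label):
    """Fewest time steps CTC needs to emit `label`.

    Adjacent identical characters must be separated by a blank,
    so each such pair costs one extra step.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def is_feasible(label, num_steps):
    # True when at least one valid CTC alignment exists.
    return min_ctc_steps(label) <= num_steps

# Labels that cannot fit into 26 time steps would be worth inspecting.
labels = ["hello", "a" * 26, "abc"]
too_long = [lab for lab in labels if not is_feasible(lab, 26)]
```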
Hi Holmeyoung,
I face this error when running train.py with a custom dataset.
I tried text = b''.join(text) and it turned into another problem.
My question is: which is the proper type of cpu_texts (tuple of str or tuple of bytes)?
I think that my custom lmdb dataset might be the problem, because cpu_images, cpu_texts = data returns a tuple of bytes.
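If the converter looks labels up character by character, which is an assumption here, then a tuple of str is what it needs, and bytes labels read from lmdb can be decoded first. `to_text` is a hypothetical helper:

```python
def to_text(labels, encoding="utf-8"):
    """Decode any bytes labels in a batch; leave str labels untouched."""
    return tuple(
        lab.decode(encoding) if isinstance(lab, bytes) else lab
        for lab in labels
    )

cpu_texts = (b"hello", "world", "المفضل".encode("utf-8"))
decoded = to_text(cpu_texts)
```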
loading pretrained model from netCRNN_0_9000_1.pth
Traceback (most recent call last):
File "train.py", line 91, in <module>
crnn = net_init()
File "train.py", line 88, in net_init
crnn.load_state_dict(torch.load(params.pretrained))
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.cnn.conv0.weight", "module.cnn.conv0.bias", "module.cnn.conv1.weight", "module.cnn.conv1.bias", "module.cnn.conv2.weight", "module.cnn.conv2.bias", "module.cnn.batchnorm2.weight", "module.cnn.batchnorm2.bias", "module.cnn.batchnorm2.running_mean", "module.cnn.batchnorm2.running_var", "module.cnn.conv3.weight", "module.cnn.conv3.bias", "module.cnn.conv4.weight", "module.cnn.conv4.bias", "module.cnn.batchnorm4.weight", "module.cnn.batchnorm4.bias", "module.cnn.batchnorm4.running_mean", "module.cnn.batchnorm4.running_var", "module.cnn.conv5.weight", "module.cnn.conv5.bias", "module.cnn.conv6.weight", "module.cnn.conv6.bias", "module.cnn.batchnorm6.weight", "module.cnn.batchnorm6.bias", "module.cnn.batchnorm6.running_mean", "module.cnn.batchnorm6.running_var", "module.rnn.0.rnn.weight_ih_l0", "module.rnn.0.rnn.weight_hh_l0", "module.rnn.0.rnn.bias_ih_l0", "module.rnn.0.rnn.bias_hh_l0", "module.rnn.0.rnn.weight_ih_l0_reverse", "module.rnn.0.rnn.weight_hh_l0_reverse", "module.rnn.0.rnn.bias_ih_l0_reverse", "module.rnn.0.rnn.bias_hh_l0_reverse", "module.rnn.0.embedding.weight", "module.rnn.0.embedding.bias", "module.rnn.1.rnn.weight_ih_l0", "module.rnn.1.rnn.weight_hh_l0", "module.rnn.1.rnn.bias_ih_l0", "module.rnn.1.rnn.bias_hh_l0", "module.rnn.1.rnn.weight_ih_l0_reverse", "module.rnn.1.rnn.weight_hh_l0_reverse", "module.rnn.1.rnn.bias_ih_l0_reverse", "module.rnn.1.rnn.bias_hh_l0_reverse", "module.rnn.1.embedding.weight", "module.rnn.1.embedding.bias".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked", "rnn.0.rnn.weight_ih_l0", "rnn.0.rnn.weight_hh_l0", "rnn.0.rnn.bias_ih_l0", "rnn.0.rnn.bias_hh_l0", "rnn.0.rnn.weight_ih_l0_reverse", "rnn.0.rnn.weight_hh_l0_reverse", "rnn.0.rnn.bias_ih_l0_reverse", "rnn.0.rnn.bias_hh_l0_reverse", "rnn.0.embedding.weight", "rnn.0.embedding.bias", "rnn.1.rnn.weight_ih_l0", "rnn.1.rnn.weight_hh_l0", "rnn.1.rnn.bias_ih_l0", "rnn.1.rnn.bias_hh_l0", "rnn.1.rnn.weight_ih_l0_reverse", "rnn.1.rnn.weight_hh_l0_reverse", "rnn.1.rnn.bias_ih_l0_reverse", "rnn.1.rnn.bias_hh_l0_reverse", "rnn.1.embedding.weight", "rnn.1.embedding.bias".
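The traceback above shows a DataParallel-wrapped model, whose keys carry a `module.` prefix, being given a checkpoint saved from an unwrapped model. Either loading the checkpoint before wrapping the model or renaming the keys should resolve it. A minimal sketch of the renaming on a plain ordered dict, where both helper names and the toy `checkpoint` are hypothetical:

```python
from collections import OrderedDict

def add_module_prefix(state_dict, prefix="module."):
    """Rename keys so a DataParallel-wrapped model accepts them."""
    return OrderedDict(
        (k if k.startswith(prefix) else prefix + k, v)
        for k, v in state_dict.items()
    )

def strip_module_prefix(state_dict, prefix="module."):
    """Rename keys so an unwrapped model accepts them."""
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v)
        for k, v in state_dict.items()
    )

# Toy stand-in for torch.load(...); the values would normally be tensors.
checkpoint = OrderedDict([("cnn.conv0.weight", 1), ("rnn.0.rnn.bias_hh_l0", 2)])
wrapped = add_module_prefix(checkpoint)
```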
@Holmeyoung
In #17 you mentioned that your code only supports training with text length <= 26. I found that:
(1) When the images are resized to 100x32, the length of the raw character output is 26, so we cannot train with text length > 26.
(2) When keep_ratio = True, only the height of the image is resized to 32; the width is not fixed and varies between images. So the length of the raw character output is not fixed and depends on the width of the image, and maybe we can train with any text length.
Conclusion: we can train with any text length when we set keep_ratio = True during training.
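The dependence of the output length on image width can be worked out from the layer parameters in the CRNN printouts elsewhere on this page. The sketch below applies the standard output-size formula to the width dimension only; the 3x3 convolutions with padding 1 leave the width unchanged and are skipped. `layer_out` and `crnn_seq_len` are hypothetical helper names.

```python
def layer_out(size, kernel, stride, padding):
    # floor((size + 2*padding - kernel) / stride) + 1, valid for dilation 1
    return (size + 2 * padding - kernel) // stride + 1

def crnn_seq_len(width):
    """Number of time steps the CNN produces for an image of this width."""
    w = layer_out(width, 2, 2, 0)  # pooling0
    w = layer_out(w, 2, 2, 0)      # pooling1
    w = layer_out(w, 2, 1, 1)      # pooling2: width stride 1, padding 1
    w = layer_out(w, 2, 1, 1)      # pooling3: width stride 1, padding 1
    w = layer_out(w, 2, 1, 0)      # conv6: 2x2 kernel, no padding
    return w

lengths = {width: crnn_seq_len(width) for width in (100, 160, 300)}
```

So a width of 100 gives 26 steps, and wider images give proportionally more. CTC also needs one extra step for each pair of adjacent repeated characters, so the usable label length can be somewhat below the step count.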
Thank you so much.
Hello, long time no see :))
I would like to ask how to fine tune the model (change the crnn inputs) for a text with longer length > 26 characters or a text with two lines or more?
Hi Holmeyoung,
I have built my own dataset, which consists of Arabic characters. I followed your steps for building the dataset and was able to convert it to lmdb successfully. However, when I tried to train the model, I got this error:
CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu0): ReLU(inplace)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu1): ReLU(inplace)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu3): ReLU(inplace)
(pooling2): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu5): ReLU(inplace)
(pooling3): MaxPool2d(kernel_size=(2, 2), stride=(2, 1), padding=(0, 1), dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace)
)
(rnn): Sequential(
(0): BidirectionalLSTM(
(rnn): LSTM(512, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=256, bias=True)
)
(1): BidirectionalLSTM(
(rnn): LSTM(256, 256, bidirectional=True)
(embedding): Linear(in_features=512, out_features=69, bias=True)
)
)
)
Traceback (most recent call last):
File "train.py", line 177, in <module>
cost = trainBatch(crnn, criterion, optimizer)
File "train.py", line 154, in trainBatch
t, l = converter.encode(cpu_texts)
File "/Volumes/EXTERNAL/Models/crnn_pytorch-master/utils.py", line 61, in encode
index = self.dict[char]
KeyError: '٩'
It sounds like the model can't recognize Arabic characters. Any suggestions?
Thanks
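The KeyError above means a label contains a character that is not a key in the converter's dictionary, here '٩'. A quick way to list such characters before training, using a hypothetical helper and a made-up alphabet that lacks that digit:

```python
def missing_characters(labels, alphabet):
    """Characters that occur in the labels but not in the alphabet."""
    known = set(alphabet)
    return sorted({ch for text in labels for ch in text if ch not in known})

alphabet = "٠١٢٣٤٥٦٧٨"  # made-up alphabet without '٩'
labels = ["١٢٣", "٩٨"]
missing = missing_characters(labels, alphabet)
```

Any character this reports would need to be added to the alphabet, or the affected labels cleaned.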
Hello,
I hope you're doing good,
Would it be okay to put your pretrained model "crnn.pth" on Google Drive?
Because pan.baidu requires a Chinese phone number, which is inaccessible in some countries.
^_^
Thank you.
Hi,
I have 70 M training samples and 1 M validation samples. Test loss is reducing and accuracy has reached 0.83 but never exceeded 0.83. Now the number of epochs is 55, so should I wait or it will never get better?
I have two questions:
Do your codes currently support training with variable text length?
Does "keep_ratio = True" work for training? If I want to train the model with variable length images, do I also have to modify create_dataset.py?
in this site (https://github.com/meijieru/crnn.pytorch) the author mentions that "If you want to train with variable length images (keep the origin ratio for example), please modify the tool/create_dataset.py and sort the image according to the text length"
Thanks a lot.
Hello,
I have 13 GB of RAM. I activated cuda in the params.py file, but training and testing are still slow compared to the capacity of my machine.
It's supposed to run quickly I mean.
I'm wondering if cuda is actually working or not in reality?
As you said, there are limitations (illegal characters) when you use Folder mode. But in File mode, these characters are also not allowed in file names, so how can File mode solve this problem?
Thank you.
Hi Holmeyoung,
I have 20,000 training samples and 30 characters. I have been trying to train the model but the accuracy does not add up. How should I set the parameters?
Hello, long time no see :D
I wanna ask, (probably I did ask the same question before but I forgot the answer sorry ^^" lol ),
When I train the model (on about +7000 Japanese, english characters - 10M train samples and 1M test samples).
The accuracy gets high (about 50% while still in epoch 0 - let's say it has entered 5k images per pre-epoch), the loss is low 0.03 and still decreases though -
However when giving it a real life image case (the same as the test sample) it makes grave guessings (lol).
What do u think is the problem? Should I kill the process I mean? or wait for the epochs to finish?
Hello,
Sorry for asking so many questions 😅
I was thinking of a way so that the program can grasp all the characters.
What if we start by teaching it for example a small amount of characters and then add the others little by little?
For example we give it a dataset of 50 characters only, we see how it performs, and then next time we add a dataset of new ones and see if it's able to differentiate the features.
Do you think an approach like this is fine? Or is it better to give it everything from the beginning?
I am trying to use MJSynth 90K, which contains millions of images. However, when I try to create the dataset, only about 500K images can be created. Is there a way of increasing this number?
After training the model and all, I would like to ask when we enter a new image so that our model would recognize it.
Is it able to recognize any composition of image? I mean even a text the model has never seen (been trained on) before?
For example:
We trained the model only on something like this:
AAA
BBB
CCC
And in the demo phase, we give it an image that contains ABC in it, will it be able to recognize it, or it can only recognize something it saw before?
I tried to reproduce your code, but failed, cost stuck at 7.0. Code is here:
https://github.com/jaysimon/crnn-houwei
By using SyntheticChineseStringDataset, your code runs well, cost decrease from 8.0 to 2.0 or less. Your model predicts well.
While I tried to build the model by myself, cost stuck at 7.0.
Any idea or advice?
Thank you very much.
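I ran into something very strange: with the same dictionary contents, placing the digits before the Chinese characters versus after them gives completely different results. When the digits are placed after the Chinese characters and the model is trained, the digit recognition rate is 0; when the digits are placed before the Chinese characters, the digits are recognized. Could someone explain this? @Holmeyoung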
Hi sir, I have created lmdb datasets from the txt files for both train and val, and I got data.mdb and lock.mdb for both. But an assertion error arises at the assert train line. Please help me out with this.
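I would like to ask: can warp-ctc take the RNN output directly? nn.CTCLoss needs a softmax layer before its input goes into CTC, but I see that your code does not add a softmax. Is that okay? Also, it seems the official site has updated CTCLoss.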
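For context, the PyTorch documentation describes the first argument of `nn.CTCLoss` as log-probabilities, for example obtained with `log_softmax`. A minimal sketch with random data, where the shapes and targets are made up and this is not the repository's training code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T, N, C = 26, 2, 37  # time steps, batch size, classes (index 0 = blank)

logits = torch.randn(T, N, C)             # stands in for the raw RNN output
log_probs = F.log_softmax(logits, dim=2)  # normalise over the class dimension

targets = torch.tensor([1, 2, 3, 4, 5], dtype=torch.long)  # two labels, concatenated
target_lengths = torch.tensor([3, 2], dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)      # one entry per sample

criterion = nn.CTCLoss(blank=0, reduction="sum", zero_infinity=True)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
```

`input_lengths` holds one value per sample in the batch, and `zero_infinity=True` replaces infinite losses and their gradients with zero.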
I want to train with my own dataset. First I used create_dataset.py, and it created two files: data.mdb and lock.mdb.
Now I gave the same path of data.mdb to train and validate. If this is not right, how do I split the data into train and validation sets?
And when i run train.py, I get the following error
File "train.py", line 60, in <module>
train_dataset = dataset.lmdbDataset(root=opt.trainRoot)
File "/home/ramu_yarru/ctpn/crnn.pytorch/dataset.py", line 25, in __init__
meminit=False)
lmdb.Error: /home/ramu_yarru/ctpn/crnn_train/data/train/data.mdb: Not a directory
What exactly is the input to train.py ?
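By default `lmdb.open` treats its path as the directory that contains data.mdb, which would explain the "Not a directory" error when the file itself is passed. A small sketch that normalises such a path and splits a sample list into training and validation parts before the two lmdb datasets are built; `lmdb_root` and `split_samples` are hypothetical helpers:

```python
import os
import random

def lmdb_root(path):
    """Return the lmdb directory even if the data.mdb file itself was given."""
    if os.path.basename(path) in ("data.mdb", "lock.mdb"):
        return os.path.dirname(path)
    return path

def split_samples(samples, val_fraction=0.1, seed=0):
    """Shuffle reproducibly and return (train, val) lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_fraction))
    return items[n_val:], items[:n_val]

train, val = split_samples(range(100))
```

Each part would then be written to its own lmdb directory, and those two directories passed as the training and validation roots.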
Hi Holmeyoung,
When I retrain a pre-trained model, it seems that the model forgets what it has learned. I mean, if I train a model on synthetic images and fine-tune it with real-world images, the model's accuracy on the synthetic images decreases. Please correct me if I am wrong: to use a pretrained model, I should freeze the last layers. If so, how can I freeze the last layers?
Thanks
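Freezing in PyTorch amounts to turning off `requires_grad` on the chosen parameters and giving the optimizer only the rest. A minimal sketch on a toy model, not the CRNN itself. Which layers to freeze is a design choice; fine-tuning setups often freeze the early feature layers rather than the last ones.

```python
import torch.nn as nn

# Toy stand-in for a pretrained network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

# Freeze the first layer; its weights will no longer receive gradients.
for p in model[0].parameters():
    p.requires_grad_(False)

# Only these parameters would be passed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
```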
Hello, I wanted to ask if we could merge two .pth files trained on different dataset of this project?
Thanks.