Hi, Do I need to convert my own images to .hdf5 format before I can

Using my own dataset to run prediction about transformer-ocr HOT 8 CLOSED

him4318 commented on June 3, 2024

Using my own dataset to run prediction

from transformer-ocr.

Comments (8)

him4318 commented on June 3, 2024

Hi @nimanamjouyan

It's up to you, but it helps to apply the same pre-processing steps (normalizing, removing cursive style) to each image as were used for the dataset, since that helps the model learn better. The HDF5 format was only used to keep the images in one place; you can store yours as flat files and write a custom data loader function in PyTorch that performs all the pre-processing steps on each image as it is yielded.
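
As a rough illustration of the flat-file approach described above, here is a minimal sketch of a map-style dataset. The class name `FlatFolderDataset`, the `IMAGE_SUFFIXES` set, and the `preprocess` callable are hypothetical names chosen for this example, not part of the repository. The class only implements `__len__` and `__getitem__`; in a PyTorch project you would normally subclass `torch.utils.data.Dataset` and pass the repository's own pre-processing function as `preprocess`.

```python
from pathlib import Path
from typing import Any, Callable, List, Tuple

# Hypothetical list of accepted extensions; adjust to your own data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


class FlatFolderDataset:
    """Map-style dataset over image files stored directly in one folder."""

    def __init__(self, folder: str, preprocess: Callable[[str], Any]) -> None:
        # Collect image files in a stable (sorted) order, ignoring other files.
        self.paths: List[Path] = sorted(
            p for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        # `preprocess` takes a file path and returns the model-ready image.
        self.preprocess = preprocess

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[Any, str]:
        # Pre-processing happens lazily, one image at a time, on access.
        path = self.paths[index]
        return self.preprocess(str(path)), path.name
```

An instance can then be handed to `torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)`, provided `preprocess` returns a tensor or NumPy array.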


ohmycaptainnemo commented on June 3, 2024

@him4318

Thank you for that.

I have stored my image files in a folder and slightly changed one of the cells in your notebook to use my images for prediction.
I changed this cell:

test_loader = torch.utils.data.DataLoader(DataGenerator(source_path,charset_base,max_text_length,'test',transform), batch_size=1, shuffle=False, num_workers=2)

predicts, gt, imgs = test(model, test_loader, max_text_length)

predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))

to this:

import torchvision.transforms as T
from torchvision import datasets

device = torch.device("cuda")
transform = T.Compose([
    # T.ToPILImage(),
    T.Resize((1024,128)),
    T.ToTensor()])

test_dataset = datasets.ImageFolder('/content/mydata/', transform=transform) #my images are inside a folder called data inside mydata folder
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=2)
predicts, gt, imgs = test(model, test_loader, max_text_length)

predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))

However, not only do my images show up as all black, but I also do not get any useful predictions. Am I doing something wrong here?


him4318 commented on June 3, 2024

If you are using the trained model I provided, your images should be in the same format, i.e. the same preprocessing I used should be applied to them, to get accurate results.

You can use the CLI provided in the code to run a prediction for an image. If you go through the code, you will see the steps needed to convert an image to the required format.


ohmycaptainnemo commented on June 3, 2024

Thank you Himanshu,

That code was extremely helpful.
I ended up running the following code segment in the notebook, based on the code you showed me (I used your pretrained weights):

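import os
import string

import cv2
import numpy as np
import torch
import torchvision.transforms as T
from google.colab.patches import cv2_imshow
from data import preproc as pp

# Tokenizer, make_model, single_image_inference and device are assumed
# to be defined in earlier notebook cells.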

input_size = (1024, 128, 1)
max_text_length = 256
charset_base = string.printable[:95]
tokenizer = Tokenizer(chars=charset_base, max_text_length=max_text_length)

path_2_im = '/content/data/3.PNG'
target_path = '/content/Transformer_ocr/src/resnet_best.pt'


img = pp.preprocess(path_2_im, input_size=input_size)


# making the image compatible with resnet (3 identical channels)
img = np.repeat(img[..., np.newaxis],3, -1)
x_test = pp.normalization(img)


# model = make_model(tokenizer.vocab_size, hidden_dim=256, nheads=4,
#           num_encoder_layers=4, num_decoder_layers=4)
# device = torch.device(device)

model = make_model(vocab_len=100)
_=model.to(device)

transform = T.Compose([T.ToTensor()])

if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))
else:
    print('No model checkpoint found')

prediction = single_image_inference(model, x_test, tokenizer, transform, device)

print("\n####################################")
print("predicted text is: {}".format(prediction))
cv2_imshow(cv2.imread(path_2_im))
print("\n####################################")

I used one of your images and I got this:

The outcome is very different from yours in your notebook.
More importantly, I noticed something interesting about the output of this function:

img = pp.preprocess(path_2_im, input_size=input_size)

The result is strange: the image seems to be turned vertically for some reason.
I also tried a number of other images and still had no luck.
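
For reference, the channel-replication step in the snippet above (`np.repeat(img[..., np.newaxis], 3, -1)`) can be checked in isolation. The sketch below assumes the pre-processed image is a 2-D grayscale array; the helper name `to_three_channels` is made up for this example.

```python
import numpy as np


def to_three_channels(gray: np.ndarray) -> np.ndarray:
    """Copy a 2-D grayscale array into three identical channels.

    The input has shape (rows, cols); the output has shape (rows, cols, 3).
    """
    # Add a trailing axis of length 1, then repeat it three times.
    return np.repeat(gray[..., np.newaxis], 3, -1)


gray = np.arange(6, dtype=np.float32).reshape(2, 3)
rgb_like = to_three_channels(gray)
print(gray.shape, rgb_like.shape)
```

Each of the three channels is an exact copy of the grayscale input, so this step changes only the shape of the array, not its orientation.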


him4318 commented on June 3, 2024

Hi @nimanamjouyan

Please check the path of the model in model.load_state_dict(torch.load(target_path)), as you are getting just random output. I checked on my end and it is working fine.
The image is only transformed like this during pre-processing. That is nothing to worry about, as we only take the features from ResNet.


ohmycaptainnemo commented on June 3, 2024

Hi @him4318

Thank you.
The path is definitely correct and the model exists there; otherwise this if statement would tell me that it does not:

if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))            
else:            
    print('No model checkpoint found')

Thank you for clarifying the preprocessing functions.
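
One caveat about the check above: `os.path.exists` only confirms that something exists at that path, and when the file is missing the snippet prints a message and then carries on with whatever weights `make_model` initialised. A minimal sketch of a stricter check that fails loudly instead is shown below; the helper name `require_checkpoint` is hypothetical and not part of the repository.

```python
import os


def require_checkpoint(path: str) -> str:
    """Return `path` if it points to a non-empty file, otherwise raise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No model checkpoint found at {path!r}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"Checkpoint at {path!r} is empty")
    return path


# Possible use in the notebook (assumes torch, model, device and target_path
# are already defined):
# model.load_state_dict(
#     torch.load(require_checkpoint(target_path), map_location=device))
```

With this, inference can never silently run on an unloaded model, which removes one possible source of random-looking output.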


him4318 commented on June 3, 2024

Hi @nimanamjouyan

I tried the same steps in the notebook and the result is fine.


ohmycaptainnemo commented on June 3, 2024

Hi @him4318

I get the same result as you with that image. That is interesting.

Thank you.

