
Comments (8)

him4318 commented on June 3, 2024

Hi @nimanamjouyan

It's up to you, but you can follow the pre-processing steps (normalizing, removing cursive style) for each image in the dataset, as this helps the model learn better. The HDF5 format was only used to keep the images in one place; you can store yours as flat files and write a custom data loader in PyTorch that performs all the pre-processing steps on each image as it is yielded.
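A minimal sketch of that idea, assuming the images sit directly in one folder. The names `FlatFileDataset`, `list_image_files` and `preprocess_fn` are hypothetical and not part of transformer-ocr; `preprocess_fn` stands in for whatever pipeline the model was trained with. The class only implements `__len__` and `__getitem__`, which is the map-style dataset protocol that `torch.utils.data.DataLoader` accepts.

```python
import os
from typing import Callable, List, Sequence, Tuple

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def list_image_files(folder: str, extensions: Sequence[str] = IMAGE_EXTENSIONS) -> List[str]:
    """Return the sorted paths of all image files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(tuple(extensions))
    )


class FlatFileDataset:
    """Map-style dataset that preprocesses one image file per __getitem__ call.

    `preprocess_fn` is a placeholder: it receives a file path and returns
    the prepared image in whatever format the model expects.
    """

    def __init__(self, folder: str, preprocess_fn: Callable[[str], object]):
        self.paths = list_image_files(folder)
        self.preprocess_fn = preprocess_fn

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[object, str]:
        path = self.paths[index]
        # Preprocessing happens lazily, only when the sample is requested.
        return self.preprocess_fn(path), path
```

An instance could then be wrapped as `torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)`, with `preprocess_fn` returning a NumPy array or tensor.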

from transformer-ocr.

ohmycaptainnemo commented on June 3, 2024

@him4318

Thank you for that.

I have stored my image files in a folder and slightly changed one of the cells in your notebook to use my images for prediction.
I changed this cell:

test_loader = torch.utils.data.DataLoader(DataGenerator(source_path,charset_base,max_text_length,'test',transform), batch_size=1, shuffle=False, num_workers=2)

predicts, gt, imgs = test(model, test_loader, max_text_length)

predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))

to this:

import torch
import torchvision.transforms as T
from torchvision import datasets

device = torch.device("cuda")
transform = T.Compose([
    # T.ToPILImage(),
    T.Resize((1024,128)),
    T.ToTensor()])

test_dataset = datasets.ImageFolder('/content/mydata/', transform=transform) #my images are inside a folder called data inside mydata folder
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=2)
predicts, gt, imgs = test(model, test_loader, max_text_length)

predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))

However, not only do my images show up as all black, but I also do not get any useful predictions. Am I doing something wrong here?
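One possible reason for the all-black display, assuming the tensors were shown with a viewer that expects 0-255 pixel values: `T.ToTensor()` scales pixels to floats in [0, 1] and puts the channel axis first, so casting them straight to 8-bit integers truncates almost everything to 0. The sketch below uses plain NumPy arrays in place of tensors, and `to_displayable` is a hypothetical helper name.

```python
import numpy as np


def to_displayable(image_chw: np.ndarray) -> np.ndarray:
    """Convert a float image in [0, 1] with shape (C, H, W) to uint8 (H, W, C)."""
    image_hwc = np.transpose(image_chw, (1, 2, 0))  # channels last for viewers
    scaled = np.clip(image_hwc, 0.0, 1.0) * 255.0   # stretch [0, 1] to [0, 255]
    return np.round(scaled).astype(np.uint8)


# A mid-grey 3-channel image, laid out the way ToTensor() would produce it.
example = np.full((3, 4, 6), 0.5, dtype=np.float32)
naive = example.astype(np.uint8)  # direct cast truncates 0.5 to 0, i.e. black
fixed = to_displayable(example)
print(naive.max(), fixed.shape, fixed.max())
```

Separately, `datasets.ImageFolder` yields an integer class index as the label rather than a text transcription, so `gt` would not contain ground-truth strings with this loader.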


him4318 commented on June 3, 2024

If you are using the trained model I provided, then your images must be in the same format, i.e. the same preprocessing that I applied must be applied to them, to get accurate results.

You can use the CLI provided in the code to run a prediction on an image. If you go through the code, you will see the steps needed to convert an image to the required format.


ohmycaptainnemo commented on June 3, 2024

Thank you Himanshu,

That code was extremely helpful.
I ended up running the following code segment in a notebook, based on the code you showed me (I used your pretrained weights):

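import os
import string

import cv2
import numpy as np
import torch
import torchvision.transforms as T
from google.colab.patches import cv2_imshow
from data import preproc as pp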

input_size = (1024, 128, 1)
max_text_length = 256
charset_base = string.printable[:95]
tokenizer = Tokenizer(chars=charset_base, max_text_length=max_text_length)

path_2_im = '/content/data/3.PNG'
target_path = '/content/Transformer_ocr/src/resnet_best.pt'


img = pp.preprocess(path_2_im, input_size=input_size)


# make the image compatible with ResNet by replicating the grey channel three times
img = np.repeat(img[..., np.newaxis],3, -1)
x_test = pp.normalization(img)


# model = make_model(tokenizer.vocab_size, hidden_dim=256, nheads=4,
#           num_encoder_layers=4, num_decoder_layers=4)
# device = torch.device(device)

model = make_model(vocab_len=100)
_=model.to(device)

transform = T.Compose([T.ToTensor()])

if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))            
else:            
    print('No model checkpoint found')

prediction = single_image_inference(model, x_test, tokenizer, transform, device)

print("\n####################################")
print("predicted text is: {}".format(prediction))
cv2_imshow(cv2.imread(path_2_im))
print("\n####################################")

I used one of your images and I got this:

[image: Capture]

The outcome is very different from yours in your notebook.
More importantly, I noticed something interesting. The function:

img = pp.preprocess(path_2_im, input_size=input_size)

returns this:

[image]

which is strange. The image appears to have been turned to a vertical orientation for some reason.
I also tried a number of other images and still had no luck.


him4318 commented on June 3, 2024

Hi @nimanamjouyan

Please check the path of the model in model.load_state_dict(torch.load(target_path)), as you are getting just random output. I checked on my end and it is working fine.
The image is transformed like this only during pre-processing. That is nothing to worry about, as we are only extracting features from it with ResNet.
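A small shape walk-through of why the array looks vertical, assuming the pre-processing step transposes the image so that it is indexed (width, height); that assumption is inferred from the screenshot, not confirmed against `pp.preprocess`. The channel replication is the same `np.repeat` call used earlier in the thread, and the zero-filled array is only a stand-in for a real text line.

```python
import numpy as np

# Stand-in for a grayscale text-line image: 128 rows (height) by 1024 columns (width).
line_image = np.zeros((128, 1024), dtype=np.uint8)

# Assumed preprocessing step: swap the axes so the array is indexed (width, height).
transposed = line_image.T

# Replicate the single grey channel three times so a ResNet backbone accepts it.
three_channel = np.repeat(transposed[..., np.newaxis], 3, axis=-1)

# Undo the transpose purely for viewing; the model still receives `three_channel`.
for_viewing = three_channel.transpose(1, 0, 2)

print(transposed.shape, three_channel.shape, for_viewing.shape)
# prints: (1024, 128) (1024, 128, 3) (128, 1024, 3)
```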


ohmycaptainnemo commented on June 3, 2024

Hi @him4318

Thank you.
The path is definitely correct and the model exists there; otherwise this if statement would tell me that it does not:

if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))            
else:            
    print('No model checkpoint found')

Thank you for clarifying the preprocessing functions.
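As written, that check only prints a message when the file is missing, and the rest of the cell then runs with randomly initialised weights. A stricter variant could stop the cell instead. This is a minimal sketch, and `require_checkpoint` is a hypothetical helper name, not part of transformer-ocr.

```python
import os


def require_checkpoint(path: str) -> str:
    """Return `path` if a file exists there, otherwise raise instead of printing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No model checkpoint found at {path!r}")
    return path
```

It could be used as `model.load_state_dict(torch.load(require_checkpoint(target_path)))`, so a wrong path fails immediately rather than producing random predictions.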


him4318 commented on June 3, 2024

Hi @nimanamjouyan

I tried the same steps in the notebook and the result is fine.

[image]


ohmycaptainnemo commented on June 3, 2024

Hi @him4318

I get the same result as you with that image. That is interesting.

Thank you.

