Comments (8)
Hi @nimanamjouyan
It's up to you, but I recommend applying the same pre-processing steps (normalizing, removing cursive style) to every image in your dataset, as this helps the model learn better. The HDF5 format was only used to keep the images in one place. You can store yours as flat files and write a custom data loader in PyTorch that performs all the pre-processing steps on each image as it is yielded.
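A minimal sketch of such a loader, under stated assumptions: `FlatFileImages`, `normalize`, and the `load_fn` parameter are hypothetical names, not part of this repository, and `normalize` is only a stand-in for the repository's own pre-processing. It relies on the fact that a PyTorch map-style dataset only needs `__len__` and `__getitem__`.

```python
import os
import numpy as np


def normalize(img):
    """Scale an image to zero mean and unit variance.

    Stand-in for the repository's own normalization step (assumption).
    """
    img = np.asarray(img, dtype=np.float32)
    std = img.std()
    return (img - img.mean()) / (std if std > 0 else 1.0)


class FlatFileImages:
    """Map-style dataset over a flat folder of image files.

    Pre-processing runs lazily, one image at a time, when an item is requested.
    """

    def __init__(self, folder, load_fn, preprocess_fn=normalize,
                 extensions=(".png", ".jpg", ".jpeg")):
        # Sort so that an index always maps to the same file.
        self.paths = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if name.lower().endswith(extensions)
        )
        self.load_fn = load_fn              # hypothetical: path -> 2-D grayscale array
        self.preprocess_fn = preprocess_fn  # applied inside __getitem__

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        return self.preprocess_fn(self.load_fn(path)), path
```

With OpenCV installed, `load_fn` could be something like `lambda p: cv2.imread(p, cv2.IMREAD_GRAYSCALE)`, and the resulting object can be handed to `torch.utils.data.DataLoader` in place of the HDF5-backed generator. Note that this sketch returns the file path rather than a ground-truth transcription, so it suits prediction rather than evaluation.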
from transformer-ocr.
Thank you for that.
I have stored my image files inside a folder and slightly changed one of the cells in your notebook to use my images for prediction:
I changed this cell:
test_loader = torch.utils.data.DataLoader(DataGenerator(source_path,charset_base,max_text_length,'test',transform), batch_size=1, shuffle=False, num_workers=2)
predicts, gt, imgs = test(model, test_loader, max_text_length)
predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))
to this:
import torch
import torchvision.transforms as T
from torchvision import datasets
device = torch.device("cuda")
transform = T.Compose([
    # T.ToPILImage(),
    T.Resize((1024,128)),
    T.ToTensor()])
test_dataset = datasets.ImageFolder('/content/mydata/', transform=transform)  # my images are inside a folder called data inside the mydata folder
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=2)
predicts, gt, imgs = test(model, test_loader, max_text_length)
predicts = list(map(lambda x : x.replace('SOS','').replace('EOS',''),predicts))
gt = list(map(lambda x : x.replace('SOS','').replace('EOS',''),gt))
But my images show up as all black, and I do not get any useful predictions either. Am I doing something wrong here?
If you are using the trained model I provided, then your images must be in the same format, i.e. the same pre-processing that I applied must be applied to them, to get accurate results.
You can use the CLI provided in the code to run a prediction for a single image. If you go through the code, you will see the steps needed to convert an image to the required format.
Thank you Himanshu,
That code was extremely helpful.
I ended up running the following code segment in the notebook, based on the code you showed me (I used your pretrained weights):
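import os
import string
import cv2
import numpy as np
import torch
import torchvision.transforms as T
from google.colab.patches import cv2_imshow
from data import preproc as pp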
input_size = (1024, 128, 1)
max_text_length = 256
charset_base = string.printable[:95]
tokenizer = Tokenizer(chars=charset_base, max_text_length=max_text_length)
path_2_im = '/content/data/3.PNG'
target_path = '/content/Transformer_ocr/src/resnet_best.pt'
img = pp.preprocess(path_2_im, input_size=input_size)
# make the single-channel image compatible with ResNet (3 channels)
img = np.repeat(img[..., np.newaxis], 3, -1)
x_test = pp.normalization(img)
# model = make_model(tokenizer.vocab_size, hidden_dim=256, nheads=4,
# num_encoder_layers=4, num_decoder_layers=4)
# device = torch.device(device)
model = make_model(vocab_len=100)
_=model.to(device)
transform = T.Compose([
    T.ToTensor()])
if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))
else:
    print('No model checkpoint found')
prediction = single_image_inference(model, x_test, tokenizer, transform, device)
print("\n####################################")
print("predicted text is: {}".format(prediction))
cv2_imshow(cv2.imread(path_2_im))
print("\n####################################")
I used one of your images and I got this:
The outcome is very different from yours in your notebook.
More importantly, I noticed something interesting. The output of
img = pp.preprocess(path_2_im, input_size=input_size)
is strange: it seems the image is turned vertically for whatever reason.
I also tried a number of other images and still had no luck.
Hi @nimanamjouyan
Please check the path of the model in model.load_state_dict(torch.load(target_path)),
as you are getting just random output. I checked on my end and it is working fine.
The image is transformed like this during pre-processing. That is nothing to worry about, as we only extract features from it with ResNet.
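To illustrate why the pre-processed image can look rotated, here is a small sketch using NumPy only. It assumes, without having confirmed it against the repository, that the pre-processing swaps the axes so the array is width-first, matching input_size = (1024, 128, 1). The array `line` is a made-up stand-in for a real text-line image.

```python
import numpy as np

# Stand-in for a grayscale text-line image: 128 pixels high, 1024 wide.
line = np.zeros((128, 1024), dtype=np.uint8)
line[60:68, 100:900] = 255  # a horizontal stroke of "ink"

# Assumed pre-processing step: swap the axes so the array is width-first.
# Displayed as an image, the result looks turned on its side.
transposed = line.T

# Same replication as in the snippet posted earlier: copy the single
# channel three times to get the channel count a ResNet backbone expects.
rgb_like = np.repeat(transposed[..., np.newaxis], 3, -1)

print(line.shape, transposed.shape, rgb_like.shape)
# (128, 1024) (1024, 128) (1024, 128, 3)
```

Only the layout changes here. The pixel content is identical, which is consistent with the point that the orientation does not matter as long as inference uses the same layout as training.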
Hi @him4318
Thank you.
The path is definitely correct and the model exists there. Otherwise, this if statement would tell me that it does not:
if os.path.exists(target_path):
    model.load_state_dict(torch.load(target_path))
else:
    print('No model checkpoint found')
Thank you for clarifying the pre-processing functions.
Hi @nimanamjouyan
I tried the same steps in the notebook and the result is fine.
Hi @him4318
I get the same result as you with that image. That is interesting.
Thank you.