Comments (6)
Hi,
It shouldn't be this bad. A few things to check: that the max length of the line is correct, that the same pre-processing steps are applied to the train and test images, and whether the training and validation losses show that the model is over-fitting.
Finally, you can train your model using the arthurflor repository and compare the metrics.
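As a rough illustration of the first check, here is a minimal sketch that lists labels longer than a configured maximum. The function name, the sample labels, and the limit of 20 are hypothetical and only for illustration; use whatever maximum your training configuration actually sets.

```python
def find_overlong_labels(labels, max_text_length):
    """Return (index, length) for every label longer than max_text_length."""
    return [(i, len(t)) for i, t in enumerate(labels) if len(t) > max_text_length]


# Hypothetical labels and limit, for illustration only.
labels = ["short line", "a somewhat longer line of text"]
print(find_overlong_labels(labels, max_text_length=20))
```

Labels reported here would be truncated or rejected by a model configured with that limit, which is one way a mismatch can show up as poor metrics.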
from transformer-ocr.
Hi,
It is overfitting, but you save the model with the best validation loss, so this shouldn't be a problem.
The only thing that could have gone wrong is my construction of the HDF5 file.
Do you know where I can find a reference for constructing the data in the form that you need?
Thank you,
PS: I used the following code:
import torch
from torch import nn
from torch.utils.data import Subset, Dataset, DataLoader
from torch.nn import functional as F
import torchvision
import math
import numpy as np
from torchvision.transforms.functional import resize, pil_to_tensor, normalize
import os
import PIL
import copy
from tqdm import trange
import h5py
DATA_DIR = "<PATH-TO>/IAM/lines/"
ASCII_DIR = "<PATH-TO>/IAM/ascii/"
SPLIT_DIR = "<PATH-TO>/RWTH/splits"
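
def load_image(path, max_len=1024):
    # Load as 8-bit grayscale; np.array(img) has shape (H, W).
    img = PIL.Image.open(path).convert('L')
    # Float tensor in [0, 1], transposed to (1, W, H).
    array = torch.Tensor(np.array(img)).unsqueeze(0).permute(0, 2, 1).float()/255.0
    # resize() with an int scales the smaller edge to 128; transpose back to (1, 128, W').
    img = resize(array, size=128).permute(0, 2, 1)
    # Map [0, 1] to [-1, 1].
    img = normalize(img, (0.5,), (0.5,))
    # Pad on the right up to max_len (a negative pad crops wider lines).
    a = nn.ZeroPad2d((0, max_len-img.size()[2], 0, 0))(img)
    a = a.permute(0, 2, 1).squeeze(0).cpu().numpy()
    assert a.shape == (max_len, 128)
    # NOTE: values are in [-1, 1] here, so negative values do not survive
    # the cast to uint8; worth checking against what the training code expects.
    return (a*255).astype(np.uint8)
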
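
def read_lines_text(annotation_txt):
    data = []
    with open(annotation_txt, 'r') as f:
        for line in f:
            line = line.strip('\n')
            # Skip the comment lines of the annotation file.
            if line.startswith('#'):
                continue
            spl = line.split(' ')
            image_dir = spl[0].split('-')
            # Relative path is <p0>/<p0>-<p1>/<line id>.png, where p0 and p1 are the
            # first two '-'-separated parts of the line id; the text starts at field 8.
            data.append((os.path.join(image_dir[0], image_dir[0]+'-'+image_dir[1], spl[0]+'.png'), ' '.join(spl[8:])))
    return data


def load_text(inp):
    # Replace the '|' separators with spaces and store the label as bytes.
    return inp.replace("|", " ").encode()
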

class IAM(Dataset):
    def __init__(self, annotation_txt, image_folder):
        self.data = read_lines_text(annotation_txt)
        self.image_folder = image_folder

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        img_path, txt = self.data[idx]
        img = load_image(os.path.join(self.image_folder, img_path))
        txt = load_text(txt)
        return img, txt

    def subset(self, file):
        # Keep only the lines whose parent folder name is listed in the split file.
        new_dataset = copy.deepcopy(self)
        with open(file, 'r') as fh:
            valid_key = {l.strip('\n') for l in fh}
        new_data = []
        for d, i in new_dataset.data:
            ds = os.path.split(os.path.split(d)[0])[1]
            if ds in valid_key:
                new_data.append((d, i))
        new_dataset.data = new_data
        return new_dataset


# The key must be 'valid' (not 'val') to match the split names written below;
# otherwise splits[split] raises a KeyError.
splits = {'train': 'train.uttlist', 'test': 'test.uttlist', 'valid': 'validation.uttlist'}

with h5py.File("IAM.hdf5", "w") as f:
    dataset = IAM(os.path.join(ASCII_DIR, 'lines.txt'), DATA_DIR)
    for split in ['train', 'test', 'valid']:
        new_dataset = dataset.subset(os.path.join(SPLIT_DIR, splits[split]))
        images, texts = [], []
        for i in trange(len(new_dataset)):
            img, txt = new_dataset[i]
            images.append(img)
            texts.append(txt)
        # Images go to <split>/dt, labels (bytes) to <split>/gt.
        f.create_dataset(f"{split}/dt", data=images)
        f.create_dataset(f"{split}/gt", data=texts)
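
One thing that may be worth checking in the code above: load_image normalises the pixels to [-1, 1] and only then scales by 255 and casts to uint8, which cannot represent the negative half of that range. A minimal sketch of an alternative, assuming the training code expects raw 0-255 pixels in the HDF5 file and normalises them itself at load time (that expectation is an assumption, not something confirmed in this thread; both helper names are hypothetical):

```python
import numpy as np


def to_storable_uint8(pixels01):
    """Scale floats in [0, 1] to uint8 without normalising first."""
    return np.clip(np.rint(pixels01 * 255.0), 0, 255).astype(np.uint8)


def normalize_at_load(stored):
    """Map stored uint8 pixels to [-1, 1], i.e. (x / 255 - 0.5) / 0.5."""
    return (stored.astype(np.float32) / 255.0 - 0.5) / 0.5


# Illustrative values only: black, mid-grey, white.
pixels = np.array([0.0, 0.5, 1.0])
stored = to_storable_uint8(pixels)
restored = normalize_at_load(stored)
```

Keeping the file in plain uint8 and normalising on read means the stored values stay inside the range the dtype can hold.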
Hi,
I have only used the pre-processing from Arthur's repo.
You can also look in main.py for the steps that create the dataset (link).
One more thing: ResNet requires the image to have 3 channels.
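A minimal sketch of turning a single-channel line image into a 3-channel one by repeating the grayscale plane. The function name is hypothetical, and the channels-last (H, W, 3) layout is an assumption; check which layout the data loader in the repository actually expects.

```python
import numpy as np


def to_three_channels(gray):
    """Repeat a (H, W) grayscale image along a new last axis to get (H, W, 3)."""
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)


# Tiny illustrative image, not real data.
gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
rgb = to_three_channels(gray)
print(rgb.shape)
```

All three channels hold the same values, so no information is added; the copy only satisfies the input shape of a network built for RGB images.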
OK, I will try to adapt this and keep you posted.
I report the following for the RWTH split of IAM:
13.9% CER, 36.87% WER, 94.96% SER
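For reference, a minimal sketch of how character, word, and sequence error rates can be computed from edit distance. It sums errors over the whole set; some evaluation scripts average per line instead, so this may not reproduce the numbers above exactly. The function names and sample strings are hypothetical, and empty reference sets are not handled.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rates(references, predictions):
    """Return (CER, WER, SER) as fractions over a list of text lines."""
    char_errs = char_total = word_errs = word_total = line_errs = 0
    for ref, hyp in zip(references, predictions):
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
        word_errs += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        line_errs += ref != hyp  # a line counts as wrong if anything differs
    return char_errs / char_total, word_errs / word_total, line_errs / len(references)


# Illustrative strings only.
cer, wer, ser = error_rates(["the cat", "a dog"], ["the cat", "a dig"])
```

Because SER counts a line as wrong when even one character differs, it can stay very high while CER is moderate.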
Hi,
That's a great improvement.
Did you find the issue?