Hi, I recently trained and tested your model on the RWTH split of IA

Low performance when training your model on RWTH split on IAM (transformer-ocr, closed issue)

Comments (6)

him4318 commented on June 12, 2024

It shouldn't be this bad. A few things to check: that the max length of the line is set correctly, that the same pre-processing steps are applied to the train and test images, and whether the training and validation losses show that the model is over-fitting.

Finally, you can train your model using the arthurflor repository and check the metrics.
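
The first and last of these checks can be sketched in a few lines. This is a minimal sketch, not code from either repository: the helper names `find_overlong` and `overfitting_summary` are hypothetical, and it assumes that "max length" means the maximum label length in characters that the model is configured for.

```python
def find_overlong(labels, max_text_length):
    """Return the indices of labels longer than the configured maximum length."""
    return [i for i, text in enumerate(labels) if len(text) > max_text_length]


def overfitting_summary(train_losses, val_losses):
    """Summarize per-epoch loss histories (one float per epoch).

    Returns the epoch with the lowest validation loss, that loss, and the
    gap between validation and training loss at the final epoch.
    """
    best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
    return {
        "best_epoch": best_epoch,
        "best_val_loss": val_losses[best_epoch],
        "final_gap": val_losses[-1] - train_losses[-1],
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the thread.
    print(find_overlong(["a short line", "x" * 200], max_text_length=100))
    print(overfitting_summary([2.0, 1.0, 0.5, 0.2], [2.1, 1.2, 1.0, 1.4]))
```

If `best_epoch` is well before the last epoch and `final_gap` is large, the model has probably over-fitted after that point, which is harmless only when the checkpoint from `best_epoch` is the one being evaluated.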

ysig commented on June 12, 2024

Hi,

It is overfitting, but you save the model with the best validation loss, so this shouldn't be a problem.
The only thing that could be going wrong is my constructor for the HDF5 file.
Do you know where I can find a reference for properly constructing the data in the form that you need?

Thank you,

PS: I used the following code:

import torch
from torch import nn
from torch.utils.data import Subset, Dataset, DataLoader
from torch.nn import functional as F
import torchvision
import math
import numpy as np
from torchvision.transforms.functional import resize, pil_to_tensor, normalize
import os
import PIL
import copy
from tqdm import trange

import h5py

DATA_DIR = "<PATH-TO>/IAM/lines/"
ASCII_DIR = "<PATH-TO>/IAM/ascii/"
SPLIT_DIR = "<PATH-TO>/RWTH/splits"

def load_image(path, max_len=1024):
    img = PIL.Image.open(path).convert('L')
    array = torch.Tensor(np.array(img)).unsqueeze(0).permute(0, 2, 1).float()/255.0
    img = resize(array, size=128).permute(0, 2, 1)
    img = normalize(img, (0.5,), (0.5,))
    a = nn.ZeroPad2d((0, max_len-img.size()[2], 0, 0))(img)
    a = a.permute(0, 2, 1).squeeze(0).cpu().numpy()
    assert a.shape == (max_len, 128)
    return (a*255).astype(np.uint8)

def read_lines_text(annotation_txt):
    data = []
    with open(annotation_txt, 'r') as f:
        for line in f.readlines():
            line = line.strip('\n')
            if line.startswith('#'):
                continue
            else:
                spl = line.split(' ')
                image_dir = spl[0].split('-')
                data.append((os.path.join(image_dir[0], image_dir[0]+'-'+image_dir[1], spl[0]+'.png'), ' '.join(spl[8:])))
    return data

def load_text(inp):
    return inp.replace("|", " ").encode()

class IAM(Dataset):
    def __init__(self, annotation_txt, image_folder):
        self.data = read_lines_text(annotation_txt)
        self.image_folder = image_folder

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        img_path, txt = self.data[idx]
        img = load_image(os.path.join(self.image_folder, img_path))
        txt = load_text(txt)
        return img, txt

    def subset(self, file):
        new_dataset = copy.deepcopy(self)
        valid_key = {str(l.strip('\n')) for l in open(file, 'r').readlines()}
        new_data = []
        for d, i in new_dataset.data:
            ds = os.path.split(os.path.split(d)[0])[1]
            if ds in valid_key:
                new_data.append((d, i))
        new_dataset.data = new_data
        return new_dataset

splits = {'train': 'train.uttlist', 'test': 'test.uttlist', 'valid': 'validation.uttlist'}
with h5py.File("IAM.hdf5", "w") as f:
    dataset = IAM(os.path.join(ASCII_DIR, 'lines.txt'), DATA_DIR)
    for split in ['train', 'test', 'valid']:
        new_dataset = dataset.subset(os.path.join(SPLIT_DIR, splits[split]))
        images, texts = [], []
        for i in trange(len(new_dataset)):
            img, txt = new_dataset[i]
            images.append(img)
            texts.append(txt)
        f.create_dataset(f"{split}/dt", data=images)
        f.create_dataset(f"{split}/gt", data=texts)
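
One detail in the snippet above is worth checking. This is an observation only, and the thread does not confirm it as the cause. `load_image` normalizes pixels to [-1, 1] and then multiplies by 255 before casting to `uint8`, so every pixel darker than mid-gray becomes negative and does not fit the 0 to 255 range. The pure-Python sketch below uses scalar values and hypothetical helper names to show the range problem, and one possible way to undo the normalization before quantizing.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel in [0, 1] to [-1, 1] (element-wise mean/std normalization)."""
    return (pixel - mean) / std


def scale_without_denormalizing(value):
    """Scale a normalized value by 255 directly, as done before the uint8 cast."""
    return value * 255


def to_uint8_safe(value, mean=0.5, std=0.5):
    """Undo the normalization first, then round and clamp into 0..255."""
    pixel = value * std + mean
    return max(0, min(255, round(pixel * 255)))


if __name__ == "__main__":
    black = normalize(0.0)  # darkest pixel after normalization: -1.0
    print(scale_without_denormalizing(black))  # negative, outside the uint8 range
    print(to_uint8_safe(black))                # back inside 0..255
```

An alternative under the same assumption is to store the un-normalized image in the HDF5 file and apply the normalization only when batches are loaded for training.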

him4318 commented on June 12, 2024

Hi,

I have only used the pre-processing from Arthur's repo.
You can also look in main.py for the steps used to create the dataset: link.
One more thing: ResNet requires the image to have 3 channels.
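
A grayscale line image can be given 3 channels by repeating its single channel. The following is a minimal sketch assuming NumPy arrays in a channels-last layout. The function name `to_three_channels` is hypothetical, and this is not necessarily how transformer-ocr does it.

```python
import numpy as np


def to_three_channels(gray):
    """Replicate a single-channel (H, W) image into an (H, W, 3) array."""
    gray = np.asarray(gray)
    if gray.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {gray.shape}")
    # Add a trailing channel axis, then copy it three times.
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)


if __name__ == "__main__":
    line = np.zeros((128, 1024), dtype=np.uint8)  # placeholder image
    print(to_three_channels(line).shape)
```

For a channels-first framework such as PyTorch, the result can be rearranged afterwards with `np.moveaxis(rgb, -1, 0)`.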

ysig commented on June 12, 2024

OK, I will try to adapt this and keep you posted.

ysig commented on June 12, 2024

I report the following for the RWTH split of IAM:
13.9% CER 36.87% WER 94.96% SER
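
For readers unfamiliar with these metrics: CER, WER and SER are the character, word and sequence (whole-line) error rates. The sketch below shows one common way to compute them with a Levenshtein distance. It is an illustrative sketch, not the evaluation code that produced the numbers above. The function names are hypothetical, and it aggregates errors over the whole test set, whereas some implementations average per-line rates instead and so give slightly different values.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ocr_metrics(references, predictions):
    """Return (CER, WER, SER) as fractions, aggregated over all lines."""
    char_errors = char_total = 0
    word_errors = word_total = 0
    wrong_lines = 0
    for ref, hyp in zip(references, predictions):
        char_errors += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errors += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        wrong_lines += ref != hyp  # a line counts as wrong if it is not an exact match
    return (char_errors / char_total,
            word_errors / word_total,
            wrong_lines / len(references))


if __name__ == "__main__":
    refs = ["the cat sat", "hello world"]   # made-up example lines
    hyps = ["the cat sat", "hallo world"]
    cer, wer, ser = ocr_metrics(refs, hyps)
    print(f"CER {cer:.2%}  WER {wer:.2%}  SER {ser:.2%}")
```

Because a single wrong character makes the whole line count as an error, SER is usually much higher than CER, which is consistent with the pattern in the figures reported above.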

him4318 commented on June 12, 2024

Hi,
That's a great improvement.
Did you find the issue?
