Comments (6)

him4318 commented on June 12, 2024

Hi

It shouldn't be this bad. A few things to check: that the maximum line length is set correctly, that the same pre-processing steps are applied to the train and test images, and whether the training and validation losses show the model over-fitting.

Finally, you can train your model using the arthurflor repository and check the metrics.

from transformer-ocr.

ysig commented on June 12, 2024

Hi,

It is overfitting, but since the model with the best validation loss is the one that gets saved, this shouldn't be a problem.
The only thing that could be going wrong is my constructor for the HDF file.
Do you know where I can find a reference for constructing the data properly in the form you need?

Thank you,

PS: I used the following code:

import torch
from torch import nn
from torch.utils.data import Subset, Dataset, DataLoader
from torch.nn import functional as F
import torchvision
import math
import numpy as np
from torchvision.transforms.functional import resize, pil_to_tensor, normalize
import os
import PIL
import copy
from tqdm import trange

import h5py

DATA_DIR = "<PATH-TO>/IAM/lines/"
ASCII_DIR = "<PATH-TO>/IAM/ascii/"
SPLIT_DIR = "<PATH-TO>/RWTH/splits"

def load_image(path, max_len=1024):
    img = PIL.Image.open(path).convert('L')
    array = torch.Tensor(np.array(img)).unsqueeze(0).permute(0, 2, 1).float()/255.0
    img = resize(array, size=128).permute(0, 2, 1)
    img = normalize(img, (0.5,), (0.5,))
    a = nn.ZeroPad2d((0, max_len-img.size()[2], 0, 0))(img)
    a = a.permute(0, 2, 1).squeeze(0).cpu().numpy()
    assert a.shape == (1024, 128)
    return (a*255).astype(np.uint8)

def read_lines_text(annotation_txt):
    data = []
    with open(annotation_txt, 'r') as f:
        for line in f.readlines():
            line = line.strip('\n')
            if line.startswith('#'):
                continue
            else:
                spl = line.split(' ')
                image_dir = spl[0].split('-')
                data.append((os.path.join(image_dir[0], image_dir[0]+'-'+image_dir[1], spl[0]+'.png'), ' '.join(spl[8:])))
    return data

def load_text(inp):
    return inp.replace("|", " ").encode()

class IAM(Dataset):
    def __init__(self, annotation_txt, image_folder):
        self.data = read_lines_text(annotation_txt)
        self.image_folder = image_folder

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        img_path, txt = self.data[idx]
        img = load_image(os.path.join(self.image_folder, img_path))
        txt = load_text(txt)
        return img, txt

    def subset(self, file):
        new_dataset = copy.deepcopy(self)
        valid_key = {str(l.strip('\n')) for l in open(file, 'r').readlines()}
        new_data = []
        for d, i in new_dataset.data:
            ds = os.path.split(os.path.split(d)[0])[1]
            if ds in valid_key:
                new_data.append((d, i))
        new_dataset.data = new_data
        return new_dataset

splits = {'train': 'train.uttlist', 'test': 'test.uttlist', 'valid': 'validation.uttlist'}
with h5py.File("IAM.hdf5", "w") as f:
    dataset = IAM(os.path.join(ASCII_DIR, 'lines.txt'), DATA_DIR)
    for split in ['train', 'test', 'valid']:
        new_dataset = dataset.subset(os.path.join(SPLIT_DIR, splits[split]))
        images, texts = [], []
        for i in trange(len(new_dataset)):
            img, txt = new_dataset[i]
            images.append(img)
            texts.append(txt)
        f.create_dataset(f"{split}/dt", data=images)
        f.create_dataset(f"{split}/gt", data=texts)
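
One thing that may be worth checking in `load_image` above: `normalize(img, (0.5,), (0.5,))` maps pixel values to [-1, 1], and the result is then multiplied by 255 and cast to `uint8`, which cannot hold negative values. A minimal sketch of an alternative, assuming the HDF5 file should store plain [0, 255] grayscale and that normalization is deferred to training time (both are assumptions, not something confirmed in this thread; `to_uint8` and `pad_to_length` are illustrative names):

```python
import numpy as np

def to_uint8(img01):
    """Scale a float image with values in [0, 1] to uint8 in [0, 255]."""
    # Clip first so out-of-range values cannot wrap around in the cast.
    clipped = np.clip(img01, 0.0, 1.0)
    return np.round(clipped * 255.0).astype(np.uint8)

def pad_to_length(img, max_len=1024, fill=0):
    """Right-pad a (length, height) image along axis 0 up to max_len rows."""
    length, height = img.shape
    if length > max_len:
        raise ValueError(f"image length {length} exceeds max_len {max_len}")
    out = np.full((max_len, height), fill, dtype=img.dtype)
    out[:length] = img
    return out
```

The explicit length check also makes over-long lines fail loudly instead of being cropped silently by a negative padding amount.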

him4318 commented on June 12, 2024

Hi,

I have only used the pre-processing from Arthur's repo.
You can also look in main.py for the steps used to create the dataset (link).
One more thing: ResNet requires the image to have 3 channels.
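
For the channel requirement, a minimal sketch, assuming the stored line image is a single-channel NumPy array of shape (height, width) and that repeating the grayscale plane three times is acceptable. The function name `to_three_channels` is made up for illustration and is not taken from the repo:

```python
import numpy as np

def to_three_channels(gray):
    """Repeat a (H, W) grayscale image into a (3, H, W) array."""
    if gray.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {gray.shape}")
    # Add a leading channel axis, then copy the plane three times.
    return np.repeat(gray[np.newaxis, :, :], 3, axis=0)

line = np.zeros((128, 1024), dtype=np.uint8)
print(to_three_channels(line).shape)  # (3, 128, 1024)
```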

ysig commented on June 12, 2024

Ok I will try to adapt this and I will keep you posted.

ysig commented on June 12, 2024

I report the following for the RWTH split of IAM:
13.9% CER, 36.87% WER, 94.96% SER
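
For readers unfamiliar with these metrics, here is a rough sketch of how character, word, and sequence error rates are commonly computed from edit distance. It is not the evaluation code used for the numbers above; the function names are illustrative, and errors are pooled over all lines here, whereas some evaluation scripts average per-line rates instead, which gives slightly different numbers:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rates(references, hypotheses):
    """Return (CER, WER, SER) as fractions, pooled over all lines."""
    char_err = char_tot = word_err = word_tot = line_err = 0
    for ref, hyp in zip(references, hypotheses):
        char_err += edit_distance(ref, hyp)
        char_tot += len(ref)
        word_err += edit_distance(ref.split(), hyp.split())
        word_tot += len(ref.split())
        line_err += ref != hyp  # a line counts as wrong on any mismatch
    return char_err / char_tot, word_err / word_tot, line_err / len(references)
```

Because SER counts a line as wrong if even one character differs, it can stay very high while CER is comparatively low.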

him4318 commented on June 12, 2024

Hi,
That's a great improvement.
Did you find the issue?
