
roatienza / straug

240 stars · 5 watchers · 36 forks · 2.61 MB

Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

License: Apache License 2.0

Language: Python (100.00%)
Topics: str, data-augmentation, scene-text-recognition

straug's Issues

Gaussian blur kernel size for small images

Currently, the kernel size is fixed to 31x31 (straug/blur.py, line 38 at commit 43f9ca9):

kernel = (31, 31)

This causes an error internally in the call to reflection_pad2d() if one of the image's dimensions is less than the kernel size:

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (15, 15) at dimension 3 of input 4

Should the kernel size be a percentage of the image's dimensions instead, e.g. 30-50% of the smaller dimension?
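
A minimal sketch of what that could look like, assuming the 30-50% heuristic from the question; the helper name and the clamping are illustrative, not part of straug:

def adaptive_kernel_size(width, height, frac=0.3):
    # frac ~ 0.3-0.5 of the smaller dimension, as suggested above
    k = max(int(min(width, height) * frac), 3)  # keep the kernel at least 3x3
    if k % 2 == 0:                              # Gaussian kernels need an odd size
        k += 1
    return (k, k)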

question about training speed

Thanks for your excellent work! Training seems very slow when I use straug (about 6x slower than without it). What speed did you observe in your tests? My augmentation code is below.

import random
import sys

import numpy as np
from PIL import Image


class RecStraugRandAug(object):
    """Apply `num_aug` randomly chosen straug transforms to each image."""

    def __init__(self, num_aug=2, prob=0.5, **kwargs):
        super().__init__()
        self.num_aug = num_aug
        self.prob = prob
        try:
            from straug.blur import GaussianBlur, DefocusBlur, MotionBlur, GlassBlur
            from straug.camera import Contrast, Brightness, JpegCompression, Pixelate
            from straug.geometry import Perspective, Shrink, Rotate
            from straug.noise import GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise
            from straug.pattern import Grid, VGrid, HGrid, RectGrid, EllipseGrid
            from straug.process import Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
            from straug.warp import Stretch, Distort, Curve
            from straug.weather import Fog, Snow, Frost, Rain, Shadow
            # One group per straug module; a transform is drawn by first
            # picking a group, then an op within that group.
            self.augs = [
                [GaussianBlur(), DefocusBlur(), MotionBlur(), GlassBlur()],
                [Contrast(), Brightness(), JpegCompression(), Pixelate()],
                [Perspective(), Shrink(), Rotate()],
                [GaussianNoise(), ShotNoise(), ImpulseNoise(), SpeckleNoise()],
                [Grid(), VGrid(), HGrid(), RectGrid(), EllipseGrid()],
                [Posterize(), Solarize(), Invert(), Equalize(), AutoContrast(), Sharpness(), Color()],
                [Stretch(), Distort(), Curve()],
                [Fog(), Snow(), Frost(), Rain(), Shadow()],
            ]
        except Exception as ex:
            print(f"exception: {ex}, you can install straug using `pip install straug`")
            sys.exit(1)

    def __call__(self, data):
        img = Image.fromarray(data["image"])
        for idx in range(self.num_aug):
            aug_type_idx = np.random.randint(0, len(self.augs))
            aug_idx = np.random.randint(0, len(self.augs[aug_type_idx]))
            # mag drawn from {-1, 0, 1, 2}; the op is applied with probability self.prob
            img = self.augs[aug_type_idx][aug_idx](img, mag=random.randint(-1, 2), prob=self.prob)
        data["image"] = np.array(img)
        return data
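
One common mitigation, assuming a PyTorch training loop: run the transform inside DataLoader worker processes so the CPU-bound augmentation overlaps with GPU work. A minimal sketch (the dataset name and batch size are illustrative):

from torch.utils.data import DataLoader

# `train_dataset` is assumed to apply RecStraugRandAug inside __getitem__;
# num_workers > 0 moves the augmentation into parallel worker processes.
loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=8)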

RandAugment

Hi, thanks for your work.

Have you implemented RandAugment?

In your paper:

geometry = [Rotate(), Perspective(), Shrink()]
noise = [GaussianNoise()]
blur = [MotionBlur()]
augmentations = [geometry, noise, blur]
img = RandAugment(img, augmentations, N=3)
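
A hedged sketch of what a RandAugment-style wrapper over straug ops could look like, following the paper's pseudocode above; straug does not appear to export a RandAugment function, so the sampling logic below is an assumption:

import random

def rand_augment(img, augmentations, N=3):
    # Assumption: draw N groups (with replacement), pick one op per group,
    # and chain the ops on the PIL image.
    for _ in range(N):
        group = random.choice(augmentations)
        op = random.choice(group)
        img = op(img, mag=random.randint(0, 2))
    return img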

How to deal with underfitting?

Hello, I am a new researcher, and I recently came across your code, which solves my problem to some extent. My project is also scene text recognition, but my dataset is much more irregular, and your paper gives me constructive guidance. However, there is still a problem: when N (the number of functions in each channel) grows larger, say 3 or 4, the model can hardly fit, and training-set accuracy stays around 90%. Moreover, if I add random cropping as a preprocessing step, the accuracy stays around 80%. Could you give me some suggestions for dealing with these problems? Thanks.

Questions about paper

First of all, thank you for your great work.
I read the paper and have two questions:
1. Why is there no CRNN line in Figure 13? (screenshot attached)
2. What is the N corresponding to Table 5? Is it the best result from a grid search?

enhance a dataset

Thanks for your work! How can I perform batch operations? I want to augment an entire dataset.
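
A minimal sketch, assuming a directory of cropped word images; the paths and the choice of Distort are illustrative, not a straug batch API:

from pathlib import Path
from PIL import Image
from straug.warp import Distort

distort = Distort()
src = Path("data/crops")        # assumed input folder of word crops
dst = Path("data/crops_aug")
dst.mkdir(parents=True, exist_ok=True)
for path in src.glob("*.png"):
    img = Image.open(path).convert("RGB")
    distort(img, mag=1).save(dst / path.name)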

how to create a glared image

@roatienza, thanks for the great repo.
I want to augment an image so that it looks glared, like this: (example image attached)

How can I do this kind of augmentation? I think it is called a "glared image".
Thanks for your help.
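
As far as I can tell, straug has no dedicated glare transform, so here is a hedged sketch that approximates glare by compositing a radial white highlight onto the image; the center, radius, and strength defaults are illustrative:

import numpy as np
from PIL import Image

def add_glare(img, center=(0.7, 0.3), radius=0.4, strength=0.8):
    # center is in fractional (x, y) image coordinates; radius is a fraction
    # of the shorter side; all defaults are assumptions for illustration.
    w, h = img.size
    yy, xx = np.mgrid[0:h, 0:w]
    cx, cy = center[0] * w, center[1] * h
    d = np.sqrt((xx - cx) ** 2 + (yy - cy) ** 2) / (radius * min(w, h))
    mask = np.clip(1.0 - d, 0.0, 1.0) * strength   # radial falloff to zero
    arr = np.asarray(img.convert("RGB")).astype(np.float32)
    out = arr + mask[..., None] * (255.0 - arr)    # blend each pixel toward white
    return Image.fromarray(out.astype(np.uint8))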
