josephcatrambone / pytorchtextoverlaydataset

A PyTorch Dataset Adapter to Composite Text and Images

License: MIT License

Python 99.39% Shell 0.61%

pytorchtextoverlaydataset's Introduction

TextOverlayDataset

A meta-dataset builder to combine text datasets and image datasets.

License:

This software is dual licensed as MIT or GPLv2 at the discretion of the user. The MIT license is included in the LICENSE file.

Citation:

A citation is completely optional but would be very much appreciated if you use this project in your research.

@software{TextOverlayDataset,
  author = {Catrambone, Joseph},
  title = {{Text Overlay Dataset}},
  url = {https://github.com/JosephCatrambone/PyTorchTextOverlayDataset},
  version = {0.1.1},
  year = {2023},
  month = {06},
}

Recipes:

While the documentation in the TextOverlayDataset constructor is extensive, sometimes one simply wants recipes.

Towards that end:

# Basic Minimal Usage:

# %pip install text-overlay-dataset
from text_overlay_dataset import TextOverlayDataset
from PIL import Image

ds = TextOverlayDataset(
    image_dataset = [Image.new("RGB", (256, 256)), ], 
    text_dataset = ["Hello", "World"], 
    font_directory="<path to ttf dir>"
)

composite_image, text, etc = ds[0]

# composite_image is the 0th image with a randomly selected text.
# text is the given text that was selected.
# etc is an object with axis-aligned bounding box, font name, and so on.

# If desired, one can specify `randomly_choose='image'` in the constructor
# and text will be accessed sequentially with random images instead.

# Augmenting the text and making it harder to read by blurring, rotating, etc.

from text_overlay_dataset import TextOverlayDataset
from torchtext.datasets import IMDB  # A text dataset should be mappable.
from torchvision.datasets.fakedata import FakeData  # Any mappable image dataset is fine, or just a list of Images.

# FakeData items are (image, label) tuples, so keep only the images.
image_dataset = [image for image, _label in FakeData(size=100, image_size=(3, 256, 256))]

text_dataset_iter = IMDB(split='train')
text_dataset = [label_text[1] for label_text in text_dataset_iter] 

ds = TextOverlayDataset(
    image_dataset,
    text_dataset,
    font_directory="./fonts/",
    maximum_font_translation_percent=0.5,
    maximum_font_rotation_percent=0.25,
    maximum_font_blur=3.0
)

# Any torchvision transform can be used as part of the preprocessing.
# Perhaps your model requires images to be cropped to 512x512.
# (CenterCrop zero-pads images that are smaller than the requested size.)
from torchvision.transforms import CenterCrop
ds = TextOverlayDataset(
    image_dataset = image_dataset,
    text_dataset = ["Hello", "World"],  # This can also be a PyTorch text dataset.
    font_directory = "fonts",
    maximum_font_translation_percent=0.4,
    maximum_font_rotation_percent=0.5,
    maximum_font_blur=3.0,
    prefer_larger_fonts=True,
    pre_composite_transforms=[CenterCrop([512,])],
    # post_composite_transforms are also possible.
)

# To try to fill each image with text, set prefer_larger_fonts to use the maximum font size.
ds = TextOverlayDataset(
    image_dataset = image_dataset,
    text_dataset = ["Hello", "World"],  # This can also be a PyTorch text dataset.
    font_directory = "fonts",
    prefer_larger_fonts = True,
    # Or you can specify `font_sizes = [36, 48, ...]`
)

# If your dataset has many long strings without line breaks, consider setting
# long_text_behavior='truncate_then_shrink' to avoid producing many null texts.
ds = TextOverlayDataset(
    image_dataset = image_dataset,
    text_dataset = ["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA!!!!!!"],
    font_directory = "fonts",
    long_text_behavior = 'truncate_then_shrink',
)
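The comments in the minimal-usage recipe describe how an index maps to an (image, text) pair: by default images are read in order and text is sampled, and `randomly_choose='image'` flips that. The following is a hypothetical sketch of that pairing rule only, not the library's code; `pair_for_index` and `sequential_length` are illustrative names, and the assumption that the default behaves like `randomly_choose='text'` is inferred from the comments.

```python
import random
from typing import Optional, Sequence, Tuple, TypeVar

ImageT = TypeVar("ImageT")


def sequential_length(images: Sequence, texts: Sequence[str], randomly_choose: str = "text") -> int:
    """Length of the side that is read in order (hypothetical rule)."""
    if randomly_choose == "text":
        return len(images)
    if randomly_choose == "image":
        return len(texts)
    raise ValueError(f"randomly_choose must be 'text' or 'image', got {randomly_choose!r}")


def pair_for_index(
    images: Sequence[ImageT],
    texts: Sequence[str],
    idx: int,
    randomly_choose: str = "text",  # assumed default, per the recipe comments
    rng: Optional[random.Random] = None,
) -> Tuple[ImageT, str]:
    """Map a dataset index to an (image, text) pair.

    randomly_choose="text":  images[idx] is paired with a randomly sampled text.
    randomly_choose="image": texts[idx] is paired with a randomly sampled image.
    """
    if rng is None:
        rng = random.Random()
    if randomly_choose == "text":
        return images[idx], rng.choice(texts)
    if randomly_choose == "image":
        return rng.choice(images), texts[idx]
    raise ValueError(f"randomly_choose must be 'text' or 'image', got {randomly_choose!r}")


if __name__ == "__main__":
    images = ["image0", "image1", "image2"]
    texts = ["Hello", "World"]
    rng = random.Random(0)  # seeded so the sampled texts are reproducible
    for i in range(sequential_length(images, texts)):
        print(pair_for_index(images, texts, i, rng=rng))
```

Under this rule the dataset length follows whichever side is read sequentially, which is why switching `randomly_choose` can change how many samples one pass over the dataset yields.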

TODO:

  • Add toggle to prefer larger fonts first?
  • Fix bounds checking on rotation so we don't put text off the edge of the image.
  • Add automatic line-breaking to fit long text inside image areas.
  • Check for sampling biases in the random generations.
  • Support streaming datasets.
  • Verify RTL languages.
  • Verify Unicode line breaks and non-English fonts.
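One common way to approach the automatic line-breaking item is greedy word wrapping against a measured width. This is a minimal sketch, assuming a caller-supplied `measure` function that returns a string's rendered width; `greedy_wrap` is a hypothetical helper, not part of `text_overlay_dataset`. It splits on whitespace only, so it does not implement Unicode line-breaking rules (UAX #14) and will not break scripts written without spaces, which is exactly what the last TODO item would need to verify.

```python
from typing import Callable, List


def greedy_wrap(text: str, max_width: float, measure: Callable[[str], float]) -> List[str]:
    """Greedily pack whitespace-separated words into lines no wider than max_width.

    `measure` returns the rendered width of a string (e.g. in pixels).
    A single word wider than max_width is kept on its own line; it would still
    need truncation or a smaller font to fit.
    """
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if not current or measure(candidate) <= max_width:
            # Either the line is empty (always accept one word) or the word still fits.
            current = candidate
        else:
            # The word does not fit: close the current line and start a new one.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    # Character count stands in for pixel width here.
    print(greedy_wrap("the quick brown fox", 10, len))
```

With Pillow, a pixel-accurate `measure` could be the `getlength` method of a font loaded via `ImageFont.truetype` (available in Pillow 8.0 and later).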

pytorchtextoverlaydataset's People

Contributors

josephcatrambone

Forkers

wendlerc

pytorchtextoverlaydataset's Issues

`fast_conservative_theta_range` is picking invalid theta values. Text gets placed outside of image bounds.

The fast_conservative_theta_range function is designed to take an outer_width, outer_height, and inner bounding box. It returns a min_theta and max_theta which indicate how much the inner box can be rotated around its centerpoint. We have proven mathematically that this angle range is conservative and should never result in a rotation which places text outside of the outer_rect.

YET HERE WE ARE.

About 0.25% (one in four hundred) of the resulting AABBs have a corner with an x value of less than one. These corners generally fall outside the image by one to five pixels.

Expectation: selecting a theta such that min_theta <= theta <= max_theta and then rotating the bounding_box by theta should yield four corner points that all satisfy 0 < x < outer_width and 0 < y < outer_height.

Observation: once in a while, we see x < 0.
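The expectation above can be turned into a brute-force check: rotate the inner box's corners about its center for thetas sampled across [min_theta, max_theta] and test each corner against 0 < x < outer_width and 0 < y < outer_height. This is a minimal sketch, assuming theta in radians and a box given as (left, top, right, bottom); the helper names are hypothetical and it does not reproduce `fast_conservative_theta_range` itself.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed convention
Point = Tuple[float, float]


def rotated_corners(box: Box, theta: float) -> List[Point]:
    """Rotate the four corners of an axis-aligned box about its center by theta radians."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in corners
    ]


def inside_outer(points: List[Point], outer_width: float, outer_height: float) -> bool:
    """True if every corner satisfies 0 < x < outer_width and 0 < y < outer_height."""
    return all(0 < x < outer_width and 0 < y < outer_height for x, y in points)


def count_violations(box: Box, outer_width: float, outer_height: float,
                     min_theta: float, max_theta: float, samples: int = 1000) -> int:
    """Evenly sample theta in [min_theta, max_theta] (endpoints included) and
    count rotations that put a corner outside the outer rectangle."""
    steps = max(samples - 1, 1)
    violations = 0
    for i in range(samples):
        theta = min_theta + (max_theta - min_theta) * i / steps
        if not inside_outer(rotated_corners(box, theta), outer_width, outer_height):
            violations += 1
    return violations
```

Feeding it the box, the outer size, and the returned min_theta/max_theta from a failing sample would show whether the range itself is too wide or whether the out-of-bounds corner is introduced later in the pipeline.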

Minimal Usage Example Typo

# Basic Minimal Usage:

# %pip install text-overlay-dataset
from text_overlay_dataset import TextOverlayDataset
from PIL import Image

ds = TextOverlayDataset(
    image_dataset = [Image.new("RGB", (256, 256)), ], 
    text_dataset = ["Hello", "World"], 
    fonts="<path to ttf dir>"
)

composite_image, text, text_raster, aabb = ds[0]

The `fonts` keyword parameter was renamed to `font_directory`.

Examples from the readme don't work

# Augmenting the text and making it harder to read by blurring, rotating, etc.

from text_overlay_dataset import TextOverlayDataset
from torchtext.datasets import IMDB # A text dataset should be mappable.
from torchvision.datasets.fakedata import FakeData # Any mappable image dataset is fine, or just a list of Images.

image_dataset = FakeData(size=100, image_size=(3, 256, 256),)

text_dataset_iter = IMDB(split='train')
text_dataset = [label_text[1] for label_text in text_dataset_iter]

ds = TextOverlayDataset(
    image_dataset,
    text_dataset,
    font_directory="./fonts/",
    maximum_font_translation_percent=0.5,
    maximum_font_rotation_percent=0.25,
    maximum_font_blur=3.0
)

There are multiple errors:

FakeData returns (image, label) tuples, so changing the image dataset to image_dataset = [d[0] for d in image_dataset] would make sense. Even with that change, I still get an error:

522 text_color_block.putalpha(result.text_rasterization)
523 img_pil.paste(text_color_block, (0, 0), result.text_rasterization)
524 result.image = img_pil

AttributeError: 'NoneType' object has no attribute 'text_rasterization'
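The `[d[0] for d in image_dataset]` workaround in this issue loads every image up front. A lazy alternative is a thin wrapper that drops the label only when an item is accessed. This is a minimal sketch, assuming `TextOverlayDataset` only needs `len()` and integer indexing on its image dataset (the README says any mappable image dataset is fine); `FirstElement` is a hypothetical name, and it does nothing about the separate `AttributeError` above.

```python
from typing import Any, Sequence


class FirstElement:
    """Lazily expose item[0] of every entry in a mappable dataset.

    Intended for datasets whose items are (image, label) tuples, such as
    torchvision's FakeData, when only the images are needed.
    """

    def __init__(self, dataset: Sequence[Any]) -> None:
        self.dataset = dataset

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        # Fetch the underlying item only when asked, then drop the label.
        return self.dataset[idx][0]


if __name__ == "__main__":
    pairs = [("image0", 0), ("image1", 1)]
    images_only = FirstElement(pairs)
    print(len(images_only), images_only[1])
```

In the recipe above this would be used as `image_dataset = FirstElement(FakeData(size=100, image_size=(3, 256, 256)))`, keeping memory use independent of the dataset size.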
