
tab-aug's Introduction

Dataset Augmentation

Use the augment_data.py script to augment a dataset. You must provide directories containing the images and their corresponding XML and OCR files. If OCR files are not available, you can provide an empty directory; the program will generate the missing OCR files and save them there. The -log/--log_file flag writes warnings and errors to a text file. If no file name is provided, logging is skipped.
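The pairing of inputs described above could look roughly like the following minimal sketch. It assumes (this is not confirmed by the script) that an image, its XML file and its OCR file share the same base name, that XML files end in .xml, and that OCR files use a hypothetical .txt extension. The function name pair_inputs and the accepted image extensions are likewise illustrative.

```python
from pathlib import Path

# Hypothetical set of image extensions; the real script may accept others.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pair_inputs(image_dir, xml_dir, ocr_dir, ocr_ext=".txt"):
    """Match each image to an XML and an OCR file with the same base name.

    Assumption: files are paired by base name; ocr_ext is a hypothetical
    default. Returns (pairs, missing_ocr), where pairs is a list of
    (image, xml, ocr) paths and missing_ocr lists OCR paths that do not
    exist yet and would have to be generated and saved into ocr_dir.
    """
    pairs, missing_ocr = [], []
    for img in sorted(Path(image_dir).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        xml = Path(xml_dir) / (img.stem + ".xml")
        if not xml.exists():
            continue  # no annotation for this image, so it cannot be augmented
        ocr = Path(ocr_dir) / (img.stem + ocr_ext)
        if not ocr.exists():
            missing_ocr.append(ocr)  # to be generated and cached in ocr_dir
        pairs.append((img, xml, ocr))
    return pairs, missing_ocr
```

Images without an XML file are skipped here, while a missing OCR file only marks the pair for OCR generation, mirroring the README's statement that OCR output is created on demand.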

Note: the script never overwrites files it has already generated, so you can run it multiple times with the same output directory without worrying about earlier output being replaced.
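One way to get this no-overwrite behavior is exclusive file creation. The sketch below is illustrative only: the helper write_without_overwrite and the <stem>_<k><suffix> naming scheme are hypothetical and not taken from augment_data.py.

```python
from pathlib import Path


def write_without_overwrite(out_dir, stem, suffix, data: bytes):
    """Write data to out_dir/<stem>_<k><suffix> using the smallest free k.

    Hypothetical naming scheme. Existing files are never touched, so
    repeated runs into the same directory only add new files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    k = 0
    while True:
        path = out_dir / f"{stem}_{k}{suffix}"
        try:
            # Mode "xb" creates the file exclusively and raises
            # FileExistsError if it already exists.
            with open(path, "xb") as fh:
                fh.write(data)
            return path
        except FileExistsError:
            k += 1  # name taken by an earlier run; try the next index
```

Opening with mode "xb" makes the existence check and the file creation a single step, so two runs writing to the same directory cannot silently replace each other's files.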

Requirement:

pip install truthpy

Usage

usage: augment_data.py [-h] -img IMAGE_DIR -xml XML_DIR -ocr OCR_DIR -n NUM_SAMPLES
                       -o OUT_DIR [-log LOG_FILE] [-vis]

optional arguments:
  -h, --help            show this help message and exit
  -img IMAGE_DIR, --image_dir IMAGE_DIR
                        Directory for images
  -xml XML_DIR, --xml_dir XML_DIR
                        Directory for xmls
  -ocr OCR_DIR, --ocr_dir OCR_DIR
                        Directory for ocr files. (If an OCR file is not found,
                        it will be generated and saved in this directory for
                        future use)
  -n NUM_SAMPLES, --num_samples NUM_SAMPLES
                        Number of augmented samples to generate
  -o OUT_DIR, --out_dir OUT_DIR
                        Output directory for generated data
  -log LOG_FILE, --log_file LOG_FILE
                        Output file path for error logging.
  -vis, --visualize

Example command: python augment_data.py -img data/images/ -xml data/xmls/ -ocr data/ocr/ -n 100 -o augmented_data/ -log error_logs.txt -vis
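For reference, the help text above corresponds to an argparse parser along these lines, together with the optional file logging described earlier. This is a reconstruction, not the project's actual source. Treating -n as an integer, -vis as a plain on/off flag (the usage line shows it takes no value), the logger name "augment", and the helpers build_parser and configure_logging are all assumptions.

```python
import argparse
import logging


def build_parser():
    """Reconstructed CLI matching the help text (types are assumptions)."""
    p = argparse.ArgumentParser(prog="augment_data.py")
    p.add_argument("-img", "--image_dir", required=True, help="Directory for images")
    p.add_argument("-xml", "--xml_dir", required=True, help="Directory for xmls")
    p.add_argument("-ocr", "--ocr_dir", required=True,
                   help="Directory for ocr files (missing ones are generated here)")
    # Assumption: the sample count is parsed as an integer.
    p.add_argument("-n", "--num_samples", type=int, required=True,
                   help="Number of augmented samples to generate")
    p.add_argument("-o", "--out_dir", required=True,
                   help="Output directory for generated data")
    p.add_argument("-log", "--log_file", default=None,
                   help="Output file path for error logging.")
    # Assumption: -vis is a boolean switch, since it takes no value.
    p.add_argument("-vis", "--visualize", action="store_true")
    return p


def configure_logging(log_file):
    """Send warnings and errors to log_file; skip file logging if None."""
    logger = logging.getLogger("augment")  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.setLevel(logging.WARNING)
    if log_file:
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    else:
        # NullHandler discards records, so nothing is logged anywhere.
        logger.addHandler(logging.NullHandler())
    return logger
```

Parsing the example command's arguments with build_parser() yields image_dir="data/images/", num_samples=100, log_file="error_logs.txt" and visualize=True; omitting -log leaves log_file as None, which configure_logging treats as "skip logging".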


tab-aug's Issues

About the format of XML files

Hi, at line 98 of generate_samples.py, I found that doc is empty after calling Document(xml_file). Is something wrong with the format of my XML files? Could you provide an example XML file?
