UHDT Data Set Creator

The UHDT Data Set Creator is a JavaScript application built with the JavaScript Image Manipulation Program (Jimp). It semi-autonomously composites targets onto background images, randomizing the size, color, shape, shape/letter orientation, location, and letter of each target to match what we could see in the competition, instead of requiring us to place shapes and randomize their attributes by hand in a photo editing program. As a result, the data set takes less time to generate and can be significantly larger. A large amount of data, on the order of 10^4 data points, is essential for accurate results.

In a nutshell, the program loops through all images in a directory and loads each one into memory. After loading an image, it loads a random shape and letter and uses a random number generator to choose a blur amount, color, orientation, location, and size. It then composites a target with those attributes onto the image in memory, and repeats this until every image in the directory has been processed. After an image has been composited, a command is issued to run the TensorFlow Record (TFRecord) generator. The generator requires the height, width, filename, image data, image format, bounding box minima and maxima, and classes, all of which are passed in as arguments. It writes its output to a .tfrecord file, which is required for training the object detection model. The whole process takes five seconds per image on average; by contrast, a manual process would take approximately 15 times longer.
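The directory loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the accepted file extensions and the names `listImageFiles`, `processDirectory`, and `processImage` are assumptions made for the example, and the compositing step is left as a callback.

```javascript
const fs = require('fs');
const path = require('path');

// Extensions treated as images. This set is an assumption, not taken from the project.
const IMAGE_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png']);

// Return the sorted paths of all image files directly inside `dir`.
function listImageFiles(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .filter((entry) => IMAGE_EXTENSIONS.has(path.extname(entry.name).toLowerCase()))
    .map((entry) => path.join(dir, entry.name))
    .sort();
}

// Hypothetical driver: `processImage` stands in for the compositing step.
// Returns the number of images that were visited.
function processDirectory(dir, processImage) {
  const files = listImageFiles(dir);
  for (const file of files) {
    processImage(file);
  }
  return files.length;
}
```

Subdirectories such as an output folder are skipped because only regular files are kept, so generated images would not be fed back into the loop.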

The program generates shapes that are accurate enough for the object detection model to learn from; the model only needs to learn the basic characteristics of each shape against backgrounds similar to those in the competition. We are exploiting the way a neural network works through data augmentation. For example, if a shape with otherwise identical characteristics is rotated by even one degree, the network sees it as a completely different shape. Thus, we can vary the same shape in different ways, which has a positive effect on the neural network's learning.

For instructions on training a model with the data set, see here.

Requirements

Node.js (latest LTS version)
Git

Set up

Install Node.js: go to https://nodejs.org/en/download/ and install the version for your operating system.
Install Git: go to https://git-scm.com/book/en/v2/Getting-Started-Installing-Git and install the version for your operating system.
Clone this repository: git clone https://github.com/spjy/uhdt.git
Change directories into the repository folder: cd uhdt
Install dependencies: npm i
Place images in the folder ./dataset
Run the script: npm start
Images and .tfrecord files are output to ./dataset/images/test/output/{imageName}_with_target.jpg

Directory Structure

The following is a description of various files and directories relative to the root directory.

dataset_creation.js - This is the script that appends a random shape to each image.

tfrecord_gen.py - This script automatically generates the .tfrecord file.

/shapes - This directory contains the shape files to append to each image.

/dataset - This directory contains the images that you would like to append the shapes to. To change the working directory, edit line 5 within dataset_creation.js.

/dataset/output - This is where you will find the outputted images/.tfrecord files.

Randomized Data Options

As mentioned in the overview, this program selects a random shape and letter and chooses a random blur amount, color, orientation, location, and size. The options are described in more detail below:

Blurriness Amount

Blurry pictures are possible when, for example, a slow shutter speed is combined with a high aircraft speed, so training on blurred targets is justified. To simulate this, we apply a Gaussian blur with a value from 1 to 7, where 1 is the least blurry.

Color

The program chooses randomly from the following colors, which are specified by AUVSI SUAS.

Red
Green
Blue
Black
White
Grey
Yellow
Purple
Brown
Orange

Shape (Class)

The program chooses randomly from the following shapes (in TensorFlow terms, classes), which are specified by AUVSI SUAS. All shapes are black by default so that they can easily be recolored with the colors listed above.

Cross
Ellipse
Half Circle
Heptagon
Octagon
Pentagon
Quarter Circle
Rectangle
Square
Star
Trapezoid
Triangle
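Because the shape files are black by default, recoloring a shape amounts to overwriting the color channels of its visible pixels. The sketch below shows the idea on a flat array of RGBA bytes, a layout many image libraries use for pixel data. The function name `recolorShape`, the RGB values in `SHAPE_COLORS`, and the choice to treat any pixel with non-zero alpha as part of the shape are assumptions for illustration, not the project's implementation.

```javascript
// RGB values for a few of the listed colors. The exact values are assumptions.
const SHAPE_COLORS = {
  red: [255, 0, 0],
  green: [0, 128, 0],
  blue: [0, 0, 255],
  orange: [255, 165, 0],
};

// Recolor a black shape stored as RGBA bytes (4 bytes per pixel).
// Every pixel that is not fully transparent receives the target RGB;
// the alpha channel is left unchanged. A new array is returned.
function recolorShape(rgba, [r, g, b]) {
  const out = Uint8ClampedArray.from(rgba);
  for (let i = 0; i < out.length; i += 4) {
    if (out[i + 3] > 0) {
      out[i] = r;
      out[i + 1] = g;
      out[i + 2] = b;
    }
  }
  return out;
}
```

Starting from a single-color source keeps the shape files small in number: one file per shape rather than one per shape and color combination.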

Orientation

A shape can be oriented at any angle from zero to three hundred and sixty degrees (0-360 degrees).
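Rotation also changes how much room a shape occupies, which matters when computing a bounding box. The source does not say how the program accounts for this, so the following is only a sketch of the standard geometry: the axis-aligned box enclosing a w by h rectangle rotated by an angle has width w|cos| + h|sin| and height w|sin| + h|cos|. The function name `rotatedBoundingSize` is hypothetical.

```javascript
// Width and height of the axis-aligned box that encloses a w x h rectangle
// after it has been rotated by `degrees` about its center.
function rotatedBoundingSize(w, h, degrees) {
  const theta = (degrees * Math.PI) / 180;
  const cos = Math.abs(Math.cos(theta));
  const sin = Math.abs(Math.sin(theta));
  return {
    width: w * cos + h * sin,
    height: w * sin + h * cos,
  };
}
```

For a square shape, the enclosing box is largest at 45 degrees, where each side grows by a factor of the square root of two.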

Size

A shape can be between fifty and seventy pixels in size (50-70 pixels).

Location

The shape can be placed at any random location within the image, with the usable area reduced by the dimensions of the shape plus ten pixels so that the shape stays inside the frame (the extra ten pixels are merely a buffer).
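Taken together, the options above can be drawn with a few calls to a random number generator. The following is a minimal sketch under stated assumptions rather than the project's code: the names `sampleTarget`, `randomInt`, and `pick` are hypothetical, letters are assumed to be uppercase A-Z, the size is treated as both the width and height of the shape, and the location rule is read as "image dimension minus (shape size plus ten pixels)". The generator is passed in as a parameter so the function can be tested deterministically.

```javascript
const SHAPE_NAMES = ['Cross', 'Ellipse', 'Half Circle', 'Heptagon', 'Octagon', 'Pentagon',
  'Quarter Circle', 'Rectangle', 'Square', 'Star', 'Trapezoid', 'Triangle'];
const COLOR_NAMES = ['Red', 'Green', 'Blue', 'Black', 'White', 'Grey', 'Yellow', 'Purple',
  'Brown', 'Orange'];
const BUFFER_PX = 10; // buffer described in the Location section

// Random integer in [min, max], both inclusive. `rng` returns a float in [0, 1).
function randomInt(min, max, rng = Math.random) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Random element of `items`.
function pick(items, rng = Math.random) {
  return items[Math.floor(rng() * items.length)];
}

// Draw one set of target attributes for an image of the given dimensions.
function sampleTarget(imageWidth, imageHeight, rng = Math.random) {
  const size = randomInt(50, 70, rng);
  const maxX = imageWidth - (size + BUFFER_PX);
  const maxY = imageHeight - (size + BUFFER_PX);
  if (maxX < 0 || maxY < 0) {
    throw new RangeError('image is too small for the target');
  }
  return {
    shape: pick(SHAPE_NAMES, rng),
    color: pick(COLOR_NAMES, rng),
    letter: String.fromCharCode(65 + randomInt(0, 25, rng)), // assumption: A-Z
    blur: randomInt(1, 7, rng),
    orientation: Math.floor(rng() * 360), // 360 degrees is the same as 0
    size,
    x: randomInt(0, maxX, rng),
    y: randomInt(0, maxY, rng),
  };
}
```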

TensorFlow Record (TFRecord) Generation

After the shape has been composited onto the image, the tfrecord_gen.py script generates a TFRecord file. The following arguments are required to generate the file:

height

Data Type: int

Description: The height of the image.

width

Data Type: int

Description: The width of the whole image.

filename

Data Type: string

Description: The name of the image file.

encoded_image_data

Data Type: binary

Description: The image, encoded in base64 format.

format

Data Type: enum ['jpeg', 'png']

Description: The format of the image, either jpeg or png.

xmin

Data Type: list of floats

Description: A list of normalized left x coordinates in bounding box (1 per box)

xmax

Data Type: list of floats

Description: A list of normalized right x coordinates in bounding box (1 per box)

ymin

Data Type: list of floats

Description: A list of normalized top y coordinates in bounding box (1 per box)

ymax

Data Type: list of floats

Description: A list of normalized bottom y coordinates in bounding box (1 per box)

text

Data Type: list of strings

Description: A list of strings of human readable class names.

label

Data Type: list of integers

Description: A list of integer values of the classes.
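The bounding box arguments are normalized, meaning each pixel coordinate is divided by the corresponding image dimension so that it falls between 0 and 1. A minimal sketch of that conversion follows; the function name `normalizeBox` and the shape of its input object are assumptions made for the example.

```javascript
// Convert a pixel-space box to normalized coordinates.
// `left`/`right` are divided by the image width, `top`/`bottom` by the image height.
function normalizeBox({ left, top, right, bottom }, imageWidth, imageHeight) {
  if (left > right || top > bottom) {
    throw new RangeError('box minima must not exceed box maxima');
  }
  return {
    xmin: left / imageWidth,
    xmax: right / imageWidth,
    ymin: top / imageHeight,
    ymax: bottom / imageHeight,
  };
}
```

For instance, a box spanning x = 20 to 403 and y = 203 to 1041 in an image 1080 pixels wide and 1920 pixels high yields xmin = 20 / 1080 and ymax = 1041 / 1920. With one target per image, each list argument would hold a single value.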

An example command to run the tfrecord_gen.py script:

python tfrecord_gen.py
  --height 1920
  --width 1080
  --filename 'IMAGE_NAME'
  --encoded_image_data
  --image_format 'jpeg'
  --xmins [20 / width]
  --xmaxs [403 / width]
  --ymins [203 / height]
  --ymaxs [1041 / height]
  --classes_text ['square']
  --classes [4]
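Since the Node.js script is the one that issues this command, the argument list could be assembled in JavaScript before the generator is launched. The sketch below reuses the flag names from the example command above. Everything else is an assumption: the function names `buildTfrecordArgs` and `encodeImage`, the field names of the `record` object, and in particular the use of JSON to serialize list values, since the source does not show how tfrecord_gen.py parses lists.

```javascript
// Base64-encode raw image bytes for the --encoded_image_data argument.
function encodeImage(bytes) {
  return Buffer.from(bytes).toString('base64');
}

// Build the argument array for the TFRecord generator.
// Assumption: list values are serialized as JSON; adjust to match the parser in use.
function buildTfrecordArgs(record) {
  const asList = (values) => JSON.stringify(values);
  return [
    'tfrecord_gen.py',
    '--height', String(record.height),
    '--width', String(record.width),
    '--filename', record.filename,
    '--encoded_image_data', record.encodedImageData,
    '--image_format', record.imageFormat,
    '--xmins', asList(record.xmins),
    '--xmaxs', asList(record.xmaxs),
    '--ymins', asList(record.ymins),
    '--ymaxs', asList(record.ymaxs),
    '--classes_text', asList(record.classesText),
    '--classes', asList(record.classes),
  ];
}
```

The resulting array could then be handed to Node's `child_process.spawn`, for example `spawn('python', buildTfrecordArgs(record))`. Passing arguments as an array rather than a single command string avoids shell quoting problems with filenames.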
