Boulderdash

Chris Kinsey

USU CS5600 Project 2

Objective

The purpose of this project is to perform image segmentation on images of indoor rock climbing walls. Indoor climbing walls can get very cluttered, and the hope with this project is to eventually draw highlights dynamically over images of climbing walls so that all the rocks in a route can be seen quickly. The first step toward that goal is to apply TensorFlow to highlight all rocks on a wall, regardless of which route they are part of.

Environment

  1. Windows 10

  2. Python 3.6.3

  3. OpenCV 3.4.4

  4. TensorFlow 1.2.1

Running the Application

Simply run main.py to train a new neural network on the provided images.

To run the unit tests, run unit_tests.py. These should both work out of the box if run in the above environment.

Data

I was not able to find any datasets of segmented climbing wall images, so I had to collect them myself. All images were taken on a Samsung Galaxy S5 and split in half vertically. I segmented the images by hand in GIMP 2, tracing every rock in an image to create an image mask. The segmentation maps are the same size as the images. Images are stored in ./img and segmentation maps are stored in ./seg_map.
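
As a rough illustration of how the two directories might be matched up, the sketch below pairs each image with its segmentation map. It assumes that an image and its map share a filename stem (for example wall_01.jpg and wall_01.png); that naming convention and the function name pair_images_with_maps are assumptions for the example, not something taken from main.py.

```python
from pathlib import Path


def pair_images_with_maps(img_dir="./img", seg_dir="./seg_map"):
    """Return (image_path, seg_map_path) pairs that share a filename stem.

    Assumption: an image and its hand-drawn mask have the same stem.
    """
    # Index the masks by stem so the two folders may use different extensions.
    seg_by_stem = {p.stem: p for p in Path(seg_dir).iterdir() if p.is_file()}
    pairs = []
    for img_path in sorted(Path(img_dir).iterdir()):
        # Images without a matching mask are skipped rather than guessed at.
        if img_path.is_file() and img_path.stem in seg_by_stem:
            pairs.append((img_path, seg_by_stem[img_path.stem]))
    return pairs
```

Calling pair_images_with_maps() from the project root would return only the images that have a mask, which makes missing ground truth easy to spot.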

Roadblocks

As I anticipated in my proposal, my hardware was not powerful enough to handle a fully convolutional network for large images. I downsized the images at import time by a factor of 10 in each dimension. This meant a lot of lost detail, but not so much that the images were unrecognizable.
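
To make the cost of that downsizing concrete, here is a minimal NumPy sketch of shrinking an image by a factor of 10 in each dimension by averaging 10x10 blocks. This is not necessarily how main.py does it; with OpenCV the more usual route would be cv2.resize with interpolation=cv2.INTER_AREA. The function name downsample is an assumption for the example.

```python
import numpy as np


def downsample(image, factor=10):
    """Shrink an H x W x C image by averaging non-overlapping factor x factor blocks.

    Returns a float array of shape (H // factor, W // factor, C).
    A 2D (grayscale) input comes back with a trailing channel axis of size 1.
    """
    h, w = image.shape[:2]
    # Crop so that both dimensions divide evenly by the factor.
    h_crop, w_crop = h - h % factor, w - w % factor
    cropped = image[:h_crop, :w_crop]
    # Split each spatial axis into (block index, offset within block).
    blocks = cropped.reshape(h_crop // factor, factor, w_crop // factor, factor, -1)
    return blocks.mean(axis=(1, 3))


image = np.zeros((2000, 1500, 3), dtype=np.uint8)  # placeholder size, not the real photo size
small = downsample(image)
print(image.shape[0] * image.shape[1] // (small.shape[0] * small.shape[1]))  # 100
```

A factor of 10 per dimension leaves one hundredth of the original pixels. For a segmentation map, averaging would blend labels together, so plain striding such as mask[::10, ::10] is one simple alternative.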

Fully convolutional networks are a fairly advanced topic. The most common way to build one is to take the weights from a pre-trained convolutional network, such as VGG Net, and replace the fully connected layers at the end with transposed convolutional layers and unpooling layers that upscale the representations back to image size.
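
The shrink-then-upscale idea can be followed with simple shape arithmetic. The sketch below tracks one spatial dimension through five stride-2 downsampling steps (as in VGG-16, with its 224-pixel input) and back up through five stride-2 transposed convolutions, assuming 'same' padding throughout. The helper names are assumptions for the example; no network is built here.

```python
import math


def down_size(size, stride):
    """Spatial size after a strided conv or pool with 'same' padding."""
    return math.ceil(size / stride)


def up_size(size, stride):
    """Spatial size after a strided transposed conv with 'same' padding."""
    return size * stride


def trace_shapes(size, strides):
    """Follow one spatial dimension down through the encoder and back up."""
    encoder = [size]
    for s in strides:
        encoder.append(down_size(encoder[-1], s))
    decoder = [encoder[-1]]
    for s in reversed(strides):
        decoder.append(up_size(decoder[-1], s))
    return encoder, decoder


encoder, decoder = trace_shapes(224, [2, 2, 2, 2, 2])
print(encoder)  # [224, 112, 56, 28, 14, 7]
print(decoder)  # [7, 14, 28, 56, 112, 224]
```

When the input size is not a multiple of 32, the round trip overshoots (300 comes back as 320), so the upscaled output has to be cropped or the input padded before it can be compared with the segmentation map.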

Pre-trained convnets are highly specialized, and one that had not been trained on images similar to mine would be unlikely to perform well. This meant that I had to train my networks from scratch, which is even more difficult. TFLearn does not seem to have the right tools for this project. Other people have been able to do this using the base TensorFlow API, and some people have contributed parts of a segmentation net architecture to TFLearn, such as a 2D segmentation cross entropy loss function. Though some of these tools exist, they are almost entirely undocumented, and it is very possible that I am the first person to try to build an FCN for image segmentation in pure TFLearn. As an example, the aforementioned 2D segmentation loss function does not work in TFLearn out of the box; it needs its own wrapper.
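
For reference, the quantity such a loss computes is simple to state: a softmax cross entropy evaluated at every pixel and averaged. The NumPy sketch below shows that calculation on its own. It is not the TFLearn function or its wrapper, and the name segmentation_cross_entropy is an assumption for the example.

```python
import numpy as np


def segmentation_cross_entropy(logits, labels):
    """Mean per-pixel softmax cross entropy.

    logits: float array of shape (H, W, num_classes), raw network outputs
    labels: int array of shape (H, W), class index of each pixel
    """
    # Subtract the per-pixel maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability assigned to the true class at each pixel.
    true_log_probs = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-true_log_probs.mean())


# Two classes (background, rock) on a 4x4 image with uninformative logits.
logits = np.zeros((4, 4, 2))
labels = np.zeros((4, 4), dtype=int)
print(segmentation_cross_entropy(logits, labels))  # log(2), about 0.693
```

A network that outputs the same score for both classes everywhere sits at log(2) for a two-class problem, which gives a baseline to compare a training loss against.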

Ultimately, the generated segmentation maps came out as grids. The accuracy reported by TFLearn rose from about 0.1 to about 0.2, but the output is nothing like what I expected. TFLearn runs as it expects to, but it produces invalid output. This makes me think that the functions I am using are not valid for the kind of operation I am trying to do.
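
One way to check output maps independently of the framework's own accuracy figure is to score them directly against the hand-drawn masks. The sketch below computes pixel accuracy and intersection over union (IoU) for binary masks. It assumes masks where nonzero means rock, and it is not the metric TFLearn reports; the function names are assumptions for the example.

```python
import numpy as np


def pixel_accuracy(pred, truth):
    """Fraction of pixels where the predicted mask matches the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    return float((pred == truth).mean())


def iou(pred, truth):
    """Intersection over union of the rock pixels in two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks are empty, so they agree completely
    return float(np.logical_and(pred, truth).sum() / union)


truth = np.zeros((4, 4), dtype=int)
truth[:2, :2] = 1                    # one small rock in the corner
pred = np.zeros((4, 4), dtype=int)   # a prediction that finds no rocks at all
print(pixel_accuracy(pred, truth))   # 0.75
print(iou(pred, truth))              # 0.0
```

The toy case shows why the two numbers are worth reading together: an all-background prediction still matches most pixels, while its IoU on the rock class is zero.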

Deliverables

Since I was limited by training time and development roadblocks, I did not manage to get all the deliverables I had planned on. I did include plenty of source images and ground truth segmentation maps. I have the trained network, the code to run it, and a sample output segmentation map. Since the output maps were invalid, I did not bother making overlaid images.
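
Once valid maps are available, the overlay step itself is small. The following is a minimal NumPy sketch that tints the masked pixels of an RGB image; the function name overlay_mask, the highlight color, and the blend weight are all assumptions for the example rather than part of the project code.

```python
import numpy as np


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into an H x W x 3 uint8 image wherever `mask` is nonzero.

    Returns a new uint8 image; the input is left unchanged.
    """
    out = image.astype(np.float32)  # astype makes a copy
    mask = np.asarray(mask, dtype=bool)
    tint = np.asarray(color, dtype=np.float32)
    # Weighted average of the original pixel and the highlight color.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return out.round().astype(np.uint8)


image = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]])
highlighted = overlay_mask(image, mask, color=(200, 0, 0), alpha=0.5)
print(highlighted[0, 0])  # [100   0   0]
```

Note that images loaded with OpenCV are in BGR channel order, so the color tuple would need to be given in that order as well.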

Prospective Future Architecture

A simple architecture would be repeated convolutional layers that first shrink the representation to a small size with many filters, process it further with 1x1 convolutions, and then undo each shrinking layer step by step. Since this will ultimately have to be done in plain TensorFlow, I think it would be worthwhile to use the inception layers that made GoogLeNet so successful. These apply 1x1, 3x3, and 5x5 convolutions and a pooling operation in parallel at every layer. The output of each operation is concatenated with the rest, and the result is treated as one layer. This allows many different kinds of features to be collected at every step.
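
To show what "concatenated and treated as one layer" means in practice, here is a small NumPy sketch of a single inception-style block. It is the naive form, without the 1x1 reduction convolutions that GoogLeNet places before the larger filters, and it uses random weights purely to demonstrate the shapes. All function names and channel counts are assumptions for the example; a real version would be written with TensorFlow ops.

```python
import numpy as np


def conv2d_same(x, kernels):
    """Stride-1 convolution with 'same' zero padding.

    x: (H, W, C_in); kernels: (k, k, C_in, C_out) with odd k.
    """
    x = np.asarray(x, dtype=float)
    k = kernels.shape[0]
    pad = k // 2
    h, w = x.shape[:2]
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    out = np.zeros((h, w, kernels.shape[-1]))
    for dy in range(k):
        for dx in range(k):
            # Shifted view of the input times the weights at this kernel offset.
            out += padded[dy:dy + h, dx:dx + w] @ kernels[dy, dx]
    return out


def max_pool_same(x, k=3):
    """Stride-1 max pooling with 'same' padding, so H and W are preserved."""
    x = np.asarray(x, dtype=float)
    pad = k // 2
    h, w = x.shape[:2]
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)),
                    mode="constant", constant_values=-np.inf)
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.max(windows, axis=0)


def inception_block(x, w1, w3, w5):
    """Run 1x1, 3x3, 5x5 conv and 3x3 pool branches, then join them on channels."""
    branches = [
        np.maximum(conv2d_same(x, w1), 0),  # ReLU after each conv branch
        np.maximum(conv2d_same(x, w3), 0),
        np.maximum(conv2d_same(x, w5), 0),
        max_pool_same(x),
    ]
    return np.concatenate(branches, axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
w1 = rng.standard_normal((1, 1, 3, 4))
w3 = rng.standard_normal((3, 3, 3, 4))
w5 = rng.standard_normal((5, 5, 3, 2))
print(inception_block(x, w1, w3, w5).shape)  # (8, 8, 13)
```

Every branch keeps the 8x8 spatial size, so the outputs can be stacked along the channel axis: 4 + 4 + 2 convolution channels plus the 3 pooled input channels give 13.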
