GithubHelp home page GithubHelp logo

elenasm7 / dsc-04-43-02-convolutional-neural-networks-nyc-ds-career-012819 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from learn-co-students/dsc-04-43-02-convolutional-neural-networks-nyc-ds-career-012819

0.0 1.0 0.0 741 KB

License: Other

Jupyter Notebook 100.00%

dsc-04-43-02-convolutional-neural-networks-nyc-ds-career-012819's Introduction

Convolutional Neural Networks

Introduction

Convolutional Neural Networks (CNNs), can deal better with large images. Until now, images that we have used were fairly small. Imagine an color image with 500 x 500 pixels, this means you would end up having 500 x 500 x 3 = 750,000 input features, $(x_1,...,x_{750,000})$. Next, imagine having 2000 hidden units in the first hidden layer. Then the matrix $w^{[1]}$ would have dimensions (2000 x 750,000), and will have 1.5 billion parameters. So it becomes a very high-dimensional problem!

CNNs have certain features that identify patterns in images because of "convolution operation" including:

  • Dense layers learn global patterns in their input feature space

  • Convolution layers learn local patterns, and this leads to the following interesting features:

    • Unlike with densely connected networks, when a convolutional neural network recognizes a patterns let's say, in the upper-right corner of a picture, it can recognize it anywhere else in a picture.
    • Deeper convolutional neural networks can learn spatial hierarchies. A first layer will learn small local patterns, a second layer will learn larger patterns using features of the first layer patterns, etc.

Because of these properties, CNNs are great for tasks like:

  • Image classification
  • Object detection in images
  • Picture neural style transfer

Objectives

You will be able to:

  • Describe what a CNN is
  • Describe the basic architectural features of a CNN
  • Identify common use cases for CNNs

Building CNNs in Keras

Building a CNN in Keras is very similar to the previous neural networks that we have built to date. To start, you will initialize a sequential model as before and go on adding layers. However, rather then simply adding additional dense layers or dropouts between them, we will now start to investigate other potential layer architectures including convolutional layers.

The Convolution Operation

The idea behind the convolutional operation is to detect complex building blocks, or features, that can aid in the larger task such as image recognition. For example, we'll detect vertical or horizontal edges present in the image. Let's look at what horizontal edge detection would look like!

title

This is a simplified 5 x 5 pixel image (greyscale!). You use a so-called "filter" (denoted on the right) to perform a convolution operation. This particular filter operation will detect horizontal edges. The matrix in the left should have number in it (from 1-255, or let's assume we rescaled it to number 1-10). The output is a 3 x 3 matrix. (This example is for computational clarity, no clear edges)

In Keras, function for the convolution step is Conv2D.

The convolutional operation applies this filter (typically 3x3 or 5x5) to each possible 3x3 or 5x5 region of the original image. The graphic below demonstrates this process. Later,

Padding

There are some issues with using filters on images including:

  • The image shrinks with each convolution layer: you're throwing away information in each layer! For example:
    • Starting from a 5 x 5 matrix, and using a 3 x 3 matrix, you end up with a 3 x 3 image.
    • Starting from a 10 x 10 matrix, and using a 3 x 3 matrix, you end up with a 8 x 8 image.
    • etc.
  • The pixels around the edges are used much less in the outputs due to the filter.

For example, if we apply 3x3 filters to a 5x5 image, our original 5x5 image contains 25 pixels, but tiling our 3x3 filter only has 9 possible locations. Here's the 4 of the 9 possible locations for the 3x3 filter on a 5x5 image:

Fortunately, padding solves both of these problems! Just one layer of pixels around the edges preserves the image size when having a 3 x 3 filter. We can also use bigger filters, but generally the dimensions are odd!

Some further terminology regarding padding that you should be aware of includes:

  • "Valid" - no padding
  • "Same" - padding such that output is same as the input size

By adding padding to our 5x5 image, (now a 6x6 image by adding a border of pixels) we can add padding so that each pixel of our original 5x5 image can be the center of a 3x3 convolution window filter.

Strided convolutions

Another method to change the output of your convolutions is to change the stride. The stride is how the convolution filter is moved over the original image. In our above example, we moved the filter one pixel to the right starting from the upper left hand corner, and then began to do this again after moving the filter one pixel down. Alternatively, by changing the stride, we could move our filter by 2 pixels each time, resulting in a smaller number of possible locations for the filter.

Strided convolutions are rarely used in practice but a good feature to be aware of for some models.

Convolutions on RGB images

Instead of 5 x 5 grayscale, let's imagine a 7 x 7 RGB image, which boils down to having a 7 x 7 x 3 tensor. (The image itelf is compromised by a 7 by 7 matrix of pixels, each with 3 numerical values for the RGB values.) From there, you will need to use a filter that has the third dimension equal to 3 as well, let's say, 3 x 3 x 3 (a 3D "cube").

This allows us to detect, eg only horizontal edges in the blue channel (filter on the red and green channel all equal to 0).

Then, in each layer, you can convolve with several 3D filters. Then, you stack every output result together, and that way you end up having a 5 x 5 x (number of filters) shape.

If you think of it, the filter plays the same role as the w^{[1]} in our densely connected networks.

The advantage is, your image can be huge, the amount of parameters you have still only depends on how many filters you're using!

Imagine 20 (3 x 3 x 3) --> 20 * 27 + a bias for each filter (1* 20) = 560 parameters.

Notation:

  • $f^{[l]}$ = size of the filter

  • $p^{[l]}$ = padding

  • $s^{[l]}$ = amount of stride

  • $ n_c^{[l]}$ = number of filters

  • filter: $f^{[l]}$ x $f^{[l]}$ x $ n_c^{[l-1]}$

  • Input = $ n_h^{[l-1]} * n_w^{[l-1]} * n_c^{[l-1]} $

  • Output = $ n_h^{[l]} * n_w^{[l]} * n_c^{[l]} $

Height and width are given by:

$n_h^{[l]}= \Bigr\lfloor\dfrac{n_h^{[l-1]}+2p^{[l]}-f^{[l]}}{s^{[l]}}+1\Bigr\rfloor$

$n_w^{[l]}= \Bigr\lfloor\dfrac{n_w^{[l-1]}+2p^{[l]}-f^{[l]}}{s^{[l]}}+1\Bigr\rfloor$

Activations: $a^{[l]}$ is of dimension $ n_h^{[l]} * n_w^{[l]} * n_c^{[l]} $

Pooling layer

The last element in a CNN architecture (before fully connected layers as we have previously discussed in other neural networks) is the pooling layer. This layer is meant to substantially downsample the previous convolutional layers. The idea behind this is that the previous convolutional layers will find patterns such as edges or other basic shapes present in the pictures. From there, pooling layers such as Max pooling (the most common) will take a summary of the convolutions from a larger section. In practice Max pooling (taking the max of all convolutions from a larger area of the original image) works better then average pooling as we are typically looking to detect whether a feature is present in that region. Downsampling is essential in order to produce viable execution times in the model training.

Max pooling has some important hyperparameters:

  • f (filter size)
  • S (stride)

Common hyperparameters include: f=2, s=2 and f=3, s=2, this shrinks the size of the representations. If a feature is detected anywhere in the quadrants, a high number will appear. so max pooling preserves this feature.

Fully Connected Layers in Your CNN.

Once you have addded a number of convolutional layers and pooling layers, yoiu then. will add fully connected (dense) layers as we did before in previous neural network models. This now allows the network to learn a final decision function based on these transformed informative inputs generating from the convolutional and pooling layers.

Additional Resources

Summary

In this lesson, we discussed the basic concepts behind CNNs including their use cases and general archictecture. In the upcoming lab, we'll begin to look at how you can build these models in Python using Keras.

dsc-04-43-02-convolutional-neural-networks-nyc-ds-career-012819's People

Contributors

loredirick avatar mathymitchell avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.