
gan's Introduction

Premise of GANs

A GAN takes a random sample from a latent or prior distribution as input and maps it to the data space. The task of training is to learn a deterministic function that can efficiently capture the dependencies and patterns in the data so that the mapped point resembles a sample generated from the data distribution.

Example:

I have generated 300 samples from an isotropic bivariate Gaussian distribution.

[Figure: 300 samples from an isotropic bivariate Gaussian]

When passed through a simple deterministic function (sketched below), the points form a ring. This demonstrates that a high-capacity function may be able to model the data distribution of high-dimensional data such as images. Neural networks are our best bet, as they are universal function approximators; hence, deep neural networks are used to model the data distribution of images. Unlike MLE or KDE, this is implicit density estimation.

[Figure: the same samples after the mapping, forming a ring]
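The text doesn't show the mapping function used for the figure, so the minimal numpy sketch below assumes one common illustrative choice, f(z) = z/10 + z/||z||, which pushes the Gaussian cloud onto a ring:

```python
import numpy as np

rng = np.random.default_rng(0)

# 300 samples from an isotropic bivariate Gaussian (zero mean, identity covariance)
z = rng.standard_normal((300, 2))

# Deterministic map that pushes the cloud onto a ring.
# f(z) = z/10 + z/||z|| is an assumed illustrative choice, not the author's function.
norms = np.linalg.norm(z, axis=1, keepdims=True)
x = z / 10 + z / norms

print(np.linalg.norm(x, axis=1).mean())  # ~1.1: points concentrate near a unit-radius ring
```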

Probability Review

[Figure: frequency table and joint distribution of two discrete random variables]

Conditional Distribution

In the above table, fix the value of one random variable, say X = x_1; the distribution of Y when X = x_1 is called the conditional distribution, p(y | x = x_1). The conditional expectation is the expectation of the conditional distribution.

In the above table, the conditional probability of Y = y_1 given X = x_1 is 2/17.

Marginal Distribution

Integrate (or, in the discrete case, sum) the joint distribution over one variable to get the marginal distribution of the other: p(x) = Σ_y p(x, y).

In the above table, the marginal probability of X = x_1 according to the above formula is 2/50 + 10/50 + 5/50 = 17/50.
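A minimal numpy sketch that reproduces both numbers above (the marginal 17/50 and the conditional 2/17). Only the x_1 row (2, 10, 5), the (x_2, y_3) cell, and the grand total of 50 are given by the table; the remaining x_2 cells are assumed values chosen so the counts sum to 50:

```python
import numpy as np

# Frequency table of X (rows) vs Y (columns). The text gives the x_1 row
# (2, 10, 5), the (x_2, y_3) cell (2), and the grand total (50); the other
# x_2 cells are assumed values chosen so that everything sums to 50.
freq = np.array([[2, 10, 5],     # x_1
                 [16, 15, 2]])   # x_2 (first two cells assumed)
joint = freq / freq.sum()        # joint distribution P(X, Y)

# Marginal distribution: sum the joint over Y.
p_x = joint.sum(axis=1)
print(p_x[0])                    # P(X = x_1) = 17/50 = 0.34

# Conditional distribution: renormalize one row of the joint.
p_y_given_x1 = joint[0] / p_x[0]
print(p_y_given_x1[0])           # P(Y = y_1 | X = x_1) = 2/17 ~ 0.118
```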

Joint Distribution

A joint distribution, a.k.a. the data distribution, captures the joint probabilities between random variables. In the above table, the joint probability P(X = x_2, Y = y_3) is 2/50. This is what a GAN tries to model from the sample data.

Consider images of size 28 x 28. Each pixel is a random variable that can take any value from 0 to 255. Hence, we have 784 random variables in total. A GAN tries to model the dependencies between the pixels.

Bayes' Theorem

From the above table: P(Y = y_1 | X = x_1) = P(Y = y_1 & X = x_1) / P(X = x_1) = (2/50)/(17/50) = 2/17

Entropy

Entropy measures the degree of uncertainty in the outcome of a trial conducted according to a distribution p(x).

H(p) = -Σ_x p(x) log p(x)

The entropy of an unbiased coin is higher than that of a biased coin, and the gap widens as the biased coin's probabilities become more polarized.
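A minimal sketch of this comparison. The text doesn't state the biased coin's probabilities; P(heads) = 0.95 is assumed here because it matches the 2.19 and 1.19 figures used in the sections below:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

fair = [0.5, 0.5]
biased = [0.95, 0.05]             # assumed bias; matches the 2.19/1.19 figures below

print(entropy(fair))              # 1.0 bit
print(entropy(biased))            # ~0.29 bits, and it shrinks as the coin polarizes
```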

Cross Entropy

Cross entropy measures the degree of uncertainty of a trial when you score outcomes according to an assumed distribution q(x), while in truth they are drawn from p(x).

H(p, q) = -Σ_x p(x) log q(x)

Cross entropy is higher when the trial is actually conducted with the unbiased coin but you believe it follows the biased coin's distribution.
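To make that concrete, a minimal sketch using the same assumed biased coin (P(heads) = 0.95) as above:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log2(q_i): outcomes follow p but are scored under q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

fair, biased = [0.5, 0.5], [0.95, 0.05]   # same assumed biased coin as above

print(cross_entropy(fair, biased))  # ~2.19 bits: fair trial, biased-coin beliefs
print(cross_entropy(fair, fair))    # 1.0 bit: equals H(fair) when beliefs match truth
```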

KL Divergence

KL divergence is the difference between the cross entropy and the entropy of the true distribution: D(p||q) = H(p, q) - H(p). It is zero exactly when the two distributions are equal. Hence, to approximate or model one probability distribution with another, minimizing the KL divergence between them makes them similar.

D(fair || biased) = H(fair, biased) - H(fair) = 2.19 - 1 = 1.19
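A minimal sketch verifying that figure, again assuming the 0.95-biased coin:

```python
import numpy as np

def kl(p, q):
    """D(p || q) = H(p, q) - H(p) = sum_i p_i * log2(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

fair, biased = [0.5, 0.5], [0.95, 0.05]

print(kl(fair, biased))   # ~1.19 bits, matching 2.19 - 1
print(kl(fair, fair))     # 0.0: the divergence vanishes when the distributions match
```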

JS Divergence

Because the KL divergence equation divides by the probability of an outcome, it can blow up: if q_k is zero while p_k is not, the KL divergence becomes infinite. Moreover, KL divergence is not symmetric, i.e., D(p||q) is not equal to D(q||p), which makes it unsuitable as a distance metric. To avoid both problems, JS divergence measures divergence against the average distribution m = (p + q)/2: JSD(p, q) = 0.5 D(p||m) + 0.5 D(q||m).
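A minimal sketch showing both properties: the JS divergence stays finite even when one distribution has a zero where the other doesn't, and it is symmetric:

```python
import numpy as np

def kl(p, q):
    """D(p || q) in bits; finite only when q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js(p, q):
    """JSD(p, q) = 0.5 * D(p || m) + 0.5 * D(q || m), with m = (p + q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [1.0, 0.0], [0.0, 1.0]   # D(p || q) would be infinite: q is zero where p isn't
print(js(p, q))                 # 1.0 bit: finite, and js(p, q) == js(q, p)
```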

GANs

As aforementioned, GANs take a random sample from the latent space as input and map it to the data space. In DCGANs, the mapping function is a deep neural network, which is differentiable and parameterized by its weights; it is called the Generator (G). The Discriminator (D) is also a deep neural network: it takes a sample in the data space and maps it to a probability, namely the probability that the sample was drawn from the real data distribution.

P_z: prior / latent distribution. Typically, this space is much smaller than the data space.

P_g: distribution of the data produced by the generator

P_r: real data distribution

Let D_m = (d_1, d_2, d_3, ..., d_m) be the data sampled according to P_r.

Let G_n = (g_{m+1}, g_{m+2}, ..., g_n) be the data generated according to P_g.

Train D to minimize the empirical loss (equation 1): min_D -(1/m) Σ_{i=1}^{m} log D(d_i) - (1/(n-m)) Σ_{j=m+1}^{n} log(1 - D(g_j)). I am writing the objectives as minimizations, as most deep learning frameworks only implement minimization of a function.

Fix the D network, and train G to maximize the loss of D over G_n (equation 2): min_G (1/(n-m)) Σ_{j=m+1}^{n} log(1 - D(g_j)).

As stated in the original paper, early in training the above loss doesn't offer enough gradient to update the parameters of G: initially P_g is far from P_r, so D can easily classify generated images and log(1 - D(g_j)) saturates. Hence, we switch the labels and instead minimize -(1/(n-m)) Σ_{j=m+1}^{n} log D(g_j), the non-saturating generator loss.
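Below is a minimal, self-contained PyTorch sketch of this alternating training procedure. It is not the author's DCGAN: tiny MLPs and the ring data from the first example are assumed, but the loss structure of equations 1 and 2, with the switched-label generator loss, is the same.

```python
# Minimal GAN training-loop sketch (assumed setup, not the author's code).
# G maps 2-D latent noise to 2-D points; the "real" data is the ring
# distribution from the first example.
import torch
import torch.nn as nn

def sample_real(m):
    z = torch.randn(m, 2)
    return z / 10 + z / z.norm(dim=1, keepdim=True)   # ring-shaped "real" data

G = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
m = 64

for step in range(5000):
    # --- Train D: real -> 1, generated -> 0 (equation 1) ---
    real, fake = sample_real(m), G(torch.randn(m, 2)).detach()
    loss_d = bce(D(real), torch.ones(m, 1)) + bce(D(fake), torch.zeros(m, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- Train G with switched labels: minimize -log D(G(z)) ---
    fake = G(torch.randn(m, 2))
    loss_g = bce(D(fake), torch.ones(m, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```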

Optimization and Theoretical Results

Optimal Discriminator for fixed `G`

Equation 1 is an empirical loss function. Its risk, i.e., the loss over the whole population (every possible image), can be written as (equation 3): L(D) = -E_{x~P_r}[log D(x)] - E_{x~P_g}[log(1 - D(x))] = -∫ (p_r(x) log D(x) + p_g(x) log(1 - D(x))) dx.

Minimizing the integrand pointwise, the optimal output for each x is y_hat* = D*(x) = p_r(x) / (p_r(x) + p_g(x)); when y_hat = y_hat*, the discriminator's loss is at its minimum. At the end of training, if G does a good job of approximating P_r, then P_g ≈ P_r and D*(x) = 1/2.

Substituting D*(x) = 1/2 into equation 3 gives the loss of the optimal discriminator at the end of training (equation 4): L(D*) = -log(1/2) - log(1/2) = log 4 ≈ 1.39.

This is the cost obtained when both D and G are perfectly optimized.

From the JS divergence equation, the JS divergence between P_g and P_r is JSD(P_r, P_g) = 0.5 KL(P_r || M) + 0.5 KL(P_g || M), where M = (P_r + P_g)/2.

From equation 3, substituting D*(x) = p_r(x) / (p_r(x) + p_g(x)) and rearranging gives L(D*) = log 4 - 2 · JSD(P_r, P_g).

The JS divergence is non-negative. Hence, for the above expression to equal the value calculated in equation 4 (log 4), the JS divergence must be 0, i.e., P_g = P_r. To conclude, when D is at its best, G needs to make P_g ≈ P_r to reach the global optimum.
