Knowledge memos/citations on machine learning, based on the Coursera class Machine Learning by Andrew Ng.
Arthur Samuel's older and informal definition:
"the field of study that gives computers the ability to learn without being explicitly programmed."
Tom Mitchell's more modern definition:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
- Example: playing checkers.
- E = the experience of playing many games of checkers
- T = the task of playing checkers.
- P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.
***
- "right answer" is given.
- In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
- Regression
- We try to predict results within a continuous output, meaning that we map input variables to some continuous function.
- ex. Given a picture of a person, we have to predict their age on the basis of the given picture.
- Classification
- We instead try to predict results in a discrete output; in other words, we map input variables into discrete categories.
- ex. Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
- In unsupervised learning, we approach problems with little or no idea what our results should look like.
- We can derive structure from data where we don't necessarily know the effect of the variables.
- We can derive this structure by clustering the data based on relationships among the variables in the data.
- With unsupervised learning there is no feedback based on the prediction results.
- Clustering
- ex. Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
- Non-clustering
- ex. The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
***
- The cost function is the summation of the squared differences between the predicted values and the actual values:
- J(θ0, θ1) = (1/(2m)) * Σ(i=1..m) (h(x(i)) − y(i))^2
- The goal when solving a machine learning problem is, in other words, to minimize the cost function.
- It is also called the "Squared error function" or "Mean squared error".
- 1/m with the summation: averages the squared errors.
- 1/2: keeps the numbers smaller, and conveniently cancels the factor of 2 that appears when differentiating the square.
- If all data points (x, y) lie exactly on the hypothesis, the cost function = 0.
- ex. h(x) = θ0 + θ1 * x
(figures: 3D surface plot | contour plot of the cost function J(θ0, θ1))
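As a concrete illustration, here is a minimal NumPy sketch of the cost function above; the name compute_cost and the convention that X carries a leading column of ones (x0 = 1) are illustrative choices, not from the course.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared error cost: J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(y)                 # number of training examples
    errors = X @ theta - y     # h(x(i)) - y(i) for every example at once
    return (errors @ errors) / (2 * m)
```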
***
- Start with some parameters θ.
- Keep changing θ to reduce J(θ) until we end up at a minimum:
- repeat { θj := θj − α * ∂J(θ)/∂θj } (update all θj simultaneously)
- "Batch" Gradient Descent: each step of gradient descent uses all the training examples.
- As we approach a local minimum, gradient descent automatically takes smaller steps, so there is no need to decrease α over time.
(figures: correct | incorrect)
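A minimal sketch of batch gradient descent for linear regression, following the update rule above and reusing the compute_cost conventions (X has a leading column of ones); the function name is an assumption.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent: every step uses all m training examples."""
    m = len(y)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m   # dJ/dtheta_j for all j at once
        theta = theta - alpha * gradient       # simultaneous update of all theta_j
    return theta
```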
- Gradient descent can get stuck at a local optimum.
- Make each of the input values roughly the same range so that gradient descent converges efficiently and quickly.
- ideal range: −1 ≤ x(i) ≤ 1 or −0.5 ≤ x(i) ≤ 0.5
- Feature Scaling + Mean Normalization: x(i) := (x(i) − μ(i)) / s(i)
- s(i) in the formula above:
- divide the input values by the range (i.e. max − min) of the input variable.
- μ(i) in the formula above:
- subtract the average value of the input variable from each of its values.
- after processing, the average of the values is 0.
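A hedged NumPy sketch of mean normalization plus feature scaling as described above; feature_normalize is an assumed name. Apply it before adding the x0 = 1 bias column, since a constant column has zero range.

```python
import numpy as np

def feature_normalize(X):
    """x := (x - mu) / s, where s is the range (max - min) of each feature."""
    mu = X.mean(axis=0)                  # per-feature average
    s = X.max(axis=0) - X.min(axis=0)    # per-feature range
    return (X - mu) / s, mu, s           # keep mu and s to normalize new inputs
```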
- "α" in the gradient descent formula.
- If α is too small: slow convergence.
- If α is too large: may not decrease on every iteration and thus may not converge.
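A tiny self-contained demo of this behavior on the toy cost J(θ) = θ² (gradient 2θ); the α values are arbitrary picks, not from the course.

```python
# toy cost J(theta) = theta^2, gradient dJ/dtheta = 2 * theta
for alpha in (0.1, 1.1):        # small vs. too-large learning rate
    theta = 1.0
    for _ in range(5):
        theta -= alpha * 2 * theta
    print(alpha, theta)         # 0.1 shrinks toward 0; 1.1 oscillates and grows
```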
- Normal equation: minimize J by explicitly taking its derivatives with respect to the θj's and setting them to zero: θ = (Xᵀ X)⁻¹ Xᵀ y.
- This allows us to find the optimum θ without iteration.
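A one-line NumPy sketch of the normal equation; using pinv (rather than inv) is an implementation choice so the code also tolerates a non-invertible XᵀX.

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y -- closed form, no iterations, no alpha."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```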
***
name | logic |
---|---|
Model | h(x) = θ0 + θ1 * x |
Cost Function | J(θ0, θ1) = (1/(2m)) * Σ (h(x(i)) − y(i))^2 |
Algorithm | repeat { θj := θj − α * ∂J(θ0, θ1)/∂θj } (simultaneous update for j = 0, 1) |
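To tie the model, cost function, and algorithm together, here is a hypothetical end-to-end run reusing the compute_cost and gradient_descent sketches above; the data is made up so that y = 1 + 2x exactly.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])         # x0 = 1 bias column, then x1
y = np.array([1.0, 3.0, 5.0])      # exactly y = 1 + 2 * x1

theta = gradient_descent(X, y, np.zeros(2), alpha=0.3, num_iters=500)
print(theta)                       # approaches [1.0, 2.0]
print(compute_cost(X, y, theta))   # approaches 0, since the data is on a line
```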
- Every formula below is equivalent.
description | formula |
---|---|
Expanded | h(x) = θ0 * x0 + θ1 * x1 + ... + θn * xn |
x: column (vector) | h(x) = θᵀ * x |
X: rows (one example per row) | h = X * θ |
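A small check, with made-up numbers, that the three forms above compute the same hypothesis.

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 5.0, 6.0])      # x0 = 1, then the features

h_expanded = sum(theta[j] * x[j] for j in range(3))  # theta0*x0 + theta1*x1 + theta2*x2
h_vector = theta @ x                                 # theta^T x
assert np.isclose(h_expanded, h_vector)              # both give 29.0

X = np.array([[1.0, 5.0, 6.0],
              [1.0, 2.0, 3.0]])    # each row = one training example
print(X @ theta)                   # predictions for all rows at once: [29. 14.]
```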
***
- Logistic Function / Sigmoid Function: h(x) = g(θᵀ x), where g(z) = 1 / (1 + e^(−z)), so 0 < h(x) < 1.
- In order to get our discrete 0 or 1 classification, we can translate the output of the hypothesis function as follows: predict y = 1 when h(x) ≥ 0.5, and y = 0 when h(x) < 0.5.
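A minimal sketch of the sigmoid and the 0/1 translation described above; predict and the hard-coded 0.5 threshold are illustrative.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); output is always strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta):
    """Translate h(x) = g(theta^T x) into a discrete 0 or 1 classification."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)
```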
***
Label | Definition |
---|---|
x | input variable, feature |
y | output/target variable |
m | number of training examples |
h | hypothesis: logic (relation) between x and y |
θ | parameter in h |
J | cost function |
α | learning rate |
ℝ | set of real numbers |