ishmaelbelghazi / bigoptim

Large Scale Machine Learning Optimization through Stochastic Average Gradient

License: Other

Languages: R 45.64%, C 51.36%, C++ 2.98%, Makefile 0.02%

bigoptim's Introduction

BigOptim – Large-Scale Finite-Sum Cost Function Optimization for R

Build: https://travis-ci.org/IshmaelBelghazi/bigoptim.svg | Coverage: https://coveralls.io/repos/IshmaelBelghazi/bigoptim/badge.svg?branch=master&service=github

Description

BigOptim is an R package that implements the Stochastic Average Gradient (SAG) [1] optimization method. For strongly convex problems, SAG achieves the convergence rate of batch gradient descent while keeping the per-iteration cost of stochastic gradient descent. This allows for efficient training of machine learning models with convex cost functions.
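To make that concrete, the sketch below is a minimal pure-R version of the SAG update for L2-regularized logistic regression with labels in {-1, 1}. It is illustrative only: the function name sag_sketch and the fixed step size alpha are ours, and bigoptim's actual implementation is in C.

## Minimal SAG sketch (ours, not the package's implementation).
## Each iteration touches one random sample but steps along the
## average of all stored per-sample gradients.
sag_sketch <- function(X, y, lambda, alpha, npasses = 10) {
  n <- NROW(X); p <- NCOL(X)
  w <- numeric(p)
  grad_mem <- matrix(0, n, p)  ## per-sample gradient memory; for GLM losses
                               ## a single scalar per sample would suffice
  d <- numeric(p)              ## running sum of the stored gradients
  for (k in seq_len(n * npasses)) {
    i <- sample.int(n, 1)
    xi <- X[i, ]; yi <- y[i]
    ## logistic-loss gradient for sample i at the current w
    g_new <- -yi * xi / (1 + exp(yi * sum(xi * w)))
    d <- d - grad_mem[i, ] + g_new
    grad_mem[i, ] <- g_new
    ## SAG step: averaged stored gradient plus the L2 penalty
    w <- w - alpha * (d / n + lambda * w)
  }
  w
}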

Setup

install.packages("devtools")
devtools::install_github("hadley/devtools")  ## Optional
devtools::install_github("IshmaelBelghazi/bigoptim")

Example: Fit with Linesearch

## Loading Data set
data(covtype.libsvm)
## Normalizing Columns and adding intercept
X <- cbind(rep(1, NROW(covtype.libsvm$X)), scale(covtype.libsvm$X))
y <- covtype.libsvm$y
y[y == 2] <- -1
## Setting seed
#set.seed(0)
## Setting up problem
maxiter <- NROW(X) * 10  ## 10 passes through the dataset
lambda <- 1/NROW(X) 
sag_ls_fit <- sag_fit(X=X, y=y, lambda=lambda,
                      maxiter=maxiter, 
                      tol=1e-04, 
                      family="binomial", 
                      fit_alg="linesearch",
                      standardize=FALSE)
## Getting weights
weights <- coef(sag_ls_fit)
## Getting cost
cost <- get_cost(sag_ls_fit)
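As a quick sanity check of the fit (ours, not part of the package's examples), the recovered weights can be evaluated with base R alone; y is coded as -1/+1 here:

## Training accuracy from the recovered weights (base R only)
pred <- sign(as.vector(X %*% weights))
mean(pred == y)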

Example: Demo – Monitoring gradient norm

demo("monitoring_training")

(Figure: gradient L2 norm during training on covtype; see misc/readme/grad_norm_covtype.png)
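For reference, the quantity the demo monitors can also be computed directly in base R: the L2 norm of the full regularized logistic-loss gradient at a given weight vector. The helper below is our sketch, not a package function:

## L2 norm of the full gradient of the regularized logistic loss
## at weights w, with y coded as -1/+1 (our sketch, not package code)
full_grad_norm <- function(X, y, w, lambda) {
  margins <- as.vector(X %*% w) * y
  g <- -crossprod(X, y / (1 + exp(margins))) / NROW(X) + lambda * w
  sqrt(sum(g^2))
}
full_grad_norm(X, y, weights, lambda)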

Runtime comparison

Run on an Intel i7-4710HQ with 16 GB RAM, using Intel MKL and the Intel compilers.

demo("run_times")

Dense dataset: Logistic regression on covertype

Logistic Regression on Covertype – 581012 sample points, 55 variables

                                          constant      linesearch    adaptive      glmnet
Cost at optimum                           0.513603      0.513497      0.513676      0.513693
Gradient L2 norm at optimum               0.001361      0.001120      0.007713      0.001806
Approximate gradient L2 norm at optimum   0.001794      0.000146      0.000214      NA
Time (seconds)                            1.930         2.392         8.057         8.749

Sparse dataset: Logistic regression on rcv1_train

Logistic Regression on RCV1_train – 20242 sample points, 47237 variables

                                          constant      linesearch    adaptive      glmnet
Cost at optimum                           0.046339      0.046339      0.046339      0.046342
Gradient L2 norm at optimum               3.892572e-07  4.858723e-07  6.668943e-10  7.592185e-06
Approximate gradient L2 norm at optimum   3.318267e-07  4.800463e-07  2.647663e-10  NA
Time (seconds)                            0.814         0.872         1.368         4.372
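The run_times demo does this more carefully, but a rough version of the comparison can be sketched with system.time alone, reusing the arguments from the covtype example above (the glmnet column would need a separate glmnet::glmnet call with a matching penalty):

## Rough timing sketch (see demo("run_times") for the real benchmark)
fit_algs <- c("constant", "linesearch", "adaptive")
times <- sapply(fit_algs, function(alg) {
  system.time(
    sag_fit(X = X, y = y, lambda = lambda, maxiter = maxiter,
            tol = 1e-04, family = "binomial",
            fit_alg = alg, standardize = FALSE)
  )[["elapsed"]]
})
times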

References

[1] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388 [cs, math, stat], September 2013.

bigoptim's People

Contributors

ishmaelbelghazi, tdhock


bigoptim's Issues

could not find function "get_cost"

Hey @IshmaelBelghazi, nice to see that you wrote some demos. However, I am getting an error when I try to run them. Maybe you need to export get_cost? (A possible fix is sketched after the session log below.)

thocking@silene:~/R$ R --vanilla

R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(bigoptim)
Loading required package: Matrix
demo> demo("example_SAG", package="bigoptim")


    demo(example_SAG)
    ---- ~~~~~~~~~~~

Type  <Return>   to start : 

> library(Matrix)

> data(rcv1_train)

> X <- rcv1_train$X

> ##X <- cBind(rep(1, NROW(X), X), X)
> y <- rcv1_train$y

> n <- NROW(X)

> p <- NCOL(X)

> ## Setting seed
> ##set.seed(0)
> maxIter <- n * 20 

> lambda <- 1/n

> tol <- 0

> print("Running Stochastic average gradient with constant step size\n")
[1] "Running Stochastic average gradient with constant step size\n"

> ## -----------------------------------------------------------------------------
> ## SAG with Constant step size
> sag_constant_fit <- sag_fit(X=X, y=y, lambda=lambda, maxiter=maxIter,
+                             tol=0, fit_alg="constant", model="binomial")

> cost_constant <- get_cost(sag_constant_fit, X=X, y=y) 
Error in eval(expr, envir, enclos) : could not find function "get_cost"
> demo("example_SAG2", package="bigoptim")


    demo(example_SAG2)
    ---- ~~~~~~~~~~~~

Type  <Return>   to start : 

> ## Loading Data set
> data(covtype.libsvm)

> ## Normalizing Columns and adding intercept
> X <- cbind(rep(1, NROW(covtype.libsvm$X)), scale(covtype.libsvm$X))

> y <- covtype.libsvm$y

> y[y == 2] <- -1

> n <- NROW(X)

> p <- NCOL(X)

> ## Setting seed
> #set.seed(0)
> ## Setting up problem
> maxiter <- n * 20  ## 10 passes throught the dataset

> lambda <- 1/n 

> tol <- 1e-4

> ## -----------------------------------------------------------------------------
> ## SAG with Constant step size
> print("Running Stochastic Average Gradient with constant step size")
[1] "Running Stochastic Average Gradient with constant step size"

> sag_constant_fit <- sag_fit(X=X, y=y, lambda=lambda, maxiter=maxiter,
+                             tol=tol, family="binomial",
+                             fit_alg="constant", standardize=FALSE)

> cost_constant <- get_cost(sag_constant_fit, X, y)
Error in eval(expr, envir, enclos) : could not find function "get_cost"
In addition: Warning message:
In sag_fit(X = X, y = y, lambda = lambda, maxiter = maxiter, tol = tol,  :
  Optimisation stopped before convergence. Try incrasing maximum number of iterations
> demo("monitoring_training", package="bigoptim")


    demo(monitoring_training)
    ---- ~~~~~~~~~~~~~~~~~~~

Type  <Return>   to start : 

> suppressPackageStartupMessages(library(ggplot2))

> suppressPackageStartupMessages(library(glmnet))

> ## Loading Data set
> data(covtype.libsvm)

> ## Normalizing Columns and adding intercept
> X <- cbind(rep(1, NROW(covtype.libsvm$X)), scale(covtype.libsvm$X))

> y <- covtype.libsvm$y

> y[y == 2] <- -1

> n <- NROW(X)

> p <- NCOL(X)

> ## Setting seed
> #set.seed(0)
> ## Setting up problem
> n_passes <- 50  ## number of passses trough the dataset

> maxiter <- n * n_passes

> lambda <- 1/n 

> tol <- 0

> family <- "binomial"

> fit_algs <- list(constant="constant",
+                  linesearch="linesearch",
+                  adaptive="adaptive")

> sag_fits <- lapply(fit_algs, function(fit_alg) sag_fit(X, y,
+                                                    lambda=lambda,
+                                                    maxiter=maxiter,
+                                                    family=family,
+                                                    fit_alg=fit_alg,
+                                                    standardize=FALSE,
+                                                    tol=tol, monitor=TRUE))

> print(lapply(sag_fits, function(sag_fit) get_cost(sag_fit, X, y)))
Warning message:
In sag_fit(X, y, lambda = lambda, maxiter = maxiter, family = family,  :
  Optimisation stopped before convergence. Try incrasing maximum number of iterations
Error in print(lapply(sag_fits, function(sag_fit) get_cost(sag_fit, X,  : 
  error in evaluating the argument 'x' in selecting a method for function 'print': Error in FUN(X[[i]], ...) : could not find function "get_cost"
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] glmnet_1.9-5        ggplot2_1.0.1       bigoptim_0.0.0.9000
[4] Matrix_1.2-1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6        lattice_0.20-31    digest_0.6.4       MASS_7.3-40       
 [5] grid_3.2.1         plyr_1.8.1         gtable_0.1.2       scales_0.2.3      
 [9] reshape2_1.2.2     labeling_0.2       proto_1.0.0        RColorBrewer_1.0-5
[13] tools_3.2.1        stringr_0.6.2      dichromat_2.0-0    munsell_0.4.2     
[17] colorspace_1.2-4  
> 
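The errors above are consistent with get_cost being defined in the package but never exported. A plausible fix (our guess, not a confirmed patch) is to add the export directive to the package NAMESPACE, or, if the package generates NAMESPACE with roxygen2, to add an #' @export tag above the definition of get_cost and re-run devtools::document():

## In NAMESPACE (hand-written, or generated by roxygen2 from an
## "#' @export" tag above the get_cost definition):
export(get_cost)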

Tidy up C code

Tidy up the C code before the first milestone. This includes:

  • Improve variable naming
  • Improve code structure
