GithubHelp home page GithubHelp logo

masatoi / cl-online-learning Goto Github PK

View Code? Open in Web Editor NEW
50.0 8.0 3.0 173 KB

A collection of machine learning algorithms for online linear classification written in Common Lisp

License: MIT License

Common Lisp 100.00%
common-lisp machine-learning perceptron classifier

cl-online-learning's Introduction

Cl-Online-Learning

http://quickdocs.org/badge/cl-online-learning.svg https://github.com/masatoi/cl-online-learning/workflows/CI/badge.svg

A collection of machine learning algorithms for online linear classification written in Common Lisp.

Implemented algorithms

Binary classifier

  • Perceptron
  • AROW (Crammer, Koby, Alex Kulesza, and Mark Dredze. “Adaptive regularization of weight vectors.” Advances in neural information processing systems. 2009.)
  • SCW-I (Soft Confidence Weighted) (Wang, Jialei, Peilin Zhao, and Steven C. Hoi. “Exact Soft Confidence-Weighted Learning.” Proceedings of the 29th International Conference on Machine Learning (ICML-12). 2012.)
  • Logistic Regression with SGD or ADAM optimizer (Kingma, Diederik, and Jimmy Ba. “Adam: A method for stochastic optimization.” ICLR 2015)

Multiclass classifier

  • one-vs-rest ( K binary classifier required )
  • one-vs-one ( K*(K-1)/2 binary classifier required )

Command line tools

Installation

cl-online-learning is available from Quicklisp.

(ql:quickload :cl-online-learning)

When install from github repository,

cd ~/quicklisp/local-projects/
git clone https://github.com/masatoi/cl-online-learning.git

When using Roswell,

ros install masatoi/cl-online-learning

Usage

Prepare dataset

A data point is a pair of a class label (+1 or -1) and a input vector. Both of them have to be declared as single-float.

And dataset is represented as a sequence of data points. READ-DATA function is available to make a dataset from a sparse format used in LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). This function requires the number of features of that dataset.

;; Number of features
(defparameter a1a-dim 123)

;; Read dataset from file
(defparameter a1a
  (clol.utils:read-data
   (merge-pathnames #P"t/dataset/a1a" (asdf:system-source-directory :cl-online-learning))
   a1a-dim))

;; A data point
(car a1a)

; (-1.0
;  . #(0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
;     1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
;     0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
;     1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
;     1.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
;     0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
;     0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0))

Define learner

A learner object is just a struct, therefore their constructor is available to make it.

(defparameter arow-learner (clol:make-arow a1a-dim 10))

Update and Train

To update the model destructively with one data point, use an update function corresponding to the model type.

(clol:arow-update arow-learner
                  (cdar a1a)  ; input
                  (caar a1a)) ; label

TRAIN function can be used to learn the dataset collectively.

(clol:train arow-learner a1a)

It may be necessary to call this function several times until learning converges. For now, the convergence test has not been implemented yet.

Predict and Test

(clol:arow-predict arow-learner (cdar a1a))
; => -1.0

(clol:test arow-learner a1a)
; Accuracy: 84.85981%, Correct: 1362, Total: 1605

Multiclass classification

For multiclass data, the label of the data point is an integer representing the index of the class. READ-DATA function with MULTICLASS-P keyword option is available for make such a dataset.

(defparameter iris-dim 4)

; A dataset in which a same label appears consecutively need to be shuffled
(defparameter iris
  (clol.utils:shuffle-vector
   (coerce (clol.utils:read-data
            (merge-pathnames #P"t/dataset/iris.scale"
                             (asdf:system-source-directory :cl-online-learning))
            iris-dim :multiclass-p t)
	   'simple-vector)))

(defparameter iris-train (subseq iris 0 100))
(defparameter iris-test (subseq iris 100))

ONE-VS-REST and ONE-VS-ONE are available for multiclass classification by using multiple binary classifiers. In many cases, ONE-VS-ONE is more accurate, but it requires more computational resource as the number of classes increases.

;; Define model
(defparameter arow-1vs1
  (clol:make-one-vs-one iris-dim      ; Input data dimension
                        3             ; Number of class
                        'arow 0.1)) ; Binary classifier type and its parameters

;; Train and test model
(clol:train arow-1vs1 iris-train)
(clol:test  arow-1vs1 iris-test)
; Accuracy: 98.0%, Correct: 49, Total: 50

Sparse data

For sparse data (most elements are 0), the data point is a pair of a class label and a instance of SPARSE-VECTOR struct, and a learner with SPARSE- prefix is used. READ-DATA function with SPARSE-P keyword option is available for make such a dataset.

For example, news20.binary data has too high dimensional features to handle with normal learners. However, by using the sparse version, the learner can be trained with practical computational resources.

(defparameter news20.binary-dim 1355191)
(defparameter news20.binary (clol.utils:read-data "/path/to/news20.binary" news20.binary-dim :sparse-p t))

(defparameter news20.binary.arow (clol:make-sparse-arow news20.binary-dim 10))
(time (loop repeat 20 do (clol:train news20.binary.arow news20.binary)))
;; Evaluation took:
;;   1.527 seconds of real time
;;   1.526852 seconds of total run time (1.526852 user, 0.000000 system)
;;   100.00% CPU
;;   5,176,917,149 processor cycles
;;   11,436,032 bytes consed
(clol:test news20.binary.arow news20.binary)
; Accuracy: 99.74495%, Correct: 19945, Total: 19996

In a similar way, the sparse version learners are also available in multiclass classification.

(defparameter news20-dim 62060)
(defparameter news20-train (clol.utils:read-data "/path/to/news20.scale" news20-dim :sparse-p t :multiclass-p t))
(defparameter news20-test (clol.utils:read-data "/path/to/news20.t.scale" news20-dim :sparse-p t :multiclass-p t))
(defparameter news20-arow (clol:make-one-vs-rest news20-dim 20 'sparse-arow 10))
(loop repeat 12 do (clol:train news20-arow news20-train))
(clol:test news20-arow news20-test)
; Accuracy: 86.90208%, Correct: 3470, Total: 3993

Save/Restore model

For saving a learner model to a file or restoring from the model file, SAVE and RESTORE function are available respectively. For the above multiclass classification example, saving / restoring code would be:

;; Save
(clol:save arow-1vs1 #P"/tmp/iris.model")
;; Restore
(defparameter restored-learner (clol:restore #P"/tmp/iris.model"))

(clol:test restored-learner iris-test)
; Accuracy: 98.0%, Correct: 49, Total: 50

Author

Satoshi Imai ([email protected])

Licence

This software is released under the MIT License, see LICENSE.txt.

cl-online-learning's People

Contributors

guicho271828 avatar masatoi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.