
zhongyuchen / outlier-detection


Detect outliers with 3 methods: LOF, DBSCAN and one-class SVM

License: Apache License 2.0

Python 100.00%
outlier-detection local-outlier-factor dbscan one-class-svm

outlier-detection's Introduction

Outlier Detection



Prerequisites

  • Required packages can be installed with the following command:
pip install -r requirements.txt

Data

  • consumption_data.xls is provided. It has 4 columns and 940 entries. The first column is the entry ID, which is ignored when detecting outliers, so each data entry is 3-dimensional.
  • Get the data as a numpy array of shape [940, 3] with the following code (check out dataset.py for the implementation):
from dataset import get_dataset

data = get_dataset()
  • Data visualization:

[Figure: visualization of the data]
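The column handling described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in dataset.py: `drop_id_column` is a hypothetical helper, and the table here is a synthetic stand-in with the same shape as the spreadsheet, not the real consumption data.

```python
import numpy as np


def drop_id_column(table):
    """Return the feature columns of a table whose first column is an entry ID.

    Hypothetical helper for illustration; dataset.py may do this differently.
    """
    table = np.asarray(table, dtype=float)
    return table[:, 1:]


# Synthetic stand-in with the same layout: 940 rows, an ID column plus 3 features.
table = np.column_stack([np.arange(940), np.zeros((940, 3))])
data = drop_id_column(table)
print(data.shape)
```

However the spreadsheet is read, the point is that only the three feature columns reach the outlier detectors.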

Methods

For detailed descriptions please see report.pdf.

Density-based method: LOF (local outlier factor)

  • Check out lof.py for implementation.
  • Result:

[Figure: LOF result]
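For a quick cross-check of a LOF result, scikit-learn ships its own implementation. The sketch below is not the code in lof.py; it assumes scikit-learn is installed and uses synthetic 3-dimensional points (one of them planted far from the rest) in place of the real data. The value of `n_neighbors` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in data: 50 points around the origin plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 3)), [[8.0, 8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)              # -1 marks outliers, 1 marks inliers
scores = -lof.negative_outlier_factor_   # larger score = more outlying

print("index with the largest LOF score:", scores.argmax())
```

Scores close to 1 indicate a point whose local density matches that of its neighbours; much larger scores indicate a point in a sparser region.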

Cluster-based method: DBSCAN

  • Check out dbscan.py for implementation.
  • Result:

[Figure: DBSCAN result]
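With DBSCAN, outliers are the points that end up in no cluster. The sketch below shows that idea with scikit-learn's `DBSCAN`, which labels such noise points `-1`. It is not the code in dbscan.py; the data is synthetic, and `eps` and `min_samples` are values picked for this toy example only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in data: one tight cluster plus a single distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(scale=0.3, size=(50, 3)), [[8.0, 8.0, 8.0]]])

# Points with fewer than min_samples neighbours within eps, and not reachable
# from a core point, are labelled -1 (noise).
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
outlier_idx = np.flatnonzero(labels == -1)

print("indices flagged as noise:", outlier_idx)
```

Both parameters depend heavily on the scale of the data, so they usually need tuning per dataset.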

Classification-based method: One-class SVM

  • Check out svdd.py for implementation.
  • Result with Gaussian kernel:

[Figure: one-class SVM result, Gaussian (RBF) kernel]

  • Result with linear kernel:

[Figure: one-class SVM result, linear kernel]
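Switching between the two kernels can be sketched with scikit-learn's `OneClassSVM`. This is not the code in svdd.py. It assumes scikit-learn is installed, fits on synthetic points, and then classifies two hand-picked query points; the value of `nu` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in training data centred on the origin.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Two query points: one in the middle of the data, one far away from it.
queries = np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])

predictions = {}
for kernel in ("rbf", "linear"):
    # nu is an upper bound on the fraction of training points treated as
    # outliers; gamma is only used by the RBF kernel.
    model = OneClassSVM(kernel=kernel, nu=0.05, gamma="scale").fit(X)
    predictions[kernel] = model.predict(queries)  # -1 = outlier, 1 = inlier

print(predictions)
```

The two kernels can disagree: the Gaussian kernel fits a closed boundary around the data, while the linear kernel can only separate the data from the origin with a hyperplane.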

Author

Zhongyu Chen


outlier-detection's Issues

Problem with "reachability distance" formula

Hi, I have read your code and I have to say it is beautiful code, but there seems to be a small problem at the line below in lof.py.

https://github.com/czhongyu/outlier-detection/blob/842e14d7c011951728888ec249b4077be3f8f8ec/lof.py#L48

I think it should be: rd += max(dis[index][i], dis[kdn[index][i]][-1]) (max instead of min)

I read some articles about this algorithm, and they all seem to use max instead of min. For example, this article: https://en.wikipedia.org/wiki/Local_outlier_factor

I also compared the result of your code with sklearn.neighbors.LocalOutlierFactor; when I change min to max, they give the same result.

By the way, thank you for your great code.
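For reference, the standard definition is reach-dist_k(p, o) = max(k-distance(o), d(p, o)), that is, max rather than min. Below is a minimal NumPy sketch of LOF built on that definition. It is not the code from lof.py: the function name `lof_scores` and the test data are made up for illustration, and the sketch assumes there are no duplicate points and ignores ties in the k-th neighbour distance.

```python
import numpy as np


def lof_scores(X, k):
    """Local outlier factor of every row of X, using k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; self-distances are masked out with inf.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbours of each point, nearest first.
    knn = np.argsort(dist, axis=1)[:, :k]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = dist[np.arange(len(X)), knn[:, -1]]
    # reach_dist(p, o) = max(k_distance(o), d(p, o)) for each neighbour o of p.
    reach = np.maximum(k_dist[knn], np.take_along_axis(dist, knn, axis=1))
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return lrd[knn].mean(axis=1) / lrd


# Synthetic data: 30 points around the origin plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 3)), [[10.0, 10.0, 10.0]]])
scores = lof_scores(X, k=5)
print("index with the largest LOF score:", scores.argmax())
```

Taking the max means the reachability distance never drops below the neighbour's own k-distance, which smooths the density estimate for points inside dense regions.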
