
Change Point Detection - KL Divergence

When we build a machine learning model, we validate and test its performance in various ways. But even when accuracy is high on the test set, predictions eventually drift over time, often because future data "drifts" to a state different from the training dataset. How can we detect this drift effectively? This post demonstrates change point detection on Bitcoin daily price data using KL divergence. The project is documented in my Medium blog, and the Bitcoin price data come from CoinMarketCap.

Table of contents

  • Requirements
  • Data
  • KL Divergence to detect the state change
  • Conclusion and Next Steps

Requirements

matplotlib == 3.1.2
numpy == 1.18.1
pandas == 0.23.4
scipy == 1.4.1
seaborn == 0.9.0

Data

The dataset includes the date, open, high, low, and close prices, volume, and market cap for each day (all prices in USD), from 2013-04-29 until 2020-03-28.
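The code below assumes this export has been loaded into a pandas DataFrame named df with a date column and a close column. A minimal loading sketch (the file name bitcoin_price.csv is an assumption):

import pandas as pd

# Load the CoinMarketCap export; the file name and exact column names
# are assumptions -- adjust them to match your actual export
df = pd.read_csv('bitcoin_price.csv', parse_dates=['date'])
df = df.sort_values('date').reset_index(drop=True)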

Here is the historical day-to-day price plot of Bitcoin.

[figure: Bitcoin daily price, 2013-04-29 to 2020-03-28]

KL Divergence to detect the state change

We will use a fixed window from the train set as a reference, so that we can measure how similar (or dissimilar) the train set and unseen data are.

Suppose we built a machine learning model that predicts the next day's Bitcoin close price using data from 2013-04-29 until the end of 2018. We will then compare a subsequence from the train set with unseen future prices. Since we want to see model drift at a monthly level, we set the window to 30 days.

from scipy.stats import entropy
from datetime import date, timedelta

import numpy as np
import pandas as pd

start_date = date(2019, 2, 1)
end_date = date(2020, 3, 28)
delta = timedelta(days=1)

# Reference window: the last 30 closing prices of the train set
reference = np.array(df[df['date'] < '2019-01-01'].close.tail(30))

kl = []
dates = []  # renamed from `date` so the datetime.date class is not shadowed

while start_date <= end_date:
    # The 30 most recent closing prices up to the current date;
    # entropy(p, q) normalises both inputs and returns KL(p || q)
    current = np.array(df[df['date'] <= start_date].close.tail(30))
    kl.append(entropy(reference, current))
    dates.append(start_date)
    start_date += delta

kl = pd.DataFrame(list(zip(dates, kl)), columns=['date', 'kl'])

Here is the plot of the estimated KL Divergence and Bitcoin price in 2019.

[figure: estimated KL divergence over 2019-2020]

[figure: Bitcoin close price in 2019]
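For reference, one way such plots could be reproduced with the matplotlib version listed in the requirements (the figure layout and styling are assumptions, not taken from the original):

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Estimated KL divergence per day
ax1.plot(kl['date'], kl['kl'])
ax1.set_ylabel('KL divergence')

# Bitcoin close price over the same period
recent = df[df['date'] >= '2019-02-01']
ax2.plot(recent['date'], recent['close'])
ax2.set_ylabel('Close price (USD)')
ax2.set_xlabel('Date')

plt.show()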

Compared with the actual price changes, we notice that when the price soars in a short period of time, the estimated value also spikes. Interestingly, even though the Bitcoin price trended upward from the beginning of 2019 until it peaked around July 2019 and then turned downward, the estimated KL divergence values show "constant waves" until 2020. This may be because the distribution of the train set already accounts for the volatility of the Bitcoin price and its theoretically constant inflation, so the estimated KL divergence responds not to a gradual increase of the price but rather to abrupt price changes.

We can also apply Bollinger Bands to obtain a threshold for decision making, as sketched after the figure below.

[figure: estimated KL divergence with Bollinger Bands]
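A minimal sketch of such a band on the kl DataFrame built above, assuming a 20-day rolling window and a 2-standard-deviation width (both values are assumptions):

# Bollinger Bands on the estimated KL series: a rolling mean plus/minus
# two rolling standard deviations (window length is an assumption)
window = 20
rolling = kl['kl'].rolling(window)
kl['mean'] = rolling.mean()
kl['upper'] = kl['mean'] + 2 * rolling.std()
kl['lower'] = kl['mean'] - 2 * rolling.std()

# Flag days where the estimated KL divergence breaks the upper band
# as candidate change points
change_points = kl[kl['kl'] > kl['upper']]
print(change_points[['date', 'kl']])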

Conclusion and Next Steps

  • Change point detection (CPD) aims to detect dissimilarity between the probability distributions of two subsequences of a time series.
  • Neither KL divergence nor RuLSIF yields a normalised ratio, and neither has a clear threshold for declaring a "state change", so it is important to define an appropriate rule to detect the change.
  • The length of the window is also one of the key points of this method. A small window is more sensitive to changes: it can detect even small changes quickly, but it can also capture false positives, such as a merely temporary drift of the variables. A bigger window, on the other hand, gives a more stable estimate with fewer false positives, yet it takes more time to actually detect the state change (see the sketch after this list).
  • Merely checking the target variable does not tell us what caused the state change, and often the cause is exactly what we are interested in. One solution is to monitor all the features in the model, so that we can see which part of the model's input is changing.
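To illustrate the window-length trade-off above, here is a sketch built around a hypothetical helper rolling_kl (not part of the original code) that recomputes the KL series for different window lengths:

from datetime import date, timedelta

import numpy as np
import pandas as pd
from scipy.stats import entropy

def rolling_kl(df, window, start=date(2019, 2, 1), end=date(2020, 3, 28)):
    # Hypothetical helper: KL divergence between the last `window` closing
    # prices of the train set and the `window` most recent prices per day
    reference = np.array(df[df['date'] < '2019-01-01'].close.tail(window))
    rows = []
    day = start
    while day <= end:
        current = np.array(df[df['date'] <= day].close.tail(window))
        rows.append((day, entropy(reference, current)))
        day += timedelta(days=1)
    return pd.DataFrame(rows, columns=['date', 'kl'])

# A small window reacts quickly but is noisier (more false positives);
# a large window is smoother but slower to flag a state change
kl_weekly = rolling_kl(df, window=7)
kl_monthly = rolling_kl(df, window=30)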
