00mjk / scamp

This project forked from zpzim/scamp

CPU/GPU Implementation of the SCAMP algorithm for computing the matrix profile

Home Page: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html

License: MIT License

Cuda 5.35% C++ 68.71% Python 11.47% MATLAB 2.03% Shell 7.95% Dockerfile 0.19% CMake 3.70% C 0.60%

SCAMP: SCAlable Matrix Profile

Table of Contents

Overview
Documentation
Performance
Python Module
Run Using Docker
Distributed Operation
Reference

Overview

This is a GPU/CPU implementation of the SCAMP algorithm. SCAMP takes a time series as input and computes the matrix profile for a particular window size. You can read more at the Matrix Profile Homepage. This framework is a substantial improvement over GPU-STOMP and adds the following features:

  • Tiling for large inputs
  • Computation in fp32, mixed fp32/fp64, or fp64 (double is recommended for most datasets, single precision will work for some)
  • fp32 version should get good performance on GeForce cards
  • AB joins (you can produce the matrix profile from 2 different time series)
  • Distributable (we use GCP but other cloud platforms can work) with verified scalability to billions of datapoints
  • More types of matrix profiles! See the Docs!
  • Extremely Efficient Implementation
  • Extensible: optimized versions of custom join operations can be added
  • Can compute joins with the CPU (only enabled for double precision; does not yet support all-neighbors joins or distance matrix summaries)
  • Handles NaN input values. The matrix profile is computed while excluding any subsequence that contains a NaN value.
  • Python module: Use SCAMP in Python with pyscamp
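
To make the terms in the list above concrete, here is a minimal brute-force sketch of a matrix profile in plain Python. It is not SCAMP's implementation (SCAMP is far faster and tiled); it only illustrates what a self-join, an AB join, and NaN exclusion mean. The function names, the exclusion-zone size, and the convention of reporting skipped windows as `inf` with index `-1` are assumptions made for this example.

```python
import math

def znorm_distance(x, y):
    """Z-normalized Euclidean distance between two equal-length windows."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    std_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / m)
    std_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / m)
    if std_x == 0 or std_y == 0:
        return math.inf  # constant windows cannot be z-normalized
    return math.sqrt(sum(((p - mean_x) / std_x - (q - mean_y) / std_y) ** 2
                         for p, q in zip(x, y)))

def naive_ab_join(a, b, m, exclusion=0):
    """For each length-m window of `a`, find its nearest window in `b`.

    Windows containing NaN are skipped; in this sketch they keep a profile
    value of inf and an index of -1. A positive `exclusion` ignores matches
    with |i - j| < exclusion, which a self-join needs to avoid trivial matches.
    """
    def windows(ts):
        return [ts[i:i + m] for i in range(len(ts) - m + 1)]

    def has_nan(w):
        return any(math.isnan(v) for v in w)

    wa, wb = windows(a), windows(b)
    profile = [math.inf] * len(wa)
    index = [-1] * len(wa)
    for i, x in enumerate(wa):
        if has_nan(x):
            continue
        for j, y in enumerate(wb):
            if has_nan(y) or abs(i - j) < exclusion:
                continue
            d = znorm_distance(x, y)
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

def naive_self_join(a, m):
    """Self-join: compare a series with itself, skipping trivial matches."""
    return naive_ab_join(a, a, m, exclusion=max(1, m // 4))

# A repeating pattern followed by a NaN and some unrelated values
ts = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, float("nan"), 2.0, 5.0, 1.0]
profile, index = naive_self_join(ts, 4)
print(profile[0], index[0])  # window 0 repeats exactly at position 4
```

The brute-force loop costs O(n² · m) time, which is why an optimized, tiled implementation matters for long series.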

Documentation

SCAMP's documentation can be found at readthedocs.

Performance

SCAMP is extremely fast, especially on Tesla series GPUs. I believe this repository contains the fastest code in existence for computing the matrix profile. If you find a way to improve the speed of SCAMP, or to compute matrix profiles faster than SCAMP does, please let me know. I would be glad to point to your work and incorporate any improvements that can be made to SCAMP.

More details on the performance of SCAMP can be found in the documentation.

Python Module

A source distribution for a Python 3 module built with pybind11 is available on pypi.org. To install it, run:

# Python 3; python 2 can work but is unsupported
# cmake is required (if you don't have it you can pip install cmake)
pip install pyscamp

Then you can use SCAMP in Python as follows:

import numpy as np
import pyscamp as mp # Uses GPU if available and CUDA was available during the build

# Allows checking if pyscamp was built with CUDA and has GPU support
has_gpu_support = mp.gpu_supported()

# Example inputs (illustrative only): two random time series and a window length
a = np.random.rand(10000)
b = np.random.rand(5000)
sublen = 100

# Self join
profile, index = mp.selfjoin(a, sublen)
# AB join using 4 threads, outputting Pearson correlation
profile, index = mp.abjoin(a, b, sublen, pearson=True, threads=4)

More information and the API documentation for pyscamp are available on readthedocs.
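
When comparing profiles produced with and without `pearson=True`, it can help to convert between the two measures. The sketch below uses the standard identity d = sqrt(2m(1 - r)) relating the z-normalized Euclidean distance d of two length-m subsequences to their Pearson correlation r. It assumes that the non-Pearson output is a z-normalized Euclidean distance; the helper names are hypothetical and not part of pyscamp.

```python
import math

def pearson_to_znorm_distance(r, m):
    """Convert a Pearson correlation to z-normalized Euclidean distance."""
    r = max(-1.0, min(1.0, r))  # guard against rounding slightly outside [-1, 1]
    return math.sqrt(2.0 * m * (1.0 - r))

def znorm_distance_to_pearson(d, m):
    """Convert a z-normalized Euclidean distance back to a Pearson correlation."""
    return 1.0 - (d * d) / (2.0 * m)

# Perfectly correlated windows are at distance 0; uncorrelated windows of
# length 50 are at distance sqrt(2 * 50) = 10.
print(pearson_to_znorm_distance(1.0, 50))
print(pearson_to_znorm_distance(0.0, 50))
```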

Run Using Docker

You can run SCAMP via nvidia-docker using the prebuilt image on dockerhub.

In order to expose the host GPUs nvidia-docker must be installed correctly. Please follow the directions provided on the nvidia-docker github page. The following example uses docker 19.03 functionality:

docker pull zpzim/scamp:latest
docker run --gpus all \
   --volume /path/to/host/input/data/directory:/data \
   --volume /path/to/host/output/directory:/output \
   zpzim/scamp:latest /SCAMP/build/SCAMP \
   --window=<window_size> --input_a_file_name=/data/<filename> \
   --output_a_file_name=/output/<mp_filename> \
   --output_a_index_file_name=/output/<mp_index_filename>
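
If you launch the container from a script, the command above can be assembled programmatically. The following is a minimal sketch that only builds the argument list from the flags shown above; the function name, the example directories, and the file names are hypothetical placeholders.

```python
import shlex

def scamp_docker_command(input_dir, output_dir, window, input_name,
                         profile_name, index_name,
                         image="zpzim/scamp:latest"):
    """Build the `docker run` argument list for a SCAMP self-join."""
    return [
        "docker", "run", "--gpus", "all",
        "--volume", f"{input_dir}:/data",      # host input directory
        "--volume", f"{output_dir}:/output",   # host output directory
        image, "/SCAMP/build/SCAMP",
        f"--window={window}",
        f"--input_a_file_name=/data/{input_name}",
        f"--output_a_file_name=/output/{profile_name}",
        f"--output_a_index_file_name=/output/{index_name}",
    ]

# Placeholder paths and file names, for illustration only
cmd = scamp_docker_command("/tmp/in", "/tmp/out", 100,
                           "series.txt", "mp.txt", "mp_index.txt")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.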

Distributed Operation

SCAMP has a client/server architecture built on gRPC. It has been tested on GKE, and it should also be possible to get it working on Amazon EKS.

For more information on how to use the SCAMP client and server, please take a look at the documentation.

Reference

If you use SCAMP in your work, please reference the following paper:

Zimmerman, Zachary, et al. "Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond." Proceedings of the ACM Symposium on Cloud Computing. 2019.
