atenearesearchgroup / cep-preprocessing

Complex event processing for data stream preprocessing

CEP for data stream preprocessing

This repository contains additional material for the paper entitled Rule-based preprocessing for data stream mining using complex event processing, currently under revision.

The repository includes code and datasets for reproducibility purposes, as well as extended experimental results and statistical analysis.

Requirements/dependencies

The code is provided as a Maven project, developed in Eclipse 2019-03 with Java 8.

The following dependencies are required:

  • Esper 8, a Java implementation of a CEP engine (already configured as a Maven dependency)
  • MOA 19.03, a Java library for massive online analysis (available on GitHub)

Some MOA classes related to instances have been modified to allow external manipulation; they can be found in the folder code/dependencies/moa.

To use the MOA timing functionalities, the sizeofag dependency has to be properly configured. If you run the code outside an IDE, you might need to specify it as a JVM parameter, e.g., on Windows:

java -javaagent:"C:\Users\YourUserName\.m2\repository\com\github\fracpete\sizeofag\1.0.4\sizeofag-1.0.4.jar"
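To confirm that the option actually reached the JVM, a small check like the one below can help. It is only a sketch and not part of the repository: the class name SizeOfAgentCheck and its method are hypothetical, and the check simply looks for a -javaagent argument whose path mentions "sizeofag".

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper (not in the repository): detects a sizeofag -javaagent option. */
final class SizeOfAgentCheck {

    /** Returns true if any JVM argument is a -javaagent option whose path mentions "sizeofag". */
    static boolean hasSizeOfAgent(List<String> jvmArgs) {
        return jvmArgs.stream()
                .anyMatch(arg -> arg.startsWith("-javaagent:") && arg.contains("sizeofag"));
    }

    public static void main(String[] args) {
        // These are the options passed to the JVM itself, not the program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (hasSizeOfAgent(jvmArgs)) {
            System.out.println("sizeofag agent detected");
        } else {
            System.out.println("sizeofag agent not found; add -javaagent:<path to the sizeofag jar>");
        }
    }
}
```

Running the class with and without the -javaagent option shown above should print the two different messages.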

Datasets

The data streams used in the experiments are generated from public datasets, currently available from the UCI and OpenML repositories:

  • Electricity dataset, which includes electricity price and demand records to predict whether electricity price will rise or not.
  • Airlines dataset, which contains flight schedules in American airports from which delays can be predicted.
  • Occupancy dataset, which provides sensor measurements to predict room occupancy.

The first two datasets are also included in the MOA distribution on SourceForge.

Running experiments

For each experiment, a CEP service has been developed following these steps:

  1. Register preprocessing rules in the CEP engine.
  2. Load instances from an ARFF dataset.
  3. Convert MOA instances to CEP events and send them to the engine.
  4. Apply CEP rules and process their outcomes in the subscribers.
  5. Convert the derived events into MOA instances and write them to an ARFF dataset.
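The five steps can be illustrated with a plain-Java stand-in. This is a sketch under strong assumptions, not the repository's code: Esper and MOA are deliberately left out, rules are modelled as simple functions instead of EPL statements, and the names PipelineSketch, Rule and dropMissing are hypothetical. The ARFF handling is minimal (dense rows, unquoted values) and assumes that rules keep the attribute schema unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;

/** Plain-Java stand-in for the five steps; Esper and MOA are not used here. */
final class PipelineSketch {

    /** Hypothetical rule: maps an event to a derived event, or to empty to discard it. */
    interface Rule extends Function<Map<String, String>, Optional<Map<String, String>>> {}

    /** Example rule (an assumption, not taken from the paper): drop events with missing values. */
    static Optional<Map<String, String>> dropMissing(Map<String, String> event) {
        return event.containsValue("?") ? Optional.empty() : Optional.of(event);
    }

    /** Step 1 is the caller passing in the registered rules; steps 2 to 5 happen below. */
    static List<String> run(List<String> arffLines, List<Rule> rules) {
        List<String> header = new ArrayList<>();
        List<String> names = new ArrayList<>();
        List<String> rows = new ArrayList<>();

        // Steps 4 and 5: the subscriber receives derived events and turns them into ARFF rows.
        Consumer<Map<String, String>> subscriber = event -> rows.add(String.join(",", event.values()));

        boolean inData = false;
        for (String line : arffLines) {
            String trimmed = line.trim();
            if (!inData) {
                // Step 2: read the ARFF header and remember the attribute names.
                header.add(line);
                String lower = trimmed.toLowerCase();
                if (lower.startsWith("@attribute")) {
                    names.add(trimmed.split("\\s+")[1]);
                } else if (lower.startsWith("@data")) {
                    inData = true;
                }
                continue;
            }
            if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                continue; // skip blank lines and comments in the data section
            }
            // Step 3: convert the instance into an event (attribute name -> value).
            String[] values = trimmed.split(",");
            Map<String, String> event = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                event.put(names.get(i), values[i].trim());
            }
            // Step 4: apply the rules in registration order until one discards the event.
            Optional<Map<String, String>> current = Optional.of(event);
            for (Rule rule : rules) {
                if (!current.isPresent()) {
                    break;
                }
                current = rule.apply(current.get());
            }
            current.ifPresent(subscriber);
        }
        // Step 5: the output dataset is the original header followed by the derived rows.
        List<String> result = new ArrayList<>(header);
        result.addAll(rows);
        return result;
    }
}
```

In this sketch a rule is registered with `rules.add(PipelineSketch::dropMissing)`. In the actual services, the engine, the event types and the subscribers come from Esper, and instances are read and written through MOA.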

The CEP services can be found in the package es.uma.atenea.cepdm.service. Each one includes a demo program that applies some of the preprocessing rules described in the paper. The resulting data streams are available in the folder datasets.

For experimental purposes, classification algorithms are run on the ARFF datasets produced by CEP. The main program to run six classification algorithms (Hoeffding tree, kNN, naive Bayes, a rule-based classifier, and two ensemble methods) is available in the package es.uma.atenea.cepdm.learning (see MainMOAClassificationExperiment).

For the electricity case study, we provide an additional main program that applies both preprocessing and learning every time a new instance arrives. It can be found in the package es.uma.atenea.cepdm.service.electricity (see CEPOnlineServiceElectricity).
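The idea of interleaving preprocessing and learning per instance can be sketched as follows. This is not CEPOnlineServiceElectricity: the learner interface, the placeholder majority-class learner and the test-then-train ordering are all assumptions made for illustration, and the preprocessing step is a plain function instead of a CEP rule.

```java
import java.util.function.UnaryOperator;

/** Hypothetical per-instance loop: preprocess, predict, then learn. */
final class OnlineLoopSketch {

    /** Hypothetical minimal learner interface; MOA classifiers expose a richer API. */
    interface OnlineLearner {
        boolean predict(double[] features);
        void train(double[] features, boolean label);
    }

    /** Placeholder learner that predicts the label seen most often so far. */
    static final class MajorityClass implements OnlineLearner {
        private long positives;
        private long negatives;

        @Override
        public boolean predict(double[] features) {
            return positives > negatives;
        }

        @Override
        public void train(double[] features, boolean label) {
            if (label) {
                positives++;
            } else {
                negatives++;
            }
        }
    }

    /** Processes the stream one instance at a time and returns the fraction predicted correctly. */
    static double run(double[][] stream, boolean[] labels,
                      UnaryOperator<double[]> preprocess, OnlineLearner learner) {
        long correct = 0;
        for (int i = 0; i < stream.length; i++) {
            double[] instance = preprocess.apply(stream[i]); // preprocessing on arrival
            if (learner.predict(instance) == labels[i]) {    // test on the new instance first
                correct++;
            }
            learner.train(instance, labels[i]);              // then update the model
        }
        return stream.length == 0 ? 0.0 : (double) correct / stream.length;
    }
}
```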

Results and statistical analysis

The folder results contains a spreadsheet for each experiment with the following classification measures: accuracy, precision, recall and F1. The metrics are computed in absolute terms (total in file names) and in windows (window in file names). Time and memory results are also detailed.
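For reference, the four measures follow the usual definitions over confusion counts. The sketch below is not the code that produced the spreadsheets: it assumes a binary problem with one positive class, and it assumes that "window" means non-overlapping windows of a fixed size, which the text above does not specify. The class name ClassificationMetrics is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Accuracy, precision, recall and F1 from binary confusion counts (illustrative only). */
final class ClassificationMetrics {
    private long tp;
    private long fp;
    private long tn;
    private long fn;

    /** Records one prediction against its true label. */
    void add(boolean actual, boolean predicted) {
        if (actual && predicted) {
            tp++;
        } else if (!actual && predicted) {
            fp++;
        } else if (actual) {
            fn++;
        } else {
            tn++;
        }
    }

    long count() { return tp + fp + tn + fn; }

    double accuracy() { return ratio(tp + tn, count()); }

    double precision() { return ratio(tp, tp + fp); }

    double recall() { return ratio(tp, tp + fn); }

    double f1() {
        double p = precision();
        double r = recall();
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    // Ratios with an empty denominator are reported as 0 by convention in this sketch.
    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    /** Accuracy per non-overlapping window; a trailing partial window is reported too. */
    static List<Double> windowedAccuracy(boolean[] actual, boolean[] predicted, int windowSize) {
        List<Double> result = new ArrayList<>();
        ClassificationMetrics window = new ClassificationMetrics();
        for (int i = 0; i < actual.length; i++) {
            window.add(actual[i], predicted[i]);
            if (window.count() == windowSize || i == actual.length - 1) {
                result.add(window.accuracy());
                window = new ClassificationMetrics();
            }
        }
        return result;
    }
}
```

Feeding every prediction into a single ClassificationMetrics object gives the "total" view, while windowedAccuracy gives one value per window.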

Accuracy results are provided as CSV files, on which Kruskal-Wallis and Wilcoxon statistical tests have been run in R. The test outputs have been dumped to text files, which are also provided.
