
chen0040 / java-sequential-pattern-mining


Package provides a Java implementation of sequential pattern mining algorithms.

License: MIT License

sequential-patterns sequential-data pattern-discovery java


java-sequential-pattern-mining

Package provides a Java implementation of the GSP (Generalized Sequential Patterns) algorithm for sequential pattern mining.


Overview of GSP

The implementation of the algorithm is based on Srikant & Agrawal, 1996 ("Mining Sequential Patterns: Generalizations and Performance Improvements").

The algorithm makes multiple passes over the data. The first pass determines the support of each item, that is, the number of data-sequences that include the item. At the end of the first pass, the algorithm knows which items are frequent, that is, which items have at least the minimum support. Each such item yields a 1-element frequent sequence consisting of that item.
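The first pass can be sketched in plain Java as follows. This is an illustrative sketch, not the library's code: the class `FirstPass` and its helpers are hypothetical, and it assumes a data-sequence is represented as a `List<Set<String>>` (one set per transaction). Items are collected into a per-sequence set first, so an item that appears in several transactions of the same data-sequence still counts only once toward its support.

```java
import java.util.*;

public class FirstPass {

    // Support of an item = number of data-sequences that contain it (counted once per sequence).
    static Map<String, Integer> itemSupport(List<List<Set<String>>> database) {
        Map<String, Integer> support = new TreeMap<>();
        for (List<Set<String>> dataSequence : database) {
            // Gather the distinct items of this data-sequence across all its transactions.
            Set<String> seen = new HashSet<>();
            for (Set<String> transaction : dataSequence) {
                seen.addAll(transaction);
            }
            for (String item : seen) {
                support.merge(item, 1, Integer::sum);
            }
        }
        return support;
    }

    // Items with at least minSupport; each one is a 1-element frequent sequence.
    static List<String> frequentItems(Map<String, Integer> support, int minSupport) {
        List<String> frequent = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : support.entrySet()) {
            if (entry.getValue() >= minSupport) {
                frequent.add(entry.getKey());
            }
        }
        return frequent;
    }

    // Hypothetical helper: build one transaction (itemset).
    static Set<String> set(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        // The four data-sequences S1..S4 from the usage example.
        List<List<Set<String>>> db = List.of(
            List.of(set("1"), set("1", "2", "3"), set("1", "3"), set("4"), set("3", "6")),
            List.of(set("1", "4"), set("3"), set("2", "3"), set("1", "5")),
            List.of(set("5", "6"), set("1", "2"), set("4", "6"), set("3"), set("2")),
            List.of(set("5"), set("7"), set("1", "6"), set("3"), set("2"), set("3")));
        Map<String, Integer> support = itemSupport(db);
        System.out.println(support);                   // {1=4, 2=4, 3=4, 4=3, 5=3, 6=3, 7=1}
        System.out.println(frequentItems(support, 2)); // [1, 2, 3, 4, 5, 6]
    }
}
```

With a minimum support of 2, only item 7 (present in S4 alone) is dropped after the first pass.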

Each subsequent pass starts with a seed set: the frequent sequences found in the previous pass. The seed set is used to generate new potentially frequent sequences, called candidate sequences. Each candidate sequence has one more item than a seed sequence; so all the candidate sequences in a pass will have the same number of items. The support for these candidate sequences is found during the pass over the data. At the end of the pass, the algorithm determines which of the candidate sequences are actually frequent. These frequent candidates become the seed for the next pass.
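Candidate generation from a seed set can be illustrated with the join step described in the Srikant & Agrawal paper. This is a minimal sketch under stated assumptions, not this library's implementation: the class `CandidateGen` and helper `seq` are hypothetical, a sequence is a `List<List<String>>` whose elements are sorted by `String` order, and the paper's pruning step (discarding candidates with an infrequent contiguous subsequence) is omitted. Two seeds `s1` and `s2` join when dropping the first item of `s1` gives the same sequence as dropping the last item of `s2`. The candidate then extends `s1` with the last item of `s2`.

```java
import java.util.*;

public class CandidateGen {

    // Deep copy of a sequence (list of elements, each a sorted list of items).
    static List<List<String>> copy(List<List<String>> s) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> element : s) out.add(new ArrayList<>(element));
        return out;
    }

    // Remove the first item of the first element; drop that element if it becomes empty.
    static List<List<String>> dropFirst(List<List<String>> s) {
        List<List<String>> out = copy(s);
        out.get(0).remove(0);
        if (out.get(0).isEmpty()) out.remove(0);
        return out;
    }

    // Remove the last item of the last element; drop that element if it becomes empty.
    static List<List<String>> dropLast(List<List<String>> s) {
        List<List<String>> out = copy(s);
        List<String> last = out.get(out.size() - 1);
        last.remove(last.size() - 1);
        if (last.isEmpty()) out.remove(out.size() - 1);
        return out;
    }

    // Join step: every seed has k items, every candidate has k + 1 items.
    static Set<List<List<String>>> join(List<List<List<String>>> seeds) {
        Set<List<List<String>>> candidates = new LinkedHashSet<>();
        for (List<List<String>> s1 : seeds) {
            for (List<List<String>> s2 : seeds) {
                if (!dropFirst(s1).equals(dropLast(s2))) continue;
                List<String> s2Last = s2.get(s2.size() - 1);
                String item = s2Last.get(s2Last.size() - 1);
                boolean oneItemSeeds = s1.size() == 1 && s1.get(0).size() == 1;
                if (oneItemSeeds) {
                    // Joining 1-sequences <(x)> and <(y)>: emit <(x)(y)> and, if x < y, <(x y)>.
                    List<List<String>> separate = copy(s1);
                    separate.add(new ArrayList<>(List.of(item)));
                    candidates.add(separate);
                    if (s1.get(0).get(0).compareTo(item) < 0) {
                        List<List<String>> together = copy(s1);
                        together.get(0).add(item);
                        candidates.add(together);
                    }
                } else if (s2Last.size() == 1) {
                    // The last item of s2 was its own element: append it as a new element.
                    List<List<String>> c = copy(s1);
                    c.add(new ArrayList<>(List.of(item)));
                    candidates.add(c);
                } else {
                    // Otherwise the item joins the last element of s1.
                    List<List<String>> c = copy(s1);
                    c.get(c.size() - 1).add(item);
                    candidates.add(c);
                }
            }
        }
        return candidates;
    }

    // Hypothetical helper: seq("1", "2,3") builds <(1)(2 3)>; items in each element must be sorted.
    static List<List<String>> seq(String... elements) {
        List<List<String>> s = new ArrayList<>();
        for (String e : elements) s.add(new ArrayList<>(Arrays.asList(e.split(","))));
        return s;
    }

    public static void main(String[] args) {
        // Seeds <(1)> and <(2)> yield 5 candidates: <(1)(1)>, <(1)(2)>, <(1 2)>, <(2)(1)>, <(2)(2)>.
        System.out.println(join(List.of(seq("1"), seq("2"))));
        // Seeds <(1)(2)> and <(2 3)> yield the single candidate <(1)(2 3)>.
        System.out.println(join(List.of(seq("1", "2"), seq("2,3"))));
    }
}
```

The special case for 1-sequences follows the paper: joining `<(x)>` with `<(y)>` must produce both the two-element and the single-element candidate, because both reduce to `<(y)>` when their first item is dropped.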

Install

Add the following dependency to your POM file:

<dependency>
  <groupId>com.github.chen0040</groupId>
  <artifactId>java-sequential-pattern-mining</artifactId>
  <version>1.0.1</version>
</dependency>

Usage

The sample code below illustrates how to use GSP to find the frequent sequential patterns in a simple sequence database.

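import java.util.ArrayList;
import java.util.List;
// Sequence, GSP, MetaData and Sequences are provided by the java-sequential-pattern-mining
// library; import them from the package shipped in the dependency above.

List<Sequence> database = new ArrayList<>();

// Below are 4 sequences of transactions stored in the database.
// Each argument to Sequence.make is one transaction, with its items separated by commas.
/*
S1 	(1), (1 2 3), (1 3), (4), (3 6)
S2 	(1 4), (3), (2 3), (1 5)
S3 	(5 6), (1 2), (4 6), (3), (2)
S4 	(5), (7), (1 6), (3), (2), (3)
*/

database.add(Sequence.make("1", "1,2,3", "1,3", "4", "3,6"));
database.add(Sequence.make("1,4", "3", "2,3", "1,5"));
database.add(Sequence.make("5,6", "1,2", "4,6", "3", "2"));
database.add(Sequence.make("5", "7", "1,6", "3", "2", "3"));

// Configure GSP with a minimum support level of 2
GSP method = new GSP();
method.setMinSupportLevel(2);

// Distinct items occurring in the database
List<String> uniqueItems = new MetaData(database).getUniqueItems();
Sequences result = method.minePatterns(database, uniqueItems, -1);

// Print every frequent sequential pattern found
result.getSequences().forEach(sequence -> System.out.println("sequence: " + sequence));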
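To sanity-check the mined patterns by hand, the support of any single pattern can be counted directly. The sketch below is independent of the library: the class `SupportCheck` and its methods are hypothetical, and it assumes plain subsequence containment (each pattern element must be a subset of a strictly later transaction). It ignores the time constraints and sliding windows that the 1996 paper also supports. Without those constraints, greedily matching each pattern element to the earliest possible transaction is sufficient.

```java
import java.util.*;

public class SupportCheck {

    // Hypothetical helper: build one transaction (itemset).
    static Set<String> set(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // True if every pattern element is a subset of some transaction, in order.
    static boolean contains(List<Set<String>> dataSequence, List<Set<String>> pattern) {
        int pos = 0;
        for (Set<String> element : pattern) {
            while (pos < dataSequence.size() && !dataSequence.get(pos).containsAll(element)) pos++;
            if (pos == dataSequence.size()) return false;
            pos++; // the next pattern element must match a strictly later transaction
        }
        return true;
    }

    // Support = number of data-sequences that contain the pattern.
    static int support(List<List<Set<String>>> database, List<Set<String>> pattern) {
        int count = 0;
        for (List<Set<String>> dataSequence : database) {
            if (contains(dataSequence, pattern)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The same four data-sequences S1..S4 as in the usage example.
        List<List<Set<String>>> db = List.of(
            List.of(set("1"), set("1", "2", "3"), set("1", "3"), set("4"), set("3", "6")),
            List.of(set("1", "4"), set("3"), set("2", "3"), set("1", "5")),
            List.of(set("5", "6"), set("1", "2"), set("4", "6"), set("3"), set("2")),
            List.of(set("5"), set("7"), set("1", "6"), set("3"), set("2"), set("3")));
        System.out.println(support(db, List.of(set("1", "2"), set("3")))); // 2 (S1 and S3)
        System.out.println(support(db, List.of(set("1"), set("3"))));      // 4
    }
}
```

A pattern is frequent at the minimum support level used in the example if this count is at least 2.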
