
lolei / spmf-py


Python SPMF Wrapper

License: GNU General Public License v3.0



spmf-py

Python Wrapper for SPMF

Information

The SPMF [1] data mining Java library, usable from Python.

Essentially, this module calls SPMF's Java command-line tool, passes the user's arguments to it, and parses the output.
The results can also be converted to a pandas DataFrame or written to a CSV file.

In theory, all algorithms featured in SPMF are callable. Nothing is hardcoded: the name of the desired algorithm and its parameters must be looked up in the SPMF documentation.
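Because nothing is hardcoded, the wrapper essentially has to turn the algorithm name, file names, and parameters into one SPMF command line. A minimal sketch of that step, assuming SPMF's usual command-line form `java -jar spmf.jar run <Algorithm> <input> <output> [parameters...]`; `build_spmf_command` and `run_spmf` are hypothetical helpers, not spmf-py's actual internals.

```python
import os
import subprocess


def build_spmf_command(algorithm, input_filename, output_filename,
                       arguments=(), spmf_bin_location_dir="."):
    """Hypothetical helper: build an SPMF command line as a list of tokens."""
    jar = os.path.join(spmf_bin_location_dir, "spmf.jar")
    # Assumed CLI form: java -jar spmf.jar run <Algorithm> <input> <output> [params...]
    command = ["java", "-jar", jar, "run", algorithm, input_filename, output_filename]
    # SPMF parameters are positional, so they are appended in order as strings
    command += [str(arg) for arg in arguments]
    return command


def run_spmf(command):
    """Hypothetical: execute SPMF and return what it printed (e.g. the statistics block)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_spmf_command("PrefixSpan", "contextPrefixSpan.txt", "output.txt", [0.7, 5])
print(cmd)
```

Passing the command as a list (rather than one shell string) avoids quoting problems with file names that contain spaces; `run_spmf(cmd)` would only succeed where Java and `spmf.jar` are actually available.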

Installation

pip install spmf

Usage

Example:

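from spmf import Spmf

# Run SPMF's PrefixSpan on an SPMF-format input file;
# arguments holds PrefixSpan's positional SPMF parameters
spmf = Spmf("PrefixSpan", input_filename="contextPrefixSpan.txt",
            output_filename="output.txt", arguments=[0.7, 5])
spmf.run()
# Parse SPMF's output into a pandas DataFrame and print it
print(spmf.to_pandas_dataframe(pickle=True))
# Write the parsed patterns to a CSV file
spmf.to_csv("output.csv")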

Output:

=============  PREFIXSPAN 0.99-2016 - STATISTICS =============
 Total time ~ 2 ms
 Frequent sequences count : 14
 Max memory (mb) : 6.487663269042969
 minsup = 3 sequences.
 Pattern count : 14
===================================================

      pattern sup
0         [1]   4
1      [1, 2]   4
2      [1, 3]   4
3   [1, 3, 2]   3
4   [1, 3, 3]   3
5         [2]   4
6      [2, 3]   3
7         [3]   4
8      [3, 2]   3
9      [3, 3]   3
10        [4]   3
11     [4, 3]   3
12        [5]   3
13        [6]   3

The usage is similar to the one described in the SPMF documentation.
For all Python parameters, see the Spmf class.
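The DataFrame above is built by parsing SPMF's output file. A minimal sketch of such a parser, assuming sequential-pattern output lines of the form `1 -1 3 -1 2 -1 #SUP: 3` (items separated by spaces, `-1` closing each itemset, support after `#SUP:`); `parse_sequential_patterns` is a hypothetical function, not spmf-py's own parser.

```python
def parse_sequential_patterns(lines):
    """Hypothetical parser for SPMF sequential-pattern output.

    Assumes lines like '1 -1 3 -1 2 -1 #SUP: 3', where -1 closes an itemset.
    Returns (pattern, support) tuples; each pattern is a list of itemsets.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pattern_text, _, support_text = line.partition("#SUP:")
        itemsets, current = [], []
        for token in pattern_text.split():
            if token == "-1":
                # -1 marks the end of the current itemset
                itemsets.append(current)
                current = []
            elif token != "-2":  # tolerate an end-of-sequence marker, if present
                current.append(int(token))
        if current:
            itemsets.append(current)
        rows.append((itemsets, int(support_text)))
    return rows


sample = ["1 -1 #SUP: 4", "1 -1 3 -1 2 -1 #SUP: 3", "4 -1 3 -1 #SUP: 3"]
for pattern, support in parse_sequential_patterns(sample):
    # Flatten itemsets to match the single-item patterns shown in the table
    flat = [item for itemset in pattern for item in itemset]
    print(flat, support)
# [1] 4
# [1, 3, 2] 3
# [4, 3] 3
```

Keeping itemsets as nested lists preserves the difference between `1 2 -1` (one itemset) and `1 -1 2 -1` (two itemsets); the resulting rows could then be handed to `pandas.DataFrame(rows, columns=["pattern", "sup"])`.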

SPMF Arguments

The arguments parameter is the list of arguments passed to SPMF; which arguments are valid depends on the chosen algorithm. SPMF treats optional parameters as an ordered list. Because the algorithms have no named parameters, if, for example, only the first and the last parameter of an algorithm are to be used, the ones in between must be filled with "" blank strings.
For advanced usage examples, see examples.
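Filling the gaps by hand is error-prone when an algorithm has many optional parameters. A minimal sketch of building the ordered list from the positions you actually want to set; `fill_positional` is a hypothetical helper, the parameter values are placeholders, and dropping trailing unused parameters is an assumption (the text above only requires filling the ones in between).

```python
def fill_positional(values, count):
    """Hypothetical helper: turn {position: value} into SPMF's ordered argument list.

    Skipped parameters between used ones become "" blank strings. Trailing
    unused parameters are dropped (assumption: SPMF treats them as omitted).
    """
    args = [""] * count
    for position, value in values.items():
        if not 0 <= position < count:
            raise IndexError(f"parameter position {position} out of range")
        args[position] = value
    while args and args[-1] == "":
        args.pop()  # drop trailing blanks
    return args


# Use only the 1st and 3rd of three parameters: the 2nd is filled with ""
print(fill_positional({0: 0.7, 2: "true"}, 3))  # [0.7, '', 'true']
print(fill_positional({0: 0.7}, 3))             # [0.7]
```

The result would be passed as `arguments=...` to `Spmf`; what each position means for a given algorithm still has to be checked in the SPMF documentation.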

SPMF Executable

Download it from the SPMF Website.
It is assumed that the SPMF binary spmf.jar is located in the same directory as spmf-py. If it is not, either symlink it, or use the spmf_bin_location_dir parameter.
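A missing `spmf.jar` otherwise only shows up as an error from the Java call. A minimal pre-flight check, assuming the jar is named `spmf.jar` inside the directory you would pass as `spmf_bin_location_dir`; `find_spmf_jar` is a hypothetical helper, not part of spmf-py.

```python
from pathlib import Path


def find_spmf_jar(spmf_bin_location_dir="."):
    """Hypothetical pre-flight check: return the path to spmf.jar or raise early."""
    directory = Path(spmf_bin_location_dir)
    jar = directory / "spmf.jar"
    # is_file() follows symlinks, so a symlinked spmf.jar is accepted too
    if not jar.is_file():
        raise FileNotFoundError(
            f"spmf.jar not found in {directory.resolve()}: download it from the "
            "SPMF website, symlink it there, or pass spmf_bin_location_dir"
        )
    return jar
```

Calling it with the same directory you intend to give `Spmf(..., spmf_bin_location_dir=...)` surfaces a misplaced jar with a readable message before any algorithm runs.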

Input Formats

Either use an input file as specified by SPMF, or use one of the in-line formats as seen in examples.
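If the data lives in Python, one option is to write it out in SPMF's own input format first. A minimal sketch for sequence databases, assuming SPMF's text format with one sequence per line, space-separated items sorted within each itemset, `-1` closing an itemset and `-2` closing the sequence; `to_spmf_sequences` is hypothetical and is not one of spmf-py's in-line input formats.

```python
def to_spmf_sequences(sequences):
    """Hypothetical helper: encode Python sequences in SPMF's sequence-database
    text format, one sequence per line: -1 ends an itemset, -2 ends the sequence."""
    lines = []
    for sequence in sequences:
        tokens = []
        for itemset in sequence:
            # Items inside an itemset are written in ascending order
            tokens.extend(str(item) for item in sorted(itemset))
            tokens.append("-1")
        tokens.append("-2")
        lines.append(" ".join(tokens))
    return "\n".join(lines) + "\n"


database = [[[1], [1, 2, 3], [1, 3]], [[1, 4], [3]]]
print(to_spmf_sequences(database), end="")
# 1 -1 1 2 3 -1 1 3 -1 -2
# 1 4 -1 3 -1 -2
```

The returned text can be written to a file and that file name passed as `input_filename`.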

Memory

The maximum memory can be increased in the constructor via Spmf(memory=n), where n is in megabytes; see SPMF's FAQ.

Background

Why? In a Python pipeline, such as a Jupyter Notebook, using Java as an intermediate step can be cumbersome. With spmf-py you can stay in your pipeline as though Java were never used at all.

Bibliography

[1] Fournier-Viger, P., Lin, C. W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016). The SPMF Open-Source Data Mining Library Version 2. Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853, pp. 36-40.

Disclaimer

Use at your own risk. This repository is barely maintained, if at all. Use SPMF itself for more robust results.

This module has been tested with only a fraction of the algorithms offered in SPMF. Calling any of them and writing to the output file should work for all. Output parsing, however, is only expected to work for algorithms whose output resembles that of the sequential pattern mining algorithms. It has not been tested with other output types, so the parsing may need some adaptation.
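One way to adapt the parsing for other output types is to treat each line as a pattern followed by `#KEY: value` annotations. A minimal sketch, assuming lines such as `1 2 #SUP: 3` for itemsets or `1 ==> 2 4 #SUP: 2 #CONF: 0.6666` for association rules; the exact format differs between algorithms, and `parse_annotated_line` is a hypothetical function, not part of spmf-py.

```python
import re


def _to_number(text):
    # Convert "3" -> 3 and "0.75" -> 0.75; leave anything else as a string
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_annotated_line(line):
    """Hypothetical adapter for SPMF output lines of the form
    '<pattern> #KEY: value #KEY: value ...' (e.g. itemsets or rules).

    Returns the pattern text and a dict of the '#KEY: value' annotations.
    """
    pattern, *fields = re.split(r"\s*#", line.strip())
    annotations = {}
    for field in fields:
        key, _, value = field.partition(":")
        annotations[key.strip()] = _to_number(value.strip())
    return pattern, annotations


print(parse_annotated_line("1 2 #SUP: 3"))
# ('1 2', {'SUP': 3})
print(parse_annotated_line("1 ==> 2 4 #SUP: 2 #CONF: 0.6666"))
# ('1 ==> 2 4', {'SUP': 2, 'CONF': 0.6666})
```

Keeping the pattern as raw text leaves its interpretation (itemset, rule, sequence) to a second, algorithm-specific step.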

If something is not working, submit an issue or create a PR yourself!
