andi611 / apriori-and-eclat-frequent-itemset-mining

Python implementations of Apriori and Eclat, two of the best-known basic algorithms for mining frequent itemsets in a set of transactions.

License: MIT License

Data Mining: Apriori and Eclat Frequent Itemset Mining

Implementations

  • Apriori algorithm
  • Eclat algorithm (recursive method w/ GPU acceleration support)
  • Eclat algorithm (iterative method)
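As a rough illustration of how the two approaches differ, here is a minimal sketch. It is not the code in this repository: the function names, the list-of-lists transaction format, and the use of a fractional `min_support` are assumptions made for the example. Apriori searches level by level over candidate itemsets, while Eclat searches depth-first over vertical tidsets (item to set of transaction ids).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search: build size-k candidates from frequent (k-1)-itemsets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    # Frequent 1-itemsets.
    current = {}
    for item in items:
        count = sum(1 for t in transactions if item in t)
        if count / n >= min_support:
            current[frozenset([item])] = count
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count / n >= min_support:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


def eclat(transactions, min_support):
    """Depth-first search over tidsets; support is the size of an intersection."""
    n = len(transactions)
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) / n < min_support:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining items' tidsets to grow the prefix.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items(), key=lambda pair: pair[0]))
    return frequent
```

Both functions return a dict mapping each frequent itemset (a `frozenset`) to its support count, so their results can be compared directly on the same input.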

Requirements

  • Python 3.6+
  • NVIDIA CUDA 9.0 (Optional)
  • PyCUDA 2018.1.1 (Optional)
  • g++ [gcc version 6.4.0 (GCC)] (Optional)

Environment Setup

sudo pip3 install pycuda
  • If you run into the "CUDA unsupported GNU version" problem, install GCC 6 and register it with update-alternatives:
1. sudo apt-get install gcc-6
2. sudo apt-get install g++-6
3. sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 10
4. sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 10

Datasets:

  • ./data/data.txt: suggested min support range: [0.6 0.02]
  • ./data/data2.txt: a harder dataset; only Eclat finds results in a reasonable time. Suggested min support range: [0.1 0.0002]
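To get a feel for what these thresholds mean, the sketch below converts a fractional minimum support into the smallest number of transactions an itemset must appear in. It assumes min support is a fraction of all transactions, as the values above suggest; the helper name `min_support_count` and the transaction counts used are hypothetical and not part of this repository.

```python
import math


def min_support_count(min_support, num_transactions):
    """Smallest transaction count that satisfies a fractional min support.

    A tiny epsilon guards against floating-point noise in the product
    (an assumption of this sketch, not something the repository does).
    """
    return math.ceil(min_support * num_transactions - 1e-9)


# Hypothetical dataset sizes, for illustration only.
count_small = min_support_count(0.02, 1000)
count_large = min_support_count(0.0002, 100000)
```

Lower thresholds admit many more itemsets, which is why run time grows quickly toward the low end of each suggested range.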

Usage

  • To run the Apriori / Eclat algorithm with default settings:
python3 runner.py apriori
python3 runner.py eclat
  • Other arguments can be given by:
python3 runner.py [mode] --min_support 0.6 --input_path ./data/data.txt --output_path ./data/output.txt
  • To run Eclat with GPU acceleration (Suggested dataset: data2.txt):
python3 runner.py eclat --min_support 0.02 --input_path ./data/data2.txt --use_CUDA
  • To plot run time vs. different experiment values:
python3 runner.py [mode] --plot_support
python3 runner.py [mode] --plot_support_gpu --input_path ./data/data2.txt --use_CUDA
python3 runner.py [mode] --compare_gpu --input_path ./data/data2.txt --use_CUDA
python3 runner.py [mode] --plot_thread --input_path ./data/data2.txt --use_CUDA
python3 runner.py [mode] --plot_block --input_path ./data/data2.txt --use_CUDA
  • To test with toy data:
python3 runner.py [mode] --toy_data
  • To run the Eclat algorithm with the iterative method:
python3 runner.py eclat --iterative
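For readers wondering how an iterative Eclat differs from a recursive one, here is a minimal sketch that replaces recursion with an explicit stack. This is an assumption about the general technique, not the repository's actual iterative method; the function name `eclat_iterative` and the list-of-lists input format are made up for the example.

```python
def eclat_iterative(transactions, min_support):
    """Eclat with an explicit stack instead of recursive calls."""
    n = len(transactions)
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    min_count = min_support * n
    frequent = {}
    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    # Each stack entry: (prefix itemset, list of frequent (item, tidset) extensions).
    stack = [(frozenset(), start)]
    while stack:
        prefix, candidates = stack.pop()
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Keep only extensions whose intersected tidset is still frequent.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            if suffix:
                stack.append((itemset, suffix))
    return frequent
```

An explicit stack avoids Python's recursion limit on deep prefixes; the set of itemsets found is the same as with the recursive formulation.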

Apriori minimum support vs. run time plot

Eclat minimum support vs. run time plot

Eclat minimum support vs. run time plot (data2.txt w/ GPU version)

Eclat w/ GPU and w/o GPU comparison plot (data2.txt w/ GPU version)
