
andi611 / apriori-and-eclat-frequent-itemset-mining


Python implementations of Apriori and Eclat, two of the best-known basic algorithms for mining frequent itemsets in a set of transactions.

License: MIT License

Python 100.00%
data-mining data-mining-algorithms apriori apriori-algorithm eclat eclat-algorithm plot frequent-pattern-mining frequent-itemsets frequent-itemset-mining

apriori-and-eclat-frequent-itemset-mining's Introduction

Data Mining: Apriori and Eclat Frequent Itemset Mining


Implementations

  • Apriori algorithm
  • Eclat algorithm (recursive method w/ GPU acceleration support)
  • Eclat algorithm (iterative method)
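For orientation, here is a minimal, CPU-only sketch of the two algorithms listed above. It is not the code in this repository: the function names, the toy transactions, and the choice to express support as a fraction of the transaction count are all assumptions made for illustration. Apriori searches level by level over a horizontal layout, while Eclat recurses depth-first over a vertical layout (item to transaction-id set).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search: frequent (k-1)-itemsets are joined into k-candidates."""
    n = len(transactions)
    tx_sets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(1 for t in tx_sets if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in tx_sets for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


def eclat(transactions, min_support):
    """Depth-first search over a vertical layout: item -> set of transaction ids."""
    n = len(transactions)
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that may extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) / n < min_support:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            # Intersect tidsets to get the support of each longer itemset.
            extensions = [(other, tids & other_tids)
                          for other, other_tids in candidates[i + 1:]]
            recurse(itemset, extensions)

    recurse(frozenset(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent


# Hypothetical toy transactions, not taken from ./data.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent_apriori = apriori(transactions, 0.6)
frequent_eclat = eclat(transactions, 0.6)
```

Both functions return a dict mapping each frequent itemset (a `frozenset`) to its support, so the two results can be compared directly.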

Requirements

  • < Python 3.6+ >
  • < NVIDIA CUDA 9.0 > (Optional)
  • < Pycuda 2018.1.1 > (Optional)
  • < g++ [gcc version 6.4.0 (GCC)] > (Optional)

Environment Setup

sudo pip3 install pycuda
  • If you hit the "CUDA unsupported GNU version" problem, refer here or follow these steps:
1. sudo apt-get install gcc-6
2. sudo apt-get install g++-6
3. sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 10
4. sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 10

Datasets:

  • ./data/data.txt: suggested min support range: [0.6, 0.02]
  • ./data/data2.txt: a harder dataset; only Eclat finds results in a reasonable time. Suggested min support range: [0.1, 0.0002]
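The suggested values above look like relative supports, that is, fractions of the total number of transactions. Under that assumption, the sketch below converts a fractional min support into the absolute number of transactions an itemset must appear in. The helper name and the transaction counts are hypothetical; the actual sizes of data.txt and data2.txt are not stated here.

```python
import math


def min_support_count(min_support, num_transactions):
    """Smallest transaction count that satisfies a fractional min support."""
    # Subtract a tiny epsilon so float error (e.g. 3.0000000000000004)
    # does not push the ceiling up by one.
    return math.ceil(min_support * num_transactions - 1e-9)


# Hypothetical dataset sizes, for illustration only.
count_small = min_support_count(0.6, 5)
count_large = min_support_count(0.0002, 100000)
```

Lower min support values admit far more candidate itemsets, which is consistent with the note that only Eclat handles data2.txt in a reasonable time.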

Usage

  • To run the Apriori / Eclat algorithm with default settings:
python3 runner.py apriori
python3 runner.py eclat
  • Other arguments can be given by:
python3 runner.py [mode] --min_support 0.6 --input_path ./data/data.txt --output_path ./data/output.txt
  • To run Eclat with GPU acceleration (Suggested dataset: data2.txt):
python3 runner.py eclat --min_support 0.02 --input_path ./data/data2.txt --use_CUDA
  • To plot run time vs. different experiment values:
python3 runner.py [mode] --plot_support
python3 runner.py [mode] --plot_support_gpu --input_path ./data/data2.txt --use_CUDA
python3 runner.py [mode] --compare_gpu --input_path ./data/data2.txt --use_CUDA
python3 runner.py [mode] --plot_thread --input_path ./data/data2.txt --use_CUDA
python3 runner.py [mode] --plot_block --input_path ./data/data2.txt --use_CUDA
  • To test with toy data:
python3 runner.py [mode] --toy_data
  • To run the Eclat algorithm with the iterative method:
python3 runner.py [mode] --iterative
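The flags used in the commands above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in runner.py: the flag names come from the commands above, but the default values, help strings, and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Frequent itemset mining with Apriori or Eclat")
    # Positional [mode] argument from the usage lines above.
    parser.add_argument("mode", choices=["apriori", "eclat"])
    # Defaults below are assumed; check runner.py for the real ones.
    parser.add_argument("--min_support", type=float, default=0.6)
    parser.add_argument("--input_path", default="./data/data.txt")
    parser.add_argument("--output_path", default="./data/output.txt")
    # Boolean switches: present means True, absent means False.
    for flag in ("--use_CUDA", "--plot_support", "--plot_support_gpu",
                 "--compare_gpu", "--plot_thread", "--plot_block",
                 "--toy_data", "--iterative"):
        parser.add_argument(flag, action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere.
args = build_parser().parse_args(
    ["eclat", "--min_support", "0.02",
     "--input_path", "./data/data2.txt", "--use_CUDA"])
```

With this layout, `args.use_CUDA` is `True` only when the switch is passed, and unspecified options fall back to their defaults.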

Apriori minimum support vs. run time plot

Eclat minimum support vs. run time plot

Eclat minimum support vs. run time plot (data2.txt w/ GPU version)

Eclat w/ GPU and w/o GPU comparison plot (data2.txt w/ GPU version)



apriori-and-eclat-frequent-itemset-mining's Issues

Data context required

Hi @andi611 ,

I loved the implementation! I was wondering if you could describe the data in the two input files?

Thanks a lot!
Shivam
