csaiprashant / frequent_itemset_mining

Frequent Itemset Mining Using the Apriori Algorithm


Mining a Transaction Database For k+ Frequent Itemsets

Introduction

Frequent pattern mining is a heavily researched area of data mining with a wide range of applications. Mining frequent patterns from large-scale databases has become an important problem in the data mining and knowledge discovery community, and a number of algorithms have been proposed to solve it. The Apriori algorithm is one of the best known of these.

Keywords

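  • Support: The fraction of transactions that contain a given itemset. In this project it is reported as a support count, i.e., the number of transactions that contain the itemset.
  • Confidence: The conditional probability that a transaction containing X also contains Y.
  • Frequent Itemset: An itemset whose support is at least the minimum support (min_sup).
  • Apriori Algorithm: An algorithm built on the Downward Closure Property: every subset of a frequent itemset is also frequent. As a result, an itemset with any infrequent subset cannot be frequent and can be skipped as a candidate.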
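The support and confidence definitions above can be illustrated with a minimal sketch. The toy transactions and the function names (support_count, support, confidence) are made up for this example and are not taken from pattern_mining.py.

```python
# Toy transactions; the item names are invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimate P(Y | X) as count(X and Y together) / count(X)."""
    x_count = support_count(x, transactions)
    if x_count == 0:
        return 0.0  # X never occurs, so the ratio is undefined; report 0
    return support_count(set(x) | set(y), transactions) / x_count


print(support_count({"a", "b"}, transactions))  # 3
print(confidence({"a"}, {"b"}, transactions))   # 0.75
```

Here {a, b} appears in 3 of the 5 transactions, and 3 of the 4 transactions containing a also contain b.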

Problem Statement

The transaction database (transactionDB.txt) is built from product reviews on Amazon.com. The items are reviewer ids. Each transaction corresponds to one product and is the set of all reviewer ids that posted a review of that product. Each line in the transaction database represents one transaction, and within a line the items (reviewer ids) are separated by a space character. The task is to find the frequent patterns in this dataset that meet a minimum support specified by the user.
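A minimal sketch of loading such a file, assuming one transaction per line and whitespace-separated ids. The function names parse_transactions and read_transactions are hypothetical and may differ from what pattern_mining.py does.

```python
def parse_transactions(lines):
    """Turn an iterable of text lines into a list of frozensets of item ids."""
    transactions = []
    for line in lines:
        items = line.split()  # splits on any whitespace, including single spaces
        if items:             # skip blank lines
            transactions.append(frozenset(items))
    return transactions


def read_transactions(path):
    """Read a transaction database file (one transaction per line)."""
    with open(path, encoding="utf-8") as handle:
        return parse_transactions(handle)
```

Storing each transaction as a frozenset makes the later subset checks (is this candidate contained in this transaction?) a single set comparison.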

Command to run pattern_mining.py

python pattern_mining.py <min_sup> <k> <input_file> <output_file>
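For illustration, the four positional arguments could be parsed with argparse as sketched below. This is an assumption about how such a script might be structured, not the actual code in pattern_mining.py. In particular, treating min_sup and k as integers is inferred from the sample input, and parse_args is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse <min_sup> <k> <input_file> <output_file> from the command line."""
    parser = argparse.ArgumentParser(
        description="Mine frequent itemsets with at least k items."
    )
    # Assumed to be an absolute count, as in the sample (min_sup = 4).
    parser.add_argument("min_sup", type=int, help="minimum support count")
    parser.add_argument("k", type=int, help="minimum itemset size")
    parser.add_argument("input_file", help="path to the transaction database")
    parser.add_argument("output_file", help="path for the mined itemsets")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.min_sup, args.k, args.input_file, args.output_file)
```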

Sample input and output

min_sup = 4, k = 3. This would yield all itemsets that appear at least 4 times and contain at least 3 elements. Some results of this query would include the following itemsets:

A37787I8C184FW AWE8HU0AZKASV A3UIATN5XW74NQ (4) 
A3Y9BX5AS769T AWE8HU0AZKASV A3UIATN5XW74NQ (5) 
AZ7I5GAJZA3JO A28R83ADQPMF2X A2GKW94L6HRND7 A2IE7YPWUYZAXS (4)

The first itemset is a frequent 3-itemset with a support count of 4 (i.e., it appears 4 times in the data, which satisfies min_sup = 4, so it is frequent). The second itemset is a frequent 3-itemset with a support count of 5 (i.e., it appears 5 times in the transaction database). The last itemset is a frequent 4-itemset with a support count of 4.
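The level-wise search that produces results like these can be sketched as follows. This is a minimal textbook-style Apriori written for illustration, not the implementation in pattern_mining.py. The names apriori, frequent_at_least_k and format_itemset are hypothetical, and sorting the ids within each output line is a choice made here for readability.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)

    size = 2
    while current:
        # Join step: merge pairs of frequent (size-1)-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != size:
                continue
            # Prune step (downward closure): every (size-1)-subset
            # of a candidate must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, size - 1)):
                candidates.add(union)

        # Count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(current)
        size += 1
    return frequent


def frequent_at_least_k(transactions, min_sup, k):
    """Keep only frequent itemsets with at least k items."""
    found = apriori(transactions, min_sup)
    return {s: c for s, c in found.items() if len(s) >= k}


def format_itemset(itemset, count):
    """Render an itemset as space-separated ids followed by '(count)'."""
    return " ".join(sorted(itemset)) + " (" + str(count) + ")"
```

Calling frequent_at_least_k(transactions, 4, 3) on the parsed database would correspond to the sample query above. The sketch counts candidates with a plain nested loop, which is easy to follow but slow on large inputs.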

Files in this Repository

  • pattern_mining.py - Python 3.6 script which contains my implementation of the Apriori algorithm.
  • README.md
  • transactionDB - The transaction database of Amazon.com product reviews.
