MySQL Apriori

This is an implementation of the Apriori algorithm in MySQL, initially developed as part of a database project for the Databases exam at the University of Pisa.

Apriori is an algorithm for frequent ItemSet mining and association rule learning. It discovers the frequent ItemSets in a transactional database and generates association rules from them.

Algorithm Steps

The Apriori algorithm implemented in this code follows these steps:

1. Extract the names of the items; these are the 1-ItemSets.
2. Calculate the support for each 1-ItemSet and insert the frequent 1-ItemSets into the Large_ItemSet_1 table.
3. For each ItemSet size k from 2 to the maximum ItemSet size:
- Generate the table C containing the candidate k-ItemSets.
- Prune the candidates by calculating their support, and insert the frequent ones into a new table Large_ItemSet_k.
- If the Large_ItemSet_k table is empty, stop iterating and go to the next step.
4. Calculate the confidence of each association rule based on the frequent ItemSets in the last Large_ItemSet_k table.
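
The steps above can be sketched outside the database. The following is a minimal Python illustration, not the stored procedure shipped in this repository: the function names are hypothetical, the threshold is assumed to be inclusive (support >= threshold), and candidates are assumed to be built by joining pairs of frequent (k-1)-ItemSets, which is the textbook choice and may differ from how table C is actually generated.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support, max_size):
    """Return {k: {itemset: support}}, stopping at the first empty level."""
    items = sorted({item for t in transactions for item in t})
    # Steps 1-2: every item is a 1-ItemSet; keep the frequent ones.
    level = {}
    for item in items:
        s = support(frozenset([item]), transactions)
        if s >= min_support:  # assumption: the threshold is inclusive
            level[frozenset([item])] = s
    large = {1: level}
    # Step 3: grow the ItemSets one item at a time.
    for k in range(2, max_size + 1):
        prev = list(large[k - 1])
        # Candidates: unions of two frequent (k-1)-ItemSets that have size k.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune by support, as in the step list above.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        if not level:
            break
        large[k] = level
    return large


def rules_from_last_level(large, transactions):
    """Step 4: confidence of every rule X -> Y with X | Y in the last level."""
    rules = []
    for itemset, s in large[max(large)].items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                # confidence(X -> Y) = support(X | Y) / support(X)
                rules.append((x, itemset - x, s / support(x, transactions)))
    return rules


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
large = frequent_itemsets(baskets, 0.5, 3)
for x, y, conf in rules_from_last_level(large, baskets):
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

With these four toy baskets no 3-ItemSet reaches the threshold, so the loop stops after k=2 and the rules are derived from the frequent pairs.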

To learn more about the algorithm:

Transaction table structure

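The transaction table must have an ID column followed by one column per item, with one row per transaction. A value of 1 marks an item that appears in the transaction and 0 one that does not: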

ID	Item_1_name	Item_2_name	...	Item_n_name
1	1	1	...	0
2	0	1	...	1
3	1	1	...	0

In the repository, the file Groceries_Dataset.sql contains the Groceries Dataset. The procedure contained in the file CreateTransactionTable.sql allows you to generate the transaction table using the table containing the Groceries Dataset.
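
To make the target layout concrete, here is a small Python sketch of the kind of pivot such a procedure has to perform. It is only an illustration: the input shape (a list of (transaction ID, item name) pairs) and the function name are assumptions, and the actual columns of the Groceries Dataset table and the logic of CreateTransactionTable may differ.

```python
def to_transaction_rows(purchases):
    """Pivot (transaction_id, item) pairs into one 0/1 row per transaction."""
    # One column per distinct item, in a stable order.
    items = sorted({item for _, item in purchases})
    # Group the items bought in each transaction.
    baskets = {}
    for tid, item in purchases:
        baskets.setdefault(tid, set()).add(item)
    header = ["ID"] + items
    rows = [[tid] + [1 if item in basket else 0 for item in items]
            for tid, basket in sorted(baskets.items())]
    return header, rows


# Hypothetical input: one pair per purchased item.
purchases = [(1, "bread"), (1, "milk"), (2, "milk"),
             (2, "butter"), (3, "bread"), (3, "milk")]
header, rows = to_transaction_rows(purchases)
print(header)
for row in rows:
    print(row)
```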

Getting Started

Clone the repository:

git clone https://github.com/sirius-0/apriori-mysql.git

Connect to your MySQL server using a client.
Create a new database in which to run the Apriori algorithm.
Import Groceries_Dataset.sql.
Import CreateTransactionTable.sql.
Import Apriori.sql.
Create the transaction table T by running the CreateTransactionTable procedure.

Usage

To run the Apriori algorithm, use the following syntax:

CALL Apriori(transactionTableName, supportThreshold, ItemSetSize);

transactionTableName: The name of the table containing the transaction data. The table should have one column for each item and a row for each transaction.
supportThreshold: The minimum support threshold for an ItemSet to be considered frequent. It should be a number between 0 and 1.
ItemSetSize: The maximum size of the ItemSets to be generated.

Example:

CALL Apriori('T', 0.5, 3);

This runs the Apriori algorithm on the transaction table T with a support threshold of 0.5 and generates ItemSets of up to 3 items.

Final notes

This implementation is not optimized and is extremely slow.

Indexes on the transaction table and on the Large_ItemSet_k tables could speed up candidate generation and the support calculation, but they come with two problems:

- InnoDB supports at most 64 secondary indexes per table, which might not be enough when the number of items is high.
- Indexes could be added and dropped dynamically while the Apriori procedure runs, but repeatedly altering the schema is costly and might hurt performance more than the indexes improve it.

To solve the indexing problem, one could switch to a different representation of the transaction table, such as the Compressed Sparse Row representation.
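
As a rough illustration of that idea, the Python sketch below converts 0/1 transaction rows into Compressed Sparse Row arrays and computes a support value from them. The function names are hypothetical and nothing like this exists in the repository; in a relational setting the analogous layout would presumably be a narrow table of (transaction, item) pairs, which needs a fixed number of indexes regardless of how many items there are.

```python
def to_csr(rows):
    """Convert 0/1 rows into CSR arrays.

    `indices` holds the column numbers of the 1s, row after row;
    `indptr[r]:indptr[r + 1]` is the slice of `indices` for row r.
    """
    indptr, indices = [0], []
    for row in rows:
        indices.extend(col for col, value in enumerate(row) if value)
        indptr.append(len(indices))
    return indptr, indices


def csr_support(indptr, indices, itemset_cols):
    """Fraction of rows whose stored columns include all of `itemset_cols`."""
    wanted = set(itemset_cols)
    n_rows = len(indptr) - 1
    hits = sum(1 for r in range(n_rows)
               if wanted <= set(indices[indptr[r]:indptr[r + 1]]))
    return hits / n_rows


# Item columns only (no ID column), three transactions.
rows = [[1, 1, 0],
        [0, 1, 1],
        [1, 1, 0]]
indptr, indices = to_csr(rows)
print(indptr, indices)
print(csr_support(indptr, indices, [0, 1]))
```

Only the positions of the 1s are stored, so the size of the representation grows with the number of purchased items rather than with the number of item columns.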
