jirifilip / pyarc

An implementation of the CBA (Classification Based on Associations) algorithm

Python 0.13% Jupyter Notebook 99.23% TeX 0.63% R 0.01%

pyarc's Introduction

pyARC

pyARC is an implementation of the CBA (Classification Based on Associations) algorithm introduced in

Liu, B., Hsu, W. and Ma, Y. (1998). Integrating Classification and Association Rule Mining. Proceedings KDD-98, New York, 27-31 August. AAAI. pp. 80-86.

In addition, pyARC contains an implementation of the QCBA (Quantitative CBA) algorithm introduced in

KLIEGR, Tomas. Quantitative CBA: Small and Comprehensible Association Rule Classification Models. arXiv preprint arXiv:1711.10166, 2017.

The use of the QCBA algorithm is demonstrated in this Jupyter notebook.

The fim package is used for the rule generation step.
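
As a rough illustration of what a CBA-style classifier does at prediction time, the sketch below orders class association rules and classifies a row with the first rule whose antecedent matches, falling back to a default class. The `Rule` tuple, the tie-break on antecedent length and both helper functions are assumptions made for this example; it is not pyARC's code. The two rules are taken from the output shown in the support issue further down.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    antecedent: dict   # attribute -> required value (hypothetical representation)
    label: str         # predicted class
    support: float
    confidence: float

def sort_rules(rules):
    # Higher confidence first, then higher support, then shorter antecedent
    # (the length tie-break is an assumption of this sketch).
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))

def classify(row, ordered_rules, default_class):
    # The first rule whose antecedent is satisfied decides the class.
    for rule in ordered_rules:
        if all(row.get(attr) == value for attr, value in rule.antecedent.items()):
            return rule.label
    return default_class

rules = [
    Rule({"A": 0}, "1", 0.43, 0.75),
    Rule({"A": 0, "B": 1}, "1", 0.14, 1.00),
]
ordered = sort_rules(rules)
print(classify({"A": 0, "B": 1}, ordered, default_class="0"))  # matched by the conf 1.00 rule
print(classify({"A": 1, "B": 1}, ordered, default_class="0"))  # no rule matches, default class
```

Because prediction is driven by a single rule, the order of the list matters as much as its content.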

If you find this package useful in your research, please cite (EasyChair link):

 @techreport{filip2018classification,
  title={Classification based on Associations (CBA)-a performance analysis},
  author={Filip, Ji{\v{r}}{\'\i} and Kliegr, Tom{\'a}{\v{s}}},
  year={2018},
  institution={EasyChair}
}

Installation

pip install pyarc

To use pyARC, the fim package needs to be installed (see http://www.borgelt.net/pyfim.html for an installation guide).

Testing

python -m unittest discover -s pyarc/test  -p '*test_*.py'

Examples

Simplest example

from pyarc import CBA, TransactionDB
import pandas as pd

data_train = pd.read_csv("iris.csv")
data_test = pd.read_csv("iris.csv")

txns_train = TransactionDB.from_DataFrame(data_train)
txns_test = TransactionDB.from_DataFrame(data_test)


cba = CBA(support=0.20, confidence=0.5, algorithm="m1")
cba.fit(txns_train)

accuracy = cba.rule_model_accuracy(txns_test)

Using the top_rules function to mine the best rules possible

from pyarc import TransactionDB
from pyarc.algorithms import (
    top_rules,
    createCARs,
    M1Algorithm,
    M2Algorithm
)
import pandas as pd


data_train = pd.read_csv("iris.csv")
data_test = pd.read_csv("iris.csv")

txns_train = TransactionDB.from_DataFrame(data_train)
txns_test = TransactionDB.from_DataFrame(data_test)

# get the best association rules
rules = top_rules(txns_train.string_representation)

# convert them to class association rules
cars = createCARs(rules)

classifier = M1Algorithm(cars, txns_train).build()
# classifier = M2Algorithm(cars, txns_train).build()

accuracy = classifier.test_transactions(txns_test)

pyarc's Issues

init file contains incorrect import

from .cba import CBA

should probably be

 from . import CBA

 from .CBA import CBA

(case sensitive file names on Linux)

How to get the best predictions (best rules) when the number of rules is limited to the target_rule_count value

Hello,
as stated in "Associative Classification in R: arc, arulesCBA, and rCBA"
by Michael Hahsler, Ian Johnson, Tomáš Kliegr and Jaroslav Kuchař
https://journal.r-project.org/archive/2019/RJ-2019-048/RJ-2019-048.pdf

'''
Automatic threshold tuning. Association rule learning is notorious for how difficult it is to set the
minimum support and minimum confidence thresholds. The necessity to set these thresholds applies
also to CBA. The arc package contains an optional procedure for automatic setting of these thresholds
detailed in (Kliegr and Kuchar, 2019) . The package contains a wrapper for the apriori function from
the arules package that iterative changes mining parameters (maximum antecedent length, minimum
support threshold and minimum confidence threshold) until a desired number of rules is obtained, all
options are exhausted or a preset time limit is reached. The desired number of rules can be specified
by the target_rule_count parameter.

'''
Does the Python code support this ("The desired number of rules can be specified by the target_rule_count parameter")?
I am trying to get the best predictions / best rules when the number of rules is limited to the target_rule_count value.

Can CBA be used for this, or top_rules (described as a "function for finding the best n (target_rule_count) rules from transaction list")?
I set target_rule_count = 2 but still get many rules,
as in this example:

'''
from pyarc import CBA
from pyarc.data_structures import (
TransactionDB
)

from pyarc.algorithms import (
top_rules,
createCARs,
M1Algorithm,
generateCARs,
)
from sklearn.metrics import confusion_matrix

header1 = ['F1' ,'F2','F3','F4' ,'F5','F6', "F7", 'F8' ,'Target']

rows1 = [
[1, 1, 0, 0, 0, 0, 1, 0,1],
[1, 1, 0, 0, 0, 0, 1, 1,0],
[1, 1, 0, 0, 0, 0, 1, 1,0],
[0, 1, 1, 0, 0, 0, 1, 1,0],
[1, 1, 1, 0, 0, 0, 1, 0,1],
[1, 1, 1, 0, 0, 0, 1, 1,0],
[0, 1, 1, 0, 0, 0, 1, 1,1],
[1, 1, 1, 0, 0, 0, 1, 1,0],
[1, 1, 0, 1, 0, 1, 1, 0,0],
[1, 0, 0, 1, 1, 1, 1, 1,0],
[1, 0, 1, 1, 1, 1, 1, 0,1],
[1, 1, 0, 1, 1, 1, 1, 1,0],
[0, 1, 1, 1, 1, 1, 1, 1,0],
[1, 0, 1, 1, 0, 1, 1, 1,0],
[1, 0, 0, 1, 0, 1, 1, 1,0],
[0, 0, 0, 1, 0, 1, 1, 1,0],
[1, 0, 0, 1, 0, 1, 1, 1,0],
[0, 1, 0, 0, 0, 0, 1, 1,0],
[0, 1, 0, 0, 1, 0, 0, 0,1],
[1, 0, 0, 1, 1, 1, 0, 0,1],
[0, 1, 0, 1, 1, 1, 0, 0,1],
[0, 1, 0, 1, 0, 1, 0, 0,1],
]

target = [x[-1] for x in rows1]
transactions = TransactionDB(rows1, header1)

#cba = CBA(confidence= 0.7)
cba = CBA(support=0.20, confidence=0.8, algorithm="m1" , maxlen = 6) #good
#cba = CBA(support=0.60, confidence=0.8, algorithm="m1" , maxlen = 3)

cba.fit(transactions)
print(cba.predict(transactions) )
print('rules')
[print(x) for x in cba.clf.rules]
cba.clf.rules
print('cba.clf.default_class')
print(cba.clf.default_class)
cba.clf.default_class_attribute
cba.clf.default_class_support
print('default_class_confidence' , cba.clf.default_class_confidence)
#print('\n *** predict_matched_rules ***')
#[print(x) for x in cba.predict_matched_rules(transactions) ]

print('\n predict_probability')
print([int(x * 100) for x in cba.predict_probability(transactions) ])

cars = generateCARs(transactions , maxlen= 5, support= 20 , confidence = 30 )
cars = generateCARs(transactions , maxlen= 2, support= 20 , confidence = 30 )
cars = generateCARs(transactions , maxlen= 1, support= 20 , confidence = 30 )
cars = generateCARs(transactions , maxlen= 1, support= 20 , confidence = 30 )

rules = top_rules(transactions.string_representation ,
init_conf = 0.4, conf_step = 0.1 ,
init_support = 20, supp_step = 5 ,
minlen = 2, init_maxlen = 4, target_rule_count = 2 , total_timeout=10000. )
'''
len(rules)
1085
'''

cars = createCARs(rules)

'''
len(cars)
1085
'''
print('createCARs : number of Class Association Rules after optimization = ' , len(cars))
ARC_classifier = M1Algorithm(cars, transactions).build()

train_ARC_accuracy = ARC_classifier.test_transactions(transactions)
#m1clf = classifier.build()
train_predicted_ARC = ARC_classifier.predict_all(transactions)
train_predicted_ARC = [int(x) for x in train_predicted_ARC]
train_ARC_classifier_predict_probability = ARC_classifier.predict_probability_all(transactions)
print( 'TRAIN confusion_matrix for ARC)' )
print(confusion_matrix(target, train_predicted_ARC))

'''
example
rules[22]
('Target:=:0', ('F5:=:0', 'F2:=:1', 'F7:=:1'), 0.3181818181818182, 0.7)
rules[23]
('F2:=:1', ('Target:=:0', 'F5:=:0'), 0.3181818181818182, 0.6363636363636364)
'''

'''
def top_rules(transactions,
appearance={},
target_rule_count=1000,
init_support=0.,
init_conf=0.5,
conf_step=0.05,
supp_step=0.05,
minlen=2,
init_maxlen=3,
total_timeout=100.,
max_iterations=30):
"""Function for finding the best n (target_rule_count)
rules from transaction list
< or > ??

    if (rule_count >= target_rule_count):
        flag = False
        print("Target rule count satisfied:", target_rule_count)

'''
q=0

'''

Thank you very much in advance
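
The check quoted above (`rule_count >= target_rule_count`) stops the search as soon as the number of mined rules reaches the target, which would explain the 1085 rules: if the first mining pass already returns more rules than requested, the loop ends without trimming them. A toy sketch of that control flow follows. `mine` is a hypothetical stand-in for the real miner, the candidate rules are made up, and the final truncation in `best_n` is a suggested workaround, not something pyARC is known to do.

```python
def mine(candidates, min_conf):
    """Hypothetical miner: keep (name, support, confidence) tuples above min_conf."""
    return [c for c in candidates if c[2] >= min_conf]

def tune(candidates, target_rule_count, init_conf=0.5, conf_step=0.05, max_iterations=30):
    conf = init_conf
    rules = []
    for _ in range(max_iterations):
        rules = mine(candidates, conf)
        if len(rules) >= target_rule_count:
            break            # target satisfied; len(rules) may be far above the target
        conf -= conf_step    # too few rules: relax the threshold and try again
    return rules

def best_n(rules, n):
    # Keep only the n rules with the highest confidence, then support.
    return sorted(rules, key=lambda r: (-r[2], -r[1]))[:n]

candidates = [("r%d" % i, 0.2, 0.5 + 0.01 * i) for i in range(40)]
rules = tune(candidates, target_rule_count=2)
print(len(rules))                          # all 40 candidates pass the first threshold
print([r[0] for r in best_n(rules, 2)])    # the two most confident ones
```

Read this way, target_rule_count acts as a lower bound that ends the search rather than an upper bound on the result.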

predict_probability is not equal to confidence?

The value returned by
cba.predict_probability(transactions)
seems to differ from the confidence of the rule used for a particular observation.
Could you share how predict_probability is calculated?

Is this the correct code: cars = generateCARs(txns_train , maxlen= 3, support= 0.1 , confidence = 0.2 )

1. Is this the correct way to set the key parameters?
2. Is it possible to set a maximum number of rules? Or are there recommendations for getting fewer rules with approximately the same confusion matrix (or F1)?
3. What should be done with unbalanced data?
4. Have you compared performance with CORELS? https://github.com/fingoldin/pycorels
5. How can accuracy be improved at the cost of recall, or vice versa?
6. Can predict_proba be used?

from pyarc import TransactionDB
from pyarc.algorithms import (
    top_rules,
    createCARs,
    M1Algorithm,
    M2Algorithm,
    generateCARs
)
import pandas as pd
import numpy as np

data_train = pd.read_csv("iris.csv")
data_test = pd.read_csv("iris.csv")

txns_train = TransactionDB.from_DataFrame(data_train)
txns_test = TransactionDB.from_DataFrame(data_test)

# get the best association rules
rules = top_rules(txns_train.string_representation)

# convert them to class association rules
#cars = createCARs(rules)
cars = generateCARs(txns_train , maxlen= 3, support= 0.1 , confidence = 0.2 )

classifier = M1Algorithm(cars, txns_train).build()
# classifier = M2Algorithm(cars, txns_train).build()

accuracy = classifier.test_transactions(txns_test)

predicted_txns_train = classifier.predict_all(txns_train)
data_train['class'].values

from sklearn.metrics import confusion_matrix
print(confusion_matrix(predicted_txns_train,data_train['class'].values ))

for each_rule in classifier.rules :
    print(each_rule)

q=0

How to find, for each row, which elementary rule matches it

Is it possible to find out which rule from the discovered rule set is responsible for classifying one particular data sample, that is, which elementary rule matches each row?
It would be very useful to add this functionality.
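
A minimal sketch of the lookup being asked for, assuming the classifier is an ordered list of (antecedent, label) pairs; that representation and the function name are assumptions made for this example, not pyARC's API. A later issue on this page calls `cba.predict_matched_rules(transactions)`, which looks like the built-in counterpart in newer versions.

```python
def first_matching_rule(row, ordered_rules):
    """Return the index of the first rule whose antecedent matches the row,
    or None when no rule matches and the default class would be used."""
    for idx, (antecedent, _label) in enumerate(ordered_rules):
        if all(row.get(attr) == value for attr, value in antecedent.items()):
            return idx
    return None

ordered_rules = [({"F3": 0, "F2": 0}, "0"), ({"F1": 0}, "0")]
rows = [
    {"F1": 1, "F2": 0, "F3": 0},
    {"F1": 0, "F2": 1, "F3": 1},
    {"F1": 1, "F2": 1, "F3": 1},
]
matches = [first_matching_rule(row, ordered_rules) for row in rows]
print(matches)  # [0, 1, None]
```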

Can you share more about the algorithms used, for example M1Algorithm and M2Algorithm?

Can you share more about the algorithms used, for example M1Algorithm and M2Algorithm?
Where can their description be found in
Liu, B. Hsu, W. and Ma, Y (1998). Integrating Classification and Association Rule Mining. Proceedings KDD-98, New York, 27-31 August. AAAI. pp 80-86.

KLIEGR, Tomas. Quantitative CBA: Small and Comprehensible Association Rule Classification Models. arXiv preprint arXiv:1711.10166, 2017.

per your code
pred = m1clf.predict_all(txns_test)

predM2 = m2clf.predict_all(txns_test)

from
https://github.com/jirifilip/pyARC/blob/master/notebooks/extensions/benchmarks/qcba_accuracy_benchmark.ipynb
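
For orientation, here is a simplified sketch of the M1 classifier-building procedure as described in Liu et al. (1998): rules are visited in precedence order, a rule is kept only if it correctly classifies at least one still-uncovered case, the cases it covers are removed, and the list is finally cut at the first rule with the lowest total error. The data representation and function names are assumptions made for this example, and this is not pyARC's M1Algorithm. M2 is described in the same paper as a more efficient variant that needs fewer passes over the data. The toy data is the six-row example from the support issue further down.

```python
from collections import Counter

def matches(antecedent, row):
    """True if every attribute == value condition of the antecedent holds for the row."""
    return all(row.get(attr) == value for attr, value in antecedent.items())

def m1_build(sorted_rules, data):
    """sorted_rules: (antecedent, label) pairs already in precedence order.
    data: (row, label) pairs. Returns (kept_rules, default_class)."""
    remaining = list(data)
    kept, defaults, total_errors = [], [], []
    rule_errors = 0
    for antecedent, label in sorted_rules:
        covered = [(row, y) for row, y in remaining if matches(antecedent, row)]
        # Keep the rule only if it classifies at least one remaining case correctly.
        if not any(y == label for _, y in covered):
            continue
        kept.append((antecedent, label))
        rule_errors += sum(1 for _, y in covered if y != label)
        remaining = [(row, y) for row, y in remaining if not matches(antecedent, row)]
        # Default class: majority class among the cases not covered so far.
        counts = Counter(y for _, y in remaining)
        if counts:
            default = counts.most_common(1)[0][0]
        elif defaults:
            default = defaults[-1]
        else:
            default = label
        defaults.append(default)
        total_errors.append(rule_errors + sum(1 for _, y in remaining if y != default))
    if not kept:
        return [], Counter(y for _, y in data).most_common(1)[0][0]
    best = total_errors.index(min(total_errors))  # first rule with the lowest total error
    return kept[:best + 1], defaults[best]

data = [
    ({"A": 1, "B": 1}, "0"), ({"A": 1, "B": 1}, "0"), ({"A": 1, "B": 1}, "1"),
    ({"A": 0, "B": 0}, "0"), ({"A": 0, "B": 0}, "1"), ({"A": 0, "B": 0}, "1"),
]
sorted_rules = [({"A": 0}, "1"), ({"A": 1}, "0")]
rules, default_class = m1_build(sorted_rules, data)
print(rules, default_class)
```

On this data the second rule does not reduce the total error, so only {A=0} => 1 is kept with default class 0.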

MacOS support?

Looks like fim doesn't have Darwin support? My installation for fim fails and I'm seeing -arch arm64 -arch x86_64 in the logs. Does pyARC plan to support MacOS?

How to get a probability of classification reliability

How can I get a probability reflecting classification reliability,
like
predict_proba(self, X)
in
LogisticRegression
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Treatment of null values

If the dataset contains null values, the output contains rules which contain attributes referring to empty values.
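
Until nulls are handled inside the library, one possible workaround is to filter the mined rules afterwards. The sketch below assumes the rule tuples have the shape shown in an earlier issue on this page, (consequent, antecedent items, support, confidence) with items like 'F5:=:0', and it assumes that a null ends up as an empty string or as text such as 'nan' in the item; that second assumption has not been checked against pyARC.

```python
# Textual forms a null value might take inside an item (assumed, not verified)
MISSING = {"", "nan", "NaN", "None"}

def item_value(item):
    # Items look like 'F5:=:0'; everything after the separator is the value.
    return item.split(":=:", 1)[1]

def drop_rules_with_missing(rules):
    """Keep only rules in which no item refers to a missing value."""
    return [
        rule for rule in rules
        if all(item_value(item) not in MISSING for item in (rule[0], *rule[1]))
    ]

rules = [
    ("Target:=:0", ("F5:=:0", "F2:=:1"), 0.32, 0.70),
    ("Target:=:0", ("F5:=:nan",), 0.10, 0.90),
]
clean = drop_rules_with_missing(rules)
print(clean)
```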

Could you clarify how support is computed?

Could you clarify how support is computed?
For example:
header1 = ["A", "B", "C" , "Y"]
rows1 = [
[1, 1, 0, 0],
[1, 1, 0, 0],
[1, 1, 0, 1],
[0, 1, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 1],
[0, 0, 1, 1]
]

transactions = TransactionDB(rows1, header1)

cba = CBA()

cba.fit(transactions)

probs1 = cba.clf.predict_probability_all(transactions)
probs2 = cba.predict_probability(transactions)

cba.clf.rules
cba.clf.default_class
cba.clf.default_class_attribute
cba.clf.default_class_support
cba.clf.default_class_confidence
cba.predict_matched_rules(transactions)

q=0
then
cba.clf.rules
[CAR {A=0,B=1} => {Y=1} sup: 0.14 conf: 1.00 len: 3, id: 15, CAR {A=0} => {Y=1} sup: 0.43 conf: 0.75 len: 2, id: 19]
4/7
0.5714285714285714
1/7
0.14285714285714285

len(rows1)
7
cba.clf.default_class_support
0.42857142857142855
cba.clf.default_class
'0'
3/7
0.42857142857142855
So instead of
sup: 0.43
I would expect
4/7
0.5714285714285714

The same happens with
header1 = ["A", "B", "Y"]
rows1 = [
[1, 1, 0],
[1, 1, 0],
[1, 1, 1],
[0, 0, 0],
[0, 0, 1],
[0, 0, 1]
]

transactions = TransactionDB(rows1, header1)

cba = CBA()

cba.fit(transactions)

cba.clf.rules
cba.clf.default_class
cba.clf.default_class_attribute
cba.clf.default_class_support
cba.clf.default_class_confidence
cba.predict_matched_rules(transactions)

q=0
cba.clf.rules
[CAR {A=0} => {Y=1} sup: 0.33 conf: 0.67 len: 2, id: 1]
cba.clf.default_class_support
0.5

Should cba.clf.default_class_support be equal to sup: 0.33,
where
sup: 0.33 is the support of the rule {A=0} => {Y=1}?
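
The numbers in the first example can be reproduced by counting rows. The sketch below uses only the seven rows given above, with `count` as a helper written for this example. It suggests that the reported sup: 0.43 counts rows matching both antecedent and consequent (3 of 7), while 4/7 is the share of rows matching the antecedent alone; it also suggests that default_class_support is simply the share of rows with Y=0. This is an interpretation of the output, not a statement about pyARC's internals.

```python
header = ["A", "B", "C", "Y"]
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

def count(conditions):
    """Number of rows satisfying every attribute == value condition."""
    col = {name: i for i, name in enumerate(header)}
    return sum(all(row[col[a]] == v for a, v in conditions.items()) for row in rows)

n = len(rows)
body = count({"A": 0})                   # rows matching the antecedent {A=0}
body_and_head = count({"A": 0, "Y": 1})  # rows matching the whole rule {A=0} => {Y=1}

print(round(body / n, 2))                # 0.57, antecedent-only support
print(round(body_and_head / n, 2))       # 0.43, support of antecedent and consequent
print(round(body_and_head / body, 2))    # 0.75, confidence
print(round(count({"Y": 0}) / n, 2))     # 0.43, share of the default class '0'
```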

switch default mining to standard support

The FIM package by default uses a different definition of support than is common in association rule learning: http://www.borgelt.net/doc/apriori/apriori.html#supprule.

The "standard" support needs to be explicitly turned on using mode="o".
The suggestion is to replace:

rules = fim.apriori(transactionDB.string_representation, supp=support, conf=confidence, target="r", report="sc", appear=appear, **kwargs, zmax=maxlen)

with

rules = fim.apriori(transactionDB.string_representation, supp=support, conf=confidence, mode="o", target="r", report="sc", appear=appear, **kwargs, zmax=maxlen)

theoretical issue

Default Class ID results in NaN

Hi,

I found that the inspect method of the classifier class assigns NaN to the ID of the default class. This seems a bit inconsistent to me, since the predict_matched_rules_all method returns the max id for this class.

Could you assign a number to the default class in the results of the inspect method?

Thanks and good work;-)
Lars

Can something be done for unbalanced data?

Can something be done for unbalanced data?
For example:

import unittest
import pandas as pd
from pyarc import CBA
from pyarc.data_structures import (
TransactionDB
)
import os

header1 = ['F1' ,'F2','F3','F4' ,'F5','Target']

rows1 = [
[1, 1, 0, 0, 0, 1],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 1],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 1],
[1, 1, 1, 0, 0, 0],
[1, 1, 0, 1, 0, 1],
[1, 0, 0, 1, 1, 0],
[1, 0, 1, 1, 1, 1],
[1, 1, 0, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 0, 1, 1, 0, 0],
[1, 0, 0, 1, 0, 0],
[1, 0, 0, 1, 0, 0],
[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
]

transactions = TransactionDB(rows1, header1)

cba = CBA()

cba.fit(transactions)
print(cba.predict(transactions) )
[print(x) for x in cba.clf.rules]
cba.clf.rules
cba.clf.default_class
cba.clf.default_class_attribute
cba.clf.default_class_support
print('default_class_confidence' , cba.clf.default_class_confidence)
print('\n *** predict_matched_rules ***')
[print(x) for x in cba.predict_matched_rules(transactions) ]

q=0

The output is that all rows are predicted as 0:
['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 40
default_class_confidence 0.7368421052631579

*** predict_matched_rules ***
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 105
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 106
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 107
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 108
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 109
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 110
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 111
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 112
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 113
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 40
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 114
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 115
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 116
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 117
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 40
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 40
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 40
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 118
CAR {} => {Target=0} sup: 0.74 conf: 0.74 len: 1, id: 119

What can be done for unbalanced data?

What can be done for unbalanced data?
Say 90% of the labels are no and 10% are yes.
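
One general option, not specific to pyARC and not something the library is known to provide, is to oversample the minority class in the training rows before building the TransactionDB. The sketch below is plain Python with a fixed seed; the function name and the assumption that the label sits in the last column are choices made for this example. Duplicating rows changes the support and confidence of the mined rules, so evaluation should be done on untouched data.

```python
import random
from collections import Counter

def oversample(rows, label_index=-1, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equally large."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_index], []).append(row)
    largest = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(largest - len(group)))
    return balanced

rows = [[1, 0, "no"]] * 9 + [[0, 1, "yes"]]
balanced = oversample(rows)
print(Counter(row[-1] for row in balanced))
```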

Default class confidence is lower than the classifier's confidence parameter

The default class confidence is lower than the value set by the classifier's confidence parameter.
What can be done to make default class predictions reliable?

In the example below default_class_confidence is 0.26, although cba = CBA(confidence= 0.7) was used.

import unittest
import pandas as pd
from pyarc import CBA
from pyarc.data_structures import (
TransactionDB
)
import os

header1 = ['F1' ,'F2','F3','F4' ,'F5','Target']

rows1 = [
[1, 1, 0, 0, 0, 1],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 1],
[1, 1, 1, 0, 0, 0],
[0, 1, 1, 0, 0, 1],
[1, 1, 1, 0, 0, 0],
[1, 1, 0, 1, 0, 1],
[1, 0, 0, 1, 1, 0],
[1, 0, 1, 1, 1, 1],
[1, 1, 0, 1, 1, 0],
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 1, 0, 0],
[1, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0],
[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
]

transactions = TransactionDB(rows1, header1)

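cba = CBA(confidence= 0.7)
# the model has to be fitted before predicting; the output below implies this was done
cba.fit(transactions)

print('\n predict_probability')
print([int(x * 100) for x in cba.predict_probability(transactions) ])
q=0

The output is then: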
['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0']
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 36
CAR {F5=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 39
CAR {F3=0,F1=0} => {Target=0} sup: 0.16 conf: 1.00 len: 3, id: 20
CAR {F5=1,F3=0} => {Target=0} sup: 0.11 conf: 1.00 len: 3, id: 5
CAR {F5=1,F2=1} => {Target=0} sup: 0.11 conf: 1.00 len: 3, id: 6
CAR {F1=0} => {Target=0} sup: 0.26 conf: 0.83 len: 2, id: 24
CAR {F3=0} => {Target=0} sup: 0.47 conf: 0.82 len: 2, id: 60
CAR {F1=1,F5=0,F3=1} => {Target=0} sup: 0.16 conf: 0.75 len: 4, id: 41
default_class_confidence 0.2631578947368421 !!!!!!!!!!!!!!!

*** predict_matched_rules ***
CAR {F3=0} => {Target=0} sup: 0.47 conf: 0.82 len: 2, id: 60
CAR {F3=0} => {Target=0} sup: 0.47 conf: 0.82 len: 2, id: 60
CAR {F3=0} => {Target=0} sup: 0.47 conf: 0.82 len: 2, id: 60
CAR {F1=0} => {Target=0} sup: 0.26 conf: 0.83 len: 2, id: 24
CAR {F1=1,F5=0,F3=1} => {Target=0} sup: 0.16 conf: 0.75 len: 4, id: 41
CAR {F1=1,F5=0,F3=1} => {Target=0} sup: 0.16 conf: 0.75 len: 4, id: 41
CAR {F1=0} => {Target=0} sup: 0.26 conf: 0.83 len: 2, id: 24
CAR {F1=1,F5=0,F3=1} => {Target=0} sup: 0.16 conf: 0.75 len: 4, id: 41
CAR {F3=0} => {Target=0} sup: 0.47 conf: 0.82 len: 2, id: 60
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 36
CAR {} => {Target=1} sup: 0.26 conf: 0.26 len: 1, id: 64
CAR {F5=1,F3=0} => {Target=0} sup: 0.11 conf: 1.00 len: 3, id: 5
CAR {F5=1,F2=1} => {Target=0} sup: 0.11 conf: 1.00 len: 3, id: 6
CAR {F5=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 39
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 36
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 36
CAR {F3=0,F2=0} => {Target=0} sup: 0.21 conf: 1.00 len: 3, id: 36
CAR {F3=0,F1=0} => {Target=0} sup: 0.16 conf: 1.00 len: 3, id: 20
CAR {F3=0,F1=0} => {Target=0} sup: 0.16 conf: 1.00 len: 3, id: 20

predict_probability
[81, 81, 81, 83, 75, 75, 83, 75, 81, 100, 26, 100, 100, 100, 100, 100, 100, 100, 100]
^ points at the 26 (the eleventh value) ???!!!!
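
Counting rows in the 19-row dataset above reproduces these numbers. The sketch below, with `count` as a helper written for this example, suggests two readings of the output, neither confirmed against the pyARC source here: predict_probability appears to return the confidence of the rule that matched each row (81 is int(9/11 * 100), shown rounded as conf: 0.82), and the default class confidence appears to be the share of the default class in the whole dataset (5 of 19 rows have Target=1), which is not mined and so is not subject to the confidence threshold.

```python
header = ["F1", "F2", "F3", "F4", "F5", "Target"]
rows = [
    [1, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1], [1, 1, 1, 0, 0, 0], [0, 1, 1, 0, 0, 1], [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 1, 0], [1, 0, 1, 1, 1, 1], [1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0], [1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0],
]

def count(conditions):
    """Number of rows satisfying every attribute == value condition."""
    col = {name: i for i, name in enumerate(header)}
    return sum(all(row[col[a]] == v for a, v in conditions.items()) for row in rows)

conf_f3 = count({"F3": 0, "Target": 0}) / count({"F3": 0})   # rule {F3=0} => {Target=0}
conf_f1 = count({"F1": 0, "Target": 0}) / count({"F1": 0})   # rule {F1=0} => {Target=0}
default_conf = count({"Target": 1}) / len(rows)              # share of the default class '1'

print(int(conf_f3 * 100), int(conf_f1 * 100), int(default_conf * 100))  # 81 83 26
print(default_conf)  # 0.2631578947368421
```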

pyFIM dependency

When the package is installed via pip and run, there is an error saying that the fim module was not found.

Since PyFIM cannot be installed via pip (https://pyfim.readthedocs.io/en/latest/source/install.html), it would make sense to explain to the user how to install the missing dependency manually.

As a side note, the installation complains about a missing LICENSE file.

after installation pyarc-1.0.25 : builtins.ModuleNotFoundError: No module named 'pyarc.cba'

I tested on several computers, but the new version 1.0.25 cannot be run.

S_mar20_test_pyarc.py", line 1, in <module>
from pyarc import CBA, TransactionDB
File "C:\Users\User\Anaconda3\Lib\site-packages\pyarc\__init__.py", line 10, in <module>
from .cba import CBA

builtins.ModuleNotFoundError: No module named 'pyarc.cba'

The installation itself completed without errors:
D:\>pip install pyarc
Collecting pyarc
Downloading https://files.pythonhosted.org/packages/7d/15/e13185f3d2b2f1b14872e1bb3f3c0072ba9b0344a10336fb9a34ba36e513/pyarc-1.0.25-py2.py3-none-any.whl
Requirement already satisfied: pandas in c:\users\user\anaconda3\lib\site-packages (from pyarc) (0.24.2)
Requirement already satisfied: numpy in c:\users\user\anaconda3\lib\site-packages (from pyarc) (1.16.2)
Collecting sklearn (from pyarc)
Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Requirement already satisfied: python-dateutil>=2.5.0 in c:\users\user\anaconda3\lib\site-packages (from pandas->pyarc) (2.8.0)
Requirement already satisfied: pytz>=2011k in c:\users\user\anaconda3\lib\site-packages (from pandas->pyarc) (2018.9)
Requirement already satisfied: scikit-learn in c:\users\user\anaconda3\lib\site-packages (from sklearn->pyarc) (0.20.3)
Requirement already satisfied: six>=1.5 in c:\users\user\anaconda3\lib\site-packages (from python-dateutil>=2.5.0->pandas->pyarc) (1.12.0)
Requirement already satisfied: scipy>=0.13.3 in c:\users\user\anaconda3\lib\site-packages (from scikit-learn->sklearn->pyarc) (1.2.1)
Building wheels for collected packages: sklearn
Building wheel for sklearn (setup.py) ... done
Stored in directory: C:\Users\User\AppData\Local\pip\Cache\wheels\76\03\bb\589d421d27431bcd2c6da284d5f2286c8e3b2ea3cf1594c074
Successfully built sklearn
Installing collected packages: sklearn, pyarc
Successfully installed pyarc-1.0.25 sklearn-0.0

Can you share a simple example of how to use .predict?

Great code, thanks.
Can you share a simple example of how to use .predict?
There is one in
https://github.com/jirifilip/pyARC/blob/master/notebooks/extensions/benchmarks/qcba_accuracy_benchmark.ipynb

but it seems to run only from a specific folder:
directory = "c:/code/python/machine_learning/assoc_rules"

pred = m1clf.predict_all(txns_test)

predM2 = m2clf.predict_all(txns_test)

Is it possible to set the maximum number of rules like in pycorels?

Is it possible to set the maximum number of rules, like in
https://github.com/fingoldin/pycorels
C = CorelsClassifier(max_card=2, c=0.0, verbosity=["loud", "samples"])

where
max_card=2
sets the max number of rules to 2?

In any case, some examples for complete beginners would be very helpful, like those on the front page of
https://github.com/fingoldin/pycorels

with predict, rule printing and a demo of the very basic parameters.

Thanks a lot in advance.

Will it work on Windows 10? And how does it compare with pycorels?

Installing fim may be a problem.
By the way, have you compared this with CORELS?
https://github.com/fingoldin/pycorels

Discretization

The original QCBA paper says that quantitative attributes need to be discretized, and the 'iris.csv' dataset used in this repository for demonstration is also discretized (inf_to_val).

Does this repository contain code for discretizing a dataset? I was unable to find any.
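
If no discretization code is available, a numeric column can be binned before building the TransactionDB. The sketch below uses equal-width binning purely for brevity; the function name, the lo_to_hi label format and the sample values are assumptions made for this example, and the QCBA paper's own pipeline may rely on a different (for example supervised) discretization method.

```python
def equal_width_bins(values, n_bins):
    """Replace each numeric value by the label of the equal-width interval it falls into."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [f"{lo:.2f}_to_{hi:.2f}" for _ in values]
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    labels = []
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # the maximum goes into the last bin
        labels.append(f"{edges[i]:.2f}_to_{edges[i + 1]:.2f}")
    return labels

sepal_length = [4.3, 5.0, 5.8, 6.4, 7.9]
labels = equal_width_bins(sepal_length, n_bins=3)
print(labels)
```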

Default example returns empty rule list

The default example on the project home page returns an empty rule list. The second advanced example works.

from pyarc import CBA, TransactionDB
import pandas as pd

data_train = pd.read_csv("iris.csv")
data_test = pd.read_csv("iris.csv")

txns_train = TransactionDB.from_DataFrame(data_train)
txns_test = TransactionDB.from_DataFrame(data_test)


cba = CBA(support=0.20, confidence=0.5, algorithm="m1")
cba.fit(txns_train)

accuracy = cba.rule_model_accuracy(txns_test)

Folders within qcba missing

Hi,

I would like to try the qcba optimization, but I am afraid that the subfolders within the qcba folder, like data_structures, are not being installed via pip. Or am I missing something?

Best,
Lars

Is there a way to check which column is used as the target?

It can be dangerous to train a classifier on the wrong column of the training data.
So
is there a way to check which column is used as the target,
for example by adding an attribute to the classifier,
alongside the existing ones:

dir(cba)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'algorithm', 'available_algorithms', 'clf', 'confidence', 'fit', 'maxlen', 'predict', 'predict_probability', 'rule_model_accuracy', 'support']
