udayrage / pami Goto Github PK

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)

Home Page: https://udaylab.github.io/PAMI/

License: GNU General Public License v3.0

Python 10.88% Shell 0.01% Makefile 0.01% HTML 42.15% CSS 0.07% JavaScript 0.11% Batchfile 0.01% C++ 0.04% Jupyter Notebook 46.74%

pami's Issues

Bug in parallel Periodic-Frequent Pattern-growth algorithm. Periodicity information is not being stored for PFPs

Bug in RSFP-growth (PAMI.relativeFrequentPattern.basic)

TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 obj = alg.RSFPGrowth(iFile=inputFile, minSup=minimumSupportCount, minRatio=minRatioEx,sep=seperator) #initialize
2 obj.startMine() #Start the mining process

TypeError: __init__() got an unexpected keyword argument 'minSup'
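When a constructor rejects a keyword like this, one quick check is to list the parameter names it actually accepts. The sketch below uses only the standard `inspect` module; `RSFPGrowthStandIn` and its parameter `someThreshold` are hypothetical stand-ins for illustration and do not reflect the real signature of the PAMI class.

```python
import inspect


class RSFPGrowthStandIn:
    """Hypothetical stand-in; NOT the real PAMI class or its real signature."""

    def __init__(self, iFile, someThreshold, minRatio, sep="\t"):
        self.iFile = iFile
        self.someThreshold = someThreshold
        self.minRatio = minRatio
        self.sep = sep


def accepted_keywords(cls):
    """Return the parameter names a class constructor accepts (excluding self)."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


def unexpected_keywords(cls, **kwargs):
    """Return the supplied keyword names that the constructor would reject."""
    allowed = set(accepted_keywords(cls))
    return sorted(name for name in kwargs if name not in allowed)


print(accepted_keywords(RSFPGrowthStandIn))
print(unexpected_keywords(RSFPGrowthStandIn, iFile="db.txt", minSup=10, minRatio=0.5))
```

Running the same two helpers against the imported class would show whether the notebook and the installed version disagree on parameter names.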

Help

Unmentioned Constraints for denseDF2DB Class

The line linked below expects the dataframe to have a column named "tid". However, the documentation does not mention this requirement. It only states: inputDataFrame - the dataframe that needs to be converted into a database.

https://github.com/udayRage/PAMI/blob/681a7e66f1ce14a50b40278935d91e87aba676d2/PAMI/extras/DF2DB/denseDF2DB.py#L39

Furthermore, in the following line, the items are taken from the first column. Is this because it assumes that column index 0 is the timestamp? If I manually remove the timestamp in the dataframe, I will be missing one column.

https://github.com/udayRage/PAMI/blob/681a7e66f1ce14a50b40278935d91e87aba676d2/PAMI/extras/DF2DB/denseDF2DB.py#L40
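To make the question concrete, here is a minimal plain-Python sketch of a dense-to-transactional conversion that selects the identifier column by name rather than by position, so a table without a "tid" column does not lose an item column. It is not the PAMI implementation: the function name, the `tid_column` parameter, and the use of a list of dicts in place of a dataframe are all assumptions made for illustration.

```python
import operator

# Comparison operators keyed by their textual form.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def dense_rows_to_transactions(rows, condition, threshold, tid_column="tid"):
    """Keep, per row, the column names whose value satisfies the condition.

    The identifier column is excluded by name, so rows that have no such
    column keep every column as a candidate item.
    """
    compare = CONDITIONS[condition]
    transactions = []
    for row in rows:
        items = [
            column
            for column, value in row.items()
            if column != tid_column and compare(value, threshold)
        ]
        if items:  # skip rows in which no item passes the threshold
            transactions.append(items)
    return transactions


with_tid = [
    {"tid": 1, "a": 5, "b": 0},
    {"tid": 2, "a": 0, "b": 0},
    {"tid": 3, "a": 2, "b": 7},
]
without_tid = [{"a": 5, "b": 0}]

print(dense_rows_to_transactions(with_tid, ">=", 2))
print(dense_rows_to_transactions(without_tid, ">=", 2))
```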

Are there any sequential pattern mining algorithms?

Such as PrefixSpan, FreeSpan, etc.

Bug in parallelECLAT

TypeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 obj = alg.parallelECLAT(iFile=inputFile, minSup=minimumSupportCount,numWorkers=mumberWorkersCount, sep=seperator) #initialize
2 obj.startMine() #Start the mining process

TypeError: Can't instantiate abstract class parallelECLAT with abstract methods printResults, save
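This error is what Python raises when a subclass of an abstract base class leaves some abstract methods unimplemented. The sketch below reproduces it with hypothetical classes (`MinerBase`, `IncompleteMiner`, `CompleteMiner`); only the method names `save` and `printResults` are taken from the message above, and the bodies are placeholders rather than PAMI code.

```python
from abc import ABC, abstractmethod


class MinerBase(ABC):
    """Hypothetical abstract base declaring the two methods named in the error."""

    @abstractmethod
    def save(self, outFile):
        ...

    @abstractmethod
    def printResults(self):
        ...


class IncompleteMiner(MinerBase):
    """Implements neither abstract method, so it cannot be instantiated."""


class CompleteMiner(MinerBase):
    """Implements both abstract methods, so instantiation succeeds."""

    def __init__(self):
        self.patterns = {"a": 3}

    def save(self, outFile):
        with open(outFile, "w") as f:
            for pattern, support in self.patterns.items():
                f.write(f"{pattern}:{support}\n")

    def printResults(self):
        print("Total number of patterns:", len(self.patterns))


try:
    IncompleteMiner()
    instantiated = True
except TypeError as err:
    instantiated = False
    print("TypeError:", err)

miner = CompleteMiner()
miner.printResults()
```

The exact wording of the message differs between Python versions, but in each case the fix is for the concrete class to define every abstract method.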

Categorical values and data requirements for algorithms

Thanks for developing this great library! Can we use categorical data in the temporal database scenario? Looking at the example databases, can we use only numeric variables for all the algorithms?

Add #frequent-pattern-mining to the Repo

I'm looking at various repos for the purpose of frequent pattern mining. I found this repo in this article, and I think the repo can be added to this topic for more visibility.

High utility sequential pattern mining

Can you update the code for the algorithms related to high utility sequential pattern mining? Thank you very much.

Bug in printing temporalDatabaseStats.

The below program prints an item's minimum, average, and maximum periodicity in the database as 1, 1, and 1, respectively.

URL of the notebook: https://colab.research.google.com/github/UdayLab/PAMI/blob/main/notebooks/parallelFPGrowth.ipynb

#import the class file
import PAMI.extras.dbStats.temporalDatabaseStats as stats

#specify the file name
inputFile = 'Temporal_T10I4D100K.csv'

#initialize the class
obj=stats.temporalDatabaseStats(inputFile,sep='\t')

#execute the class
obj.run()

#Printing each of the database statistics
print(f'Database size : {obj.getDatabaseSize()}')
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
print(f'Database sparsity : {obj.getSparsity()}')
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
print(f'Minimum period : {obj.getMinimumPeriod()}')
print(f'Average period : {obj.getAveragePeriod()}')
print(f'Maximum period : {obj.getMaximumPeriod()}')

itemFrequencies = obj.getSortedListOfItemFrequencies()
transactionLength = obj.getTransanctionalLengthDistribution()
numberOfTransactionPerTimeStamp = obj.getNumberOfTransactionsPerTimestamp()
obj.save(itemFrequencies,'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(numberOfTransactionPerTimeStamp, 'numberOfTransaction.csv')
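For an independent check of the reported values, the periods of a single item can be computed by hand. The sketch below follows the definition commonly used in the periodic-frequent pattern literature (gaps between consecutive occurrences, plus the gap from the start of the database to the first occurrence and from the last occurrence to the final timestamp). Whether temporalDatabaseStats uses this same definition is an assumption, and `item_periods` is a hypothetical helper, not a PAMI function.

```python
def item_periods(timestamps, last_timestamp, first_timestamp=0):
    """Return the gaps between consecutive occurrences of one item.

    The boundary gaps (database start to first occurrence, last occurrence
    to database end) are included.
    """
    points = [first_timestamp] + sorted(timestamps) + [last_timestamp]
    return [later - earlier for earlier, later in zip(points, points[1:])]


# An item seen at timestamps 1, 4, 5 and 9 in a database that ends at 10.
periods = item_periods([1, 4, 5, 9], last_timestamp=10)

print("Minimum period:", min(periods))
print("Average period:", sum(periods) / len(periods))
print("Maximum period:", max(periods))
```

If every item in a real database reports 1, 1, and 1, comparing a few items against a hand computation like this can help show whether the statistics or the input parsing is at fault.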

Bug in PAMI.extras.graph.visualizePatterns.py

from PAMI.extras.graph import visualizePatterns as fig

obj = fig.visualizePatterns('soramame_frequentPatterns.txt',10)
obj.visualize()

ValueError Traceback (most recent call last)
in <cell line: 4>()
2
3 obj = fig.visualizePatterns('soramame_frequentPatterns.txt',10)
----> 4 obj.visualize()

/usr/local/lib/python3.10/dist-packages/PAMI/extras/graph/visualizePatterns.py in visualize(self)
62 temp = points[i].split()
63 if i % 2 == 0:
---> 64 lat.append(float(temp[0]))
65 name.append(freq)
66 color.append("#" + RHex + GHex + BHex)

ValueError: could not convert string to float: 'oint(130.7998865'
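The fragment 'oint(130.7998865' suggests that the pattern file stores items as text of the form Point(x y) and that slicing and splitting on whitespace leaves part of the wrapper attached to the number. As a sketch of a more tolerant approach, the coordinates can be extracted with a regular expression. The input format shown here is an assumption inferred from the error message, and `parse_points` is a hypothetical helper, not part of PAMI.

```python
import re

# Matches "Point(<number> <number>)" with optional spaces, case-insensitively.
_POINT_RE = re.compile(
    r"point\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)",
    re.IGNORECASE,
)


def parse_points(text):
    """Return (first, second) float pairs for every Point(x y) found in text.

    The pairs keep the order in which the numbers are written; deciding which
    one is latitude and which is longitude is left to the caller.
    """
    return [(float(a), float(b)) for a, b in _POINT_RE.findall(text)]


line = "Point(130.7998865 33.5) Point(131.0 34.25):12"
print(parse_points(line))
```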

PAMI/extras/DF2DB/denseDF2DB.py

Dear Sir,
I am encountering an indentation error when importing this code. Specifically, there is an indentation issue in the block of code for createTransactional, after the else statement. I kindly request your assistance in resolving this matter promptly.
Thank you for your attention to this matter.
Sincerely,
Ashutosh Kumar

def createTransactional(self, outputFile):
    """
    :Description: Create transactional database

    :param outputFile: str :
         Write transactional database into outputFile

    """

    self.outputFile = outputFile
    with open(outputFile, 'w') as f:
         if self.condition not in condition_operator:
            print('Condition error')
         else:
            for tid in self.tids:
                transaction = [item for item in self.items if condition_operator[self.condition](self.inputDF.at[tid, item], self.thresholdValue)]
                if len(transaction) > 1:
                    f.write(f'{transaction[0]}')
                    for item in transaction[1:]:
                        f.write(f'\t{item}')
                elif len(transaction) == 1:
                    f.write(f'{transaction[0]}')
                else:
                    continue
                f.write('\n')

Bug in FTApriori (continuous printing of transactions) - fault-tolerant frequent patterns

Check the notebook, FTApriori.ipynb

Bug in maxFPgrowth algorithm

TypeError Traceback (most recent call last)
Cell In[8], line 2
1 obj = alg.MaxFPGrowth(iFile=inputFile, minSup=minimumSupportCount, sep=seperator) #initialize
----> 2 obj.startMine() #Start the mining process

File ~/Library/CloudStorage/Dropbox/Github/PAMI_new/PAMI/frequentPattern/maximal/MaxFPGrowth.py:661, in MaxFPGrowth.startMine(self)
659 self._finalPatterns = {}
660 self._maximalTree = _MPTree()
--> 661 Tree = self._buildTree(updatedTransactions, info, self._maximalTree)
662 Tree.generatePatterns([], patterns)
663 for x, y in patterns.items():

TypeError: _buildTree() takes 2 positional arguments but 3 were given

Unable to run the fuzzy periodic frequent pattern (FPFP) algorithm

Hi, thank you for developing such a wonderful open-source library for Pattern Mining.
I am using the FPFP algorithm and face some problems:

The data format from the documentation (https://udayrage.github.io/PAMI/fuzzyPeriodicFrequentPatternMining.html) does not work.
In particular, each row (transaction) in the example on the website has only one colon (:), separating the items from the fuzzy values. However, with this format the algorithm returns an error, from which I infer that an additional colon (:) is needed.
....
I have also read your paper (*) and fuzzified the values of a transactional database as described there, but the result does not seem to match what your implementation expects (as far as I can tell from inspecting the code).
I have also visited your website to look for an example of a fuzzy database (https://u-aizu.ac.jp/~udayrage/datasets.html). However, nothing there helped.

Can you please provide the correct data format for the FPFP algorithm, as well as an explanation of how to create that format, with a simple example?

Thanks in advance

(*) Kiran, R. Uday, et al. "Discovering fuzzy periodic-frequent patterns in quantitative temporal databases." 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2020.

Bug in parallelApriori

TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 obj = alg.parallelApriori(iFile=inputFile, minSup=minimumSupportCount,numWorkers=mumberWorkersCount, sep=seperator) #initialize
2 obj.startMine() #Start the mining process

TypeError: Can't instantiate abstract class parallelApriori with abstract methods printResults, save

Bug in createTemporal()

When we convert a data frame into a temporal database, the first column of the constructed database, i.e., the timestamp, contains 0.

However, the timestamp of the first transaction must always be greater than or equal to 1; it can never be zero.

So we have to change the code so that the timestamps in the temporal database start from 1 (and not from 0).

Check Step 6 in https://github.com/UdayLab/PAMI/blob/main/notebooks/periodicFrequentPatternMiningPollutionDemo.ipynb

           !head -5 temporalDatabasePM25HeavyPollution.csv
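A minimal sketch of the requested behaviour, assuming each row of the data frame becomes one transaction and the row position supplies the timestamp. `to_temporal_lines` is a hypothetical helper written for illustration; it is not the createTemporal() implementation.

```python
def to_temporal_lines(transactions, sep="\t", first_timestamp=1):
    """Prefix each non-empty transaction with a timestamp counted from 1.

    Empty transactions are skipped, but they still consume a timestamp, so
    the remaining rows keep their original position in time.
    """
    lines = []
    for timestamp, items in enumerate(transactions, start=first_timestamp):
        if items:
            lines.append(sep.join([str(timestamp), *items]))
    return lines


rows = [["a", "b"], [], ["c"]]
for line in to_temporal_lines(rows):
    print(line)
```

If the timestamps come from a zero-based dataframe index instead, adding 1 to each index value before writing would have the same effect.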

Coverage patterns, need to check the following code

if __name__ == "__main__":
    _ap = str()
    if len(_ab._sys.argv) == 7 or len(_ab._sys.argv) == 6:
        if len(_ab._sys.argv) == 7:
            _ap = CMine(_ab._sys.argv[1], _ab._sys.argv[3], _ab._sys.argv[4], _ab._sys.argv[5], _ab._sys.argv[6])
        if len(_ab._sys.argv) == 6:
            _ap = CMine(_ab._sys.argv[1], _ab._sys.argv[3], _ab._sys.argv[4], _ab._sys.argv[5])
        _ap.startMine()
        print("Total number of coverage Patterns:", len(_ap.getPatterns()))
        _ap.save(_ab._sys.argv[2])
        print("Total Memory in USS:", _ap.getMemoryUSS())
        print("Total Memory in RSS", _ap.getMemoryRSS())
        print("Total ExecutionTime in ms:", _ap.getRuntime())
    else:
        print("Error! The number of input parameters do not match the total number of parameters provided")

Need to combine the algorithms in frequentSpatialPattern and geoReferencedFrequentPattern

In the frequentSpatialPattern sub-package, we have a basic folder with algorithms such as FSP-growth.

In the geoReferencedFrequentPattern sub-package, we have one algorithm, GFP-growth.

We need to check whether FSP-growth and GFP-growth are different algorithms or the same one.
We then have to remove the frequentSpatialPattern sub-package and move its algorithms to geoReferencedFrequentPattern.

Questions on how to use it

Hello, I am a researcher who recently ran into a problem that requires a sequential pattern mining algorithm, and this package looks like a perfect fit. However, I am still having trouble using it because there is very little information and documentation on the project. I don't know how to do the visualization or how to switch between algorithms. It would be great to have more manuals, tutorials, etc.

Bug in generating statistics of the temporal database

from PAMI.extras.dbStats import temporalDatabaseStats as tempDS
obj = tempDS.temporalDatabaseStats('temporalDatabasePM25HeavyPollution.csv',sep=',')
obj.run()
obj.printStats()
obj.plotGraphs()  # <--- error

TypeError Traceback (most recent call last)
in <cell line: 5>()
3 obj.run()
4 obj.printStats()
----> 5 obj.plotGraphs()

1 frames
/usr/local/lib/python3.10/dist-packages/PAMI/extras/graph/plotLineGraphFromDictionary.py in __init__(self, data, end, start, title, xlabel, ylabel)
31 """
32 end = int(len(data) * end / 100)
---> 33 start = int(len(data) * start / 100)
34 x = tuple(data.keys())[start:end]
35 y = tuple(data.values())[start:end]

TypeError: unsupported operand type(s) for /: 'str' and 'int'
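The traceback shows that `start` arrived as a string, so `len(data) * start` produced a repeated string that was then divided by 100. One defensive option is to coerce and validate the percentages before slicing. The sketch below mirrors the slicing logic from the traceback in a hypothetical standalone function, `slice_by_percent`; it is not the PAMI code, and it does not address why the caller passed a string in the first place.

```python
def slice_by_percent(data, end=100, start=0):
    """Return the keys and values of data between two percentage bounds.

    Numeric strings such as "50" are accepted. A non-numeric string makes
    float() raise a ValueError that points directly at the bad argument.
    """
    end, start = float(end), float(start)
    if not 0 <= start <= end <= 100:
        raise ValueError(f"expected 0 <= start <= end <= 100, got {start} and {end}")
    size = len(data)
    lo, hi = int(size * start / 100), int(size * end / 100)
    keys = tuple(data.keys())[lo:hi]
    values = tuple(data.values())[lo:hi]
    return keys, values


frequencies = {"a": 1, "b": 2, "c": 3, "d": 4}
print(slice_by_percent(frequencies, end="50", start="0"))
```

If the string turns out to be something like a plot title that landed in the `start` position, the real fix is in the call that builds the graph, for example by passing the arguments by keyword.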

Error on converting a sparse dataframe into a transactional database

When trying to convert a sparse dataframe into a transactional database using the code provided on link, the following error appears: "AttributeError: module 'PAMI.extras.DF2DB.sparseDF2DB' has no attribute 'sparse2DB'."

First, I simply changed sparse2DB to sparseDF2DB, but then a different error appeared: "ValueError: DataFrame constructor not properly called!"
My dataframe had already been loaded in the Jupyter notebook when I passed it to the function. I also tried saving it as an Excel file and passing that file directly to the function, but the error persisted.

Can you please help?

Thanks in advance.
