udayrage / pami

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)

Home Page: https://udaylab.github.io/PAMI/

License: GNU General Public License v3.0

Python 10.88% Shell 0.01% Makefile 0.01% HTML 42.15% CSS 0.07% JavaScript 0.11% Batchfile 0.01% C++ 0.04% Jupyter Notebook 46.74%

pami's People

Contributors

avvari1830s, kundai-kwangwari, likhitha-palla, nakamura204, pallamadhavi, pradeepppc, raashika214, raviua138, saichitrab, saideepchennupati, shiridikumar, suzuki-zudai, tarun-sreepada, udayrage, vanithakattumuri

pami's Issues

Bug in RSFP-growth (PAMI.relativeFrequentPattern.basic)


TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 obj = alg.RSFPGrowth(iFile=inputFile, minSup=minimumSupportCount, minRatio=minRatioEx,sep=seperator) #initialize
2 obj.startMine() #Start the mining process

TypeError: __init__() got an unexpected keyword argument 'minSup'

Unmentioned Constraints for denseDF2DB Class

The line linked below expects the dataframe to have a column named "tid". However, the documentation does not mention this requirement; it only states: inputDataFrame - the dataframe that needs to be converted into a database.

https://github.com/udayRage/PAMI/blob/681a7e66f1ce14a50b40278935d91e87aba676d2/PAMI/extras/DF2DB/denseDF2DB.py#L39

Furthermore, in the following line, the items are taken from the first column. Is this because it assumes that column index 0 is the timestamp? If I manually remove the timestamp in the dataframe, I will be missing one column.

https://github.com/udayRage/PAMI/blob/681a7e66f1ce14a50b40278935d91e87aba676d2/PAMI/extras/DF2DB/denseDF2DB.py#L40

Bug in parallelECLAT


TypeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 obj = alg.parallelECLAT(iFile=inputFile, minSup=minimumSupportCount,numWorkers=mumberWorkersCount, sep=seperator) #initialize
2 obj.startMine() #Start the mining process

TypeError: Can't instantiate abstract class parallelECLAT with abstract methods printResults, save

Bug in printing temporalDatabaseStats.

The below program prints an item's minimum, average, and maximum periodicity in the database as 1, 1, and 1, respectively.

URL of the notebook: https://colab.research.google.com/github/UdayLab/PAMI/blob/main/notebooks/parallelFPGrowth.ipynb


#import the class file
import PAMI.extras.dbStats.temporalDatabaseStats as stats

#specify the file name
inputFile = 'Temporal_T10I4D100K.csv'

#initialize the class
obj=stats.temporalDatabaseStats(inputFile,sep='\t')

#execute the class
obj.run()

#Printing each of the database statistics
print(f'Database size : {obj.getDatabaseSize()}')
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
print(f'Database sparsity : {obj.getSparsity()}')
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
print(f'Minimum period : {obj.getMinimumPeriod()}')
print(f'Average period : {obj.getAveragePeriod()}')
print(f'Maximum period : {obj.getMaximumPeriod()}')

itemFrequencies = obj.getSortedListOfItemFrequencies()
transactionLength = obj.getTransanctionalLengthDistribution()
numberOfTransactionPerTimeStamp = obj.getNumberOfTransactionsPerTimestamp()
obj.save(itemFrequencies,'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(numberOfTransactionPerTimeStamp, 'numberOfTransaction.csv')

Bug in PAMI.extras.graph.visualizePatterns.py

from PAMI.extras.graph import visualizePatterns as fig

obj = fig.visualizePatterns('soramame_frequentPatterns.txt',10)
obj.visualize()


ValueError Traceback (most recent call last)
in <cell line: 4>()
2
3 obj = fig.visualizePatterns('soramame_frequentPatterns.txt',10)
----> 4 obj.visualize()

/usr/local/lib/python3.10/dist-packages/PAMI/extras/graph/visualizePatterns.py in visualize(self)
62 temp = points[i].split()
63 if i % 2 == 0:
---> 64 lat.append(float(temp[0]))
65 name.append(freq)
66 color.append("#" + RHex + GHex + BHex)

ValueError: could not convert string to float: 'oint(130.7998865'
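
The failing value 'oint(130.7998865' suggests that the items in the pattern file are written as `Point(x y)` and that a fixed-position slice leaves part of the prefix attached before the float conversion. A hedged sketch of a more tolerant parser is shown below. The helper name `parsePoint` and the assumed `Point(x y)` layout are illustrations only, and the second coordinate in the sample is made up.

```python
import re

# Assumed item layout: "Point(<number> <number>)", case-insensitive.
_POINT = re.compile(
    r"point\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)",
    re.IGNORECASE,
)


def parsePoint(text):
    """Return the two coordinates of a Point item as floats (hypothetical helper)."""
    match = _POINT.search(text)
    if match is None:
        raise ValueError(f"not a Point item: {text!r}")
    return float(match.group(1)), float(match.group(2))


# Illustrative value: the first number comes from the traceback, the second is invented.
first, second = parsePoint("Point(130.7998865 33.58)")
```

Matching the whole `Point(...)` token avoids depending on how many characters precede the first number.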

PAMI/extras/DF2DB/denseDF2DB.py

Dear Sir,
I am encountering an indentation error when importing this code. Specifically, there is an indentation issue in the block of code for createTransactional, after the else statement. I kindly request your assistance in resolving this matter promptly.
Thank you for your attention to this matter.
Sincerely,
Ashutosh Kumar

def createTransactional(self, outputFile):
    """
    :Description: Create transactional data base

    :param outputFile: str :
         Write transactional data base into outputFile

    """

    self.outputFile = outputFile
    with open(outputFile, 'w') as f:
        if self.condition not in condition_operator:
            print('Condition error')
        else:
            for tid in self.tids:
                transaction = [item for item in self.items if condition_operator[self.condition](self.inputDF.at[tid, item], self.thresholdValue)]
                if len(transaction) > 1:
                    f.write(f'{transaction[0]}')
                    for item in transaction[1:]:
                        f.write(f'\t{item}')
                elif len(transaction) == 1:
                    f.write(f'{transaction[0]}')
                else:
                    continue
                f.write('\n')
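
The method above depends on a `condition_operator` mapping that is not shown in the report. The following self-contained sketch reproduces the same filtering logic on plain dictionaries. It assumes `condition_operator` maps comparison symbols to functions from the standard `operator` module; the helper `toTransactions` and the sample rows are hypothetical.

```python
import operator

# Assumed shape of condition_operator: comparison symbol -> comparison function.
condition_operator = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def toTransactions(rows, items, condition, thresholdValue):
    """Keep, per row, the items whose value satisfies the condition (hypothetical helper)."""
    if condition not in condition_operator:
        raise ValueError(f"unknown condition: {condition!r}")
    compare = condition_operator[condition]
    lines = []
    for row in rows:
        transaction = [item for item in items if compare(row[item], thresholdValue)]
        if transaction:  # rows with no qualifying item are skipped
            lines.append("\t".join(str(item) for item in transaction))
    return lines


# Made-up dense rows: one "tid" column followed by item columns.
rows = [
    {"tid": 1, "a": 5, "b": 0, "c": 7},
    {"tid": 2, "a": 0, "b": 0, "c": 0},
    {"tid": 3, "a": 2, "b": 9, "c": 1},
]
lines = toTransactions(rows, ["a", "b", "c"], ">=", 2)
```

Joining with a tab produces the same output as writing the first item and then each remaining item prefixed by a tab, without the separate one-item and many-item branches.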

Bug in maxFPgrowth algorithm


TypeError Traceback (most recent call last)
Cell In[8], line 2
1 obj = alg.MaxFPGrowth(iFile=inputFile, minSup=minimumSupportCount, sep=seperator) #initialize
----> 2 obj.startMine() #Start the mining process

File ~/Library/CloudStorage/Dropbox/Github/PAMI_new/PAMI/frequentPattern/maximal/MaxFPGrowth.py:661, in MaxFPGrowth.startMine(self)
659 self._finalPatterns = {}
660 self._maximalTree = _MPTree()
--> 661 Tree = self._buildTree(updatedTransactions, info, self._maximalTree)
662 Tree.generatePatterns([], patterns)
663 for x, y in patterns.items():

TypeError: _buildTree() takes 2 positional arguments but 3 were given

Unable to run the fuzzy periodic frequent pattern (FPFP) algorithm

Hi, thank you for developing such a wonderful open-source library for pattern mining.
I am using the FPFP algorithm and have run into some problems:

  • The data format from the documentation (https://udayrage.github.io/PAMI/fuzzyPeriodicFrequentPatternMining.html) does not work.
    In particular, each row (transaction) in the example on the website has only one colon (:) separating the items from the fuzzy values. With this format, however, the algorithm returns an error, from which I infer that an additional colon (:) is needed.
    ....
  • I have also read your paper (*) and fuzzified the values of a transactional database as described there, but the result does not seem to match your implementation (judging from the code).
  • I have also visited your website to look for an example of a fuzzy database (https://u-aizu.ac.jp/~udayrage/datasets.html), but found nothing that helps.

Can you please provide the correct data format for the FPFP algorithm, along with an explanation of how to create it, using a simple example?

Thanks in advance

(*) Kiran, R. Uday, et al. "Discovering fuzzy periodic-frequent patterns in quantitative temporal databases." 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2020.

Bug in parallelApriori


TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 obj = alg.parallelApriori(iFile=inputFile, minSup=minimumSupportCount,numWorkers=mumberWorkersCount, sep=seperator) #initialize
2 obj.startMine() #Start the mining process

TypeError: Can't instantiate abstract class parallelApriori with abstract methods printResults, save

Bug in createTemporal()

When we convert a data frame into a temporal database, the first column of the constructed database, i.e., the timestamp, contains 0.

However, the timestamp of the first transaction must always be greater than or equal to 1 (it can never be zero).

So we have to change the code so that the timestamps in the temporal database start from 1 (and not from 0).

Check Step 6 in https://github.com/UdayLab/PAMI/blob/main/notebooks/periodicFrequentPatternMiningPollutionDemo.ipynb

           !head -5 temporalDatabasePM25HeavyPollution.csv
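
A minimal sketch of 1-based timestamp numbering is given below. The helper `toTemporalLines`, its parameters, and the sample transactions are hypothetical and do not reflect how createTemporal() is actually written; the sketch only shows that `enumerate(..., start=1)` yields timestamps beginning at 1.

```python
def toTemporalLines(transactions, sep="\t", firstTimestamp=1):
    """Prefix each non-empty transaction with its timestamp (hypothetical helper)."""
    lines = []
    # enumerate(start=1) makes the first row's timestamp 1 instead of 0.
    for timestamp, transaction in enumerate(transactions, start=firstTimestamp):
        if transaction:  # an empty row still consumes its timestamp
            fields = [str(timestamp)] + [str(item) for item in transaction]
            lines.append(sep.join(fields))
    return lines


# Made-up transactions; the second row is empty and is therefore not written.
temporalLines = toTemporalLines([["a", "b"], [], ["c"]])
```

In this sketch an empty row is skipped but still advances the timestamp, so the remaining rows keep the time at which they occurred.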

Coverage patterns, need to check the following code

if __name__ == "__main__":
    _ap = str()
    if len(_ab._sys.argv) == 7 or len(_ab._sys.argv) == 6:
        if len(_ab._sys.argv) == 7:
            _ap = CMine(_ab._sys.argv[1], _ab._sys.argv[3], _ab._sys.argv[4], _ab._sys.argv[5], _ab._sys.argv[6])
        if len(_ab._sys.argv) == 6:
            _ap = CMine(_ab._sys.argv[1], _ab._sys.argv[3], _ab._sys.argv[4], _ab._sys.argv[5])
        _ap.startMine()
        print("Total number of coverage Patterns:", len(_ap.getPatterns()))
        _ap.save(_ab._sys.argv[2])
        print("Total Memory in USS:", _ap.getMemoryUSS())
        print("Total Memory in RSS", _ap.getMemoryRSS())
        print("Total ExecutionTime in ms:", _ap.getRuntime())
    else:
        print("Error! The number of input parameters do not match the total number of parameters provided")

Need to combine the algorithms in frequentSpatialPattern and geoReferencedFrequentPattern

In the frequentSpatialPattern sub-package, we have a basic folder with algorithms, e.g., FSP-growth.

In the geoReferencedFrequentPattern sub-package, we have one algorithm, GFP-growth.

  1. We need to check whether FSP-growth and GFP-growth are different algorithms or the same algorithm.

  2. We have to remove the frequentSpatialPattern sub-package and move its algorithms to geoReferencedFrequentPattern.

Questions on how to use it

Hello, I am a researcher who recently encountered a problem that requires a sequential pattern mining algorithm, and I found this package, which looks perfect. However, I still have trouble using it because there is too little information and documentation on this project. In particular, I don't know how to do the visualization or how to switch algorithms. It would be great to have more manuals, tutorials, etc.

Bug in generating statistics of the temporal database

from PAMI.extras.dbStats import temporalDatabaseStats as tempDS
obj = tempDS.temporalDatabaseStats('temporalDatabasePM25HeavyPollution.csv',sep=',')
obj.run()
obj.printStats()
obj.plotGraphs()  # <--- error


TypeError Traceback (most recent call last)
in <cell line: 5>()
3 obj.run()
4 obj.printStats()
----> 5 obj.plotGraphs()

1 frames
/usr/local/lib/python3.10/dist-packages/PAMI/extras/graph/plotLineGraphFromDictionary.py in __init__(self, data, end, start, title, xlabel, ylabel)
31 """
32 end = int(len(data) * end / 100)
---> 33 start = int(len(data) * start / 100)
34 x = tuple(data.keys())[start:end]
35 y = tuple(data.values())[start:end]

TypeError: unsupported operand type(s) for /: 'str' and 'int'
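
The traceback shows `start` reaching the division as a string, which may mean that the caller passed its arguments in a different order than the signature expects. One defensive option, sketched here with a hypothetical helper `sliceByPercent` (not part of PAMI), is to coerce and validate the percentages before slicing:

```python
def sliceByPercent(data, end=100, start=0):
    """Return the keys and values of data between two percentages (hypothetical helper)."""
    # float() accepts numeric strings such as "50" and raises a clear
    # ValueError for non-numeric input such as a title string.
    end = float(end)
    start = float(start)
    if not 0 <= start <= end <= 100:
        raise ValueError("expected 0 <= start <= end <= 100")
    size = len(data)
    low = int(size * start / 100)
    high = int(size * end / 100)
    keys = tuple(data.keys())[low:high]
    values = tuple(data.values())[low:high]
    return keys, values


# Made-up data: the first half of a four-entry dictionary.
keys, values = sliceByPercent({1: 10, 2: 20, 3: 30, 4: 40}, end="50", start="0")
```

Passing `end` and `start` as keyword arguments at the call site would also rule out a positional mix-up.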

Error on converting a sparse dataframe into a transactional database

When I try to convert a sparse dataframe into a transactional database using the code provided at the link, the following error appears: "AttributeError: module 'PAMI.extras.DF2DB.sparseDF2DB' has no attribute 'sparse2DB'."

At first, I simply changed sparse2DB to sparseDF2DB, but then a different error appeared: "ValueError: DataFrame constructor not properly called!"
My dataframe was already loaded in the Jupyter notebook when I passed it to the function. I also tried saving it, exporting it as an Excel file, and passing that file directly to the function, but nothing worked and the error persisted.

Can you please help?

Thanks in advance.
