ksr313 / text-mining

This project forked from mrpatel95/text-mining

Text mining code that uses the TF-IDF algorithm to find keywords and the Apriori algorithm to produce association rules

Python 100.00%

Text-Mining

This code can be used to assign keywords to documents and to find association rules between words in a database of documents. With minor modifications, it could also serve as the basis for a document suggestion system driven by search keywords.

Getting Started

  • Clone this repository
  • Execute textMining.py
  • You will be asked for the support and confidence values. Once you enter those, you'll get the association rules as output.
  • That's pretty much it. Good job!

Prerequisites

You need Python 3.6 installed on your machine.

Running the tests

  • When you execute textMining.py, it looks for a folder named documentDatabase and reads every .txt file in it. Each text file is treated as a separate document, so the documentDatabase folder holds the multiple documents that together form the input database.
  • Once all the documents are read, they are cleaned by removing stop words, and the remaining words are reduced further by stemming. The list of stop words can be found in listOfStopWords.txt.
Example of stemming: fill, filled, and filling are all interpreted as fill.
  • Next, each document is assigned a few keywords using the tf-idf algorithm, and the keywords are written to a file named aprioriInput.txt. Finally, the Apriori algorithm takes over: it reads aprioriInput.txt and generates association rules based on Minimum Support and Minimum Confidence.
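  • Minimum Support: the threshold used to find all frequent itemsets in the database. An itemset is frequent when the fraction of documents containing it is at least this value.
  • Minimum Confidence: the threshold applied to these frequent itemsets in order to form rules. A rule A -> B is kept when support(A and B) / support(A) is at least this value.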
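
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in textMining.py: the toy corpus, the stop-word set, and the function names (tokenize, tfidf_keywords, support, pair_rules) are all hypothetical, stemming is skipped, and the rule search is a brute-force pass over item pairs rather than the level-wise Apriori search.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus and stop words. The project itself reads these from
# documentDatabase/*.txt and listOfStopWords.txt.
DOCS = {
    "d1": "the cat sat on the mat with the dog",
    "d2": "the dog chased the cat",
    "d3": "the bird sang on the mat",
}
STOP_WORDS = {"the", "on", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words (no stemming here)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_keywords(docs, top_n=2):
    """Return the top_n highest-scoring tf-idf terms for each document."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for words in tokens.values() for term in set(words))
    keywords = {}
    for name, words in tokens.items():
        tf = Counter(words)
        scores = {
            term: (count / len(words)) * math.log(len(docs) / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[name] = set(ranked[:top_n])
    return keywords


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not frequent, so it forms no rules
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


if __name__ == "__main__":
    # Each document's keyword set plays the role of one line in aprioriInput.txt.
    keyword_sets = list(tfidf_keywords(DOCS).values())
    for rule in pair_rules(keyword_sets, min_support=0.3, min_confidence=0.5):
        print(rule)
```

Raising min_support shrinks the set of frequent pairs, while raising min_confidence keeps only the rules whose left-hand side reliably implies the right-hand side.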

Built With

Fork the repo and try to come up with an optimized version of the algorithm.

Author

Social

It is crucial to stay social ;)
