GithubHelp home page GithubHelp logo

davidnabergoj / rake-nltk Goto Github PK

View Code? Open in Web Editor NEW

This project forked from csurfer/rake-nltk

0.0 0.0 0.0 469 KB

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Home Page: https://csurfer.github.io/rake-nltk

License: MIT License

Python 100.00%

rake-nltk's Introduction

rake-nltk

pypiv pyv Build Status codecov Licence Downloads

RAKE short for Rapid Automatic Keyword Extraction algorithm, is a domain independent keyword extraction algorithm which tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurance with other words in the text.

Demo

Features

  • Ridiculously simple interface.
  • Configurable word and sentence tokenizers, language based stop words etc
  • Configurable ranking metric.

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install

Quick start

from rake_nltk import Rake

# Uses stopwords for english from NLTK, and all puntuation characters by
# default
r = Rake()

# Extraction given the text.
r.extract_keywords_from_text(<text to process>)

# Extraction given the list of strings where each string is a sentence.
r.extract_keywords_from_sentences(<list of sentences>)

# To get keyword phrases ranked highest to lowest.
r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores.
r.get_ranked_phrases_with_scores()

Debugging Setup

If you see a stopwords error, it means that you do not have the corpus stopwords downloaded from NLTK. You can download it using command below.

python -c "import nltk; nltk.download('stopwords')"

References

This is a python implementation of the algorithm as mentioned in paper Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley

Why I chose to implement it myself?

  • It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
  • There are some rather popular implementations out there, in python(aneesha/RAKE) and node(waseem18/node-rake) but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation I get the flexibility and power to extend it in other creative ways, if I see fit later, without having to implement everything myself.
  • I plan to use it in my other pet projects to come and wanted it to be modular and tunable and this way I have complete control.

Contributing

Bug Reports and Feature Requests

Please use issue tracker for reporting bugs or feature requests.

Development

  1. Checkout the repository.
  2. Make your changes and add/update relavent tests.
  3. Install poetry using pip install poetry.
  4. Run poetry install to create project's virtual environment.
  5. Run tests using poetry run tox (Any python versions which you don't have checked out will fail this). Fix failing tests and repeat.
  6. Make documentation changes that are relavant.
  7. Install pre-commit using pip install pre-commit and run pre-commit run --all-files to do lint checks.
  8. Generate documentation using poetry run sphinx-build -b html docs/ docs/_build/html.
  9. Generate requirements.txt for automated testing using poetry export --dev --without-hashes -f requirements.txt > requirements.txt.
  10. Commit the changes and raise a pull request.

Buy the developer a cup of coffee!

If you found the utility helpful you can buy me a cup of coffee using

Donate

rake-nltk's People

Contributors

csurfer avatar letianfeng avatar cgratie avatar dependabot[bot] avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.