ayoubam / harry_potter_nlp

This project forked from raffg/harry_potter_nlp

Harry Potter and the Allocation of Dirichlet

harry_potter_nlp's Introduction

NLP on the Books of Harry Potter

This repo demonstrates a collection of NLP tasks, all using the Harry Potter books as source documents. Each task has its own write-up:

  1. Topic modeling with Latent Dirichlet Allocation
  2. Regular Expression case study
  3. Extractive text summarization
  4. Sentiment analysis

Instructions for BasicNLP class (basic_nlp.py)

The class provides topic modeling with LDA, document summarization, and sentiment analysis.

  1. Initialize the class with a list of documents and an optional list of document titles, for example:
      from basic_nlp import BasicNLP

      texts = ['this is the first document', 'this is the second document', 'this is the third document']
      titles = ['doc1', 'doc2', 'doc3']

      nlp = BasicNLP(texts, titles)
  1. LDA:

    1. Create an elbow plot and print the coherence scores over a range of topic counts (given by start, stop, and step) with:
      nlp.compute_coherence(start=5, stop=20, step=3)
      
    2. Set the number of topics to use in the model with:
      nlp.set_number_of_topics(10)
      
    3. View the clusters (only available in a Jupyter notebook) with:
      import pyLDAvis
      pyLDAvis.enable_notebook()
      vis = nlp.view_clusters()
      pyLDAvis.display(vis)
      
    4. Get the vocabulary for each topic in the LDA model (topics can be 'all', a list of integers, or a single integer) with:
      nlp.get_topic_vocabulary(topics='all', num_words=10)
      
    5. Get the documents most highly associated with the given topics with:
      nlp.get_representative_documents(topics='all', num_docs=1)
      
    6. Get the sentence summaries of the documents most highly associated with the given topics with:
      nlp.get_representative_sentences(topics='all', num_sentences=3)
      
    7. Provide a name for an LDA topic (if preferred over the numbering system) with:
      nlp.name_topic(topic_number=1, topic_name='My topic')
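
The elbow plot from step 1 is meant to help choose the value passed to `set_number_of_topics`. As a rough sketch of that choice, the snippet below lists the topic counts a `start`/`stop`/`step` triple would cover and picks the count with the best coherence. It assumes Python `range` semantics (stop excluded), which may differ from what `compute_coherence` does; the helper names and the scores are hypothetical and not part of `BasicNLP`.

```python
def candidate_topic_counts(start, stop, step):
    """Topic counts to evaluate; assumes Python range semantics (stop excluded)."""
    return list(range(start, stop, step))


def pick_num_topics(coherence_by_count):
    """Return the topic count with the highest coherence score.

    Ties go to the smaller topic count, which keeps the model simpler.
    """
    return min(coherence_by_count, key=lambda k: (-coherence_by_count[k], k))


counts = candidate_topic_counts(start=5, stop=20, step=3)
# Made-up coherence scores, for illustration only.
scores = dict(zip(counts, [0.41, 0.47, 0.52, 0.50, 0.49]))
best = pick_num_topics(scores)
```

In practice the highest score is only a starting point; a smaller topic count near the bend of the elbow plot is often easier to interpret.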
      
  2. Document summarization:

    Get the sentence summaries of the requested documents with:

    nlp.get_document_summaries(documents='all', num_sent=5)
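
For readers unfamiliar with extractive summarization, here is a minimal sketch of the general idea: score each sentence and keep the top few, unchanged. It uses a simple word-frequency heuristic chosen for illustration; the method behind `get_document_summaries` is not described here and may be different. The `summarize` function is hypothetical.

```python
import re
from collections import Counter


def summarize(text, num_sent=2):
    """Return the num_sent highest-scoring sentences, in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document (lowercased).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sent])
    return [sentences[i] for i in keep]
```

This sketch does no stop-word filtering, so common words such as "the" dominate the scores; a practical version would remove them first.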
    
  3. Sentiment analysis:

    Get the sentiment scores (compound, positive, neutral, negative) for the requested documents with:

    nlp.get_sentiment(documents='all')
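
The four score names (compound, positive, neutral, negative) match the output of the VADER sentiment analyzer; whether `get_sentiment` uses VADER is an assumption. The sketch below shows, in simplified form, how lexicon-based scores of this shape can be computed. The lexicon and the `sentiment` function are hypothetical, and the positive, neutral, and negative values here are plain word fractions rather than VADER's weighted proportions.

```python
import math

# Tiny hypothetical lexicon: word -> valence. A real lexicon has thousands of entries.
LEXICON = {"good": 2.0, "brave": 2.0, "happy": 2.5, "bad": -2.0, "dark": -1.0, "afraid": -2.0}


def sentiment(text):
    """Return compound, positive, neutral, and negative scores for one text."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    valences = [LEXICON.get(w, 0.0) for w in words]
    total = sum(valences)
    # Squash the summed valence into (-1, 1); 15 is the constant VADER uses.
    compound = total / math.sqrt(total * total + 15)
    n = len(words) or 1  # avoid dividing by zero on empty input
    return {
        "compound": compound,
        "positive": sum(1 for v in valences if v > 0) / n,
        "neutral": sum(1 for v in valences if v == 0) / n,
        "negative": sum(1 for v in valences if v < 0) / n,
    }
```

A compound score near 1 indicates strongly positive text, near -1 strongly negative, and near 0 neutral or mixed.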
    

harry_potter_nlp's People

Contributors

raffg
