lda.js's Introduction

lda.js

LDA-Based Topic Modelling in JavaScript

Topic modelling means detecting “abstract” topics in a collection of text documents. The most common textbook technique for this is Latent Dirichlet Allocation (LDA). Simply put, LDA is a statistical algorithm that takes documents as input and produces a list of topics, each one a weighted set of words. One catch is that you have to tell it how many topics you want. There’s much more to it, but since this is not a tutorial post, I will stop here. (If you are interested in how it works, read the references given on the wiki page.)
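To make the "documents in, topics out" idea concrete, here is a minimal sketch of one common way to fit LDA, collapsed Gibbs sampling. It is not taken from the lda.js source: the names `runLda` and `mulberry32`, the naive tokeniser, and the default values of `alpha`, `beta`, and `iterations` are all assumptions chosen for illustration.

```javascript
// Illustrative sketch only; names and defaults are hypothetical, not the lda.js API.

// Small seeded PRNG so that repeated runs give the same result.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function runLda(docs, numTopics, options = {}) {
  const { iterations = 200, alpha = 0.1, beta = 0.01, seed = 1 } = options;
  const random = mulberry32(seed);

  // Naive tokeniser (an assumption): lowercase, keep runs of a-z, map words to ids.
  const vocab = new Map();
  const words = [];
  const tokens = docs.map((doc) =>
    (doc.toLowerCase().match(/[a-z]+/g) || []).map((w) => {
      if (!vocab.has(w)) {
        vocab.set(w, words.length);
        words.push(w);
      }
      return vocab.get(w);
    })
  );
  const V = words.length;

  // Count tables used by the sampler.
  const docTopic = docs.map(() => new Array(numTopics).fill(0));
  const topicWord = Array.from({ length: numTopics }, () => new Array(V).fill(0));
  const topicTotal = new Array(numTopics).fill(0);

  // Start by giving every token a random topic.
  const assign = tokens.map((doc, d) =>
    doc.map((w) => {
      const k = Math.floor(random() * numTopics);
      docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;
      return k;
    })
  );

  const weights = new Array(numTopics);
  for (let it = 0; it < iterations; it++) {
    for (let d = 0; d < tokens.length; d++) {
      for (let i = 0; i < tokens[d].length; i++) {
        const w = tokens[d][i];
        let k = assign[d][i];
        // Remove the token's current assignment from the counts.
        docTopic[d][k]--; topicWord[k][w]--; topicTotal[k]--;
        // Unnormalised probability of each topic for this token.
        let sum = 0;
        for (let t = 0; t < numTopics; t++) {
          weights[t] =
            ((docTopic[d][t] + alpha) * (topicWord[t][w] + beta)) /
            (topicTotal[t] + V * beta);
          sum += weights[t];
        }
        // Draw a new topic in proportion to those weights.
        let r = random() * sum;
        k = 0;
        while (k < numTopics - 1 && r >= weights[k]) {
          r -= weights[k];
          k++;
        }
        assign[d][i] = k;
        docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;
      }
    }
  }

  // Per-document topic proportions (each row sums to 1).
  const theta = docTopic.map((counts, d) =>
    counts.map((c) => (c + alpha) / (tokens[d].length + numTopics * alpha))
  );
  // Per-topic word weights, heaviest first (each topic's weights sum to 1).
  const topics = topicWord.map((counts, k) =>
    counts
      .map((c, w) => ({ word: words[w], weight: (c + beta) / (topicTotal[k] + V * beta) }))
      .sort((x, y) => y.weight - x.weight)
  );
  return { topics, theta };
}

// Usage: the number of topics (2 here) must be supplied by the caller.
const sampleDocs = ["free prize call now", "see you at lunch", "call now to win"];
const result = runLda(sampleDocs, 2, { iterations: 100 });
console.log(result.topics.map((t) => t.slice(0, 3).map((x) => x.word)));
```

The sampler is randomised, so the topics it finds depend on the seed, the number of iterations, and the priors; a fixed seed is used here only to keep runs repeatable.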

Output PNG

Here's a JavaScript version of LDA, based on my earlier work, which no longer functions. For testing, I use a subset of the SMS Spam Corpus available here (and thus take no responsibility for any inappropriate text within :) ). Each topic is represented as a word cloud; the larger a word, the more weight it has in the topic. The source sentences are displayed again, each with a bar showing the percentage distribution of topics for that sentence. Hovering over an area in the bar shows the words in that topic. You can of course replace the sample with any other text, change the number of topics using the slider, and press the 'Analyse' button to see it work.
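The two displays described above come down to simple rescaling. The sketch below shows one way it could be done; the helper names `toBarSegments` and `toFontSizes` and the 12 to 48 pixel font range are assumptions for illustration, not part of lda.js.

```javascript
// Hypothetical helpers for illustration; not taken from the lda.js source.

// Turn one sentence's topic weights into bar segments whose percentages sum to 100.
function toBarSegments(topicWeights) {
  const total = topicWeights.reduce((sum, w) => sum + w, 0);
  if (total === 0) {
    return topicWeights.map((_, topic) => ({ topic, percent: 0 }));
  }
  return topicWeights.map((w, topic) => ({ topic, percent: (w / total) * 100 }));
}

// Map each word's weight within a topic onto a font size, so heavier words are drawn larger.
function toFontSizes(wordWeights, minPx = 12, maxPx = 48) {
  const values = Object.values(wordWeights);
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const span = hi - lo || 1; // avoid dividing by zero when all weights are equal
  return Object.fromEntries(
    Object.entries(wordWeights).map(([word, w]) => [
      word,
      Math.round(minPx + ((w - lo) / span) * (maxPx - minPx)),
    ])
  );
}

console.log(toBarSegments([1, 3]));
// → [ { topic: 0, percent: 25 }, { topic: 1, percent: 75 } ]
console.log(toFontSizes({ free: 0.5, call: 0.25, now: 0 }));
// → { free: 48, call: 30, now: 12 }
```

Linear scaling is the simplest choice; a square-root or logarithmic scale would keep low-weight words more legible.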

lda.js's People

Contributors

awaisathar

lda.js's Issues

An out of the ordinary request

Hi,

I would like to make an unusual request.
I started by playing with LDA on Wikipedia articles (or actually excerpts from Wikipedia articles).

I find that, on the one hand, results are very sensitive to the choice of parameters (# topics, # top terms per topic, # iterations, # of documents and size of each document) and, on the other hand, each run takes a very long time, so learning by trial and error is very slow.
I would very much like to have some guidelines/logic/intuition/heuristics for choosing the parameters.

What do you think a good medium would be to do that?
Thanks
Ilan
