
kejunxiao / sparse_btm


A C++ implementation of the sparse biterm topic model (BTM), 10x faster than the original implementation because it uses a sparse Gibbs sampler.


sparse_btm's Introduction

sparse_btm


features:

  • suitable for modeling user click sequences (recommendation systems) or short texts (NLP), because it assumes that N adjacent items belong to the same topic;
  • uses a sparse Gibbs sampler, 10x faster than the original implementation.
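
The "adjacent N items" assumption can be illustrated with a minimal sketch of biterm extraction. The helper names `tokenize` and `extract_biterms` are hypothetical and are not taken from this repository; the sketch also assumes that two items form a biterm when they fall within `window_size` consecutive positions, so that a window of 2 pairs only direct neighbours. The actual implementation may define the window differently.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one input line ("word1 word2 word3 ...")
// into whitespace-separated tokens.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (iss >> tok) {
        tokens.push_back(tok);
    }
    return tokens;
}

// Hypothetical helper: emit every pair (biterm) of tokens that lie within
// `window_size` consecutive positions of each other.
// ASSUMPTION: window_size == 2 pairs only directly adjacent tokens.
std::vector<std::pair<std::string, std::string>>
extract_biterms(const std::vector<std::string>& tokens, std::size_t window_size) {
    std::vector<std::pair<std::string, std::string>> biterms;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        for (std::size_t j = i + 1; j < tokens.size() && j - i < window_size; ++j) {
            biterms.emplace_back(tokens[i], tokens[j]);
        }
    }
    return biterms;
}
```

Under this assumption, the line "a b c d" with a window of 3 yields the pairs (a,b), (a,c), (b,c), (b,d) and (c,d). Each pair is then assigned a single topic during sampling, which is why the model suits short texts and click sequences.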

arguments:


Biterm Topic Model (Sparse-Sampler)


Parameters:

  • -input
    path to the docs file; each line looks like "word1 word2 word3 ... \n"
  • -output
    directory for the model files (topic_biterm_sum, topic_word)
  • -num_topics
    number of topics
  • -alpha
    symmetric doc-topic prior probability, default is 0.05
  • -beta
    symmetric topic-word prior probability, default is 0.01
  • -window_size
    window size for biterms, default is 2
  • -num_iters
    number of iterations, default is 20
  • -save_step
    save the model every save_step iterations, default is -1 (no save)

usage:

./sparse_btm -input short_text.txt -output model_out/ -num_topics 100 -window_size 3 -num_iters 20 -save_step 10

sparse_btm's Issues

Difference from BTM

I’m really excited for this repo because it has the potential to process TBs of text data.

However, I’m interested in the difference between sparse-btm and the original btm.

I’m not sure what the sparse sampler means? And what’s the PMI or coherence difference between the two models?

Any disadvantages of sparse btm?

Many thanks
