CuLDA_CGS

CuLDA_CGS is a GPU solution for LDA (latent Dirichlet allocation) sampling based on collapsed Gibbs sampling (CGS). It is efficient and can reach 686M tokens/sec. To the best of our knowledge, it is the first LDA solution that supports GPUs.

Input Data Preparation

./src_format contains a program that transforms a text corpus into the input format of CuLDA_CGS. The transformed format is more efficient for subsequent processing and is partitioned into multiple chunks to support multi-GPU scaling.

Run "make" in that directory, then use the following command to transform the data:

./format input output_prefix numChunks[default=1]

The input to ./format looks like this:

doc-name1 token1 token2 token3\n
doc-name2 token4 token5 token6\n
...

Tokens are separated by spaces, and documents are separated by newlines (one document per line).
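As an illustration of the layout described above, here is a minimal Python sketch that writes a small corpus in this format. The helper name write_corpus, the file name corpus.txt, and the sample documents are hypothetical and not part of CuLDA_CGS. The check that no field contains whitespace is an assumption inferred from the space-separated layout, not a documented rule of ./format.

```python
import os
import tempfile


def write_corpus(path, docs):
    """Write documents as 'doc-name token token ...', one document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for name, tokens in docs:
            # Assumption: whitespace is the only separator, so neither the
            # document name nor any token may contain whitespace.
            for field in [name, *tokens]:
                if not field or any(ch.isspace() for ch in field):
                    raise ValueError(f"invalid field: {field!r}")
            f.write(" ".join([name, *tokens]) + "\n")


# Hypothetical sample documents mirroring the example above.
docs = [
    ("doc-name1", ["token1", "token2", "token3"]),
    ("doc-name2", ["token4", "token5", "token6"]),
]
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
write_corpus(corpus_path, docs)
```

The resulting file could then be passed as the input argument of ./format.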

Compile and Run CuLDA_CGS

Everything about CuLDA_CGS is in ./src_culda. It does not rely on any third-party dependency. All you need is a CUDA environment and a CUDA-enabled GPU.

Before you run "make" in that directory, remember to change CXX_FLAG to match your target architecture and CUDA_INSTALL_PATH to point to your CUDA directory.

Then you can run ./culda for LDA sampling. The usage is:

./culda [options]

Possible options

-g <number of GPUs>
-k <topic number>: currently only 1024 is supported
-t <number of iterations>
-s <number of thread blocks>: deprecated
-a <alpha>: 50/1024 for our tested data sets
-b <beta>: 0.01 for our tested data sets
-c <number of input data chunks>: must be equal to -g, and must be consistent with the chunk number specified in the data preparation stage
-i <input file name prefix>: same as the output_prefix in the data preparation stage
-o <output file name prefix>: not used at present. Rewrite ModelPhi::savePhi and ModelTheta::saveTheta if you need it.
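The constraints in the option list (only 1024 topics, -c equal to -g) can be checked before launching a run. The following Python sketch assembles an argument list under those rules. The function build_culda_args, the iteration count of 100, and the prefix output_prefix are hypothetical. It also assumes that ./culda accepts each value as a separate argument after its flag and that alpha may be given as a decimal number. Neither is confirmed by this README.

```python
def build_culda_args(num_gpus, num_chunks, input_prefix, iterations,
                     num_topics=1024, beta=0.01):
    """Assemble an argument list for ./culda from the options listed above."""
    if num_topics != 1024:
        raise ValueError("-k currently only supports 1024")
    if num_chunks != num_gpus:
        raise ValueError("-c must be equal to -g")
    # 50/1024 is the alpha quoted for the tested data sets (0.048828125).
    alpha = 50 / num_topics
    return [
        "./culda",
        "-g", str(num_gpus),
        "-k", str(num_topics),
        "-t", str(iterations),
        "-a", str(alpha),
        "-b", str(beta),
        "-c", str(num_chunks),
        "-i", input_prefix,
    ]


# Hypothetical run: 2 GPUs, data prepared with numChunks=2, 100 iterations.
args = build_culda_args(num_gpus=2, num_chunks=2,
                        input_prefix="output_prefix", iterations=100)
print(" ".join(args))
```

The list form can be handed to subprocess.run without shell quoting. The -s and -o options are left out because the list above marks them as deprecated and unused.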

CuLDA_CGS outputs the number of tokens processed per second and the log-likelihood after each iteration.
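To put the tokens-per-second figure in context: in collapsed Gibbs sampling, each iteration resamples every token once, so the time per iteration is roughly the corpus size divided by the throughput. The Python sketch below does that arithmetic. The corpus size of one billion tokens is a hypothetical example, and 686M tokens/sec is the peak figure quoted in the introduction, which other hardware or data sets may not reach.

```python
def seconds_per_iteration(num_tokens, tokens_per_sec):
    """Rough time for one sampling pass over the whole corpus."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# Hypothetical corpus of 1e9 tokens at the quoted peak rate of 686M tokens/sec.
estimate = seconds_per_iteration(1_000_000_000, 686_000_000)
print(f"about {estimate:.2f} s per iteration")
```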

Contributors

xlxie

culda_cgs's Issues

User Manual

Hi,

Could you provide a user manual or some explanation of the input arguments and the training dataset format?

Regards,
Ali

request for code update

Hi,
I read your paper and you did a great job. Could you update your README file and give instructions on how to test the code? Also, could you add some comments to your code? Thanks.
