experimental-lda's Introduction

Distributed Machine Learning Common Codebase

DMLC-Core is the backbone library that supports all DMLC projects. It offers the building blocks for efficient and scalable distributed machine learning libraries.

Developer Channel: join the chat at https://gitter.im/dmlc/dmlc-core

Known Issues

  • The RecordIO format is not portable across processors with different endianness. A RecordIO file saved on an x86 machine cannot be loaded on a SPARC machine, because x86 is little-endian while SPARC is big-endian.
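
To illustrate the endianness problem, here is a minimal sketch of detecting the host byte order and storing a 32-bit value in a fixed little-endian layout. The function names (host_is_little_endian, store_le32, load_le32) are hypothetical and are not part of dmlc-core or RecordIO; the sketch only shows the general technique of fixing the on-disk byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: writes value into out[0..3] in little-endian order,
// regardless of the host byte order.
inline void store_le32(std::uint32_t value, unsigned char* out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
}

// Hypothetical helper: reads a little-endian 32-bit value from in[0..3].
inline std::uint32_t load_le32(const unsigned char* in) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    return value;
}
```

Because the shifts operate on the value rather than on its memory representation, a file written this way reads back the same on little-endian and big-endian hosts. Dumping the raw in-memory bytes of an integer does not.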

Contributing

Contributions to dmlc-core are welcome! dmlc-core follows Google's C++ style guide. If you are interested in contributing, take a look at the feature wishlist and open a new issue if you would like to add something.

  • DMLC-Core uses the C++11 standard. Ensure that your C++ compiler supports C++11.
  • Introduce as few dependencies as possible.

Checklist before submitting code

  • Type make lint and fix all the style problems.
  • Type make doc and fix all the warnings.

NOTE

deps:

libcurl4-openssl-dev

experimental-lda's People

Contributors

manzilzaheer, mli, tankle, zigmoidzed

experimental-lda's Issues

make issue

Running make gives me the output below. What could be the cause, and how can I fix it?

bnosac@bnosac-workstation:~/Desktop/experimental-lda$ make
Setting up directories
make[1]: Entering directory `/home/bnosac/Desktop/experimental-lda/src/commons'
Nothing here -- just common files
make[1]: Leaving directory `/home/bnosac/Desktop/experimental-lda/src/commons'
make[1]: Entering directory `/home/bnosac/Desktop/experimental-lda/src/datagen'
make[1]: Nothing to be done for `gcc'.
make[1]: Leaving directory `/home/bnosac/Desktop/experimental-lda/src/datagen'
make[1]: Entering directory `/home/bnosac/Desktop/experimental-lda/src/ngram'
g++ -DNDEBUG -I"../commons" -O3 -ffloat-store -pthread -std=c++14 -c /home/bnosac/Desktop/experimental-lda/src/ngram/model.cpp -o /home/bnosac/Desktop/experimental-lda/build/ngram/model.o
In file included from /home/bnosac/Desktop/experimental-lda/src/ngram/model.h:12:0,
                 from /home/bnosac/Desktop/experimental-lda/src/ngram/model.cpp:1:
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h: In member function ‘double xorshift128plus::rand_norm()’:
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:56:48: error: ‘log’ was not declared in this scope
             double multiplier = sqrt(-2 * log(s) / s);
                                                ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:56:53: error: ‘sqrt’ was not declared in this scope
             double multiplier = sqrt(-2 * log(s) / s);
                                                     ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h: In member function ‘double xorshift128plus::rand_norm(double, double)’:
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:75:48: error: ‘log’ was not declared in this scope
             double multiplier = sqrt(-2 * log(s) / s);
                                                ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:75:53: error: ‘sqrt’ was not declared in this scope
             double multiplier = sqrt(-2 * log(s) / s);
                                                     ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h: In member function ‘double xorshift128plus::rand_gamma(double)’:
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:92:40: error: ‘log’ was not declared in this scope
             result = -log(rand_double());
                                        ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:100:43: error: ‘pow’ was not declared in this scope
                 xx = pow(rand_double(), cc);
                                           ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:105:40: error: ‘log’ was not declared in this scope
             result = -log(rand_double()) * xx / yy;
                                        ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:130:29: error: ‘sqrt’ was not declared in this scope
             cc=1./sqrt(9.*bb);
                             ^
/home/bnosac/Desktop/experimental-lda/src/ngram/../commons/my_rand.h:139:66: error: ‘log’ was not declared in this scope
                     if( (uu<=1.-.0331*(xx*xx)*(xx*xx)) || (log(uu)<=0.5*xx*xx+bb*(1.-vv+log(vv))) )
                                                                  ^
make[1]: *** [/home/bnosac/Desktop/experimental-lda/build/ngram/model.o] Error 1
make[1]: Leaving directory `/home/bnosac/Desktop/experimental-lda/src/ngram'
make: *** [/home/bnosac/Desktop/experimental-lda/src/ngram] Error 2
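
These errors are consistent with my_rand.h calling log, sqrt, and pow without including <cmath>, which some standard library versions no longer pull in through other headers. Below is a minimal sketch of that fix, assuming the missing include is indeed the cause. The function rand_norm_sketch is a hypothetical stand-in, not the actual xorshift128plus::rand_norm from the repository. It wraps the expression from the log in a polar-method normal sampler, which is an assumption about the surrounding code.

```cpp
#include <cmath>    // declares std::log, std::sqrt, std::pow
#include <random>

// Hypothetical stand-in for a normal sampler built around the expression
// from the error log: sqrt(-2 * log(s) / s).
inline double rand_norm_sketch(std::mt19937_64& gen) {
    std::uniform_real_distribution<double> unif(-1.0, 1.0);
    double x = 0.0, y = 0.0, s = 0.0;
    do {
        // Draw a point in the square and reject it unless it lies
        // strictly inside the unit circle (and is not the origin).
        x = unif(gen);
        y = unif(gen);
        s = x * x + y * y;
    } while (s >= 1.0 || s == 0.0);
    const double multiplier = std::sqrt(-2 * std::log(s) / s);
    return x * multiplier;
}
```

If this is the cause, adding #include <cmath> near the top of src/commons/my_rand.h should be enough for the build to get past these errors.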

Thread-unsafe updater?

It seems that the updater function in parallelLDA is not thread-safe. The work of updating the word-topic distribution is split by word label, so the same word is never updated by two threads at the same time. But the update function also modifies the topic array. Could the same topic be updated by different threads at the same time?

Updates are pushed onto the queue in model::sampling with the following code:

cbuff[nst*(w%ntt)+i].push(delta(w,old_topic, topic));

The update function then reads:

virtual int updater(int i)                  // updating sufficient statistics, can be outsourced to children
{
    do
    {
        for (int tn = 0; tn<nst; ++tn)
        {
            if (!(cbuff[i*nst + tn].empty()))
            {
                delta temp = cbuff[i*nst + tn].front();
                cbuff[i*nst + tn].pop();
                n_wk[temp.word][temp.old_topic] -= 1;
                n_wk[temp.word][temp.new_topic] += 1;
                n_k[temp.old_topic] -= 1;
                n_k[temp.new_topic] += 1;
                //n_wk[temp.word][temp.old_topic].fetch_add(-1);
                //n_wk[temp.word][temp.new_topic].fetch_add(+1);
                //n_k[temp.old_topic].fetch_add(-1);
                //n_k[temp.new_topic].fetch_add(+1);
            }
        }
    } while (!done[i]);

    return 0;
}

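Both the word-related and the topic-related arrays are updated in the same thread, so two updater threads handling different words could still modify the same n_k entry at the same time.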
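
The commented-out fetch_add lines in the updater hint at one possible remedy: making the shared per-topic counters atomic. Here is a minimal sketch under that assumption. The function run_atomic_updates and its two-topic setup are hypothetical and only model the contended n_k array, not the repository's actual data structures.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical model of the shared per-topic counters n_k.
// Each of num_threads workers moves `moves` counts from topic 0 to topic 1,
// mimicking n_k[old_topic] -= 1; n_k[new_topic] += 1; in the updater.
inline std::vector<int> run_atomic_updates(int num_threads, int moves) {
    std::vector<std::atomic<int>> n_k(2);
    n_k[0].store(num_threads * moves);
    n_k[1].store(0);

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&n_k, moves] {
            for (int m = 0; m < moves; ++m) {
                // Atomic read-modify-write: no increments or decrements are lost
                // even when several threads touch the same topic.
                n_k[0].fetch_add(-1, std::memory_order_relaxed);
                n_k[1].fetch_add(+1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return {n_k[0].load(), n_k[1].load()};
}
```

With plain int counters the same loop would be a data race and could lose updates. With atomics the totals are conserved, at the cost of some contention on frequently updated topics.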

Two small bugs in the code

I found two very small bugs that do not affect performance or accuracy. Both are in the singleLDA\model.cpp file:

(1) Sorting the list of pairs by the second value of each pair in model::save_model_twords().

image

should be changed to:

image

I tested this change in my own code, and it now correctly saves the top n words for each topic.
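
Since the screenshots are not available here, the following is only a generic sketch of sorting pairs by their second value and keeping the top n. It is not the code from the screenshots. The function top_n_by_second and the (word id, weight) pair layout are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sorts (word id, weight) pairs by weight, largest first,
// and keeps at most the first n of them.
inline std::vector<std::pair<int, double>> top_n_by_second(
        std::vector<std::pair<int, double>> pairs, std::size_t n) {
    std::sort(pairs.begin(), pairs.end(),
              [](const std::pair<int, double>& a, const std::pair<int, double>& b) {
                  return a.second > b.second;  // compare on the second element only
              });
    if (pairs.size() > n) pairs.resize(n);
    return pairs;
}
```

The comparator uses a strict greater-than, which std::sort requires. Using >= would violate the strict weak ordering contract.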

(2) A typo in the destructor model::~model():

image

The code is super cool and fast, well designed, and documented in detail, so it is easy to use. Thanks for sharing!
