Comments (5)

beniz commented on September 26, 2024

@vasants thanks.

BOW is built into the 'txt' connector; a tutorial on training from text is available here: http://www.deepdetect.com/tutorials/txt-training/
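
For readers unfamiliar with the term, here is a minimal sketch of what a word-level bag-of-words (BOW) representation looks like. It is plain Python written for illustration only, not the 'txt' connector's actual code; the helper names `build_vocab` and `bow_vector` and the whitespace tokenization are assumptions.

```python
from collections import Counter

def build_vocab(docs):
    """Map each distinct lowercase word to a column index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs in doc; unknown words are ignored."""
    counts = Counter(doc.lower().split())
    vec = [0] * len(vocab)
    for word, n in counts.items():
        idx = vocab.get(word)
        if idx is not None:
            vec[idx] = n
    return vec

# Hypothetical toy corpus, used only to show the shape of the output.
docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(docs)
vectors = [bow_vector(d, vocab) for d in docs]
```

Each document becomes one fixed-length count vector with one column per vocabulary word, which is why the dimensionality grows with the vocabulary size.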

W2V is not yet built in, but it can be used as well, though a bit less easily. Here is an application to real data: https://github.com/beniz/quick_cdiscount

In my experience, W2V accuracy in practice is often below that of BOW, and this is corroborated by http://arxiv.org/abs/1509.01626.

However, since W2V is very useful in some settings, typically when the dimensionality of BOW is too high to be optimized easily, I plan to include it in the text connector at some point.
Let me know if built-in W2V is a feature of interest.

from deepdetect.

vasants commented on September 26, 2024

@beniz great!

Yep! I would be interested in W2V (at least for comparison purposes), but I will check out the less easy way first and see if there are any benefits. (If I do build W2V out, I will send you a pull request.)

Regarding Caffe layers: I see you have a custom Caffe version running. Do you use any special layers for processing text?
Do you have info on the accuracies you see with standard datasets (test results for the news20 dataset)?

Just trying to get a feel for the type of convnet implemented and any info regarding that would be helpful.

beniz commented on September 26, 2024

No special layers yet. Conv1d at the character level is coming up for my own purposes, and I'll report on it somewhere.

Built-in text support is all word-based, no n-grams. TF-IDF is implemented, but results are poor, very much due to the lack of rescaling. If you want to test the latter, the rescaling code of the CSV connector could be imported (or even better, reused).
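
To make the rescaling point concrete, here is a rough sketch of TF-IDF features followed by per-column min-max rescaling into [0, 1]. It is standalone Python for illustration, not the text or CSV connector's code; the choice of min-max scaling, the smoothed IDF formula, and the names `tfidf_matrix` and `minmax_rescale` are all assumptions.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the TF-IDF of word j in doc i."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Assumed IDF variant: log(N / df) + 1, so words in every doc keep a weight.
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        rows.append([tf[w] / length * idf[w] for w in vocab])
    return vocab, rows

def minmax_rescale(rows):
    """Rescale every column to [0, 1]; constant columns are set to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat on the mat", "a cat and a dog"]
vocab, raw = tfidf_matrix(docs)
scaled = minmax_rescale(raw)
```

Without the second step, raw TF-IDF values are small and vary in range from column to column, which can make the inputs hard for a neural net to optimize.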

Results with BOW are on par with random forests and NB on a range of mid-size datasets I built up over the year.

beniz commented on September 26, 2024

Also, beware that the W2V C++ implementation I pointed you to is GPL and can't really be linked in as is, unfortunately.

vasants commented on September 26, 2024

Yup! Thanks!
