Comments (5)
@vasants thanks.
BOW (bag-of-words) is built into the 'txt' connector; a tutorial for training from text is available here: http://www.deepdetect.com/tutorials/txt-training/
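For anyone unfamiliar with BOW, the idea can be sketched in a few lines of Python. This is a generic illustration assuming lowercasing and whitespace tokenization; it is not the 'txt' connector's actual code, and the function names are hypothetical.

```python
from collections import Counter

def build_vocab(docs):
    """Map every distinct lowercase word to a column index (sorted for determinism)."""
    words = sorted({w for doc in docs for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def bow_vector(doc, vocab):
    """Count occurrences of each vocabulary word; words outside the vocab are ignored."""
    vec = [0] * len(vocab)
    for word, count in Counter(doc.lower().split()).items():
        if word in vocab:
            vec[vocab[word]] = count
    return vec

docs = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocab(docs)
print(vocab)                       # {'cat': 0, 'dog': 1, 'on': 2, 'sat': 3, 'the': 4}
print(bow_vector(docs[1], vocab))  # [1, 1, 1, 1, 2]
```

The vector length equals the vocabulary size, which is why BOW dimensionality grows quickly on large corpora.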
W2V (word2vec) is not yet built in, but it can be used as well, though a bit less easily. Here is an application to real data: https://github.com/beniz/quick_cdiscount
In practice my experience is that W2V accuracy is often below that of BOW, and this is corroborated by http://arxiv.org/abs/1509.01626.
However, since W2V is very useful in some settings, typically when the dimensionality of BOW is too high to be optimized easily, I plan to include it in the text connector at some point.
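The dimensionality point can be illustrated with one common way of turning word vectors into a fixed-size document representation: averaging them. This is a minimal sketch assuming pre-trained vectors are already available; the 3-dimensional `EMBEDDINGS` table is made up for illustration, and averaging is only one option, not necessarily how W2V would be integrated into the text connector.

```python
# Hypothetical 3-dimensional word vectors; real word2vec vectors are learned
# from a corpus and typically have a few hundred dimensions.
EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sat": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}

def average_embedding(doc, embeddings, dim=3):
    """Average the vectors of known words; returns a zero vector if no word is known."""
    vectors = [embeddings[w] for w in doc.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every word vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]

print(average_embedding("the cat sat", EMBEDDINGS))
```

The resulting document vector always has the embedding dimension (3 here), whatever the vocabulary size, whereas a BOW vector grows with every new word.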
Let me know if built-in W2V is a feature of interest.
from deepdetect.
@beniz great!
Yep! I would be interested in W2V (at least for comparison purposes), but I will check out the less easy way and see if there are any benefits (if I do build W2V support, I will send you a pull request).
Regarding Caffe layers: I see you have a custom Caffe version running. Do you use any special layers for processing text?
Do you have any info on the accuracies you see on standard datasets (e.g. test results for the news20 dataset)?
I'm just trying to get a feel for the type of convnet implemented, so any info regarding that would be helpful.
No special layers yet. A character-level Conv1d is coming up for my own purposes, and I'll report results somewhere.
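To make "Conv1d at character level" concrete, here is a toy sketch of what such a layer computes: characters are one-hot encoded, a single filter slides over time ("valid" cross-correlation, as most deep learning frameworks implement convolution), and max-pooling over time keeps the strongest response. This is a generic illustration, not the layer mentioned above; the alphabet, helper names and the hand-set filter are all hypothetical, and a real layer would learn many filters with biases and a nonlinearity.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot(text):
    """Encode each character as a one-hot vector; unknown characters become all zeros."""
    rows = []
    for ch in text.lower():
        vec = [0.0] * len(ALPHABET)
        if ch in CHAR_INDEX:
            vec[CHAR_INDEX[ch]] = 1.0
        rows.append(vec)
    return rows  # shape: (len(text), len(ALPHABET))

def conv1d(sequence, kernel):
    """'Valid' 1D cross-correlation of one filter over time.

    sequence: list of T feature vectors; kernel: list of k weight vectors.
    Returns T - k + 1 activations.
    """
    k = len(kernel)
    out = []
    for t in range(len(sequence) - k + 1):
        total = 0.0
        for offset in range(k):
            total += sum(x * w for x, w in zip(sequence[t + offset], kernel[offset]))
        out.append(total)
    return out

# Hypothetical hand-set filter of width 2 that responds to the bigram "ca".
kernel = [one_hot("c")[0], one_hot("a")[0]]
activations = conv1d(one_hot("a cat"), kernel)
print(activations)       # [0.0, 0.0, 2.0, 0.0]
print(max(activations))  # 2.0 (max-pooling over time)
```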
Built-in text support is entirely word-based, with no n-grams. TF-IDF is implemented, but results are poor, very much due to the lack of rescaling. If you want to test the latter, the rescaling code of the CSV connector could be imported (or, even better, reused).
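A sketch of TF-IDF followed by per-feature rescaling may help anyone trying this. It assumes raw-count TF, IDF = log(N / df), and min-max scaling of each column to [0, 1]; the min-max choice is an assumption about the kind of rescaling meant, not a copy of the CSV connector's code, and the helper names are hypothetical. One plausible reason rescaling matters is that raw TF-IDF values are unbounded and vary widely in scale across features, which can make neural net training harder.

```python
import math
from collections import Counter

def tfidf(docs):
    """Raw-count TF times IDF = log(N / df); returns the vocab and one row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, rows

def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

docs = ["the cat sat", "the dog sat", "the dog barked at the dog"]
vocab, raw = tfidf(docs)
scaled = min_max_scale(raw)
print(vocab)      # ['at', 'barked', 'cat', 'dog', 'sat', 'the']
print(scaled[1])  # [0.0, 0.0, 0.0, 0.5, 1.0, 0.0]
```

In practice the min and max would be computed on the training set only and reused at prediction time.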
Results with BOW are on par with random forests and naive Bayes (NB) on a range of mid-size datasets I have gathered over the years.
Also beware that the W2V C++ implementation I've pointed you to is GPL-licensed and, unfortunately, can't really be linked as is.
Yup! Thanks!