Comments (19)
@tianjianjiang With all due respect to the author of CRFSuite (who did a really great job), it would probably take a while to get your improvement merged in. Perhaps your best bet would be to fork the project and work there. Thanks for your contribution.
from crfsuite.
@usptact , I think he already did. https://github.com/tianjianjiang/crfsuite-openmp
from crfsuite.
In my experience, CRFsuite and libLBFGS are not OpenMP-friendly. Of course there are other ways to get multi-core support, but OpenMP might even require fundamental changes to CRFsuite, which is probably an unacceptable cost.
from crfsuite.
@kmike @chokkan @ogrisel What do you guys think about it?
from crfsuite.
I just submitted pull request #68 for it, but with different loops annotated.
from crfsuite.
@usptact Not a problem at all.
@bratao Thanks for the clarification.
In fact, it's rather a good idea to wait for a while. I've noticed that on different operating systems, with different compilers, and on certain data sets, the calculation can be inefficient or can even hang (0% CPU time).
from crfsuite.
Pull request #68 has just been updated to improve performance. It finally seems to be faster than the original version.
from crfsuite.
@tianjianjiang Thanks for the work. Can you add some test scripts for benchmarking the performance? An IPython notebook would be a very good option.
from crfsuite.
Hi, I am new to multiprocessing and would like to know how to run CRFsuite with OpenMP, since without it, it is extremely slow on big data sets.
Thank you in advance.
from crfsuite.
@CSabty If you need speed for learning from very large datasets, please take a look at Wapiti or use Vowpal Wabbit in learning to search mode. I use the latter when I need to train a NER model very quickly.
from crfsuite.
@usptact Could you please share the command line you used for NER with Vowpal Wabbit? I was never able to come up with a working command line for tagging.
from crfsuite.
@bratao Sure, here you go:
vw --data train.feat \
--learning_rate 0.5 \
--cache --kill_cache \
--threads \
--passes 10 \
--search_task sequence \
--search $NUM_LABELS \
--search_rollin=policy \
--search_rollout=none \
--named_labels "$(< labels)" \
-b 28 \
--l1=1e-7 \
-f $MODEL \
--readable_model $MODEL.txt
You will need the training file ("train.feat") in multi-line format (see the docs) and a file "labels" with the string labels, which are BIO tags in my case. If there are only a few tags, you can list them as a comma-separated list directly on the command line.
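As an illustration only (not the script used above), here is a minimal sketch of how such a multi-line training file and comma-separated label list could be produced. It assumes the usual Vowpal Wabbit sequence layout of one token per line as "label | features", with a blank line between sentences. The helper names `to_vw_sequence` and `label_list` are hypothetical, and the features are assumed to already be free of ":", "|" and whitespace.

```python
def to_vw_sequence(sentences):
    """Render sentences as multi-line VW text.

    `sentences` is a list of sentences; each sentence is a list of
    (label, features) pairs, where `features` is a list of strings.
    Assumption: features contain no ':', '|' or whitespace.
    """
    blocks = []
    for sentence in sentences:
        lines = [f"{label} | {' '.join(features)}" for label, features in sentence]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive sentences.
    return "\n\n".join(blocks) + "\n"


def label_list(sentences):
    """Return all labels seen in the data as one comma-separated string."""
    labels = {label for sentence in sentences for label, _ in sentence}
    return ",".join(sorted(labels))


if __name__ == "__main__":
    data = [
        [("B-PER", ["w=John", "cap"]), ("O", ["w=runs"])],
        [("O", ["w=ok"])],
    ]
    print(to_vw_sequence(data), end="")
    print(label_list(data))
```

The output of `to_vw_sequence` would go into "train.feat" and the output of `label_list` into "labels", matching the file names in the command above.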
from crfsuite.
@usptact Thank you so much for your reply. I am working on NER training as well. Do you think Wapiti or Vowpal Wabbit perform better (speed-wise) than CRF++? I was planning to use CRF++ with multiple cores because I feel it has more resources online and is maybe simpler than the others.
from crfsuite.
@CSabty In my experience, performance-wise, the CRF is still the best, although I did not do a thorough comparison.
from crfsuite.
You will need the training file ("train.feat") in multi-line format (see doc) and a file "labels" with string labels that are BIO tags (in my case). If there are only few, you can list the tags as comma-separated list in console.
In a POS tagging task, can I use the same features as with CRFsuite when training with Vowpal Wabbit? In a CRFsuite training dataset, a feature can be followed by ":" and a float scaling value, but in Vowpal Wabbit the ":" seems to set the feature value rather than the feature importance.
It's too painful to use Vowpal Wabbit. Have you written any blog posts about sequence search?
Thanks!
from crfsuite.
In both CRFSuite and VW, the ":" character is special. In the former you can escape it as "\:", but in the latter you can't. This is assuming you don't want to change the default weight of 1.0.
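A minimal sketch of handling a literal ":" in feature names for both tools, under some assumptions: for CRFsuite the colon is escaped as "\:" (and a backslash as "\\"), while for VW, which has no escape, the special characters are simply replaced. The function names and the "_" replacement character are hypothetical choices made for illustration.

```python
import re


def escape_crfsuite_attr(name):
    """Escape a feature name for a CRFsuite data file.

    Backslashes are escaped first so the backslash added for ':' is
    not doubled afterwards.
    """
    return name.replace("\\", "\\\\").replace(":", "\\:")


def sanitize_vw_feature(name, replacement="_"):
    """Replace characters VW treats as special (':', '|', whitespace).

    VW offers no escape sequence, so the characters are substituted;
    the replacement character is an arbitrary choice.
    """
    return re.sub(r"[:|\s]", replacement, name)


if __name__ == "__main__":
    feature = "time=10:30"
    print(escape_crfsuite_attr(feature))
    print(sanitize_vw_feature(feature))
```

Both functions leave the weight at its default of 1.0, since no ":value" suffix is appended.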
from crfsuite.
I wonder whether development of multi-core CRF support is dead or not. I am dying for such a feature.
from crfsuite.
@jbkoh If you are looking for multi CPU training of CRFs, take a look at https://github.com/zhongkaifu/CRFSharp
from crfsuite.
@usptact @tianjianjiang Thanks for the information! I wish I could have exploited the cores with PyCRFSuite, but I can switch to the project you pointed me to. Thank you all.
from crfsuite.