Comments (19)

usptact commented on July 22, 2024

@tianjianjiang With all due respect to the author of CRFSuite (who did a really great job), it may take a while to get your improvement merged. Your best bet might be to fork the project and work there. Thanks for your contribution.

from crfsuite.

bratao commented on July 22, 2024

@usptact, I think he already did: https://github.com/tianjianjiang/crfsuite-openmp

tianjianjiang commented on July 22, 2024

In my experience, CRFsuite and libLBFGS are not OpenMP-friendly. Of course there are other ways to get multi-core support, but OpenMP might even require fundamental changes to CRFsuite, which is probably an unacceptable cost.

napsternxg commented on July 22, 2024

@kmike @chokkan @ogrisel What do you think about it?

tianjianjiang commented on July 22, 2024

I just submitted pull request #68 for it, but with different loops annotated.

tianjianjiang commented on July 22, 2024

@usptact Not a problem at all.
@bratao Thanks for the clarification.

In fact, waiting a while is a rather good idea. I've noticed that on different OSes, with different compilers, and on certain data sets, the computation can be inefficient or can even hang (0% CPU time).

tianjianjiang commented on July 22, 2024

Pull request #68 has just been updated to improve performance. It finally seems to be faster than the original version.

napsternxg commented on July 22, 2024

@tianjianjiang Thanks for the work. Could you add some test scripts for benchmarking the performance? An IPython notebook would be a very good option.

CSabty commented on July 22, 2024

Hi, I am new to multiprocessing and would like to know how to run CRFsuite with OpenMP, since without it training is extremely slow on big data sets.
Thank you in advance.

usptact commented on July 22, 2024

@CSabty If you need speed when learning from very large datasets, please take a look at Wapiti, or use Vowpal Wabbit in learning-to-search mode. I use the latter when I need to train an NER model very quickly.

bratao commented on July 22, 2024

@usptact Could you please share the command line you used for NER with Vowpal Wabbit? I was never able to come up with a working command line for tagging.

usptact commented on July 22, 2024

@bratao Sure, here you go:

vw  --data train.feat \
    --learning_rate 0.5 \
    --cache --kill_cache \
    --threads \
    --passes 10 \
    --search_task sequence \
    --search $NUM_LABELS \
    --search_rollin=policy \
    --search_rollout=none \
    --named_labels "$(< labels)" \
    -b 28 \
    --l1=1e-7 \
    -f $MODEL \
    --readable_model $MODEL.txt

You will need the training file ("train.feat") in multi-line format (see the docs) and a file "labels" containing the string labels, which are BIO tags in my case. If there are only a few, you can pass the tags as a comma-separated list directly on the command line.
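
For anyone preparing these two inputs, here is a minimal Python sketch. It assumes the multi-line layout is one token per line in the form `label | feature feature ...`, with a blank line closing each sentence; please verify this against the Vowpal Wabbit docs. The function names `to_vw_sequences` and `label_list` and the sample features are hypothetical, made up for illustration.

```python
from typing import Iterable, List, Tuple

# One sentence = list of (label, features) pairs, one pair per token.
Sentence = List[Tuple[str, List[str]]]


def to_vw_sequences(sentences: Iterable[Sentence]) -> str:
    """Render sentences as multi-line examples (assumed layout):
    one 'label | feat feat ...' line per token, blank line after each sentence."""
    chunks = []
    for sentence in sentences:
        lines = [f"{label} | {' '.join(features)}" for label, features in sentence]
        chunks.append("\n".join(lines) + "\n\n")
    return "".join(chunks)


def label_list(sentences: Iterable[Sentence]) -> str:
    """Collect the distinct labels as a sorted, comma-separated string,
    e.g. to write into the 'labels' file."""
    labels = sorted({label for sentence in sentences for label, _ in sentence})
    return ",".join(labels)


# Hypothetical toy data: two sentences with BIO tags.
demo = [
    [("B-PER", ["w=John", "cap"]), ("O", ["w=runs"])],
    [("O", ["w=hi"])],
]

if __name__ == "__main__":
    with open("train.feat", "w", encoding="utf-8") as f:
        f.write(to_vw_sequences(demo))
    with open("labels", "w", encoding="utf-8") as f:
        f.write(label_list(demo))
```

Running the script writes "train.feat" and "labels" in the current directory, matching the file names used in the command above.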

CSabty commented on July 22, 2024

@usptact Thank you so much for your reply. I am working on NER training as well. Do you think Wapiti or Vowpal Wabbit performs better (speed-wise) than CRF++? I was planning to use CRF++ with multi-core support because I feel it has more resources online and may be simpler than the others.

usptact commented on July 22, 2024

@CSabty In my experience, performance-wise, the CRF is still the best, although I did not do a thorough comparison.

yiqingyang2012 commented on July 22, 2024

@usptact

You will need the training file ("train.feat") in multi-line format (see the docs) and a file "labels" containing the string labels, which are BIO tags in my case. If there are only a few, you can pass the tags as a comma-separated list directly on the command line.

For a POS task, can I use the same features as with CRFsuite when training with Vowpal Wabbit? In a CRFsuite training data set, a feature can be followed by a ":" and a float scaling value, but in Vowpal Wabbit the ":" seems to set the feature value rather than the feature importance.

Vowpal Wabbit is too painful to use. Have you written any blog posts about sequence search?
Thanks!

usptact commented on July 22, 2024

In both CRFSuite and VW, the ":" character is special. In the former you can escape it as "\:", but in the latter you can't. This assumes you don't want to change the default weight of 1.0.
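
A small Python sketch of how the same raw feature name could be prepared for each tool. The CRFSuite helper applies the "\:" escape mentioned above and also escapes backslashes first, which I believe the CRFSuite format expects. Since VW has no escape, the VW helper simply swaps the characters I understand to be special there (":", "|", and whitespace) for a placeholder; that replacement is a workaround I am assuming here, not something VW prescribes. Both function names are hypothetical.

```python
def escape_for_crfsuite(attr: str) -> str:
    """Escape an attribute name so ':' is not read as a name/weight separator.
    Backslashes are escaped first so the added escapes stay unambiguous."""
    return attr.replace("\\", "\\\\").replace(":", "\\:")


def sanitize_for_vw(name: str, placeholder: str = "_") -> str:
    """Replace characters assumed to be special in VW feature names.
    There is no escape sequence, so they are substituted instead."""
    for ch in (":", "|", " ", "\t"):
        name = name.replace(ch, placeholder)
    return name


if __name__ == "__main__":
    raw = "w[0]=12:30"
    print(escape_for_crfsuite(raw))
    print(sanitize_for_vw(raw))
```

Note that the VW substitution is lossy: two different raw names can collapse into the same feature, so pick a placeholder that does not otherwise occur in your feature names.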

jbkoh commented on July 22, 2024

I wonder whether the development of multi-core CRF is dead. I am dying for such a feature.

usptact commented on July 22, 2024

@jbkoh If you are looking for multi CPU training of CRFs, take a look at https://github.com/zhongkaifu/CRFSharp

jbkoh commented on July 22, 2024

@usptact @tianjianjiang Thanks for the information! I wish I could have exploited the cores with PyCRFSuite, but I can switch to the project you pointed to. Thank you all.
