Feature for training CRF (crfsuite issue, open)

chokkan commented on July 22, 2024
Feature for training CRF

from crfsuite.

Comments (4)

usptact commented on July 22, 2024

Beyond gazetteer features, adding Brown or Clark cluster features also improves performance. I experimented a lot with Brown cluster features and got consistent improvements across the various models I built. The nice property of Brown clusters is their hierarchical nature: each word's cluster is a path in a binary tree. You can include the whole path, prefix by prefix, as features and let the trainer figure out which ones are important (e.g. by setting the "-p c1=0.1" option, which turns on L1 regularization).
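A minimal sketch of the "whole path as features" idea, assuming each word's Brown cluster is available as a bit string. The function name, the prefix lengths (4, 6, 10, 20), the feature naming scheme, and the two bit paths are all illustrative assumptions, not something from crfsuite itself. Feature names avoid the colon character, since crfsuite's data format gives it a special meaning.

```python
def brown_path_features(bit_path, prefix_lengths=(4, 6, 10, 20)):
    """Return one feature string per prefix of a Brown cluster bit path.

    Shorter prefixes correspond to coarser clusters higher in the tree,
    so words in nearby clusters share their short-prefix features.
    """
    features = []
    for n in prefix_lengths:
        if n >= len(bit_path):
            # Longer prefixes would just repeat the full path.
            break
        features.append(f"brown_p{n}={bit_path[:n]}")
    # Always emit the full path as the most specific feature.
    features.append(f"brown={bit_path}")
    return features


# Hypothetical word-to-path table; real paths come from a Brown clustering run.
clusters = {"london": "1101001", "paris": "1101000"}

london_feats = brown_path_features(clusters["london"])
paris_feats = brown_path_features(clusters["paris"])

# Features the two words have in common (the coarse-cluster prefixes).
shared = [f for f in london_feats if f in paris_feats]
```

With these toy paths the two words differ only in their full-path feature and share both prefix features, which is what lets a regularized model keep whichever level of granularity turns out to be useful.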

niharikagupta92 commented on July 22, 2024

I understand. I also tried including various features specific to my application. My question is slightly different: why do baseline features plus word embeddings give good accuracy for a CRF, while word embeddings alone do not?

borissmidt commented on July 22, 2024

My guess is that word embeddings are highly variable and require many training examples, while the other features do not. However, the other features can be ambiguous. For example, if a word starts with a capital letter, is it the first word of the sentence, a name, or a location?

This is where the word embedding helps to increase accuracy, because it has a certain 'shape' or value. For example, if the word is the first word of the sentence, the algorithm can see from the word embedding that it is an ordinary word, even though the capitalization feature suggests otherwise.

Update: Word embeddings are also likely to place synonyms, or words with similar meanings, close together. They can therefore make the learned rules more general than hand-picked features alone.

usptact commented on July 22, 2024

I would say that baseline features work as advertised - you know what information they carry, because they are hand-crafted. Word embedding features encode information about a specific word appearing in some context. They might capture some of the information the baseline features do, but you don't know that for sure (beauty of deep learning, eh?). It is safe to say that the two are complementary.
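To make the combination concrete, here is a rough sketch of a per-token feature function that emits both kinds of features side by side. Everything in it is an assumption for illustration: the function name, the choice of baseline features, the `emb_<d>` naming, and the two-dimensional embedding values, which are made up. It also assumes the CRF toolkit in use accepts a mapping from feature name to a real-valued weight, which is worth checking for your setup.

```python
def token_features(sentence, i, embeddings):
    """Build baseline and embedding features for the token at position i."""
    word = sentence[i]
    # Hand-crafted baseline features: what they encode is known in advance.
    feats = {
        "bias": 1.0,
        "lower=" + word.lower(): 1.0,
        "suffix3=" + word[-3:].lower(): 1.0,
        "is_title": float(word.istitle()),
        "is_first": float(i == 0),
    }
    # Embedding features: one real-valued feature per dimension.
    # Words missing from the table simply get no embedding features.
    vec = embeddings.get(word.lower())
    if vec is not None:
        for d, value in enumerate(vec):
            feats[f"emb_{d}"] = float(value)
    return feats


# Toy, made-up vectors; real ones would come from a trained embedding model.
toy_embeddings = {"the": [0.1, -0.2], "london": [0.7, 0.3]}
sentence = ["The", "mayor", "visited", "London"]

first = token_features(sentence, 0, toy_embeddings)
last = token_features(sentence, 3, toy_embeddings)
unknown = token_features(sentence, 1, toy_embeddings)
```

Here "The" and "London" are both capitalized, so `is_title` alone cannot separate them. The `is_first` flag and the embedding dimensions carry the extra signal, which is the complementarity described above.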
