Comments (4)
Beyond gazetteer features, adding Brown or Clark cluster features also improves performance. I experimented a lot with Brown cluster features and got consistent improvements across the various models I built. The nice property of Brown clusters is their hierarchical nature: you can include the whole path as features and let the algorithm figure out which ones are important (e.g. by setting the "-p c1=0.1" option, which turns on L1 regularization).
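To make the "whole path as features" idea concrete, here is a minimal sketch, assuming each word has already been mapped to a Brown cluster bit string. The `clusters` dictionary, its bit strings, and the `brown_p<n>=` feature naming are made up for illustration and are not something CRFsuite prescribes. Every prefix of the path becomes its own feature, so the learner can keep the coarse or fine levels that help and drop the rest.

```python
def brown_path_features(bit_path):
    """Return one feature per prefix of a Brown cluster bit path.

    Short prefixes describe coarse clusters, long prefixes fine ones.
    The feature names avoid ':' because CRFsuite's text format uses it
    to separate an attribute name from its weight.
    """
    return [
        "brown_p{}={}".format(n, bit_path[:n])
        for n in range(1, len(bit_path) + 1)
    ]


def token_cluster_features(word, clusters):
    """Look up a word's cluster path; unknown words get no cluster features."""
    bit_path = clusters.get(word.lower())
    if bit_path is None:
        return []
    return brown_path_features(bit_path)


# Hypothetical cluster assignments, for illustration only.
clusters = {"london": "1101", "paris": "1100", "run": "01"}

for word in ["London", "Paris", "run", "xyzzy"]:
    print(word, token_cluster_features(word, clusters))
```

In this toy data "London" and "Paris" share the features for the prefixes "1", "11" and "110", which is what lets a model generalize from one city name to the other.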
from crfsuite.
I understand. I also tried including various features specific to my application. My question is slightly different: why do baseline features plus word embeddings give good accuracy, while word embeddings alone do not, for a CRF?
from crfsuite.
My guess is that word embeddings are highly variable and require many training examples, while the other features do not. However, the other features can be ambiguous. For example, if a word starts with a capital letter, is it the first word of the sentence, a name, or a location?
This is where the word embedding helps to increase accuracy, because it has a certain 'shape' or value. For example, if the word is the first word of the sentence, the algorithm can see from the word embedding that it is a normal word, even though the capitalization feature suggests otherwise.
Update: Word embeddings are also likely to place synonyms, or words with a similar meaning, close to each other. They can therefore make the rules more general than the hand-picked features alone.
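A rough sketch of what combining the two kinds of features could look like, assuming a feature-dictionary style where each feature name maps to a real-valued weight. The feature names, the `embeddings` lookup, and its two-dimensional vectors are invented for illustration; real embeddings would come from a trained model.

```python
def token_features(word, is_first, embeddings):
    """Build one feature dict mixing hand-crafted and embedding features."""
    feats = {
        "bias": 1.0,
        # Hand-crafted features: easy to interpret, but ambiguous alone.
        "word.lower=" + word.lower(): 1.0,
        "word.istitle": 1.0 if word.istitle() else 0.0,
        "sentence_start": 1.0 if is_first else 0.0,
    }
    # Embedding features: one real-valued feature per dimension.
    vec = embeddings.get(word.lower())
    if vec is not None:
        for i, value in enumerate(vec):
            feats["emb_{}".format(i)] = float(value)
    return feats


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {"the": [0.1, -0.2], "london": [0.9, 0.7]}

sentence = ["The", "train", "to", "London"]
for i, word in enumerate(sentence):
    print(word, token_features(word, i == 0, embeddings))
```

Here "The" and "London" both fire `word.istitle`, but their embedding values differ, which gives the model a way to tell a capitalized sentence opener from a capitalized name.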
from crfsuite.
I would say that baseline features work as advertised: you know what information they carry, because they are hand-crafted. The word embedding features encode information about a specific word appearing in some context. They might capture some of the information the baseline features do, but you don't know that for sure (beauty of deep learning, eh?). It is safe to say that the two are complementary.
from crfsuite.