Comments (5)
Hi akazer2,
Were you able to resolve the issue?
I was also trying to train a model on a 10 MB text file, but crfsuite gives a segmentation fault.
Thanks in advance.
from crfsuite.
Did anyone manage to resolve this?
The thing is that training requests much more memory than is needed just to fit your dataset in memory.
For datasets this big, I suggest using online algorithms. I found Vowpal Wabbit (VW) to be not only very versatile but also to scale very well. Yes, that includes sequence tagging, as CRFSuite does. I can show how to do sequence tagging with VW.
@usptact, could you please provide an example of sequence tagging in Vowpal Wabbit? What command line and input format do you use?
The data format is similar to that of CRFSuite, except spaces are used to separate features. VW also introduces feature spaces. The following is a training example for sequence tagging in VW format (notice the empty line between the two examples; I am using only one feature space, called "f"):
label1 |f f1 f2 f3
label2 |f f2 f3 f4
label3 |f f4 f5 f1

label2 |f f2 f4
label3 |f f1 f3
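If the data already exists in CRFSuite's layout, the conversion to the format described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name `crfsuite_to_vw` is made up for illustration, the input is assumed to be tab-separated (label first, then attributes) with a blank line between sequences, and attribute names are assumed to contain no spaces, `|` or `:` characters, since those are special in VW input.

```shell
# Hypothetical helper: convert tab-separated CRFSuite-style data into the
# VW layout with a single feature space "f".
# Assumption: attributes contain no spaces, '|' or ':' characters.
crfsuite_to_vw() {
  awk -F'\t' '
    NF == 0 { print ""; next }   # keep the blank line that separates sequences
    {
      line = $1 " |f"            # label, then the feature space marker
      for (i = 2; i <= NF; i++)  # append each attribute, space-separated
        line = line " " $i
      print line
    }
  ' "$1"
}
```

It would be used as `crfsuite_to_vw train.crfsuite.txt > train.feat`, where the input file name is a placeholder.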
The sequence tagging model can be trained with this command:
# Notes on the flags (kept out of the command itself, because a comment
# after a trailing backslash breaks the line continuation):
#   --passes          number of passes over the data; keep this small
#   --search_task     "sequence" means the task is sequence tagging
#   --search          number of possible labels
#   --named_labels    comma-separated list of string labels, if integer labels are not used
#   -b                number of bits for feature hashing; more is better
#   --l2              per-example regularization
#   -f                store the model
#   --readable_model  store the model in readable format
vw --data train.feat \
  --cache \
  --passes 10 \
  --search_task sequence \
  --search "$NUM_LABELS" \
  --search_rollin=policy \
  --search_rollout=none \
  --named_labels "$(< labels)" \
  -b 28 \
  --l2=1e-5 \
  --l1=1e-7 \
  -f "$MODEL" \
  --readable_model "$MODEL.txt"
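The command reads a `labels` file and a `$NUM_LABELS` variable without showing where they come from. One way to derive both from the training file might look like the sketch below. The function names `vw_labels` and `vw_num_labels` are hypothetical, and the sketch assumes every non-blank line starts with its label, as in the example data. It also assumes that the order of the labels does not matter as long as the same list is used for training and prediction.

```shell
# Hypothetical helper: print the distinct labels of a VW-format file as one
# comma-separated line (the first field of every non-blank line is the label).
vw_labels() {
  awk 'NF > 0 { print $1 }' "$1" | sort -u | paste -sd, -
}

# Hypothetical helper: print how many distinct labels the file contains.
vw_num_labels() {
  awk 'NF > 0 { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}
```

For example, `vw_labels train.feat > labels` would write the file read by `--named_labels`, and `NUM_LABELS=$(vw_num_labels train.feat)` would set the value passed to `--search`.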