Comments (4)
BTW, I consider NameTag to be very weak currently -- it is not very accurate (it has been unchanged since ~2014) and requires a tagger+lemmatizer to work; we use it only for Czech.
As for extracting the tagger -- the released UDPipe models actually contain two MorphoDiTa models -- one is a tagger predicting UPOS, XPOS & Feats, and the other one is a lemmatizer predicting UPOS & Lemmas. I do not think it is possible to extract the models using the existing binaries, but it would be trivial to write a binary that does, if you want it.
from nametag.
I have some .udpipe models in which the part-of-speech tagger and the lemmatizer were trained as a single MorphoDiTa model, so for now I can still use that tagger externally to test NameTag out.
Some background:
- I'm working on 15th-19th century corpora whose text combines Dutch dialects with French & Latin,
- and which are obtained either by manual transcription of images or by automated (error-prone) extraction of text from images with Transkribus or Tesseract.
I don't mind using pre-deep-learning machine learning techniques: my laptop is still from 2013, and the users of the models are historians who have no clue about computer programming.
Feel free to provide any advice on tooling that would be more suitable. The requirements that I have are
- a named entity recognition model can be trained and scored on a regular CPU-only computer in decent time
- the toolkit should not assume pretrained embeddings exist
- preferably written in C++ without any very complex Makefile wizardry so that I can easily wrap it up in an R package in 1 day instead of 1 week
For example, I couldn't find any open-source biLSTM-CRF implementation that matches the above requirements. I would be interested in pointers to any tooling you would advise.
I do not really have any suggestions -- NameTag generally fulfils the "not much required computational performance" requirement. The disadvantages are the required morphological model (but if you already have it, it is not a problem) and lower than state-of-the-art performance (it does not even use a CRF layer -- it uses a MEMM with dynamic decoding only; and the implemented feature templates are not that strong). But I do not have any low-resource alternative (we are still using it for Czech)...
When the new UDPipe appears (yes, it is bordering on vaporware at this moment, I am unfortunately aware), we plan NER + NEL modules too; but they will require substantially more computational resources (especially for training)...
Thanks for the messages and the advice. Looking forward to the vaporware announcements :)