Comments (8)
The training data mostly contains full sentences, so the model is good at translating sentences. Here the input is a single word, which is a different task. If you want the model to perform well on such inputs, you should add examples like this to the training data.
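A rough sketch of what adding such examples could look like, assuming the parallel corpus is held as two aligned lists of lines. The function name `add_single_word_pairs` and the glossary entries are hypothetical and only for illustration; this is not how Argos Translate actually builds its training data.

```python
from typing import Dict, List, Tuple


def add_single_word_pairs(
    source_lines: List[str],
    target_lines: List[str],
    glossary: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Append one-word examples to a parallel corpus (two aligned lists).

    Hypothetical helper: the glossary maps a source word to its translation.
    Pairs already present in the corpus are skipped.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")

    src = list(source_lines)
    tgt = list(target_lines)
    existing = set(zip(src, tgt))

    for word, translation in glossary.items():
        pair = (word, translation)
        if pair not in existing:
            src.append(word)
            tgt.append(translation)
            existing.add(pair)
    return src, tgt


# Illustrative data only, not taken from any real training set.
source = ["I would like a salad."]
target = ["Me gustaría una ensalada."]
source, target = add_single_word_pairs(source, target, {"salad": "ensalada"})
print(list(zip(source, target)))
```

The two lists stay aligned line by line, so they can still be written out as a pair of source and target files.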
(I'm the author of CTranslate2. Feel free to tag me if you have any questions or issues. We are here to help.)
from libretranslate.
I added a Wiktionary scraping script so hopefully future models will do this better.
from libretranslate.
Looks like it likes salad.
This is an Argos Translate issue. I reproduced it, and the sentence boundary detection and tokenization look fine. Argos Translate uses a Transformer as its sequence-to-sequence model. The model is a black box that can sometimes produce weird outputs. If you post on the OpenNMT forum you might get a better answer. The PyTorch port of the training scripts is almost done, which will bring a larger model and more training resources, but I'm not sure when an updated Spanish model will be trained. The new model will likely fix this specific issue and hopefully have fewer similar ones.
from libretranslate.
Hi @PJ-Finlay, sorry to tag you here, but for some reason my post was flagged as spam in the community forums: https://community.libretranslate.com/t/odd-translation-behavior-repeating-words/827
If I understand correctly, I need to release a more recent model for a language that includes the Wiktionary data? How do I know if a language uses the Wiktextract data? (Based on this: Argos Open Tech, I cannot tell.) The data-index.json seems to be outdated (I can't find some languages there).
Thanks in advance
from libretranslate.
salad ?
from libretranslate.
Thanks, I'll try posting about this there.
As a side note, the Argos Translate output you showed looks slightly different. Note that the "sala sala sala sala sala" (without the d) does not have the same count or position. I guess the training data or iteration count were not the same.
Update: I've found the following post, not sure if related, with some proposals: https://forum.opennmt.net/t/repeated-phrases-in-the-translation/4155
from libretranslate.
In general I don't think Argos Translate has deterministic translations: the model itself was only trained once, but for some reason it sometimes comes up with different results. Based on the CTranslate2 Python docs, it doesn't look like CTranslate2 supports the block_ngram_repeat param they're talking about in the linked forum post.
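For context, my understanding is that the n-gram repeat option discussed in the linked post stops the decoder from emitting an n-gram that already appeared in the output. The sketch below does not change decoding at all; it only checks a finished translation for that condition, which could be used to flag outputs like the one in this issue. The function name `first_repeated_ngram` and the whitespace tokenization are assumptions for illustration, not part of Argos Translate or CTranslate2.

```python
from typing import Optional, Tuple


def first_repeated_ngram(text: str, n: int = 2) -> Optional[Tuple[str, ...]]:
    """Return the first n-gram that occurs a second time in text, or None.

    Hypothetical post-processing check; tokens are split on whitespace,
    which is a simplification compared to the model's real tokenizer.
    """
    tokens = text.split()
    seen = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            return gram
        seen.add(gram)
    return None


print(first_repeated_ngram("sala sala sala sala sala"))
print(first_repeated_ngram("I would like a salad"))
```

A larger `n` makes the check more tolerant of ordinary repeated words such as articles, at the cost of missing short loops.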
from libretranslate.
lol, I had a giggle at this :)
Hey @guillaumekln ✋ glad to have you here! CTranslate2 is pretty amazing.
from libretranslate.