Comments (8)

guillaumekln commented on May 18, 2024

The training data mostly contains full sentences, so the model is good at translating sentences. Here the input is a single word, which is a different task. If you want the model to perform well on such inputs, you should add examples of them to the training data.

(I'm the author of CTranslate2. Feel free to tag me if you have any questions or issues. We are here to help.)
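As a rough illustration of the suggestion above, single-word pairs can be mixed into a sentence-level parallel corpus before training. This is only a minimal sketch: the function name `augment_with_single_words`, the `ratio` parameter, and the example word pairs are all hypothetical and are not part of Argos Translate, CTranslate2, or any existing training script.

```python
import random


def augment_with_single_words(sentence_pairs, dictionary_pairs, ratio=0.1, seed=0):
    """Return the sentence pairs plus a sample of single-word pairs, shuffled.

    `ratio` (an assumed knob) is the number of single-word pairs to add,
    relative to the number of sentence pairs.
    """
    rng = random.Random(seed)
    # Keep only dictionary entries whose source side is exactly one token.
    single = [(src, tgt) for src, tgt in dictionary_pairs if len(src.split()) == 1]
    # Add at least one pair, but never more than are available.
    k = min(len(single), max(1, int(len(sentence_pairs) * ratio)))
    extra = rng.sample(single, k) if single else []
    combined = list(sentence_pairs) + extra
    rng.shuffle(combined)
    return combined


# Illustrative data only.
sentences = [(f"sentence number {i} .", f"frase número {i} .") for i in range(10)]
dictionary = [("salad", "ensalada"), ("house", "casa"), ("ice cream", "helado")]
corpus = augment_with_single_words(sentences, dictionary, ratio=0.2)
```

How many single-word pairs to add is a judgment call; too many could plausibly hurt sentence-level quality, so the proportion would need to be tuned experimentally.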

from libretranslate.

PJ-Finlay commented on May 18, 2024

I added a Wiktionary scraping script so hopefully future models will do this better.

PJ-Finlay commented on May 18, 2024

Looks like it likes salad.

This is an Argos Translate issue. I reproduced it, and the sentence boundary detection and tokenization look fine. Argos Translate uses a Transformer as its sequence-to-sequence model. The model is a black box that can sometimes produce weird outputs. If you post on the OpenNMT forum you might get a better answer. The PyTorch port of the training scripts is almost done; it will have a larger model and more training resources, but I'm not sure when an updated Spanish model will be trained. The new model will likely fix this specific issue and hopefully have fewer similar ones.

bruno-kakele commented on May 18, 2024

Hi @PJ-Finlay, sorry to tag you here, but for some reason my post was flagged as spam in the community forums: https://community.libretranslate.com/t/odd-translation-behavior-repeating-words/827

If I understand correctly, a more recent model that includes the Wiktionary data needs to be released for a language? How do I know whether a language's model uses the Wiktextract data? (Based on this: Argos Open Tech, I cannot tell.) The data-index.json seems to be outdated (I can't find some languages there).

Thanks in advance

randallmoraes commented on May 18, 2024

Salad?

fdelapena commented on May 18, 2024

Thanks, I'll try posting about this there.
As a remark, the text output in the Argos Translate screenshot you showed looks slightly different. Note that the "sala sala sala sala sala" (without the d) does not have the same count or positioning. I guess the training data or iteration count were not the same.

Update: I've found the following post, not sure if related, with some proposals: https://forum.opennmt.net/t/repeated-phrases-in-the-translation/4155

PJ-Finlay commented on May 18, 2024

In general, I don't think Argos Translate has deterministic translations. The model itself was only trained once, but for some reason it sometimes comes up with different results. Based on the CTranslate2 Python docs, it doesn't look like CTranslate2 supports the block_ngram_repeat param they're talking about in the linked forum post.
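For readers unfamiliar with the n-gram repeat-blocking option discussed in the linked forum post, the general idea can be sketched in plain Python. This is a standalone illustration, not code from OpenNMT, CTranslate2, or Argos Translate; the helper names `would_repeat_ngram` and `pick_token` are made up for this example, and a real decoder would apply the check inside beam search on token scores rather than on a ranked list.

```python
def would_repeat_ngram(tokens, candidate, n):
    """Return True if appending `candidate` would create an n-gram
    that already occurs somewhere in `tokens`."""
    if n <= 0 or len(tokens) < n - 1:
        return False
    # The n-gram that would be formed by the last n-1 tokens plus the candidate.
    new_ngram = tuple(tokens[len(tokens) - (n - 1):]) + (candidate,)
    # All n-grams already present in the output so far.
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in seen


def pick_token(tokens, ranked_candidates, n):
    """Greedy stand-in for a decoder step: take the best-ranked candidate
    that does not repeat an n-gram, or None if every candidate is blocked."""
    for candidate in ranked_candidates:
        if not would_repeat_ngram(tokens, candidate, n):
            return candidate
    return None


# With n=2, "sala sala" has already occurred, so a third "sala" is blocked.
next_token = pick_token(["sala", "sala"], ["sala", "grande"], n=2)
```

A filter like this only hides the symptom; the underlying model would still prefer the repeated token, so better training data remains the more robust fix.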

pierotofy commented on May 18, 2024

lol, I had a giggle at this :)

Hey @guillaumekln ✋ glad to have you here! CTranslate2 is pretty amazing.
