argosopentech / argos-translate
Open-source offline translation library written in Python
Home Page: https://www.argosopentech.com
License: MIT License
My target platforms are:
Hybrid Apps made with: Monaca or Cordova
React Native App made with Expo.io
Thanks
Hi,
Thank you for the great project!
I am using Argos Translate in my project. I have created a custom sklearn transformer in which I call the Argos models for translation. The transformer is part of an sklearn pipeline. However, when I set the pipeline hyperparameter n_jobs to a value higher than 1, I receive the error:
TypeError: can't pickle ctranslate2.translator.Translator objects
Any ideas/advice how I can solve this issue? Are you planning to make ctranslate2 objects picklable?
Thanks again!
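One possible workaround, sketched here under assumptions rather than taken from the project: keep the translator out of the pickled state and rebuild it lazily inside each worker. The class name, the `_load_translator` hook, and the stand-in translator are all hypothetical; in a real pipeline the class would also inherit from sklearn's BaseEstimator and TransformerMixin, and `_load_translator` would look up the installed Argos model.

```python
import threading


class _StandInTranslator:
    """Stand-in for the real translator: it holds a lock, so it cannot be pickled."""

    def __init__(self):
        self._lock = threading.Lock()

    def translate(self, text):
        return text  # placeholder: returns the input unchanged


class LazyTranslationTransformer:
    """Hypothetical sklearn-style transformer that keeps only picklable state."""

    def __init__(self, from_code="en", to_code="ru"):
        self.from_code = from_code
        self.to_code = to_code
        self._translator = None  # built on first use, never pickled

    def _load_translator(self):
        # Assumption: replace this with the real lookup of an installed model.
        return _StandInTranslator()

    def _get_translator(self):
        if self._translator is None:
            self._translator = self._load_translator()
        return self._translator

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Assumes the loaded object exposes translate(text).
        translator = self._get_translator()
        return [translator.translate(text) for text in X]

    def __getstate__(self):
        # Drop the translator so each worker process reloads it lazily.
        state = self.__dict__.copy()
        state["_translator"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
```

With this shape, only the language codes travel between processes, and each worker pays the model loading cost once on its first transform call.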
[1] 91775 segmentation fault sudo argos-translate-cli --from-lang en --to-lang ru "Hello."
No problems with installation of models. Models enumerate properly, etc.
Trying to install argostranslate on Debian unstable with pip I get the error message:
Package sentencepiece was not found in the pkg-config search path.
Perhaps you should add the directory containing `sentencepiece.pc'
to the PKG_CONFIG_PATH environment variable
No package 'sentencepiece' found
Failed to find sentencepiece pkgconfig
Is there anything I can do?
Continuing the discussion from this thread.
My plan is to train a Japanese model next with 10,000 epochs (I've used ~30,000 for all the existing ones). Japanese is somewhat similar to Chinese and has a similar amount of data available so it'll be a good test bed plus we can add a new language in the process.
Is it possible to add support for Persian language?
Emojis in the text are dropped at best, and in some cases they change the translation to something else.
I know this is a training matter...
But it came to my mind (after some testing and trial and error) that maybe by using something like .encode("unicode_escape")* we could let them stay the same (as they often do, as far as I have tested) and then decode them back afterwards...
Basically, since we never have to "translate" those characters, I'm thinking maybe we could filter/keep them...
*P.S. not exactly this encode statement, but to be figured out 😅
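A rough sketch of that idea, assuming it is enough to escape only characters outside the Basic Multilingual Plane (where most emojis live) and leave ordinary text alone. The function names are made up for illustration, and whether a given model passes the escaped form through unchanged is not guaranteed.

```python
import re


def escape_astral(text):
    """Replace each character above U+FFFF with its \\UXXXXXXXX escape."""
    return "".join(
        ch.encode("unicode_escape").decode("ascii") if ord(ch) > 0xFFFF else ch
        for ch in text
    )


def unescape_astral(text):
    """Turn \\UXXXXXXXX escapes back into the original characters."""
    return re.sub(
        r"\\U([0-9a-fA-F]{8})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )
```

The intended use would be escape_astral before translation and unescape_astral on the result.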
I want to change the window title.
I edited the GUI.py file, but the change is not showing up.
Support connecting to a remote LibreTranslate server for translations you don't have locally.
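A minimal sketch of what such a client call could look like, using only the standard library. It assumes the server exposes LibreTranslate's POST /translate endpoint with the JSON fields q, source and target, and answers with a translatedText field; the base URL and function names are placeholders.

```python
import json
import urllib.request


def build_translate_request(base_url, text, source, target):
    """Build (but do not send) a POST request for a LibreTranslate-style server."""
    payload = json.dumps({"q": text, "source": source, "target": target})
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remote_translate(base_url, text, source, target, timeout=30):
    """Send the request and return the translated string."""
    request = build_translate_request(base_url, text, source, target)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["translatedText"]
```

The local package lookup could fall back to remote_translate only when no installed model covers the requested language pair.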
It would be useful to have a flatpak for argos-translate at https://flathub.org/home .
This is probably mostly a test environment issue, but this can happen:
../../../venv/lib/python3.8/site-packages/argostranslate/package.py:6: in <module>
from argostranslate import settings
../../../venv/lib/python3.8/site-packages/argostranslate/settings.py:17: in <module>
for package_dir in content_snap_packages.iterdir():
/home/mike/.pyenv/versions/3.8.6/lib/python3.8/pathlib.py:1121: in iterdir
for name in self._accessor.listdir(self):
E FileNotFoundError: [Errno 2] No such file or directory: '/snap/pycharm-community/223/snap_custom/content_snap_packages'
PR: #19
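For illustration, a guard of the following kind avoids the crash when the directory is missing. This is only a sketch of the general approach, not necessarily what the PR does, and the function name is hypothetical.

```python
from pathlib import Path


def list_package_dirs(root):
    """Return the subdirectories of root, or an empty list if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(path for path in root.iterdir() if path.is_dir())
```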
While investigating using argos-translate as a library, I have noticed non-deterministic results when translating a short test string "Hello world!" using your pre-trained models. For English -> Russian, it returns "Здравствуй мир!" on some hosts, and "Здравствуй!" on others. The results on a given host are deterministic on repeated runs and environments (at least in my testing so far).
I first tried to follow the advice here thinking it could be a random seed issue to no avail:
OpenNMT/OpenNMT-py#392
pytorch/pytorch#7068 (comment)
I was not able to determine any significant differences between hosts (both running on cpu), and the output of ct2_verbose is identical:
[ct2_verbose] CPU: GenuineIntel (SSE4.1=true, AVX=true, AVX2=true)
[ct2_verbose] Selected CPU ISA: AVX2
[ct2_verbose] Use Intel MKL: true
[ct2_verbose] SGEMM CPU backend: MKL
[ct2_verbose] GEMM_S16 CPU backend: MKL
[ct2_verbose] GEMM_S8 CPU backend: MKL (u8s8 preferred: true)
[ct2_verbose] Use packed GEMM: false
Manually setting num_hypotheses=2 in the ctranslate2 Translator shows that it appears to be a score difference:
Host #1:
{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '!'], 'score': -2.7840166091918945}
{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '▁мир', '!'], 'score': -2.841048240661621}
Host #2:
{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '▁мир', '!'], 'score': -2.7670412063598633}
{'tokens': ['▁З', 'д', 'рав', 'ству', 'й', '!'], 'score': -2.7944717407226562}
Setting beam_size=1 so that it uses greedy search did produce the same result on both hosts, but I don't think that is a valid solution.
I created a gist to provide some debugging output, and didn't notice any difference in the actual argos-translate parsing logic, so it seems to be much deeper: https://gist.github.com/mikemoritz/a5bf76193ccb16d018a1af9ec584fb41
My questions are:
Thanks!
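To make the size of the difference concrete, the scores reported above can be compared directly. The snippet below only re-uses the numbers from the two hosts; the helper name is made up.

```python
HOST_1 = [
    {"tokens": ["▁З", "д", "рав", "ству", "й", "!"], "score": -2.7840166091918945},
    {"tokens": ["▁З", "д", "рав", "ству", "й", "▁мир", "!"], "score": -2.841048240661621},
]
HOST_2 = [
    {"tokens": ["▁З", "д", "рав", "ству", "й", "▁мир", "!"], "score": -2.7670412063598633},
    {"tokens": ["▁З", "д", "рав", "ству", "й", "!"], "score": -2.7944717407226562},
]


def top_two_gap(hypotheses):
    """Score difference between the best and second-best hypothesis."""
    ranked = sorted(hypotheses, key=lambda h: h["score"], reverse=True)
    return ranked[0]["score"] - ranked[1]["score"]


for name, hypotheses in (("Host #1", HOST_1), ("Host #2", HOST_2)):
    print(name, round(top_two_gap(hypotheses), 4))
```

On both hosts the two candidates are only a few hundredths apart, so a small numerical difference in the scores is enough to swap their order.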
Given the English text: "Well done 👍"
The text itself is translated perfectly into any language. However, depending on the target language, the emoji is translated to "" or "?" or "Benachrichtigung" (in German).
Would it be possible to detect the emoji and leave that character as it is?
Hint: in Unicode 13.0 there are 4 character ranges allocated for emojis: U+1F300 (127744) to U+1FAD6 (129750), U+1F004 (126980) to U+1F251 (127569), U+00A9 (169) to U+00AE (174) and U+200D (8205) to U+3299 (12953)
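One way this could work, as a sketch: split the text on emoji runs, translate only the pieces in between, and put the emojis back untouched. The regular expression below deliberately uses only the two high ranges from the hint, because the 8205 to 12953 range also covers ordinary characters such as Japanese kana. The `translate` argument is a placeholder for whatever translation callable is in use.

```python
import re

# Assumption: a conservative subset of the ranges listed in the hint.
EMOJI_RUN = re.compile("([\U0001F300-\U0001FAD6\U0001F004-\U0001F251]+)")


def translate_keeping_emoji(text, translate):
    """Translate the non-emoji segments of text and keep emoji runs as they are."""
    parts = EMOJI_RUN.split(text)
    result = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            result.append(part)  # captured emoji run, passed through
        elif part.strip():
            result.append(translate(part))
        else:
            result.append(part)  # empty or whitespace-only segment
    return "".join(result)
```

A drawback of this approach is that the model sees each segment without its neighbours, so sentences interrupted by an emoji lose some context.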
Using the snap, the cursor does not follow the system theme. It's more obvious if you change the cursor theme to something that looks different, like "redglass", but even the Ubuntu default Yaru theme and size are not followed. One Qt app snap in which this does work is KeePassXC. It looks like their snapcraft.yaml has some additional plugs for theming.
Would it be possible to support command-line usage? I searched the documentation but found nothing. I would like to automate translating texts and also text files into multiple languages.
As an example I suggest the following:
argos-translate -text "Hello World!" -from en -to de
argos-translate -file Novel.txt -from en -to de
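A small argparse sketch of such an interface. The --from-lang and --to-lang flag names follow the argos-translate-cli invocation quoted in another report on this page; the --file option and the function name are hypothetical, and the translation call itself is left out.

```python
import argparse


def build_parser():
    """Build a parser for a translate-from-the-shell command."""
    parser = argparse.ArgumentParser(prog="argos-translate-cli")
    parser.add_argument("text", nargs="?", help="text to translate")
    parser.add_argument("--from-lang", required=True, help="source language code")
    parser.add_argument("--to-lang", required=True, help="target language code")
    # Hypothetical option: read the input from a text file instead.
    parser.add_argument("--file", help="path of a text file to translate")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```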
The training scripts have lots of room for improvement. The long term plan is to rewrite them in OpenNMT for PyTorch in a fully automatable way but there are other potential improvements:
Please allow for Argos Translate to close into system tray rather than take up room in the panel.
Currently the Snapcraft image is ~1GB, and ~700MB of this is a torch CUDA shared object file. If this could be removed automatically in the Snapcraft build process somehow (or maybe as an option for all Python installs?) then the download and startup time for Snapcraft would greatly improve (these are currently both issues).
When packages are deleted the packages table shrinks but the window doesn't.
Is there any plan to support the above-mentioned languages? If so, how can I help?
I see that this repository is licensed under the MIT license, but the language training models are hosted outside of this repository and can be downloaded via HTTPS, IPFS or torrent.
Does the same MIT license apply to the models as well, or are they distributed under a different license?
This should probably be listed somewhere.
We currently don't have any tests, but it would be nice to have some. Not being able to easily include a .argostranslate file in the tests will make this more difficult, but having at least some tests would be good.
$ pip install argostranslate
Collecting argostranslate
Using cached argostranslate-1.0.5-py3-none-any.whl (13 kB)
Using cached argostranslate-1.0.3-py3-none-any.whl (12 kB)
Using cached argostranslate-1.0-py3-none-any.whl (12 kB)
ERROR: Cannot install argostranslate==1.0, argostranslate==1.0.3 and argostranslate==1.0.5 because these package versions have conflicting dependencies.
The conflict is caused by:
argostranslate 1.0.5 depends on ctranslate2==1.14.0
argostranslate 1.0.3 depends on ctranslate2==1.14.0
argostranslate 1.0 depends on ctranslate2
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
Configuration/environment:
--------------------------------------------------------------------------------
Date: Wed Dec 23 15:59:00 2020 CET
OS : Darwin
CPU(s) : 12
Machine : x86_64
Architecture : 64bit
Environment : Python
Python 3.7.7 (default, Mar 23 2020, 17:31:31) [Clang 4.0.1
(tags/RELEASE_401/final)]
numpy : 1.19.4
IPython : 7.19.0
scooby : 0.5.6
--------------------------------------------------------------------------------
pip 20.3.3
The plan is to use GitLab for CI. The first step is to update scripts/update_to_pypi.sh to automate uploading a new version to PyPI.
Fix packages_changed = pyqtSignal() in gui.py to correctly update all views when the state of packages has changed.
This won't install on a Librem phone...
$ snap install argos-translate
error: snap "argos-translate" is not available on stable for this architecture (arm64) but exists on other architectures (amd64)
This seems like a very handy library. Why does it not run on ARM computers?
I'm using argos-translate via libretranslate, so if this is the wrong place for this, I'll move it.
I'm testing out the English -> Japanese translations and I think some bad data might have gotten into the training data.
"Hello" is being translated as "お問い合わせ", which translates to "Contact Us" (something you'd expect to see at the bottom of a webpage used for training?).
"Goodbye" is being translated as "フィードバック" (feedback). Again, something you'd expect to see at the bottom of a webpage.
"Help me!" is also being translated as "お問い合わせ".
I'm not exactly sure how I can help, but I figured I'd point out the issue.
Screenshots promoted by Argos are taken from a GNOME desktop environment. GNOME itself does not have the best integration with Qt, since GNOME uses GTK.
Maybe take these screenshots in a desktop environment with better Qt integration, like KDE Plasma.
The Python version doesn't show the application icon (this does work in Snapcraft because it uses a separate icon file):
argos-translate/argostranslate/gui.py
Lines 194 to 197 in 4f5396a
Currently there is no custom localization, but this would be nice to have. Qt provides some nice tools for doing this, and the app's strings could be translated using the app itself.
The plan for this was to train a model using the existing infrastructure that maps from input text to a language code. This would require adding a way to generate this data in the training scripts and what is hopefully a pretty small code change to support this. I'd be pretty optimistic about this just working pretty well out of the box but it may take some tweaking.
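As a sketch of the data generation step, assuming monolingual sentences are already available per language, the training pairs would simply map each sentence to its language code. The function name and the input layout are hypothetical.

```python
def make_langid_pairs(sentences_by_lang):
    """Turn {language code: [sentences]} into (source text, target code) pairs."""
    pairs = []
    for code in sorted(sentences_by_lang):
        for sentence in sentences_by_lang[code]:
            sentence = sentence.strip()
            if sentence:  # skip blank lines
                pairs.append((sentence, code))
    return pairs


pairs = make_langid_pairs({"en": ["Hello world!"], "de": ["Hallo Welt!"]})
for source, target in pairs:
    print(source, "->", target)
```

The source column and target column could then be written to the same parallel-file format the existing training scripts consume.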
Hi,
Thanks for the amazing project. This was a fresh install in a new Python 3.8.3 virtual environment using pip install and launching straight away. The web app launches, but after a few keystrokes, the process crashes with the following error logged to the terminal:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
Any advice appreciated here, thanks again.
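For reference, the workaround named in the error message itself can be applied from Python before any of the affected libraries are imported. As the message says, it is unsafe and unsupported, so this is only a stopgap sketch; finding out which packages each bundle their own OpenMP runtime is the real fix.

```python
import os

# Must run before importing any library that loads an OpenMP runtime.
# setdefault keeps a value the user has already exported in the shell.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
```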
The easiest are probably macOS and Windows using py2app and py2exe, but other platforms to consider could be mobile, BSD, Debian, Red Hat, or Flatpak. I'd like to be able to run builds on Linux as much as possible, but this may not be possible for some platforms.
There's also a decision to be made if we want to use tools like py2app/py2exe or go all in on pyqtdeploy.
There are probably some challenges for doing local translation on mobile so a better strategy may be to build/port simple mobile apps that connect to the LibreTranslate API.
Currently models are distributed by Google Drive (not ideal) and a slow BitTorrent, so there's lots of room for improvement:
The plan was to make a separate repo for storing model distribution information, so let me know if you're interested.
There is now a package index that can be updated, and packages can be automatically downloaded from Python. GUI support would be nice.
When I first wrote this, CTranslate, which does inference, didn't support GPU translation from PyPI. This has since changed, and GPU support would be a nice feature to have. All this may take is updating the CTranslate version in requirements.txt and adding documentation, but if someone with more CUDA knowledge could look into this I would appreciate it. It would also be nice to support open-source alternatives to CUDA.
Argos Translate also prints an error message about torch not being able to connect to CUDA:
/usr/local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
Torch is only used by Stanza, which does sentence boundary detection, so Torch supporting GPU inference isn't as important as CTranslate supporting GPUs for performance, but this error message should be suppressed.
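A sketch of one way to silence it with the standard warnings module, assuming the filter is installed before torch is first imported. It matches only UserWarning messages that start with "CUDA initialization", so other warnings still get through; the function name is made up.

```python
import warnings


def suppress_cuda_init_warning():
    """Ignore the torch warning about a missing NVIDIA driver."""
    warnings.filterwarnings(
        "ignore",
        message=r"CUDA initialization",  # matched against the start of the message
        category=UserWarning,
    )
```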
I want to deploy this on Windows.
Can you please guide me in detail?
Training language models is currently very manual. Opus has an API to gather data: https://pypi.org/project/opus-api/
Installing a large number of models requires unzipping them which can take a while. A loading indicator should be shown so that users don't think the program has frozen.
I want to fix some translation issues and add new strings, but I don't understand how I can contribute to the project. Are there any instructions on how to change the dictionary database? As I understand it, I should work with the https://github.com/argosopentech/onmt-models repository?