mozilla / translation-service

This is the repo that hosts the code for Mozilla's translation service

License: Mozilla Public License 2.0

CMake 7.35% Dockerfile 7.57% Shell 9.13% C++ 49.18% Makefile 5.68% Python 21.10%

translation-service's Introduction

Translation service

HTTP service that uses bergamot-translator and compressed neural machine translation models for fast inference on CPU.

Running locally

  1. Install Git LFS https://git-lfs.github.com/
  2. git clone this repo
  3. make setup-models
  4. make build-docker
  5. make run
  6. make call

Calling the service

$ curl --header "Content-Type: application/json" \
      --request POST \
      --data '{"from":"es", "to":"en", "text": "Hola Mundo"}' \
      http://0.0.0.0:8080/v1/translate
> {"result": "Hello World"}
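
The same call can be made from Python. The sketch below mirrors the curl example using only the standard library; the helper names `build_translate_request` and `translate` are illustrative and not part of the service, and the base URL assumes the service is listening on port 8080 as in the example above.

```python
import json
import urllib.request


def build_translate_request(src, dst, text, base_url="http://0.0.0.0:8080"):
    """Build the POST request for /v1/translate (hypothetical helper)."""
    payload = json.dumps({"from": src, "to": dst, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(src, dst, text, base_url="http://0.0.0.0:8080"):
    """Send the request and return the "result" field of the JSON response."""
    request = build_translate_request(src, dst, text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["result"]
```

With the service running, `translate("es", "en", "Hola Mundo")` should return the same text as the curl call above.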

Service configuration

The directory that contains the models ('esen', 'ende', etc.) should be mounted to /models in the Docker container.

Environment variables to set in container:

PORT - service port (default is 8000)

LOGGING_LEVEL - ERROR, WARNING, INFO or DEBUG (default is INFO)

WORKERS - number of bergamot-translator workers (default is 1). Set it to 0 to use the number of available CPUs. It is recommended to keep the number of workers low and scale horizontally with Kubernetes instead.
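
To make the defaults concrete, here is a minimal Python sketch of how these three variables could be resolved. It illustrates the documented behaviour only and is not the service's actual startup code; `load_config` and the returned dictionary keys are hypothetical names.

```python
import os

VALID_LEVELS = ("ERROR", "WARNING", "INFO", "DEBUG")


def load_config(env=None):
    """Resolve PORT, LOGGING_LEVEL and WORKERS using the documented defaults."""
    env = os.environ if env is None else env

    port = int(env.get("PORT", "8000"))

    level = env.get("LOGGING_LEVEL", "INFO")
    if level not in VALID_LEVELS:
        raise ValueError(f"LOGGING_LEVEL must be one of {VALID_LEVELS}, got {level!r}")

    workers = int(env.get("WORKERS", "1"))
    if workers == 0:
        # 0 means "one worker per available CPU"; fall back to 1 if unknown.
        workers = os.cpu_count() or 1

    return {"port": port, "logging_level": level, "workers": workers}
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check, for example `load_config({"PORT": "8080"})`.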

Testing

make python-env - install pip packages

make test - to run integration API tests

make load-test - to run a stress test (requires downloading more models than the unit tests)

translation-service's Issues

Segfault on loading of some models

The segfault occurs when loading bgen, enbg, nben and nnen. All other models load correctly. Updating the moz-bergamot-translator module didn't help.

docker run --name translation-service -it --rm -v $(pwd)/tmp:/models -p 8080:8080 -e PORT=8080 translation-service
[2022-02-23 20:42:24] [data] Loading SentencePiece vocabulary from file /models/bgen/vocab.bgen.spm
[2022-02-23 20:42:24] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file.
[2022-02-23 20:42:24] [data] Loading binary shortlist as /models/bgen/lex.50.50.bgen.s2t.bin true
[2022-02-23 20:42:24] [data] Lexical short list firstNum 100 and bestNum 100
[2022-02-23 20:42:24] [memory] Extending reserved space to 128 MB (device cpu0)
[2022-02-23 20:42:24] Loaded model config
[2022-02-23 20:42:24] Loading scorer of type transformer as feature F0
[2022-02-23 20:42:24] [memory] Reserving 31 MB, device cpu0
[2022-02-23 20:42:24] [memory] Reserving 8 MB, device cpu0
Model bgen is loaded
[2022-02-23 20:42:24] [data] Loading SentencePiece vocabulary from file /models/enbg/vocab.bgen.spm
[2022-02-23 20:42:24] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file.
[2022-02-23 20:42:24] [data] Loading binary shortlist as /models/enbg/lex.50.50.enbg.s2t.bin true
[2022-02-23 20:42:24] [data] Lexical short list firstNum 100 and bestNum 100
[2022-02-23 20:42:24] Error: Error: shortlist indices are out of bounds
[2022-02-23 20:42:24] Error: Aborted from void marian::data::BinaryShortlistGenerator::contentCheck() in /app/3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/data/shortlist.cpp:160

[CALL STACK]
[0x562d1c41c68c]                                                       + 0x2ad68c
[0x562d1c41d968]                                                       + 0x2ae968
[0x562d1c41e25b]                                                       + 0x2af25b
[0x562d1c41fcef]                                                       + 0x2b0cef
[0x562d1c2e8514]                                                       + 0x179514
[0x562d1c2e42ad]                                                       + 0x1752ad
[0x562d1c2c6f0c]                                                       + 0x157f0c
[0x562d1c289037]                                                       + 0x11a037
[0x562d1c2663ae]                                                       + 0xf73ae
[0x7fa9b709f0b3]    __libc_start_main                                  + 0xf3
[0x562d1c286ace]                                                       + 0x117ace

[2022-02-23 20:42:24] Error: Segmentation fault
[2022-02-23 20:42:24] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /app/3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/common/logging.cpp:134

[CALL STACK]
[0x562d1c34ee60]                                                       + 0x1dfe60
[0x562d1c34f0af]                                                       + 0x1e00af
[0x7fa9b75cb3c0]                                                       + 0x153c0
[0x7fa9b709d941]    abort                                              + 0x213
[0x562d1c22714d]                                                       + 0xb814d
[0x562d1c41d968]                                                       + 0x2ae968
[0x562d1c41e25b]                                                       + 0x2af25b
[0x562d1c41fcef]                                                       + 0x2b0cef
[0x562d1c2e8514]                                                       + 0x179514
[0x562d1c2e42ad]                                                       + 0x1752ad
[0x562d1c2c6f0c]                                                       + 0x157f0c
[0x562d1c289037]                                                       + 0x11a037
[0x562d1c2663ae]                                                       + 0xf73ae
[0x7fa9b709f0b3]    __libc_start_main                                  + 0xf3
[0x562d1c286ace]                                                       + 0x117ace

make: *** [run] Error 127

Charset issues with Russian?

STR:

  • Install and setup the service following the README
  • Try to translate to Russian:
macbookpro:translation-service anatal$ curl -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8"--header "Content-Type: application/json"       --request POST       --data '{"from":"es", "to":"ru", "text": "Buenos días"}'       http://0.0.0.0:8080/v1/translate
curl: (3) URL using bad/illegal format or missing URL
{"result":"\u00d0\u00a4\u00d0\u00ce\u00d0\u00c1\u00e1\u0080\u00d0\u00ce\u00d0\u00c5 \u00e1\u0093\u00e1\u0092\u00e1\u0080\u00d0\u00ce"}
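
One observable property of the response above is that every escaped code point is below U+0100, whereas Cyrillic letters live in the U+0400 to U+04FF block. The Python sketch below checks for that pattern. `looks_byte_escaped` is a hypothetical helper, and the check is only a heuristic for targets such as Russian: it would also flag legitimate Latin-1 text like "días". It does not identify the root cause.

```python
import json


def looks_byte_escaped(text):
    """Return True if all non-ASCII characters fall in U+0080..U+00FF.

    For a Cyrillic target this suggests raw bytes were escaped one by one
    instead of being decoded into characters first.
    """
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    return bool(non_ascii) and all(ord(ch) <= 0xFF for ch in non_ascii)


# The response body reported in this issue, copied verbatim.
reported = json.loads(
    r'{"result":"\u00d0\u00a4\u00d0\u00ce\u00d0\u00c1\u00e1\u0080\u00d0\u00ce\u00d0\u00c5 '
    r'\u00e1\u0093\u00e1\u0092\u00e1\u0080\u00d0\u00ce"}'
)["result"]

print(looks_byte_escaped(reported))       # the reported output
print(looks_byte_escaped("Доброе утро"))  # a correctly decoded Russian string
```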

enpl enbg nben causing "Error: shortlist indices are out of bounds"

It worked well via Docker, but I first had to delete enpl, enbg and nben from /models/prod/; otherwise I'd get, e.g.:

[2022-06-07 20:16:32] [data] Loading SentencePiece vocabulary from file /models/nben/vocab.nben.spm
[2022-06-07 20:16:32] Missing list of protected prefixes for sentence splitting. Set with --ssplit-prefix-file.
[2022-06-07 20:16:32] [data] Loading binary shortlist as /models/nben/lex.50.50.nben.s2t.bin true
[2022-06-07 20:16:32] [data] Lexical short list firstNum 50 and bestNum 50
[2022-06-07 20:16:32] Error: Error: shortlist indices are out of bounds
[2022-06-07 20:16:32] Error: Aborted from void marian::data::BinaryShortlistGenerator::contentCheck() in /app/3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/data/shortlist.cpp:160
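
Instead of deleting the affected models in place, one option is to copy the remaining ones into a separate directory and mount that to /models. The Python sketch below shows the idea; `stage_models` and the directory arguments are hypothetical, and the exclude list simply repeats the models named in this issue.

```python
import shutil
from pathlib import Path


def stage_models(src_dir, dst_dir, exclude):
    """Copy every model directory from src_dir to dst_dir, skipping excluded names.

    Returns the sorted list of model names that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for model_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        if model_dir.name in exclude:
            continue
        # dirs_exist_ok (Python 3.8+) lets the script be re-run safely.
        shutil.copytree(model_dir, dst / model_dir.name, dirs_exist_ok=True)
        staged.append(model_dir.name)
    return staged
```

For example, `stage_models("models/prod", "tmp", {"enpl", "enbg", "nben"})` would leave the original checkout untouched and fill `tmp` with the other models.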

Server will not start - Aborted (core dumped)

Hi!

I wanted to give this server a try. I followed the instructions to build the Docker image, which worked well, but when I try to start the server, I get the following error:

docker run --name translation-service -it --rm -v $(pwd)/models:/models -p 8000:8000 -e LOGGING_LEVEL=DEBUG translation-service

Looking for models in "/models/uken"
Adding file trgvocab.uken.spm
Adding file srcvocab.uken.spm
Adding file model.uken.intgemm8.bin
Adding file lex.uken.s2t.bin
Building models config for uken
Aborted (core dumped)

I didn't see any errors in the console when building the Docker image. I get the same error when I run the latest mozilla/translation-service image from Docker Hub.

The problem seems to be the uken / enuk models: if I remove those folders from the model directory, the server appears to work well!

This is probably fixed in PR #22?

Reconfigure CircleCI integration

Following the CircleCI security incident, all CircleCI secrets and SSH keys were removed.

To continue using this repo, we should reconfigure:

  • Deployment to Dockerhub

Error compiling encoder_decoder

I'm getting a compilation error while building the Docker image. Does anyone know what the issue might be?

Here are the final lines of the command-line output. The script stalls for about 10 minutes when compiling encoder_decoder and then errors out.

108.4 [ 78%] Building CXX object 3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/optimizers/clippers.cpp.o
108.8 [ 78%] Building CXX object 3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/optimizers/optimizers.cpp.o
110.5 [ 80%] Building CXX object 3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/models/model_factory.cpp.o
116.8 [ 80%] Building CXX object 3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/models/encoder_decoder.cpp.o
720.6 c++: fatal error: Killed signal terminated program cc1plus
720.6 compilation terminated.
720.6 make[2]: *** [3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/build.make:609: 3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/graph/expression_operators.cpp.o] Error 1
720.6 make[2]: *** Waiting for unfinished jobs....
767.8 make[1]: *** [CMakeFiles/Makefile2:530: 3rd_party/moz-bergamot-translator/3rd_party/marian-dev/src/CMakeFiles/marian.dir/all] Error 2
767.8 make: *** [Makefile:152: all] Error 2
------
Dockerfile:46
--------------------
  44 |     ADD ./CMakeLists.txt ./CMakeLists.txt
  45 |     
  46 | >>> RUN bash compile.sh
  47 |     
  48 |     ENV PORT=8000
--------------------
ERROR: failed to solve: process "/bin/sh -c bash compile.sh" did not complete successfully: exit code: 2
make: *** [build-docker] Error 1

No PubKey for Intel MKL libraries

Not sure if it's a universal issue (or a universal workaround)...

I ran into the following error when building the Docker image. As a workaround, I was able to proceed by modifying the Dockerfile and changing the public key from GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB to GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB.

If this is a proper solution/workaround, I can also repost this issue as a pull request.

0.720 Get:4 https://apt.repos.intel.com/mkl all InRelease [4438 B]
0.741 Hit:5 http://security.ubuntu.com/ubuntu focal-security InRelease
0.849 Err:4 https://apt.repos.intel.com/mkl all InRelease
0.849   The following signatures couldn't be verified because the public key is not available: NO_PUBKEY BAC6F0C353D04109
0.947 Reading package lists...
1.890 W: GPG error: https://apt.repos.intel.com/mkl all InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY BAC6F0C353D04109
1.890 E: The repository 'https://apt.repos.intel.com/mkl all InRelease' is not signed.
------
Dockerfile:23
--------------------
  22 |         wget -qO- 'https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB' |  apt-key add -
  23 | >>> RUN sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list' && \
  24 | >>>     apt-get update && \
  25 | >>>     apt-get install -y intel-mkl-64bit-2020.0-088
  26 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list' &&     apt-get update &&     apt-get install -y intel-mkl-64bit-2020.0-088" did not complete successfully: exit code: 100
make: *** [build-docker] Error 1

Dockerhub image doesn't run on MBP with Apple Silicon

docker run --name translation-service -it --rm -v $(pwd)/firefox-translations-models/models/prod:/models -p 8080:8080 -e PORT=8080 mozilla/translation-service:latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
qemu: uncaught target signal 4 (Illegal instruction) - core dumped
