winstxnhdw / nllb-api

A low-memory, high-performance CPU-based API for Meta's No Language Left Behind (NLLB) using CTranslate2, hosted on Hugging Face Spaces.

Home Page: https://huggingface.co/spaces/winstxnhdw/nllb-api

Languages: Python 98.43%, Makefile 0.98%, Dockerfile 0.59%
Topics: huggingface, fastapi, docker, ctranslate2, transformers, huggingface-spaces, caddy, nllb, nllb200, machine-translation

nllb-api's Introduction

nllb-api


A fast CPU-based API for the distilled 1.3B 8-bit quantised variant of Meta's No Language Left Behind (NLLB), hosted on Hugging Face Spaces. To achieve faster inference, we use CTranslate2 as our inference engine. Requests are cached and served at the reverse-proxy layer to reduce server load.

Warning

NLLB has a max input length of 1024 tokens. This limit is imposed by the model's architecture and cannot be changed. If you need to translate longer texts, consider splitting your input into smaller chunks.
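A minimal client-side sketch of such chunking is shown below, using the /api/v3/translate endpoint documented under Usage. The sentence-splitting heuristic, the character budget as a rough proxy for tokens, and the 'result' field name are all illustrative assumptions, not part of this repository.

# A hypothetical client-side chunking sketch. The sentence heuristic, the
# character budget as a token proxy, and the 'result' field name are all
# assumptions; they are not part of nllb-api itself.
import re
import requests

API_URL = 'https://winstxnhdw-nllb-api.hf.space/api/v3/translate'

def translate_long_text(text: str, source: str, target: str, max_chars: int = 1000) -> str:
    # Split on sentence boundaries while keeping the terminating punctuation.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current = [], ''
    for sentence in sentences:
        candidate = f'{current} {sentence}'.strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)

    translations = []
    for chunk in chunks:
        response = requests.get(API_URL, params={'text': chunk, 'source': source, 'target': target}, timeout=60)
        response.raise_for_status()
        translations.append(response.json()['result'])  # field name is an assumption
    return ' '.join(translations)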

Usage

Simply cURL the endpoint as in the following example. The source and target languages must be specified as FLORES-200 codes.

List of FLORES-200 Codes
| Language | FLORES-200 Code |
| --- | --- |
| Acehnese (Arabic script) | ace_Arab |
| Acehnese (Latin script) | ace_Latn |
| Mesopotamian Arabic | acm_Arab |
| Ta’izzi-Adeni Arabic | acq_Arab |
| Tunisian Arabic | aeb_Arab |
| Afrikaans | afr_Latn |
| South Levantine Arabic | ajp_Arab |
| Akan | aka_Latn |
| Amharic | amh_Ethi |
| North Levantine Arabic | apc_Arab |
| Modern Standard Arabic | arb_Arab |
| Modern Standard Arabic (Romanized) | arb_Latn |
| Najdi Arabic | ars_Arab |
| Moroccan Arabic | ary_Arab |
| Egyptian Arabic | arz_Arab |
| Assamese | asm_Beng |
| Asturian | ast_Latn |
| Awadhi | awa_Deva |
| Central Aymara | ayr_Latn |
| South Azerbaijani | azb_Arab |
| North Azerbaijani | azj_Latn |
| Bashkir | bak_Cyrl |
| Bambara | bam_Latn |
| Balinese | ban_Latn |
| Belarusian | bel_Cyrl |
| Bemba | bem_Latn |
| Bengali | ben_Beng |
| Bhojpuri | bho_Deva |
| Banjar (Arabic script) | bjn_Arab |
| Banjar (Latin script) | bjn_Latn |
| Standard Tibetan | bod_Tibt |
| Bosnian | bos_Latn |
| Buginese | bug_Latn |
| Bulgarian | bul_Cyrl |
| Catalan | cat_Latn |
| Cebuano | ceb_Latn |
| Czech | ces_Latn |
| Chokwe | cjk_Latn |
| Central Kurdish | ckb_Arab |
| Crimean Tatar | crh_Latn |
| Welsh | cym_Latn |
| Danish | dan_Latn |
| German | deu_Latn |
| Southwestern Dinka | dik_Latn |
| Dyula | dyu_Latn |
| Dzongkha | dzo_Tibt |
| Greek | ell_Grek |
| English | eng_Latn |
| Esperanto | epo_Latn |
| Estonian | est_Latn |
| Basque | eus_Latn |
| Ewe | ewe_Latn |
| Faroese | fao_Latn |
| Fijian | fij_Latn |
| Finnish | fin_Latn |
| Fon | fon_Latn |
| French | fra_Latn |
| Friulian | fur_Latn |
| Nigerian Fulfulde | fuv_Latn |
| Scottish Gaelic | gla_Latn |
| Irish | gle_Latn |
| Galician | glg_Latn |
| Guarani | grn_Latn |
| Gujarati | guj_Gujr |
| Haitian Creole | hat_Latn |
| Hausa | hau_Latn |
| Hebrew | heb_Hebr |
| Hindi | hin_Deva |
| Chhattisgarhi | hne_Deva |
| Croatian | hrv_Latn |
| Hungarian | hun_Latn |
| Armenian | hye_Armn |
| Igbo | ibo_Latn |
| Ilocano | ilo_Latn |
| Indonesian | ind_Latn |
| Icelandic | isl_Latn |
| Italian | ita_Latn |
| Javanese | jav_Latn |
| Japanese | jpn_Jpan |
| Kabyle | kab_Latn |
| Jingpho | kac_Latn |
| Kamba | kam_Latn |
| Kannada | kan_Knda |
| Kashmiri (Arabic script) | kas_Arab |
| Kashmiri (Devanagari script) | kas_Deva |
| Georgian | kat_Geor |
| Central Kanuri (Arabic script) | knc_Arab |
| Central Kanuri (Latin script) | knc_Latn |
| Kazakh | kaz_Cyrl |
| Kabiyè | kbp_Latn |
| Kabuverdianu | kea_Latn |
| Khmer | khm_Khmr |
| Kikuyu | kik_Latn |
| Kinyarwanda | kin_Latn |
| Kyrgyz | kir_Cyrl |
| Kimbundu | kmb_Latn |
| Northern Kurdish | kmr_Latn |
| Kikongo | kon_Latn |
| Korean | kor_Hang |
| Lao | lao_Laoo |
| Ligurian | lij_Latn |
| Limburgish | lim_Latn |
| Lingala | lin_Latn |
| Lithuanian | lit_Latn |
| Lombard | lmo_Latn |
| Latgalian | ltg_Latn |
| Luxembourgish | ltz_Latn |
| Luba-Kasai | lua_Latn |
| Ganda | lug_Latn |
| Luo | luo_Latn |
| Mizo | lus_Latn |
| Standard Latvian | lvs_Latn |
| Magahi | mag_Deva |
| Maithili | mai_Deva |
| Malayalam | mal_Mlym |
| Marathi | mar_Deva |
| Minangkabau (Arabic script) | min_Arab |
| Minangkabau (Latin script) | min_Latn |
| Macedonian | mkd_Cyrl |
| Plateau Malagasy | plt_Latn |
| Maltese | mlt_Latn |
| Meitei (Bengali script) | mni_Beng |
| Halh Mongolian | khk_Cyrl |
| Mossi | mos_Latn |
| Maori | mri_Latn |
| Burmese | mya_Mymr |
| Dutch | nld_Latn |
| Norwegian Nynorsk | nno_Latn |
| Norwegian Bokmål | nob_Latn |
| Nepali | npi_Deva |
| Northern Sotho | nso_Latn |
| Nuer | nus_Latn |
| Nyanja | nya_Latn |
| Occitan | oci_Latn |
| West Central Oromo | gaz_Latn |
| Odia | ory_Orya |
| Pangasinan | pag_Latn |
| Eastern Panjabi | pan_Guru |
| Papiamento | pap_Latn |
| Western Persian | pes_Arab |
| Polish | pol_Latn |
| Portuguese | por_Latn |
| Dari | prs_Arab |
| Southern Pashto | pbt_Arab |
| Ayacucho Quechua | quy_Latn |
| Romanian | ron_Latn |
| Rundi | run_Latn |
| Russian | rus_Cyrl |
| Sango | sag_Latn |
| Sanskrit | san_Deva |
| Santali | sat_Olck |
| Sicilian | scn_Latn |
| Shan | shn_Mymr |
| Sinhala | sin_Sinh |
| Slovak | slk_Latn |
| Slovenian | slv_Latn |
| Samoan | smo_Latn |
| Shona | sna_Latn |
| Sindhi | snd_Arab |
| Somali | som_Latn |
| Southern Sotho | sot_Latn |
| Spanish | spa_Latn |
| Tosk Albanian | als_Latn |
| Sardinian | srd_Latn |
| Serbian | srp_Cyrl |
| Swati | ssw_Latn |
| Sundanese | sun_Latn |
| Swedish | swe_Latn |
| Swahili | swh_Latn |
| Silesian | szl_Latn |
| Tamil | tam_Taml |
| Tatar | tat_Cyrl |
| Telugu | tel_Telu |
| Tajik | tgk_Cyrl |
| Tagalog | tgl_Latn |
| Thai | tha_Thai |
| Tigrinya | tir_Ethi |
| Tamasheq (Latin script) | taq_Latn |
| Tamasheq (Tifinagh script) | taq_Tfng |
| Tok Pisin | tpi_Latn |
| Tswana | tsn_Latn |
| Tsonga | tso_Latn |
| Turkmen | tuk_Latn |
| Tumbuka | tum_Latn |
| Turkish | tur_Latn |
| Twi | twi_Latn |
| Central Atlas Tamazight | tzm_Tfng |
| Uyghur | uig_Arab |
| Ukrainian | ukr_Cyrl |
| Umbundu | umb_Latn |
| Urdu | urd_Arab |
| Northern Uzbek | uzn_Latn |
| Venetian | vec_Latn |
| Vietnamese | vie_Latn |
| Waray | war_Latn |
| Wolof | wol_Latn |
| Xhosa | xho_Latn |
| Eastern Yiddish | ydd_Hebr |
| Yoruba | yor_Latn |
| Yue Chinese | yue_Hant |
| Chinese (Simplified) | zho_Hans |
| Chinese (Traditional) | zho_Hant |
| Standard Malay | zsm_Latn |
| Zulu | zul_Latn |
curl 'https://winstxnhdw-nllb-api.hf.space/api/v3/translate?text=Hello&source=eng_Latn&target=spa_Latn'

You can also determine the language of the source text by querying the following API.

curl 'https://winstxnhdw-nllb-api.hf.space/api/v3/detect_language?text=Hello'
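If you would rather call the API from Python, the following is a minimal sketch using the third-party requests library. The shape of the JSON responses is an assumption here; check the Swagger UI at /api/docs for the exact schema.

# A minimal Python client sketch using the third-party `requests` library.
# The response field names are assumptions; consult the Swagger UI for the
# authoritative schema.
import requests

BASE_URL = 'https://winstxnhdw-nllb-api.hf.space/api/v3'

translation = requests.get(
    f'{BASE_URL}/translate',
    params={'text': 'Hello', 'source': 'eng_Latn', 'target': 'spa_Latn'},
    timeout=60,
)

detection = requests.get(f'{BASE_URL}/detect_language', params={'text': 'Hello'}, timeout=60)

print(translation.json())
print(detection.json())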

Self-Hosting

You can self-host the API and access the Swagger UI at localhost:7860/api/docs with the following minimal configuration.

Note

The internal server runs on port 5000. If you wish to set APP_PORT to 5000, you must set the SERVER_PORT environment variable to a different port.

docker run --rm \
  -e APP_PORT=7860 \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main

Optimisation

You can pass the following environment variables to optimise the API for your use case. The value of OMP_NUM_THREADS sets the number of threads used to translate a given batch of inputs, while WORKER_COUNT sets the number of workers used to handle requests in parallel.

Important

OMP_NUM_THREADS $\times$ WORKER_COUNT should not exceed the physical number of cores on your machine.

docker run --rm \
  -e APP_PORT=7860 \
  -e OMP_NUM_THREADS=6 \
  -e WORKER_COUNT=1 \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main
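As a quick illustration of the constraint above, the hypothetical helper below derives the largest safe WORKER_COUNT for a chosen OMP_NUM_THREADS. Note that Python's os.cpu_count() reports logical cores, so this sketch assumes 2-way SMT when approximating physical cores.

# A hypothetical sizing helper: OMP_NUM_THREADS x WORKER_COUNT should not
# exceed the number of physical cores. os.cpu_count() returns *logical*
# cores, so we halve it here under the assumption of 2-way SMT.
import os

def max_worker_count(omp_num_threads: int) -> int:
    logical_cores = os.cpu_count() or 1
    physical_cores = max(logical_cores // 2, 1)  # assumption: 2-way SMT
    return max(physical_cores // omp_num_threads, 1)  # at least 1 worker to run at all

# On an 8-core/16-thread machine, OMP_NUM_THREADS=6 leaves room for 1 worker.
print(max_worker_count(6))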

CUDA Support

You can accelerate your inference with CUDA by building and using Dockerfile.cuda-build instead.

docker build -f Dockerfile.cuda-build -t nllb-api .

After building the image, you can run it with the following.

docker run --rm --gpus all \
  -e APP_PORT=7860 \
  -p 7860:7860 \
  nllb-api
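As a sanity check that CTranslate2 can actually see your GPU, you can run the snippet below wherever the ctranslate2 package is installed; get_cuda_device_count() is part of CTranslate2's Python API and returns 0 when no CUDA device is visible.

# Sanity check that CTranslate2 can see a CUDA device; a count of 0 usually
# means the NVIDIA container runtime or CUDA libraries are not wired up.
import ctranslate2

print(f'CUDA devices visible to CTranslate2: {ctranslate2.get_cuda_device_count()}')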

Development

First, install the required dependencies for your editor with the following.

poetry install

Now, you can access the Swagger UI at localhost:7860/api/docs after spinning the server up locally with the following.

docker build -f Dockerfile.build -t nllb-api .
docker run --rm -e APP_PORT=7860 -p 7860:7860 nllb-api
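Once the container is up, a quick smoke test such as the sketch below confirms that the server responds end-to-end. The shape of the JSON response is an assumption; verify it against the Swagger UI.

# An illustrative smoke test against a locally hosted instance.
import requests

response = requests.get(
    'http://localhost:7860/api/v3/translate',
    params={'text': 'Hello', 'source': 'eng_Latn', 'target': 'spa_Latn'},
    timeout=60,
)
response.raise_for_status()
print(response.json())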

nllb-api's People

Contributors

dependabot[bot], github-actions[bot], winstxnhdw


nllb-api's Issues

Language Detection

Is it possible to make a request to determine the language of the text I send?

Some text is missing

Hi there,

When running this json through swagger:

{
  "text": "Hi George. Roy, got to talk to you.",
  "source": "eng_Latn",
  "target": "dan_Latn"
}

I get this result: Roy, jeg skal tale med dig.
The "Hi George" part is missing.


Any ideas?

I get an error on docker desktop


docker run --rm \
  -e APP_PORT=7860 \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main

This image runs, and translations execute without any problems.

However, when I download the source code from GitHub and, based on nllb-api-main, build it with

docker build -t nllb-api .

and run it with

docker run --name my_nllb -e APP_PORT=7860 -p 7860:7860 nllb-api

I get an error.

It's quite a long error, but it includes the following, among other things:

ValueError: The CPU does not support AVX512

Is there anything you can think of? This is a Windows environment.

I'm having a problem using CUDA

I have the latest files. When I run

docker build -f Dockerfile.cuda-build -t nllb-api .

an error occurs during installation. I was able to install it without any problems before, but now I can't.

RuntimeError
58.68
58.68 Unable to find installation candidates for ctranslate2 (4.2.0)
58.68
58.68 at /opt/poetry/venv/lib/python3.12/site-packages/poetry/installation/chooser.py:74 in choose_for
58.72 70│
58.72 71│ links.append(link)
58.72 72│
58.73 73│ if not links:
58.73 → 74│ raise RuntimeError(f"Unable to find installation candidates for {package}")
58.73 75│
58.74 76│ # Get the best link
58.74 77│ chosen = max(links, key=lambda link: self._sort_key(package, link))
58.74 78│
58.75
58.75 Cannot install ctranslate2.

It also feels like I can only access the API server once at a time; the next request seems to wait until the current translation process is finished.

I am testing with a GTX 1060, but it only works at about 30% of its capacity. Is there a better way to resolve the slow translation speed?

The readme.md may be a bit confusing

Nice work! This is the best and fastest repo I've tried so far. I'm very grateful, even though it took me a while to figure it out, because the README.md may be a bit confusing. The section https://github.com/winstxnhdw/nllb-api#model-caching mentions the need to create a custom cache directory, which can be confusing, as the various Hugging Face download methods may result in different directory structures.
In reality, you don't need to create your own cache directory. You can simply point to the default Hugging Face cache directory (e.g. /root/.cache for the root user) instead.
By the way, I want to use this model and API and pair it with a browser extension. Where can I find a plugin like that to modify and adapt to this repo's API? Thanks!

start container offline

Hello,
I can't start the Docker container successfully when I try to use the system completely offline. I don't get any error, but the Swagger page is not reachable. As soon as the container can reach huggingface.co, it works. What is the reason for this, and can I work around it?

Maximum content length is limited

I've noticed that when I try to translate a long story, the output gets truncated. How can I increase the length of the translated text?

Add docker image with model

Nice work ❤️

Would it be possible to add the model to the docker image?
I've already downloaded the 1.4 GB model three times today...

Translation truncated below 1024 tokens

I am using the self-hosting option with the most recent version of the Docker image and the following config:

    image: 'ghcr.io/winstxnhdw/nllb-api:main'
    container_name: 'nllb'
    ports:
      - '7860:7860'
    environment:
      - APP_PORT=7860 
      - OMP_NUM_THREADS=4
      - WORKER_COUNT=2
      - CT2_USE_EXPERIMENTAL_PACKED_GEMM=1
      - CT2_FORCE_CPU_ISA=AVX2

I've tested it and it works as expected, but it only generates a full response for inputs of around 300 tokens or so.
As far as I've read, it should be able to handle text inputs of up to 1024 tokens, but the translated text is always incomplete, missing information or stopping mid-sentence.

I have seen in other issues that setting the MAX_INPUT_LENGTH environment variable should help, but that should only apply to inputs of more than 1024 tokens. I've still tried it, with no difference in the result. What am I missing?

Rate limits

Hi,

I don't know what your intentions are with the new change introducing rate limits in cb0e401c17a372a9422dbb56d5dcdb0d522a7f71, but it broke all of my translation workflows. Now my scripts get rate-limited and quit.

Could this be controlled by an environment variable or made an optional feature?

Cheers!

Does nllb-api support GPU?

Is it possible to support GPU? I found out that CTranslate2 supports GPU. Could nllb-api use the GPU to respond faster?

Library libcublas.so.12 is not found

Hi,
I recently tried to build this with GPU support. Building the Docker image was no problem, but when accessing the API there is an error:

| Traceback (most recent call last):
    |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 260, in wrap
    |     await func()
    |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 249, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/usr/local/lib/python3.12/site-packages/starlette/concurrency.py", line 65, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, as_iterator)
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/usr/local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    |     return await future
    |            ^^^^^^^^^^^^
    |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    |     result = context.run(func, *args)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/usr/local/lib/python3.12/site-packages/starlette/concurrency.py", line 54, in _next
    |     return next(iterator)
    |            ^^^^^^^^^^^^^^
    |   File "/home/user/app/server/features/translator.py", line 62, in <genexpr>
    |     return (
    |            ^
    |   File "/usr/local/lib/python3.12/site-packages/ctranslate2/extensions.py", line 82, in translator_translate_iterable
    |     yield from _process_iterable(
    |   File "/usr/local/lib/python3.12/site-packages/ctranslate2/extensions.py", line 554, in _process_iterable
    |     yield queue.popleft().result()
    |           ^^^^^^^^^^^^^^^^^^^^^^^^
    | RuntimeError: Library libcublas.so.12 is not found or cannot be loaded

I traced it down to a missing CUDA 12 library. The image explicitly uses CUDA 11.8. I tried updating the image to CUDA 12, but that results in other errors.

How do I make it work?

This is my nvidia-smi output. The NVIDIA Container Toolkit is also installed.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:3B:00.0 Off |                  N/A |
| 30%   29C    P5              37W / 320W |    674MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1288      G   /usr/lib/xorg/Xorg                          324MiB |
|    0   N/A  N/A      1761      G   /usr/bin/kwin_x11                            77MiB |
|    0   N/A  N/A      1820      G   /usr/bin/plasmashell                         94MiB |
|    0   N/A  N/A      1868      G   ...irefox/3836/usr/lib/firefox/firefox      149MiB |
+---------------------------------------------------------------------------------------+

Some issues with the self-hosting option

I experience the following issue when I try to run the main-tagged image locally.

docker run --rm \
  -e APP_PORT=7860 \
  -p 7860:7860 \
  -v ./cache:/home/user/.cache \
  ghcr.io/winstxnhdw/nllb-api:main
...
[2024-06-03 10:18:30 +0000] [37] [INFO] Waiting for application startup.
Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]
2024-06-03 10:18:37,897 INFO success: server entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2024-06-03 10:18:37,898 INFO success: caddy entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)

It seems like the server starts before whatever files the other process is trying to fetch become available.

Then, when I try to open Swagger, the server throws this error:

{"level":"info","ts":1717410704.252975,"logger":"http.handlers.cache","msg":"Internal server error on endpoint /docs: [0xc0004b20a8]"}
{"level":"error","ts":1717410704.2530937,"logger":"http.log.error","msg":"context deadline exceeded","request": ...}

Several days ago, I was able to start the container and open Swagger, but any request with more than ~100 tokens would result in a similar context deadline exceeded error.

It would be great if you could resolve these issues and push a stable, versioned image.
Many thanks in advance!
