opennmt / ctranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

License: MIT License

CMake 0.84% C++ 80.67% Shell 0.41% C 0.05% Python 12.22% Cuda 5.49% Dockerfile 0.31%
neural-machine-translation cpp mkl quantization cuda thrust opennmt deep-neural-networks openmp onednn

ctranslate2's Introduction

This project is considered obsolete as the Torch framework is no longer maintained. If you are starting a new project, please use an alternative in the OpenNMT family: OpenNMT-tf (TensorFlow) or OpenNMT-py (PyTorch) depending on your requirements.


OpenNMT: Open-Source Neural Machine Translation

OpenNMT is a full-featured, open-source (MIT) neural machine translation system utilizing the Torch mathematical toolkit.

The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art translation accuracy. Features include:

  • Speed and memory optimizations for high-performance GPU training.
  • Simple general-purpose interface, requiring only source/target data files.
  • C++ implementation of the translator for easy deployment.
  • Extensions to allow other sequence generation tasks such as summarization and image captioning.

Installation

OpenNMT only requires a Torch installation with a few dependencies.

  1. Install Torch
  2. Install additional packages:
luarocks install tds
luarocks install bit32 # if using LuaJIT

For other installation methods including Docker, visit the documentation.

Quickstart

OpenNMT consists of three commands:

  1. Preprocess the data.
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
  2. Train the model.
th train.lua -data data/demo-train.t7 -save_model model
  3. Translate sentences.
th translate.lua -model model_final.t7 -src data/src-test.txt -output pred.txt

For more details, visit the documentation.

Citation

A technical report on OpenNMT is available. If you use the system for academic work, please cite:

@ARTICLE{2017opennmt,
  author = {{Klein}, G. and {Kim}, Y. and {Deng}, Y. and {Senellart}, J. and {Rush}, A.~M.},
  title = "{OpenNMT: Open-Source Toolkit for Neural Machine Translation}",
  journal = {ArXiv e-prints},
  eprint = {1701.02810}
}

Acknowledgments

Our implementation utilizes code from the following:

Additional resources

ctranslate2's People

Contributors

amrrs, anterart, brightxiaohan, cdockes, chengduozh, chiiyeh, clementchouteau, dependabot[bot], ebraraktas, flyingleafe, funboarder13920, guillaumekln, homink, jgcb00, jhnwnd, jordimas, keichi, michaelfeil, minhthuc2502, natsegal, panosk, raphaelmerx, scotfang, sebastianbodza, shas3011, vadi2, vakkov, vince62s, yc-wang00, zxdvd


ctranslate2's Issues

The example of converting opennmt-py model does not work.

The script in (Quickstart -> 2. Convert a model) fails.

pip install OpenNMT-py

wget https://s3.amazonaws.com/opennmt-models/transformer-ende-wmt-pyOnmt.tar.gz
tar xf transformer-ende-wmt-pyOnmt.tar.gz

ct2-opennmt-py-converter --model_path averaged-10-epoch.pt --model_spec TransformerBase \
    --output_dir ende_ctranslate2
Traceback (most recent call last):
  File "/mnt/f/python-venv/onmt/bin/ct2-opennmt-py-converter", line 8, in <module>
    sys.exit(main())
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/bin/opennmt_py_converter.py", line 11, in main      converters.OpenNMTPyConverter(args.model_path).convert_from_args(args)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/converters/converter.py", line 40, in convert_from_args
    force=args.force)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/converters/converter.py", line 52, in convert       src_vocab, tgt_vocab = self._load(model_spec)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/converters/opennmt_py.py", line 22, in _load        checkpoint = torch.load(self._model_path, map_location="cpu")
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/torchtext/vocab.py", line 119, in __setstate__
    if state['unk_index'] is None:
KeyError: 'unk_index'

How to install Ctranslate2 without Docker

Hi,
I'd like to install the CTranslate2 module without using Docker. Is it possible?
Are there any scripts for this? I've tried generating a shell script from the dockerfile but it gives me some errors.
Thanks

Statically link to Intel MKL

We currently generate a custom shared library for Intel MKL. Instead, we should consider statically linking against it.

Pros:

  • no need to move the custom library around
  • allow linking to the GNU OpenMP library gomp instead of iomp5

Cons:

  • larger binary size (some symbols could be duplicated in libctranslate2.so and libmkldnn.so if the latter also statically links against MKL)

Save a data type identifier in converted models

When loading a model variable, the code currently deduces the data type from the size in bytes of one item (typically: if itemsize == 4 then float32). This is a weak test. We should instead save an identifier that unambiguously defines the data type (a rough sketch follows the field lists below).

Current fields:

  • item_size
  • data_size

Suggested fields:

  • dtype_id
  • nbytes
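As a rough illustration of the suggested layout (not the actual CTranslate2 serialization code; the DTYPE_IDS table and the exact field order below are assumptions), a converter could write an explicit dtype identifier next to the raw bytes so the loader never has to guess from the item size:

import struct

import numpy as np

# Hypothetical mapping from dtype to a stable integer identifier.
DTYPE_IDS = {np.dtype("float32"): 0, np.dtype("int16"): 1, np.dtype("int8"): 2}

def write_variable(fp, array):
    """Write a variable as: dtype_id, nbytes, raw data."""
    dtype_id = DTYPE_IDS[array.dtype]
    data = array.tobytes()
    fp.write(struct.pack("<BQ", dtype_id, len(data)))  # 1-byte dtype_id + 8-byte nbytes
    fp.write(data)

def read_variable(fp, shape):
    """Read a variable back using the explicit dtype identifier."""
    dtype_id, nbytes = struct.unpack("<BQ", fp.read(9))
    dtype = next(dt for dt, i in DTYPE_IDS.items() if i == dtype_id)
    return np.frombuffer(fp.read(nbytes), dtype=dtype).reshape(shape)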

Dynamic loading of NVIDIA libraries

We should investigate the dynamic loading of NVIDIA libraries. This would be helpful to publish a ctranslate2 Python package that is compatible with both CPU and GPU while allowing execution on a CPU-only system.

If that proves to be too complex, we might need to publish a separate package for GPU support.
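For reference, the general idea looks like this (a minimal Python sketch using ctypes; the library names and error handling here are assumptions, and the real implementation would more likely use dlopen from the C++ code):

import ctypes

def nvidia_libraries_available():
    """Try to load the NVIDIA runtime libraries; return False on a CPU-only system."""
    for name in ("libcudart.so", "libcublas.so"):  # assumed library names
        try:
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            return False
    return True

print("GPU support available:", nvidia_libraries_available())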

Are there any plans for TransformerAAN?

Hi,

I was trying to use TransformerAAN to train a translation model, but I found that CTranslate2 does not support TransformerAAN for now.
Are there any plans for this kind of architecture?
Many thanks.

Regards

pip install ctranslate2: no package found on macOS

Hi,

Running pip install ctranslate2 with the latest pip as per the installation instructions results in the following:

ERROR: Could not find a version that satisfies the requirement ctranslate2== (from versions: none)
ERROR: No matching distribution found for ctranslate2==
> pip --version
pip 20.0.2 from [...]/lib/python3.8/site-packages/pip (python 3.8)
> conda --version
conda 4.7.12

This is on macOS Mojave 10.14.6 (18G2022)

python3 docker image?

All Docker images on Docker Hub use a Python 2 environment. What should I do if I want to build a Docker image that includes a Python 3 environment?

suggestion to add function for changing models without deleting/creating new translators

Hi @guillaumekln ,

As far as I can see, if we create an instance of Translator, we can't change the model without destroying the object and creating a new one, as the model can only be defined in the constructor (unless I missed something). Wouldn't it make sense to have a function to change the current model? Even if deleting and making new translators is trivial, IMO it would improve the already excellent interface. If this makes sense, I could work on that soon, when I have some time.
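In the meantime, a small wrapper along these lines can hide the delete-and-recreate step behind a set_model call (a sketch only, not part of the CTranslate2 API; the class and method names are made up):

import ctranslate2

class SwappableTranslator:
    """Sketch of a wrapper that lets callers change the model path.

    Internally it still destroys and recreates the ctranslate2.Translator,
    since the model can only be set in the constructor.
    """

    def __init__(self, model_path, **kwargs):
        self._kwargs = kwargs
        self._translator = ctranslate2.Translator(model_path, **kwargs)

    def set_model(self, model_path):
        del self._translator  # release the current model before loading the new one
        self._translator = ctranslate2.Translator(model_path, **self._kwargs)

    def translate_batch(self, batch, **options):
        return self._translator.translate_batch(batch, **options)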

Support ONNX graphs

This is a general issue to discuss and track ONNX support.

The current limitation of the project is that only weights are extracted from pretrained models and the computation graph is redefined in the code itself. This could be mitigated by loading and executing ONNX graphs.

Better catch of CUDA OOMs

Hi,
While using the python ctranslate2.Translator API, it seems that an OOM can cause the whole python session to crash.

>>> import ctranslate2
>>> translator = ctranslate2.Translator("ende_ctranslate2/")
>>> translator.translate_batch([["a"]*20000]) # very long dummy batch to force OOM for reproducibility
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to allocate memory
Aborted (core dumped)

Would it be possible to better catch such exceptions so that we can handle them on the Python side (for example as in the sketch below)?
Thanks!
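If the C++ code raised the error instead of calling std::terminate, the caller could then recover along these lines (a sketch of the desired Python-side behavior only; today the process aborts before this except clause is reached, and the exact exception type and message are assumptions):

import ctranslate2

translator = ctranslate2.Translator("ende_ctranslate2/", device="cuda")

def translate_with_fallback(batch):
    """Retry with smaller sub-batches if the GPU runs out of memory."""
    try:
        return translator.translate_batch(batch)
    except RuntimeError as e:
        if "allocate" not in str(e) or len(batch) == 1:
            raise
        mid = len(batch) // 2
        return translate_with_fallback(batch[:mid]) + translate_with_fallback(batch[mid:])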

segmentation fault at the end of a translation when using TensorRT v6.0.1

Same system configuration with TensorRT v5.1.5 does not have this issue.
I am using Ubuntu 18.04, and other than these two things, am using the same configuration as in the Centos7-gpu Docker file.

Note there are warnings of deprecated nvinfer function use when building.

gdb output:

[Switching to Thread 0x7fffc68d3700 (LWP 3773)]
0x00007fffe339d604 in nvinfer1::rt::SafeExecutionContext::~SafeExecutionContext() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.6
(gdb) bt
#0  0x00007fffe339d604 in nvinfer1::rt::SafeExecutionContext::~SafeExecutionContext() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.6
#1  0x00007fffe31b5449 in nvinfer1::rt::ExecutionContext::~ExecutionContext() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.6
#2  0x00007ffff79093e8 in ctranslate2::cuda::TensorRTLayer::clear (this=0x7fffc68d3438) at /home/ubuntu/CTranslate2/src/cuda/utils.cc:189
#3  0x00007ffff790923c in ctranslate2::cuda::TensorRTLayer::~TensorRTLayer (this=0x7fffc68d3438, __in_chrg=<optimized out>) at /home/ubuntu/CTranslate2/src/cuda/utils.cc:165
#4  0x00007ffff79bf604 in ctranslate2::ops::TopKLayer::~TopKLayer (this=0x7fffc68d3438, __in_chrg=<optimized out>) at /home/ubuntu/CTranslate2/src/ops/topk_gpu.cu:8
#5  0x00007ffff6b1c8af in __GI___call_tls_dtors () at cxa_thread_atexit_impl.c:155
#6  0x00007ffff74726e9 in start_thread (arg=0x7fffc68d3700) at pthread_create.c:470
#7  0x00007ffff6bfa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Release Python package on PyPI

This would make the installation easier for users but this could make the packaging more complex, especially for GPU support.

This issue is to track progress on this front.

Much slower with CUDA 10.1 than with 10.0

I haven't tested this extensively but a small test seems to indicate slower times when using CUDA 10.1 (Update 2 - i.e., latest) vs CUDA 10.0 (as is used in the Docker file). It's around 1.5 times slower. Have you tried using CUDA 10.1 and have you seen similar results?

CTranslate2 layer_norm_gpu.cu:32: cuDNN failed with status CUDNN_STATUS_BAD_PARAM

# Imports and setup reconstructed for readability; the original report starts mid-script.
import json

import ctranslate2
import sentencepiece as spm
from flask import Flask, request
from gevent.pywsgi import WSGIServer

app = Flask(__name__)

# `args` (with .ip/.port), `sp`, `modelPath` and `translator` (a ctranslate2.Translator)
# are defined earlier in the user's script and are not shown here.

if sp=='in' or sp=='out' or sp=='inout':
    s = spm.SentencePieceProcessor()
    s.Load(modelPath + 'all.en.shuffled.filtered.spiece.model')
@app.route('/translate', methods=['Post'])
def trans():
    try:
        line = request.values.get('src')
        
        if sp=='in' or sp=='inout':
            sentence = s.EncodeAsPieces(line)
        else:
            sentence = list(line)

        results = translator.translate_batch([sentence], beam_size=1, max_decoding_length=250, num_hypotheses=1, length_penalty=0, min_decoding_length=1, use_vmap=False, return_attention=False)

        itemResult = ''

        for itemStr in results:
            item = itemStr[0]['tokens']

            if sp=='out' or sp=='inout':
                itemResult = s.DecodePieces(item)
            else:
                itemResult = str(''.join(item))

            # print(result)

        resultHtml = json.dumps([{"tgt": itemResult}], ensure_ascii=False)
    except Exception as e:
        resultHtml = json.dumps(({"error": 1, "message": str(e)}), ensure_ascii=False)

    return resultHtml, 200

server = WSGIServer((args.ip, args.port), app)
print('Server ready!')

server.serve_forever()

When I make a lot of requests, I get this error:

terminate called after throwing an instance of 'std::runtime_error'
what(): /root/ctranslate2-dev/src/ops/layer_norm_gpu.cu:32: cuDNN failed with status CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)

https://github.com/OpenNMT/CTranslate2/blob/master/src/ops/layer_norm_gpu.cu

FP16 support

We should support FP16 execution on compatible GPUs.

Use int64_t for dimension values

Dimensions are currently represented with size_t. There are at least 2 issues with that:

  • platform-dependent size
  • negative values are sometimes useful:
    • for loops converging to 0
    • -1 support in reshape (to avoid explicitly setting a value for one dimension; see the example below)
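For example, the familiar -1 reshape convention relies on a signed dimension value (NumPy shown here only to illustrate the semantics):

import numpy as np

x = np.zeros((2, 3, 4), dtype=np.float32)

# -1 tells the library to infer one dimension from the total size,
# which requires dimensions to be signed.
y = x.reshape(6, -1)
print(y.shape)  # (6, 4)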

Support execution without Intel MKL

Intel MKL is currently required to use the project on CPU. However, it is not always a good fit, especially on non-Intel hardware. It is likely that MKL checks the CPU vendor ID before activating some fast execution paths.

See for example this performance analysis on AMD Epyc where Intel MKL has poor results.

1. Integrate an alternative GEMM

The main requirements are:

  • multi-threading support (ideally with OpenMP)
  • runtime dispatch to architecture-specific code (ideally including AMD and ARM)
  • bonus: integer-based GEMM

BLIS appears to be a good candidate.

2. Dynamically select a GEMM backend

We should consider compiling with multiple backends and selecting one at runtime (e.g. call Intel MKL on GenuineIntel CPUs, otherwise call BLIS).

3. (optional) Integrate an alternative caching allocator

We also rely on MKL to provide a caching allocator via mkl_malloc and mkl_free. We should measure the performance cost of disabling those and possibly find alternatives.

Read model and vocabs from memory

Hi @guillaumekln

I would like to load the model file from memory (in a std::vector<unsigned char>) but I think it's not possible as all related methods use at some point the model directory as an std::string. I can see the necessity in this, as the vocabularies and the vmap are also loaded from this directory.

Still, do you think there could be a use case (apart from mine obviously :)) for some overrides with arguments that will accept std::strings pointing directly to the model and the vocabularies?

compilation error with MKL-DNN

Hi, I'm trying to compile with MKL-DNN, but the following error occurs:
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc: In static member function 'static void ctranslate2::primitives::gemm(const In*, const In*, bool, bool, size_t, size_t, size_t, float, float, Out*) [with In = signed char; Out = int; ctranslate2::Device D = (ctranslate2::Device)0; size_t = long unsigned int]':
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'const char*' to 'char' [-fpermissive]
c, &ldc, &co),
^
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'const char*' to 'char' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'const char*' to 'char' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'int*' to 'dnnl_dim_t {aka long int}' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'int*' to 'dnnl_dim_t {aka long int}' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'int*' to 'dnnl_dim_t {aka long int}' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: cannot convert 'float*' to 'float' for argument '7' to 'dnnl_status_t dnnl_gemm_s8s8s32(char, char, char, dnnl_dim_t, dnnl_dim_t, dnnl_dim_t, float, const int8_t*, dnnl_dim_t, int8_t, const int8_t*, dnnl_dim_t, int8_t, float, int32_t*, dnnl_dim_t, const int32_t*)'
CMakeFiles/ctranslate2.dir/build.make:758: recipe for target 'CMakeFiles/ctranslate2.dir/src/primitives/cpu.cc.o' failed
make[2]: *** [CMakeFiles/ctranslate2.dir/src/primitives/cpu.cc.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/ctranslate2.dir/all' failed
make[1]: *** [CMakeFiles/ctranslate2.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

The relevant versions are:
MKL > 2019.5
MKL-DNN: 1.1.1

ARM support

It would be nice to provide efficient execution on ARM. This architecture is widespread on mobile devices and will be used for future Apple Mac CPUs. AWS also provides instances based on ARM.

To do:

  1. Figure out what is required to cross-compile to ARM.
  2. Look into the ARM Compute Library which has GEMM primitives optimized for ARM NEON.
  3. Add ARM NEON vectorization for CPU kernels, and update automatic ISA dispatch accordingly.

cuda memory leak with python api?

I trained a model whose size is about 460M. About 669M of CUDA memory was allocated to this model when it was loaded into the Python environment:
import ctranslate2
translator = ctranslate2.Translator("/data/ende_ctranslate2/", device="cuda")
My first question is: why does the loaded model occupy much more memory than the model size?

When I tried to translate the first batch of sentences:
translator.translate_batch([["▁H", "ello", "▁world", "!"]])
the CUDA memory occupied by this model gradually increased, suddenly reached about 2600M, and finally fell back to about 800M. I would really like to know what happens during this period, as this behavior often causes CUDA out-of-memory errors in my other programs running on the same GPU.

Besides, when I translate some longer sentences, the memory occupied by this model keeps increasing and never decreases to the previous size. This is quite abnormal, and I wonder whether these phenomena are caused by a memory leak? Thanks.

compiled client doesn't work as expected in Windows

So I managed to compile everything with MSVC but I can't figure out why the client doesn't translate as expected. With short sentences containing only a few words (~10), it seems to be working fine. With longer sentences, I get very short, truncated, and irrelevant translations or just a single irrelevant word. Under OS X, it works wonderfully, no matter the length of the sentence. In both systems I'm using the same converted tf model and the same sentencepiece model.
The only weird thing I can notice is that the special underscore character from sentencepiece in shared_vocabulary.txt has encoding issues under Windows and appears as an empty box.

Implement GPU TopK without TensorRT

We should look into implementing the TopK layer with a custom CUDA kernel instead of using TensorRT. The motivation is to remove the TensorRT and cuDNN dependencies (cuDNN is a dependency of TensorRT).

The benefits are:

  • make it easier to build Python wheels with GPU support (cuBLAS would be the only external NVIDIA dependency);
  • reduce the total installation size.
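For context, the operation such a kernel has to reproduce is simply a per-row top-k over the beam scores; here is a NumPy sketch of the expected semantics (shapes and names are illustrative, not the CTranslate2 implementation):

import numpy as np

def topk(scores, k):
    """Return the k largest values and their indices per row, in descending order."""
    idx = np.argpartition(-scores, k - 1, axis=-1)[..., :k]   # unordered top-k indices
    part = np.take_along_axis(scores, idx, axis=-1)
    order = np.argsort(-part, axis=-1)                        # sort the k candidates
    return np.take_along_axis(part, order, axis=-1), np.take_along_axis(idx, order, axis=-1)

beam_scores = np.random.rand(8, 32000).astype(np.float32)  # (batch x beam, vocab)
values, indices = topk(beam_scores, k=5)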

Plans to support model trained in fairseq

Can you please support models trained in fairseq? Or, since it is also PyTorch, can such a model be imported for inference and quantization?

Also, are the model sizes those of transformer_big? If it were transformer_base it would be around half the size.
Please consider distilling the model into a smaller model; that would help with inference speed and size.

Moving model/translator object between devices

I've started making adaptations to the OpenNMT-py rest server to allow the use of CTranslate2 models.
I'm thinking of some wrapping object in onmt.translate.translation_server, that would provide a similar API to onmt.translate.translator:

class CTranslate2Translator(object):
    """
    This should reproduce the onmt.translate.translator API.
    """

    def __init__(self, model_path, device, device_index, beam_size, n_best):
        import ctranslate2
        self.translator = ctranslate2.Translator(
            model_path,
            device=device,
            device_index=device_index,
            inter_threads=1,
            intra_threads=1,
            compute_type="default")
        self.beam_size = beam_size
        self.n_best = n_best

    def translate(self, texts_to_translate, batch_size=8):
        batch = [item.split(" ") for item in texts_to_translate]
        print(batch)
        preds = self.translator.translate_batch(
            batch,
            beam_size=self.beam_size,
            num_hypotheses=self.n_best
        )
        scores = [[item["score"] for item in ex] for ex in preds]
        predictions = [[" ".join(item["tokens"]) for item in ex] for ex in preds]
        return scores, predictions

This works fine for the translation API part.
The only remaining issue is that there is some logic in the server that requires models to move back and forth between CPU and CUDA (to_cpu / to_gpu methods that call .to(device) on the model).
Is this something we could easily add in the ctranslate2.Translator API?
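As a stopgap until something like this exists in ctranslate2.Translator, the wrapper above could emulate to_cpu / to_gpu by dropping and rebuilding the translator (a sketch only; it assumes __init__ also stores model_path and device_index, and rebuilding reloads the model from disk, so it is slower than a real device move):

    def to_cpu(self):
        # No in-place device move in the API: release the GPU translator and
        # reload the converted model on CPU.
        del self.translator
        self.translator = ctranslate2.Translator(self.model_path, device="cpu")

    def to_gpu(self):
        del self.translator
        self.translator = ctranslate2.Translator(
            self.model_path, device="cuda", device_index=self.device_index)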

Invalid resource handle when deleting ctranslate2.Translator

Hi @guillaumekln
There seems to be an issue when deleting a model from a device other than the 0th one.

import ctranslate2
translator = ctranslate2.Translator(
    "enes_general_medium_ctranslate2",
    device="cuda",
    device_index=0)
del translator

--> OK

import ctranslate2
translator = ctranslate2.Translator(
    "enes_general_medium_ctranslate2",
    device="cuda",
    device_index=1)
del translator

--> ERROR

terminate called after throwing an instance of 'std::runtime_error'
  what():  /root/ctranslate2-dev/src/primitives/cuda.cu:72: CUDA failed with error invalid resource handle
Aborted (core dumped)

(Inference works fine though, it's only when deleting the object that it fails.)

EDIT: This also happens when using the cli entrypoint ctranslate2/bin/translate.

Conversion breaks in some shared parameters setups.

Hey @guillaumekln

If we take a shared embeddings setup between encoder and decoder for instance, some aliases are made here:

def _alias_variables(self):
    """Find duplicate variables in spec and create aliases."""
    # When a variable is duplicated, keep the version that comes first in
    # the alphabetical order and alias the others.
    variables = self.variables(ordered=True)
    for name, value in reversed(variables):
        for other_name, other_value in variables:
            if name == other_name:
                break
            # Because variables can be transformed on load (e.g. transposed),
            # we use an element-wise equality check.
            if value.dtype == other_value.dtype and np.array_equal(value, other_value):
                # Replace variable value by the alias name.
                scope, attr_name = _parent_scope(name)
                spec = index_spec(self, scope)
                setattr(spec, attr_name, other_name)
                break

which is called when .validate() is called.

Here, we .validate() before getting the vocabulary sizes:

model_spec.validate()
self._check_vocabulary_size("source", src_vocab, model_spec.source_vocabulary_size)
self._check_vocabulary_size("target", tgt_vocab, model_spec.target_vocabulary_size)

But, these {source,target}_vocabulary_size property/methods do not handle aliases:

@property
def source_vocabulary_size(self):
    return self.encoder.embeddings.weight.shape[0]

@property
def target_vocabulary_size(self):
    return self.decoder.embeddings.weight.shape[0]

--->

MODEL_SPEC AFTER VALIDATE {'weight': 'decoder/embeddings/weight', 'multiply_by_sqrt_depth': 'decoder/embeddings/multiply_by_sqrt_depth'}
Traceback (most recent call last):
  File "/home/moses/CTranslate2/env_onmt/bin/onmt_release_model", line 8, in <module>
    sys.exit(main())
  File "/home/moses/CTranslate2/env_onmt/lib/python3.6/site-packages/onmt/bin/release_model.py", line 52, in main
    converter.convert(opt.output, model_spec, force=True)
  File "/home/moses/CTranslate2/env_onmt/lib/python3.6/site-packages/ctranslate2/converters/converter.py", line 74, in convert
    self._check_vocabulary_size("source", src_vocab, model_spec.source_vocabulary_size)
  File "/home/moses/CTranslate2/env_onmt/lib/python3.6/site-packages/ctranslate2/specs/transformer_spec.py", line 32, in source_vocabulary_size
    return self.encoder.embeddings.weight.shape[0]
AttributeError: 'str' object has no attribute 'shape'

Am I missing something here?

The Demo in the ReadMe doesn't work.

When I run the demo from the README, I get an error:

Traceback (most recent call last):
  File "/root/miniconda3/bin/ct2-opennmt-py-converter", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/bin/opennmt_py_converter.py", line 11, in main
    converters.OpenNMTPyConverter(args.model_path).convert_from_args(args)
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/converters/converter.py", line 39, in convert_from_args
    force=args.force)
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/converters/converter.py", line 53, in convert
    src_vocab, tgt_vocab = self._load(model_spec)
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/converters/opennmt_py.py", line 22, in _load
    checkpoint = torch.load(self._model_path, map_location="cpu")
  File "/root/miniconda3/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
  File "/root/miniconda3/lib/python3.7/site-packages/torchtext/vocab.py", line 119, in __setstate__
    if state['unk_index'] is None:
KeyError: 'unk_index'

The version of torch is 1.4.0 and ctranslate2 is 1.5.1 on my development machine. When I add 'unk_index' not in state or to the condition in "/root/miniconda3/lib/python3.7/site-packages/torchtext/vocab.py:199", this test passes.

Query int8 support on GPU once

Checking int8 support currently involves creating and destroying a TensorRT builder. This is expensive. To avoid this overhead in future calls, we could cache the result.

Approach: use std::call_once and store the result in a static variable.

compilation needs the <algorithm> header for std::max with MSVC

Hi @guillaumekln,

I was trying to compile under Visual Studio 2019 and I got an error that 'max': is not a member of 'std' in layer_norm_cpu.cc (line 30). Adding the <algorithm> header does the trick. After a bit of searching it seems this is because some Windows headers (WinDef.h) define their own macros for max and min.
Maybe it would be better to fix this in the CMakeLists.txt instead of adding the header just for Windows, so I tried adding a block

if(MSVC)
  add_definitions(-D_USE_MATH_DEFINES)
  add_definitions(-DNOMINMAX)
endif()

but it doesn't work. To be more specific, the error disappears, but the build does not fully succeed and no libraries are created.

The example of converting opennmt-tf model does not work.

The script in (Quickstart -> 2. Convert a model) fails.

$ ct2-opennmt-tf-converter --model_path averaged-ende-export500k-v2 --model_spec TransformerBase --output_dir ende_ctranslate2 --force

...

File ".local/lib/python3.6/site-packages/ctranslate2/bin/opennmt_tf_converter.py", line 19, in main
tgt_vocab=args.tgt_vocab).convert_from_args(args)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/converter.py", line 40, in convert_from_args
force=args.force)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/converter.py", line 52, in convert
src_vocab, tgt_vocab = self._load(model_spec)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_tf.py", line 126, in _load
tgt_vocab=self._tgt_vocab)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_tf.py", line 66, in load_model
src_vocab = _get_asset_path(imported.examples_inputter.features_inputter)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_tf.py", line 51, in _get_asset_path
asset = getattr(lookup_table._initializer, "_filename", None)
AttributeError: '_RestoredResource' object has no attribute '_initializer'

error during model conversion

On OS X Catalina, now I get this error when I try to convert a model:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/bin/opennmt_tf_converter.py", line 23, in <module>
    main()
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/bin/opennmt_tf_converter.py", line 19, in main
    tgt_vocab=args.tgt_vocab).convert_from_args(args)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/converter.py", line 39, in convert_from_args
    force=args.force)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/converter.py", line 53, in convert
    src_vocab, tgt_vocab = self._load(model_spec)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/opennmt_tf.py", line 107, in _load
    tgt_vocab=self._tgt_vocab)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/opennmt_tf.py", line 57, in load_model
    src_vocab = _get_asset_path(imported.examples_inputter.features_inputter)
AttributeError: 'AutoTrackable' object has no attribute 'examples_inputter'

OpenNMT-tf 2.0 supported?

I trained a Transformer model using OpenNMT-tf 2.0. The converter ran well but the translation result became weird. Does CTranslate2 support OpenNMT-tf 2.0?
Here are versions:
OpenNMT-tf == 2.3.0
tensorflow-gpu == 2.0.0

Proper configuration for server

Hi,
I've been digging around for a while in the code integration but it is not clear to me which arguments are necessary. I guess "model" and "ct2_model" are not required at the same time...
Thanks

Placing a Translator on GPU N > 0 allocates memory on GPU 0

The code below will allocate some memory on GPU 0 even if the Translator is placed on another device:

import ctranslate2
translator = ctranslate2.Translator("ende_transformer", device="cuda", device_index=1)

Ideally, it should only allocate on GPU 1.
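Until this is fixed, a common CUDA-level workaround (not specific to CTranslate2, and only applicable when the process needs a single GPU) is to restrict device visibility before CUDA is initialized, so the target GPU is exposed as device 0:

import os

# Make only physical GPU 1 visible; it is then seen as device 0 by CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import ctranslate2

translator = ctranslate2.Translator("ende_transformer", device="cuda", device_index=0)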

Limit work queue size when translating large files

The current TranslatorPool implementation is using a producer/consumer approach. The producer reads batches from the file and pushes them in a queue. Each consumer dequeues a batch and translates it.

As reading batches is commonly much faster than translating, batches quickly pile up in the work queue. This increases memory usage, especially when translating large files.

A basic fix is to limit the queue size. If the maximum size is reached, the producer should wait and be notified when a consumer dequeues a batch.
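The fix itself belongs in the C++ TranslatorPool, but the pattern is the classic bounded producer/consumer queue; in Python terms, for illustration only:

import queue
import threading

work = queue.Queue(maxsize=32)  # bounded: put() blocks once 32 batches are pending

def producer(batches):
    for batch in batches:
        work.put(batch)   # waits for a consumer to free a slot when the queue is full
    work.put(None)        # sentinel to stop the consumer

def consumer(translate_fn):
    while True:
        batch = work.get()
        if batch is None:
            break
        translate_fn(batch)

consumer_thread = threading.Thread(target=consumer, args=(print,))
consumer_thread.start()
producer([["▁Hello", "▁world", "!"]] * 100)
consumer_thread.join()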

Link error/warning in OS X with --start-group and --end-group

The linker in OS X (LLVM 10) doesn't understand the --start-group and --end-group linking options. When building with Apple's default toolset, removing these options allows building the project, although with a ton of warnings due to linking order, particularly related to boost::program_options. At least it builds and runs fine, as far as I have tested it.
If I change the compiler to gcc-9, it won't link at all.
I tried but I couldn't find a solution (maybe ordering the libraries manually?)

Improve int8 quantization performance on GPU

The current quantization code is based on thrust::reduce_by_key to get the absolute maximum of each row. However, this approach appears to be very slow in this context. It should be improved for better INT8 performance on GPU.

$ ./tests/benchmark_ops quantize cuda int8
benchmarking quantize_op(x, y, scale)
avg   0.186348 ms

$ ./tests/benchmark_ops quantize cpu int8
benchmarking quantize_op(x, y, scale)
avg   0.0024638 ms
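For reference, the row-wise absolute-maximum scaling the quantize op has to compute looks roughly like this in NumPy (assuming symmetric per-row int8 quantization; the exact scaling details in CTranslate2 may differ), which can serve as a correctness baseline for a replacement kernel:

import numpy as np

def quantize_int8(x):
    """Symmetric per-row int8 quantization: y = round(x * scale)."""
    amax = np.abs(x).max(axis=-1, keepdims=True)          # row-wise absolute maximum
    scale = 127.0 / np.maximum(amax, 1e-9)                # avoid division by zero
    y = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return y, scale.squeeze(-1)

x = np.random.randn(4, 512).astype(np.float32)
y, scale = quantize_int8(x)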

OpenNMT-py model conversion failed because of KeyError

I'm trying to convert OpenNMT-py model to CTranslate2 format, but it fails because of KeyError. The model that I'm trying to convert is available here (it is named paracrawl.pt but it was renamed during uploading).


When I try to run conversion:

ct2-opennmt-py-converter --model_path paracrawl.pt --model_spec TransformerBase --output_dir paracrawl

It fails with KeyError:

Traceback (most recent call last):
  File "/usr/local/bin/ct2-opennmt-py-converter", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/bin/opennmt_py_converter.py", line 11, in main
    converters.OpenNMTPyConverter(args.model_path).convert_from_args(args)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 35, in convert_from_args
    return self.convert(
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 52, in convert
    src_vocab, tgt_vocab = self._load(model_spec)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 27, in _load
    set_transformer_spec(model_spec, variables)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 39, in set_transformer_spec
    set_transformer_encoder(spec.encoder, variables, relative=spec.with_relative_position)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 43, in set_transformer_encoder
    set_input_layers(spec, variables, "encoder", relative=relative)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 59, in set_input_layers
    set_position_encodings(
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 136, in set_position_encodings
    spec.encodings = _get_variable(variables, "%s.pe" % scope).squeeze()
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 141, in _get_variable
    return variables[name].numpy()
KeyError: 'encoder.embeddings.make_embedding.pe.pe'

I'm using Python 3.8 on my custom python:buster Docker image with these Python packages installed:

Package              Version
-------------------- ----------
absl-py              0.9.0
cachetools           4.0.0
certifi              2019.11.28
chardet              3.0.4
click                7.1.1
ConfigArgParse       1.0
ctranslate2          1.8.0
Flask                1.1.1
future               0.18.2
google-auth          1.11.3
google-auth-oauthlib 0.4.1
grpcio               1.27.2
idna                 2.9
itsdangerous         1.1.0
Jinja2               2.11.1
Markdown             3.2.1
MarkupSafe           1.1.1
numpy                1.18.1
oauthlib             3.1.0
OpenNMT-py           1.0.2
pip                  19.3.1
protobuf             3.11.3
pyasn1               0.4.8
pyasn1-modules       0.2.8
pyonmttok            1.18.3
requests             2.23.0
requests-oauthlib    1.3.0
rsa                  4.0
setuptools           41.6.0
six                  1.14.0
tensorboard          2.1.1
torch                1.4.0
torchtext            0.4.0
tqdm                 4.30.0
urllib3              1.25.8
waitress             1.4.3
Werkzeug             1.0.0
wheel                0.33.6
