opennmt / ctranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

License: MIT License

CMake 0.84% C++ 80.67% Shell 0.41% C 0.05% Python 12.22% Cuda 5.49% Dockerfile 0.31%
neural-machine-translation cpp mkl quantization cuda thrust opennmt deep-neural-networks openmp onednn

ctranslate2's Introduction

This project is considered obsolete as the Torch framework is no longer maintained. If you are starting a new project, please use an alternative in the OpenNMT family: OpenNMT-tf (TensorFlow) or OpenNMT-py (PyTorch) depending on your requirements.


OpenNMT: Open-Source Neural Machine Translation

OpenNMT is a full-featured, open-source (MIT) neural machine translation system utilizing the Torch mathematical toolkit.

The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art translation accuracy. Features include:

  • Speed and memory optimizations for high-performance GPU training.
  • Simple general-purpose interface, requiring only source/target data files.
  • C++ implementation of the translator for easy deployment.
  • Extensions to allow other sequence generation tasks such as summarization and image captioning.

Installation

OpenNMT only requires a Torch installation with a few dependencies.

  1. Install Torch
  2. Install additional packages:
luarocks install tds
luarocks install bit32 # if using LuaJIT

For other installation methods including Docker, visit the documentation.

Quickstart

OpenNMT consists of three commands:

  1. Preprocess the data.
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
  2. Train the model.
th train.lua -data data/demo-train.t7 -save_model model
  3. Translate sentences.
th translate.lua -model model_final.t7 -src data/src-test.txt -output pred.txt

For more details, visit the documentation.

Citation

A technical report on OpenNMT is available. If you use the system for academic work, please cite:

@ARTICLE{2017opennmt,
  author = {{Klein}, G. and {Kim}, Y. and {Deng}, Y. and {Senellart}, J. and {Rush}, A.~M.},
  title = "{OpenNMT: Open-Source Toolkit for Neural Machine Translation}",
  journal = {ArXiv e-prints},
  eprint = {1701.02810}
}

Acknowledgments

Our implementation utilizes code from the following:

Additional resources

ctranslate2's People

Contributors

amrrs, anterart, brightxiaohan, cdockes, chengduozh, chiiyeh, clementchouteau, dependabot[bot], ebraraktas, flyingleafe, funboarder13920, guillaumekln, homink, jgcb00, jhnwnd, jordimas, keichi, michaelfeil, minhthuc2502, natsegal, panosk, raphaelmerx, scotfang, sebastianbodza, shas3011, vadi2, vakkov, vince62s, yc-wang00, zxdvd


ctranslate2's Issues

The example of converting opennmt-py model does not work.

The script in (Quickstart -> 2. Convert a model) fails.

pip install OpenNMT-py

wget https://s3.amazonaws.com/opennmt-models/transformer-ende-wmt-pyOnmt.tar.gz
tar xf transformer-ende-wmt-pyOnmt.tar.gz

ct2-opennmt-py-converter --model_path averaged-10-epoch.pt --model_spec TransformerBase \
    --output_dir ende_ctranslate2
Traceback (most recent call last):
  File "/mnt/f/python-venv/onmt/bin/ct2-opennmt-py-converter", line 8, in <module>
    sys.exit(main())
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/bin/opennmt_py_converter.py", line 11, in main      converters.OpenNMTPyConverter(args.model_path).convert_from_args(args)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/converters/converter.py", line 40, in convert_from_args
    force=args.force)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/converters/converter.py", line 52, in convert       src_vocab, tgt_vocab = self._load(model_spec)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/ctranslate2/converters/opennmt_py.py", line 22, in _load        checkpoint = torch.load(self._model_path, map_location="cpu")
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
  File "/mnt/f/python-venv/onmt/lib/python3.5/site-packages/torchtext/vocab.py", line 119, in __setstate__
    if state['unk_index'] is None:
KeyError: 'unk_index'

How to install Ctranslate2 without Docker

Hi,
I'd like to install the CTranslate2 module without using Docker. Is it possible?
Are there any scripts for this? I've tried generating a shell script from the dockerfile but it gives me some errors.
Thanks

Statically link to Intel MKL

We currently generate a custom shared library for Intel MKL. Instead, we should consider statically linking against it.

Pros:

  • no need to move the custom library around
  • allow linking to the GNU OpenMP library gomp instead of iomp5

Cons:

  • larger binary size (some symbols could be duplicated in libctranslate2.so and libmkldnn.so if the latter also statically links against MKL)

Save a data type identifier in converted models

When loading a model variable, the code currently deduces the data type from the size in bytes of one item (typically: if itemsize == 4 then float32). This is a weak test. We should instead save an identifier that unambiguously defines the data type (a rough sketch follows the field lists below).

Current fields:

  • item_size
  • data_size

Suggested fields:

  • dtype_id
  • nbytes
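As a rough illustration of the suggested layout (not the actual CTranslate2 serialization code; the DTYPE_IDS table and the exact field order below are assumptions), a converter could write an explicit dtype identifier next to the raw bytes so the loader never has to guess from the item size:

import struct

import numpy as np

# Hypothetical mapping from dtype to a stable integer identifier.
DTYPE_IDS = {np.dtype("float32"): 0, np.dtype("int16"): 1, np.dtype("int8"): 2}

def write_variable(fp, array):
    """Write a variable as: dtype_id, nbytes, raw data."""
    dtype_id = DTYPE_IDS[array.dtype]
    data = array.tobytes()
    fp.write(struct.pack("<BQ", dtype_id, len(data)))  # 1-byte dtype_id + 8-byte nbytes
    fp.write(data)

def read_variable(fp, shape):
    """Read a variable back using the explicit dtype identifier."""
    dtype_id, nbytes = struct.unpack("<BQ", fp.read(9))
    dtype = next(dt for dt, i in DTYPE_IDS.items() if i == dtype_id)
    return np.frombuffer(fp.read(nbytes), dtype=dtype).reshape(shape)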

Dynamic loading of NVIDIA libraries

We should investigate the dynamic loading of NVIDIA libraries. This would be helpful to publish a ctranslate2 Python package that is compatible with both CPU and GPU while allowing execution on a CPU-only system.

If that proves to be too complex, we might need to publish a separate package for GPU support.
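For reference, the general idea looks like this (a minimal Python sketch using ctypes; the library names and error handling here are assumptions, and the real implementation would more likely use dlopen from the C++ code):

import ctypes

def nvidia_libraries_available():
    """Try to load the NVIDIA runtime libraries; return False on a CPU-only system."""
    for name in ("libcudart.so", "libcublas.so"):  # assumed library names
        try:
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            return False
    return True

print("GPU support available:", nvidia_libraries_available())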

Are there any plans for TransformerAAN?

Hi,

I was trying to use TransformerAAN to train a translation model, but I found that CTranslate2 does not support TransformerAAN for now.
Are there any plans for this kind of architecture?
Many thanks.

Regards

pip install ctranslate2: no package found on macOS

Hi,

Running pip install ctranslate2 with the latest pip as per the installation instructions results in the following:

ERROR: Could not find a version that satisfies the requirement ctranslate2== (from versions: none)
ERROR: No matching distribution found for ctranslate2==
> pip --version
pip 20.0.2 from [...]/lib/python3.8/site-packages/pip (python 3.8)
> conda --version
conda 4.7.12

This is on macOS Mojave 10.14.6 (18G2022)

python3 docker image?

All Docker images on Docker Hub use a Python 2 environment. What should I do if I want to build a Docker image that includes a Python 3 environment?

suggestion to add function for changing models without deleting/creating new translators

Hi @guillaumekln ,

As far as I can see, if we create an instance of Translator, we can't change the model without destroying the object and creating a new one, as the model can only be defined in the constructor (unless I missed something). Wouldn't it make sense to have a function to change the current model? Even if deleting and making new translators is trivial, IMO it would improve the already excellent interface. If this makes sense, I could work on that soon, when I have some time.
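In the meantime, a small wrapper along these lines can hide the delete-and-recreate step behind a set_model call (a sketch only, not part of the CTranslate2 API; the class and method names are made up):

import ctranslate2

class SwappableTranslator:
    """Sketch of a wrapper that lets callers change the model path.

    Internally it still destroys and recreates the ctranslate2.Translator,
    since the model can only be set in the constructor.
    """

    def __init__(self, model_path, **kwargs):
        self._kwargs = kwargs
        self._translator = ctranslate2.Translator(model_path, **kwargs)

    def set_model(self, model_path):
        del self._translator  # release the current model before loading the new one
        self._translator = ctranslate2.Translator(model_path, **self._kwargs)

    def translate_batch(self, batch, **options):
        return self._translator.translate_batch(batch, **options)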

Support ONNX graphs

This is a general issue to discuss and track ONNX support.

The current limitation of the project is that only weights are extracted from pretrained models and the computation graph is redefined in the code itself. This could be mitigated by loading and executing ONNX graphs.

Better catch of CUDA OOMs

Hi,
While using the python ctranslate2.Translator API, it seems that an OOM can cause the whole python session to crash.

>>> import ctranslate2
>>> translator = ctranslate2.Translator("ende_ctranslate2/")
>>> translator.translate_batch([["a"]*20000]) # very long dummy batch to force OOM for reproducibility
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to allocate memory
Aborted (core dumped)

Would it be possible to better catch such exceptions so that we can handle them on the Python side (for example as in the sketch below)?
Thanks!
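If the C++ code raised the error instead of calling std::terminate, the caller could then recover along these lines (a sketch of the desired Python-side behavior only; today the process aborts before this except clause is reached, and the exact exception type and message are assumptions):

import ctranslate2

translator = ctranslate2.Translator("ende_ctranslate2/", device="cuda")

def translate_with_fallback(batch):
    """Retry with smaller sub-batches if the GPU runs out of memory."""
    try:
        return translator.translate_batch(batch)
    except RuntimeError as e:
        if "allocate" not in str(e) or len(batch) == 1:
            raise
        mid = len(batch) // 2
        return translate_with_fallback(batch[:mid]) + translate_with_fallback(batch[mid:])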

segmentation fault at the end of a translation when using TensorRT v6.0.1

Same system configuration with TensorRT v5.1.5 does not have this issue.
I am using Ubuntu 18.04, and other than these two things, am using the same configuration as in the Centos7-gpu Docker file.

Note there are warnings of deprecated nvinfer function use when building.

gdb output:

[Switching to Thread 0x7fffc68d3700 (LWP 3773)]
0x00007fffe339d604 in nvinfer1::rt::SafeExecutionContext::~SafeExecutionContext() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.6
(gdb) bt
#0  0x00007fffe339d604 in nvinfer1::rt::SafeExecutionContext::~SafeExecutionContext() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.6
#1  0x00007fffe31b5449 in nvinfer1::rt::ExecutionContext::~ExecutionContext() () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.6
#2  0x00007ffff79093e8 in ctranslate2::cuda::TensorRTLayer::clear (this=0x7fffc68d3438) at /home/ubuntu/CTranslate2/src/cuda/utils.cc:189
#3  0x00007ffff790923c in ctranslate2::cuda::TensorRTLayer::~TensorRTLayer (this=0x7fffc68d3438, __in_chrg=<optimized out>) at /home/ubuntu/CTranslate2/src/cuda/utils.cc:165
#4  0x00007ffff79bf604 in ctranslate2::ops::TopKLayer::~TopKLayer (this=0x7fffc68d3438, __in_chrg=<optimized out>) at /home/ubuntu/CTranslate2/src/ops/topk_gpu.cu:8
#5  0x00007ffff6b1c8af in __GI___call_tls_dtors () at cxa_thread_atexit_impl.c:155
#6  0x00007ffff74726e9 in start_thread (arg=0x7fffc68d3700) at pthread_create.c:470
#7  0x00007ffff6bfa88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Release Python package on PyPI

This would make the installation easier for users but this could make the packaging more complex, especially for GPU support.

This issue is to track progress on this front.

Much slower with CUDA 10.1 than with 10.0

I haven't tested this extensively but a small test seems to indicate slower times when using CUDA 10.1 (Update 2 - i.e., latest) vs CUDA 10.0 (as is used in the Docker file). It's around 1.5 times slower. Have you tried using CUDA 10.1 and have you seen similar results?

CTranslate2 layer_norm_gpu.cu:32: cuDNN failed with status CUDNN_STATUS_BAD_PARAM

# Imports and setup reconstructed for readability; the original report starts mid-script.
import json

import ctranslate2
import sentencepiece as spm
from flask import Flask, request
from gevent.pywsgi import WSGIServer

app = Flask(__name__)

# `args` (with .ip/.port), `sp`, `modelPath` and `translator` (a ctranslate2.Translator)
# are defined earlier in the user's script and are not shown here.

if sp=='in' or sp=='out' or sp=='inout':
    s = spm.SentencePieceProcessor()
    s.Load(modelPath + 'all.en.shuffled.filtered.spiece.model')
@app.route('/translate', methods=['Post'])
def trans():
    try:
        line = request.values.get('src')
        
        if sp=='in' or sp=='inout':
            sentence = s.EncodeAsPieces(line)
        else:
            sentence = list(line)

        results = translator.translate_batch([sentence], beam_size=1, max_decoding_length=250, num_hypotheses=1, length_penalty=0, min_decoding_length=1, use_vmap=False, return_attention=False)

        itemResult = ''

        for itemStr in results:
            item = itemStr[0]['tokens']

            if sp=='out' or sp=='inout':
                itemResult = s.DecodePieces(item)
            else:
                itemResult = str(''.join(item))

            # print(result)

        resultHtml = json.dumps([{"tgt": itemResult}], ensure_ascii=False)
    except Exception as e:
        resultHtml = json.dumps(({"error": 1, "message": str(e)}), ensure_ascii=False)

    return resultHtml, 200

server = WSGIServer((args.ip, args.port), app)
print('Server ready!')

server.serve_forever()

When I make a lot of requests, I get this error:

terminate called after throwing an instance of 'std::runtime_error'
what(): /root/ctranslate2-dev/src/ops/layer_norm_gpu.cu:32: cuDNN failed with status CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)

https://github.com/OpenNMT/CTranslate2/blob/master/src/ops/layer_norm_gpu.cu

FP16 support

We should support FP16 execution on compatible GPUs.

Use int64_t for dimension values

Dimensions are currently represented with size_t. There are at least 2 issues with that:

  • platform-dependent size
  • negative values are sometimes useful:
    • for loops converging to 0
    • -1 support in reshape (to avoid explicitly setting a value for one dimension; see the example below)
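For example, the familiar -1 reshape convention relies on a signed dimension value (NumPy shown here only to illustrate the semantics):

import numpy as np

x = np.zeros((2, 3, 4), dtype=np.float32)

# -1 tells the library to infer one dimension from the total size,
# which requires dimensions to be signed.
y = x.reshape(6, -1)
print(y.shape)  # (6, 4)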

Support execution without Intel MKL

Intel MKL is currently required to use the project on CPU. However, it is not always a good fit, especially on non-Intel hardware. It is likely that MKL checks the CPU vendor ID before activating some fast execution paths.

See for example this performance analysis on AMD Epyc where Intel MKL has poor results.

1. Integrate an alternative GEMM

The main requirements are:

  • multi-threading support (ideally with OpenMP)
  • runtime dispatch to architecture-specific code (ideally including AMD and ARM)
  • bonus: integer-based GEMM

BLIS appears to be a good candidate.

2. Dynamically select a GEMM backend

We should consider compiling with multiple backends and selecting one at runtime (e.g. call Intel MKL on GenuineIntel CPUs, otherwise call BLIS).

3. (optional) Integrate an alternative caching allocator

We also rely on MKL to provide a caching allocator via mkl_malloc and mkl_free. We should measure the performance cost of disabling those and possibly find alternatives.

Read model and vocabs from memory

Hi @guillaumekln

I would like to load the model file from memory (in a std::vector<unsigned char>) but I think it's not possible as all related methods use at some point the model directory as an std::string. I can see the necessity in this, as the vocabularies and the vmap are also loaded from this directory.

Still, do you think there could be a use case (apart from mine obviously :)) for some overrides with arguments that will accept std::strings pointing directly to the model and the vocabularies?

compilation error with MKL-DNN

Hi, I'm trying to compile with MKL-DNN, but the following error occurs:
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc: In static member function 'static void ctranslate2::primitives::gemm(const In*, const In*, bool, bool, size_t, size_t, size_t, float, float, Out*) [with In = signed char; Out = int; ctranslate2::Device D = (ctranslate2::Device)0; size_t = long unsigned int]':
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'const char*' to 'char' [-fpermissive]
c, &ldc, &co),
^
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'const char*' to 'char' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'const char*' to 'char' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'int*' to 'dnnl_dim_t {aka long int}' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'int*' to 'dnnl_dim_t {aka long int}' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: invalid conversion from 'int*' to 'dnnl_dim_t {aka long int}' [-fpermissive]
/data/work/c-translator/CTranslate2-1.2.0/src/primitives/cpu.cc:546:39: error: cannot convert 'float*' to 'float' for argument '7' to 'dnnl_status_t dnnl_gemm_s8s8s32(char, char, char, dnnl_dim_t, dnnl_dim_t, dnnl_dim_t, float, const int8_t*, dnnl_dim_t, int8_t, const int8_t*, dnnl_dim_t, int8_t, float, int32_t*, dnnl_dim_t, const int32_t*)'
CMakeFiles/ctranslate2.dir/build.make:758: recipe for target 'CMakeFiles/ctranslate2.dir/src/primitives/cpu.cc.o' failed
make[2]: *** [CMakeFiles/ctranslate2.dir/src/primitives/cpu.cc.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/ctranslate2.dir/all' failed
make[1]: *** [CMakeFiles/ctranslate2.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

The relevant versions are:
MKL > 2019.5
MKL-DNN: 1.1.1

ARM support

It would be nice to provide efficient execution on ARM. This architecture is widespread on mobile devices and will be used for future Apple Mac CPUs. AWS also provides instances based on ARM.

To do:

  1. Figure out what is required to cross-compile to ARM.
  2. Look into the ARM Compute Library which has GEMM primitives optimized for ARM NEON.
  3. Add ARM NEON vectorization for CPU kernels, and update automatic ISA dispatch accordingly.

cuda memory leak with python api?

I trained a model whose size is about 460M. About 669M of CUDA memory was allocated to this model when it was loaded into the Python environment:
import ctranslate2
translator = ctranslate2.Translator("/data/ende_ctranslate2/", device="cuda")
My first question is: why does the loaded model occupy much more memory than the model size?

When I tried to translate the first batch of sentences:
translator.translate_batch([["▁H", "ello", "▁world", "!"]])
the CUDA memory occupied by this model gradually increased, suddenly reached about 2600M, and finally fell back to about 800M. I would really like to know what happens during this period, as this behavior often causes CUDA out-of-memory errors in my other programs running on the same GPU.

Besides, when I translate some longer sentences, the memory occupied by this model keeps increasing and never decreases to the previous size. This is quite abnormal, and I wonder whether these phenomena are caused by a memory leak? Thanks.

compiled client doesn't work as expected in Windows

So I managed to compile everything with MSVC but I can't figure out why the client doesn't translate as expected. With short sentences containing only a few words (~10), it seems to be working fine. With longer sentences, I get very short, truncated, and irrelevant translations or just a single irrelevant word. Under OS X, it works wonderfully, no matter the length of the sentence. In both systems I'm using the same converted tf model and the same sentencepiece model.
The only weird thing I can notice is that the special underscore character from sentencepiece in shared_vocabulary.txt has encoding issues under Windows and appears as an empty box.

Implement GPU TopK without TensorRT

We should look into implementing the TopK layer with a custom CUDA kernel instead of using TensorRT. The motivation is to remove the TensorRT and cuDNN dependencies (cuDNN is a dependency of TensorRT).

The benefits are:

  • make it easier to build Python wheels with GPU support (cuBLAS would be the only external NVIDIA dependency);
  • reduce the total installation size.
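For context, the operation such a kernel has to reproduce is simply a per-row top-k over the beam scores; here is a NumPy sketch of the expected semantics (shapes and names are illustrative, not the CTranslate2 implementation):

import numpy as np

def topk(scores, k):
    """Return the k largest values and their indices per row, in descending order."""
    idx = np.argpartition(-scores, k - 1, axis=-1)[..., :k]   # unordered top-k indices
    part = np.take_along_axis(scores, idx, axis=-1)
    order = np.argsort(-part, axis=-1)                        # sort the k candidates
    return np.take_along_axis(part, order, axis=-1), np.take_along_axis(idx, order, axis=-1)

beam_scores = np.random.rand(8, 32000).astype(np.float32)  # (batch x beam, vocab)
values, indices = topk(beam_scores, k=5)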

Plans to support model trained in fairseq

Can you please support models trained in fairseq? Or, since it is also PyTorch, can such a model be imported for inference and quantization?

Also, are the model sizes those of transformer_big? If it were transformer_base it would be around half the size.
Please consider distilling the model into a smaller model; that would help with inference speed and size.

Moving model/translator object between devices

I've started making adaptations to the OpenNMT-py rest server to allow the use of CTranslate2 models.
I'm thinking of some wrapping object in onmt.translate.translation_server, that would provide a similar API to onmt.translate.translator:

class CTranslate2Translator(object):
    """
    This should reproduce the onmt.translate.translator API.
    """

    def __init__(self, model_path, device, device_index, beam_size, n_best):
        import ctranslate2
        self.translator = ctranslate2.Translator(
            model_path,
            device=device,
            device_index=device_index,
            inter_threads=1,
            intra_threads=1,
            compute_type="default")
        self.beam_size = beam_size
        self.n_best = n_best

    def translate(self, texts_to_translate, batch_size=8):
        batch = [item.split(" ") for item in texts_to_translate]
        print(batch)
        preds = self.translator.translate_batch(
            batch,
            beam_size=self.beam_size,
            num_hypotheses=self.n_best
        )
        scores = [[item["score"] for item in ex] for ex in preds]
        predictions = [[" ".join(item["tokens"]) for item in ex] for ex in preds]
        return scores, predictions

This works fine for the translation API part.
The only remaining issue is that there is some logic in the server that requires models to move back and forth between CPU and CUDA (to_cpu / to_gpu methods that call .to(device) on the model).
Is this something we could easily add in the ctranslate2.Translator API?
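As a stopgap until something like this exists in ctranslate2.Translator, the wrapper above could emulate to_cpu / to_gpu by dropping and rebuilding the translator (a sketch only; it assumes __init__ also stores model_path and device_index, and rebuilding reloads the model from disk, so it is slower than a real device move):

    def to_cpu(self):
        # No in-place device move in the API: release the GPU translator and
        # reload the converted model on CPU.
        del self.translator
        self.translator = ctranslate2.Translator(self.model_path, device="cpu")

    def to_gpu(self):
        del self.translator
        self.translator = ctranslate2.Translator(
            self.model_path, device="cuda", device_index=self.device_index)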

Invalid resource handle when deleting ctranslate2.Translator

Hi @guillaumekln
There seems to be an issue when deleting a model from a device other than the 0th one.

import ctranslate2
translator = ctranslate2.Translator(
    "enes_general_medium_ctranslate2",
    device="cuda",
    device_index=0)
del translator

--> OK

import ctranslate2
translator = ctranslate2.Translator(
    "enes_general_medium_ctranslate2",
    device="cuda",
    device_index=1)
del translator

--> ERROR

terminate called after throwing an instance of 'std::runtime_error'
  what():  /root/ctranslate2-dev/src/primitives/cuda.cu:72: CUDA failed with error invalid resource handle
Aborted (core dumped)

(Inference works fine though, it's only when deleting the object that it fails.)

EDIT: This also happens when using the cli entrypoint ctranslate2/bin/translate.

Conversion breaks in some shared parameters setups.

Hey @guillaumekln

If we take a shared embeddings setup between encoder and decoder for instance, some aliases are made here:

def _alias_variables(self):
    """Find duplicate variables in spec and create aliases."""
    # When a variable is duplicated, keep the version that comes first in
    # the alphabetical order and alias the others.
    variables = self.variables(ordered=True)
    for name, value in reversed(variables):
        for other_name, other_value in variables:
            if name == other_name:
                break
            # Because variables can be transformed on load (e.g. transposed),
            # we use an element-wise equality check.
            if value.dtype == other_value.dtype and np.array_equal(value, other_value):
                # Replace variable value by the alias name.
                scope, attr_name = _parent_scope(name)
                spec = index_spec(self, scope)
                setattr(spec, attr_name, other_name)
                break

which is called when .validate() is called.

Here, we .validate() before getting the vocabulary sizes:

model_spec.validate()
self._check_vocabulary_size("source", src_vocab, model_spec.source_vocabulary_size)
self._check_vocabulary_size("target", tgt_vocab, model_spec.target_vocabulary_size)

But, these {source,target}_vocabulary_size property/methods do not handle aliases:

@property
def source_vocabulary_size(self):
    return self.encoder.embeddings.weight.shape[0]

@property
def target_vocabulary_size(self):
    return self.decoder.embeddings.weight.shape[0]

--->

MODEL_SPEC AFTER VALIDATE {'weight': 'decoder/embeddings/weight', 'multiply_by_sqrt_depth': 'decoder/embeddings/multiply_by_sqrt_depth'}
Traceback (most recent call last):
  File "/home/moses/CTranslate2/env_onmt/bin/onmt_release_model", line 8, in <module>
    sys.exit(main())
  File "/home/moses/CTranslate2/env_onmt/lib/python3.6/site-packages/onmt/bin/release_model.py", line 52, in main
    converter.convert(opt.output, model_spec, force=True)
  File "/home/moses/CTranslate2/env_onmt/lib/python3.6/site-packages/ctranslate2/converters/converter.py", line 74, in convert
    self._check_vocabulary_size("source", src_vocab, model_spec.source_vocabulary_size)
  File "/home/moses/CTranslate2/env_onmt/lib/python3.6/site-packages/ctranslate2/specs/transformer_spec.py", line 32, in source_vocabulary_size
    return self.encoder.embeddings.weight.shape[0]
AttributeError: 'str' object has no attribute 'shape'

Am I missing something here?

The Demo in the ReadMe doesn't work.

When I run the demo from the README, I get an error:

Traceback (most recent call last):
  File "/root/miniconda3/bin/ct2-opennmt-py-converter", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/bin/opennmt_py_converter.py", line 11, in main
    converters.OpenNMTPyConverter(args.model_path).convert_from_args(args)
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/converters/converter.py", line 39, in convert_from_args
    force=args.force)
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/converters/converter.py", line 53, in convert
    src_vocab, tgt_vocab = self._load(model_spec)
  File "/root/miniconda3/lib/python3.7/site-packages/ctranslate2/converters/opennmt_py.py", line 22, in _load
    checkpoint = torch.load(self._model_path, map_location="cpu")
  File "/root/miniconda3/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
  File "/root/miniconda3/lib/python3.7/site-packages/torchtext/vocab.py", line 119, in __setstate__
    if state['unk_index'] is None:
KeyError: 'unk_index'

The version of torch is 1.4.0 and ctranslate2 is 1.5.1 on my development machine. When I add 'unk_index' not in state or to the condition in "/root/miniconda3/lib/python3.7/site-packages/torchtext/vocab.py:199", this test passes.

Query int8 support on GPU once

Checking int8 support currently involves creating and destroying a TensorRT builder. This is expensive. To avoid this overhead in future calls, we could cache the result.

Approach: use std::call_once and store the result in a static variable.

compilation needs the <algorithm> header for std::max with MSVC

Hi @guillaumekln,

I was trying to compile under Visual Studio 2019 and I got an error that 'max': is not a member of 'std' in layer_norm_cpu.cc (line 30). Adding the <algorithm> header does the trick. After a bit of searching it seems this is because some Windows headers (WinDef.h) define their own macros for max and min.
Maybe it would be better to fix this in the CMakeLists.txt instead of adding the header just for Windows, so I tried adding a block

if(MSVC)
  add_definitions(-D_USE_MATH_DEFINES)
  add_definitions(-DNOMINMAX)
endif()

but it doesn't work. To be more specific, the error disappears, but the build does not fully succeed and no libraries are created.

The example of converting opennmt-tf model does not work.

The script in (Quickstart -> 2. Convert a model) fails.

$ ct2-opennmt-tf-converter --model_path averaged-ende-export500k-v2 --model_spec TransformerBase --output_dir ende_ctranslate2 --force

...

File ".local/lib/python3.6/site-packages/ctranslate2/bin/opennmt_tf_converter.py", line 19, in main
tgt_vocab=args.tgt_vocab).convert_from_args(args)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/converter.py", line 40, in convert_from_args
force=args.force)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/converter.py", line 52, in convert
src_vocab, tgt_vocab = self._load(model_spec)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_tf.py", line 126, in _load
tgt_vocab=self._tgt_vocab)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_tf.py", line 66, in load_model
src_vocab = _get_asset_path(imported.examples_inputter.features_inputter)
File ".local/lib/python3.6/site-packages/ctranslate2/converters/opennmt_tf.py", line 51, in _get_asset_path
asset = getattr(lookup_table._initializer, "_filename", None)
AttributeError: '_RestoredResource' object has no attribute '_initializer'

error during model conversion

On OS X Catalina, now I get this error when I try to convert a model:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/bin/opennmt_tf_converter.py", line 23, in <module>
    main()
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/bin/opennmt_tf_converter.py", line 19, in main
    tgt_vocab=args.tgt_vocab).convert_from_args(args)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/converter.py", line 39, in convert_from_args
    force=args.force)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/converter.py", line 53, in convert
    src_vocab, tgt_vocab = self._load(model_spec)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/opennmt_tf.py", line 107, in _load
    tgt_vocab=self._tgt_vocab)
  File "/Users/panos/Development/CTranslate2/python/ctranslate2/converters/opennmt_tf.py", line 57, in load_model
    src_vocab = _get_asset_path(imported.examples_inputter.features_inputter)
AttributeError: 'AutoTrackable' object has no attribute 'examples_inputter'

OpenNMT-tf 2.0 supported?

I trained a Transformer model using OpenNMT-tf 2.0. The converter ran well but the translation result became weird. Does CTranslate2 support OpenNMT-tf 2.0?
Here are versions:
OpenNMT-tf == 2.3.0
tensorflow-gpu == 2.0.0

Proper configuration for server

Hi,
I've been digging around for a while in the code integration but it is not clear to me which arguments are necessary. I guess "model" and "ct2_model" are not required at the same time...
Thanks

Placing a Translator on GPU N > 0 allocates memory on GPU 0

The code below will allocate some memory on GPU 0 even if the Translator is placed on another device:

import ctranslate2
translator = ctranslate2.Translator("ende_transformer", device="cuda", device_index=1)

Ideally, it should only allocate on GPU 1.
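Until this is fixed, a common CUDA-level workaround (not specific to CTranslate2, and only applicable when the process needs a single GPU) is to restrict device visibility before CUDA is initialized, so the target GPU is exposed as device 0:

import os

# Make only physical GPU 1 visible; it is then seen as device 0 by CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import ctranslate2

translator = ctranslate2.Translator("ende_transformer", device="cuda", device_index=0)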

Limit work queue size when translating large files

The current TranslatorPool implementation is using a producer/consumer approach. The producer reads batches from the file and pushes them in a queue. Each consumer dequeues a batch and translates it.

As reading batches is commonly much faster than translating, batches quickly pile up in the work queue. This increases memory usage, especially when translating large files.

A basic fix is to limit the queue size. If the maximum size is reached, the producer should wait and be notified when a consumer dequeues a batch.
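The fix itself belongs in the C++ TranslatorPool, but the pattern is the classic bounded producer/consumer queue; in Python terms, for illustration only:

import queue
import threading

work = queue.Queue(maxsize=32)  # bounded: put() blocks once 32 batches are pending

def producer(batches):
    for batch in batches:
        work.put(batch)   # waits for a consumer to free a slot when the queue is full
    work.put(None)        # sentinel to stop the consumer

def consumer(translate_fn):
    while True:
        batch = work.get()
        if batch is None:
            break
        translate_fn(batch)

consumer_thread = threading.Thread(target=consumer, args=(print,))
consumer_thread.start()
producer([["▁Hello", "▁world", "!"]] * 100)
consumer_thread.join()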

Link error/warning in OS X with --start-group and --end-group

The linker in OS X (LLVM 10) doesn't understand the --start-group and --end-group linking options. When building with Apple's default toolset, removing these options allows building the project, although with a ton of warnings due to linking order, particularly related to boost::program_options. At least it builds and runs fine, as far as I have tested it.
If I change the compiler to gcc-9, it won't link at all.
I tried but I couldn't find a solution (maybe ordering the libraries manually?)

Improve int8 quantization performance on GPU

The current quantization code is based on thrust::reduce_by_key to get the absolute maximum of each row. However, this approach appears to be very slow in this context. It should be improved for better INT8 performance on GPU.

$ ./tests/benchmark_ops quantize cuda int8
benchmarking quantize_op(x, y, scale)
avg   0.186348 ms

$ ./tests/benchmark_ops quantize cpu int8
benchmarking quantize_op(x, y, scale)
avg   0.0024638 ms
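For reference, the row-wise absolute-maximum scaling the quantize op has to compute looks roughly like this in NumPy (assuming symmetric per-row int8 quantization; the exact scaling details in CTranslate2 may differ), which can serve as a correctness baseline for a replacement kernel:

import numpy as np

def quantize_int8(x):
    """Symmetric per-row int8 quantization: y = round(x * scale)."""
    amax = np.abs(x).max(axis=-1, keepdims=True)          # row-wise absolute maximum
    scale = 127.0 / np.maximum(amax, 1e-9)                # avoid division by zero
    y = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return y, scale.squeeze(-1)

x = np.random.randn(4, 512).astype(np.float32)
y, scale = quantize_int8(x)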

OpenNMT-py model conversion failed because of KeyError

I'm trying to convert OpenNMT-py model to CTranslate2 format, but it fails because of KeyError. The model that I'm trying to convert is available here (it is named paracrawl.pt but it was renamed during uploading).


When I try to run conversion:

ct2-opennmt-py-converter --model_path paracrawl.pt --model_spec TransformerBase --output_dir paracrawl

It fails with KeyError:

Traceback (most recent call last):
  File "/usr/local/bin/ct2-opennmt-py-converter", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/bin/opennmt_py_converter.py", line 11, in main
    converters.OpenNMTPyConverter(args.model_path).convert_from_args(args)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 35, in convert_from_args
    return self.convert(
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 52, in convert
    src_vocab, tgt_vocab = self._load(model_spec)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 27, in _load
    set_transformer_spec(model_spec, variables)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 39, in set_transformer_spec
    set_transformer_encoder(spec.encoder, variables, relative=spec.with_relative_position)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 43, in set_transformer_encoder
    set_input_layers(spec, variables, "encoder", relative=relative)
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 59, in set_input_layers
    set_position_encodings(
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 136, in set_position_encodings
    spec.encodings = _get_variable(variables, "%s.pe" % scope).squeeze()
  File "/usr/local/lib/python3.8/site-packages/ctranslate2/converters/opennmt_py.py", line 141, in _get_variable
    return variables[name].numpy()
KeyError: 'encoder.embeddings.make_embedding.pe.pe'

I'm using Python 3.8 on my custom python:buster Docker image with these Python packages installed:

Package              Version
-------------------- ----------
absl-py              0.9.0
cachetools           4.0.0
certifi              2019.11.28
chardet              3.0.4
click                7.1.1
ConfigArgParse       1.0
ctranslate2          1.8.0
Flask                1.1.1
future               0.18.2
google-auth          1.11.3
google-auth-oauthlib 0.4.1
grpcio               1.27.2
idna                 2.9
itsdangerous         1.1.0
Jinja2               2.11.1
Markdown             3.2.1
MarkupSafe           1.1.1
numpy                1.18.1
oauthlib             3.1.0
OpenNMT-py           1.0.2
pip                  19.3.1
protobuf             3.11.3
pyasn1               0.4.8
pyasn1-modules       0.2.8
pyonmttok            1.18.3
requests             2.23.0
requests-oauthlib    1.3.0
rsa                  4.0
setuptools           41.6.0
six                  1.14.0
tensorboard          2.1.1
torch                1.4.0
torchtext            0.4.0
tqdm                 4.30.0
urllib3              1.25.8
waitress             1.4.3
Werkzeug             1.0.0
wheel                0.33.6
