fast-transformers's Introduction

Fast Transformers

Transformers are very successful models that achieve state-of-the-art performance in many natural language tasks. However, scaling them to long sequences is difficult because the cost of self-attention grows quadratically with the sequence length.

This library was developed for our research on fast attention for transformers. You can find a list of our papers in the docs as well as related papers and papers that we have implemented.

Quick-start

The following code builds a transformer with softmax attention and one with linear attention and compares the time required by each to encode a sequence with 1000 elements.

import torch
from fast_transformers.builders import TransformerEncoderBuilder

# Create the builder for our transformers
builder = TransformerEncoderBuilder.from_kwargs(
    n_layers=8,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024
)

# Build a transformer with softmax attention
builder.attention_type = "full"
softmax_model = builder.get()

# Build a transformer with linear attention
builder.attention_type = "linear"
linear_model = builder.get()

# Construct the dummy input
X = torch.rand(10, 1000, 8*64)

# Prepare everything for CUDA
X = X.cuda()
softmax_model.cuda()
softmax_model.eval()
linear_model.cuda()
linear_model.eval()

# Warmup the GPU
with torch.no_grad():
    softmax_model(X)
    linear_model(X)
torch.cuda.synchronize()

# Measure the execution time
softmax_start = torch.cuda.Event(enable_timing=True)
softmax_end = torch.cuda.Event(enable_timing=True)
linear_start = torch.cuda.Event(enable_timing=True)
linear_end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    softmax_start.record()
    y = softmax_model(X)
    softmax_end.record()
    torch.cuda.synchronize()
    print("Softmax: ", softmax_start.elapsed_time(softmax_end), "ms")
    # Softmax: 144 ms (on a GTX1080Ti)

with torch.no_grad():
    linear_start.record()
    y = linear_model(X)
    linear_end.record()
    torch.cuda.synchronize()
    print("Linear: ", linear_start.elapsed_time(linear_end), "ms")
    # Linear: 68 ms (on a GTX1080Ti)

Dependencies & Installation

The fast transformers library has the following dependencies:

  • PyTorch
  • C++ toolchain
  • CUDA toolchain (if you want to compile for GPUs)

For most machines installation should be as simple as:

pip install --user pytorch-fast-transformers

Note: macOS users should ensure they have llvm and libomp installed. With the Homebrew package manager, this can be done by running brew install llvm libomp.

Documentation

There exists a dedicated documentation site but you are also encouraged to read the source code.

Research

Ours

To read about the theory behind some attention implementations in this library we encourage you to follow our research.

  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (2006.16236)
  • Fast Transformers with Clustered Attention (2007.04825)

If you found our research helpful or influential please consider citing

@inproceedings{katharopoulos_et_al_2020,
    author = {Katharopoulos, A. and Vyas, A. and Pappas, N. and Fleuret, F.},
    title = {Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention},
    booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
    year = {2020}
}

@inproceedings{vyas_et_al_2020,
    author = {Vyas, A. and Katharopoulos, A. and Fleuret, F.},
    title = {Fast Transformers with Clustered Attention},
    booktitle = {Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS)},
    year = {2020}
}

By others

  • Efficient Attention: Attention with Linear Complexities (1812.01243)
  • Linformer: Self-Attention with Linear Complexity (2006.04768)
  • Reformer: The Efficient Transformer (2001.04451)

Support, License and Copyright

This software is distributed with the MIT license which pretty much means that you can use it however you want and for whatever reason you want. All the information regarding support, copyright and the license can be found in the LICENSE file in the repository.

fast-transformers's People

Contributors

bionicles, bratao, hadaev8, jdemouth-nvidia, loicgrobol, qibinc, tariqahassan, zhiyuanchen

fast-transformers's Issues

Relative Position Representations

Hi,
Thanks for your great work!
I would like to know how to implement relative position representations in the causal-linear model.
Thanks for your help

Segment-Level Recurrence with State Reuse

Hi,
Thanks for your great work!
I have a question: if I want to use segment-level recurrence with state reuse, as in Transformer-XL, in a language model, how should I do this? Should I rewrite the code in causal_product_cuda.cu?
Thanks for your help.
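
One way to think about state reuse with causal linear attention is that the attention for each position depends only on two running sums over past keys and values, so a segment can start from the sums left by the previous segment. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name `causal_linear_attention`, the `(S, Z)` state layout and the `elu(x) + 1` feature map are assumptions made for illustration.

```python
import math
import random


def phi(x):
    """elu(x) + 1 applied element-wise, a positive feature map (assumed here)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values, state=None):
    """Causal linear attention over one segment.

    Returns the outputs and the (S, Z) state to carry into the next segment.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    if state is None:
        S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
        Z = [0.0] * d_k                        # running sum of phi(k)
    else:
        S = [row[:] for row in state[0]]       # copy so the caller's state is untouched
        Z = state[1][:]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fq, fk = phi(q), phi(k)
        for a in range(d_k):
            Z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        norm = sum(fq[a] * Z[a] for a in range(d_k))
        outputs.append(
            [sum(fq[a] * S[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return outputs, (S, Z)


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
Q, K, V = rand_matrix(6, 3), rand_matrix(6, 3), rand_matrix(6, 3)

# Whole sequence in one call
full_out, _ = causal_linear_attention(Q, K, V)

# Same sequence as two segments, reusing the state of the first
first_out, carried = causal_linear_attention(Q[:3], K[:3], V[:3])
second_out, _ = causal_linear_attention(Q[3:], K[3:], V[3:], state=carried)
chunked_out = first_out + second_out

max_diff = max(
    abs(x - y)
    for row_a, row_b in zip(full_out, chunked_out)
    for x, y in zip(row_a, row_b)
)
```

In this sketch the segmented result matches the single-pass result, which suggests the recurrence could be handled at the Python level by passing a state between calls rather than by rewriting the CUDA kernel. In a training setup one would presumably also decide whether gradients flow through the carried state; Transformer-XL stops them at the segment boundary.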

Training

Thanks for this great project!

Could you please share training code for models in colab?

Thanks!

Image generation/completion

Hello,

I can't seem to find the code corresponding to the image generation/completion part of the paper. My question is whether you treat images as flattened RGB values, and whether image completion works by taking N pixel values, predicting the (N+1)th pixel value, and repeating this step until the whole image is completed.

Thank you in advance!
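
The procedure described in the question can be sketched as a plain autoregressive loop. This is only an illustration of the question's own description, assuming a flattened sequence of channel values; `complete_sequence` and `toy_predict_next` are hypothetical names, and the toy predictor stands in for a trained model.

```python
def complete_sequence(prefix, total_length, predict_next):
    """Extend `prefix` one value at a time until it has `total_length` values."""
    pixels = list(prefix)
    while len(pixels) < total_length:
        # The model sees everything generated so far and returns the next value
        pixels.append(predict_next(pixels))
    return pixels


def toy_predict_next(pixels):
    """Hypothetical stand-in for a trained model: repeat the last value."""
    return pixels[-1]


# A 2x2 RGB image flattened row by row, pixel by pixel, channel by channel
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
flat = [channel for row in image for pixel in row for channel in pixel]

# Keep the first half as the known part and generate the rest
prefix = flat[:6]
completed = complete_sequence(prefix, len(flat), toy_predict_next)
```

With a real model, `predict_next` would typically sample from a predicted distribution over pixel values instead of returning a fixed choice.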

Local Product CUDA Kernel

Nice library. I have a question regarding the local product (Longformer sliding window) kernel you have implemented. If I am correctly interpreting the implementation here, the KQ^T operation is decomposed into blocks of size 64 along the num_queries dimension, which are then dotted via the gemm implementation in cuBLAS with a window of 64 +- context_window/2 keys. The local_context window for each query is then copied out with a custom copy kernel.

With this implementation, the dot products for a much larger context window than local_context are computed but then subsequently ignored. Since these computations already happen, is it true that setting local_context to any value in [2,64] would essentially not alter the latency of the implementation but likely improve the generalization ability of the end transformer (due to the larger context window of each layer)?

Thanks!
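
To make the trade-off in the question concrete, the following back-of-the-envelope sketch counts dot products under the description given above: each block of 64 queries is multiplied against roughly 64 + local_context keys, and only local_context products per query are kept. The function `local_product_cost` is hypothetical, the counts ignore clipping at the sequence boundaries, and none of this has been checked against the actual kernel.

```python
def local_product_cost(n_queries, local_context, block_size=64):
    """Return (computed, kept) dot-product counts under the assumed blocking."""
    n_blocks = -(-n_queries // block_size)          # ceiling division
    keys_per_block = block_size + local_context     # block plus half a window on each side
    computed = n_blocks * block_size * keys_per_block
    kept = n_queries * local_context
    return computed, kept


small = local_product_cost(1024, local_context=2)
large = local_product_cost(1024, local_context=64)
work_ratio = large[0] / small[0]
kept_ratio = large[1] / small[1]
```

Under these assumptions, going from local_context=2 to local_context=64 for 1024 queries raises the computed products from 67584 to 131072 (a bit less than 2x) while the kept products grow 32-fold, so the cost would not be entirely flat but would grow far more slowly than the usable context.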

Image Generation/Completion Training-Code

Hi,

I am looking for the training code of the image-completion/generation examples. I found the code to run your pre-trained model and it works nicely, but now I would like to understand the training process and was wondering where I could find this code.

Thank you for your help.

install error

pytorch1.5, python3.6

error: invalid static_cast from type const torch::OrderedDict<std::basic_string<char>, at::Tensor> to type const torch::OrderedDict<std::basic_string<char>, at::Tensor>&

Any chance for pre-built binaries?

Build times for this package can be quite slow (>15 minutes), which is not ideal when the package is used in an ephemeral environment such as Colab or Kaggle.

So, as the title says, are you planning to do some automated builds for this package?

Ensure all forward() methods have a proper description of input tensors

We still have plenty of modules where the forward method lacks a docstring. The description might seem trivial (for instance, the forward method in linear attention obviously implements linear attention), but documenting the tensor shapes and the arguments is quite important.

This should also cover #8 .

What is best way to perform recurrent sampling while training?

In general, I want to have a teacher-forcing pass and a self-generated (free-running generative) pass, a.k.a. professor forcing.

For now, it looks like I need to merge FullAttention, RecurrentFullAttention and RecurrentCrossFullAttention into one class
and use it with flags like recurrent = true,
and do the same for the layers and the encoder/decoder classes.
This seems inconvenient.
Am I right, or is there a better way?

Masking extension

As pointed out by #22 the masking system can be extended to allow for more options. I will mention some below but possibly more could be added:

  • Key only masks for attention mask (this allows for an attn_mask that is used with linear attention for instance)
  • Different attn_masks per element in the batch

RuntimeError: CUDA error: an illegal memory access was encountered

Hi, thanks for the great work!

I installed the package successfully but encountered an error during training:

File "/home/annahung/189nas/2020/fast_remi/linear_transformer/model.py", line 294, in train
    predict = model(batch_x)
File "/home/annahung/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
File "/home/annahung/189nas/2020/fast_remi/linear_transformer/modules.py", line 108, in forward
    decoder_y_ = self.transformer(batch_x_embed, attn_mask=attn_mask) #shape=(bs, num_heads*value_dim)
File "/home/annahung/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
File "/home/annahung/.local/lib/python3.6/site-packages/fast_transformers/transformers.py", line 131, in forward
    x = layer(x, attn_mask=attn_mask, length_mask=length_mask)
File "/home/annahung/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
File "/home/annahung/.local/lib/python3.6/site-packages/fast_transformers/transformers.py", line 77, in forward
    key_lengths=length_mask
File "/home/annahung/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
File "/home/annahung/.local/lib/python3.6/site-packages/fast_transformers/attention/attention_layer.py", line 98, in forward
    key_lengths
File "/home/annahung/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
File "/home/annahung/.local/lib/python3.6/site-packages/fast_transformers/attention/causal_linear_attention.py", line 74, in forward
    values
File "/home/annahung/.local/lib/python3.6/site-packages/fast_transformers/attention/causal_linear_attention.py", line 24, in causal_linear
    return V_new.permute(0,2,1,3).contiguous()
RuntimeError: CUDA error: an illegal memory access was encountered

I can run other PyTorch models, but not ones built with fast-transformers.
I read issue #12 and think this might also be caused by the CUDA version (CUDA 10.1, V10.1.105).
Any suggestions for solving this? Thanks a lot.

Error with recurrent attention ValueError: too many values to unpack (expected 2)

Colab Link:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?authuser=1#scrollTo=cflC2xVxKb5M&line=8&uniqifier=1

Full trace:

<ipython-input-20-cd7d3f9fcf71> in forward(self, batch)
     59         src = self.encoder(batch['inp'])
     60         src = self.pos_encoder(src)
---> 61         src = self.transformer_encoder(src)
     62 
     63         trg = self.decoder(batch['out'][:,:-1])

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, state, memory)
    131         # Apply all the transformers
    132         for i, layer in enumerate(self.layers):
--> 133             x, s = layer(x, state[i])
    134             state[i] = s
    135 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, state, memory)
     77 
     78         # Run the self attention and add it to the input
---> 79         x2, state = self.attention(x, x, x, state)
     80         x = x + self.dropout(x2)
     81 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/attention/self_attention/attention_layer.py in forward(self, query, key, value, state, memory)
     83 
     84         # Reshape them into many heads and compute the attention
---> 85         N, D = query.shape
     86         H = self.n_heads
     87         new_value, state = self.inner_attention(

ValueError: too many values to unpack (expected 2)

Support for 16-bit Floats

Hi,

I tried running CausalLinear attention while using PyTorch Automatic Mixed Precision. I got an error, saying "line 44, in forward CausalDotProduct.dot[device.type](
RuntimeError: expected scalar type Float but found Half"

Is this a bug? Or does your library not offer support for 16-bit precision floats?

Thank you for your time.

Expected usage of `length_masks` in `TransformerEncoder.forward`

Hi, unsure how to use length_masks for either softmax/full attention or linear attention TransformerEncoder models. In the event that this parameter is not supported for these models, it would be great to get an informative error message. Usage:

import torch
from fast_transformers.builders import TransformerEncoderBuilder
from fast_transformers.masking import LengthMask

# Create the builder for our transformers
builder = TransformerEncoderBuilder.from_kwargs(n_layers=8,
                                                n_heads=8,
                                                query_dimensions=64,
                                                value_dimensions=64,
                                                feed_forward_dimensions=1024)

# Build a transformer with softmax attention
builder.attention_type = "full"
softmax_model = builder.get()

# Build a transformer with linear attention
builder.attention_type = "linear"
linear_model = builder.get()

# Construct the dummy input
X = torch.rand(10, 128, 8 * 64)

# Construct the length array corresponding to all elements being length 64
lengths = torch.Tensor([64] * 10).long()  # tensor([64, 64, 64, 64, 64, 64, 64, 64, 64, 64])
length_mask = LengthMask(lengths)

y = softmax_model(X, length_mask=length_mask)

Results in the error:

.../fast_transformers/attention/full_attention.py in forward(self, queries, keys, values, attn_mask, query_lengths, key_lengths)
     66         if not attn_mask.all_ones:
     67             QK = QK + attn_mask.additive_matrix
---> 68         QK = QK + key_lengths.additive_matrix[:, None, None]
     69 
     70         # Compute the attention and the weighted average

RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 3

And ditto with y = linear_model(X, length_mask=length_mask).
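
The shape mismatch in the error (128 versus 64) suggests that the mask's width defaults to the longest length in the batch rather than to the padded sequence length. The following pure-Python sketch illustrates that behaviour; `length_mask` here is a hypothetical helper written for this example, not the library's `LengthMask` class.

```python
def length_mask(lengths, max_len=None):
    """Boolean mask with one row per sequence; True marks a valid position.

    When max_len is omitted, the width falls back to the longest length,
    which is the assumed cause of the mismatch above.
    """
    if max_len is None:
        max_len = max(lengths)
    return [[pos < n for pos in range(max_len)] for n in lengths]


lengths = [64] * 10

# Width follows the longest sequence: 64 columns, but the input has 128 positions
default_mask = length_mask(lengths)

# Width given explicitly so it matches the padded input
padded_mask = length_mask(lengths, max_len=128)
```

If the library's `LengthMask` accepts a `max_len` argument (an assumption to check against the installed version), passing the padded sequence length, e.g. `max_len=X.shape[1]`, would be the analogous fix.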

Regarding arbitrary mask

Hi,

I observe that only non-masked and causally-masked linear attention have been implemented, and it's not clear to me why an arbitrary mask would be difficult to implement.

Let m be the mask of the input sequence, where m_i = 0 means the token i is masked out (the inverse of m is the same as the src_key_padding_mask argument in torch.nn.Transformer). Then Equation (5) in your paper becomes:

Is that correct?

By the way, in your causal mask implementation, the mask doesn't seem to be applied anywhere (sorry if I've overlooked that).

Thank you in advance.
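
The equation from the original post is not preserved here. As one possible reading of the question, the sketch below assumes that the mask m_j multiplies each key's contribution in both the numerator and the normaliser of linear attention, and checks numerically that this equals running linear attention over only the unmasked keys. The function `linear_attention` and the `elu(x) + 1` feature map are assumptions for illustration, not the authors' confirmed formulation.

```python
import math
import random


def phi(x):
    """elu(x) + 1 applied element-wise, a positive feature map (assumed here)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values, key_mask=None):
    """Non-causal linear attention with an optional 0/1 mask over keys."""
    if key_mask is None:
        key_mask = [1] * len(keys)
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # sum_j m_j phi(k_j) v_j^T
    Z = [0.0] * d_k                        # sum_j m_j phi(k_j)
    for k, v, m in zip(keys, values, key_mask):
        fk = phi(k)
        for a in range(d_k):
            Z[a] += m * fk[a]
            for b in range(d_v):
                S[a][b] += m * fk[a] * v[b]
    outputs = []
    for q in queries:
        fq = phi(q)
        norm = sum(fq[a] * Z[a] for a in range(d_k))
        outputs.append(
            [sum(fq[a] * S[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return outputs


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(1)
Q, K, V = rand_matrix(4, 3), rand_matrix(5, 3), rand_matrix(5, 2)
mask = [1, 0, 1, 1, 0]  # m_j = 0 means key j is masked out

masked_out = linear_attention(Q, K, V, key_mask=mask)

# Reference: drop the masked keys and values entirely
kept_K = [k for k, m in zip(K, mask) if m]
kept_V = [v for v, m in zip(V, mask) if m]
subset_out = linear_attention(Q, kept_K, kept_V)

max_diff = max(
    abs(x - y)
    for row_a, row_b in zip(masked_out, subset_out)
    for x, y in zip(row_a, row_b)
)
```

This covers only a mask over keys that is shared by all queries. A mask that differs per query does not factor out of the sums in the same way, which may be why only the unmasked and causal cases are provided.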

Experimental Code - CTC Loss issue.

Firstly, great paper and accompanying repo, thanks.

I am evaluating the fast-transformer on a problem similar to section 4.3 in the paper where I have a non-autoregressive task trained using CTCLoss. I define my new model as per below but run into a problem during training where the loss converges to 1.0 and gets stuck?

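from torch.nn import Conv1d, Linear, LogSoftmax, Module, Sequential
from fast_transformers.builders import TransformerEncoderBuilder

# NOTE: Transpose is a user-defined module that is not shown in the original post.

class Model(Module):

    def __init__(self, classes=5, stride=3, n=5, d=64):
        super().__init__()

        # Builder for a linear-attention encoder with n layers and n heads
        builder = TransformerEncoderBuilder.from_kwargs(
            n_layers=n,
            n_heads=n,
            query_dimensions=d,
            value_dimensions=d,
            feed_forward_dimensions=256,
            activation='gelu',
            dropout=0.05,
            attention_type="linear",
        )

        self.layers = Sequential(
            # N, C, W
            # NOTE: self.k (the kernel size) is not defined in the snippet as posted
            Conv1d(1, n * d, self.k, padding=4, stride=stride),
            Transpose(1, 2),
            # N, W, C
            builder.get(),
            Linear(n * d, classes),
            LogSoftmax(2),
        )

    def forward(self, x):
        return self.layers(x)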

Is the code used in section 4 also going to be made available?

Thanks

Please Add Pytorch to Be Automatically Installed

Hi,
I ran into the problem that PyTorch has to be installed manually before running pip install --user pytorch-fast-transformers. It would be very convenient if PyTorch were installed automatically by setup.py.

No module named 'fast_transformers.causal_product.causal_product_cpu' (solved: needed to add CUDA to the PATH)

Hi there,

I am having some trouble using this library. I cloned this repo (July 19th) and ran the setup file, the setup ran but now I am getting this error (the same error occurs with pip install):

  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/builders/__init__.py", line 29, in <module>
    from .transformer_encoder_builder import TransformerEncoderBuilder
  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/builders/transformer_encoder_builder.py", line 31, in <module>
    from ..attention import AttentionLayer, FullAttention, \
  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/attention/__init__.py", line 13, in <module>
    from .causal_linear_attention import CausalLinearAttention
  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/attention/causal_linear_attention.py", line 12, in <module>
    from fast_transformers.causal_product import causal_dot_product 
  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
    from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ModuleNotFoundError: No module named 'fast_transformers.causal_product.causal_product_cpu'

When I comment out the import of this file (above), I get an import error on the hashing files instead, so I think the issue is with these CUDA files. I am using Ubuntu 18.04 and PyTorch 1.5.1 with CUDA 10.2. However, using the exact same setup procedure on Google Colab, I have no issues. Colab uses PyTorch 1.5.1 but CUDA 10.1.

Could the CUDA version difference be the issue?

Thanks :)

hash_cuda.cu: No such file or directory

pip install --user pytorch-fast-transformers
....

gcc: error: fast_transformers/hashing/hash_cuda.cu: No such file or directory

ubuntu 18.04
Python 3.6.7
CUDA Version: 10.1

Encoder-decoder setup?

Thanks for all the work!

Is there any way to use this library for a task that would typically require an encoder-decoder architecture, like machine translation?

I see the BERT example in the docs, but no mention of a decoder anywhere.

Thanks again :)

causal_product_cuda.cu: Error compiling objects for extension

Hi,

Thank you for your great work!
I am having trouble installing the package:
compiling causal_product_cuda.cu fails.

System setup:

 Ubuntu 18.04
 Python 3.7.3 
 Cuda V10.1
 Pytorch  1.5.0

log:
Segmentation fault (core dumped)
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "torch/utils/cpp_extension.py", line 1402, in _run_ninja_build
check=True)
File "/python3.7/subprocess.py", line 487, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

File "/torch/utils/cpp_extension.py", line 425, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "torch/utils/cpp_extension.py", line 1142, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/torch/utils/cpp_extension.py", line 1415, in _run_ninja_build
raise RuntimeError(message)
RuntimeError: Error compiling objects for extension

Thanks for your help.

Feature Maps without using builders

Hi,

I have been trying to use feature maps without using builders to construct the model but I experience an error during training.

I have tried the following:

attention_layer = AttentionLayer(
            LinearAttention(d_query, feature_map=Favor.factory(n_dims=120)), 
            d_model, 
            n_heads, 
            d_query, 
            d_values)

transformer = TransformerEncoder(
            [
                TransformerEncoderLayer(
                    attention_layer,
                    d_model,
                    d_ff,
                    dropout,
                    activation
                ) for l in range(n_layers)
            ],
            norm_layer = LayerNorm(d_model)
            )

I have also tried to define only the self-attention module using builders as follows:

attention_module = AttentionBuilder.from_kwargs(
                query_dimensions=d_query, 
                feature_map=Favor.factory(n_dims=120)).get('linear')

attention_layer = AttentionLayer(attention_module, 
                                 d_model, 
                                 n_heads, 
                                 d_query, 
                                 d_values)

But both ways give rise to the following error (in my case d_query=60)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [60, 60]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Error detected in MmBackward. The traceback of the forward call that caused the error terminates at:

File "/fast_transformers/feature_maps/fourier_features.py", line 185, in forward u = x.unsqueeze(-2).matmul(self.omega).squeeze(-2)

If I switch to using builders for the entire model, the problem does not appear. But I was wondering whether the random Fourier features can be used outside builders (as I personally prefer the vanilla interface).

Many thanks in advance!

Positional embedding

Hello,
First of all, great project and paper!

I tried but couldn't find out how positional embedding is being handled. Maybe it's a silly question, but should it be provided externally or is it implicit?

What I would like to do is to have a relative embedding not on the time axis, but on X and Y for each token as they are spatially related in a 2-D plane. These tokens could be pixels in an image or words in a semi-structured document such as a table or a form, where important information is contained in the vertical and horizontal alignments.

I am also thinking the problem could be approached with a boolean mask that only allows vertically or horizontally aligned tokens to see each other.

Thank you
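
The boolean-mask idea from the last paragraph can be sketched as follows, assuming the tokens are laid out row by row on a height x width grid. `aligned_mask` is a hypothetical helper for this example and does not use any library API.

```python
def aligned_mask(height, width):
    """Boolean matrix where entry [i][j] is True if tokens i and j share a row or a column.

    Tokens are assumed to be flattened row by row, so token i sits at
    (i // width, i % width).
    """
    n = height * width
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        row_i, col_i = divmod(i, width)
        for j in range(n):
            row_j, col_j = divmod(j, width)
            mask[i][j] = (row_i == row_j) or (col_i == col_j)
    return mask


mask = aligned_mask(3, 4)
visible_per_token = [sum(row) for row in mask]
```

Each token then sees width + height - 1 tokens including itself. A matrix like this would act as an arbitrary attention mask, so it would presumably only apply to attention types that support one.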

forward() got multiple values for argument 'state'

Sorry to disturb you; I can't tell whether this is my mistake or an error in the library.
I'm doing sampling like this:

with torch.no_grad():
    trg_tensor = torch.LongTensor([p2idx['SOS'], ]).unsqueeze(0).to(device)
    state = None
    out_token = trg_tensor
    for i in range(max_len):
        # decoder_mask = TriangularCausalMask(trg_tensor.size(1), device=device)
        # decoder_len_mask = LengthMask(trg_tensor.new_full((trg_tensor.shape[0],), trg_tensor.shape[1], dtype=torch.int64))

        output = model.pos_decoder(model.decoder(out_token), i)
        output, state = model.fc_out(model.transformer_decoder_rnn(output.squeeze(1), memory, memory_length_mask=encoder_len_mask, state=state))
        out_token = output.argmax(-1)[:,-1].unsqueeze(0)
        trg_tensor = torch.cat([trg_tensor, out_token], axis=-1)
        if out_token == p2idx['EOS']:
            break

Whole code and trace
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?usp=sharing

Unable to install extensions

Hello

I had a bit of trouble installing this package via pip.
I've included the steps I took to resolve these problems below.

Resolution

  1. First I encountered this somewhat common pytorch error:
Your compiler (g++) is not compatible with the compiler Pytorch was
built with for this platform, which is clang++ on darwin. Please
use clang++ to to compile your extension. Alternatively, you may
compile PyTorch from source using g++, and then you can also use
g++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.

I solved this by dropping the binary and installing pytorch from source.

  2. Next I encountered an issue with the -fopenmp flag used to install the C++ extensions.
    I tried to solve this by running
brew install llvm libomp

and replacing each

extra_compile_args=["-fopenmp", ...]

with

extra_compile_args=["-Xpreprocessor", "-fopenmp", ...]

then installing the normal way:

git clone git@github.com:idiap/fast-transformers.git
cd fast-transformers
# modify setup.py as shown above
python setup.py install

  3. From there I have been able to run a simple forward pass through a model:
import torch
from fast_transformers.builders import TransformerEncoderBuilder

# Create the builder for our transformers
builder = TransformerEncoderBuilder.from_kwargs(
    n_layers=8,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024
)

# Build a transformer with linear attention
builder.attention_type = "linear"
linear_model = builder.get()

# Construct the dummy input
X = torch.rand(10, 1000, 8*64)

with torch.no_grad():
    out = linear_model(X)
    
assert isinstance(out, torch.Tensor)  # True

Unfortunately it will be a few days before I can run a proper test on a GPU (I have some data preprocessing to do first :)),
but I didn't want to wait until then to post this.

System information

  • OS: macOS 10.15.5
  • Python: Anaconda 3.7.7
  • Conda: conda 4.8.3
  • gcc: 11.0.3

Thank you for your wonderful work!

A different Windows installation error

Possibly related to #48.

This happened while installing python-performer, which leverages fast-transformers.

(RAT) C:\Users\codeninja\Dev\quantfolio\RAT>pip install pytorch-fast-transformers
Collecting pytorch-fast-transformers
  Using cached https://files.pythonhosted.org/packages/03/9b/38905999695b381a1e239b91afce219892a23614248fc024e04558f36237/pytorch-fast-transformers-0.3.0.tar.gz
Requirement already satisfied: torch in c:\users\codeninja\anaconda3\envs\rat\lib\site-packages (from pytorch-fast-transformers) (1.7.0+cu110)
Requirement already satisfied: future in c:\users\codeninja\anaconda3\envs\rat\lib\site-packages (from torch->pytorch-fast-transformers) (0.18.2)
Requirement already satisfied: dataclasses in c:\users\codeninja\anaconda3\envs\rat\lib\site-packages (from torch->pytorch-fast-transformers) (0.6)
Requirement already satisfied: typing-extensions in c:\users\codeninja\anaconda3\envs\rat\lib\site-packages (from torch->pytorch-fast-transformers) (3.7.4.3)
Requirement already satisfied: numpy in c:\users\codeninja\anaconda3\envs\rat\lib\site-packages (from torch->pytorch-fast-transformers) (1.19.4)
Building wheels for collected packages: pytorch-fast-transformers
  Building wheel for pytorch-fast-transformers (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'C:\Users\codeninja\Anaconda3\envs\RAT\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\CODENI~1\\AppData\\Local\\Temp\\pip-install-ft9sxu3x\\pytorch-fast-transformers\\setup.py'"'"'; __file__='"'"'C:\\Users\\CODENI~1\\AppData\\Local\\Temp\\pip-install-ft9sxu3x\\pytorch-fast-transformers\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\CODENI~1\AppData\Local\Temp\pip-wheel-1q41zbkd' --python-tag cp37
       cwd: C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\
  Complete output (275 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.7
  creating build\lib.win-amd64-3.7\fast_transformers
  copying fast_transformers\masking.py -> build\lib.win-amd64-3.7\fast_transformers
  copying fast_transformers\transformers.py -> build\lib.win-amd64-3.7\fast_transformers
  copying fast_transformers\utils.py -> build\lib.win-amd64-3.7\fast_transformers
  copying fast_transformers\weight_mapper.py -> build\lib.win-amd64-3.7\fast_transformers
  copying fast_transformers\__init__.py -> build\lib.win-amd64-3.7\fast_transformers
  creating build\lib.win-amd64-3.7\fast_transformers\aggregate
  copying fast_transformers\aggregate\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\aggregate
  creating build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\attention_layer.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\causal_linear_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\clustered_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\conditional_full_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\exact_topk_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\full_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\improved_clustered_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\improved_clustered_causal_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\linear_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\local_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\reformer_attention.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  copying fast_transformers\attention\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\attention
  creating build\lib.win-amd64-3.7\fast_transformers\attention_registry
  copying fast_transformers\attention_registry\registry.py -> build\lib.win-amd64-3.7\fast_transformers\attention_registry
  copying fast_transformers\attention_registry\spec.py -> build\lib.win-amd64-3.7\fast_transformers\attention_registry
  copying fast_transformers\attention_registry\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\attention_registry
  creating build\lib.win-amd64-3.7\fast_transformers\builders
  copying fast_transformers\builders\attention_builders.py -> build\lib.win-amd64-3.7\fast_transformers\builders
  copying fast_transformers\builders\base.py -> build\lib.win-amd64-3.7\fast_transformers\builders
  copying fast_transformers\builders\transformer_builders.py -> build\lib.win-amd64-3.7\fast_transformers\builders
  copying fast_transformers\builders\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\builders
  creating build\lib.win-amd64-3.7\fast_transformers\causal_product
  copying fast_transformers\causal_product\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\causal_product
  creating build\lib.win-amd64-3.7\fast_transformers\clustering
  copying fast_transformers\clustering\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\clustering
  creating build\lib.win-amd64-3.7\fast_transformers\events
  copying fast_transformers\events\event.py -> build\lib.win-amd64-3.7\fast_transformers\events
  copying fast_transformers\events\event_dispatcher.py -> build\lib.win-amd64-3.7\fast_transformers\events
  copying fast_transformers\events\filters.py -> build\lib.win-amd64-3.7\fast_transformers\events
  copying fast_transformers\events\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\events
  creating build\lib.win-amd64-3.7\fast_transformers\feature_maps
  copying fast_transformers\feature_maps\base.py -> build\lib.win-amd64-3.7\fast_transformers\feature_maps
  copying fast_transformers\feature_maps\fourier_features.py -> build\lib.win-amd64-3.7\fast_transformers\feature_maps
  copying fast_transformers\feature_maps\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\feature_maps
  creating build\lib.win-amd64-3.7\fast_transformers\hashing
  copying fast_transformers\hashing\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\hashing
  creating build\lib.win-amd64-3.7\fast_transformers\local_product
  copying fast_transformers\local_product\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\local_product
  creating build\lib.win-amd64-3.7\fast_transformers\recurrent
  copying fast_transformers\recurrent\transformers.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent
  copying fast_transformers\recurrent\_utils.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent
  copying fast_transformers\recurrent\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent
  creating build\lib.win-amd64-3.7\fast_transformers\sparse_product
  copying fast_transformers\sparse_product\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\sparse_product
  creating build\lib.win-amd64-3.7\fast_transformers\clustering\hamming
  copying fast_transformers\clustering\hamming\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\clustering\hamming
  creating build\lib.win-amd64-3.7\fast_transformers\recurrent\attention
  copying fast_transformers\recurrent\attention\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention
  creating build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\cross_attention
  copying fast_transformers\recurrent\attention\cross_attention\attention_layer.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\cross_attention
  copying fast_transformers\recurrent\attention\cross_attention\full_attention.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\cross_attention
  copying fast_transformers\recurrent\attention\cross_attention\linear_attention.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\cross_attention
  copying fast_transformers\recurrent\attention\cross_attention\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\cross_attention
  creating build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\self_attention
  copying fast_transformers\recurrent\attention\self_attention\attention_layer.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\self_attention
  copying fast_transformers\recurrent\attention\self_attention\full_attention.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\self_attention
  copying fast_transformers\recurrent\attention\self_attention\linear_attention.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\self_attention
  copying fast_transformers\recurrent\attention\self_attention\__init__.py -> build\lib.win-amd64-3.7\fast_transformers\recurrent\attention\self_attention
  running build_ext
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\utils\cpp_extension.py:274: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
    warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
  building 'fast_transformers.hashing.hash_cpu' extension
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\hashing
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\hashing\hash_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing/hash_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=hash_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_hash_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing/hash_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\hashing\hash_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing\hash_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing\hash_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing\hash_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.aggregate.aggregate_cpu' extension
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\aggregate
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\aggregate\aggregate_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/aggregate/aggregate_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=aggregate_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_aggregate_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/aggregate/aggregate_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\aggregate\aggregate_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/aggregate\aggregate_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/aggregate\aggregate_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/aggregate\aggregate_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.clustering.hamming.cluster_cpu' extension
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\clustering
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\clustering\hamming
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\clustering\hamming\cluster_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/clustering/hamming/cluster_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cluster_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\clustering\hamming\cluster_cpu.cpp(161): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_cluster_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/clustering/hamming/cluster_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\clustering\hamming\cluster_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/clustering/hamming\cluster_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/clustering/hamming\cluster_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/clustering/hamming\cluster_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.sparse_product.sparse_product_cpu' extension
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\sparse_product
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\sparse_product\sparse_product_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product/sparse_product_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=sparse_product_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_sparse_product_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product/sparse_product_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\sparse_product\sparse_product_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product\sparse_product_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product\sparse_product_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product\sparse_product_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.sparse_product.clustered_sparse_product_cpu' extension
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\sparse_product\clustered_sparse_product_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product/clustered_sparse_product_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=clustered_sparse_product_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_clustered_sparse_product_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product/clustered_sparse_product_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\sparse_product\clustered_sparse_product_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product\clustered_sparse_product_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product\clustered_sparse_product_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/sparse_product\clustered_sparse_product_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.causal_product.causal_product_cpu' extension
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\causal_product
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\causal_product\causal_product_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/causal_product/causal_product_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=causal_product_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_causal_product_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/causal_product/causal_product_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\causal_product\causal_product_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/causal_product\causal_product_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/causal_product\causal_product_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/causal_product\causal_product_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.local_product.local_product_cpu' extension
  creating C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers\local_product
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] cl /showIncludes /nologo /Ox /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\local_product\local_product_cpu.cpp /FoC:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/local_product/local_product_cpu.obj -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=local_product_cpu -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++14
  cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
  cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
  C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(389): warning C4101: 'e': unreferenced local variable
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\lib /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\libs /LIBPATH:C:\Users\codeninja\Anaconda3\envs\RAT\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_local_product_cpu C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/local_product/local_product_cpu.obj /OUT:build\lib.win-amd64-3.7\fast_transformers\local_product\local_product_cpu.cp37-win_amd64.pyd /IMPLIB:C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/local_product\local_product_cpu.cp37-win_amd64.lib
     Creating library C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/local_product\local_product_cpu.cp37-win_amd64.lib and object C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/local_product\local_product_cpu.cp37-win_amd64.exp
  Generating code
  Finished generating code
  building 'fast_transformers.hashing.hash_cuda' extension
  Emitting ninja build file C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\hashing\hash_cuda.cu -o C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing/hash_cuda.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr 
-arch=compute_50 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=hash_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: C:/Users/CODENI~1/AppData/Local/Temp/pip-install-ft9sxu3x/pytorch-fast-transformers/build/temp.win-amd64-3.7/Release/fast_transformers/hashing/hash_cuda.obj
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\TH -IC:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\codeninja\Anaconda3\envs\RAT\include -IC:\Users\codeninja\Anaconda3\envs\RAT\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -c C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\fast_transformers\hashing\hash_cuda.cu -o C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\build\temp.win-amd64-3.7\Release\fast_transformers/hashing/hash_cuda.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr 
-arch=compute_50 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=hash_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\c10/util/ThreadLocalDebugInfo.h(12): warning: modifier is ignored on an enum specifier

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\ATen/core/boxing/impl/boxing.h(100): warning: integer conversion resulted in a change of sign

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\ATen/record_function.h(13): warning: modifier is ignored on an enum specifier

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\ATen/core/op_registration/op_whitelist.h(39): warning: integer conversion resulted in a change of sign

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\torch/csrc/jit/ir/ir.h(1348): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\torch/csrc/autograd/profiler.h(106): warning: modifier is ignored on an enum specifier

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\torch/csrc/autograd/profiler.h(138): warning: modifier is ignored on an enum specifier

  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include/torch/csrc/api/include\torch/nn/modules/transformerlayer.h(74): warning: extra ";" ignored

  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.23.28105/include\xutility(2074): error: function "torch::OrderedDict<Key, Value>::Item::operator=(const torch::OrderedDict<std::string, at::Tensor>::Item &) [with Key=std::string, Value=at::Tensor]" (declared implicitly) cannot be referenced -- it is a deleted function
            detected during:
              instantiation of "_OutIt std::_Move_unchecked1(_InIt, _InIt, _OutIt, std::false_type) [with _InIt=torch::OrderedDict<std::string, at::Tensor>::Item *, _OutIt=torch::OrderedDict<std::string, at::Tensor>::Item *]"
  (2090): here
              instantiation of "_OutIt std::_Move_unchecked(_InIt, _InIt, _OutIt) [with _InIt=torch::OrderedDict<std::string, at::Tensor>::Item *, _OutIt=torch::OrderedDict<std::string, at::Tensor>::Item *]"
  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.23.28105/include\vector(1304): here
              instantiation of "std::vector<_Ty, _Alloc>::iterator std::vector<_Ty, _Alloc>::erase(std::vector<_Ty, _Alloc>::const_iterator) [with _Ty=torch::OrderedDict<std::string, at::Tensor>::Item, _Alloc=std::allocator<torch::OrderedDict<std::string, at::Tensor>::Item>]"
  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include\torch/csrc/api/include/torch/ordered_dict.h(420): here
              instantiation of "void torch::OrderedDict<Key, Value>::erase(const Key &) [with Key=std::string, Value=at::Tensor]"
  C:/Users/codeninja/Anaconda3/envs/RAT/lib/site-packages/torch/include/torch/csrc/api/include\torch/nn/modules/container/parameterdict.h(51): here

  2 errors detected in the compilation of "C:/Users/CODENI~1/AppData/Local/Temp/tmpxft_00006418_00000000-10_hash_cuda.cpp1.ii".
  hash_cuda.cu
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\utils\cpp_extension.py", line 1522, in _run_ninja_build
      env=env)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\subprocess.py", line 512, in run
      output=stdout, stderr=stderr)
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\setup.py", line 209, in <module>
      setup_package()
    File "C:\Users\CODENI~1\AppData\Local\Temp\pip-install-ft9sxu3x\pytorch-fast-transformers\setup.py", line 204, in setup_package
      install_requires=["torch"]
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\setuptools\__init__.py", line 163, in setup
      return distutils.core.setup(**attrs)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\core.py", line 148, in setup
      dist.run_commands()
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\wheel\bdist_wheel.py", line 290, in run
      self.run_command('build')
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\command\build.py", line 135, in run
      self.run_command(cmd_name)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\setuptools\command\build_ext.py", line 87, in run
      _build_ext.run(self)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
      _build_ext.build_ext.run(self)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\command\build_ext.py", line 340, in run
      self.build_extensions()
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\utils\cpp_extension.py", line 653, in build_extensions
      build_ext.build_extensions(self)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\Cython\Distutils\old_build_ext.py", line 194, in build_extensions
      self.build_extension(ext)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\setuptools\command\build_ext.py", line 208, in build_extension
      _build_ext.build_extension(self, ext)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\distutils\command\build_ext.py", line 534, in build_extension
      depends=ext.depends)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\utils\cpp_extension.py", line 635, in win_wrap_ninja_compile
      with_cuda=with_cuda)
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\utils\cpp_extension.py", line 1238, in _write_ninja_file_and_compile_objects
      error_prefix='Error compiling objects for extension')
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\torch\utils\cpp_extension.py", line 1538, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  Error in atexit._run_exitfuncs:
  Traceback (most recent call last):
    File "C:\Users\codeninja\Anaconda3\envs\RAT\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
      return stream.closed
  ValueError: underlying buffer has been detached
  ----------------------------------------
  ERROR: Failed building wheel for pytorch-fast-transformers

Local attention returning nan when using mask

Hello @angeloskath and fast-transformers team.

I was testing the version on master with local attention, and there appears to be a bug when using a mask: the output is always nan whenever I pass a mask. Other attention types, such as full or linear, work fine.
If I do not use a length_mask, local attention works.

I attached a small script that reproduces the error.

bug_local.zip

Here is the output of Linear and Local. I'm using Python 3.8 and PyTorch 1.6, without CUDA.

Linear Output:
tensor([[[-4.6710e-02,  2.5698e-01,  1.6553e-01,  ..., -3.8887e-02,
           1.4760e+00, -5.5345e-01],
         [ 4.6288e-03,  7.2794e-02, -3.8738e-01,  ..., -6.5744e-01,
           1.5919e+00, -1.4824e+00],
         [ 9.8010e+00,  7.8635e-01, -5.6454e-01,  ...,  4.7453e+00,
          -2.0123e+00, -2.6727e+00],
         ...,
         [-7.6029e-04,  7.1779e-01, -7.7213e-01,  ..., -1.8993e+00,
           4.3610e+00,  2.4297e+00],
         [-1.7685e-01,  7.0581e-01, -1.1693e+00,  ..., -2.3611e+00,
           4.4412e+00,  1.8678e+00],
         [-1.1229e-01,  6.4918e-01, -8.6619e-01,  ..., -1.7007e+00,
           4.4069e+00,  1.7709e+00]],

        [[ 9.8255e+00,  9.2683e-01,  5.2733e-01,  ..., -1.9350e-01,
           1.8855e+00, -1.2510e+00],
         [ 4.5204e-01,  8.0860e-01,  8.7983e+00,  ...,  1.6027e+00,
           3.0442e+00,  1.4045e+00],
         [ 1.0541e-01,  1.2123e+00,  9.4227e+00,  ...,  2.0576e+00,
           3.2600e+00,  1.1319e+00],
         ...,
         [ 1.1826e-01,  9.7299e-01,  4.0329e-01,  ..., -3.3727e+00,
           4.6210e+00,  1.6874e+00],
         [ 7.0923e-01,  1.0117e+00,  3.8741e-01,  ..., -1.7700e+00,
           4.7787e+00,  1.8800e+00],
         [ 8.1268e-01,  2.6620e-01,  2.1668e-01,  ..., -1.9421e+00,
           4.9479e+00,  1.9297e+00]],

        [[ 4.1962e-01,  4.1222e-01,  8.9894e+00,  ...,  3.7024e+00,
           5.6398e-01,  9.1150e-01],
         [ 4.0182e-01,  7.2579e-01,  8.7252e+00,  ..., -1.3470e+00,
           1.6876e+00, -9.5219e-01],
         [ 5.8073e-01,  2.5503e-01,  9.2737e+00,  ..., -5.1724e-01,
           1.8241e+00, -1.4023e+00],
         ...,
         [ 8.9015e+00,  4.8112e-01,  5.4773e-01,  ...,  3.1965e+00,
           2.5537e-01, -3.1801e+00],
         [ 2.7294e-01,  1.3310e+00,  1.0006e+01,  ..., -1.3543e+00,
           1.3383e+00, -1.2746e+00],
         [ 1.4104e-01,  7.2310e-01,  1.0169e+01,  ..., -1.5161e+00,
           1.4063e+00, -1.4794e+00]]], grad_fn=<NativeLayerNormBackward>)
Local Output:
tensor([[[    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         ...,
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],

        [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         ...,
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan],
         [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],

        [[ 0.2749, -0.0732,  8.1727,  ...,  2.6264,  0.5370,  0.0138],
         [-0.4159,  0.2013,  8.5345,  ..., -1.4085,  1.6111, -1.8517],
         [-0.4869,  0.2738,  8.3643,  ..., -2.5074,  1.0786, -2.1240],
         ...,
         [ 8.7382, -0.1080, -0.1985,  ...,  1.4713,  0.0980, -4.0366],
         [-0.3936, -0.1394,  9.1536,  ..., -2.3590,  1.0853, -2.0395],
         [-0.1424, -0.1040,  9.6754,  ..., -1.8460,  1.4940, -1.4138]]],
       grad_fn=<NativeLayerNormBackward>)
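
One mechanism that commonly produces nan in masked attention, and that may or may not be what happens here, is a softmax over a row in which every score has been masked to -inf. The pure-Python sketch below is not taken from fast-transformers; `softmax` and `safe_softmax` are hypothetical helpers that only illustrate the arithmetic.

```python
import math

NEG_INF = float("-inf")

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def safe_softmax(scores):
    """Hypothetical guard: return all-zero weights for a fully masked row."""
    if all(s == NEG_INF for s in scores):
        return [0.0] * len(scores)
    return softmax(scores)

partially_masked = [1.0, NEG_INF, 2.0]
fully_masked = [NEG_INF, NEG_INF, NEG_INF]

print(softmax(partially_masked))   # finite weights; the masked entry gets 0.0
print(softmax(fully_masked))       # every entry is nan, because -inf - (-inf) is nan
print(safe_softmax(fully_masked))  # [0.0, 0.0, 0.0]
```

In the output above only the third batch element is finite, which would be consistent with the first two sequences containing padded positions, but the attached script would be needed to confirm that.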

Does not work with PyTorch 1.7

  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/attention/__init__.py", line 13, in <module>
    from .causal_linear_attention import CausalLinearAttention
  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
    from ..causal_product import causal_dot_product
  File "/usr/local/lib/python3.6/dist-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
    from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /usr/local/lib/python3.6/dist-packages/fast_transformers/causal_product/causal_product_cpu.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl23ExcludeDispatchKeyGuardC1ENS_11DispatchKeyE

Feature map function φ(x) = elu(x) + 1

Hi,
Thanks for your great work!
I have some questions. Why did you choose elu(x) + 1 as the feature map function? Is it suitable for sequences of different lengths? What conditions does the feature map function need to meet?
Thanks for your help.
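
For context, the paper that introduces this feature map (Katharopoulos et al., "Transformers are RNNs") states that the similarity function only has to be non-negative for the result to be a valid attention, and elu(x) + 1 is one strictly positive choice. The sketch below is a pure-Python illustration of non-causal linear attention under that assumption, not the library's implementation; `elu_plus_one` and `linear_attention` are hypothetical names. Nothing in it depends on a fixed sequence length.

```python
import math

def elu_plus_one(x):
    """elu(x) + 1 with alpha = 1: x + 1 for x > 0, exp(x) otherwise (always > 0)."""
    return x + 1.0 if x > 0 else math.exp(x)

def linear_attention(queries, keys, values):
    """Non-causal linear attention over lists of vectors (illustrative only)."""
    phi_q = [[elu_plus_one(x) for x in q] for q in queries]
    phi_k = [[elu_plus_one(x) for x in k] for k in keys]
    d, m = len(keys[0]), len(values[0])
    # S = sum_j phi(k_j) v_j^T has shape (d, m); z = sum_j phi(k_j) has shape (d,)
    S = [[sum(pk[a] * v[b] for pk, v in zip(phi_k, values)) for b in range(m)]
         for a in range(d)]
    z = [sum(pk[a] for pk in phi_k) for a in range(d)]
    out = []
    for q in phi_q:
        # The normaliser is strictly positive because every feature is positive
        norm = sum(q[a] * z[a] for a in range(d))
        out.append([sum(q[a] * S[a][b] for a in range(d)) / norm for b in range(m)])
    return out

# Two queries attending over three keys/values; lengths may differ freely
queries = [[0.5, -1.0], [-2.0, 3.0]]
keys = [[1.0, 0.0], [-1.0, 2.0], [0.3, -0.7]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(linear_attention(queries, keys, values))
```

Because S and z are computed once and reused for every query, the cost grows linearly with the number of keys rather than quadratically.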

Problems Installing on Debian GNU/Linux 10 (buster) using Python 3.7.3

Hi,

First of all, thank you for your great work! Unfortunately, I am having trouble installing your package:

System setup

  • Debian GNU/Linux 10 (buster)
  • Python 3.7.3 in a clean venv
  • Cuda V10.1.243
  • g++ (Debian 8.3.0-6) 8.3.0
  • PyTorch installed as recommended on pytorch.org: pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

The installed packages look like this:

Package       Version
------------- -----------------
future        0.18.2
numpy         1.19.1
Pillow        7.2.0
pip           18.1
pkg-resources 0.0.0
pygpu         0.7.6+20.g9cec614
setuptools    40.8.0
torch         1.5.1+cu101
torchvision   0.6.1+cu101

When I run pip install pytorch-fast-transformers I get a couple of errors:

Collecting pytorch-fast-transformers
  Cache entry deserialization failed, entry ignored
  Using cached https://files.pythonhosted.org/packages/4d/e9/7c352eb727b87ea71b23f00a52de80d40dc0398937a82044692eb26a2fb7/pytorch-fast-transformers-0.1.3.tar.gz
Requirement already satisfied: torch in ./fast_transformer/lib/python3.7/site-packages (from pytorch-fast-transformers) (1.5.1+cu101)
Requirement already satisfied: numpy in ./fast_transformer/lib/python3.7/site-packages (from torch->pytorch-fast-transformers) (1.19.1)
Requirement already satisfied: future in ./fast_transformer/lib/python3.7/site-packages (from torch->pytorch-fast-transformers) (0.18.2)
Building wheels for collected packages: pytorch-fast-transformers
  Running setup.py bdist_wheel for pytorch-fast-transformers ... error
  Failed building wheel for pytorch-fast-transformers
  Running setup.py clean for pytorch-fast-transformers
Failed to build pytorch-fast-transformers
Installing collected packages: pytorch-fast-transformers
  Running setup.py install for pytorch-fast-transformers ... error
Command /home/user/envs/fast_transformer/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vcryr7p1/pytorch-fast-transformers/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-3ckyuo8i/install-record.txt --single-version-externally-managed --compile --install-headers /home/user/envs/fast_transformer/include/site/python3.7/pytorch-fast-transformers" failed with error code 1 in /tmp/pip-install-vcryr7p1/pytorch-fast-transformers/

There are a lot of deprecation warnings in the logs but the critical part seems to be:

    x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.7/fast_transformers/causal_product/causal_product_cpu.o -L /home/user/envs/fast_transformer/lib/python3.7/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.7/fast_transformers/causal_product/causal_product_cpu.cpython-37m-x86_64-linux-gnu.so
    building 'fast_transformers.hashing.hash_cuda' extension
    /opt/cuda_10.1/bin/nvcc -I/home/user/envs/fast_transformer/lib/python3.7/site-packages/torch/include -I/home/user/envs/fast_transformer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/envs/fast_transformer/lib/python3.7/site-packages/torch/include/TH -I/home/user/envs/fast_transformer/lib/python3.7/site-packages/torch/include/THC -I/opt/cuda_10.1/include -I/home/user/envs/fast_transformer/include -I/usr/include/python3.7m -c fast_transformers/hashing/hash_cuda.cu -o build/temp.linux-x86_64-3.7/fast_transformers/hashing/hash_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -arch=compute_50 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=hash_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    sh: 1: cicc: Permission denied
    error: command '/opt/cuda_10.1/bin/nvcc' failed with exit status 126
Cleaning up...
  Removing source in /tmp/pip-install-vcryr7p1/pytorch-fast-transformers
Removed build tracker '/tmp/pip-req-tracker-ussod1ct'
Command "/home/user/envs/fast_transformer/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vcryr7p1/pytorch-fast-transformers/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-3ckyuo8i/install-record.txt --single-version-externally-managed --compile --install-headers /home/user/envs/fast_transformer/include/site/python3.7/pytorch-fast-transformers" failed with error code 1 in /tmp/pip-install-vcryr7p1/pytorch-fast-transformers/
Exception information:
Traceback (most recent call last):
  File "/home/user/envs/fast_transformer/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 143, in main
    status = self.run(options, args)
  File "/home/user/envs/fast_transformer/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 386, in run
    use_user_site=options.use_user_site,
  File "/home/user/envs/fast_transformer/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 49, in install_given_reqs
    **kwargs
  File "/home/user/envs/fast_transformer/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 791, in install
    spinner=spinner,
  File "/home/user/envs/fast_transformer/lib/python3.7/site-packages/pip/_internal/utils/misc.py", line 723, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip._internal.exceptions.InstallationError: Command "/home/user/envs/fast_transformer/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-vcryr7p1/pytorch-fast-transformers/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-3ckyuo8i/install-record.txt --single-version-externally-managed --compile --install-headers /home/user/envs/fast_transformer/include/site/python3.7/pytorch-fast-transformers" failed with error code 1 in /tmp/pip-install-vcryr7p1/pytorch-fast-transformers/    

I hope that is sufficient information.

As a general note: it would be helpful if the requirements were stated a bit more clearly (OS, cuda version, pytorch version and compiler)

Thank you for your help.

(Note: the logs are slightly changed in order to obscure user and system information)
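
Exit status 126 from a shell normally means that a command was found but could not be executed, so a quick permission check on the CUDA binaries may help narrow this down. The sketch below uses only the standard library; `non_executable` is a hypothetical helper, `/opt/cuda_10.1` is the path from the log above, and the location of `cicc` under `nvvm/bin` is an assumption about the usual CUDA toolkit layout.

```python
import os

def non_executable(paths):
    """Return the existing paths that the current user is not allowed to execute."""
    return [p for p in paths if os.path.exists(p) and not os.access(p, os.X_OK)]

# CUDA_HOME is honoured if set; the fallback is the path seen in the log
cuda_home = os.environ.get("CUDA_HOME", "/opt/cuda_10.1")
candidates = [
    os.path.join(cuda_home, "bin", "nvcc"),
    os.path.join(cuda_home, "nvvm", "bin", "cicc"),  # assumed location of cicc
]

# An empty list means every candidate that exists is executable
print(non_executable(candidates))
```

Any path printed here exists but lacks execute permission for the current user, which would match the `cicc: Permission denied` message.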

None type error with local attention

Trying to run the example from the readme using local attention instead of linear attention. I changed the attention_type and added an additional argument in the TransformerEncoderBuilder.from_kwargs method:

import torch
from fast_transformers.builders import TransformerEncoderBuilder

# Create the builder for our transformers
builder = TransformerEncoderBuilder.from_kwargs(
    n_layers=8,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024,
    local_context=8, #ADDED THIS LINE
)

# Build a transformer with softmax attention
builder.attention_type = "full"
softmax_model = builder.get()

# Build a transformer with local attention
builder.attention_type = "local" #CHANGED THIS LINE
linear_model = builder.get()

# Construct the dummy input
X = torch.rand(10, 1000, 8*64)

# Prepare everything for CUDA
X = X.cuda()
softmax_model.cuda()
softmax_model.eval()
linear_model.cuda()
linear_model.eval()

# Warmup the GPU
with torch.no_grad():
    softmax_model(X)
    linear_model(X)
torch.cuda.synchronize()

# Measure the execution time
softmax_start = torch.cuda.Event(enable_timing=True)
softmax_end = torch.cuda.Event(enable_timing=True)
linear_start = torch.cuda.Event(enable_timing=True)
linear_end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    softmax_start.record()
    y = softmax_model(X)
    softmax_end.record()
    torch.cuda.synchronize()
    print("Softmax: ", softmax_start.elapsed_time(softmax_end), "ms")
    # Softmax: 144 ms (on a GTX1080Ti)

with torch.no_grad():
    linear_start.record()
    y = linear_model(X)
    linear_end.record()
    torch.cuda.synchronize()
    print("Linear: ", linear_start.elapsed_time(linear_end), "ms")
    # Linear: 68 ms (on a GTX1080Ti)

The example throws an error:

Traceback (most recent call last):
  File "_.py", line 35, in <module>
    linear_model(X)
  File "/home/jjhon/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jjhon/.local/lib/python3.6/site-packages/fast_transformers/transformers.py", line 139, in forward
    x = layer(x, attn_mask=attn_mask, length_mask=length_mask)
  File "/home/jjhon/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jjhon/.local/lib/python3.6/site-packages/fast_transformers/transformers.py", line 81, in forward
    key_lengths=length_mask
  File "/home/jjhon/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jjhon/.local/lib/python3.6/site-packages/fast_transformers/attention/attention_layer.py", line 109, in forward
    key_lengths
  File "/home/jjhon/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jjhon/.local/lib/python3.6/site-packages/fast_transformers/attention/local_attention.py", line 82, in forward
    self.local_context
  File "/home/jjhon/.local/lib/python3.6/site-packages/fast_transformers/local_product/__init__.py", line 49, in forward
    local_context
TypeError: 'NoneType' object is not callable

It does work with the other attention modules.
Am I doing something wrong?
Is the local_context argument supposed to be an integer?

Thank you.

EDIT: It looks like it only fails on CUDA (PyTorch 1.6 with CUDA 10.1); it works on the CPU.

EDIT2: Fixed by passing the --no-cache-dir argument to pip when installing (to force a recompile).
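
The `'NoneType' object is not callable` error raised from `local_product/__init__.py` would be consistent with the package falling back to `None` when a compiled CUDA extension is missing, which fits the fact that rebuilding resolved it; that reading is an inference, not something confirmed in this thread. A minimal sketch for checking whether the compiled modules are present is shown below. `extension_available` is a hypothetical helper, and the module names are inferred from the build logs on this page.

```python
import importlib.util

def extension_available(module_name):
    """Return True if the named module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or cannot be imported
        return False

# Module names inferred from the build logs; treat them as assumptions
for name in (
    "fast_transformers.local_product.local_product_cpu",
    "fast_transformers.local_product.local_product_cuda",
):
    print(name, "found" if extension_available(name) else "MISSING")
```

Note that `find_spec` only locates the module file; it does not load the shared library, so a module reported as found can still fail at import time.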

Windows installation error linking local_product_cuda.cu

I've been trying to install on windows using pip and it looks like I'm almost there. I get through compiling everything and then I get an error when trying to complete linking of local-product-cuda.

System: Windows 10, CUDA 10.2.89, PyTorch 1.6, Python 3.8

traceback:
local_product_cuda.cu
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\user\Anaconda3\envs\testenv\lib\site-packages\torch\lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib/x64" /LIBPATH:C:\Users\user\Anaconda3\envs\testenv\libs /LIBPATH:C:\Users\user\Anaconda3\envs\testenv\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda.lib /EXPORT:PyInit_local_product_cuda C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product/local_product_cuda.obj /OUT:build\lib.win-amd64-3.8\fast_transformers\local_product\local_product_cuda.cp38-win_amd64.pyd /IMPLIB:C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product\local_product_cuda.cp38-win_amd64.lib

Creating library C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product\local_product_cuda.cp38-win_amd64.lib and object C:\Users\user\AppData\Local\Temp\pip-install-x6t631um\pytorch-fast-transformers\build\temp.win-amd64-3.8\Release\fast_transformers/local_product\local_product_cuda.cp38-win_amd64.exp

local_product_cuda.obj : error LNK2001: unresolved external symbol "public: long * __cdecl at::Tensor::data_ptr(void)const " (??$data_ptr@J@Tensor@at@@QEBAPEAJXZ)

build\lib.win-amd64-3.8\fast_transformers\local_product\local_product_cuda.cp38-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals

error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe' failed with exit status 1120

I've been investigating for quite a few hours now, but I can't figure out why I'm getting a linking error. From searching for the error, it seems to mean that the .lib or the function definition is not accessible to the .obj. However, both the .lib and the .obj are being created, and I assume all definitions are included in the pip bundle, since others are able to install it. I wanted to post here in case it is an issue with a dependency somewhere, or something that only goes wrong on Windows. Is anyone else having this problem, or does anyone have an idea where to start in solving it?

Thanks!
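
One possible reading of the LNK2001 message, offered as an assumption rather than a confirmed diagnosis: the unresolved symbol is at::Tensor::data_ptr for the C type long, and long is 32 bits wide under MSVC on 64-bit Windows while it is 64 bits wide on 64-bit Linux. If PyTorch only provides data_ptr for its fixed-width 64-bit integer type, a call written with long could link on Linux but not on Windows. The sketch below only shows the width difference using ctypes; it does not touch the library, and the helper integer_widths is hypothetical:

```python
import ctypes


def integer_widths():
    """Return the size in bytes of C `long` and of a fixed 64-bit integer."""
    return {
        "long": ctypes.sizeof(ctypes.c_long),
        "int64_t": ctypes.sizeof(ctypes.c_int64),
    }


widths = integer_widths()
print(widths)
if widths["long"] != widths["int64_t"]:
    # Typical for Windows builds, where C long is 4 bytes.
    print("C long and int64_t have different widths on this platform")
```

If this is the cause, replacing long with int64_t in the affected .cu source and rebuilding from source would be the thing to try. That change is untested here.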
