TensorFlow implementation of Continual Transformer building blocks, which augment regular transformer layers with the ability to compute the attention output per token step.
The layers are modelled on `tf.keras.layers.MultiHeadAttention` and should work as drop-in replacements in most cases.
Continual Transformers and its modules can be installed in your project using:

```bash
pip install git+https://github.com/LukasHedegaard/continual-transformers-tf.git
```
A `CoSiMultiHeadAttention` layer is created as follows:

```python
from continual_transformers_tf import CoSiMultiHeadAttention

layer = CoSiMultiHeadAttention(seq_len=10, num_heads=2, key_dim=4)
```
![](figures/CoSiDotProductAttention.png)
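As a rough usage sketch: since the layer is modelled on `tf.keras.layers.MultiHeadAttention`, the call below assumes the same two-argument `(query, value)` signature and a `(batch, seq_len, embed_dim)` input. These are assumptions for illustration, not verified API.

```python
import tensorflow as tf
from continual_transformers_tf import CoSiMultiHeadAttention

layer = CoSiMultiHeadAttention(seq_len=10, num_heads=2, key_dim=4)

# Dummy batch of embedded tokens: (batch, seq_len, embed_dim).
x = tf.random.normal((2, 10, 4))

# Assumed call signature: (query, value), mirroring
# tf.keras.layers.MultiHeadAttention, which this layer is modelled on.
y = layer(x, x)
print(y.shape)
```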
A `CircularPositionalEncoding` layer is created as follows:

```python
from continual_transformers_tf import CircularPositionalEncoding

layer = CircularPositionalEncoding(max_len=10, embed_dim=4)
```
![](figures/CircularPositionalEncoding.png)
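A minimal sketch of applying the encoding, assuming the layer is called directly on a `(batch, max_len, embed_dim)` tensor of embeddings and returns a tensor of the same shape; the call signature is an assumption.

```python
import tensorflow as tf
from continual_transformers_tf import CircularPositionalEncoding

layer = CircularPositionalEncoding(max_len=10, embed_dim=4)

# Dummy embeddings: (batch, max_len, embed_dim).
x = tf.random.normal((2, 10, 4))

# Assumption: the layer adds its positional encoding to the input
# embeddings and leaves the tensor shape unchanged.
y = layer(x)
print(y.shape)
```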
A full `CoSiTransformerEncoder` block is created as follows:

```python
from continual_transformers_tf import CoSiTransformerEncoder

layer = CoSiTransformerEncoder(
    seq_len=10,
    embed_dim=4,
    num_heads=2,
    ff_dim=16,
    dropout_rate=0.1,
)
```
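A sketch of running the encoder on a full sequence, assuming it accepts a `(batch, seq_len, embed_dim)` tensor like a standard Keras transformer-encoder block; the shapes and the single-tensor call are assumptions.

```python
import tensorflow as tf
from continual_transformers_tf import CoSiTransformerEncoder

layer = CoSiTransformerEncoder(
    seq_len=10, embed_dim=4, num_heads=2, ff_dim=16, dropout_rate=0.1
)

# Dummy input sequence: (batch, seq_len, embed_dim).
x = tf.random.normal((2, 10, 4))

# Assumption: the encoder takes the embedded sequence directly; the
# standard Keras `training` flag is expected to toggle dropout.
y = layer(x, training=False)
print(y.shape)
```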
To cite this work, please use:

```bibtex
@article{hedegaard2022cotrans,
  title={Continual Transformers: Redundancy-Free Attention for Online Inference},
  author={Lukas Hedegaard and Alexandros Iosifidis},
  journal={preprint, arXiv:2201.06268},
  year={2022}
}
```