muditbhargava66 / pyxlstm

Efficient Python library for Extended LSTM with exponential gating, memory mixing, and matrix memory for superior sequence modeling.

Home Page: https://pyxlstm.readthedocs.io/

License: MIT License

Python 100.00%
language-modeling lstm sequence-modeling xlstm

pyxlstm's Introduction

Hi there, I'm Mudit Bhargava! 👋

Connect with Me

LinkedIn | Twitter | Personal Website | Email

Feel free to explore my repositories and contributions below!

🔬 My Expertise

(Originally a Venn-style ASCII diagram covering:)

- Computer Architecture & Hardware Design
- High-Performance Computing & Optimization
- Communication Systems & Protocols
- Machine Learning & AI
🌞 Morning     0 tasks    0%
🌆 Daytime    20 tasks   20%
🌃 Evening    40 tasks   40%
🌙 Night      40 tasks   40%

pyxlstm's People

Contributors

alifa98, muditbhargava66


pyxlstm's Issues

Fix sLSTM and mLSTM to work with 1 layer

sLSTM and mLSTM create one fewer dropout layer than LSTM layers:

self.dropout_layers = nn.ModuleList([nn.Dropout(dropout) for _ in range(num_layers - 1)])

So if you try to run a model with only one layer, no LSTM layer is applied to the input at all, because `zip` truncates to its shortest argument in this loop:

for i, (lstm, dropout, f_gate, i_gate) in enumerate(zip(self.lstms, self.dropout_layers, self.exp_forget_gates, self.exp_input_gates))
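The truncation is reproducible in plain Python: `zip` stops at its shortest argument, so with `num_layers = 1` the loop body never runs. A sketch of one possible fix (an assumption, not the maintainer's actual patch) using `itertools.zip_longest`:

```python
from itertools import zip_longest

num_layers = 1
lstms = [f"lstm{i}" for i in range(num_layers)]         # stand-ins for nn.ModuleList entries
dropouts = [f"drop{i}" for i in range(num_layers - 1)]  # one fewer, as in the snippet above

# Buggy pattern: zip truncates to the shortest sequence, so with one
# layer the single LSTM is silently skipped.
skipped = list(zip(lstms, dropouts))        # empty list

# Sketch of a fix: pad the shorter list and apply dropout only when present.
applied = []
for lstm, dropout in zip_longest(lstms, dropouts):
    out = lstm                              # placeholder for lstm(x, state)
    if dropout is not None:
        out = f"{dropout}({out})"           # placeholder for dropout(out)
    applied.append(out)
```

The same `zip_longest` pattern works unchanged for `num_layers > 1`, where only the last layer gets no dropout.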

unit tests as per README fail

The unit tests as per README.md fail with this installation procedure:

$ cat environment.yml 
name: xlstm
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python
  - pip
$ mamba env create -f environment.yml
$ pip install -r requirements.txt

On Ubuntu 22.04.4 LTS
With NVIDIA Server Driver metapackage from nvidia-driver-535-server (proprietary)

Then, as per this closed issue regarding setup.py:

$ pip install .
Successfully installed PyxLSTM-1.0.1
$ python -m unittest discover tests
EEEEEEEEE
======================================================================
ERROR: test_backward_pass (test_block.TestXLSTMBlock.test_backward_pass)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jabowery/devel/xlstm/tests/test_block.py", line 37, in test_backward_pass
    output_seq, _ = xlstm_block(input_seq)
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/devel/xlstm/xLSTM/block.py", line 55, in forward
    lstm_output, hidden_state = self.lstm(input_seq, hidden_state)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/devel/xlstm/xLSTM/slstm.py", line 53, in forward
    if i < self.num_layers - 1:
       ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Boolean value of Tensor with more than one value is ambiguous

======================================================================
ERROR: test_forward_pass_mlstm (test_block.TestXLSTMBlock.test_forward_pass_mlstm)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jabowery/devel/xlstm/tests/test_block.py", line 27, in test_forward_pass_mlstm
    output_seq, hidden_state = xlstm_block(input_seq)
                               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/devel/xlstm/xLSTM/block.py", line 55, in forward
    lstm_output, hidden_state = self.lstm(input_seq, hidden_state)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/mambaforge/envs/xlstm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jabowery/devel/xlstm/xLSTM/mlstm.py", line 68, in forward
    C_t = f * C + i * torch.matmul(values.unsqueeze(2), keys.unsqueeze(1)).squeeze(1)
          ~~^~~
RuntimeError: The size of tensor a (4) must match the size of tensor b (64) at non-singleton dimension 1

======================================================================
[The remaining seven errors repeat the same two exceptions across test_block, test_mlstm, test_model, and test_slstm:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous
  (raised at xLSTM/slstm.py line 53 in the sLSTM and model tests)

RuntimeError: The size of tensor a (4) must match the size of tensor b (64) at non-singleton dimension 1
  (raised at xLSTM/mlstm.py line 68 in the mLSTM tests)]

----------------------------------------------------------------------
Ran 9 tests in 0.044s

FAILED (errors=9)

RuntimeError: Boolean value of Tensor with more than one value is ambiguous in xLSTM/slstm.py


Issue Description

I'm encountering a RuntimeError when attempting to execute a backward pass in my xLSTM model. The error message is:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

This issue occurs in the slstm.py module, specifically at line 53 during a forward pass. Here's the relevant code snippet:

if i < self.num_layers - 1:
    # Further operations

It seems the variable i, which is expected to be an integer index, is somehow being interpreted as a tensor, leading to an ambiguous boolean condition in the if statement. This happens during the operation lstm_output, hidden_state = self.lstm(input_seq, hidden_state), suggesting that there might be an issue with how the LSTM output or hidden state is handled or initialized.

Request for Help
Could someone help clarify what might be going wrong here or suggest modifications to avoid this issue? Any guidance would be greatly appreciated.

Thank you!
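One plausible cause (inferred from the traceback, not verified against the source) is name shadowing: if the input-gate activation inside the layer loop is assigned to `i`, it clobbers the integer layer index, and `i < self.num_layers - 1` then compares a tensor to an int. A pure-Python sketch of the pattern and the rename that fixes it:

```python
def forward_buggy(num_layers):
    """Reproduces the shadowing pattern; a list stands in for a gate tensor."""
    results = []
    for i in range(num_layers):
        i = [0.4, 0.7]                     # BUG: gate activation clobbers the index
        try:
            if i < num_layers - 1:         # sequence < int -> TypeError here;
                results.append("dropout")  # with a torch.Tensor this is the
        except TypeError as exc:           # "Boolean value ... ambiguous" error
            results.append(str(exc))
    return results

def forward_fixed(num_layers):
    """Give the gate its own name so the index comparison stays well-defined."""
    results = []
    for layer_idx in range(num_layers):
        i_t = [0.4, 0.7]                   # gate activation under a distinct name
        if layer_idx < num_layers - 1:
            results.append("dropout")
    return results
```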

ModuleList

for gate in self.exp_forget_gates + self.exp_input_gates:
TypeError: unsupported operand type(s) for +: 'ModuleList' and 'ModuleList'
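`nn.ModuleList` does not implement `+`, but it is iterable, so the two gate lists can be walked with `itertools.chain` (or concatenated after converting each to a plain `list`). A sketch with list stand-ins for the modules:

```python
from itertools import chain

exp_forget_gates = ["f_gate0", "f_gate1"]  # stand-ins for nn.ModuleList entries
exp_input_gates = ["i_gate0", "i_gate1"]

# Instead of `self.exp_forget_gates + self.exp_input_gates` (TypeError):
gates = [gate for gate in chain(exp_forget_gates, exp_input_gates)]
```

With real modules, `list(self.exp_forget_gates) + list(self.exp_input_gates)` is an equivalent eager form.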

Hidden layer state output error found when utilizing mlstm

I was running xLSTM_shape_verification.py with lstm_type changed to mlstm and found that printing the hidden-state shapes fails:

AttributeError: 'tuple' object has no attribute 'shape'
Output sequence shape: torch.Size([4, 10, 10000])
Hidden states shapes:

Also, when running language_modeling.py with an xLSTM composed of mLSTM blocks, the loss is almost always 0.

Possibly incorrect code

C = torch.zeros(batch_size, self.hidden_size, self.hidden_size, device=lstm.weight_ih.device)

This initialization of C causes a shape mismatch in:

C_t = f * C + i * torch.matmul(values.unsqueeze(2), keys.unsqueeze(1)).squeeze(1)

RuntimeError: The size of tensor a (2534) must match the size of tensor b (256) at non-singleton dimension 1

Can you please check it?
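For reference, the mLSTM matrix-memory update in the xLSTM paper accumulates an outer product, which constrains the shapes (this is a reading of the error message, not a verified diagnosis of this code):

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \left( v_t \, k_t^{\top} \right),
\qquad C_t \in \mathbb{R}^{d \times d}, \quad v_t, k_t \in \mathbb{R}^{d}
```

If `C` is allocated as `(batch, hidden_size, hidden_size)` while `f` and `i` are produced with a different second dimension, the elementwise products cannot broadcast, which matches the reported size mismatch at dimension 1.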

A bug in mLSTM.py

Hi,
I appreciate that you provided a good repo for my research.

But there is a bug in xLSTM/mLSTM.py: the variable `i` defined at line 59 conflicts with the one at line 65.

Best regards

NaNs in testing

Issue

Getting NaNs during backpropagation.

Trace

this was the call1
this was the call12
Batch loss was for batch number {batch_idx}: tensor(9.6656, device='cuda:0', grad_fn=)
this was the call13
NaN detected in gradient of embedding.weight
NaN detected in parameter embedding.weight after update
this was the call1
this was the call12
NaN detected in model output
Epoch 1/1, Average Loss: 0.0863
Training completed! Total time: 2.32 seconds

Code

    print("this was the call1")
    if check_nan(input_seq, "input_seq"):
        break
    
    output, _ = model(input_seq)
    
    print("this was the call12")
    if check_nan(output, "model output"):
        break
    
    output = output.contiguous().view(-1, len(loader_object.vocab))
    target_seq = target_seq.contiguous().view(-1)
    
    loss = criterion(output, target_seq)
    print(f"Batch loss was for batch number {batch_idx}: ", loss)

    
    print("this was the call13")
    if check_nan(loss, "loss"):
        break

Dataset Used

gwlms/dewiki-20230701-flair-corpus

Possible things to look at

Note: I am using the language_model.py code provided in the repo with the only change being the dataset I am using.

  1. I suspect I might be dealing with exploding gradients. The error "NaN detected in gradient of embedding.weight" is a big clue here. I'm thinking my gradients are probably getting too large during backpropagation.
  2. I'm a bit concerned about my loss function. That loss value of 9.6656 seems pretty high to me (the starting loss with the random dataset you provided was around 7).
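If exploding gradients are the cause, the standard mitigation is to clip the global gradient norm before each optimizer step (PyTorch ships this as `torch.nn.utils.clip_grad_norm_`). The rule it applies can be sketched framework-free:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm
    (the same rescaling torch.nn.utils.clip_grad_norm_ performs in place)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

clipped = clip_grad_norm([3.0, 4.0], max_norm=1.0)  # norm 5.0 scaled down to 1.0
```

In the training loop this would sit between `loss.backward()` and `optimizer.step()`.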

Purpose of Ticket

Ruling out the possibility that there is anything wrong with the way xLSTM has been implemented; additionally, I would love to collaborate on getting your implementation to work with a large dataset.

Is pip correct?

Hi,

Sorry, but did you publish the package on PyPI? It seems no such distribution exists 😞

I wanted to try this but:

ERROR: Could not find a version that satisfies the requirement PyxLSTM (from versions: none)
ERROR: No matching distribution found for PyxLSTM

Problem with input gates in sLSTM

In the sLSTM implementation, the cell state $c_t$ is updated like this:

c = f * c + i * lstm.weight_hh.new_zeros(batch_size, self.hidden_size)

Because of `.new_zeros()`, the input gate has no effect: it always multiplies a tensor of zeros.
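Numerically the complaint is straightforward: the `.new_zeros(...)` factor makes the input-gate term vanish, so the update degenerates to pure forgetting. A scalar sketch of the intended update $c_t = f \cdot c + i \cdot z$ (with `z` a hypothetical candidate-input activation, a name assumed here):

```python
f, i, c_prev, z = 0.9, 0.5, 1.0, 0.3  # stand-in scalars for one cell

buggy = f * c_prev + i * 0.0  # the new_zeros term: the input gate is multiplied away
fixed = f * c_prev + i * z    # candidate input actually modulated by the gate
```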

RuntimeError: input has inconsistent input_size: got 8 expected 16

I was testing the module using this code:

from xLSTM.model import xLSTM
import torch
model = xLSTM(5, 8, 16, 5, 2, 0.1, True, 'slstm')
inputs = torch.randint(low=0, high=5,size=(12,15000))
outputs = model(inputs)

When I set the number of heads > 1, I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/d/pyxLSTM/xLSTM/model.py", line 31, in forward
    output_seq, hidden_state = block(output_seq, hidden_states[i])
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/d/pyxLSTM/xLSTM/block.py", line 55, in forward
    lstm_output, hidden_state = self.lstm(input_seq, hidden_state)
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/d/pyxLSTM/xLSTM/slstm.py", line 47, in forward
    h, c = lstm(x, (hidden_state[a][0], hidden_state[a][1]))
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bkffadia/.local/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 1347, in forward
    ret = _VF.lstm_cell(
RuntimeError: input has inconsistent input_size: got 8 expected 16

Unable to find the get_device implementation

The example given in the README.md file had an import statement that said:

from xLSTM.utils import load_config, set_seed, get_device

I am unable to find the implementation for the get_device function, could you please clarify this?
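For what it's worth, a conventional implementation of such a helper (an assumption: this is the usual PyTorch idiom, not the missing code from the repo) would be:

```python
import torch

def get_device() -> torch.device:
    """Prefer CUDA when available, otherwise fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = get_device()
```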
