Compiling and loading Llama 2 in Neuron is working great for me on a <code class="notr

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Could I get help on this <a class="user-mention notranslate" data-hovercard-type="user

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Generate Llama 2 from Embeddings about transformers-neuronx HOT 5 OPEN

liechtym commented on August 16, 2024

Generate Llama 2 from Embeddings

from transformers-neuronx.

Comments (5)

liechtym commented on August 16, 2024

In transformers repo they said the HuggingFaceGenerationModelAdapter incompatibility error is probably stemming from the tranfomers-neuronx wrapper. Any help with this?

Here is the error:

Traceback (most recent call last):
  File "modular.py", line 107, in <module>
    chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
  File "modular.py", line 62, in __init__
    self.model = model_cls.from_config(model_config)
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
    model = cls(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
    super().__init__(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
    llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
    super().__init__(config)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
    raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new

See more details on the issue page: huggingface/transformers#28396.

Of course my general goal is to simply get this working with input embeddings so if this is not the right route, let me know.

from transformers-neuronx.

shebbur-aws commented on August 16, 2024

Hi @liechtym , We do not have support for external embeddings. One way you could potentially get around this is by replacing the model embedding weights directly. Please let us know if that helps.

from transformers-neuronx.

liechtym commented on August 16, 2024

@shebbur-aws Thanks for your reply. A workaround is totally fine for me. Would you be able to give a quick explanation/example for how to replace the embedding weights and run the forward pass on the rest of the model?

from transformers-neuronx.

liechtym commented on August 16, 2024

Could I get help on this @shebbur-aws ?

from transformers-neuronx.

davidshtian commented on August 16, 2024

@liechtym @shebbur-aws Hi~ I've got the same situation here, do you have any resolution or workaround on this? Input embeds as model input parameter instead of input ids. Thanks~

from transformers-neuronx.

Generate Llama 2 from Embeddings about transformers-neuronx HOT 5 OPEN

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs