fourthbrain / building-with-instruction-tuned-llms-a-step-by-step-guide

Resources relating to the DLAI event: https://www.youtube.com/watch?v=eTieetk2dSw

building-with-instruction-tuned-llms-a-step-by-step-guide's Introduction

👋 Welcome to the Support Repository for the DeepLearningAI Event: Building with Instruction-Tuned LLMs: A Step-by-Step Guide

Here is a collection of resources you can use to help fine-tune your LLMs, as well as to create a few simple LLM-powered applications!

๐Ÿ—ฃ๏ธ Slides:

Here are the slides used for the event: Building with Instruction-Tuned LLMs

🪡 Fine-tuning Approaches:

Instruct-tuning OpenLM's OpenLLaMA on the Dolly-15k Dataset Notebooks:

| Notebook | Purpose | Link |
| --- | --- | --- |
| Instruct-tuning Leveraging QLoRA | Supervised fine-tuning! | Here |
| Instruct-tuning Leveraging Lit-LLaMA | Using Lightning-AI's Lit-LLaMA framework for supervised fine-tuning | Here |
| Natural Language to SQL Fine-tuning Using Lit-LLaMA | Using Lightning-AI's Lit-LLaMA framework for supervised fine-tuning on the natural-language-to-SQL task | Here |
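At a high level, the QLoRA notebook loads the base model in 4-bit, attaches small LoRA adapters, and hands everything to a supervised fine-tuning trainer. The sketch below is a minimal outline of that recipe only; the checkpoint name, LoRA hyperparameters, and the prompt formatting are assumptions, not the notebook's exact values.

```python
# Minimal QLoRA supervised fine-tuning sketch (assumed hyperparameters, not the notebook's exact ones).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

model_id = "openlm-research/open_llama_3b"  # assumed checkpoint; swap in the one used in the notebook

# Load the base model quantized to 4-bit (NF4) so it fits on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; the frozen 4-bit base weights stay untouched.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Dolly-15k provides instruction/response pairs; the notebook formats them into a full prompt.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="instruction",  # placeholder field; the notebook trains on a formatted prompt instead
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="openllama-dolly-qlora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```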

MarketMail Using BLOOMz Resources:

| Notebook | Purpose | Link |
| --- | --- | --- |
| BLOOMz-LoRA Unsupervised Fine-tuning Notebook | Fine-tuning BLOOMz with an unsupervised approach using synthetic data! | Here |
| Creating Synthetic Data with GPT-3.5-turbo | Generate data for your model! | Here |
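The synthetic-data notebook asks GPT-3.5-turbo to write the training examples for MarketMail. Below is a rough sketch of that idea using the pre-1.0 `openai` client; the system prompt, product list, and output file are illustrative placeholders, not the notebook's actual prompts.

```python
# Sketch of generating synthetic marketing-email training data with GPT-3.5-turbo.
# Uses the pre-1.0 `openai` client; prompts and products are illustrative only.
import json
import openai

openai.api_key = "sk-..."  # set your own key

products = ["solar-powered backpack", "self-watering planter"]  # placeholder products

examples = []
for product in products:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You write short, upbeat marketing emails."},
            {"role": "user", "content": f"Write a marketing email for a {product}."},
        ],
        temperature=0.9,
    )
    examples.append({"product": product, "email": response.choices[0].message["content"]})

# Write one JSON object per line so the file can be loaded as a fine-tuning dataset.
with open("synthetic_marketmail.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```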

๐Ÿ—๏ธ Converting Models into Applications:

| Notebook | Purpose | Link |
| --- | --- | --- |
| Open-source LangChain Example | Leveraging LangChain to build a Hugging Face 🤗 powered application | Here |
| OpenAI LangChain Example | Building an OpenAI-powered application | Here |
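Both application notebooks follow the same basic LangChain pattern: a prompt template plus an LLM wrapper, composed into a chain. The sketch below shows the two variants side by side; the repo ID, the prompt, and the temperature are placeholders, and the notebooks' actual chains differ.

```python
# Minimal LangChain sketches: one chain backed by the Hugging Face Hub, one by OpenAI.
# Requires HUGGINGFACEHUB_API_TOKEN and OPENAI_API_KEY in the environment.
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFaceHub, OpenAI

prompt = PromptTemplate(
    input_variables=["product"],
    template="Write a one-sentence tagline for {product}.",
)

# Open-source path: any text-generation repo on the Hugging Face Hub (placeholder repo below).
hf_llm = HuggingFaceHub(repo_id="google/flan-t5-large", model_kwargs={"temperature": 0.7})
hf_chain = LLMChain(llm=hf_llm, prompt=prompt)
print(hf_chain.run(product="a solar-powered backpack"))

# OpenAI path: the same chain, just a different LLM wrapper.
openai_llm = OpenAI(temperature=0.7)
openai_chain = LLMChain(llm=openai_llm, prompt=prompt)
print(openai_chain.run(product="a solar-powered backpack"))
```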

๐Ÿ–ฅ๏ธ Demos:

| Demo | Info | Link |
| --- | --- | --- |
| Instruct-tuned Chatbot Leveraging QLoRA | This demo is currently powered by the Guanaco model; it will be updated once our instruct-tuned model finishes training! | Here |
| TalkToMyDoc | Query the first Hitchhiker's Guide book! | Here |
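Both demos are hosted as Hugging Face Spaces, and a text-generation Space of this kind is typically just a model behind a few lines of Gradio. The snippet below is a generic sketch of that wiring, not the code behind either Space; the pipeline and model name are placeholders.

```python
# Generic sketch of a Gradio text-generation demo, not the actual Space code.
import gradio as gr
from transformers import pipeline

generator = pipeline("text-generation", model="openlm-research/open_llama_3b")  # placeholder model

def respond(prompt: str) -> str:
    # Return only the newly generated text, stripping the echoed prompt.
    output = generator(prompt, max_new_tokens=128, do_sample=True)[0]["generated_text"]
    return output[len(prompt):]

gr.Interface(fn=respond, inputs="text", outputs="text", title="Instruct-tuned chatbot demo").launch()
```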

building-with-instruction-tuned-llms-a-step-by-step-guide's People

Contributors

chris-alexiuk

building-with-instruction-tuned-llms-a-step-by-step-guide's Issues

CUDA error: device-side assert triggered; CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

I'm re-running the Supervised_Instruct_tuning_OpenLLaMA_... notebook on Colab Pro with an A100 GPU. I got the following error during the supervised_finetuning_trainer.train() step (a quick Stack Overflow search suggests a shape mismatch as the likely cause, but I did not change anything in the code):

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
Traceback (most recent call last):
  <cell line: 1>:1
  /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1696 in train
      return inner_training_loop(
  /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1973 in _inner_training_loop
      tr_loss_step = self.training_step(model, inputs)
  /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2787 in training_step
      loss = self.compute_loss(model, inputs)
  /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2819 in compute_loss
      outputs = model(**inputs)
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /usr/local/lib/python3.10/dist-packages/peft/peft_model.py:827 in forward
      return self.base_model(
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:165 in new_forward
      output = old_forward(*args, **kwargs)
  /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:687 in forward
      outputs = self.model(
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
      return forward_call(*args, **kwargs)
  /usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:165 in new_forward
      output = old_forward(*args, **kwargs)
  /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:536 in forward
      attention_mask = self._prepare_decoder_attention_mask(
  /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:464 in _prepare_decoder_attention_mask
      combined_attention_mask = _make_causal_mask(
  /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:49 in _make_causal_mask
      mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=de
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be
incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
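For anyone hitting the same assert: because device-side errors surface asynchronously, the usual first step is to make the failure synchronous and then check the most common culprit in LLaMA fine-tunes, token ids that fall outside the model's embedding table (for example after adding a pad token without resizing the embeddings). The sketch below is a debugging aid under that assumption, not a confirmed diagnosis of this issue; the checkpoint name is a placeholder for the one used in the notebook.

```python
# Debugging sketch: make CUDA errors synchronous, then check for out-of-range token ids.
# This is a common cause of device-side asserts, not a confirmed diagnosis of this issue.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA initializes so the stack trace is accurate

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"  # placeholder; use the notebook's checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# If the tokenizer grew (e.g. a pad token was added), the embedding table must grow too.
if len(tokenizer) > model.get_input_embeddings().num_embeddings:
    model.resize_token_embeddings(len(tokenizer))

# Sanity-check a batch for ids the embedding table cannot index.
batch = tokenizer(["Hello world"], return_tensors="pt")
assert int(batch["input_ids"].max()) < model.get_input_embeddings().num_embeddings
```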

Bug report: uploading model to Huggingface failed

Hi, after I finished training, I tried to upload the model to HF using the command

base_model.push_to_hub("zbruceli/openLLaMA_QLora", private=True)

The following error occurred:

NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported

Is there something I need to change when I create the model/repo on HF? Thank you!

Uploading the tokenizer works fine, so it is not a HF token issue.
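A hedged note on this one: at the time, bitsandbytes 4-bit base weights could not be serialized, so the usual workaround was to push only the trained LoRA adapter (the PEFT wrapper saves just the adapter weights and config) and re-attach it to a full-precision base model at load time. The sketch below shows that pattern; the repo names, base checkpoint, and LoRA settings are placeholders, not the notebook's exact values.

```python
# Sketch: push only the LoRA adapter, since the 4-bit base model cannot be saved directly.
# Repo names and hyperparameters are placeholders.
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_3b")  # placeholder base
lora_model = get_peft_model(
    base, LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
)

# Pushing the PEFT wrapper uploads only adapter_model.bin + adapter_config.json,
# sidestepping the "4-bit converted model" save limitation on the base model.
lora_model.push_to_hub("your-username/openLLaMA-qlora-adapter", private=True)

# Anyone can later rebuild the fine-tuned model by re-attaching the adapter to the base:
restored = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_3b"),
    "your-username/openLLaMA-qlora-adapter",
)
```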

BloomForSequenceClassification does not have an lm_head ... can this technique still apply?

Re: the notebook ✉️ MarketMail AI ✉️ Fine tuning BLOOMZ (Completed Version).ipynb

https://colab.research.google.com/drive/1ARmlaZZaKyAg6HTi57psFLPeh0hDRcPX?usp=sharing

I tried to modify the example to use BloomForSequenceClassification instead of AutoModelForCausalLM, but the "Post-processing on the model" step:
model.lm_head = CastOutputToFloat(model.lm_head)
fails because BloomForSequenceClassification does not have an attribute lm_head.

This is true, so I changed the code to try to wrap the last layer of BloomForSequenceClassification instead:
model.ln_f = CastOutputToFloat(model.ln_f)
This also fails: AttributeError: 'BloomForSequenceClassification' object has no attribute 'ln_f'

This leaves me wondering: can this gradient accumulation work for BloomForSequenceClassification, or only for AutoModelForCausalLM? Alternatively, does anyone know whether AutoModelForCausalLM can be used for fine-tuning a classification task as well as BloomForSequenceClassification can?
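A hedged pointer rather than a definitive answer: on BloomForSequenceClassification the transformer body lives under model.transformer (so the final layer norm is model.transformer.ln_f), and the classification head is the linear layer model.score, so the float-casting step would target those attributes instead of lm_head. The sketch below adapts the notebook's post-processing step under that assumption, re-declaring a CastOutputToFloat helper so the snippet stands alone; the checkpoint and num_labels are placeholders.

```python
# Sketch: adapting the "cast the output head to float32" post-processing step
# to BloomForSequenceClassification, whose head is `score` rather than `lm_head`.
import torch
import torch.nn as nn
from transformers import BloomForSequenceClassification

class CastOutputToFloat(nn.Sequential):
    """Wrap a module so its output is cast to float32 (mirrors the notebook's helper)."""
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model = BloomForSequenceClassification.from_pretrained("bigscience/bloomz-560m", num_labels=2)

# Freeze the base model and keep 1-D params (layer norms, biases) in float32, as in the causal-LM notebook.
for param in model.parameters():
    param.requires_grad = False
    if param.ndim == 1:
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()
model.enable_input_require_grads()

# The causal-LM notebook casts model.lm_head; the classification equivalent is model.score.
# The final layer norm, if you need it, is reachable at model.transformer.ln_f.
model.score = CastOutputToFloat(model.score)
```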
