MPT-7B LoRA Patch

This is the Python model code for MPT-7B, patched so that it can be used with LoRA. While I have tested that it works and produces reasonable results, it is quite possible that the model is not being trained correctly. The model code explicitly states that left padding is not supported, but I forced it anyway and got decent results.
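For context, left padding means inserting pad tokens at the start of shorter sequences so that every row in a batch ends on its last real token, which is where a causal LM continues from. The sketch below shows only that idea in plain Python: `left_pad` is a hypothetical helper, the pad ID `0` is an arbitrary assumption, and it does not show how MPT's own code consumes the resulting attention mask.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token ID sequences to a common length.

    Returns (input_ids, attention_mask); the mask is 0 for padding and 1 for real tokens.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Padding goes on the left, so the final position of every row is a real token.
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```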

Note that when using LoRA, a strange quirk prevents me from generating text from an empty prompt.

I also included a model-agnostic export_hf_checkpoint.py script, which you can use to merge your LoRA back into a new full model. Once you do this, you no longer need the patched version of the model code. That said, if you want to load the model in 8-bit, you will still need it. Usage: python export_hf_checkpoint.py <source> <lora> <dest>.
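Merging is possible because a LoRA adapter stores only a low-rank update for each wrapped weight. The standard LoRA merge folds that update into the base matrix as W' = W + (alpha / r) · B · A, after which the model is an ordinary dense model with no adapter layers. The plain-Python sketch below illustrates that arithmetic. It is an assumption that export_hf_checkpoint.py performs exactly this computation; `merge_lora` is a hypothetical helper, not the script's code, and it ignores details such as dropout or transposed weight layouts.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, the standard LoRA merge.

    w: (out x in) base weight, lora_a: (r x in), lora_b: (out x r).
    """
    r = len(lora_a)                 # LoRA rank = number of rows in A
    scale = alpha / r               # LoRA scaling factor
    delta = matmul(lora_b, lora_a)  # (out x in) low-rank update
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


merged = merge_lora([[1, 0], [0, 1]], lora_a=[[1, 2]], lora_b=[[1], [0]], alpha=2)
print(merged)  # [[3.0, 4.0], [0.0, 1.0]]
```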

If you would like to use this with text-generation-webui, apply the following patch:

--- a/modules/training.py
+++ b/modules/training.py
@@ -28,12 +28,13 @@ try:
     MODEL_CLASSES = {v: k for k, v in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES}
 except:
     standard_modules = ["q_proj", "v_proj"]
-    model_to_lora_modules = {"llama": standard_modules, "opt": standard_modules, "gptj": standard_modules, "gpt_neox": ["query_key_value"]}
+    model_to_lora_modules = {"llama": standard_modules, "opt": standard_modules, "gptj": standard_modules, "gpt_neox": ["query_key_value"], "mpt": ["Wqkv"]}
     MODEL_CLASSES = {
         "LlamaForCausalLM": "llama",
         "OPTForCausalLM": "opt",
         "GPTJForCausalLM": "gptj",
-        "GPTNeoXForCausalLM": "gpt_neox"
+        "GPTNeoXForCausalLM": "gpt_neox",
+        "MPTForCausalLM": "mpt"
     }

 WANT_INTERRUPT = False
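The patch tells the trainer which layers to wrap with LoRA: for MPT, the fused query/key/value projection `Wqkv`, in place of the separate `q_proj`/`v_proj` used by LLaMA-style models. The sketch below mimics in plain Python how a suffix match over module names selects those layers. The module names are hypothetical examples modelled on MPT's block layout, and `lora_targets` only approximates, rather than reproduces, how PEFT matches `target_modules`.

```python
from typing import Dict, List

# Entries added by the patch: model type -> LoRA target modules, class name -> model type.
MODEL_TO_LORA_MODULES: Dict[str, List[str]] = {"mpt": ["Wqkv"]}
MODEL_CLASSES: Dict[str, str] = {"MPTForCausalLM": "mpt"}


def lora_targets(class_name: str, module_names: List[str]) -> List[str]:
    """Return the module names that a suffix-based LoRA matcher would wrap (hypothetical)."""
    targets = MODEL_TO_LORA_MODULES[MODEL_CLASSES[class_name]]
    return [
        name for name in module_names
        if any(name == t or name.endswith("." + t) for t in targets)
    ]


# Hypothetical module names; only the attention Wqkv projections should match.
names = [
    "transformer.wte",
    "transformer.blocks.0.attn.Wqkv",
    "transformer.blocks.0.attn.out_proj",
    "transformer.blocks.0.ffn.up_proj",
    "transformer.blocks.1.attn.Wqkv",
]
print(lora_targets("MPTForCausalLM", names))
```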

You will need to run the webui with these options:

python server.py --model mosaicml_mpt-7b-instruct --trust-remote-code --load-in-8bit

You may also need to patch bitsandbytes/nn/modules.py to prevent running out of VRAM when saving the LoRA:

--- a/modules.py
+++ b/modules.py
@@ -259,13 +259,13 @@
         if not self.state.has_fp16_weights and self.state.CB is None and self.state.CxB is not None:
             # reorder weight layout back from ampere/turing to row
             reorder_layout = True
-            weight_clone = self.weight.data.clone()
+            weight_clone = self.weight.data
         else:
             reorder_layout = False

         try:
             if reorder_layout:
-                self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
+                self.weight.data = undo_layout(self.state.CxB.cpu(), self.state.tile_indices.cpu())

             super()._save_to_state_dict(destination, prefix, keep_vars)

(It resides in miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/nn/modules.py for me.)
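The location above depends on the environment name and Python version. One way to find the file in your own environment is to ask Python where the installed package lives. The sketch below uses `importlib.util.find_spec` on the top-level package only, so bitsandbytes is not actually imported; `locate_package_file` is a hypothetical helper, and it only builds the path without checking that the file exists.

```python
import importlib.util
import os
from typing import Optional


def locate_package_file(package: str, *parts: str) -> Optional[str]:
    """Build the path to a file inside an installed top-level package.

    Returns None if the package is not installed or is a plain module rather than a package.
    """
    spec = importlib.util.find_spec(package)  # does not import a top-level package
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, *parts)


# Prints the path to bitsandbytes/nn/modules.py, or None if bitsandbytes is not installed.
print(locate_package_file("bitsandbytes", "nn", "modules.py"))
```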

You can find the source model here: mosaicml/mpt-7b-instruct

The changes are based on the source code of the LLaMA model in Hugging Face Transformers.

Model License

CC-BY-SA-3.0

Contributors

iwalton3, retarfi