
intellinjun / intel-extension-for-transformers


This project forked from intel/intel-extension-for-transformers


⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

License: Apache License 2.0

Shell 0.18% JavaScript 0.18% C++ 44.33% Python 36.68% C 3.67% TypeScript 0.36% CSS 0.05% HTML 9.63% CMake 0.63% Jupyter Notebook 2.32% Dockerfile 0.26% Svelte 1.70%


Intel® Extension for Transformers

An innovative toolkit to accelerate Transformer-based models on Intel platforms

Release Notes

🏭Architecture   |   💬NeuralChat   |   😃Inference   |   💻Examples   |   📖Documentation

🚀Latest News

  • NeuralChat, a customizable chatbot framework under Intel® Extension for Transformers, is now available for you to create your own chatbot within minutes! It supports a rich set of plugins (Knowledge Retrieval, Speech Interaction, Query Caching, and Security Guardrail) and multiple architectures, such as Intel® Xeon® Scalable Processors and Habana Gaudi® Accelerators. Check out the sample code below and give it a try!
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
  • 💬NeuralChat v1.1, a fine-tuned chat model based on MPT-7B using a mixed set of instruction datasets, is available on Hugging Face, together with the release of INT8 quantization recipes and benchmark results.

🏃Installation

Quick Install from PyPI

pip install intel-extension-for-transformers

For more installation methods, please refer to the Installation Page.

🌟Introduction

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms. It is particularly effective on 4th Gen Intel Xeon Scalable processors (codenamed Sapphire Rapids). The toolkit provides the following key features and examples:

🌱Getting Started

Sentiment Analysis with Quantization

Prepare Dataset

from datasets import load_dataset
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

raw_datasets = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
raw_datasets = raw_datasets.map(lambda e: tokenizer(e['sentence'], truncation=True, padding='max_length', max_length=128), batched=True)
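
The tokenizer call above pads or truncates every sentence to exactly 128 token IDs. The pure-Python sketch below illustrates that behavior on plain lists. `pad_or_truncate`, the sample token IDs, and the pad ID of 0 are hypothetical choices made for illustration; they are not part of the tokenizer API.

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=0):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])   # mirrors truncation=True
    mask = [1] * len(ids)                # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding            # mirrors padding='max_length'
    mask += [0] * padding                # 0 marks a padding position
    return ids, mask


# Illustrative token IDs, not taken from a real vocabulary lookup
ids, mask = pad_or_truncate([101, 1045, 2066, 102], max_length=8)
print(ids)   # [101, 1045, 2066, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Fixed-length inputs like these let every batch share the same shape, at the cost of some wasted computation on padding positions.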

Quantization

from intel_extension_for_transformers.transformers import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

config = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", config=config)
model.config.label2id = {0: 0, 1: 1}
model.config.id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}
# Replace transformers.Trainer with NLPTrainer
# trainer = transformers.Trainer(...)
trainer = NLPTrainer(model=model, 
    train_dataset=raw_datasets["train"], 
    eval_dataset=raw_datasets["validation"],
    tokenizer=tokenizer
)
q_config = QuantizationConfig(metrics=[metrics.Metric(name="eval_loss", greater_is_better=False)])
model = trainer.quantize(quant_config=q_config)

# Run the quantized model on a sample sentence; the result is the predicted class index
inputs = tokenizer("I like Intel Extension for Transformers", return_tensors="pt")
output = model(**inputs).logits.argmax().item()
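
The last line of the sample yields a class index rather than a readable label. Here is a minimal sketch of turning logits into one of the label names configured above, using plain Python lists instead of tensors. `predict_label` is a hypothetical helper and the logit values are made up.

```python
# Same mapping as model.config.id2label in the sample above
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def predict_label(logits, id2label=ID2LABEL):
    """Return the label whose logit is largest (argmax over one example)."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best_index]


print(predict_label([-2.1, 3.4]))  # POSITIVE
```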

For more quick samples, please refer to the Get Started Page. For more validated examples, please refer to the Support Model Matrix.
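
To build intuition for what INT8 quantization does to a tensor, the sketch below applies generic symmetric per-tensor quantization to a short list of floats. It is a textbook illustration only, not the algorithm that `trainer.quantize` runs. `quantize_int8`, `dequantize`, and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]   # made-up values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)  # [50, -127, 0, 100]
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). That rounding error is why a quantization flow usually checks an accuracy metric after conversion.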

🎯Validated Performance

Model | FP32 | BF16 | INT8
EleutherAI/gpt-j-6B | 4163.67 (ms) | 1879.61 (ms) | 1612.24 (ms)
CompVis/stable-diffusion-v1-4 | 10.33 (s) | 3.02 (s) | N/A

Note*: For the GPT-J-6B software/hardware configuration, please refer to text-generation. For the Stable Diffusion software/hardware configuration, please refer to text-to-image.
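
The relative gains are easier to read as speedup factors. The sketch below derives them from the figures in the table, assuming the values are latencies (lower is better), which the source does not state explicitly. `speedups` is a hypothetical helper.

```python
# Timings copied from the table above; assumed to be latencies (lower is better)
timings = {
    "EleutherAI/gpt-j-6B": {"FP32": 4163.67, "BF16": 1879.61, "INT8": 1612.24},  # ms
    "CompVis/stable-diffusion-v1-4": {"FP32": 10.33, "BF16": 3.02},              # s
}


def speedups(row, baseline="FP32"):
    """Return baseline_time / time for every non-baseline precision in a row."""
    base = row[baseline]
    return {name: base / value for name, value in row.items() if name != baseline}


for model, row in timings.items():
    for precision, factor in speedups(row).items():
        print(f"{model} {precision}: {factor:.2f}x faster than FP32")
```

Under that assumption, GPT-J-6B runs roughly 2.2x faster in BF16 and 2.6x faster in INT8 than in FP32, and Stable Diffusion v1-4 runs roughly 3.4x faster in BF16.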

📖Documentation

OVERVIEW
Model Compression   |   NeuralChat   |   Neural Engine   |   Kernel Libraries
MODEL COMPRESSION
Quantization   |   Pruning   |   Distillation   |   Orchestration
Neural Architecture Search   |   Export   |   Metrics/Objectives   |   Pipeline
NEURAL ENGINE
Model Compilation   |   Custom Pattern   |   Deployment   |   Profiling
KERNEL LIBRARIES
Sparse GEMM Kernels   |   Custom INT8 Kernels   |   Profiling   |   Benchmark
ALGORITHMS
Length Adaptive   |   Data Augmentation
TUTORIALS AND RESULTS
Tutorials   |   Supported Models   |   Model Performance   |   Kernel Performance

📃Selected Publications/Events

View Full Publication List.

Additional Content

💁Collaborations

We welcome any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach out to us. We look forward to collaborating with you on Intel Extension for Transformers!

