llama_infer

Inference script for Meta's LLaMA models, using the Hugging Face wrapper from huggingface/transformers#21955.

For the 65B model:

        fp16                            int8 (bitsandbytes)
V100    OK, 5xV100                      Bad results, short generated sequences
A100    OK, 6xA100 when using "auto"    OK, 3xA100

Note that I didn't tweak the device_map for the A100 fp16 case. I expect it could be reduced to around 4xA100.
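
The GPU counts above are roughly what a weights-only estimate predicts. Here is a minimal sketch of that arithmetic. It assumes 65e9 parameters, 32 GiB per V100 and 40 GiB per A100 (the table does not say which A100 variant was used), and a hypothetical 80% usable fraction per GPU to leave room for activations and the KV cache. `min_gpus` and `build_max_memory` are illustrative helpers, not part of this repo.

```python
import math


def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB (2 bytes/param for fp16, 1 for int8)."""
    return n_params * bytes_per_param / 2**30


def min_gpus(n_params: float, bytes_per_param: int, gpu_gib: float,
             usable_fraction: float = 0.8) -> int:
    """Lower bound on GPU count; usable_fraction is an assumed safety margin."""
    usable_gib = gpu_gib * usable_fraction
    return math.ceil(weight_gib(n_params, bytes_per_param) / usable_gib)


def build_max_memory(num_gpus: int, per_gpu_gib: int) -> dict:
    """Per-device memory caps in the {device_index: "NGiB"} form."""
    return {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}


print(min_gpus(65e9, 2, 32))   # fp16 on 32 GiB V100 -> 5
print(min_gpus(65e9, 2, 40))   # fp16 on 40 GiB A100 -> 4
print(build_max_memory(4, 32))
```

The dict from `build_max_memory` has the shape that `from_pretrained(..., device_map="auto", max_memory=...)` accepts, so it may be one way to experiment with a tighter placement than the default "auto" split. Treat the estimate as a lower bound: real usage also depends on sequence length and quantization overhead.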

First, install transformers from source

git clone https://github.com/zphang/transformers.git --branch llama_push --depth=1
cd transformers
python3 setup.py develop --user
git clone https://github.com/huggingface/transformers.git --depth=1
cd transformers
python3 setup.py develop --user

Note: there is still some confusion between the class names LLaMATokenizer and LlamaTokenizer. If loading a model fails with a complaint about a missing LLaMATokenizer, you may have to temporarily use https://github.com/mbehm/transformers/ . Otherwise, use the latest head of https://github.com/huggingface/transformers/ .
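
A commonly used alternative to switching forks (not something this repo does) is to rename the class recorded in the converted tokenizer's tokenizer_config.json. Below is a minimal sketch, assuming that file exists and has a "tokenizer_class" key with the old spelling. `fix_tokenizer_class` is a hypothetical helper name.

```python
import json
import os


def fix_tokenizer_class(tokenizer_dir: str,
                        old: str = "LLaMATokenizer",
                        new: str = "LlamaTokenizer") -> bool:
    """Rewrite tokenizer_class in tokenizer_config.json; return True if changed."""
    path = os.path.join(tokenizer_dir, "tokenizer_config.json")
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    if config.get("tokenizer_class") != old:
        return False  # nothing to rename
    config["tokenizer_class"] = new
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return True
```

Back up the file first if you are unsure which transformers version you will end up using, since the spelling it expects depends on that version.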

Second, convert the weights

python3 src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 7B \
    --output_dir /data/llama/hf/

Here we assume the converted weights are in /data/llama/hf/ .

Hack for tokenizer (may not be required)

If the tokenizer complains, set up a soft link to make it happy.

/data/llama/hf/65b/tokenizer$ ls -lh
total 496K
lrwxrwxrwx 1 brainpp brainpp   23 Mar  5 13:51 config.json -> special_tokens_map.json
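
The listing above corresponds to a relative symlink named config.json that points at special_tokens_map.json inside the tokenizer directory. A minimal sketch of creating it from Python follows. `link_tokenizer_config` is a hypothetical helper name, and the directory should be wherever your converted tokenizer lives.

```python
import os


def link_tokenizer_config(tokenizer_dir: str) -> str:
    """Create config.json -> special_tokens_map.json unless config.json exists."""
    link_path = os.path.join(tokenizer_dir, "config.json")
    if not os.path.lexists(link_path):
        # Relative target, so the link still resolves if the directory is moved.
        os.symlink("special_tokens_map.json", link_path)
    return link_path


# Example, using the path from the listing above:
# link_tokenizer_config("/data/llama/hf/65b/tokenizer")
```

The helper leaves an existing config.json untouched, so it is safe to call more than once.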

7B model

int8 (decent now after removing extra EOS)

python3 test_llam.py --do_int8 --low_cpu_mem_usage --variant 7b --model_path /data/llama/hf/

contrastive search

Puma is a 1996 film starring Jackie Chan and Leslie Cheung Kwok-wing.
The film's soundtrack was composed by Shigeru Umebayashi, who was nominated for the Golden Horse Award for Best Score at the 24th Golden Horse Awards.boldsquo;s first martial arts film in a decade, PUMA is a remake of the 1976 Shaw Brothers film Fist of Fury. Chan plays Chen Zhen, a Chinese student who travels to Japan to study karate. After being beaten by a group of yakuza, he vows to avenge his friend's death. The Japanese police are unable to stop him, and the yakuza send their best fighters to try to stop him.
Chen Zhen (Jackie Chan) is a Chinese student who has traveled to Japan to study karate. While on

float16 (decent)

python3 test_llam.py --low_cpu_mem_usage --variant 7b --model_path /data/llama/hf/

contrastive search

Puma is a 1980’s classic that has stood the test of time. With its sleek design and sporty look, Puma is a shoe that can be worn with anything and everything.
The Puma Suede is a sneaker that was introduced in 1968 by Adi Dassler and his brother Rudi. The Suede’s design was inspired by the moccasin shoes that Native Americans wore in the 19th century. It was originally called Clyde Court, after Walt Clyde Frazier, a basketball player for the New York Knicks.\nThe first version of the Puma Suede was made of leather and had a rubber sole. Later, the design was changed to a canvas upper and a plastic wedge heel. This version was more popular and is the one that we know today.
Today, the Puma Suede is one of the most popular
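
The two commands in this section differ only in --do_int8. The sketch below shows roughly what such a script might do with the Hugging Face API. It is not the actual contents of test_llam.py: the function names, the `penalty_alpha=0.6` and `top_k=4` values, and the `max_new_tokens` default are assumptions for illustration. Whether `generate` supports contrastive search directly depends on the installed transformers version.

```python
def contrastive_search_kwargs(max_new_tokens: int = 256) -> dict:
    """generate() arguments that select contrastive search (assumed values)."""
    return {"penalty_alpha": 0.6, "top_k": 4, "max_new_tokens": max_new_tokens}


def generate(model_dir: str, tokenizer_dir: str, prompt: str,
             do_int8: bool = False) -> str:
    """Load a converted LLaMA checkpoint and continue the prompt."""
    # Heavy imports are kept local so the helper above works without a GPU stack.
    import torch
    from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

    load_kwargs = {"device_map": "auto", "low_cpu_mem_usage": True}
    if do_int8:
        # int8 weights via bitsandbytes
        load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
    else:
        load_kwargs["torch_dtype"] = torch.float16

    tokenizer = LlamaTokenizer.from_pretrained(tokenizer_dir)
    model = LlamaForCausalLM.from_pretrained(model_dir, **load_kwargs)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, **contrastive_search_kwargs())
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Passing the model and tokenizer directories separately avoids guessing how the conversion script lays out its output directory.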

65B model

Works like a charm. Sample generations omitted.

Contributors

zsc, erjanmx
