
yyaadet / llm-perf


Automated LLM performance testing. Get insights and replay evaluations in little time.

License: GNU General Public License v3.0

Python 89.85% Shell 10.15%

llm-perf's Introduction

llm-perf


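The goal of this project is to build an LLM evaluation tool that takes the end user's perspective, is reproducible, and is highly automated. The project will include the test data, the test code, and the test reports.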

Data sources

  • The val split of the CEval dataset
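
As a rough illustration of what evaluating on this data involves, the sketch below formats one CEval-style row as a multiple-choice prompt and scores model replies. The column names (question, A to D, answer) follow the published CEval CSV layout; the prompt wording, the helper names, and the answer-extraction rule are assumptions made for illustration, not this project's actual code.

```python
import re
from typing import Dict, List, Optional

CHOICES = ("A", "B", "C", "D")


def build_prompt(row: Dict[str, str]) -> str:
    """Format one CEval-style row as a multiple-choice question (hypothetical wording)."""
    options = "\n".join(f"{c}. {row[c]}" for c in CHOICES)
    return f"{row['question']}\n{options}\nAnswer with a single letter (A, B, C, or D)."


def extract_choice(reply: str) -> Optional[str]:
    """Return the first uppercase A-D that is not part of a longer ASCII word, else None.

    This is a naive rule: a reply that starts with the English article "A"
    would be misread, so a real harness needs something stricter.
    """
    match = re.search(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])", reply)
    return match.group(1) if match else None


def accuracy(rows: List[Dict[str, str]], replies: List[str]) -> float:
    """Fraction of replies whose extracted letter equals the row's answer."""
    if not rows:
        return 0.0
    correct = sum(extract_choice(r) == row["answer"] for row, r in zip(rows, replies))
    return correct / len(rows)
```

Calling accuracy(rows, replies) with the rows loaded from a val CSV and the raw model replies gives a single score per model that can be compared across runs.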

Supported models

  • gpt-3.5-turbo
  • Kimi
  • GLM4
  • StepFun (阶跃星辰)
  • ERNIE Bot (文心一言) 3.5
  • Minimax

Running the Kimi test

Python 3.11 must be installed.

  • cd llm-perf
  • pip install -r requirements.txt
  • Edit the token and cookie values in test_kimi.sh

Log in to https://kimi.moonshot.cn with Safari or Chrome, then type any text to start a new conversation.

Replace the values in test_kimi.sh with the new token, cookie, and chat_id.
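
Cookie strings usually contain spaces and semicolons, so pasting them into a shell script by hand is error-prone. The sketch below rewrites a variable assignment in the script text and quotes the value with shlex.quote. It assumes the script stores each value as a plain shell assignment such as token=...; the actual layout of test_kimi.sh is not shown here, and set_shell_var is a hypothetical helper, not part of this project.

```python
import re
import shlex


def set_shell_var(script_text: str, name: str, value: str) -> str:
    """Replace the first `name=...` assignment with a safely quoted new value.

    Assumes a plain `name=value` (optionally `export name=value`) line;
    raises KeyError if no such line exists.
    """
    pattern = re.compile(
        rf"^([ \t]*(?:export[ \t]+)?{re.escape(name)}=).*$", re.MULTILINE
    )
    quoted = shlex.quote(value)
    # A function replacement avoids backslash-escape surprises in the value.
    new_text, count = pattern.subn(lambda m: m.group(1) + quoted, script_text, count=1)
    if count == 0:
        raise KeyError(f"no assignment for {name!r} found")
    return new_text
```

A typical use would read test_kimi.sh, call set_shell_var once each for token, cookie, and chat_id, and write the result back.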

Running the GLM4 test

  • cd llm-perf
  • pip install -r requirements.txt
  • Edit the following values in test_glm4.sh:
    • token
    • cookie
    • assistant_id
    • conversion_id

Running the gpt-3.5-turbo test

No OpenAI API key is required.

  • cd llm-perf
  • pip install -r requirements.txt
  • python run.py chatgpt. The first run requires a manual login: once the command starts, it opens a browser. Log in to poe.com by hand, then run the same command again.

Running the StepFun (阶跃星辰) test

  • cd llm-perf
  • pip install -r requirements.txt
  • Edit the following values in test_step.sh:
    • token
    • cookie
    • chat_id
    • appid
  • Run ./test_step.sh

Running the ERNIE Bot (文心一言) 3.5 test

Only a username and password are needed.

  • cd llm-perf
  • pip install -r requirements.txt
  • python run.py yiyan --username {} --password {}. Fill in your own username and password to start the test.
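
To keep credentials out of shell history, the command above could be launched from a small wrapper that reads them from the environment. This is only a sketch: the subcommand and flags are the ones listed above, while the environment variable names YIYAN_USERNAME and YIYAN_PASSWORD and both helper functions are hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def build_yiyan_command(username: str, password: str) -> List[str]:
    """Build the argv list for `python run.py yiyan --username ... --password ...`."""
    return [
        sys.executable, "run.py", "yiyan",
        "--username", username,
        "--password", password,
    ]


def run_yiyan_from_env() -> None:
    """Read credentials from (hypothetical) environment variables and run the test.

    Must be called from the llm-perf directory so that run.py is found.
    """
    username = os.environ["YIYAN_USERNAME"]
    password = os.environ["YIYAN_PASSWORD"]
    # check=True raises CalledProcessError if run.py exits with a non-zero status.
    subprocess.run(build_yiyan_command(username, password), check=True)
```

Passing an argument list instead of a shell string avoids quoting problems with special characters in the password. The password is still visible in the process list while the test runs.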

Viewing the report

  • cd streamlit
  • pip install -r requirements.txt
  • streamlit run llm_perf.py

Detailed test results are located in datasets/.

Sponsorship

