
bytedance / bytemlperf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Home Page: https://bytemlperf.ai/

License: Apache License 2.0

Python 98.58% Shell 1.42%
python

bytemlperf's Introduction

ByteMLPerf Benchmark Tool

ByteMLPerf is an AI accelerator benchmark that evaluates AI accelerators from a practical production perspective, including the ease of use and versatility of software and hardware. ByteMLPerf has the following characteristics:

  • Models and runtime environments are more closely aligned with practical business use cases.
  • For ASIC hardware evaluation, besides evaluating performance and accuracy, it also measures metrics such as compiler usability and coverage.
  • Performance and accuracy results obtained from testing on the open Model Zoo serve as reference metrics for evaluating ASIC hardware integration.

Category

The ByteMLPerf benchmark is structured into three main categories: Inference, Training, and Micro, each targeting different aspects of AI accelerator performance:

  • Inference: This category is subdivided into two distinct sections to cater to different types of models:

    • General Performance: This section is dedicated to evaluating the inference capabilities of accelerators using common models such as ResNet-50 and BERT. It aims to provide a broad understanding of the accelerator's performance across a range of typical tasks. Vendors can refer to this document for guidance on building a general perf backend: ByteMLPerf General Perf Guide [Chinese version]

    • Large Language Model (LLM) Performance: Specifically designed to assess the capabilities of accelerators in handling large language models, this section addresses the unique challenges posed by the size and complexity of these models. Vendors can refer to this document for guidance on building an LLM perf backend: ByteMLPerf LLM Perf Guide [Chinese version]

  • Micro: The Micro category focuses on the performance of specific operations or "ops" that are fundamental to AI computations, such as Gemm, Softmax, and various communication operations. This granular level of testing is crucial for understanding the capabilities and limitations of accelerators at a more detailed operational level. Vendors can refer to this document for guidance on building a micro perf backend: ByteMLPerf Micro Perf Guide [Chinese version]

  • Training: Currently under development, this category aims to evaluate the performance of AI accelerators in training scenarios. It will provide insights into how well accelerators can handle the computationally intensive process of training AI models, which is vital for the development of new and more advanced AI systems.

Vendors looking to evaluate and improve their AI accelerators can utilize the ByteMLPerf benchmark as a comprehensive guide. The benchmark not only offers a detailed framework for performance and accuracy evaluation but also includes considerations for compiler usability and coverage for ASIC hardware, ensuring a holistic assessment approach.

For more details, visit our official website: bytemlperf.ai

Vendor List

The ByteMLPerf vendor backend list is shown below.

Columns: Vendor, SKU, Key Parameters, Inference (General Perf), Inference (LLM Perf)

Intel, Xeon
  • Key Parameters: -
  • Inference (General Perf): -
  • Inference (LLM Perf): -

Stream Computing, STC P920
  • Computation Power: 128 TFLOPS@FP16
  • Last Level Buffer: 8MB, 256GB/s
  • Level 1 Buffer: 1.25MB, 512GB/s
  • Memory: 16GB, 119.4GB/s
  • Host Interface: PCIe 4, 16x, 32GB/s
  • TDP: 160W
  • Inference (General Perf): STC Introduction
  • Inference (LLM Perf): -

Graphcore, Graphcore® C600
  • Compute: 280 TFLOPS@FP16, 560 TFLOPS@FP8
  • In Processor Memory: 900 MB, 52 TB/s
  • Host Interface: Dual PCIe Gen4 8-lane interfaces, 32GB/s
  • TDP: 185W
  • Inference (General Perf): IPU Introduction
  • Inference (LLM Perf): -

Moffett-AI, Moffett-AI S30
  • Compute: 1440 (32x-Sparse) TFLOPS@BF16, 2880 (32x-Sparse) TOPS@INT8
  • Memory: 60 GB
  • Host Interface: Dual PCIe Gen4 8-lane interfaces, 32GB/s
  • TDP: 250W
  • Inference (General Perf): SPU Introduction
  • Inference (LLM Perf): -

Habana, Gaudi2
  • 24 Tensor Processor Cores, Dual matrix multiplication engines
  • Memory: 96 GB HBM2E, 48MB SRAM
  • Inference (General Perf): HPU Introduction
  • Inference (LLM Perf): -

    Statement

    ASF Statement on Compliance with US Export Regulations and Entity List

    bytemlperf's People

    Contributors

    angrypowman, hantengfei99, huijuanzh, jackzipu, jianzhexiao, keg0704, mikughoul, minghui-bd, moffett-ai, stc-qiupeng, suisiyuan, yjessicagao


    bytemlperf's Issues

    Contents missing from the byte_mlperf/datasets folder

    Running the following command:

    python3 launch.py --task albert-torch-fp32 --hardware CPU
    

    produces several errors, e.g.

    tar: byte_mlperf/datasets/open_squad: Cannot open: No such file or directory
    tar: Error is not recoverable: exiting now
    
    Traceback (most recent call last):
      File "byte_mlperf/backends/CPU/calculate_cpu_diff.py", line 106, in <module>
        engine.start_engine()
      File "byte_mlperf/backends/CPU/calculate_cpu_diff.py", line 53, in start_engine
        return self.workload_perf(self.workload)
      File "byte_mlperf/backends/CPU/calculate_cpu_diff.py", line 63, in workload_perf
        ds = load_dataset(model_info)
      File "/workspace/public_version/ByteMLPerf/byte_mlperf/core/configs/dataset_store.py", line 39, in load_dataset
        data_loader = importlib.import_module('byte_mlperf.datasets.' +
      File "/workspace/public_version/ByteMLPerf/byte_mlperf/backends/CPU/venv/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
    ModuleNotFoundError: No module named 'byte_mlperf.datasets.open_squad.data_loader'
    

    Comparing against my local copy of the code, I found that the local open_squad folder has the following structure:

    .
    ├── bert
    │   ├── accuracy_squad.py
    │   ├── evaluate.py
    │   └── __pycache__
    │       ├── accuracy_squad.cpython-38.pyc
    │       └── evaluate.cpython-38.pyc
    ├── create_squad_data.py
    ├── data_loader.py
    ├── dev-v1.1.json
    ├── eval_features_albert-torch-fp32.pickle
    ├── packing_utils.py
    ├── __pycache__
    │   ├── create_squad_data.cpython-38.pyc
    │   ├── data_loader.cpython-38.pyc
    │   ├── packing_utils.cpython-38.pyc
    │   └── test_accuracy.cpython-38.pyc
    ├── test_accuracy.py
    └── vocab.txt
    

    All of the *.py files, and even the open_squad folder itself, are missing from the current datasets/ folder.

    These files need to be added to the repository so that the task can run.
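The traceback above ends in a bare ModuleNotFoundError raised deep inside importlib. As a minimal sketch (not the project's actual loader), a dataset lookup could check for the module first and report which dataset directory is missing. The function name `load_data_loader` and its `package` and `module` parameters are hypothetical; only `importlib.util.find_spec` and `importlib.import_module` are real standard-library calls.

```python
import importlib
import importlib.util


def load_data_loader(dataset_name, package="byte_mlperf.datasets", module="data_loader"):
    """Import <package>.<dataset_name>.<module>, failing with a readable message.

    Hypothetical helper: the real project builds the module name in
    dataset_store.py; this sketch only illustrates the pre-check.
    """
    module_name = f"{package}.{dataset_name}.{module}"
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # find_spec raises (instead of returning None) when a parent package is missing.
        spec = None
    if spec is None:
        raise ModuleNotFoundError(
            f"Dataset module '{module_name}' was not found. "
            f"Check that the '{dataset_name}' folder and its *.py files exist."
        )
    return importlib.import_module(module_name)
```

With a check like this, a missing open_squad folder would be reported by name before any workload code runs.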

    Cannot run gpt2 on Habana Gaudi2

    When we try to run the gpt2 TorchScript model on Habana Gaudi2, an error occurs (see the following picture):
    image

    The cause is that the input tensors are not on the same device: one is on hpu and the other is on cpu.
    image
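A common way to avoid this kind of device mismatch is to move every input to the target device before calling the model. The sketch below is an illustration under stated assumptions, not the project's backend code: `move_inputs_to_device` is a hypothetical helper, and `FakeTensor` is a stand-in so the example runs without PyTorch. With real tensors, the `.to(device)` call would be `torch.Tensor.to`, and "hpu" is the device name quoted in the report.

```python
class FakeTensor:
    """Stand-in for a tensor; only tracks which device it lives on."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Mirrors the shape of torch.Tensor.to(device): returns a tensor on `device`.
        return FakeTensor(device)


def move_inputs_to_device(inputs, device):
    """Recursively move tensor-like values in dicts, lists, and tuples to `device`."""
    if hasattr(inputs, "to"):
        return inputs.to(device)
    if isinstance(inputs, dict):
        return {key: move_inputs_to_device(value, device) for key, value in inputs.items()}
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(move_inputs_to_device(value, device) for value in inputs)
    return inputs  # plain Python values are left untouched


# One input already on hpu, another still on cpu, as in the report.
feeds = {"input_ids": FakeTensor("hpu"), "attention_mask": FakeTensor("cpu")}
moved = move_inputs_to_device(feeds, "hpu")
print({name: tensor.device for name, tensor in moved.items()})
```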

    llm_perf not running

    The error “AttributeError: module 'posixpath' has no attribute 'byte_infer_perfdirname'” is displayed when llm perf is run.
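For context, `posixpath` is the module behind `os.path` on POSIX systems, so this message means some code looked up an attribute named `byte_infer_perfdirname` on `os.path`. The name resembles `dirname` with a stray prefix, which may point to a mangled `os.path.dirname` call, but that is a guess and the report does not confirm it. The hypothetical helper below only shows how such a lookup fails while the real `dirname` attribute resolves.

```python
import os.path


def describe_attr(name):
    """Report whether `name` exists on os.path, returning the error text if not."""
    try:
        getattr(os.path, name)
    except AttributeError as exc:
        return str(exc)
    return f"os.path.{name} exists"


print(describe_attr("dirname"))
print(describe_attr("byte_infer_perfdirname"))
```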

    Re-batch not called if only the numeric checker is set to true

    When running the task conformer-encoder-onnx-fp32, which is configured with "test_accuracy": false and "test_numeric": true, the input feed data was not re-batched to batch size 4, which caused the IPU poprt runtime to raise an exception like "input data size is smaller than the model required."

    I checked the code and found that rebatch is only called when "test_accuracy": true is set, in perf_engine.py:

    if workload['test_accuracy']:
                log.info("******************************************* Running Accuracy Checker... *******************************************")
    
                dataset.rebatch(self.runtime_backend.get_loaded_batch_size())
                accuracy_results = AccuracyChecker.calculate_acc(
                    workload['data_percent'])
    
                accuracy_report['Data Percent'] = workload['data_percent']
                accuracy_report.update(accuracy_results)
    

    The call needs to be moved out of the if workload['test_accuracy']: statement, or the same line needs to be added to the code below:

    diff --git a/byte_mlperf/core/perf_engine.py b/byte_mlperf/core/perf_engine.py
    index 4172b22..e0e3d91 100644
    --- a/byte_mlperf/core/perf_engine.py
    +++ b/byte_mlperf/core/perf_engine.py
    @@ -206,6 +206,7 @@ class PerfEngine:
             # test numeric
             if workload['test_numeric']:
                 log.info("******************************************* Running Numeric Checker... *******************************************")
    +            dataset.rebatch(self.runtime_backend.get_loaded_batch_size())
                 if not workload['test_accuracy']:
                     accuracy_results = AccuracyChecker.calculate_acc(
                         workload['data_percent'])
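
The first option mentioned above (moving the call out of the accuracy branch) can be sketched as follows. This is a simplified illustration, not the real PerfEngine: `FakeDataset` and `run_checkers` are hypothetical, and the checker bodies are reduced to labels. The point is that rebatching happens once whenever either checker is enabled.

```python
class FakeDataset:
    """Hypothetical dataset that records how it was rebatched."""

    def __init__(self):
        self.batch_size = 1
        self.rebatch_calls = 0

    def rebatch(self, batch_size):
        self.batch_size = batch_size
        self.rebatch_calls += 1


def run_checkers(workload, dataset, loaded_batch_size):
    """Rebatch once if any checker is enabled, then run the enabled checkers."""
    steps = []
    if workload["test_accuracy"] or workload["test_numeric"]:
        dataset.rebatch(loaded_batch_size)
    if workload["test_accuracy"]:
        steps.append("accuracy")
    if workload["test_numeric"]:
        steps.append("numeric")
    return steps


# Configuration from the report: numeric checker only, loaded batch size 4.
dataset = FakeDataset()
steps = run_checkers({"test_accuracy": False, "test_numeric": True}, dataset, 4)
print(steps, dataset.batch_size)
```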
    

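    Is the SD VAE decoder input wrong?

    A VAE decoder input of 256 corresponds to an output of 2048, which seems too large.
    The UNet has input hw 32 and output hw 32, and the VAE encoder has input hw 256 and output hw 32,
    so I would guess the VAE decoder input should be 32?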
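The shape reasoning in this question can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above and assumes the decoder simply inverts the encoder's scale factor, which the source does not confirm.

```python
# Spatial sizes (height = width) quoted in the question.
encoder_in, encoder_out = 256, 32      # VAE encoder: image -> latent
decoder_in, decoder_out = 256, 2048    # VAE decoder as currently configured

downscale = encoder_in // encoder_out  # encoder shrinks hw by this factor
upscale = decoder_out // decoder_in    # decoder grows hw by this factor

# Assumption: the decoder consumes the encoder's latent size.
expected_decoder_in = encoder_out
expected_decoder_out = expected_decoder_in * upscale

print(downscale, upscale, expected_decoder_in, expected_decoder_out)
```

Under that assumption, a decoder input of 32 gives an output of 256, which matches the encoder's input size.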

    Are the BERT TensorFlow and BERT PyTorch models the same model?

    Hi, I am a little confused by the framework choices.

    In the prepare_model_and_dataset.sh file, do we download the same file for the two different frameworks?

    if [ $1 == "bert-tf-fp32" -o $1 == "bert-torch-fp32" ]; then
        wget -O byte_mlperf/download/open_bert.tar https://lf-bytemlperf.17mh.cn/obj/bytemlperf-zoo/open_bert.tar
        tar xf byte_mlperf/download/open_bert.tar -C byte_mlperf/model_zoo/regular/
    

    'PerfEngine' object has no attribute 'pre_compile_config' when get_best_batch_size() is implemented in the backend

    After implementing get_best_batch_size() to return a list of batch sizes, I got the following error:

    Traceback (most recent call last):
      File "byte_mlperf/core/perf_engine.py", line 376, in <module>
        engine.start_engine()
      File "byte_mlperf/core/perf_engine.py", line 90, in start_engine
        status, workload_report = self.single_workload_perf(self.workload)
      File "byte_mlperf/core/perf_engine.py", line 155, in single_workload_perf
        self.pre_compile_config['workload']['batch_sizes'] = best_batch_sizes
    AttributeError: 'PerfEngine' object has no attribute 'pre_compile_config'
    

    The following code

            best_batch_sizes = self.compile_backend.get_best_batch_size()
            if isinstance(best_batch_sizes, list):
                self.pre_compile_config['workload'][
                    'batch_sizes'] = best_batch_sizes
    

    should be:

            best_batch_sizes = self.compile_backend.get_best_batch_size()
            if isinstance(best_batch_sizes, list):
                pre_compile_config['workload'][
                    'batch_sizes'] = best_batch_sizes
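
The difference between the two versions is that `pre_compile_config` is a local variable of the method, not an attribute of the engine. The sketch below reproduces that situation in miniature. `Engine` and `FakeCompileBackend` are hypothetical stand-ins, and how the real PerfEngine builds `pre_compile_config` is an assumption made for illustration.

```python
class FakeCompileBackend:
    """Hypothetical backend that reports a list of preferred batch sizes."""

    def get_best_batch_size(self):
        return [1, 4, 8]


class Engine:
    def __init__(self, compile_backend):
        self.compile_backend = compile_backend

    def single_workload_perf(self, workload):
        # Local variable (assumed construction); it is never stored on self.
        pre_compile_config = {"workload": dict(workload)}

        best_batch_sizes = self.compile_backend.get_best_batch_size()
        if isinstance(best_batch_sizes, list):
            # Using self.pre_compile_config here would raise AttributeError.
            pre_compile_config["workload"]["batch_sizes"] = best_batch_sizes
        return pre_compile_config


engine = Engine(FakeCompileBackend())
config = engine.single_workload_perf({"model": "demo", "batch_sizes": [1]})
print(config["workload"]["batch_sizes"])
```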
    

    Two problems encountered when running the gpt2-torch-fp32 task

    Problem 1: the general_perf/prepare_model_and_dataset.sh script does not download to the correct location:

    wget -O general_perf/download/traced_gpt2.tar https://lf-bytemlperf.17mh.cn/obj/bytemlperf-zoo/traced_gpt2.tar
    tar xf general_perf/download/gpt2.tar -C general_perf/model_zoo/sota/
    

    The name of the downloaded tar file differs from the name of the tar file being extracted. It needs to be changed to:

    wget -O general_perf/download/traced_gpt2.tar -c https://lf-bytemlperf.17mh.cn/obj/bytemlperf-zoo/traced_gpt2.tar
    mkdir -p general_perf/model_zoo/sota/traced_gpt2
    tar xf general_perf/download/traced_gpt2.tar -C general_perf/model_zoo/sota/traced_gpt2/
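
The mismatch arises because the archive name is typed twice. One way to rule that out is to derive the extraction directory from the archive path itself. The following is a minimal Python sketch of that idea, not a replacement for the project's shell script; `extract_archive` is a hypothetical helper, and the paths in the comment are the ones from the report.

```python
import os
import tarfile


def extract_archive(archive_path, zoo_dir):
    """Extract <name>.tar into <zoo_dir>/<name>/ and return that directory."""
    name = os.path.splitext(os.path.basename(archive_path))[0]
    target = os.path.join(zoo_dir, name)
    os.makedirs(target, exist_ok=True)  # same effect as mkdir -p
    with tarfile.open(archive_path) as tar:
        tar.extractall(target)
    return target


# Example call with the paths from the report (not executed here):
# extract_archive("general_perf/download/traced_gpt2.tar",
#                 "general_perf/model_zoo/sota")
```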
    

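    Problem 2: the following output appears at runtime, after which the program neither continues nor exits abnormally. Could you please confirm what the problem is?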

    INFO:PerfEngine:******************************************* Running Accuracy Checker... *******************************************
    INFO:FAKE_DATA:Rebatching batch size to: 4 ...
    INFO:TestAccuracy:Start to calculate accuracy...
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:28<00:00,  3.48it/s]
    INFO:TestAccuracy:Batch size is 4, Accuracy: 0.0
    /ByteMLPerf/byte_infer_perf/general_perf/datasets/fake_dataset/test_accuracy.py:48: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
      np.array(diffs),
    (the run hangs at this point)
    

    Deberta model json config file does not match the input of the model.

    Config file:
    byte_mlperf/model_zoo/deberta-torch-fp32.json
    Snapshot:
    {
    "model": "deberta-torch-fp32",
    "model_path": "byte_mlperf/model_zoo/popular/open_deberta/deberta-base-squad.pt",
    "framework": "Pytorch",
    "framework_version": "1.10.0",
    "model_format": "pt",
    "model_precision": "FP32",
    "inputs":"input_ids.1,attention_mask.1,token_type_ids",
    "outputs":"start_logits,end_logits",
    "input_shape": {"input_ids.1": [1,384], "attention_mask.1": [1,384], "token_type_ids": [1,384]},
    "input_type": "LONG,LONG,LONG",
    "dataset_name": "open_squad",
    "max_batch_size": 64,
    "is_quantized": false
    }

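    In the config file, there are three inputs:
    "input_shape": {"input_ids.1": [1,384], "attention_mask.1": [1,384], "token_type_ids": [1,384]},

    In the pt model there are also three inputs, but token_type_ids is isolated and does not take part in the model's computation, so there are effectively two inputs.
    byte_mlperf/model_zoo/popular/open_deberta/deberta-base-squad.pt
    image

    pt to onnx:
    In the ONNX model there are two inputs. The token_type_ids input of the pt model was optimized away because it does not take part in the computation.
    image

    The input configuration in the json file does not match the model's inputs.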
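A mismatch like this can be detected mechanically by comparing the names in the config with the inputs the exported model actually exposes. The sketch below assumes the model's input names have already been read with whatever tool the backend uses. `unused_config_inputs` is a hypothetical helper, and the two ONNX input names are assumed from the description above rather than read from the model.

```python
import json

# Relevant fields copied from the config snapshot above.
CONFIG_JSON = """
{
  "model": "deberta-torch-fp32",
  "inputs": "input_ids.1,attention_mask.1,token_type_ids",
  "input_shape": {"input_ids.1": [1,384], "attention_mask.1": [1,384], "token_type_ids": [1,384]}
}
"""


def unused_config_inputs(config, model_input_names):
    """Return config input names that the model does not expose, in config order."""
    model_inputs = set(model_input_names)
    configured = [name.strip() for name in config["inputs"].split(",")]
    return [name for name in configured if name not in model_inputs]


config = json.loads(CONFIG_JSON)
# Assumed input names of the exported ONNX model (two inputs, per the report).
onnx_inputs = ["input_ids.1", "attention_mask.1"]
extra = unused_config_inputs(config, onnx_inputs)
print(extra)
```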
