bytedance / bytemlperf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Home Page: https://bytemlperf.ai/

License: Apache License 2.0

Python 98.58% Shell 1.42%

bytemlperf's Introduction

ByteMLPerf Benchmark Tool

ByteMLPerf is an AI Accelerator Benchmark that focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware. ByteMLPerf has the following characteristics:

Models and runtime environments are more closely aligned with practical business use cases.
For ASIC hardware evaluation, in addition to performance and accuracy, it also measures metrics such as compiler usability and coverage.
Performance and accuracy results obtained from testing on the open Model Zoo serve as reference metrics for evaluating ASIC hardware integration.

Vendor List

The ByteMLPerf vendor backend list is shown below.

Vendor	SKU	Key Parameters	Inference (General Perf)	Inference (LLM Perf)
Intel	Xeon	-	-	-
Stream Computing	STC P920	Computation Power: 128 TFLOPS@FP16; Last Level Buffer: 8MB, 256GB/s; Level 1 Buffer: 1.25MB, 512GB/s; Memory: 16GB, 119.4GB/s; Host Interface: PCIe 4, 16x, 32GB/s; TDP: 160W	STC Introduction	-
Graphcore	Graphcore® C600	Compute: 280 TFLOPS@FP16, 560 TFLOPS@FP8; In Processor Memory: 900 MB, 52 TB/s; Host Interface: Dual PCIe Gen4 8-lane interfaces, 32GB/s; TDP: 185W	IPU Introduction	-
Moffett-AI	Moffett-AI S30	Compute: 1440 (32x-Sparse) TFLOPS@BF16, 2880 (32x-Sparse) TOPS@INT8; Memory: 60 GB; Host Interface: Dual PCIe Gen4 8-lane interfaces, 32GB/s; TDP: 250W	SPU Introduction	-
Habana	Gaudi2	24 Tensor Processor Cores, Dual matrix multiplication engines; Memory: 96 GB HBM2E, 48MB SRAM	HPU Introduction	-

Statement

ASF Statement on Compliance with US Export Regulations and Entity List

bytemlperf's Issues

Contents missing from the byte_mlperf/datasets folder

When running the following command,

python3 launch.py --task albert-torch-fp32 --hardware CPU

I got several errors, e.g.

tar: byte_mlperf/datasets/open_squad: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

Traceback (most recent call last):
  File "byte_mlperf/backends/CPU/calculate_cpu_diff.py", line 106, in <module>
    engine.start_engine()
  File "byte_mlperf/backends/CPU/calculate_cpu_diff.py", line 53, in start_engine
    return self.workload_perf(self.workload)
  File "byte_mlperf/backends/CPU/calculate_cpu_diff.py", line 63, in workload_perf
    ds = load_dataset(model_info)
  File "/workspace/public_version/ByteMLPerf/byte_mlperf/core/configs/dataset_store.py", line 39, in load_dataset
    data_loader = importlib.import_module('byte_mlperf.datasets.' +
  File "/workspace/public_version/ByteMLPerf/byte_mlperf/backends/CPU/venv/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'byte_mlperf.datasets.open_squad.data_loader'

After comparing with my local copy of the code, I found that the local open_squad folder has the following structure:

.
├── bert
│   ├── accuracy_squad.py
│   ├── evaluate.py
│   └── __pycache__
│       ├── accuracy_squad.cpython-38.pyc
│       └── evaluate.cpython-38.pyc
├── create_squad_data.py
├── data_loader.py
├── dev-v1.1.json
├── eval_features_albert-torch-fp32.pickle
├── packing_utils.py
├── __pycache__
│   ├── create_squad_data.cpython-38.pyc
│   ├── data_loader.cpython-38.pyc
│   ├── packing_utils.cpython-38.pyc
│   └── test_accuracy.cpython-38.pyc
├── test_accuracy.py
└── vocab.txt

All of the *.py files are missing from the current datasets/ folder, and so is the open_squad folder itself.

These files need to be added to the repository so that the task can run.
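One way to surface this kind of problem earlier is to check that the dataset's data_loader module can be found before a task is launched. The following is a minimal sketch and not part of ByteMLPerf: the helper names module_available and has_data_loader are hypothetical, and the package path byte_mlperf.datasets is taken from the traceback above.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if `module_name` can be located.

    Note: for dotted names, importlib imports the parent packages first.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # A missing parent package (or a parent that is not a package)
        # raises instead of returning None.
        return False


def has_data_loader(dataset_name: str, package: str = "byte_mlperf.datasets") -> bool:
    """Hypothetical pre-flight check for `<package>.<dataset_name>.data_loader`."""
    return module_available(f"{package}.{dataset_name}.data_loader")


if __name__ == "__main__":
    if not has_data_loader("open_squad"):
        print("open_squad/data_loader.py was not found; the task would fail on import")
```

Running such a check before the engine starts would replace the late ModuleNotFoundError with an explicit message about the missing dataset files.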

A question about the fp32 models currently provided in model_zoo: is there a tool or method for quickly converting them to fp

Cannot run gpt2 on Habana Gaudi2

When we try to run the gpt2 TorchScript model on Habana Gaudi2, an error occurs (shown in a screenshot in the original issue).

It is caused by the input tensors being on different devices: one is on hpu and the other is on cpu.
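A common way to avoid this kind of mismatch is to move every input to the same device before calling the model. The sketch below is an assumption about how that could look, not the fix adopted by the project: move_to_device is a hypothetical helper that relies only on objects exposing a `.to(device)` method, as torch.Tensor does.

```python
from typing import Any


def move_to_device(obj: Any, device: str) -> Any:
    """Recursively move tensors inside lists, tuples and dicts to `device`.

    Anything exposing a callable `.to(device)` (as torch.Tensor does) is moved;
    other values are returned unchanged.
    """
    if hasattr(obj, "to") and callable(obj.to):
        return obj.to(device)
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with moved items.
        return type(obj)(move_to_device(item, device) for item in obj)
    return obj
```

With PyTorch this would be called as `inputs = move_to_device(inputs, "hpu")` just before invoking the model; whether "hpu" is the correct device string depends on the Habana software setup in use.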

llm_perf not running

The error “AttributeError: module 'posixpath' has no attribute 'byte_infer_perfdirname'” is displayed when llm_perf is run.

Rebatch is not called if only the numeric checker is enabled

When running the task conformer-encoder-onnx-fp32, which is configured with "test_accuracy": false and "test_numeric": true, the input feed data was not rebatched to batch size 4, which caused the IPU poprt runtime to raise an exception like "input data size is smaller than the model required."

I checked the code in perf_engine.py and found that rebatch is only called when "test_accuracy": true is set:

if workload['test_accuracy']:
    log.info("******************************************* Running Accuracy Checker... *******************************************")

    dataset.rebatch(self.runtime_backend.get_loaded_batch_size())
    accuracy_results = AccuracyChecker.calculate_acc(
        workload['data_percent'])

    accuracy_report['Data Percent'] = workload['data_percent']
    accuracy_report.update(accuracy_results)

The rebatch call needs to be moved out of the if workload['test_accuracy']: block, or the same line needs to be added to the numeric checker branch, as in the following diff:

diff --git a/byte_mlperf/core/perf_engine.py b/byte_mlperf/core/perf_engine.py
index 4172b22..e0e3d91 100644
--- a/byte_mlperf/core/perf_engine.py
+++ b/byte_mlperf/core/perf_engine.py
@@ -206,6 +206,7 @@ class PerfEngine:
         # test numeric
         if workload['test_numeric']:
             log.info("******************************************* Running Numeric Checker... *******************************************")
+            dataset.rebatch(self.runtime_backend.get_loaded_batch_size())
             if not workload['test_accuracy']:
                 accuracy_results = AccuracyChecker.calculate_acc(
                     workload['data_percent'])
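The first option (rebatching once, before either checker runs) can be sketched in isolation as follows. This is an illustrative stand-in rather than the real PerfEngine code: FakeDataset and run_checkers are hypothetical names, and the checkers themselves are reduced to labels.

```python
class FakeDataset:
    """Stand-in for the real dataset object; records the current batch size."""

    def __init__(self):
        self.batch_size = 1

    def rebatch(self, batch_size):
        self.batch_size = batch_size


def run_checkers(workload: dict, dataset, loaded_batch_size: int) -> list:
    """Rebatch once if either checker is enabled, then list the checkers to run."""
    steps = []
    if workload.get("test_accuracy") or workload.get("test_numeric"):
        # Rebatch before any checker so both see correctly sized batches.
        dataset.rebatch(loaded_batch_size)
    if workload.get("test_accuracy"):
        steps.append("accuracy")
    if workload.get("test_numeric"):
        steps.append("numeric")
    return steps


dataset = FakeDataset()
steps = run_checkers({"test_accuracy": False, "test_numeric": True}, dataset, 4)
print(steps, dataset.batch_size)
```

With this ordering, the configuration from the issue (accuracy off, numeric on) still gets batches of size 4.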

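Is the SD VAE decoder input incorrect?

The VAE decoder takes an input of 256 and produces an output of 2048, which seems too large.
The UNet has input hw 32 and output hw 32, and the VAE encoder has input hw 256 and output hw 32,
so presumably the VAE decoder input should be 32?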
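The arithmetic behind this question can be written out explicitly. The sketch below uses only the sizes quoted in the issue and assumes that the decoder mirrors the encoder, that is, it upsamples by the same factor the encoder downsamples by; that assumption is not confirmed by the source.

```python
# Spatial sizes (height = width) quoted in the issue.
vae_encoder_in, vae_encoder_out = 256, 32
unet_in, unet_out = 32, 32
vae_decoder_in_configured = 256

# Downsampling factor implied by the encoder: 256 / 32 = 8.
scale = vae_encoder_in // vae_encoder_out

# Assumption: the decoder upsamples by the same factor.
decoder_out_from_unet_latent = unet_out * scale              # 32 * 8
decoder_out_from_configured = vae_decoder_in_configured * scale  # 256 * 8

print(scale, decoder_out_from_unet_latent, decoder_out_from_configured)
# prints: 8 256 2048
```

Under that assumption, a decoder input of 32 would reproduce the 256 image size the encoder starts from, while the configured input of 256 leads to the 2048 output the issue describes.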

Are the BERT TensorFlow and BERT PyTorch models the same?

Hi, I am a little confused by the framework choices.

In the prepare_model_and_dataset.sh file, do we download the same file for the two different frameworks?

if [ $1 == "bert-tf-fp32" -o $1 == "bert-torch-fp32" ]; then
    wget -O byte_mlperf/download/open_bert.tar https://lf-bytemlperf.17mh.cn/obj/bytemlperf-zoo/open_bert.tar
    tar xf byte_mlperf/download/open_bert.tar -C byte_mlperf/model_zoo/regular/

'PerfEngine' object has no attribute 'pre_compile_config' when get_best_batch_size() is implemented in the backend

After implementing get_best_batch_size() to return a list of batch sizes, I got the following error:

Traceback (most recent call last):
  File "byte_mlperf/core/perf_engine.py", line 376, in <module>
    engine.start_engine()
  File "byte_mlperf/core/perf_engine.py", line 90, in start_engine
    status, workload_report = self.single_workload_perf(self.workload)
  File "byte_mlperf/core/perf_engine.py", line 155, in single_workload_perf
    self.pre_compile_config['workload']['batch_sizes'] = best_batch_sizes
AttributeError: 'PerfEngine' object has no attribute 'pre_compile_config'

The following code

        best_batch_sizes = self.compile_backend.get_best_batch_size()
        if isinstance(best_batch_sizes, list):
            self.pre_compile_config['workload'][
                'batch_sizes'] = best_batch_sizes

should be:

        best_batch_sizes = self.compile_backend.get_best_batch_size()
        if isinstance(best_batch_sizes, list):
            pre_compile_config['workload'][
                'batch_sizes'] = best_batch_sizes
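The proposed fix implies that pre_compile_config is a local dictionary inside single_workload_perf rather than an attribute of PerfEngine. Under that assumption, the intended behavior can be sketched as a standalone function; apply_best_batch_sizes and the sample dictionary are hypothetical and exist only for illustration.

```python
def apply_best_batch_sizes(pre_compile_config: dict, best_batch_sizes) -> dict:
    """Overwrite the workload batch sizes only when the backend returns a list."""
    if isinstance(best_batch_sizes, list):
        pre_compile_config["workload"]["batch_sizes"] = best_batch_sizes
    return pre_compile_config


# Sample config with made-up batch sizes.
config = {"workload": {"batch_sizes": [1, 2]}}

# A backend that returns a list replaces the configured batch sizes.
apply_best_batch_sizes(config, [4, 8, 16])
print(config["workload"]["batch_sizes"])

# A backend that returns None (no preference) leaves them untouched.
apply_best_batch_sizes(config, None)
print(config["workload"]["batch_sizes"])
```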

Two problems encountered when running the gpt2-torch-fp32 task

Problem 1: the general_perf/prepare_model_and_dataset.sh script does not download and extract the model to the correct location:

wget -O general_perf/download/traced_gpt2.tar https://lf-bytemlperf.17mh.cn/obj/bytemlperf-zoo/traced_gpt2.tar
tar xf general_perf/download/gpt2.tar -C general_perf/model_zoo/sota/

The name of the downloaded tar file differs from the name of the tar file being extracted. The commands need to be changed to:

wget -O general_perf/download/traced_gpt2.tar -c https://lf-bytemlperf.17mh.cn/obj/bytemlperf-zoo/traced_gpt2.tar
mkdir general_perf/model_zoo/sota/traced_gpt2
tar xf general_perf/download/traced_gpt2.tar -C general_perf/model_zoo/sota/traced_gpt2/

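Problem 2: the following output appears at runtime, after which the program neither continues nor exits with an error. Could you please confirm what the problem is?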

INFO:PerfEngine:******************************************* Running Accuracy Checker... *******************************************
INFO:FAKE_DATA:Rebatching batch size to: 4 ...
INFO:TestAccuracy:Start to calculate accuracy...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:28<00:00,  3.48it/s]
INFO:TestAccuracy:Batch size is 4, Accuracy: 0.0
/ByteMLPerf/byte_infer_perf/general_perf/datasets/fake_dataset/test_accuracy.py:48: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.array(diffs),
The program hangs at this point.

DeBERTa model JSON config file does not match the inputs of the model

Config file:
byte_mlperf/model_zoo/deberta-torch-fp32.json
Snapshot:
{
"model": "deberta-torch-fp32",
"model_path": "byte_mlperf/model_zoo/popular/open_deberta/deberta-base-squad.pt",
"framework": "Pytorch",
"framework_version": "1.10.0",
"model_format": "pt",
"model_precision": "FP32",
"inputs":"input_ids.1,attention_mask.1,token_type_ids",
"outputs":"start_logits,end_logits",
"input_shape": {"input_ids.1": [1,384], "attention_mask.1": [1,384], "token_type_ids": [1,384]},
"input_type": "LONG,LONG,LONG",
"dataset_name": "open_squad",
"max_batch_size": 64,
"is_quantized": false
}

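The config file lists three inputs.
"input_shape": {"input_ids.1": [1,384], "attention_mask.1": [1,384], "token_type_ids": [1,384]},

The pt model also has three inputs, but token_type_ids is standalone and does not take part in the model computation, so there are effectively two inputs.
byte_mlperf/model_zoo/popular/open_deberta/deberta-base-squad.pt

pt to onnx:
The ONNX model has two inputs. The token_type_ids input of the pt model was optimized away because it does not take part in the computation.

The inputs configured in the JSON file do not match the inputs of the model.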
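A mismatch like this can be detected by comparing the input names in the JSON config with the names the exported model actually exposes. The following is a minimal sketch: unused_config_inputs is a hypothetical helper, and the ONNX input names are assumed to be the two remaining names from the config, since the issue does not list them.

```python
import json


def unused_config_inputs(config: dict, model_input_names: list) -> list:
    """Return input names listed in the config but absent from the model."""
    configured = [name.strip() for name in config["inputs"].split(",")]
    return [name for name in configured if name not in model_input_names]


# Relevant fields copied from deberta-torch-fp32.json as quoted above.
config = json.loads("""
{
  "inputs": "input_ids.1,attention_mask.1,token_type_ids",
  "input_shape": {"input_ids.1": [1, 384], "attention_mask.1": [1, 384], "token_type_ids": [1, 384]}
}
""")

# Assumed input names of the exported ONNX model (two inputs, per the issue).
onnx_inputs = ["input_ids.1", "attention_mask.1"]

print(unused_config_inputs(config, onnx_inputs))
# prints: ['token_type_ids']
```

In practice the model-side names would be read from the exported model rather than hard-coded.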
