nvidia-merlin / hugectr

HugeCTR is a high-efficiency GPU framework designed for Click-Through Rate (CTR) estimation training

License: Apache License 2.0

Languages: CMake 1.34%, C++ 39.83%, Cuda 26.36%, Shell 0.48%, Python 15.76%, Jupyter Notebook 16.19%, Makefile 0.01%, HTML 0.01%, Batchfile 0.01%, C 0.01%
Topics: cpp, deep-learning, gpu-acceleration, recommendation-system, recommender-system

hugectr's Introduction


HugeCTR is a GPU-accelerated recommender framework designed for training and inference of large deep learning models.

Design Goals:

  • Fast: HugeCTR performs outstandingly in recommendation benchmarks including MLPerf.
  • Easy: Whether you are a data scientist or a machine learning practitioner, we've made it easy for anybody to use HugeCTR with plenty of documentation, notebooks, and samples.
  • Domain Specific: HugeCTR provides the essentials so that you can efficiently deploy your recommender models with very large embedding tables.

NOTE: If you have any questions about using HugeCTR, please file an issue or join our Slack channel for more interactive discussions.

Table of Contents

Core Features

HugeCTR supports a variety of features. To learn about our latest enhancements, refer to our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, do the following:

  1. Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:

    docker run --gpus=all --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-hugectr:24.06
    

    NOTE: The /your/host/dir directory is visible inside the container as the /your/container/dir directory, which is also your starting directory.

    NOTE: HugeCTR uses NCCL to share data between ranks, and NCCL may require shared memory for IPC and pinned (page-locked) system memory resources. It is recommended that you increase these resources by adding the following options to the docker run command:

    --shm-size=1g --ulimit memlock=-1
    
  2. Write a simple Python script to generate a synthetic dataset:

    # dcn_parquet_generate.py
    import hugectr
    from hugectr.tools import DataGeneratorParams, DataGenerator
    data_generator_params = DataGeneratorParams(
      format = hugectr.DataReaderType_t.Parquet,
      label_dim = 1,
      dense_dim = 13,
      num_slot = 26,
      i64_input_key = False,
      source = "./dcn_parquet/file_list.txt",
      eval_source = "./dcn_parquet/file_list_test.txt",
      slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543, 39884, 39043, 17289, 7420, 
                         20263, 3, 7120, 1543, 63, 63, 39884, 39043, 17289, 7420, 20263, 3, 7120,
                         1543 ],
      dist_type = hugectr.Distribution_t.PowerLaw,
      power_law_type = hugectr.PowerLaw_t.Short)
    data_generator = DataGenerator(data_generator_params)
    data_generator.generate()
    
  3. Generate the Parquet dataset for your DCN model by running the following command:

    python dcn_parquet_generate.py
    

    NOTE: The generated dataset will reside in the folder ./dcn_parquet, which contains training and evaluation data.
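
    As an optional sanity check (purely illustrative, not part of the official workflow), you can print the generated file list and load one of the Parquet files with pandas. The Parquet path below is a placeholder; replace it with one of the files listed in file_list.txt.

    # inspect_dcn_parquet.py -- illustrative sanity check of the generated dataset
    import pandas as pd  # requires pandas with a Parquet engine such as pyarrow

    # Print the list of generated training files.
    with open("./dcn_parquet/file_list.txt") as f:
        print(f.read())

    # Load one generated Parquet file (replace with a path from file_list.txt).
    df = pd.read_parquet("./dcn_parquet/train/gen_0.parquet")
    print(df.shape)
    print(df.head())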

  4. Write a simple Python script for training:

    # dcn_parquet_train.py
    import hugectr
    from mpi4py import MPI
    solver = hugectr.CreateSolver(max_eval_batches = 1280,
                                  batchsize_eval = 1024,
                                  batchsize = 1024,
                                  lr = 0.001,
                                  vvgpu = [[0]],
                                  repeat_dataset = True)
    reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                     source = ["./dcn_parquet/file_list.txt"],
                                     eval_source = "./dcn_parquet/file_list_test.txt",
                                     slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543, 39884, 39043, 17289, 7420, 
                                                       20263, 3, 7120, 1543, 63, 63, 39884, 39043, 17289, 7420, 20263, 3, 7120, 1543 ])
    optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                        update_type = hugectr.Update_t.Global)
    model = hugectr.Model(solver, reader, optimizer)
    model.add(hugectr.Input(label_dim = 1, label_name = "label",
                            dense_dim = 13, dense_name = "dense",
                            data_reader_sparse_param_array =
                            [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))
    model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
                               workspace_size_per_gpu_in_mb = 75,
                               embedding_vec_size = 16,
                               combiner = "sum",
                               sparse_embedding_name = "sparse_embedding1",
                               bottom_name = "data1",
                               optimizer = optimizer))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                               bottom_names = ["sparse_embedding1"],
                               top_names = ["reshape1"],
                               leading_dim=416))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                               bottom_names = ["reshape1", "dense"], top_names = ["concat1"]))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCross,
                               bottom_names = ["concat1"],
                               top_names = ["multicross1"],
                               num_layers=6))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                               bottom_names = ["concat1"],
                               top_names = ["fc1"],
                               num_output=1024))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                               bottom_names = ["fc1"],
                               top_names = ["relu1"]))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                               bottom_names = ["relu1"],
                               top_names = ["dropout1"],
                               dropout_rate=0.5))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                               bottom_names = ["dropout1", "multicross1"],
                               top_names = ["concat2"]))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                               bottom_names = ["concat2"],
                               top_names = ["fc2"],
                               num_output=1))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                               bottom_names = ["fc2", "label"],
                               top_names = ["loss"]))
    model.compile()
    model.summary()
    model.graph_to_json(graph_config_file = "dcn.json")
    model.fit(max_iter = 5120, display = 200, eval_interval = 1000, snapshot = 5000, snapshot_prefix = "dcn")
    

    NOTE: Ensure that the paths to the synthetic datasets are correct with respect to this Python script. data_reader_type, check_type, label_dim, dense_dim, and data_reader_sparse_param_array should be consistent with the generated dataset.

  5. Train the model by running the following command:

    python dcn_parquet_train.py
    

    NOTE: Because the dataset is randomly generated, the evaluation AUC value is not meaningful. When training is done, files that contain the dumped graph JSON, saved model weights, and optimizer states will be generated.

For more information, refer to the HugeCTR User Guide.

HugeCTR SDK

We support external developers who can't use HugeCTR directly by exporting important HugeCTR components through:

  • Sparse Operation Kit directory | documentation: a Python package that wraps GPU-accelerated operations dedicated to sparse training/inference use cases.
  • GPU Embedding Cache: an embedding cache that resides in GPU memory and is designed for CTR inference workloads.

Support and Feedback

If you encounter any issues or have questions, go to https://github.com/NVIDIA/HugeCTR/issues and submit an issue so that we can provide you with the necessary resolutions and answers. To further advance the HugeCTR Roadmap, we encourage you to share all the details regarding your recommender system pipeline using this survey.

Contributing to HugeCTR

With HugeCTR being an open source project, we welcome contributions from the general public. With your contributions, we can continue to improve HugeCTR's quality and performance. To learn how to contribute, refer to our HugeCTR Contributor Guide.

Additional Resources

Webpages
NVIDIA Merlin
NVIDIA HugeCTR

Publications

Yingcan Wei, Matthias Langer, Fan Yu, Minseok Lee, Jie Liu, Ji Shi and Zehuan Wang, "A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models," Proceedings of the 16th ACM Conference on Recommender Systems, pp. 408-419, 2022.

Zehuan Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Shijie Liu, Daniel G. Abel, Xu Guo, Jianbing Dong, Ji Shi and Kunlun Li, "Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference," Proceedings of the 16th ACM Conference on Recommender Systems, pp. 534-537, 2022.

Talks

Conference / Website Title Date Speaker Language
ACM RecSys 2022 A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models September 2022 Matthias Langer English
Short Videos Episode 1 Merlin HugeCTR:GPU 加速的推荐系统框架 May 2022 Joey Wang 中文
Short Videos Episode 2 HugeCTR 分级参数服务器如何加速推理 May 2022 Joey Wang 中文
Short Videos Episode 3 使用 HugeCTR SOK 加速 TensorFlow 训练 May 2022 Gems Guo 中文
GTC Spring 2022 Merlin HugeCTR: Distributed Hierarchical Inference Parameter Server Using GPU Embedding Cache March 2022 Matthias Langer, Yingcan Wei, Yu Fan English
APSARA 2021 GPU 推荐系统 Merlin Oct 2021 Joey Wang 中文
GTC Spring 2021 Learn how Tencent Deployed an Advertising System on the Merlin GPU Recommender Framework April 2021 Xiangting Kong, Joey Wang English
GTC Spring 2021 Merlin HugeCTR: Deep Dive Into Performance Optimization April 2021 Minseok Lee English
GTC Spring 2021 Integrate HugeCTR Embedding with TensorFlow April 2021 Jianbing Dong English
GTC China 2020 MERLIN HUGECTR :深入研究性能优化 Oct 2020 Minseok Lee English
GTC China 2020 性能提升 7 倍 + 的高性能 GPU 广告推荐加速系统的落地实现 Oct 2020 Xiangting Kong 中文
GTC China 2020 使用 GPU EMBEDDING CACHE 加速 CTR 推理过程 Oct 2020 Fan Yu 中文
GTC China 2020 将 HUGECTR EMBEDDING 集成于 TENSORFLOW Oct 2020 Jianbing Dong 中文
GTC Spring 2020 HugeCTR: High-Performance Click-Through Rate Estimation Training March 2020 Minseok Lee, Joey Wang English
GTC China 2019 HUGECTR: GPU 加速的推荐系统训练 Oct 2019 Joey Wang 中文

Blogs

Conference / Website Title Date Authors Language
Wechat Blog Merlin HugeCTR 分级参数服务器系列之三:集成到TensorFlow Nov. 2022 Kingsley Liu 中文
NVIDIA Devblog Scaling Recommendation System Inference with Merlin Hierarchical Parameter Server/使用 Merlin 分层参数服务器扩展推荐系统推理 August 2022 Shashank Verma, Wenwen Gao, Yingcan Wei, Matthias Langer, Jerry Shi, Fan Yu, Kingsley Liu, Minseok Lee English/中文
NVIDIA Devblog Merlin HugeCTR Sparse Operation Kit 系列之二 June 2022 Kunlun Li 中文
NVIDIA Devblog Merlin HugeCTR Sparse Operation Kit 系列之一 March 2022 Gems Guo, Jianbing Dong 中文
Wechat Blog Merlin HugeCTR 分级参数服务器系列之二 March 2022 Yingcan Wei, Matthias Langer, Jerry Shi 中文
Wechat Blog Merlin HugeCTR 分级参数服务器系列之一 Jan. 2022 Yingcan Wei, Jerry Shi 中文
NVIDIA Devblog Accelerating Embedding with the HugeCTR TensorFlow Embedding Plugin Sept 2021 Vinh Nguyen, Ann Spencer, Joey Wang and Jianbing Dong English
medium.com Optimizing Meituan’s Machine Learning Platform: An Interview with Jun Huang Sept 2021 Sheng Luo and Benedikt Schifferer English
medium.com Leading Design and Development of the Advertising Recommender System at Tencent: An Interview with Xiangting Kong Sept 2021 Xiangting Kong, Ann Spencer English
NVIDIA Devblog 扩展和加速大型深度学习推荐系统 – HugeCTR 系列第 1 部分 June 2021 Minseok Lee 中文
NVIDIA Devblog 使用 Merlin HugeCTR 的 Python API 训练大型深度学习推荐模型 – HugeCTR 系列第 2 部分 June 2021 Vinh Nguyen 中文
medium.com Training large Deep Learning Recommender Models with Merlin HugeCTR’s Python APIs — HugeCTR Series Part 2 May 2021 Minseok Lee, Joey Wang, Vinh Nguyen and Ashish Sardana English
medium.com Scaling and Accelerating large Deep Learning Recommender Systems — HugeCTR Series Part 1 May 2021 Minseok Lee English
IRS 2020 Merlin: A GPU Accelerated Recommendation Framework Aug 2020 Even Oldridge et al. English
NVIDIA Devblog Introducing NVIDIA Merlin HugeCTR: A Training Framework Dedicated to Recommender Systems July 2020 Minseok Lee and Joey Wang English

hugectr's People

Contributors

aleckohlhoff, alexeedm, amukkara, ashishsardana, bashimao, benfred, bkarsin, chirayug-nvidia, emmaqiaoch, georgeliu95, janekl, jershi425, jianbing-d, kingsleyliu-nv, kunlunl, lgardenhire, mengran-nvidia, miguelusque, mikemckiernan, minseokl, nyrio, oyilmaz-nvidia, raywang96, reoptnvidia, shijieliu, vinhngx, wl1136, xiaoleishi-nv, yingcanw, zehuanw


hugectr's Issues

setting seed can't reproduce the results

What I have done:

  • set seed in config file: "solver": {"seed": 100}

  • set max_iter=2, eval_interval=1

  • set file_list.txt with 1 file and set file_list_with.txt with 1 file

  • set the train and eval reader chunk_size to 1: data_reader.reset(new DataReader(source_data, batch_size, label_dim, dense_dim,
    check_type, data_reader_sparse_param_array,
    gpu_resource_group, 1, use_mixed_precision));

  • run the training process ./huge_ctr --train model.json twice

What happened?

The AverageLoss values (1.200125 and 1.18949) are too far apart.

the first train log:

 [05d17h49m17s][HUGECTR][INFO]: Iter: 1 Time(1 iters): 0.101207s Loss: 1.211278 lr:0.000100

[05d17h49m18s][HUGECTR][INFO]: Evaluation, AUC: 0.501446
[05d17h49m18s][HUGECTR][INFO]: Evaluation, AverageLoss: 1.200125

the second train log:

 
[05d17h51m37s][HUGECTR][INFO]: Iter: 1 Time(1 iters): 0.093456s Loss: 1.200530 lr:0.000100

[05d17h51m37s][HUGECTR][INFO]: Evaluation, AUC: 0.397724

[05d17h51m37s][HUGECTR][INFO]: Evaluation, AverageLoss: 1.18949

Custom models on HugeCTR

Hi HugeCTR experts,

I want to implement a custom model on HugeCTR. So far, I could not find docs that show how to import layers/optimizers to build a custom model. Or is there anything I missed?

I wonder if you have released, or will release, documentation that shows how to build a custom model?

Thanks

when cache_size_ >1 the train loss is zero

We find that when cache_size_ > 1 is set in DataCollector, the training loss is almost zero. In DataCollector.hpp:

template <typename TypeKey>
void DataCollector<TypeKey>::collect() {
  if (counter_ < cache_size_ || cache_size_ == 0) {
    collect_();
  } else {
    collect_blank_(); 
  }
}

counter_ is incremented and, once it exceeds cache_size_, will never be less than cache_size_ again. As a result, collect_() is never run again, the training data stays stale, the model overfits, and the loss drops to almost zero.
The correct code is supposed to be:

template <typename TypeKey>
void DataCollector<TypeKey>::collect() {
  if (counter_ % internal_buffers_.size() < cache_size_ || cache_size_ == 0) {
    collect_();
  } else {
    collect_blank_(); 
  }
}
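
To make the reasoning above concrete, here is a small self-contained simulation in Python (illustrative only, not HugeCTR code; cache_size and num_buffers stand in for cache_size_ and internal_buffers_.size()). It shows that with the original condition fresh data is only collected during the first cache_size iterations, while the modulo-based condition keeps collecting as the internal buffers rotate:

# simulate_collect.py -- illustrative simulation of the two collect() conditions
def simulate(num_iters, cache_size, num_buffers, use_fix):
    actions = []
    for counter in range(num_iters):
        index = counter % num_buffers if use_fix else counter
        if index < cache_size or cache_size == 0:
            actions.append("collect")   # fresh data is copied in
        else:
            actions.append("blank")     # stale data is reused
    return actions

# Original condition: only the first cache_size iterations receive fresh data.
print(simulate(8, cache_size=2, num_buffers=4, use_fix=False))
# ['collect', 'collect', 'blank', 'blank', 'blank', 'blank', 'blank', 'blank']

# Modulo-based fix: fresh data keeps arriving as the buffers rotate.
print(simulate(8, cache_size=2, num_buffers=4, use_fix=True))
# ['collect', 'collect', 'blank', 'blank', 'collect', 'collect', 'blank', 'blank']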

[BUG] Runtime error: an illegal memory access

After processing the Criteo dataset with NVTabular and generating the output Parquet files, I get "Runtime error: an illegal memory access" when I try to train using HugeCTR and the DLRM model.

[06d20h48m42s][HUGECTR][INFO]: Iter: 14000 Time(1000 iters): 51.684892s Loss: 0.131229 lr:24.000000
[HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/src/embeddings/update_params_functor.cu:571 

[HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/src/embeddings/update_params_functor.cu:571 

[HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/src/session.cpp:427 

terminate called after throwing an instance of 'HugeCTR::internal_runtime_error'
  what():  [HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/include/general_buffer2.hpp:37

HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(276): error: identifier "__syncwarp" is undefined

HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(276): error: identifier "__syncwarp" is undefined
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(287): error: identifier "__any_sync" is undefined
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(300): error: identifier "__all_sync" is undefined
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(313): error: identifier "__ballot_sync" is undefined

4 errors detected in the compilation of "/tmp/tmpxft_0000e907_00000000-6_embedding_creator.cpp1.ii".
HugeCTR/src/CMakeFiles/huge_ctr_static.dir/build.make:101: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/embedding_creator.cu.o' failed
make[2]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/embedding_creator.cu.o] Error 1
CMakeFiles/Makefile2:156: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all' failed
make[1]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

Error of running './huge_ctr --train ./deepfm_bin.json'

Hi there,
I tried running the HugeCTR Docker example of DeepFM with NVTabular preprocessing, but after running the command in the title, it shows errors and stops when training starts. Is there a bug? Thanks.

System: Ubuntu 18.04.4 LTS
GPU: GeForce RTX 2080 Ti
Driver Version: 440.44
CUDA Version: 10.2

[0.001, init_start, ]
HugeCTR Version: 2.2.1
Config file: ./deepfm_bin.json
[21d09h02m26s][HUGECTR][INFO]: batchsize_eval is not specified using default: 512
[21d09h02m26s][HUGECTR][INFO]: Default evaluation metric is AUC without threshold value
[21d09h02m26s][HUGECTR][INFO]: algorithm_search is not specified using default: 1
[21d09h02m26s][HUGECTR][INFO]: Algorithm search: ON
[21d09h02m26s][HUGECTR][INFO]: cuda_graph is not specified using default: 1
[21d09h02m26s][HUGECTR][INFO]: CUDA Graph: ON
[21d09h02m26s][HUGECTR][INFO]: Initial seed is 3545387129
[21d09h02m28s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[21d09h02m30s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[21d09h02m30s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[21d09h02m30s][HUGECTR][INFO]: num_internal_buffers 1
[21d09h02m30s][HUGECTR][INFO]: num_internal_buffers 1
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[... the DataHeaderError message above is repeated many times, interleaved across data reader worker threads ...]
[21d09h02m30s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=1737709
[HCDEBUG][ERROR] Runtime error: failed to read a file /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:69
[HCDEBUG][ERROR] Runtime error: failed to read a file /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:69
[... the "failed to read a file" message above is also repeated many times ...]

[21d09h02m30s][HUGECTR][INFO]: gpu0 start to init embedding
[21d09h02m30s][HUGECTR][INFO]: gpu0 init embedding done
[21d09h02m30s][HUGECTR][INFO]: warmup_steps is not specified using default: 1
[21d09h02m30s][HUGECTR][INFO]: decay_start is not specified using default: 0
[21d09h02m30s][HUGECTR][INFO]: decay_steps is not specified using default: 1
[21d09h02m30s][HUGECTR][INFO]: decay_power is not specified using default: 2.000000
[21d09h02m30s][HUGECTR][INFO]: end_lr is not specified using default: 0.000000
[3538.92, init_end, ]
[3538.94, run_start, ]
HugeCTR training start:
[3538.95, train_epoch_start, 0, ]

Parser doesn't check if a given layer name is already in use

Description:
Currently, our parser doesn't check whether a specified layer "name" is already being used by a preceding layer.
As a result, erroneous layers like the following can be silently inserted into the network.
Without any safety measure, this kind of config bug can result in a disconnected network whose parameters are not appropriately trained (a minimal duplicate-name check is sketched after the config excerpt below).

      {
        "name": "fc6",
        "type": "InnerProduct",
        "bottom": "relu5",
        "top": "fc6",
         "fc_param": {
          "num_output": 512
                }
      },

      {
        "name": "fc6",
        "type": "InnerProduct",
        "bottom": "relu5",
        "top": "fc6",
         "fc_param": {
          "num_output": 512
        }
      },

      {
        "name": "relu6",
        "type": "ReLU",
        "bottom": "fc6",
        "top": "relu6"
      },
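
A minimal sketch of such a check, written in Python purely for illustration (the actual parser is C++ and this function is hypothetical), applied to a layer list like the one above:

# check_layer_names.py -- hypothetical illustration of a duplicate-name check
import json

def check_unique_layer_names(layers):
    seen = set()
    for layer in layers:
        name = layer["name"]
        if name in seen:
            raise ValueError(f"duplicate layer name: {name}")
        seen.add(name)

layers = json.loads("""
[
  {"name": "fc6",   "type": "InnerProduct", "bottom": "relu5", "top": "fc6"},
  {"name": "fc6",   "type": "InnerProduct", "bottom": "relu5", "top": "fc6"},
  {"name": "relu6", "type": "ReLU",         "bottom": "fc6",   "top": "relu6"}
]
""")
check_unique_layer_names(layers)  # raises ValueError: duplicate layer name: fc6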

Comments:

dump_to_tf throwing memory error

I followed the instructions given in /hugectr/tutorials/dump_to_tf/ReadMe.
But when running

python3 main.py ../../samples/dcn/dcn_bin.json ../../samples/dcn/train/0.data ../../samples/dcn/_dense_9999.model ../../samples/dcn/0_sparse_9999.model

I am getting a memory exception. Please refer to the attached screenshots (dump_to_tf_error, free-memory-in-gb) for the actual error.

Note: I have used NVTabular with binary format to preprocess and train with HugeCTR; hence the config file used in the above command is dcn_bin.json.

Will HugeCTR add more support for TensorFlow model transferring?

Currently, there is only one tutorial about transferring a HugeCTR model to a TensorFlow model: https://github.com/NVIDIA/HugeCTR/tree/master/tutorial/dump_to_tf. The tutorial code is not well architected and seems to be a specific example rather than a common reusable module.
My question is: does the HugeCTR team plan to develop a common Python module with the following behavior:

  • Input: hugectr model config file path, tensorflow output path
  • Output: tensorflow model under tensorflow output path

Runtime error: out of memory /mnt/HugeCTR/HugeCTR/include/general_buffer.hpp:64

Hi, when I was trying to run DLRM with the Terabyte dataset on one GPU, I got a runtime error message like this. My guess is that I ran out of GPU memory. I've also tried decreasing the mini-batch size (batchsize) and batchsize_eval, but I still get this error. Does anyone know how to solve this issue? (A rough memory estimate is sketched after the config below.)

I was running the following command:
./huge_ctr --train ./dlrm_fp16_64k.json

And the solver in my dlrm_fp16_64k.json looks like this:
"solver": {
"lr_policy": "fixed",
"display": 1000,
"max_iter":64013,
"gpu": [0],
"batchsize": 1024,
"batchsize_eval": 131072,
"snapshot": 10000000,
"snapshot_prefix": "./",
"eval_interval": 3200,
"eval_batches": 681,
"mixed_precision": 1024,
"eval_metrics": ["AUC:0.8025"]
}
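
For context, here is a rough back-of-the-envelope estimate (illustrative Python; the vocabulary size below is an assumption and is not taken from the config in this issue) of why the embedding table, rather than the batch size, typically dominates GPU memory for DLRM on the Terabyte dataset:

# embedding_memory_estimate.py -- illustrative arithmetic with assumed numbers
vocab_size = 200_000_000      # assumed total number of embedding rows (Terabyte-scale)
embedding_vec_size = 128      # typical DLRM embedding width
bytes_per_value = 4           # fp32

embedding_gb = vocab_size * embedding_vec_size * bytes_per_value / 2**30
print(f"embedding table alone: ~{embedding_gb:.1f} GB")   # ~95.4 GB

# Reducing batchsize or batchsize_eval shrinks activation buffers but not the
# embedding table itself, which is why lowering the batch size may not avoid
# the out-of-memory error when the table does not fit on a single GPU.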

Parquet datareader illegal memory access when training on 2xDGXA100

Description:
commit:
commit 53a2ff8 (HEAD -> v2.3-integration, origin/v2.3-integration, origin/data-power-law-kingsley)
Merge: e75c290 3c49da0
Author: Joey Wang [email protected]
Date: Sat Oct 31 01:24:10 2020 -0700
Merge branch ‘fea-multinode-auc-dmitry-2.3’ into ‘v2.3-integration’
Multinode AUC
See merge request zehuanw/hugectr!257

dataset: /mnt/dldata/criteo_1TB/albertoa/test_dask/output/ in dlcluster

config: 2xdgxa100.json

error log:hugectr-test-1604632076.log

Reproduce steps:
Currently I am facing this bug when using raplab. I will update how to reproduce when I have access to Selene.
Comments:

Can hugectr add predict command?

Currently, the hugectr main process supports [--train] [--help] [--version]. However, it's a common scenario that, when training is done, we predict on the test data and print the result to the screen (or redirect it to a file).

With the prediction result, we can:

  1. Compare the HugeCTR prediction result with that of the transferred TensorFlow model, to make sure the transfer process is correct.
  2. Use the result to calculate other metrics, such as AUC, precision, etc. (see the sketch at the end of this issue).
  3. Do batch prediction.

Can hugectr add a predict command?

  • huge_ctr --predict model_config.json test_file_list.txt
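
A minimal sketch of point 2 above, assuming a hypothetical whitespace-separated "label probability" predictions file (this output format is an assumption, not something huge_ctr currently produces):

# compute_metrics.py -- illustrative only; predictions.txt is a hypothetical output
from sklearn.metrics import roc_auc_score, precision_score

labels, probs = [], []
with open("predictions.txt") as f:            # hypothetical output of `--predict`
    for line in f:
        label, prob = line.split()
        labels.append(int(float(label)))
        probs.append(float(prob))

print("AUC:      ", roc_auc_score(labels, probs))
print("Precision:", precision_score(labels, [int(p > 0.5) for p in probs]))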

v2.2 build error

I built v2.2 with the command: mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=Release -DNCCL_A2A=ON -DSM=70 .. && make -j

and got the following error:
[ 3%] Building CUDA object HugeCTR/src/CMakeFiles/huge_ctr_static.dir/layers/batch_norm_layer.cu.o
nvcc fatal : Value 'all-warnings' is not defined for option 'Werror'
HugeCTR/src/CMakeFiles/huge_ctr_static.dir/build.make:134: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/layers/batch_norm_layer.cu.o' failed
make[2]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/layers/batch_norm_layer.cu.o] Error 1
CMakeFiles/Makefile2:124: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all' failed
make[1]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

and

data/HugeCTR/test/utest/layers/multi_cross_layer_test.cpp:276:25: note: suggested alternative: 'compare_array_approx'
/data/HugeCTR/test/utest/layers/multi_cross_layer_test.cpp:276:7: error: expected primary-expression before '(' token
ASSERT_TRUE(test::compare_array_approx_with_ratio(

Solution:
We fixed the error by deleting some configuration in CMakeLists.txt: we removed "--Werror all-warnings" and the test module.

Does the v2.2 testing use the Dockerfile?

v2.2 GeneralBuffer is empty error

We ran v2.2 on a V100 with CUDA 10.1 and got the following error:
[HCDEBUG][ERROR] Runtime error: GeneralBuffer is empty /tmp/HugeCTR/HugeCTR/include/general_buffer.hpp:136

Our config is:

{
    "solver": {
      "lr_policy": "fixed",
      "display":  100,
      "max_iter":  1000,
      "gpu":  [0],
      "input_key_type":"I64",
      "batchsize":  4096,
      "batchsize_eval":4096,
      "snapshot": 10000000,
      "snapshot_prefix": "./",
      "eval_interval": 100,
      "eval_metrics": ["AUC:0.9","AverageLoss"],
      "eval_batches": 500
    },
    
    "optimizer": {
      "type": "Adam",
      "global_update": true,
      "adam_hparam": {
        "learning_rate": 0.001,
        "alpha": 0.001,
        "beta1": 0.9,
        "beta2": 0.999,
        "epsilon": 0.00000001
      }
    },
  
    "layers": [ 
        {
        "name": "data",
        "type": "Data",
        "source": "./file_list.txt",
        "eval_source": "./file_list_test.txt",
        "check": "Sum",
        "label": {
          "top": "label",
          "label_dim": 1
        },
        "dense": {
          "top": "dense",
          "dense_dim": 0
        },
        "sparse": [
          {
            "top": "data1",
            "type": "DistributedSlot",
            "max_feature_num_per_sample": 100,
            "slot_num": 75
          }        
        ]
      },
  
      {
        "name": "sparse_embedding1",
        "type": "DistributedSlotSparseEmbeddingHash",
        "bottom": "data1",
        "top": "sparse_embedding1",
        "sparse_embedding_hparam": {
          "max_vocabulary_size_per_gpu": 20000000,
          "load_factor": 0.75,
          "embedding_vec_size": 16,
          "combiner": 1
        }
      },
  
      {
        "name": "reshape1",
        "type": "Reshape",
        "bottom": "sparse_embedding1",
        "top": "reshape1",
        "leading_dim": 1200
      },
  
  
      {
        "name": "concat1",
        "type": "Concat",
        "bottom": ["reshape1","dense"],
        "top": "concat1"
      },
  
      {
        "name": "slice1",
        "type": "Slice",
        "bottom": "concat1",
        "ranges": [[0,1200], [0,1200]],
        "top": ["slice11", "slice12"]
      },
  
  
      {
        "name": "multicross1",
        "type": "MultiCross",
        "bottom": "slice11",
        "top": "multicross1",
        "mc_param": {
          "num_layers": 3
        }
      },
  
      {
        "name": "fc1",
        "type": "InnerProduct",
        "bottom": "slice12",
        "top": "fc1",
         "fc_param": {
          "num_output": 256
        }
      },
  
      {
        "name": "relu1",
        "type": "ReLU",
        "bottom": "fc1",
        "top": "relu1" 
      },
        
      {
        "name": "dropout1",
        "type": "Dropout",
        "rate": 0.5,
        "bottom": "relu1",
        "top": "dropout1" 
      },
  
      {
        "name": "fc2",
        "type": "InnerProduct",
        "bottom": "dropout1",
        "top": "fc2",
         "fc_param": {
          "num_output": 128
        }
      },
  
      {
        "name": "relu2",
        "type": "ReLU",
        "bottom": "fc2",
        "top": "relu2"     
      },
  
      {
        "name": "dropout2",
        "type": "Dropout",
        "rate": 0.5,
        "bottom": "relu2",
        "top": "dropout2" 
      },
         {
        "name": "fc3",
        "type": "InnerProduct",
        "bottom": "dropout2",
        "top": "fc3",
         "fc_param": {
          "num_output": 64
        }
      },
  
      {
        "name": "relu3",
        "type": "ReLU",
        "bottom": "fc3",
        "top": "relu3"     
      },
  
      {
        "name": "dropout3",
        "type": "Dropout",
        "rate": 0.5,
        "bottom": "relu3",
        "top": "dropout3" 
      },
      {
        "name": "concat2",
        "type": "Concat",
        "bottom": ["dropout3","multicross1"],
        "top": "concat2"
      },
      
      {
        "name": "fc4",
        "type": "InnerProduct",
        "bottom": "concat2",
        "top": "fc4",
         "fc_param": {
          "num_output": 1
        }
      },
      
      {
        "name": "loss",
        "type": "BinaryCrossEntropyLoss",
        "bottom": ["fc4","label"],
        "top": "loss"
      } 
    ]
  }

Questions on HashTable

Hi, thanks for the nice work. I read the code and have the following questions.

  1. Where is the hash map placed?
    The hash map is responsible for mapping the input to a value. For example, given the input 10 and the bucket size 100, its hash value is
822 = hash('10') % 100

But all I can find is the HashTable that stores the mapping <10, 822>. I want to know the part that generates the 822 (a toy sketch of the two-step lookup follows below).

  2. Does the embedding table support dynamic growth?
    I see the embedding table behind the hash table has a fixed size, see here. So the HashTable is dynamic but the embedding table is fixed?
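
A toy Python sketch (plain Python, not HugeCTR's GPU hash table) of the two-step lookup both questions are about: the hash table grows dynamically and assigns each new key a row index, while the embedding table it points into is allocated with a fixed size up front:

import numpy as np

max_vocabulary_size = 100                # the embedding table size is fixed up front
embedding_vec_size = 16
embedding_table = np.random.rand(max_vocabulary_size, embedding_vec_size)

key_to_row = {}                          # the "hash table": grows dynamically

def lookup(key):
    if key not in key_to_row:            # first time this key is seen
        if len(key_to_row) >= max_vocabulary_size:
            raise RuntimeError("embedding table is full")
        key_to_row[key] = len(key_to_row)   # assign the next free row
    return embedding_table[key_to_row[key]]

vec = lookup(10)                         # key 10 is mapped to some row index < 100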

running hugectr with multi nodes

Is there a complete tutorial on running HugeCTR with multiple nodes?

I have tried this:

Following the examples (https://github.com/NVIDIA/HugeCTR/tree/master/samples/dcn2nodes), what I have done is:
Build a multi-node-capable image:

  • based on the Dockerfile in HugeCTR
  • install hwloc 2.2.0
  • install ucx-1.8.0
  • install openmpi 4.0.3 with UCX support
  • install mpi4py 3.0.3
  • build HugeCTR: cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 -DENABLE_MULTINODES=ON ..

Run HugeCTR on two NVLink-equipped 8×V100 (32 GB) physical machines.

  • Start command is:
    export SSH_PORT="xxx"
    export NP="2"
    export WORK_DIR="/data/dcn_data/"
    export HOSTS="ip1:1,ip2:1"
    export ARGS=" ./bin/huge_ctr --train ./data/dcn-dist.json "
    cd $WORK_DIR
    bash start_dist.sh

start_dist.sh:
set -x

mpirun --bind-to none --allow-run-as-root -np $NP -H ${HOSTS} -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} -x LIBRARY_PATH=${LIBRARY_PATH} -x PATH=${PATH} -wdir ${PWD} --mca plm_rsh_agent "$PWD/ssh_resolver.sh" --mca btl_tcp_if_include ib0 $ARGS > logs.txt 2>&1 &

ssh_resolver.sh:

#!/bin/bash
HOSTNAME=$1
shift
ARGS=$*

ssh -p "$SSH_PORT" "$HOSTNAME" "$ARGS"

My questions are:

Is my mpirun command correct? Should I specify UCX in mpirun? How does HugeCTR use UCX and hwloc? And how can I use InfiniBand / RDMA to accelerate HugeCTR?

For example, the UCX command looks like:
mpirun -np 2 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 ./app

https://github.com/openucx/ucx

Should hugectr add batch normalization offset and scale

When saving a HugeCTR model with a batch normalization layer, we can get gamma and beta, but not offset and scale, which should be estimated from the training data.

And when we convert a HugeCTR model to a TensorFlow model, we need to set offset and scale in tf.nn.batch_normalization( x, mean, variance, offset, scale, variance_epsilon, name=None ).
Can HugeCTR add the offset and scale parameters so that they are saved to the binary model?
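
For reference, a minimal sketch of the TensorFlow side of such a conversion, assuming gamma and beta have already been loaded from the HugeCTR binary model into numpy arrays (the loading itself is not shown) and assuming the running mean/variance are the values this issue asks HugeCTR to additionally save. In tf.nn.batch_normalization, scale plays the role of gamma and offset plays the role of beta:

import numpy as np
import tensorflow as tf

channels = 64
gamma = np.ones(channels, dtype=np.float32)         # loaded from the HugeCTR model (hypothetical)
beta = np.zeros(channels, dtype=np.float32)         # loaded from the HugeCTR model (hypothetical)
moving_mean = np.zeros(channels, dtype=np.float32)  # what this issue asks HugeCTR to save
moving_var = np.ones(channels, dtype=np.float32)    # what this issue asks HugeCTR to save

x = tf.random.normal([32, channels])
y = tf.nn.batch_normalization(
    x,
    mean=moving_mean,
    variance=moving_var,
    offset=beta,      # beta is the "offset"
    scale=gamma,      # gamma is the "scale"
    variance_epsilon=1e-5,
)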

what's the meaning of local_id = feature_ids[k] + slot_offset_[k]?

Hi,
I am reading the v2.2 code. When the hash type is LocalizedSlotSparseEmbeddingOneHot, why is local_id = feature_ids[k] + slot_offset_[k]? What is the meaning of this?
if (params_.size() == 1 && params_[0].type == DataReaderSparse_t::Localized && !slot_offset_.empty()) {
  auto& param = params_[0];
  for (int k = 0; k < param.slot_num; k++) {
    int dev_id = k % csr_chunk->get_num_devices();
    T local_id = feature_ids[k] + slot_offset_[k];
    csr_chunk->get_csr_buffer(param_id, dev_id).push_back_new_row(local_id);
  }
}
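
A small Python sketch of what the addition typically achieves (this is my reading of the intent, not HugeCTR code): slot_offset_ holds the prefix sums of the per-slot cardinalities, so adding it shifts each slot's local feature IDs into one non-overlapping global ID space:

import itertools

slot_size_array = [1461, 558, 335378]                  # per-slot cardinalities (illustrative)
slot_offset = [0] + list(itertools.accumulate(slot_size_array))[:-1]
# -> [0, 1461, 2019]

feature_ids = [7, 7, 7]                                # local ID 7 in each slot
global_ids = [fid + off for fid, off in zip(feature_ids, slot_offset)]
print(global_ids)                                      # [7, 1468, 2026]: no collisions across slots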

[FEA] Make optional the number of files in Norm Dataset File List

Hi!

I am not sure that starting the norm dataset file list with the number of files is the best option.

IMO, that value should not be needed, because it can easily be calculated by the parser. It is also a potential source of future errors if the parser doesn't double-check that the specified number matches the number of files listed in the file list.

Therefore, I would suggest making that value optional (a small parser sketch follows at the end of this issue).

https://github.com/NVIDIA/HugeCTR/blob/master/docs/hugectr_user_guide.md#file-list

Hope it helps!
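
For illustration, a minimal sketch (not HugeCTR code) of a parser that either validates the count header or infers it when the header is omitted; the layout follows the Norm file-list format described in the user guide linked above:

from pathlib import Path

def read_file_list(path):
    # A Norm file list looks like:
    #   2
    #   ./data/file0.data
    #   ./data/file1.data
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if lines and lines[0].isdigit():
        declared, files = int(lines[0]), lines[1:]
        assert declared == len(files), "count header does not match the number of files"
    else:
        files = lines                    # header omitted: simply infer the count
    return files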

Can Hugectr supports Sequence Model?

Currently the embedding layer supports mean or sum pooling for variable-length features. In the deep learning world, using LSTM and Attention is common; for example, the DIN model uses an attention layer to aggregate the user behavior sequence.

Can HugeCTR support sequence models, such as LSTM, GRU, Attention, etc.?

question on backward computation

Hi Hugectr experts,

I have a question on the backward computation. Take the localized slot as an example:
I notice that HugeCTR performs an all-to-all after the forward propagation, and in the backward pass it performs an all-to-all again before the backward propagation. Why are there two all-to-all operations between the forward and backward passes?
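
My understanding (an assumption, not an authoritative answer): with localized slots each GPU owns a subset of slots but only a slice of the batch, so the forward all-to-all regroups "my slots, everyone's samples" into "all slots, my samples", and the backward all-to-all is simply the inverse exchange that returns gradients to the GPU owning each slot's embedding rows. A toy numpy sketch of the two exchanges:

import numpy as np

num_gpus, slots_per_gpu, samples_per_gpu, vec = 2, 3, 4, 8

# send[i][j]: block that "GPU" i sends to "GPU" j (i's slots, j's samples)
send = np.random.rand(num_gpus, num_gpus, slots_per_gpu, samples_per_gpu, vec)

recv = send.transpose(1, 0, 2, 3, 4)         # forward all-to-all: each GPU now has all slots for its samples
grad = np.ones_like(recv)                    # gradients are produced in this sample-major layout
grad_back = grad.transpose(1, 0, 2, 3, 4)    # backward all-to-all: the inverse exchange

assert grad_back.shape == send.shape         # gradients are back in the slot-owner layout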

[ QUESTION ] training without eval set

Is there a way to run training without having a validation set? Whenever I leave eval_source empty I get a file_empty kind of error.

On top of that, HugeCTR really needs better error messages; I had to trace the code to see what was happening.

Test failed when I input decimal

Hi HugeCTR experts:
in master/test/utest/layers/fully_connected_layer_test.cpp
107:for (size_t i = 0; i < k * n; ++i) h_weight[i] = (float)(rand() % 100);
108:for (size_t i = 0; i < m * k; ++i) h_in[i] = (float)(rand() % 100);

When I use decimal values instead:
107:for (size_t i = 0; i < k * n; ++i) h_weight[i] = (float)((rand() % 100) * 0.1);
108:for (size_t i = 0; i < m * k; ++i) h_in[i] = (float)((rand() % 100)* 0.1);
the test fails; the max_diff between the CPU and GPU results is > 0.1 (for example 0.3125). Why?
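
One possible explanation (an assumption, not a verified diagnosis): once the inputs are no longer exactly representable integers, the CPU and GPU accumulate the dot products in different orders, so the absolute difference grows with the reduction length even though the relative error stays small; a relative-tolerance comparison (like the compare_array_approx_with_ratio helper mentioned in the build-error issue above) is usually the right check. A small numpy sketch with hypothetical matrix sizes:

import numpy as np

m, k, n = 2048, 1024, 512                     # hypothetical GEMM sizes
a = (np.random.randint(0, 100, size=(m, k)) * 0.1).astype(np.float32)
b = (np.random.randint(0, 100, size=(k, n)) * 0.1).astype(np.float32)

ref = (a.astype(np.float64) @ b.astype(np.float64)).astype(np.float32)  # higher-precision reference
out = a @ b                                   # fp32 result (stand-in for the GPU output)

print("max abs diff:", np.max(np.abs(ref - out)))   # typically grows with the reduction length k
print("within relative tolerance:", np.allclose(ref, out, rtol=1e-3, atol=0))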

[GFN RecSys] RMM memory allocation fail for parquet Datareader

Description:

Overview:

The GFN dataset was pre-processed with NVTabular, which resulted in 8 parquet files for training. I'm using just one parquet file for testing HugeCTR, and I've modified _metadata.json to include only one filename (corresponding to that parquet file).
While training DLRM with the smallest possible embedding_vec_size=1, I'm getting the following error:

terminate called after throwing an instance of 'rmm::bad_alloc'
what(): std::bad_alloc: CNMEM error at: /opt/conda/envs/rapids/include/rmm/mr/device/cnmem_memory_resource.hpp168: CNMEM_STATUS_OUT_OF_MEMORY

The error is discussed in detail here

Minimal reproducing steps:

The dataset is available on NGC Batch (dataset id: 68926) which contains:

  1. parquet file
  2. _metadata.json
  3. _file_list_try.txt

The docker image was built using this script and is available on NGC Batch as nvidian/tme-gfnmerlin/hugectr_rel:1

Attached is the config used for training - dlrm_fp32_256_local.json

An NGC Batch job can be run using:

ngc batch run --name "gfn-hugectr" --preempt RUNONCE --ace nv-us-west-2 --instance dgx1v.32g.8.norm --commandline "bash -c 'source activate rapids && pip install gdown && jupyter notebook --allow-root --ip 0.0.0.0 --no-browser --NotebookApp.token='admin' --NotebookApp.allow_origin='*' --notebook-dir=/'" --result /results --image "nvidian/tme-gfnmerlin/hugectr_rel:1" --org nvidian --team sae --port 8786 --port 8787 --port 8888 --datasetid 68926:/gfn-merlin/data/preprocessed/preprocessed-53-jan-sept-parquet/

Please add your workspace and change your team.

Full error:

Attached is the error trace after running huge_ctr --train dlrm_fp32_256_local.json - error.log
Comments:

Why my AUC is so high in first 1000 iters?

I just want to run a DCN sample training, and I used the following model JSON:

{
  "solver": {
    "lr_policy": "fixed",
    "display": 1000,
    "max_iter": 10000,
    "gpu": [0],
    "batchsize": 512,
    "snapshot": 10000000,
    "snapshot_prefix": "./",
    "eval_interval": 1000,
    "eval_batches": 60,
    "input_key_type": "I64"
  },
  
  "optimizer": {
    "type": "Adam",
    "global_update": true,
    "adam_hparam": {
      "learning_rate": 0.001,
      "beta1": 0.9,
      "beta2": 0.999,
      "epsilon": 0.0000001
    }
  },

  "layers": [ 
      {
      "name": "data",
      "type": "Data",
      "format": "Parquet",
      "slot_size_array": [1461, 558, 335378, 211710, 306, 20, 12136, 634, 4, 51298, 5302, 332600, 3179, 27, 12191, 301211, 11, 4841, 2086, 4, 324273, 17, 16, 79734, 96, 58622],
      "source": "./dcn_data/train/_file_list.txt",
      "eval_source": "./dcn_data/val/_file_list.txt",
      "check": "None",
      "label": {
        "top": "label",
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "data1",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 26
        }        
      ]
    },

    {
      "name": "sparse_embedding1",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "data1",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 1737709,
        "embedding_vec_size": 16,
        "combiner": 0
      }
    },

    {
      "name": "reshape1",
      "type": "Reshape",
      "bottom": "sparse_embedding1",
      "top": "reshape1",
      "leading_dim": 416
    },


    {
      "name": "concat1",
      "type": "Concat",
      "bottom": ["reshape1","dense"],
      "top": "concat1"
    },

    {
      "name": "slice1",
      "type": "Slice",
      "bottom": "concat1",
      "ranges": [[0,429], [0,429]],
      "top": ["slice11", "slice12"]
    },


    {
      "name": "multicross1",
      "type": "MultiCross",
      "bottom": "slice11",
      "top": "multicross1",
      "mc_param": {
        "num_layers": 6
      }
    },

    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "slice12",
      "top": "fc1",
       "fc_param": {
        "num_output": 1024
      }
    },

    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1" 
    },
      
    {
      "name": "dropout1",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu1",
      "top": "dropout1" 
    },

    {
      "name": "fc2",
      "type": "InnerProduct",
      "bottom": "dropout1",
      "top": "fc2",
       "fc_param": {
        "num_output": 1024
      }
    },

    {
      "name": "relu2",
      "type": "ReLU",
      "bottom": "fc2",
      "top": "relu2"     
    },

    {
      "name": "dropout2",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu2",
      "top": "dropout2" 
    },
    
    {
      "name": "concat2",
      "type": "Concat",
      "bottom": ["dropout2","multicross1"],
      "top": "concat2"
    },
    
    {
      "name": "fc4",
      "type": "InnerProduct",
      "bottom": "concat2",
      "top": "fc4",
       "fc_param": {
        "num_output": 1
      }
    },
    
    {
      "name": "loss",
      "type": "BinaryCrossEntropyLoss",
      "bottom": ["fc4","label"],
      "top": "loss"
    } 
  ]
}

But my metrics report for the first 1000 iterations looks very strange:

[04d15h08m10s][HUGECTR][INFO]: Iter: 1000 Time(1000 iters): 6.113479s Loss: 0.527308 lr:0.001000
[8665.98, eval_start, 0.1, ]
[04d15h08m10s][HUGECTR][INFO]: Evaluation, AUC: 0.692035
[8708.16, eval_accuracy, 0.692035, 0.1, 1000, ]
[04d15h08m10s][HUGECTR][INFO]: Eval Time for 60 iters: 0.042175s
[8708.18, eval_stop, 0.1, ]
[04d15h08m16s][HUGECTR][INFO]: Iter: 2000 Time(1000 iters): 6.171510s Loss: 0.426323 lr:0.001000
[14837.7, eval_start, 0.2, ]
....

Is it normal that my AUC after the first 1000 iterations already hits 0.692035? I also find that the AUC keeps decreasing over many of the following 1000-iteration intervals.

No reaction appeared after the training start

Hi professionals,
We tried the steps in the HugeCTR tutorial, picked DeepFM for a trial, and successfully started the training, but nothing happened after the 'HugeCTR training start' text (we waited for several days).

We tried several network configs, which however only varied max_iter (the network architecture was not changed); same problem.

System: Ubuntu 18.04.4 LTS
GPU: GeForce RTX 2080 Ti
Driver Version: 440.44
CUDA Version: 10.2

Docker for v2.3 release

Description:
add scikit-learn python module (Dmitry)

cudf 0.16 (Chirayu)

  1. @jianbingd will help to check the NGC TF docker for running the Embedding plugin.
  2. @xiaoleis will send an email and ask whether we need to update SWIAPT.

Plan B:
Four docker containers in total:
build.tfplugin.dockerfile + dev.tfplugin.dockerfile
build.dockerfile + dev.dockerfile

  • upload docker to NGC (depends on QA)
Comments:

how data collector send it's CSR buffers to remote node?

When I read the source code, I found that the data collector is supposed to

/**************************************
 * Each node will have one DataCollector.
 * Each iteration, one of the data collectors will
 * send its CSR buffers to the remote node.
 **************************************/

as commented. However, I cannot find the specific code that does this. Can somebody give some explanation? Thanks~

[FEA] Support cudf 0.16

Currently HugeCTR does not support cudf 0.16. It keeps throwing the following error.

/home/rapids/hugectr/HugeCTR/include/data_readers/parquet_data_reader_worker.hpp:29:10: fatal error: cudf/io/functions.hpp: No such file or directory
 #include <cudf/io/functions.hpp>

cudf 0.16 has refactored some code, and functions.hpp no longer exists. The includes in HugeCTR have to be updated.

Fail to build docker image with ENABLE_MULTINODES=ON

Here is the command

docker build --build-arg ENABLE_MULTINODES=ON -t hugectr:devel -f ./tools/dockerfiles/build.Dockerfile .

and got errors below:

In file included from /HugeCTR/HugeCTR/include/gpu_resource.hpp:19:0,
                 from /HugeCTR/HugeCTR/src/gpu_resource.cpp:17:
/HugeCTR/HugeCTR/include/common.hpp:29:10: fatal error: mpi.h: No such file or directory
 #include <mpi.h>
          ^~~~~~~
compilation terminated.

I think the Dockerfile does not meet the requirements for multi-node builds.

hugeCTR train performance question

I watched Zehuan Wang's talk "HugeCTR - 端到端点击率预估训练解决方案介绍" (HugeCTR: an end-to-end CTR estimation training solution).
On the "PERFORMANCE" slide of that deck, the 8-GPU performance is only 17.8 ms per iteration.
17.8 ms per iteration seems too fast; is this an error?

Criteo dataset sample processing issue

I was trying to run HugeCTR on the Criteo Kaggle dataset. When I was converting the original Kaggle dataset to HugeCTR format using the Criteo2hugeCTR_legacy tool, I ran the following command lines:

$ ./criteo2hugectr_legacy 1 ../../tools/criteo_script_legacy/train.out criteo/sparse_embedding file_list.txt
$ ./criteo2hugectr_legacy 1 ../../tools/criteo_script_legacy/test.out criteo_test/sparse_embedding file_list_test.txt

However, I'm not able to get file_list.txt and file_list_test.txt from these scripts. I'm not sure what I did wrong here, since I pretty much followed the online readme from the beginning.

I also did some trials and realized that the problem might be in criteo2hugectr_legacy.cpp, since I wasn't able to reach the EOF of txt_file (line 95).

I'd really appreciate it if you could explain this a bit. Thank you very much!

Build success but failed to run with CUDA 10.1

I want to run HugeCTR on a device with CUDA 10.1.

I changed the docker config in tools/dockerfiles/build.Dockerfile or dev.a100.Dockerfile:
FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 --> FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

Everything built OK and I got the HugeCTR binary files.

But then the driver seems to break down and nothing can be run. Running nvidia-smi gives:
Failed to initialize NVML: Driver/library version mismatch

While debugging, I found that the driver broke after libarrow-cuda-dev was installed.
This line: apt update && apt install -y libarrow-dev=0.17.1-1 libarrow-cuda-dev=0.17.1-1
It installs another libnvidia-compute-435, and after libnvidia-compute-435 is installed, the driver no longer works correctly.

Is there any way to solve this?

Documentations for v2.3

Description:

  • ReadMe (PIC Minseok):
  • Notebook: move the release notes here and link to the features in the User Guide. @KingsleyL will add the notebook.
  • User Guide (PIC Minseok): connections between terms and feature introductions, plus Known Issues.
  • Samples
  • Tutorial: @aleliu, multi-node training
  • Questions and Answers

Contributors finish a draft version by 9th Nov.
Reorganization starts from 9th Nov (PIC Lamont).
Comments:

Running DLRM model got Runtime error: cublas_status_not_supported

Following dlrm_fp32_64k.json, we tested DLRM on our own data: label_dim=1, dense_dim=5, slot_num=75, and got an error in the first fully connected layer.

What's wrong with my data or model config? Or is there a bug in HugeCTR?

log

[10d12h27m36s][HUGECTR][INFO]: end_lr is not specified using default: 0.000000
[6421.34, init_end, ]
[6421.35, run_start, ]
HugeCTR training start:
[6421.36, train_epoch_start, 0, ]
[HCDEBUG][ERROR] Runtime error: cublas_status_not_supported /tmp/HugeCTR/HugeCTR/src/layers/fully_connected_layer.cu:143 

[HCDEBUG][ERROR] Runtime error: operation not permitted when stream is capturing /tmp/HugeCTR/HugeCTR/src/session.cpp:451 

[HCDEBUG][ERROR] Runtime error: cublas_status_not_supported /tmp/HugeCTR/HugeCTR/src/layers/fully_connected_layer.cu:143 

Terminated with error

model config

{
  "solver": {
    "lr_policy": "fixed",
    "display": 1,
    "max_iter": 2,
    "gpu": [
        0
    ],
    "batchsize": 32,
    "snapshot": 1,
    "snapshot_prefix": "./tmp/daw",
    "eval_interval": 1,
    "batchsize_eval":32,
    "eval_metrics": [
        "AUC:0.9",
        "AverageLoss"
    ],
    "eval_batches": 1,
    "input_key_type": "I64"
},
"optimizer": {
    "type": "Adam",
    "global_update": false,
    "adam_hparam": {
        "learning_rate": 0.0001,
        "beta1": 0.9,
        "beta2": 0.999,
        "epsilon": 1e-08
    }
},
"layers": [
    {
        "name": "data",
        "type": "Data",
        "source": "./tmp/file_list.txt",
        "eval_source": "./tmp/file_list_test.txt",
        "check": "Sum",
        "label": {
            "top": "label",
            "label_dim": 1
        },
        "dense": {
            "top": "dense",
            "dense_dim": 5
        },
        "sparse": [
            {
                "top": "data1",
                "type": "DistributedSlot",
                "max_feature_num_per_sample": 180,
                "slot_num": 75
            }
        ]
    },
    {
        "name": "sparse_embedding1",
        "type": "DistributedSlotSparseEmbeddingHash",
        "bottom": "data1",
        "top": "sparse_embedding1",
        "sparse_embedding_hparam": {
            "max_vocabulary_size_per_gpu": 24000000,
            "load_factor": 0.75,
            "embedding_vec_size": 16,
            "combiner": 1
        }
    },
  
      {
        "name": "fc1",
        "type": "InnerProduct",
        "bottom": "dense",
        "top": "fc1",
         "fc_param": {
          "num_output": 512
        }
      },
  
   
    {
        "name": "relu1",
        "type": "ReLU",
        "bottom": "fc1",
        "top": "relu1" 
      },
  
      {
        "name": "fc2",
        "type": "InnerProduct",
        "bottom": "relu1",
        "top": "fc2",
         "fc_param": {
          "num_output": 256
        }
      },
  
      {
        "name": "relu2",
        "type": "ReLU",
        "bottom": "fc2",
        "top": "relu2"     
      },
      
      {
        "name": "fc3",
        "type": "InnerProduct",
        "bottom": "relu2",
        "top": "fc3",
         "fc_param": {
          "num_output": 16
        }
      },
  
      {
        "name": "relu3",
        "type": "ReLU",
        "bottom": "fc3",
        "top": "relu3"     
      },
      
      {
        "name": "interaction1",
        "type": "Interaction",
        "bottom": ["relu3", "sparse_embedding1"],
        "top": "interaction1"
      },
  
      {
        "name": "fc4",
        "type": "InnerProduct",
        "bottom": "interaction1",
        "top": "fc4",
         "fc_param": {
          "num_output": 1024
        }
      },
  
      {
        "name": "relu4",
        "type": "ReLU",
        "bottom": "fc4",
        "top": "relu4" 
      },
        
  
      {
        "name": "fc5",
        "type": "InnerProduct",
        "bottom": "relu4",
        "top": "fc5",
         "fc_param": {
          "num_output": 1024
        }
      },
  
      {
        "name": "relu5",
        "type": "ReLU",
        "bottom": "fc5",
        "top": "relu5"     
      },
      
      {
        "name": "fc6",
        "type": "InnerProduct",
        "bottom": "relu5",
        "top": "fc6",
         "fc_param": {
          "num_output": 512
        }
      },
  
      {
        "name": "relu6",
        "type": "ReLU",
        "bottom": "fc6",
        "top": "relu6"     
      },
  
      {
        "name": "fc7",
        "type": "InnerProduct",
        "bottom": "relu6",
        "top": "fc7",
         "fc_param": {
          "num_output": 256
        }
      },
  
      {
        "name": "relu7",
        "type": "ReLU",
        "bottom": "fc7",
        "top": "relu7"     
      },
      
      {
        "name": "fc8",
        "type": "InnerProduct",
        "bottom": "relu7",
        "top": "fc8",
         "fc_param": {
          "num_output": 1
        }
      },
      
      {
        "name": "loss",
        "type": "BinaryCrossEntropyLoss",
        "bottom": ["fc8","label"],
        "top": "loss"
      } 
    ]
  }

DataReader Refactoring TODO list

Description:

  • Python-friendly APIs: how can we make the code simpler and more uniform?
DataReader::set_source() {
  worker_group_.reset(new xxx_data_reader_worker_group);
}
// so that no explicit call of start() is needed
  • Completely eliminate repeat from DataReader when it is ready, e.g., enable set_source for Raw
  • Remove all the default arguments from DataReader
  • Decouple Dataset source type from DataReader type (#169 #138)
  • Make the number of DataReaders configurable (and perhaps automatic configuration for a given system)
  • Support Eval for one epoch instead of specifying n_batches. It may require changes to Metrics as well. (@minseokl will see how TF and PyTorch tackle this issue)
  • Remove duplicate code if possible
Comments:
