Comments (5)
We found the cause of the error:
The dense dim is 0. In v2.1 this works because the label tensor and the dense tensor share the same buffer, so the model config works even if the dense dim is 0. In v2.2, however, they are separate tensors, and when the dense dim is 0, cudaMalloc does not allocate any memory for the dense tensor.
There are two ways to fix this error:
- When the tensor dim is 0, GeneralBuffer should throw an error that states the problem explicitly.
- Or, when the tensor dim is 0, GeneralBuffer's ptr_ should not be nullptr.
// general_buffer.hpp
void init(int device_id) {
  if (initialized_ != false) CK_THROW_(Error_t::IllegalCall, "Initilized general buffer");
  device_id_ = device_id;
  CudaDeviceContext context(device_id);
  if (current_offset_ > 0) {
    CK_CUDA_THROW_(cudaMalloc((void**)&ptr_, current_offset_ * sizeof(T)));
    CK_CUDA_THROW_(cudaMemset(ptr_, 0, current_offset_ * sizeof(T)));
  }
  initialized_ = true;
}
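The first option could look roughly like the sketch below. It is host-only so it runs without a GPU: `HostBuffer` and `reserve` are hypothetical stand-ins, not HugeCTR code, `std::malloc` replaces `cudaMalloc`, and standard exceptions replace `CK_THROW_`. The only point it illustrates is failing loudly when the element count is zero instead of silently leaving `ptr_` as nullptr.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for GeneralBuffer<T>; not the HugeCTR class.
template <typename T>
class HostBuffer {
 public:
  HostBuffer() = default;
  HostBuffer(const HostBuffer&) = delete;
  HostBuffer& operator=(const HostBuffer&) = delete;
  ~HostBuffer() { std::free(ptr_); }

  // Hypothetical: accumulates the total element count before init().
  void reserve(std::size_t num_elements) { current_offset_ += num_elements; }

  void init() {
    if (initialized_) throw std::logic_error("buffer already initialized");
    if (current_offset_ == 0) {
      // Report the real problem here rather than crashing later on a null pointer.
      throw std::invalid_argument("buffer has zero elements; check tensor dims");
    }
    // std::malloc stands in for cudaMalloc in this host-only sketch.
    ptr_ = static_cast<T*>(std::malloc(current_offset_ * sizeof(T)));
    if (ptr_ == nullptr) throw std::bad_alloc();
    std::memset(ptr_, 0, current_offset_ * sizeof(T));
    initialized_ = true;
  }

  const T* data() const { return ptr_; }
  std::size_t size() const { return current_offset_; }

 private:
  T* ptr_ = nullptr;
  std::size_t current_offset_ = 0;
  bool initialized_ = false;
};
```

With this shape, a config whose dense dim is 0 fails at `init()` with a message that points at the tensor dims, which is easier to diagnose than a later illegal memory access.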
Thank you for your feedback. We have encountered this problem before, but it has not come up again since the fix below: https://github.com/NVIDIA/HugeCTR/blob/master/HugeCTR/src/data_collector.cu#L64
We will try to reproduce the error.
The error can be reproduced when the dense dim is zero and a subsequent layer uses the dense data (this works in v2.1):
{
  "name": "concat1",
  "type": "Concat",
  "bottom": ["reshape1", "dense"],
  "top": "concat1"
}
Thank you, we will fix this error in the next patch with:
+++ b/HugeCTR/src/parser.cpp
@@ -99,6 +99,9 @@ InputOutputInfo get_input_tensor_and_output_name(
       if (!find_item_in_map(tensor, bstr, tensor_list)) {
         CK_THROW_(Error_t::WrongInput, "No such bottom: " + bstr);
       }
+      if(tensor->get_num_elements() <= 0){
+        CK_THROW_(Error_t::WrongInput, "Empty bottom: " + bstr);
+      }
       bottom_tensors.push_back(tensor);
     }
     return {bottom_tensors, top_strs};
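To show the effect of that check in isolation, here is a small self-contained sketch. `TensorInfo` and `collect_bottoms` are hypothetical names invented for illustration, a `std::map` stands in for the parser's tensor list, and `std::invalid_argument` stands in for `CK_THROW_`. It is not the HugeCTR parser, only the same validation idea: reject a bottom that exists but has zero elements.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tensor descriptor: only the shape matters for this check.
struct TensorInfo {
  std::vector<std::size_t> dims;

  // Product of all dims; a tensor with no dims or any zero dim is empty.
  std::size_t num_elements() const {
    if (dims.empty()) return 0;
    std::size_t n = 1;
    for (std::size_t d : dims) n *= d;
    return n;
  }
};

// Hypothetical helper mirroring the patched loop: look up each bottom by
// name and reject both missing and empty tensors with a clear message.
std::vector<TensorInfo> collect_bottoms(
    const std::vector<std::string>& bottom_names,
    const std::map<std::string, TensorInfo>& tensor_list) {
  std::vector<TensorInfo> bottoms;
  for (const auto& name : bottom_names) {
    auto it = tensor_list.find(name);
    if (it == tensor_list.end()) {
      throw std::invalid_argument("No such bottom: " + name);
    }
    if (it->second.num_elements() == 0) {
      throw std::invalid_argument("Empty bottom: " + name);
    }
    bottoms.push_back(it->second);
  }
  return bottoms;
}
```

For the Concat layer above, a `dense` entry with a zero dim would be rejected with "Empty bottom: dense" while the model is being parsed, before any buffer is allocated.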
@zehuanw I think we can close this.