changqi1 / deeprec

This project forked from deeprec-ai/deeprec


DeepRec is a recommendation engine based on TensorFlow.

License: Apache License 2.0

Starlark 2.43% Shell 0.49% Batchfile 0.02% Python 33.00% Dockerfile 0.05% CMake 0.14% Makefile 0.07% HTML 3.04% C++ 55.94% Cuda 0.13% Jupyter Notebook 1.89% C 0.58% MLIR 1.32% SWIG 0.11% Cython 0.01% LLVM 0.01% Java 0.57% Objective-C 0.06% Objective-C++ 0.14% Ruby 0.01%

deeprec's People

Contributors

aaroey, alextp, allenlavoie, andrewharp, annarev, asimshankar, benoitsteiner, caisq, ebrevdo, ezhulenev, facaiy, feihugis, gunan, hawkinsp, ilblackdragon, jdduke, jsimsa, liutongxuan, markdaoust, martinwicke, mihaimaruseac, mrry, nouiz, petewarden, rohan100jain, skye, tensorflower-gardener, terrytangyuan, yifeif, yongtang


deeprec's Issues

[Graph][Optimization]split+concat fusion to improve performance

split+concat fusion optimization
Goal
Optimize performance through split+concat fusion

Problem Description
In some recommendation models, there is a potential performance gain from fusing split and concat.

The steps to reproduce the performance issue will be added later.
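
For illustration, a minimal sketch of the back-to-back Split/Concat pattern this fusion would target (shapes are made up; assumes DeepRec's TF1-style graph mode):

import numpy as np
import tensorflow as tf

# A feature tensor is split along the feature axis and immediately concatenated
# back; the adjacent Split/Concat pair is the fusion candidate.
x = tf.placeholder(tf.float32, shape=[None, 1024])
parts = tf.split(x, num_or_size_splits=8, axis=1)
y = tf.concat(parts, axis=1)

with tf.Session() as sess:
    out = sess.run(y, feed_dict={x: np.random.rand(512, 1024).astype(np.float32)})
    print(out.shape)  # (512, 1024)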

Requirement Details

Test

  • Unit test code and a benchmark are needed.
  • Use one model from the model zoo to validate the performance gain. The performance data and analysis results should be documented and reproducible.

Code Style and commit

  • C++ and python: Keep aligned with DeepRec code.

Maintain

  • All future issues and bugs related to this op need to be covered.

Definition of Done

  • Runs successfully in DeepRec and delivers better performance.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[Bug] embedding-fusion precision analysis.

Hi @Duyi-Wang, the following are some UTs that may help reduce your validation time.

# Python UT: includes a simple model implementation; path: "tensorflow/python/feature_column/feature_column_v2_test.py"
$ bazel test --flaky_test_attempts 1 --test_output=all --nocache_test_results --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 //tensorflow/python/feature_column:feature_column_v2_test

# C++ UT: path: "tensorflow/core/kernels/fused_embedding/embedding_lookup_sparse_op_test.cc"
$ bazel test --flaky_test_attempts 1 --test_output=all --nocache_test_results --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 //tensorflow/core/kernels:embedding_lookup_sparse_op_test

FYI

[Graph] Remove some ops that cause oneDNN performance drops.

     reco_ops_list_ = gtl::FlatSet<string> {
       "BatchMatMul", "BatchMatMulV2", "BiasAdd", "BiasAddGrad",
       "_FusedMatMul", "_FusedBatchMatMul", "_FusedBatchMatMulV2",
-      "Identity", "LeakyRelu", "LeakyReluGrad", "MatMul",
+      "LeakyRelu", "LeakyReluGrad", "MatMul",
       "Relu", "ReluGrad", "Relu6", "Relu6Grad", "Gelu", "GeluGrad",
-      "Tanh", "TanhGrad", "Reshape"
+      "Tanh", "TanhGrad"
     };

[Python] Undefined symbol: _ZTIN10tensorflow8OpKernelE when building DeepRec for the MLIR Python API.

System information

  • Docker Image: alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04
  • DeepRec version or commit id: 3bc930a
  • Python version: 3.8.10
  • Bazel version (if compiling from source): 0.26.1
  • GCC/Compiler version (if compiling from source): 9.4.0

Describe the problem

I was enabling the MLIR Python API in DeepRec. In BUILD, building MLIR depends on "//tensorflow/core:ops", so I added "//tensorflow/core:ops" to a BUILD file and built it, but I met an error: undefined symbol: _ZTIN10tensorflow8OpKernelE.

Provide the exact sequence of commands / steps that you executed before running into the problem

Add "//tensorflow/core:ops" in tensorflow/python/BUILD's cc_library( name = "_tf_stack" ) (line 4863). Here is the screenshot after adding"//tensorflow/core:ops":
image

After revising tensorflow/python/BUILD, run:
$ ./configure
$ bazel build -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package

Any other info / logs

Include any logs or source code that would be helpful to diagnose the problem.

[Operator][Optimization]Unsorted_segment_sum op optimization

unsorted_segment_sum operator optimization
Goal
Optimize unsorted_segment_sum operator performance

Problem Description
In some recommendation models, for example DLRM, the unsorted_segment_sum operator adds noticeable overhead, so it is important to reduce its cost.

Here is the step to reproduce the performance issue.

  • Collect timeline information with DLRM from the model zoo: "numactl -C 8-15 -l python train.py --steps 100 --timeline 49 --no_eval --interaction_op dot". The timeline is shown below.

(timeline screenshot: Capture-unsortedSegmentSum)
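
As a starting point, a minimal micro-benchmark of the stock op could look like the sketch below (shapes and segment counts are illustrative, not taken from DLRM):

import time
import numpy as np
import tensorflow as tf

# Benchmark the stock unsorted_segment_sum on synthetic data.
data = tf.constant(np.random.rand(1000000, 64).astype(np.float32))
segment_ids = tf.constant(np.random.randint(0, 10000, size=1000000, dtype=np.int32))
summed = tf.math.unsorted_segment_sum(data, segment_ids, num_segments=10000)

with tf.Session() as sess:
    sess.run(summed)                              # warm-up
    start = time.time()
    for _ in range(10):
        sess.run(summed)
    print("avg unsorted_segment_sum time: %.4f s" % ((time.time() - start) / 10))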

Requirement Details

  • Rewrite the operator in C++, using intrinsics where possible, following TensorFlow's custom-op mechanism. Taking advantage of AVX-512 is preferred, and a Python API is needed.
  • Integrate the operator into DeepRec and finish the unit test code.
  • Test case: Unit test code is needed. DLRM can be used to measure the end-to-end performance gain; the higher, the better. The performance data needs to be documented and reproducible.
  • Code Style and Commit: Keep aligned with DeepRec code for C++ and Python.
  • Maintain: All future issues and bugs related to this optimization need to be covered.

Definition of Done

  • Runs successfully in DeepRec and delivers better performance.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[Modelzoo]Rebuild ESMM to update API and enable DeepRec features

Rebuild ESMM to update API and Enable DeepRec Features
Goal
Rebuild ESMM to update API and enable DeepRec Features.

Requirement Details

  • Rebuild ESMM to update the API according to the template (https://github.com/changqi1/DeepRec/blob/modelzoo-template/modelzoo/template.py).
  • Enable the DeepRec features listed below in the code. The same features have already been enabled in WDL (#37); note that the comments there map to the features below. Add flags to enable/disable each feature in the code.
  • If there is any problem when enabling a feature below, describe in detail how to reproduce it and what the issue is, especially for the known issues we have already submitted to Alibaba.

Features list
Enable the following DeepRec features (docs about the features from Alibaba: https://deeprec.readthedocs.io/zh/latest/index.html):

  • Enabled by default; test AUC/ACC/Gsteps, which need to be close to the results before rebuilding

8) Auto Micro Batch, same as DeepRec-AI#127
9) FusedEmbedding API, embedding fusion
10) Smart Stage, same as DeepRec-AI#122
11) Auto Graph Fusion, DeepRec-AI#144
12) CPU Memory Optimization: START_STATISTIC_STEP, STOP_STATISTIC_STEP, jemalloc
14) AdamAsync Optimizer
15) BF16

  • Disabled by default; a passing test is fine. The same performance as before is not required.

1) Embedding Variable
7) GRPC++ and StarServer
13) Incremental Checkpoint
14) AdagradDecay
2) EmbeddingVariable advanced feature: Embedding Elimination
3) EmbeddingVariable advanced feature: Embedding Filter
4) Dynamic-dimension Embedding Variable
5) Adaptive Embedding
17) WorkQueue

  • Other features: disabled by default; a passing test is fine. The same performance as before is not required. The feature below is not supported by the feature_column API; we are waiting for Alibaba's update.

6) Multi-Hash Variable

Test

  • All of the features need to be enabled in the code by adding flags (WDL is the template; see the flag sketch below).
  • Features 8-15 need to be enabled by default and pass their tests with the same performance as before.
  • The other features need to pass their tests; matching performance is not required. Some of these features have known issues we have submitted; if a test does not pass, describe it clearly.
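
A minimal sketch of the kind of enable/disable flags expected in the training script (flag names below are illustrative; the actual names should follow the WDL reference implementation):

import argparse

def build_arg_parser():
    # Each DeepRec feature gets an explicit on/off switch so tests can run
    # with the feature enabled or disabled.
    parser = argparse.ArgumentParser(description="DeepRec model zoo training")
    parser.add_argument("--bf16", action="store_true",
                        help="train dense layers in bfloat16 (feature 15)")
    parser.add_argument("--emb_fusion", action="store_true",
                        help="use the fused embedding lookup API (feature 9)")
    parser.add_argument("--smart_stage", action="store_true",
                        help="enable Smart Stage (feature 10)")
    parser.add_argument("--ev", action="store_true",
                        help="use Embedding Variable (feature 1)")
    return parser

if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    print(vars(args))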

Other Requirements: Dockerfile and Documents

  • Waiting for Alibaba's requirements

Code Style and commit

  • Python: Keep aligned with DeepRec code.

Maintain

  • All of the issues and bugs related to this model need to be covered in the future.

Definition of Done

  • Runs successfully in DeepRec with the same performance as the code before rebuilding.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[Framework][Optimization]Enabling RDT to improve performance

Enabling RDT technology in DeepRec
Goal
Manage the LLC cache through a low-level API to improve performance

Problem Description
RDT may help improve performance if cache management can be controlled through a low-level API in DeepRec, especially for weights that could stay in the LLC. The detailed design and requirements still need to be confirmed and will be updated once we reach alignment with the customer.
https://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

[Error Log] Unknown op errors due to related ops being removed.

Because some ops have been removed, 'unknown op' errors occur. Just run the WDL model from the model zoo with: python train.py --steps 1 --no_eval --tf
Other info / logs


2022-06-21 10:03:02.483799: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSum" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT32 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSum
2022-06-21 10:03:02.483837: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSum" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT32 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSum
2022-06-21 10:03:02.485436: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "QuantizedConv2DWithBiasReluAndSum" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT32 } } }') for unknown op: QuantizedConv2DWithBiasReluAndSum
2022-06-21 10:03:02.485458: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "QuantizedConv2DWithBiasReluAndSum" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT32 } } }') for unknown op: QuantizedConv2DWithBiasReluAndSum
2022-06-21 10:03:02.485469: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "QuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT8 } } }') for unknown op: QuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485479: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "QuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT8 } } }') for unknown op: QuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485488: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "QuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QUINT8 } } }') for unknown op: QuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485497: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "QuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "out_type" allowed_values { list { type: DT_QUINT8 } } }') for unknown op: QuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485510: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_FLOAT } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485520: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_FLOAT } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485530: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_FLOAT } } } constraint { name: "out_type" allowed_values { list { type: DT_QUINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485540: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_FLOAT } } } constraint { name: "out_type" allowed_values { list { type: DT_QUINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485549: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_QINT32 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485557: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_QINT32 } } } constraint { name: "out_type" allowed_values { list { type: DT_QINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485565: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_QINT32 } } } constraint { name: "out_type" allowed_values { list { type: DT_QUINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize
2022-06-21 10:03:02.485576: E tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "_MklQuantizedConv2DWithBiasReluAndSumAndRequantize" device_type: "CPU" constraint { name: "Tinput" allowed_values { list { type: DT_QUINT8 } } } constraint { name: "Tfilter" allowed_values { list { type: DT_QINT8 } } } constraint { name: "Tbias" allowed_values { list { type: DT_QINT32 } } } constraint { name: "out_type" allowed_values { list { type: DT_QUINT8 } } } label: "QuantizedMklOp"') for unknown op: _MklQuantizedConv2DWithBiasReluAndSumAndRequantize


[Modelzoo]Rebuild MMoE to update API and Enable DeepRec Features

Rebuild MMoE to update API and Enable DeepRec Features
Goal
Rebuild MMoE to update API and enable DeepRec Features.

Requirement Details

  • Rebuild MMoE to update the API according to the template (https://github.com/changqi1/DeepRec/blob/modelzoo-template/modelzoo/template.py).
  • Enable the DeepRec features listed below in the code. The same features have already been enabled in WDL (#37); note that the comments there map to the features below. Add flags to enable/disable each feature in the code.
  • If there is any problem when enabling a feature below, describe in detail how to reproduce it and what the issue is, especially for the known issues we have already submitted to Alibaba.

Features list
Enable the following DeepRec features (docs about the features from Alibaba: https://deeprec.readthedocs.io/zh/latest/index.html):

  • Enabled by default; test AUC/ACC/Gsteps, which need to be close to the results before rebuilding

8) Auto Micro Batch, same as DeepRec-AI#127
9) FusedEmbedding API, embedding fusion
10) Smart Stage, same as DeepRec-AI#122
11) Auto Graph Fusion, DeepRec-AI#144
12) CPU Memory Optimization: START_STATISTIC_STEP, STOP_STATISTIC_STEP, jemalloc
14) AdamAsync Optimizer
15) BF16

  • Disabled by default; a passing test is fine. The same performance as before is not required.

1) Embedding Variable
7) GRPC++ and StarServer
13) Incremental Checkpoint
14) AdagradDecay
2) EmbeddingVariable advanced feature: Embedding Elimination
3) EmbeddingVariable advanced feature: Embedding Filter
4) Dynamic-dimension Embedding Variable
5) Adaptive Embedding
17) WorkQueue

  • Other features: disabled by default; a passing test is fine. The same performance as before is not required. The feature below is not supported by the feature_column API; we are waiting for Alibaba's update.

6) Multi-Hash Variable

Test

  • All of the features need to be enabled in the code by adding flags (WDL is the template).
  • Features 8-15 need to be enabled by default and pass their tests with the same performance as before.
  • The other features need to pass their tests; matching performance is not required. Some of these features have known issues we have submitted; if a test does not pass, describe it clearly.

Other Requirements: Dockerfile and Documents

  • Waiting for Alibaba's requirements

Code Style and commit

  • Python: Keep aligned with DeepRec code.

Maintain

  • All of the issues and bugs related to this model need to be covered in the future.

Definition of Done

  • Runs successfully in DeepRec with the same performance as the code before rebuilding.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[doc] change the compile option in the README.md

The documented compile commands for enabling oneDNN are currently:
Compile for CPU optimization: oneDNN + Unified Eigen Thread pool

$ bazel build  -c opt --config=opt  --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true //tensorflow/tools/pip_package:build_pip_package

Compile for CPU optimization and ABI=0

$ bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true //tensorflow/tools/pip_package:build_pip_package

Change them as follows: remove --define build_with_mkl_dnn_v1_only=true and add --copt=-march=skylake-avx512.
Compile for CPU optimization: oneDNN + Unified Eigen Thread pool

$ bazel build  -c opt --config=opt  --config=mkl_threadpool --copt=-march=skylake-avx512 //tensorflow/tools/pip_package:build_pip_package

Compile for CPU optimization and ABI=0

$ bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt --config=mkl_threadpool --copt=-march=skylake-avx512 //tensorflow/tools/pip_package:build_pip_package

[Graph][Optimization]Reduce weight packing/unpacking overhead in multi-MatMul scenarios

Reduce weight packing/unpacking overhead across multiple MatMuls
Goal
Optimize performance by reducing packing/unpacking overhead across consecutive MatMul operations

Problem Description
In some models there are multiple consecutive MatMul operations. Each MatMul packs and unpacks its weights to improve cache locality, and every pack/unpack has a cost. It should therefore be possible to pack once before the first MatMul and unpack once after the last MatMul. The picture below illustrates the method.

(illustration: Capture-packing-opt)
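
A small sketch of the consecutive-MatMul pattern this optimization targets (hypothetical shapes, TF1-style graph mode); today each tf.matmul packs and unpacks its weights separately:

import numpy as np
import tensorflow as tf

x  = tf.placeholder(tf.float32, shape=[None, 256])
w1 = tf.Variable(np.random.rand(256, 512).astype(np.float32))
w2 = tf.Variable(np.random.rand(512, 256).astype(np.float32))
w3 = tf.Variable(np.random.rand(256, 128).astype(np.float32))

# Three back-to-back MatMuls: packing could be hoisted before the first one
# and unpacking deferred until after the last one.
y = tf.matmul(tf.matmul(tf.matmul(x, w1), w2), w3)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: np.random.rand(64, 256).astype(np.float32)}).shape)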

Requirement Details

  • Write a MatMul in C++ and prepare the baseline code.
  • Finish this optimization to achieve a PoC.
  • Supply unit test code to validate the functionality.
  • Integrate it into DeepRec through the Grappler mechanism.
  • Apply the optimization to one model and show the performance data.

Test

  • Use one model from the model zoo to validate the performance gain. The performance data and analysis results should be documented and reproducible.

Code Style and commit

  • C++ and python: Keep aligned with DeepRec code.

Maintain

  • All future issues and bugs related to this op need to be covered.

Definition of Done

  • Runs successfully in DeepRec and delivers better performance.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[Operator][Optimization] Embedding Operator Optimization

Migrate 6 embedding ops to DeepRec and make sure models from the model zoo benefit from these optimized ops.
sparseEmbedding-base (P0), kernel from https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt/blob/main/ops/embedding_ops/sparseEmbedding-base-fp32-avx512.cc
sparseEmbedding-sparseInput (P1), kernel from https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt/blob/main/ops/embedding_ops/sparseEmbedding-sparseInput-fp32-avx512.cc
sparseEmbedding-select (P1), kernel from https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt/blob/main/ops/embedding_ops/sparseEmbedding-select-fp32-avx512.cc
sparseEmbedding-stringsplit (P1), kernel from https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt/blob/main/ops/embedding_ops/sparseEmbedding-stringsplit-fp32-avx512.cc
sparseEmbedding-bucketized (P2), kernel from https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt/blob/main/ops/embedding_ops/sparseEmbedding-bucketized-fp32-avx512.cc
sparseEmbedding-multiweights (P2), kernel from https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt/blob/main/ops/embedding_ops/multi-sparseEmbedding-base-fp32-avx512.cc
https://github.com/intel-sandbox/applications.ai.easyrec.inference-opt
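
For reference, a minimal baseline using the stock tf.nn.embedding_lookup_sparse that these AVX-512 kernels aim to accelerate (the table size and ids below are illustrative):

import numpy as np
import tensorflow as tf

# Toy embedding table and a 2x2 sparse id batch; combiner="mean" averages the
# looked-up rows per example, which is roughly what the fused kernels reimplement.
table = tf.Variable(np.random.rand(1000, 16).astype(np.float32))
sp_ids = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                         values=np.array([12, 51, 7], dtype=np.int64),
                         dense_shape=[2, 2])
emb = tf.nn.embedding_lookup_sparse(table, sp_ids, sp_weights=None, combiner="mean")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(emb).shape)  # (2, 16)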

Features Request

  • The 6 operators are ready in DeepRec and can be applied to real models from the model zoo with fp32.
  • Implement the embedding op functions in C++ as operators, applied through the Grappler mechanism: https://www.tensorflow.org/guide/graph_optimization
  • The 6 operators need to be abstracted into a unified embedding operator class

Test

  • At least one test case ready in the model zoo, for example applying the 6 ops to one model from the model zoo. The performance data and analysis results should be documented and reproducible.

Code Style and commit

  • C++ and python: Keep aligned with DeepRec code.

Maintain

  • All future issues and bugs related to these 6 ops need to be covered.

[BUG] Embedding fusion acc/auc issue

Two problems have been identified so far:

  1. Our fused op does not deduplicate the input ids (no uniqueness filtering).
  2. In our fused op, duplicated inputs do not produce consistent outputs.
    In the forward gather output of the fusion, the rows marked in red are the duplicates, but the backward output contains no duplicated rows; the values are not even close, and the differences are large.
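
For context, the unfused path deduplicates the ids before the gather (its UniqueWithCounts output appears in the log further below); a small illustrative sketch using the ids from this log:

import tensorflow as tf

# The unfused embedding_lookup_sparse path first maps ids to unique ids, gathers
# once per unique id, and aggregates gradients per unique id via the index map,
# so duplicated input ids end up with one consistent gradient row.
ids = tf.constant([2816, 903, 6681, 6681, 1309, 1777, 6681, 5311], dtype=tf.int64)
unique_ids, segment_idx = tf.unique(ids)

with tf.Session() as sess:
    print(sess.run(unique_ids))    # [2816  903 6681 1309 1777 5311]
    print(sess.run(segment_idx))   # [0 1 2 2 3 4 2 5]
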
INFO:tensorflow:input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/fused_embedding_lookup_sparse/GatherV2 = 
[[ 1.2214626   0.5241986   0.22209705 -0.56519794  0.12571095  0.26446024
   0.3907412   0.00005793 -0.45320386 -0.69033164  0.3531894  -0.1513126
   0.00713284 -1.1315851   0.12203985  0.23935615]
 [-1.5699558   0.63231647 -0.5136872   0.18575184 -0.12131955 -1.4859123
   0.8938838  -0.33873808 -0.24968442 -0.47764817  0.36187503 -0.14567816
  -0.24810648 -1.3606204  -0.08617076  0.4501951 ]
 [ 1.395356   -0.85390687 -0.7608217   1.0669864   1.191038    0.88764894
   0.9451067   0.29302412  1.2512774   0.6840943  -0.20568915 -0.32980326
  -0.42660442  0.54374695 -0.9136276   0.04837677]
 [ 1.395356   -0.85390687 -0.7608217   1.0669864   1.191038    0.88764894
   0.9451067   0.29302412  1.2512774   0.6840943  -0.20568915 -0.32980326
  -0.42660442  0.54374695 -0.9136276   0.04837677]
 [ 0.6147636   0.33874637 -0.7812209   1.2390836   1.8089103  -1.2311537
  -0.43859923  1.3363832  -0.72441924  1.3167928   1.1064852   0.51790696
  -0.24631402  1.2318567   1.4000374  -0.30377945]
 [-0.3499662  -1.789908    0.48219246  0.2007537   0.7334909  -0.01890297
   0.08424582 -0.9799169  -0.35487846  0.17760478  0.7782412   0.01907562
  -0.5430275  -1.0409418  -0.06544966 -0.31106764]
 [ 1.395356   -0.85390687 -0.7608217   1.0669864   1.191038    0.88764894
   0.9451067   0.29302412  1.2512774   0.6840943  -0.20568915 -0.32980326
  -0.42660442  0.54374695 -0.9136276   0.04837677]
 [-0.5124917   0.45528954  0.7462012   0.20852847  1.4730995   0.8039012
  -0.5750134   0.22652298  1.5296302   0.779812    1.460728    0.8999218
   1.5914694   0.8920278  -1.1893805   1.916351  ]]

Inputs/outputs of the fusion grad:

Output: INFO:tensorflow:head/gradients/input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/fused_embedding_lookup_sparse/FusedEmbeddingSparsePostLookUp_grad/FusedEmbeddingSparsePostLookUpGrad = 
[[ 0.00861822  0.0023349  -0.00688701  0.00269023 -0.00164793  0.00736784
  -0.01025489 -0.00652598  0.00471746  0.00888411 -0.00231681  0.00083448
  -0.00203576 -0.00289572  0.00719752 -0.00490604]
 [ 0.00153809 -0.00080411 -0.00177121  0.00086962 -0.00095507  0.00141255
  -0.00152518 -0.0010505   0.00122721  0.00121513 -0.00102509 -0.00052917
  -0.00006346 -0.00056932  0.00194405 -0.00034838]
 [ 0.00154482 -0.00019465 -0.00210896  0.00204525 -0.00301571  0.0016045
  -0.00139093 -0.00215399 -0.00047724  0.00222365  0.00055881 -0.00044712
  -0.00082448  0.00155544  0.00257766  0.00031559]
 [-0.00136614 -0.00111764  0.00218892 -0.00170602  0.00054201 -0.00347374
   0.00119696  0.00144338 -0.00078496 -0.00169556  0.00112028 -0.00118931
  -0.00130751 -0.00075804 -0.00326457 -0.0000487 ]
 [-0.00209746  0.001618    0.00076764  0.00073953  0.00029006 -0.00244137
   0.00196408  0.00168557  0.00034245 -0.00137542 -0.00048502 -0.00033666
  -0.00041434  0.00007309 -0.00190399 -0.00012118]
 [ 0.00003232 -0.00172924  0.00824459 -0.00492863  0.00488566 -0.01386313
   0.00917176  0.0094776   0.00208267 -0.01060377  0.00307963 -0.00334385
   0.00790155 -0.00217232 -0.00438204  0.00839924]
 [-0.003769    0.00456759  0.00200083 -0.00209491  0.00360792 -0.00388297
   0.00045974  0.00181912 -0.00040632 -0.00027408  0.0035839   0.00000899
   0.00114274  0.00211995 -0.00300257  0.00140855]
 [-0.00272977  0.00265015  0.00149372 -0.00279601  0.00245979 -0.00382371
   0.00186041 -0.00059908  0.00158117 -0.00263364  0.00022644 -0.00097163
   0.00043103  0.00025143 -0.00277124 -0.00126195]], 
Input: head/gradients/input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/Reshape_grad/Reshape = 
[[-0.00861822 -0.0023349   0.00688701 -0.00269023  0.00164793 -0.00736784
   0.01025489  0.00652598 -0.00471746 -0.00888411  0.00231681 -0.00083448
   0.00203576  0.00289572 -0.00719752  0.00490604]
 [-0.00153809  0.00080411  0.00177121 -0.00086962  0.00095507 -0.00141255
   0.00152518  0.0010505  -0.00122721 -0.00121513  0.00102509  0.00052917
   0.00006346  0.00056932 -0.00194405  0.00034838]
 [-0.00154482  0.00019465  0.00210896 -0.00204525  0.00301571 -0.0016045
   0.00139093  0.00215399  0.00047724 -0.00222365 -0.00055881  0.00044712
   0.00082448 -0.00155544 -0.00257766 -0.00031559]
 [ 0.00136614  0.00111764 -0.00218892  0.00170602 -0.00054201  0.00347374
  -0.00119696 -0.00144338  0.00078496  0.00169556 -0.00112028  0.00118931
   0.00130751  0.00075804  0.00326457  0.0000487 ]
 [ 0.00209746 -0.001618   -0.00076764 -0.00073953 -0.00029006  0.00244137
  -0.00196408 -0.00168557 -0.00034245  0.00137542  0.00048502  0.00033666
   0.00041434 -0.00007309  0.00190399  0.00012118]
 [-0.00003232  0.00172924 -0.00824459  0.00492863 -0.00488566  0.01386313
  -0.00917176 -0.0094776  -0.00208267  0.01060377 -0.00307963  0.00334385
  -0.00790155  0.00217232  0.00438204 -0.00839924]
 [ 0.003769   -0.00456759 -0.00200083  0.00209491 -0.00360792  0.00388297
  -0.00045974 -0.00181912  0.00040632  0.00027408 -0.0035839  -0.00000899
  -0.00114274 -0.00211995  0.00300257 -0.00140855]
 [ 0.00272977 -0.00265015 -0.00149372  0.00279601 -0.00245979  0.00382371
  -0.00186041  0.00059908 -0.00158117  0.00263364 -0.00022644  0.00097163
  -0.00043103 -0.00025143  0.00277124  0.00126195]], 
input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/fused_embedding_lookup_sparse/GatherV2 = 
[[ 1.2214626   0.5241986   0.22209705 -0.56519794  0.12571095  0.26446024
   0.3907412   0.00005793 -0.45320386 -0.69033164  0.3531894  -0.1513126
   0.00713284 -1.1315851   0.12203985  0.23935615]
 [-1.5699558   0.63231647 -0.5136872   0.18575184 -0.12131955 -1.4859123
   0.8938838  -0.33873808 -0.24968442 -0.47764817  0.36187503 -0.14567816
  -0.24810648 -1.3606204  -0.08617076  0.4501951 ]
 [ 1.395356   -0.85390687 -0.7608217   1.0669864   1.191038    0.88764894
   0.9451067   0.29302412  1.2512774   0.6840943  -0.20568915 -0.32980326
  -0.42660442  0.54374695 -0.9136276   0.04837677]
 [ 1.395356   -0.85390687 -0.7608217   1.0669864   1.191038    0.88764894
   0.9451067   0.29302412  1.2512774   0.6840943  -0.20568915 -0.32980326
  -0.42660442  0.54374695 -0.9136276   0.04837677]
 [ 0.6147636   0.33874637 -0.7812209   1.2390836   1.8089103  -1.2311537
  -0.43859923  1.3363832  -0.72441924  1.3167928   1.1064852   0.51790696
  -0.24631402  1.2318567   1.4000374  -0.30377945]
 [-0.3499662  -1.789908    0.48219246  0.2007537   0.7334909  -0.01890297
   0.08424582 -0.9799169  -0.35487846  0.17760478  0.7782412   0.01907562
  -0.5430275  -1.0409418  -0.06544966 -0.31106764]
 [ 1.395356   -0.85390687 -0.7608217   1.0669864   1.191038    0.88764894
   0.9451067   0.29302412  1.2512774   0.6840943  -0.20568915 -0.32980326
  -0.42660442  0.54374695 -0.9136276   0.04837677]
 [-0.5124917   0.45528954  0.7462012   0.20852847  1.4730995   0.8039012
  -0.5750134   0.22652298  1.5296302   0.779812    1.460728    0.8999218
   1.5914694   0.8920278  -1.1893805   1.916351  ]], 
input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/fused_embedding_lookup_sparse/FusedEmbeddingSparsePreLookUp:0 = 
[2816  903 6681 6681 1309 1777 6681 5311], 
input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/fused_embedding_lookup_sparse/FusedEmbeddingSparsePostLookUp:1 = [1 1 1 1 1 1 1 1], 
input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/fused_embedding_lookup_sparse/FusedEmbeddingSparsePreLookUp:1 = 
[[0 0]
 [1 0]
 [2 0]
 [3 0]
 [4 0]
 [5 0]
 [6 0]
 [7 0]]

Inputs/outputs of the unfused grad:

Output: INFO:tensorflow:head/gradients/input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/embedding_lookup_sparse_grad/SparseSegmentMeanGrad = 
[[-0.00861822 -0.0023349   0.00688701 -0.00269023  0.00164793 -0.00736784
   0.01025489  0.00652598 -0.00471746 -0.00888411  0.00231681 -0.00083448
   0.00203576  0.00289572 -0.00719752  0.00490604]
 [-0.00153809  0.00080411  0.00177121 -0.00086962  0.00095507 -0.00141255
   0.00152518  0.0010505  -0.00122721 -0.00121513  0.00102509  0.00052917
   0.00006346  0.00056932 -0.00194405  0.00034838]
 [ 0.00359032 -0.00325531 -0.00208078  0.00175567 -0.00113423  0.00575221
  -0.00026577 -0.00110852  0.00166852 -0.00025401 -0.00526298  0.00162744
   0.00098925 -0.00291735  0.00368948 -0.00167544]
 [ 0.00209746 -0.001618   -0.00076764 -0.00073953 -0.00029006  0.00244137
  -0.00196408 -0.00168557 -0.00034245  0.00137542  0.00048502  0.00033666
   0.00041434 -0.00007309  0.00190399  0.00012118]
 [-0.00003232  0.00172924 -0.00824459  0.00492863 -0.00488566  0.01386313
  -0.00917176 -0.0094776  -0.00208267  0.01060377 -0.00307963  0.00334385
  -0.00790155  0.00217232  0.00438204 -0.00839924]
 [ 0.00272977 -0.00265015 -0.00149372  0.00279601 -0.00245979  0.00382371
  -0.00186041  0.00059908 -0.00158117  0.00263364 -0.00022644  0.00097163
  -0.00043103 -0.00025143  0.00277124  0.00126195]], 
Input: head/gradients/input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/embedding_lookup_sparse_grad/strided_slice = 6, 
head/gradients/input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights_grad/tuple/control_dependency_1 = 
[[-0.00861822 -0.0023349   0.00688701 -0.00269023  0.00164793 -0.00736784
   0.01025489  0.00652598 -0.00471746 -0.00888411  0.00231681 -0.00083448
   0.00203576  0.00289572 -0.00719752  0.00490604]
 [-0.00153809  0.00080411  0.00177121 -0.00086962  0.00095507 -0.00141255
   0.00152518  0.0010505  -0.00122721 -0.00121513  0.00102509  0.00052917
   0.00006346  0.00056932 -0.00194405  0.00034838]
 [-0.00154482  0.00019465  0.00210896 -0.00204525  0.00301571 -0.0016045
   0.00139093  0.00215399  0.00047724 -0.00222365 -0.00055881  0.00044712
   0.00082448 -0.00155544 -0.00257766 -0.00031559]
 [ 0.00136614  0.00111764 -0.00218892  0.00170602 -0.00054201  0.00347374
  -0.00119696 -0.00144338  0.00078496  0.00169556 -0.00112028  0.00118931
   0.00130751  0.00075804  0.00326457  0.0000487 ]
 [ 0.00209746 -0.001618   -0.00076764 -0.00073953 -0.00029006  0.00244137
  -0.00196408 -0.00168557 -0.00034245  0.00137542  0.00048502  0.00033666
   0.00041434 -0.00007309  0.00190399  0.00012118]
 [-0.00003232  0.00172924 -0.00824459  0.00492863 -0.00488566  0.01386313
  -0.00917176 -0.0094776  -0.00208267  0.01060377 -0.00307963  0.00334385
  -0.00790155  0.00217232  0.00438204 -0.00839924]
 [ 0.003769   -0.00456759 -0.00200083  0.00209491 -0.00360792  0.00388297
  -0.00045974 -0.00181912  0.00040632  0.00027408 -0.0035839  -0.00000899
  -0.00114274 -0.00211995  0.00300257 -0.00140855]
 [ 0.00272977 -0.00265015 -0.00149372  0.00279601 -0.00245979  0.00382371
  -0.00186041  0.00059908 -0.00158117  0.00263364 -0.00022644  0.00097163
  -0.00043103 -0.00025143  0.00277124  0.00126195]], 
input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/embedding_lookup_sparse/UniqueWithCounts:1 = [0 1 2 2 3 4 2 5], 
input_layer/sparse_input_layer/input_layer/C10_embedding/C10_embedding_weights/embedding_lookup_sparse/Cast = [0 1 2 3 4 5 6 7]

Undefined symbol: _ZN10tensorflow8GraphDefC1Ev when building Python MLIR

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.04): Ubuntu 20.04
  • DeepRec version or commit id: git clone -b add_mlir_python_support https://github.com/374365283/DeepRec-mlir-python.git
  • Python version: 3.6.12
  • Bazel version (if compiling from source): 0.26.1
  • GCC/Compiler version (if compiling from source): 7.5.0

Describe the problem
I am trying to add support for the MLIR Python API in DeepRec.
I use pybind11 to define the Python API in tensorflow/python/util/tf_stack.cc, which depends on tensorflow/core/compiler/mlir/python/mlir.cc and mlir.h.
Then I add "//tensorflow/compiler/mlir/python:mlir" to the deps list of _tf_stack in tensorflow/python/BUILD.

After compiling, I hit the error: undefined symbol: _ZN10tensorflow8GraphDefC1Ev.

The most likely reason is that at line 79 of tensorflow/core/compiler/mlir/python/mlir.cc, GraphDef depends on graph.pb.h and graph.pb.cc. Even though "protos_all_cc" is already in _tf_stack's deps tree, the linker still cannot find the definition from graph.pb.cc.

Provide the exact sequence of commands / steps that you executed before running into the problem
$ ./configure
$ bazel build -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package
Undefined symbol: _ZN10tensorflow8GraphDefC1Ev

[Modelzoo]Rebuild DBMTL to update API and Enable DeepRec Features

Rebuild DBMTL to update API and Enable DeepRec Features
Goal
Rebuild DBMTL to update API and enable DeepRec Features.

Requirement Details

  • Rebuild DBMTL to update the API according to the template (https://github.com/changqi1/DeepRec/blob/modelzoo-template/modelzoo/template.py).
  • Enable the DeepRec features listed below in the code. The same features have already been enabled in WDL (#37); note that the comments there map to the features below. Add flags to enable/disable each feature in the code.
  • If there is any problem when enabling a feature below, describe in detail how to reproduce it and what the issue is, especially for the known issues we have already submitted to Alibaba.

Features list
Enable the following DeepRec features (docs about the features from Alibaba: https://deeprec.readthedocs.io/zh/latest/index.html):

  • Enabled by default; test AUC/ACC/Gsteps, which need to be close to the results before rebuilding

8) Auto Micro Batch, same as DeepRec-AI#127
9) FusedEmbedding API, embedding fusion
10) Smart Stage, same as DeepRec-AI#122
11) Auto Graph Fusion, DeepRec-AI#144
12) CPU Memory Optimization: START_STATISTIC_STEP, STOP_STATISTIC_STEP, jemalloc
14) AdamAsync Optimizer
15) BF16

  • Disabled by default; a passing test is fine. The same performance as before is not required.

1) Embedding Variable
7) GRPC++ and StarServer
13) Incremental Checkpoint
14) AdagradDecay
2) EmbeddingVariable advanced feature: Embedding Elimination
3) EmbeddingVariable advanced feature: Embedding Filter
4) Dynamic-dimension Embedding Variable
5) Adaptive Embedding
17) WorkQueue

  • Other features: disabled by default; a passing test is fine. The same performance as before is not required. The feature below is not supported by the feature_column API; we are waiting for Alibaba's update.

6) Multi-Hash Variable

Test

  • All of the features need to be enabled in the code by adding flags (WDL is the template).
  • Features 8-15 need to be enabled by default and pass their tests with the same performance as before.
  • The other features need to pass their tests; matching performance is not required. Some of these features have known issues we have submitted; if a test does not pass, describe it clearly.

Other Requirements: Dockerfile and Documents

  • Waiting for Alibaba's requirements

Code Style and commit

  • Python: Keep aligned with DeepRec code.

Maintain

  • All of the issues and bugs related to this model need to be covered in the future.

Definition of Done

  • Runs successfully in DeepRec with the same performance as the code before rebuilding.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[Graph][Optimization] Concat+cast fusion to improve performance

concat+cast fusion optimization
Goal
Optimize performance through concat+cast fusion

Problem Description
In some recommendation models, for example DLRM, there is a potential performance gain from fusing concat and cast after enabling bf16 in DeepRec.

Here is the step to reproduce the performance issue.

  • Collect timeline information with DLRM from the model zoo: "numactl -C 8-15 -l python train.py --steps 100 --timeline 49 --no_eval --interaction_op dot --bf16". The timeline is shown below.
    (timeline screenshot)
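
The pattern being fused is simply a Concat whose output feeds a Cast; a minimal sketch (illustrative shapes, TF1-style graph mode):

import numpy as np
import tensorflow as tf

a = tf.placeholder(tf.float32, shape=[None, 16])
b = tf.placeholder(tf.float32, shape=[None, 16])
c = tf.concat([a, b], axis=1)        # Concat in fp32
c_bf16 = tf.cast(c, tf.bfloat16)     # immediately cast to bf16 for the dense layers

with tf.Session() as sess:
    feed = {a: np.ones((4, 16), np.float32), b: np.zeros((4, 16), np.float32)}
    print(sess.run(c_bf16, feed_dict=feed).dtype)  # bfloat16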

Requirement Details

  • Fuse the two operators concat and cast into one operator. Both the forward and backward operations need to be covered, and make sure it can be applied to real models, at least DLRM.
  • Follow the Grappler mechanism: https://www.tensorflow.org/guide/graph_optimization
  • Unit test code and benchmark code are needed.

Test

  • Use DLRM to validate the performance gain. The performance data and analysis results should be documented and reproducible.

Code Style and commit

  • C++ and python: Keep aligned with DeepRec code.

Maintain

  • All future issues and bugs related to this op need to be covered.

Definition of Done

  • Runs successfully in DeepRec and delivers better performance.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[UT] status: Internal: Missing 0-th output from {{node MatMul_1}}

Steps to reproduce

default_opts="
             --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 \
             --copt=-O2 \
             --copt=-Wformat \
             --copt=-Wformat-security \
             --copt=-fstack-protector \
             --copt=-fPIC \
             --copt=-fpic \
             --linkopt=-znoexecstack \
             --linkopt=-zrelro \
             --linkopt=-znow \
             --linkopt=-fstack-protector"

mkl_opts="--config=mkl_threadpool \
           --define build_with_mkl_dnn_v1_only=true \
           --copt=-DENABLE_INTEL_MKL_BFLOAT16 \
           --copt=-march=skylake-avx512"

test_opts="--nocache_test_results \
           --test_output=all \
           --verbose_failures \
           --test_verbose_timeout_warnings \
           --flaky_test_attempts 1 \
           --test_timeout 99999999 \
           --test_size_filters=small,medium,large,enormous \
           -c opt \
           --keep_going"

bazel test ${default_opts} ${mkl_opts} ${test_opts} -- //tensorflow/core/grappler/optimizers:mkl_remapper_test


[Modelzoo]Rebuild SimpleMultiTask to update API and Enable DeepRec Features

Rebuild SimpleMultiTask to update API and Enable DeepRec Features
Goal
Rebuild SimpleMultiTask to update API and enable DeepRec Features.

Requirement Details

  • Rebuild SimpleMultiTask to update the API according to the template (https://github.com/changqi1/DeepRec/blob/modelzoo-template/modelzoo/template.py).
  • Enable the DeepRec features listed below in the code. The same features have already been enabled in WDL (#37); note that the comments there map to the features below. Add flags to enable/disable each feature in the code.
  • If there is any problem when enabling a feature below, describe in detail how to reproduce it and what the issue is, especially for the known issues we have already submitted to Alibaba.

Features list
Enable the following DeepRec features (docs about the features from Alibaba: https://deeprec.readthedocs.io/zh/latest/index.html):

  • Enabled by default; test AUC/ACC/Gsteps, which need to be close to the results before rebuilding

8) Auto Micro Batch, same as DeepRec-AI#127
9) FusedEmbedding API, embedding fusion
10) Smart Stage, same as DeepRec-AI#122
11) Auto Graph Fusion, DeepRec-AI#144
12) CPU Memory Optimization: START_STATISTIC_STEP, STOP_STATISTIC_STEP, jemalloc
14) AdamAsync Optimizer
15) BF16

  • Disabled by default; a passing test is fine. The same performance as before is not required.

1) Embedding Variable
7) GRPC++ and StarServer
13) Incremental Checkpoint
14) AdagradDecay
2) EmbeddingVariable advanced feature: Embedding Elimination
3) EmbeddingVariable advanced feature: Embedding Filter
4) Dynamic-dimension Embedding Variable
5) Adaptive Embedding
17) WorkQueue

  • Other features: disabled by default; a passing test is fine. The same performance as before is not required. The feature below is not supported by the feature_column API; we are waiting for Alibaba's update.

6) Multi-Hash Variable

Test

  • All of the features need to be enabled in the code by adding flags (WDL is the template).
  • Features 8-15 need to be enabled by default and pass their tests with the same performance as before.
  • The other features need to pass their tests; matching performance is not required. Some of these features have known issues we have submitted; if a test does not pass, describe it clearly.

Other Requirements: Dockerfile and Documents

  • Waiting for Alibaba's requirements

Code Style and commit

  • Python: Keep aligned with DeepRec code.

Maintain

  • All of the issues and bugs related to this model need to be covered in the future.

Definition of Done

  • Runs successfully in DeepRec with the same performance as the code before rebuilding.
  • Integrated into DeepRec successfully, with the code committed following the DeepRec commit standard.

[UT] //tensorflow/python/kernel_tests/segment_reduction_ops_test does not work

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below):
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Here is the problem
https://github.com/changqi1/DeepRec-deprecated/issues/49#issuecomment-1015280420
