Comments (6)
Is the question related to some code from this repository?
from fastertransformer.
Yes. The file path is DeepLearningExamples/FasterTransformer/tools/gemm_test/gemm_fp16.cu
The compute type specifies the precision used in the accumulator. CUDA_R_32F (fp32) gives higher precision than CUDA_R_16F (fp16). For Faster Transformer, CUDA_R_16F is sufficient.
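To make "precision used in the accumulator" concrete, here is a small pure-Python sketch. It is not FasterTransformer or cuBLAS code: the helper names (`to_fp16`, `to_fp32`, `dot`) are made up for illustration, and rounding the running sum after every addition is a simplified model of what a real GEMM kernel does. Inputs are fp16 in both cases; only the accumulator precision changes.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of fp16 inputs; round_acc models the accumulator precision.

    Simplified model (an assumption, not the actual kernel behaviour):
    the running sum is rounded to the accumulator type after every add.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + to_fp16(x) * to_fp16(y))
    return acc


ones = [1.0] * 4096
print("fp16 accumulator:", dot(ones, ones, to_fp16))  # stalls at 2048.0
print("fp32 accumulator:", dot(ones, ones, to_fp32))  # reaches 4096.0
```

With an fp16 accumulator the sum stops growing at 2048, because above that value adjacent fp16 numbers are 2 apart and adding 1 rounds back down. The fp32 accumulator reaches the exact result, 4096. Whether this kind of error matters in practice depends on the reduction length and the magnitude of the values.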
If I set the compute type to CUDA_R_32F in gemm_fp16.cu, what is the impact on performance compared with CUDA_R_16F?
Using CUDA_R_32F invokes different GEMM kernels, which are slightly slower than the kernels invoked with CUDA_R_16F. You can use nvprof to see which kernels are invoked.
Thanks a lot.