Comments (10)

byshiue avatar byshiue commented on May 15, 2024

I can run BERT in FP16 successfully, but I cannot find the file "run_classifier_fastertf.py". I ran FP16 with the following command:
python profile_transformer_inference.py --init_checkpoint=tmp.ckpt --tf_profile=false --output_dir=mrpc_output --profiling_output_file=time_elapsed --xla=false --floatx=float16
where tmp.ckpt is the converted FP16 checkpoint.
Can you provide more details on how to reproduce your problem?

from fastertransformer.

chenlin038 avatar chenlin038 commented on May 15, 2024

I am facing this problem too. Can anyone give me some suggestions?
Compile command:
cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_TF=ON -DTF_PATH=/search/anaconda2/envs/pytf14/lib/python2.7/site-packages/tensorflow ..
GPU: Titan V
CUDA: 10.1
NVIDIA driver: 418.40.04
Python: 2.7
TensorFlow: 1.14.0
CMake: 3.9.2

I can run ./bin/encoder_gemm 1 300 8 48 0 without problems,
but when I run /search/anaconda2/envs/pytf14/bin/python encoder_sample.py --batch_size 1 --seq_len 300 --head_number 8 --size_per_head 48 --num_layer 8 --data_type fp32 --test_time 1, I get:

Traceback (most recent call last):
  File "encoder_sample.py", line 84, in <module>
    attention_mask=attention_mask)
  File "/search/DeepLearningExamples/FasterTransformer/v2/build/utils/encoder.py", line 303, in op_encoder
    os.path.join('./lib/libtf_fastertransformer.so'))
  File "/search/anaconda2/envs/pytf14/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: ./lib/libtf_fastertransformer.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs
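
A note on the symbol in this error: in the Itanium C++ name mangling that GCC uses, the trailing `Ss` is the abbreviation for `std::string` in the pre-C++11 libstdc++ ABI, so `_ZN10tensorflow12OpDefBuilder4AttrESs` reads as `tensorflow::OpDefBuilder::Attr(std::string)`. A library built with `_GLIBCXX_USE_CXX11_ABI=1` would refer to `std::__cxx11::basic_string` instead, and the two names do not match at load time. The sketch below is only a rough substring heuristic for telling the two apart; the function name `string_abi_of` is hypothetical and not part of TensorFlow or FasterTransformer.

```python
def string_abi_of(mangled):
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Heuristic only: it inspects substrings and does not parse the name.
    """
    if not mangled.startswith("_Z"):
        return "not a mangled C++ name"
    # The new ABI places std::string in the inline namespace std::__cxx11.
    if "__cxx11" in mangled:
        return "new ABI (_GLIBCXX_USE_CXX11_ABI=1)"
    # "Ss" is the mangling abbreviation for the old-ABI std::string.
    if mangled.endswith("Ss") or "ESs" in mangled:
        return "old ABI (_GLIBCXX_USE_CXX11_ABI=0)"
    return "no std::string parameter detected"


print(string_abi_of("_ZN10tensorflow12OpDefBuilder4AttrESs"))
```

If the op library asks for the old-ABI name while the installed TensorFlow only exports the new-ABI one (or the reverse), an "undefined symbol" error of this kind is a plausible outcome.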

byshiue avatar byshiue commented on May 15, 2024

Have you tried the image "nvcr.io/nvidia/tensorflow:19.07-py2"?

chenlin038 avatar chenlin038 commented on May 15, 2024

I am not using the image. Is it necessary? I think my environment meets the requirements.

byshiue avatar byshiue commented on May 15, 2024

It is not necessary. It is hard for us to guarantee that FT runs in every environment. For example, installing TensorFlow by different methods may cause different problems.

Generally, users should first run FasterTransformer (FT) successfully in the image and make sure they understand how to use it, and then test FT in their own environment. If FT does not run there, they may need to install or modify something in that environment.

If you provide more information and details, we will try to help.

chenlin038 avatar chenlin038 commented on May 15, 2024

@byshiue Thanks for your help.
First, the main reason I have not tried the image "nvcr.io/nvidia/tensorflow:19.07-py2" is that the environment I am using is already a Docker container with TF 1.12, so I think I can only install FT without the image.
Second, because that TF version (1.12) does not meet the requirements, I created a TF 1.14 environment with Anaconda (the interpreter path is /search/anaconda2/envs/pytf14/bin/python).
Then I made sure that my environment meets the following requirements:
CMake >= 3.8 (3.9.2)
CUDA 10.1
Python 2.7
TensorFlow 1.14 (conda env)
GPU: Titan V
Then I followed the instructions to install FT:
① Clone the repository.
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/FasterTransformer/v2
git submodule init
git submodule update
② Build the project.
ln -s /search/anaconda2/envs/pytf14/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so.1 /search/anaconda2/envs/pytf14/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so
(Is there a problem with this command?)
mkdir -p build
cd build
cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_TF=ON -DTF_PATH=/search/anaconda2/envs/pytf14/lib/python2.7/site-packages/tensorflow ..
③ Generate the gemm_config.in file:
./bin/encoder_gemm 1 300 8 48 0 (this runs without errors)
④ Finally, I run the encoder in TensorFlow:
/search/anaconda2/envs/pytf14/bin/python encoder_sample.py \
    --batch_size 1 \
    --seq_len 300 \
    --head_number 8 \
    --size_per_head 48 \
    --num_layer 8 \
    --data_type fp32 \
    --test_time 1
and I got:
Traceback (most recent call last):
  File "encoder_sample.py", line 84, in <module>
    attention_mask=attention_mask)
  File "/search/DeepLearningExamples/FasterTransformer/v2/build/utils/encoder.py", line 303, in op_encoder
    os.path.join('./lib/libtf_fastertransformer.so'))
  File "/search/anaconda2/envs/pytf14/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: ./lib/libtf_fastertransformer.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

These are the details of all my attempts to install FT. Thanks for any help!
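
One thing that may be worth comparing in a setup like this is the C++ ABI that the installed TensorFlow was built with against the one the compiler uses for the op library. TensorFlow reports its build flags through `tf.sysconfig.get_compile_flags()`, and the sketch below pulls the `_GLIBCXX_USE_CXX11_ABI` value out of such a flag list. The helper names `cxx11_abi_from_flags` and `installed_tf_abi` are illustrative assumptions, and nothing here is output from the environment described above.

```python
def cxx11_abi_from_flags(flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value (0 or 1) found in a list of
    compiler flags, or None if the flag is absent."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None


def installed_tf_abi():
    """Query the TensorFlow installed for the current interpreter.

    TensorFlow is imported lazily so the parser above can be used without it.
    """
    import tensorflow as tf
    return cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
```

Calling `installed_tf_abi()` with the same interpreter that runs encoder_sample.py (here /search/anaconda2/envs/pytf14/bin/python) shows which ABI that TensorFlow build expects. If it differs from what the compiler used for libtf_fastertransformer.so, a mismatch in `std::string` symbols would be one possible explanation for the error.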

byshiue avatar byshiue commented on May 15, 2024

Sorry, I tried installing TensorFlow with Anaconda2, but I am still not able to reproduce your problem.
A possible solution is to build the project with gcc/g++ 4.8. In my experience, TensorFlow 1.14 has some problems when I use other versions of gcc/g++.

chenlin038 avatar chenlin038 commented on May 15, 2024

@byshiue
The gcc version in my container is 4.8.5, so I think the problem might be in the Anaconda build of TensorFlow. Besides, I tried another approach: I used a new container with TF 1.14. The above problem does not occur there, but there is another problem (please see https://github.com/NVIDIA/DeepLearningExamples/issues/436 for more information).
Thanks again for your help!
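
One way to test the suspicion about the Anaconda TensorFlow build would be to check whether its libtensorflow_framework.so actually exports the symbol that the op library is missing. The sketch below parses the text output of GNU `nm -D --defined-only` for that purpose. The helper names are hypothetical, and it assumes GNU binutils `nm` is on the PATH.

```python
import subprocess


def defined_symbols(nm_output):
    """Collect symbol names from `nm -D --defined-only` text output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols print as: <address> <type> <name>
        if len(parts) == 3:
            # Drop a symbol-version suffix such as name@@VERSION, if present.
            names.add(parts[2].split("@")[0])
    return names


def exports_symbol(library_path, symbol):
    """Run nm on a shared library and report whether it defines `symbol`."""
    out = subprocess.check_output(["nm", "-D", "--defined-only", library_path])
    return symbol in defined_symbols(out.decode("utf-8", "replace"))
```

For example, `exports_symbol(path_to_libtensorflow_framework, "_ZN10tensorflow12OpDefBuilder4AttrESs")` returning False for the conda environment and True for the new container would point at the TensorFlow build rather than at gcc.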

byshiue avatar byshiue commented on May 15, 2024

It seems that this bug is solved.
Please re-open this issue if you still have problems.

ChrisMii avatar ChrisMii commented on May 15, 2024

Hi, I ran into the same problem as you. Could you share your solution with me? I created a conda virtual environment with Python 2.7 and tensorflow-gpu 1.14, and other settings such as gcc are the same as yours. Thanks.
