Since DebugString() is only used to print warning/error info, I commented that line out in generate_proto_def.cc. But then other errors follow, with protoc unable to find the TensorFlow proto definitions:
lingvo/core/inference_graph.proto:81:14: "tensorflow.DataType" is not defined.
lingvo/core/inference_graph.proto:31:12: "tensorflow.GraphDef" is not defined.
lingvo/core/inference_graph.proto:37:12: "tensorflow.SaverDef" is not defined.
Can you re-run with --print_actions --verbose_failures?
bazel build --verbose_failures -c opt //lingvo:trainer
WARNING: Output base '/ec/site/disks/aipg_lab_home_pool_01/kdatta1/.cache/bazel/_bazel_kdatta1/5093b1640050e5eba5263415894f442c' is on NFS. This may lead to surprising failures and undetermined behavior.
INFO: Analysed target //lingvo:trainer (0 packages loaded).
INFO: Found 1 target...
ERROR: /ec/site/disks/aipg_lab_home_pool_01/kdatta1/TensorFlow/lingvo/lingvo/core/BUILD:339:1: Executing genrule //lingvo/core:inference_graph_py_pb2_genpy failed (Exit 1): bash failed: error executing command
(cd /ec/site/disks/aipg_lab_home_pool_01/kdatta1/.cache/bazel/_bazel_kdatta1/5093b1640050e5eba5263415894f442c/execroot/__main__ && \
exec env - \
LD_LIBRARY_PATH=/nfs/pdx/home/kdatta1/MKL-DNN/mklml_lnx_2019.0.3.20190220/lib:/usr/lib64:/nfs/pdx/home/kdatta1/openmpi/lib \
PATH=/opt/intel/compilers_and_libraries_2018.3.222/linux/bin/intel64:/opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/intel64/bin:/nfs/pdx/home/kdatta1/anaconda2/envs/anaconda2-python-tf-1.12/bin:/nfs/pdx/home/kdatta1/anaconda2/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/nfs/pdx/home/kdatta1/openmpi/bin:/nfs/pdx/home/kdatta1/openmpi/bin \
/bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh;
mkdir -p bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$;
tar -C bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$ -xf bazel-out/host/genfiles/lingvo/tf_protos.tar;
external/protobuf_protoc/bin/protoc --proto_path=bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$ --proto_path=. --python_out=bazel-out/k8-opt/genfiles lingvo/core/inference_graph.proto;
rm -rf bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$
')
Use --sandbox_debug to see verbose messages from the sandbox
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/framework/graph.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/framework/types.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/protobuf/saver.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
lingvo/core/inference_graph.proto:81:14: "tensorflow.DataType" is not defined.
lingvo/core/inference_graph.proto:31:12: "tensorflow.GraphDef" is not defined.
lingvo/core/inference_graph.proto:37:12: "tensorflow.SaverDef" is not defined.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/protobuf/saver.proto but not used.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/framework/types.proto but not used.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/framework/graph.proto but not used.
Target //lingvo:trainer failed to build
That's very strange. Could you modify the tool to add this line:
std::cout << "File: " << output_filepath << " = " << dot_proto->DebugString() << std::endl;
How will that help? With my current toolchain the build now fails because it can't find DebugString().
The tool first generates the protos, and we suspect that, for some reason, the generated .proto files are empty. We want to know whether it gets that far and, if so, why they would be empty. You can check the tarball at bazel-out/host/genfiles/lingvo/tf_protos.tar.
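That check can be scripted. Below is a minimal Python sketch, assuming the tarball path quoted above and that the extracted TensorFlow protos end in .proto. It lists every .proto member and flags the ones that are empty or whitespace-only; an empty file would also explain protoc's "No syntax specified" warnings in the log. The helper name find_empty_protos is hypothetical and not part of lingvo.

```python
import tarfile


def find_empty_protos(tar_path):
    """Return (all_protos, empty_protos): names of the .proto members of a tarball.

    A member counts as empty when its content is empty or only whitespace.
    """
    all_protos, empty_protos = [], []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Skip directories, links and non-.proto files.
            if not (member.isfile() and member.name.endswith(".proto")):
                continue
            all_protos.append(member.name)
            data = tar.extractfile(member).read()
            if not data.strip():
                empty_protos.append(member.name)
    return all_protos, empty_protos
```

For example, from the workspace root: find_empty_protos("bazel-out/host/genfiles/lingvo/tf_protos.tar"). If the first list is empty or the second is non-empty, the proto-extraction step is the thing to debug, not protoc.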
OK, reading more carefully, I see what you mean: you can't uncomment that line. Can you go back to the original version, then run bazel with --print_actions --verbose_failures?
#23 has the same problem.
Can you run with --verbose_failures --print_actions? I need to see the command that was used to link. Also --link_opts=-vv. Then the next step is to use nm on generate_proto_def.o and nm on the library to find out which symbol is defined and why they don't match.
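The comparison suggested above can be automated. This is a minimal Python sketch, assuming you have captured the raw (mangled, i.e. without -C) output of nm on generate_proto_def.o and of nm -D on libtensorflow_framework.so as text; for a stripped shared library plain nm reports no symbols, so the dynamic table (-D) is the one that matters for linking. The function names are hypothetical helpers, not part of lingvo or binutils.

```python
def parse_nm(nm_output):
    """Map each symbol name to its nm type letter.

    Expects raw (mangled) output, where each line is "ADDRESS TYPE NAME"
    or, for undefined symbols, just "TYPE NAME".
    """
    symbols = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            _, sym_type, name = parts
        elif len(parts) == 2:
            sym_type, name = parts
        else:
            continue  # blank lines, "file.o:" headers and the like
        symbols[name] = sym_type
    return symbols


def is_exported_definition(sym_type):
    # Uppercase types other than U (T, D, B, R, W, V, ...) are global
    # definitions; "i" (ELF indirect function) and "u" (unique global) are
    # lowercase exceptions. Other lowercase types are local or weak-undefined.
    return (sym_type.isupper() and sym_type != "U") or sym_type in ("i", "u")


def unresolved_symbols(object_nm, library_nm):
    """Sorted symbols the object needs (U) that the library does not export."""
    needed = {s for s, t in parse_nm(object_nm).items() if t == "U"}
    exported = {s for s, t in parse_nm(library_nm).items() if is_exported_definition(t)}
    return sorted(needed - exported)
```

The text can come from files you saved, or from subprocess.run(["nm", "-D", lib_path], capture_output=True, text=True).stdout on Python 3.7+. Only the object's undefined symbols that the library should provide are interesting; libstdc++ and libc symbols will also show up unless you add those libraries' nm output as well.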
(tf1.12_py3.5) [luban@luban-351 lingvo]$ bazel print_action -c opt //lingvo:trainer_test --verbose_failures
Starting local Bazel server and connecting to it...
INFO: Analysed target //lingvo:trainer_test (31 packages loaded).
INFO: Found 1 target...
ERROR: /nfs/project/zhanghui/lingvo/lingvo/tools/BUILD:98:1: Linking of rule '//lingvo/tools:generate_proto_def' failed (Exit 1): gcc failed: error executing command
(cd /home/luban/.cache/bazel/_bazel_luban/b5ef85f1c360696308ba7ab9000cfd03/execroot/__main__ && \
exec env - \
LD_LIBRARY_PATH=/usr/local/lib:/nfs/project/tools/anaconda3/pkgs/cudnn-7.2.1-cuda9.2_0/lib:/nfs/project/tools/anaconda3/pkgs/cudatoolkit-9.2-0/lib:/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/lib/:/usr/local/cuda-9.0/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64: \
PATH=/nfs/project/tools/openfst1.6.2/bin/:/nfs/project/tools/packages/kaldi-master/src/bin:/nfs/project/tools/packages/kaldi-master/src/fstbin/:/nfs/project/tools/packages/kaldi-master/src/gmmbin/:/nfs/project/tools/packages/kaldi-master/src/featbin/:/nfs/project/tools/packages/kaldi-master/src/lm/:/nfs/project/tools/packages/kaldi-master/src/sgmmbin/:/nfs/project/tools/packages/kaldi-master/src/sgmm2bin/:/nfs/project/tools/packages/kaldi-master/src/fgmmbin/:/nfs/project/tools/packages/kaldi-master/src/latbin/:/nfs/project/tools/packages/kaldi-master/src/nnetbin:/nfs/project/tools/packages/kaldi-master/src/nnet2bin/:/nfs/project/tools/packages/kaldi-master/src/kwsbin:/nfs/project/tools/packages/kaldi-master/tools/sph2pipe_v2.5:/nfs/project/tools/packages/kaldi-master/src/ivectorbin:/tools/kaldi-io/build/bin:/nfs/project/tools/anaconda3/envs/tf1.12_py3.5/bin:/nfs/project/tools/anaconda3/bin:/home/luban/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/home/luban/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/luban/.local/bin:/home/luban/bin:/home/luban/.local/bin:/home/luban/bin \
PWD=/proc/self/cwd \
/usr/bin/gcc -o bazel-out/host/bin/lingvo/tools/generate_proto_def '-Wl,-rpath,$ORIGIN/../../_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib' -Lbazel-out/host/bin/_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib '-fuse-ld=gold' -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -B/usr/bin -pass-exit-codes -Wl,--gc-sections -Wl,-S -Wl,@bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params)
Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function (anonymous namespace)::WriteDotProto(google::protobuf::FileDescriptor const*, char const*): error: undefined reference to 'google::protobuf::FileDescriptor::DebugString() const'
collect2: error: ld returned 1 exit status
Target //lingvo:trainer_test failed to build
INFO: Elapsed time: 94.812s, Critical Path: 7.43s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
(tf1.12_py3.5) [luban@luban-351 lingvo]$ nm bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o | grep U
U _Unwind_Resume
U _ZN10tensorflow19DataType_descriptorEv
U _ZN10tensorflow8GraphDef10descriptorEv
U _ZN10tensorflow8SaverDef10descriptorEv
U _ZNK6google8protobuf14FileDescriptor10dependencyEi
U _ZNK6google8protobuf14FileDescriptor11DebugStringEv
U _ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEm
U _ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEmmm
U _ZNSs4_Rep10_M_destroyERKSaIcE
U _ZNSs4_Rep10_M_disposeERKSaIcE
U _ZNSs4_Rep20_S_empty_rep_storageE
U _ZNSs6appendEPKcm
U _ZNSs6appendERKSs
U _ZNSsC1EPKcRKSaIcE
U _ZNSsC1ERKSs
U _ZNSt12__basic_fileIcED1Ev
U _ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode
U _ZNSt13basic_filebufIcSt11char_traitsIcEE5closeEv
U _ZNSt13basic_filebufIcSt11char_traitsIcEEC1Ev
U _ZNSt13basic_filebufIcSt11char_traitsIcEED1Ev
U _ZNSt14basic_ofstreamIcSt11char_traitsIcEED1Ev
U _ZNSt6localeD1Ev
U _ZNSt8ios_base4InitC1Ev
U _ZNSt8ios_base4InitD1Ev
U _ZNSt8ios_baseC2Ev
U _ZNSt8ios_baseD2Ev
U _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E
U _ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate
U _ZSt11_Hash_bytesPKvmm
U _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
U _ZSt17__throw_bad_allocv
U _ZTTSt14basic_ofstreamIcSt11char_traitsIcEE
U _ZTVSt13basic_filebufIcSt11char_traitsIcEE
U _ZTVSt14basic_ofstreamIcSt11char_traitsIcEE
U _ZTVSt15basic_streambufIcSt11char_traitsIcEE
U _ZTVSt9basic_iosIcSt11char_traitsIcEE
U _ZdlPv
U _Znwm
U __cxa_atexit
U __cxa_begin_catch
U __cxa_end_catch
U __cxa_rethrow
U __dso_handle
U __gxx_personality_v0
U __stack_chk_fail
U memcmp
U memset
So, it's trying to link:
$ORIGIN/../../_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib
which maps to
_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib/libtensorflow_framework.so
which should be a symlink to something like
/usr/local/lib/python2.7/dist-packages/tensorflow/libtensorflow_framework.so
If you run nm on this, you should find the symbol with a T:
<address> T _ZNK6google8protobuf14FileDescriptor11DebugStringEv
How many steps will the model be trained for?
Was the problem resolved?
Dear @Raviteja1996, it looks like the problem was resolved. The build log is below for reference:
mironov@23ba9b0d756c:~/lingvo$ python -c "import tensorflow as tf;print(tf.__version__)"
1.14.1-dev20190327
mironov@23ba9b0d756c:~/lingvo$ bazel build -c opt //lingvo:trainer
WARNING: detected http_proxy set in env, setting no_proxy for localhost.
Starting local Bazel server and connecting to it...
INFO: Analysed target //lingvo:trainer (35 packages loaded).
INFO: Found 1 target...
Target //lingvo:trainer up-to-date:
bazel-bin/lingvo/trainer
INFO: Elapsed time: 14.628s, Critical Path: 8.28s
INFO: 22 processes: 22 processwrapper-sandbox.
INFO: Build completed successfully, 29 total actions
Thank you.
@drpngx @grwlf @zh794390558 I ran into a similar problem, but after reading your discussion I still have no idea how to solve it. Could you please explain the fix in more detail, step by step? Thank you so much!
Here is my log:
@iamxiaoyubei I have the same problem. Can you tell me how you resolved it?
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function main: error: undefined reference to 'tensorflow::GraphDef::descriptor()'
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function main: error: undefined reference to 'tensorflow::SaverDef::descriptor()'
@fangelyuan I got the "undefined reference to tensorflow..." error because I had installed both tensorflow and tf-nightly. Just uninstall tensorflow and install tf-nightly.
In addition, I am using tf-nightly-gpu version 1.14.1-dev20190426; I ran into other problems when installing the latest version, so I suggest you install this version.
Hope this helps.
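Before rebuilding, you can check whether more than one TensorFlow distribution is installed in the active environment. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and that the conflicting builds are among the four distribution names listed; that list and the helper names are hypothetical, not something lingvo or TensorFlow provides.

```python
import re
from importlib import metadata

# Distributions that each install the same top-level `tensorflow` package.
# This list is an assumption; extend it for other builds you use.
TF_DISTRIBUTIONS = {"tensorflow", "tensorflow-gpu", "tf-nightly", "tf-nightly-gpu"}


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_distributions():
    """Normalized names of every distribution visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(normalize(name))
    return names


def conflicting_tf_packages(installed=None):
    """Return the TensorFlow distributions installed side by side, or [] if at most one is."""
    if installed is None:
        installed = installed_distributions()
    found = sorted(TF_DISTRIBUTIONS & {normalize(n) for n in installed})
    return found if len(found) > 1 else []
```

Calling conflicting_tf_packages() with no argument inspects the current environment; a non-empty result means you should pip uninstall all but one of the listed distributions before building lingvo.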
@iamxiaoyubei Can I add you on WeChat?
@iamxiaoyubei lingvo is based on TensorFlow. If you uninstall tensorflow, does it still work normally?
@fangelyuan Sorry, I don't want to add people on WeChat, and I rarely read WeChat except after work. If you have any questions, you can ask on GitHub or send an email. If I see it and have time, I will respond.
Yes, it works. tf-nightly is the nightly build of TensorFlow and installs the same tensorflow Python package, so it replaces the stable tensorflow release rather than supplementing it. You can read the tf-nightly description online.
@iamxiaoyubei Thanks, I successfully built the trainer. Now I'm testing the transformer model, and I hope you can help me if I run into problems.
@Raviteja1996 I had the same problem. I built TensorFlow (v1.15.0, commit 590d6ee) from source with gcc 5.4 and bazel 0.25.2, then built lingvo (commit 8926ece), and the problem occurred. I found that lingvo/lingvo/lingvo.bzl passes the flag "-D_GLIBCXX_USE_CXX11_ABI=0", so generate_proto_def.o references the symbol "_ZNK6google8protobuf14FileDescriptor11DebugStringEv", while libtensorflow_framework.so actually exports "_ZNK6google8protobuf14FileDescriptor11DebugStringB5cxx11Ev". The extra "B5cxx11" is the [abi:cxx11] tag that the new C++11 ABI adds to functions returning std::string, which shows my TensorFlow was built with the new ABI. Changing "-D_GLIBCXX_USE_CXX11_ABI=0" to "-D_GLIBCXX_USE_CXX11_ABI=1" solved the problem. Hope it helps.
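The mismatch described above can be spotted mechanically: a missing symbol and an exported symbol that become identical once the "B5cxx11" tag is removed point to an ABI-flag mismatch. This is a heuristic Python sketch, assuming you already have the missing symbols (for example from the linker error) and the library's exported symbols (for example from nm -D) as mangled strings; the function names are hypothetical.

```python
# In the Itanium C++ ABI mangling used by GCC, the [abi:cxx11] tag is
# encoded as "B5cxx11" right after the tagged name.
CXX11_ABI_TAG = "B5cxx11"


def strip_cxx11_tag(symbol):
    """Remove every cxx11 ABI tag from a mangled symbol name (heuristic)."""
    return symbol.replace(CXX11_ABI_TAG, "")


def abi_tag_mismatches(missing, library_symbols):
    """Map each missing symbol to a library symbol that differs only by the cxx11 tag.

    A non-empty result suggests the object and the library were compiled with
    different _GLIBCXX_USE_CXX11_ABI values (in either direction).
    """
    by_stripped = {}
    for sym in library_symbols:
        by_stripped.setdefault(strip_cxx11_tag(sym), sym)
    mismatches = {}
    for sym in missing:
        candidate = by_stripped.get(strip_cxx11_tag(sym))
        if candidate is not None and candidate != sym:
            mismatches[sym] = candidate
    return mismatches
```

If the missing symbol lacks the tag and the library has it (the case in this thread), the object needs -D_GLIBCXX_USE_CXX11_ABI=1; in the opposite case it needs =0.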
I can confirm that this problem still exists at HEAD, but it probably only happens in specific build environments. The following one-line change fixes it:
zhangqiaorjc@xxx:~/lingvo/lingvo$ git diff
diff --git a/lingvo/lingvo.bzl b/lingvo/lingvo.bzl
index 01928bbc..eb69faa3 100644
--- a/lingvo/lingvo.bzl
+++ b/lingvo/lingvo.bzl
@@ -4,7 +4,7 @@ load("@subpar//:subpar.bzl", "par_binary")
 def tf_copts():
     # TODO(drpng): autoconf this.
-    return ["-D_GLIBCXX_USE_CXX11_ABI=0", "-Wno-sign-compare", "-mavx"] + select({
+    return ["-D_GLIBCXX_USE_CXX11_ABI=1", "-Wno-sign-compare", "-mavx"] + select({
         "//lingvo:cuda": ["-DGOOGLE_CUDA=1"],
         "//conditions:default": [],
     })
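The TODO in tf_copts() ("autoconf this") suggests deriving the flag instead of hard-coding 0 or 1. One possible approach, sketched here, reads the value from the installed TensorFlow. It assumes that tf.sysconfig.get_compile_flags() includes a -D_GLIBCXX_USE_CXX11_ABI define (we believe TF 1.x and later releases do, but check yours). The helper names are hypothetical; this is not lingvo's own configure logic.

```python
import re

_ABI_DEFINE = re.compile(r"-D_GLIBCXX_USE_CXX11_ABI=([01])")


def cxx11_abi_from_flags(compile_flags):
    """Return 0 or 1 from a -D_GLIBCXX_USE_CXX11_ABI=N flag, or None if absent."""
    for flag in compile_flags:
        match = _ABI_DEFINE.fullmatch(flag)
        if match:
            return int(match.group(1))
    return None


def abi_copt(compile_flags, fallback=0):
    """Build the copt string, using `fallback` (lingvo's current 0) if no define is found."""
    abi = cxx11_abi_from_flags(compile_flags)
    return "-D_GLIBCXX_USE_CXX11_ABI=%d" % (fallback if abi is None else abi)


def tf_compile_flags():
    """Compile flags reported by the installed TensorFlow (requires TF)."""
    import tensorflow as tf  # lazy import: the parsing helpers work without TF
    return tf.sysconfig.get_compile_flags()
```

Running print(abi_copt(tf_compile_flags())) in the environment used for the build gives the string to put first in tf_copts(). Feeding it into lingvo.bzl automatically, for example through a generated configuration file, is left out and would be a separate design decision.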