Comments (23)

kdatta commented on June 26, 2024

Since DebugString() only prints warning/error info, I commented that line out in generate_proto_def.cc. Other errors then follow, with protoc unable to find the TensorFlow proto definitions:
lingvo/core/inference_graph.proto:81:14: "tensorflow.DataType" is not defined.
lingvo/core/inference_graph.proto:31:12: "tensorflow.GraphDef" is not defined.
lingvo/core/inference_graph.proto:37:12: "tensorflow.SaverDef" is not defined.

from lingvo.

drpngx commented on June 26, 2024

Can you re-run with --print_actions --verbose_failures?

kdatta commented on June 26, 2024
bazel build --verbose_failures -c opt //lingvo:trainer
WARNING: Output base '/ec/site/disks/aipg_lab_home_pool_01/kdatta1/.cache/bazel/_bazel_kdatta1/5093b1640050e5eba5263415894f442c' is on NFS. This may lead to surprising failures and undetermined behavior.
INFO: Analysed target //lingvo:trainer (0 packages loaded).
INFO: Found 1 target...
ERROR: /ec/site/disks/aipg_lab_home_pool_01/kdatta1/TensorFlow/lingvo/lingvo/core/BUILD:339:1: Executing genrule //lingvo/core:inference_graph_py_pb2_genpy failed (Exit 1): bash failed: error executing command
  (cd /ec/site/disks/aipg_lab_home_pool_01/kdatta1/.cache/bazel/_bazel_kdatta1/5093b1640050e5eba5263415894f442c/execroot/__main__ && \
  exec env - \
    LD_LIBRARY_PATH=/nfs/pdx/home/kdatta1/MKL-DNN/mklml_lnx_2019.0.3.20190220/lib:/usr/lib64:/nfs/pdx/home/kdatta1/openmpi/lib \
    PATH=/opt/intel/compilers_and_libraries_2018.3.222/linux/bin/intel64:/opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/intel64/bin:/nfs/pdx/home/kdatta1/anaconda2/envs/anaconda2-python-tf-1.12/bin:/nfs/pdx/home/kdatta1/anaconda2/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/nfs/pdx/home/kdatta1/openmpi/bin:/nfs/pdx/home/kdatta1/openmpi/bin \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh;
          mkdir -p bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$;
          tar -C bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$ -xf bazel-out/host/genfiles/lingvo/tf_protos.tar;
          external/protobuf_protoc/bin/protoc --proto_path=bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$ --proto_path=. --python_out=bazel-out/k8-opt/genfiles lingvo/core/inference_graph.proto;
          rm -rf bazel-out/k8-opt/genfiles/lingvo/core/tf_proto.$$
        ')

Use --sandbox_debug to see verbose messages from the sandbox
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/framework/graph.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/framework/types.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
[libprotobuf WARNING ../../../../../src/google/protobuf/compiler/parser.cc:562] No syntax specified for the proto file: tensorflow/core/protobuf/saver.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
lingvo/core/inference_graph.proto:81:14: "tensorflow.DataType" is not defined.
lingvo/core/inference_graph.proto:31:12: "tensorflow.GraphDef" is not defined.
lingvo/core/inference_graph.proto:37:12: "tensorflow.SaverDef" is not defined.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/protobuf/saver.proto but not used.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/framework/types.proto but not used.
lingvo/core/inference_graph.proto: warning: Import tensorflow/core/framework/graph.proto but not used.
Target //lingvo:trainer failed to build

drpngx commented on June 26, 2024

That's very strange. Could you modify the tool to add this line:

std::cout << "File: " << output_filepath << " = " << dot_proto->DebugString() << std::endl;

kdatta commented on June 26, 2024

How will that help? The build with my current toolchain now fails because it can't find DebugString().

drpngx commented on June 26, 2024

The tool first generates the protos. We suspect that, for some reason, the generated .proto files are empty. We want to know whether it gets that far and why they would be empty. You can check the tarball at bazel-out/host/genfiles/lingvo/tf_protos.tar.
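
One way to check the tarball for empty files is sketched below. This is only an illustration: the helper names `find_empty_protos` and `check_tarball` are hypothetical and not part of lingvo, and the sketch assumes an "empty" proto shows up as a zero-byte member of the archive.

```python
import tarfile


def find_empty_protos(tar):
    """Return the names of regular .proto members that are zero bytes long."""
    return [
        member.name
        for member in tar.getmembers()
        if member.isfile() and member.name.endswith(".proto") and member.size == 0
    ]


def check_tarball(path):
    """Open the tarball at `path` and list its empty .proto members."""
    with tarfile.open(path) as tar:
        return find_empty_protos(tar)
```

Calling `check_tarball("bazel-out/host/genfiles/lingvo/tf_protos.tar")` from the workspace root should return an empty list if every generated .proto file has content.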

drpngx commented on June 26, 2024

OK, reading more carefully, I see what you mean. You can't comment that line out. Can you go back to the original version, then run bazel with --print_actions --verbose_failures?

zh794390558 commented on June 26, 2024

#23 has the same problem.

drpngx commented on June 26, 2024

Can you run with --verbose_failures --print_actions? I need to see the command that was used to link. Also add --linkopt=-vv. The next step is then to run nm on generate_proto_def.o and on the library to find out which symbol is defined and why they don't match.

zh794390558 commented on June 26, 2024
(tf1.12_py3.5) [luban@luban-351 lingvo]$ bazel print_action -c opt //lingvo:trainer_test --verbose_failures                                                                                                        
Starting local Bazel server and connecting to it...
INFO: Analysed target //lingvo:trainer_test (31 packages loaded).
INFO: Found 1 target...
ERROR: /nfs/project/zhanghui/lingvo/lingvo/tools/BUILD:98:1: Linking of rule '//lingvo/tools:generate_proto_def' failed (Exit 1): gcc failed: error executing command 
  (cd /home/luban/.cache/bazel/_bazel_luban/b5ef85f1c360696308ba7ab9000cfd03/execroot/__main__ && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/lib:/nfs/project/tools/anaconda3/pkgs/cudnn-7.2.1-cuda9.2_0/lib:/nfs/project/tools/anaconda3/pkgs/cudatoolkit-9.2-0/lib:/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/lib/:/usr/local/cuda-9.0/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64: \
    PATH=/nfs/project/tools/openfst1.6.2/bin/:/nfs/project/tools/packages/kaldi-master/src/bin:/nfs/project/tools/packages/kaldi-master/src/fstbin/:/nfs/project/tools/packages/kaldi-master/src/gmmbin/:/nfs/project/tools/packages/kaldi-master/src/featbin/:/nfs/project/tools/packages/kaldi-master/src/lm/:/nfs/project/tools/packages/kaldi-master/src/sgmmbin/:/nfs/project/tools/packages/kaldi-master/src/sgmm2bin/:/nfs/project/tools/packages/kaldi-master/src/fgmmbin/:/nfs/project/tools/packages/kaldi-master/src/latbin/:/nfs/project/tools/packages/kaldi-master/src/nnetbin:/nfs/project/tools/packages/kaldi-master/src/nnet2bin/:/nfs/project/tools/packages/kaldi-master/src/kwsbin:/nfs/project/tools/packages/kaldi-master/tools/sph2pipe_v2.5:/nfs/project/tools/packages/kaldi-master/src/ivectorbin:/tools/kaldi-io/build/bin:/nfs/project/tools/anaconda3/envs/tf1.12_py3.5/bin:/nfs/project/tools/anaconda3/bin:/home/luban/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/home/luban/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/luban/.local/bin:/home/luban/bin:/home/luban/.local/bin:/home/luban/bin \
    PWD=/proc/self/cwd \
  /usr/bin/gcc -o bazel-out/host/bin/lingvo/tools/generate_proto_def '-Wl,-rpath,$ORIGIN/../../_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib' -Lbazel-out/host/bin/_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib '-fuse-ld=gold' -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -B/usr/bin -pass-exit-codes -Wl,--gc-sections -Wl,-S -Wl,@bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params)

Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function (anonymous namespace)::WriteDotProto(google::protobuf::FileDescriptor const*, char const*): error: undefined reference to 'google::protobuf::FileDescriptor::DebugString() const'
collect2: error: ld returned 1 exit status
Target //lingvo:trainer_test failed to build
INFO: Elapsed time: 94.812s, Critical Path: 7.43s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
(tf1.12_py3.5) [luban@luban-351 lingvo]$ nm bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o | grep U          
                 U _Unwind_Resume
                 U _ZN10tensorflow19DataType_descriptorEv
                 U _ZN10tensorflow8GraphDef10descriptorEv
                 U _ZN10tensorflow8SaverDef10descriptorEv
                 U _ZNK6google8protobuf14FileDescriptor10dependencyEi
                 U _ZNK6google8protobuf14FileDescriptor11DebugStringEv
                 U _ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEm
                 U _ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEmmm
                 U _ZNSs4_Rep10_M_destroyERKSaIcE
                 U _ZNSs4_Rep10_M_disposeERKSaIcE
                 U _ZNSs4_Rep20_S_empty_rep_storageE
                 U _ZNSs6appendEPKcm
                 U _ZNSs6appendERKSs
                 U _ZNSsC1EPKcRKSaIcE
                 U _ZNSsC1ERKSs
                 U _ZNSt12__basic_fileIcED1Ev
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEE5closeEv
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEEC1Ev
                 U _ZNSt13basic_filebufIcSt11char_traitsIcEED1Ev
                 U _ZNSt14basic_ofstreamIcSt11char_traitsIcEED1Ev
                 U _ZNSt6localeD1Ev
                 U _ZNSt8ios_base4InitC1Ev
                 U _ZNSt8ios_base4InitD1Ev
                 U _ZNSt8ios_baseC2Ev
                 U _ZNSt8ios_baseD2Ev
                 U _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E
                 U _ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate
                 U _ZSt11_Hash_bytesPKvmm
                 U _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
                 U _ZSt17__throw_bad_allocv
                 U _ZTTSt14basic_ofstreamIcSt11char_traitsIcEE
                 U _ZTVSt13basic_filebufIcSt11char_traitsIcEE
                 U _ZTVSt14basic_ofstreamIcSt11char_traitsIcEE
                 U _ZTVSt15basic_streambufIcSt11char_traitsIcEE
                 U _ZTVSt9basic_iosIcSt11char_traitsIcEE
                 U _ZdlPv
                 U _Znwm
                 U __cxa_atexit
                 U __cxa_begin_catch
                 U __cxa_end_catch
                 U __cxa_rethrow
                 U __dso_handle
                 U __gxx_personality_v0
                 U __stack_chk_fail
                 U memcmp
                 U memset

drpngx commented on June 26, 2024

So, it's trying to link:

$ORIGIN/../../_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib

which maps to

_solib_k8/_U@tensorflow_Usolib_S_S_Cframework_Ulib___Uexternal_Stensorflow_Usolib_Stensorflow_Usolib/libtensorflow_framework.so

which should be a symlink to something like

/usr/local/lib/python2.7/dist-packages/tensorflow/libtensorflow_framework.so

If you run nm on this, you should find the symbol marked with a T:

<address> T _ZNK6google8protobuf14FileDescriptor11DebugStringEv
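
The comparison described above can be sketched as a small script that takes the text output of nm for the object file and for the library and reports which undefined symbols the library does not define. The function names are hypothetical, and the parser assumes the default nm layout (an optional address column, a one-letter type, then the symbol name).

```python
def parse_nm(output):
    """Map each symbol name to the set of nm type letters seen for it."""
    symbols = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            # Undefined symbols have no address column: "U name".
            kind, name = parts
        elif len(parts) == 3:
            # Defined symbols: "address T name".
            _, kind, name = parts
        else:
            continue
        symbols.setdefault(name, set()).add(kind)
    return symbols


def unresolved(object_nm, library_nm):
    """Return symbols undefined in the object and not defined by the library."""
    obj = parse_nm(object_nm)
    lib = parse_nm(library_nm)
    defined = {name for name, kinds in lib.items() if kinds - {"U"}}
    return sorted(
        name for name, kinds in obj.items() if kinds == {"U"} and name not in defined
    )
```

For a shared library such as libtensorflow_framework.so, the dynamic symbol table (`nm -D`) is usually the relevant input. Symbols that come from other libraries (libstdc++, libc) will also appear in the result, so the entries to look for are the `google::protobuf` and `tensorflow` ones.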

nim456 commented on June 26, 2024

For how many steps will the model be trained?

Raviteja1996 commented on June 26, 2024

Was the problem resolved?

grwlf commented on June 26, 2024

Was the problem resolved?

Dear @Raviteja1996, it looks like the problem was resolved. The build log below is for reference:

mironov@23ba9b0d756c:~/lingvo$ python -c "import tensorflow as tf;print(tf.__version__)"
1.14.1-dev20190327
mironov@23ba9b0d756c:~/lingvo$ bazel build -c opt //lingvo:trainer
WARNING: detected http_proxy set in env, setting no_proxy for localhost.
Starting local Bazel server and connecting to it...
INFO: Analysed target //lingvo:trainer (35 packages loaded).
INFO: Found 1 target...
Target //lingvo:trainer up-to-date:
  bazel-bin/lingvo/trainer
INFO: Elapsed time: 14.628s, Critical Path: 8.28s
INFO: 22 processes: 22 processwrapper-sandbox.
INFO: Build completed successfully, 29 total actions

Thank you.

iamxiaoyubei commented on June 26, 2024

@drpngx @grwlf @zh794390558 I ran into a similar problem, but after reading your discussion I still have no idea how to solve it. Could you please describe the steps in more detail? Thank you so much!
Here is my log:
[screenshot: WXWorkCapture_15572025722301]

fangelyuan commented on June 26, 2024

@iamxiaoyubei I have the same problem. Can you tell me how to resolve it?

bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function main: error: undefined reference to 'tensorflow::GraphDef::descriptor()'
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function main: error: undefined reference to 'tensorflow::SaverDef::descriptor()'

iamxiaoyubei commented on June 26, 2024

@fangelyuan I hit the "undefined reference to tensorflow..." error because I had installed both tensorflow and tf-nightly. Just uninstall tensorflow and install tf-nightly.

In addition, I am using tf-nightly-gpu version 1.14.1-dev20190426; I ran into other problems when installing the latest version, so I suggest you install this version.

Hope this helps.
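
A quick way to see whether more than one TensorFlow distribution is installed in the current environment is sketched below. The list of distribution names is an assumption (the common PyPI names that all provide the `tensorflow` package), and `installed_tf_distributions` is a hypothetical helper; the sketch needs Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata

# Assumed set of PyPI distributions that all install the `tensorflow` package.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-gpu", "tf-nightly", "tf-nightly-gpu")


def normalize(name):
    """Lower-case a distribution name and treat '_' like '-'."""
    return name.lower().replace("_", "-")


def installed_tf_distributions(names=None):
    """Return the TensorFlow distributions present among `names`.

    When `names` is None, the names are read from the current environment.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    present = {normalize(name) for name in names if name}
    return sorted(present & set(TF_DISTRIBUTIONS))
```

If `installed_tf_distributions()` returns more than one entry, the environment is in the mixed state described above, and uninstalling all but one of them is the fix suggested in this thread.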

fangelyuan commented on June 26, 2024

@iamxiaoyubei Can I add you on WeChat?

fangelyuan commented on June 26, 2024

@iamxiaoyubei lingvo is based on TensorFlow. If you uninstall tensorflow, does it still work normally?

iamxiaoyubei commented on June 26, 2024

@fangelyuan Sorry, I'd rather not add people on WeChat, and I rarely check it except after work. If you have any questions, you can ask on GitHub or send an email. If I see it and have time, I will respond.

Yes, it works. tf-nightly is the nightly build of TensorFlow. You can read the tf-nightly introduction online.

fangelyuan commented on June 26, 2024

@iamxiaoyubei Thanks, I succeeded in building the trainer. I am now testing the transformer model; I hope you can help me if I run into problems.
Thanks

cranehuang commented on June 26, 2024

@Raviteja1996 I had the same problem. I built TensorFlow (v1.15.0, commit 590d6ee) from source with gcc 5.4 and bazel 0.25.2, then built lingvo (commit 8926ece), and the problem occurred. I found that lingvo/lingvo/lingvo.bzl sets the flag "-D_GLIBCXX_USE_CXX11_ABI=0", so the linker looks for the symbol "_ZNK6google8protobuf14FileDescriptor11DebugStringEv", whereas libtensorflow_framework.so actually exports "_ZNK6google8protobuf14FileDescriptor11DebugStringB5cxx11Ev". Changing "-D_GLIBCXX_USE_CXX11_ABI=0" to "-D_GLIBCXX_USE_CXX11_ABI=1" solves the problem. Hope this helps.
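
The mismatch can be made explicit with a short sketch that reads the ABI setting from a list of compiler flags and checks whether a mangled symbol carries GCC's `cxx11` ABI tag. The helper names are hypothetical; the only facts relied on are the flag spelling and the two symbol names quoted above.

```python
ABI_MACRO = "-D_GLIBCXX_USE_CXX11_ABI="


def abi_from_flags(flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value found in `flags`, or None."""
    for flag in flags:
        if flag.startswith(ABI_MACRO):
            return int(flag[len(ABI_MACRO):])
    return None


def uses_cxx11_abi_tag(mangled_symbol):
    """True if the mangled name carries GCC's cxx11 ABI tag (B5cxx11)."""
    return "B5cxx11" in mangled_symbol


def abi_matches(flags, library_symbol):
    """True if the flags select the same string ABI the library symbol was built with."""
    return abi_from_flags(flags) == (1 if uses_cxx11_abi_tag(library_symbol) else 0)
```

With TensorFlow installed, `tf.sysconfig.get_compile_flags()` should list the ABI flag the installed library was built with, so passing its result to `abi_from_flags` gives the value that lingvo's copts would need to agree with.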

zhangqiaorjc commented on June 26, 2024

I can confirm that this problem still exists at HEAD, but it probably only happens in specific build environments.

The following one-line change fixes it:

zhangqiaorjc@xxx:~/lingvo/lingvo$ git diff
diff --git a/lingvo/lingvo.bzl b/lingvo/lingvo.bzl
index 01928bbc..eb69faa3 100644
--- a/lingvo/lingvo.bzl
+++ b/lingvo/lingvo.bzl
@@ -4,7 +4,7 @@ load("@subpar//:subpar.bzl", "par_binary")

 def tf_copts():
     # TODO(drpng): autoconf this.
-    return ["-D_GLIBCXX_USE_CXX11_ABI=0", "-Wno-sign-compare", "-mavx"] + select({
+    return ["-D_GLIBCXX_USE_CXX11_ABI=1", "-Wno-sign-compare", "-mavx"] + select({
         "//lingvo:cuda": ["-DGOOGLE_CUDA=1"],
         "//conditions:default": [],
     })
