neargye / hello_tf_c_api

Neural Network TensorFlow C API

License: MIT License

CMake 9.12% C++ 90.88%
api c cpp deep-learning deep-neural-networks machine-learning neural-network tensorflow

hello_tf_c_api's People

Contributors: neargye

hello_tf_c_api's Issues

batch inference

Hi @Neargye, and thank you for all your great examples. I'm trying to achieve batch classification, but with very limited success.

I have a graph that can accept an arbitrary number of images as input (tensor dims = {-1, 32, 32, 3}), and let's say that a patch can belong to one of five classes. I will call IM_i (with i in [0,4]) the images belonging to those classes.

If I populate the input tensor with data from a single image IM_i, the output tensor is correct: its data[i] is the largest.

I then appended the data of 2 or more images to the input tensor's buffer, but the output is not correct. I expected to see data[0-4] for the first image, data[5-9] for the second one, etc., but this holds only for the first image, even though each image is classified correctly when taken in isolation.

But then I experienced something weird. If the input tensor is populated with the data in this order:
IM_2, IM_3, IM_x, IM_y, IM_w (x, y, w in [0,4]), then data[0-4] holds the right output values for IM_2 and data[20-24] holds the output values for IM_3. This made me think that input data shouldn't simply be appended, but I cannot find any documentation for the bare C API.

Have you experienced something similar? If so, can you provide an example where batch inference is performed?

Thank you in advance!

failed work on win10

Following your steps on Windows 10, I did not succeed. There is no symbol file loaded for tensorflow.dll. I put tensorflow.dll in the code directory.

TF_INVALID_ARGUMENT

Same as this issue: https://github.com/Neargye/hello_tf_c_api/issues/46
I get TF_INVALID_ARGUMENT from TF_SessionRun, and I still have not found any solution.

My OS is Windows 10, using VS 2017.
I modified src/session_run.cpp for my needs, as below.

It uses a FaceNet model, which has two input operations:

  1. input: dimensions 1 x 160 x 160 x 3, float type
  2. phase_train: dimension 1, boolean type

and one float-type output operation, "embeddings".
In Python it succeeds.

If you need the .pb files to test, I stored them on Google Drive: https://drive.google.com/drive/folders/1VJM2fkOW1saPtk3NZp3WSlGLR1jerpkO?usp=sharing

There are two .pb files:
FaceNet.pb reproduces my issue.
FaceNet_Non_BoolPlaceholder.pb has the "phase_train" placeholder disabled; when using this .pb, I get TF_OK.

#include "tf_utils.hpp"
#include <scope_guard.hpp>
#include <iostream>
#include <vector>

int main() {
  auto graph = tf_utils::LoadGraph("FaceNet.pb");
  SCOPE_EXIT{ tf_utils::DeleteGraph(graph); }; // Auto-delete on scope exit.
  if (graph == nullptr) {
    std::cout << "Can't load graph" << std::endl;
    return 1;
  }

  std::vector<const char*> input_op_name;
  input_op_name.push_back("input");
  input_op_name.push_back("phase_train");

  std::vector<TF_Output> input_op;
  for (int i = 0; i < input_op_name.size(); i++) {
    auto op = TF_Output{ TF_GraphOperationByName(graph, input_op_name[i]), 0 };
    if (op.oper == nullptr) {
      std::cout << "Can't init operation : " << input_op_name[i] << std::endl;
      return 2;
    }
    input_op.push_back(op);
  }

  const std::vector<std::int64_t> input_dims = {1, 160, 160, 3};
  std::vector<float> input_vals(input_dims[1] * input_dims[2] * input_dims[3]);
  for (int i = 0; i < input_vals.size(); i++)
    input_vals[i] = 1;

  const std::vector<std::int64_t> input_dims2 = { 1 };
  std::vector<std::uint8_t> input_vals2 = { 0 };

  const std::vector<TF_Tensor*> input_tensor = { tf_utils::CreateTensor(TF_FLOAT, input_dims, input_vals),
                                                 tf_utils::CreateTensor(TF_BOOL, input_dims2, input_vals2) };

  //auto input_tensor = tf_utils::CreateTensor(TF_FLOAT, input_dims, input_vals);
  SCOPE_EXIT{ tf_utils::DeleteTensor(input_tensor[0]); }; // Auto-delete on scope exit.
  SCOPE_EXIT{ tf_utils::DeleteTensor(input_tensor[1]); };

  auto out_op = TF_Output{TF_GraphOperationByName(graph, "embeddings"), 0};
  if (out_op.oper == nullptr) {
    std::cout << "Can't init out_op" << std::endl;
    return 3;
  }

  TF_Tensor* output_tensor = nullptr;
  SCOPE_EXIT{ tf_utils::DeleteTensor(output_tensor); }; // Auto-delete on scope exit.

  auto status = TF_NewStatus();
  SCOPE_EXIT{ TF_DeleteStatus(status); }; // Auto-delete on scope exit.
  auto options = TF_NewSessionOptions();
  auto sess = TF_NewSession(graph, options, status);
  TF_DeleteSessionOptions(options);

  if (TF_GetCode(status) != TF_OK) {
    return 4;
  }

  TF_SessionRun(sess,
                nullptr, // Run options.
                input_op.data(), input_tensor.data(), 1, // Input tensors, input tensor values, number of inputs.
                &out_op, &output_tensor, 1, // Output tensors, output tensor values, number of outputs.
                nullptr, 0, // Target operations, number of targets.
                nullptr, // Run metadata.
                status // Output status.
                );

  if (TF_GetCode(status) != TF_OK) {
    std::cout << "Error run session : " << (int)TF_GetCode(status);
    return 5;
  }

  TF_CloseSession(sess, status);
  if (TF_GetCode(status) != TF_OK) {
    std::cout << "Error close session";
    return 6;
  }

  TF_DeleteSession(sess, status);
  if (TF_GetCode(status) != TF_OK) {
    std::cout << "Error delete session";
    return 7;
  }

  auto data = static_cast<float*>(TF_TensorData(output_tensor));

  std::cout << "Output vals: " << data[0] << ", " << data[1] << ", " << data[2] << ", " << data[3] << std::endl;

  return 0;
}

win10 vs2019 tensorflow.dll not found

Hi. I have run this on my Mac and on Windows.

On the Mac it works, but on Windows it complains that "tensorflow.dll not found" when I try to execute the built hello_tf.exe.

The German message translates to "The code cannot be run because tensorflow.dll cannot be found".

[screenshots of the error dialog]

The build reports no errors, but it constantly says there is "no need to re-run", and so on.

When I try to run:

[screenshot of the run attempt]

I have checked the CMakeLists.txt, and it looks good. The tensorflow folder with lib and include is good too.

[screenshots of the project layout]

I am struggling a bit. Can you give some advice?

Memory leak during inference with frozen graph

Hi Neargye!

Thanks once again for all the help on my previous issue - I've made some pretty neat progress using the C API on interesting problems. However, as I try to scale up inference in larger codes, I am running into some troubling memory leaks. Let me try to explain through a minimum working example. In the following, I have a simple fully connected neural network that takes 9 inputs and predicts one output. The code performs exactly as I would expect, and the answers it gives me are accurate.

#include "tf_utils.hpp"
#include "scope_guard.hpp"
#include <iostream>
#include <vector>

int main() {
  auto graph_ = tf_utils::LoadGraph("ML_LES.pb");
  SCOPE_EXIT{ tf_utils::DeleteGraph(graph_); }; // Auto-delete on scope exit.
  if (graph_ == nullptr) {
    std::cout << "Can't load graph" << std::endl;
    return 1;
  }

  auto input_ph_ = TF_Output{TF_GraphOperationByName(graph_, "input_placeholder"), 0};
  if (input_ph_.oper == nullptr) {
    std::cout << "Can't init input_ph_" << std::endl;
    return 2;
  }

  const std::vector<std::int64_t> input_dims = {1, 9};
  const std::vector<float> input_vals = {4.948654692193851069e-05,-1.416845935576197153e-03,1.695804398322601982e-04,-4.909234209068177434e-05,7.200956380997814788e-04,-3.949331152012949186e-07,1.155548212380012041e-01,-1.447936297672789625e-05,-1.249577196433397854e-05,4.991843687885162174e-03};

  auto input_tensor = tf_utils::CreateTensor(TF_FLOAT, input_dims, input_vals);
  SCOPE_EXIT{ tf_utils::DeleteTensor(input_tensor); }; // Auto-delete on scope exit.

  auto output_ = TF_Output{TF_GraphOperationByName(graph_, "output_value/BiasAdd"), 0};
  if (output_.oper == nullptr) {
    std::cout << "Can't init output_" << std::endl;
    return 3;
  }

  TF_Tensor* output_tensor = nullptr;

  auto status = TF_NewStatus();
  SCOPE_EXIT{ TF_DeleteStatus(status); }; // Auto-delete on scope exit.
  auto options = TF_NewSessionOptions();
  auto sess = TF_NewSession(graph_, options, status);
  TF_DeleteSessionOptions(options);

  if (TF_GetCode(status) != TF_OK) {
    return 4;
  }

  TF_SessionRun(sess,
                nullptr, // Run options.
                &input_ph_, &input_tensor, 1, // Input tensors, input tensor values, number of inputs.
                &output_, &output_tensor, 1, // Output tensors, output tensor values, number of outputs.
                nullptr, 0, // Target operations, number of targets.
                nullptr, // Run metadata.
                status // Output status.
                );

  if (TF_GetCode(status) != TF_OK) {
    std::cout << "Error run session";
    return 5;
  }

  TF_CloseSession(sess, status);
  if (TF_GetCode(status) != TF_OK) {
    std::cout << "Error close session";
    return 6;
  }

  TF_DeleteSession(sess, status);
  if (TF_GetCode(status) != TF_OK) {
    std::cout << "Error delete session";
    return 7;
  }

  auto data = static_cast<float*>(TF_TensorData(output_tensor));

  std::cout << "Output vals: " << data[0] << std::endl;

  return 0;
}

I compile this code with your tensorflow utilities as g++ mwe.cpp tf_utils.cpp -ltensorflow and obtain an executable a.out. Running it as ./a.out works exactly as expected (it should give you the answer 1.02169, using the *.pb file I have attached to this post). Now what I need is to run this inference many times across multiple nodes (using MPI), which again behaves as I'd expect. However, I very quickly run out of memory when I do this. I performed a valgrind check on this simple example using:

valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=20 --track-fds=yes ./a.out

and I get a laundry list of errors which I am having trouble interpreting. Do you have some experience dealing with this? Any help is much appreciated.

Thanks once again for the excellent work!

MWE.zip

How to generate the graph.pb file

Hi @Neargye ,

I am curious how did you generate the https://github.com/Neargye/hello_tf_c_api/blob/master/models/graph.pb file?

I tried calling tf.saved_model.Builder in my sample program; it generates exported_model/saved_model.pb. However, when I re-ran your load_graph.cpp example, the import-graph API failed.

I have another program that calls the Go API (which, to my understanding, is a binding built on top of the C API); it also failed at importing the graph.

Did you create graph.pb not by calling tf.saved_model.Builder but with some other API?

Memory leak?

In load_graph, shouldn't there be a TF_DeleteStatus(status) at about line 90?

I'm debugging my own (Pascal) code, which is based on yours, so I want to keep everything balanced.

It looks like I need a TF_DeleteStatus, but as you didn't do it, I'm not sure.

Error when Using Tensorflow-1.13.1-gpu lib.

Thanks for your work on TF c api.
When I was trying to compile your work on Ubuntu 18.04 with g++ and the TensorFlow 1.13.1 GPU library, an error occurred.
The error message is as follows:
tf_utils.hpp:94:24: error: there are no arguments to ‘TF_TensorElementCount’ that depend on a template parameter, so a declaration of ‘TF_TensorElementCount’ must be available.
So I guess tf_utils uses some functions that are not defined in the TensorFlow GPU library?

GPU dll

Hi,

Do you have a GPU version of the DLL?
Where can I download it?
Do you have example code to run a GPU session?

Thanks,

learn from images

Hello,

I would like to pass a dataset of PNG images to TensorFlow to make future predictions.
For example, I have an image of a certain type of flower; I know its name, size, etc., and I want to create a model of this flower.

How can I do that with the C++ API?
Thanks

How to use text file as a input of frozen graph?

hi,

I want to read a text file as input for a frozen graph. My input data size is (160, 140), which means the text file has 160 rows and 140 columns. I don't know how I can use it in this function: const std::vector<float> input_vals = { }

session_run hangs on GPU (libtensorflow-gpu)

aws p3.2xlarge
Ubuntu 18.04
apt install cuda-10-0 libcudnn7
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.33.01 Wed Nov 13 00:00:22 UTC 2019
GCC version: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

I use libtensorflow-gpu downloaded from releases/download/v1.15.0

I tried to run ./session_run with default cpu library - it works.

But when I use libtensorflow-gpu and cuda-10-0 ./session_run hangs

hello_tf_c_api/build$ ./session_run 
2020-01-03 04:13:51.031664: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-03 04:13:51.057833: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300090000 Hz
2020-01-03 04:13:51.058443: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55861d788270 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-03 04:13:51.058470: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-01-03 04:13:51.059562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-03 04:13:51.094764: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-03 04:13:51.095723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
2020-01-03 04:13:51.095977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-01-03 04:13:51.097513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-01-03 04:13:51.098956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-01-03 04:13:51.099274: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-01-03 04:13:51.101088: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-01-03 04:13:51.102464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-01-03 04:13:51.107224: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-03 04:13:51.107310: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-03 04:13:51.108270: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-03 04:13:51.109164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-01-03 04:13:51.109195: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-01-03 04:13:51.208390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-03 04:13:51.208441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-01-03 04:13:51.208458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-01-03 04:13:51.208610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-03 04:13:51.209584: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-03 04:13:51.210544: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-03 04:13:51.211475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15052 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
2020-01-03 04:13:51.669698: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0

One CPU core is utilized at 100%:

%Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                         
 2118 ubuntu    20   0 22.663g 1.577g 261832 S  99.7  2.6   3:58.73 session_run 

Build instructions for VS 2017-2019

Hi, I want to test the code in VS 2017, and I get compile errors. I copied all the code into a new VS project, added session_run.cpp to its sources, added the c_api.h and tensorflow.lib folders to the project paths, and copied tensorflow.dll to the execution path, but I get this error:
C2065 TF_TensorElementCount: undeclared identifier NeargyeTest ...\neargyetest\neargyetest\tf_utils.hpp 94
Can you provide a more detailed guide on how to import and compile the tests using VS?

GetTensorsData function error

Hi, I have just loaded a model trained in TensorFlow Python; after running the session for a given input, some problems happen:

  1. If this function is used, good results are obtained, BUT after some iterations the output size is not the same (it should be 256, and I get an error reading data[128]):
    const auto data = static_cast<float*>(TF_TensorData(output_tensor));

the output is:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22.0965, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77.3298, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 85.9351, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 190.352, 0, 0, 0, 47.3773, 0, 0, 0, 0, 114.226, 0, 0, 0, 0, 0, 0, 57.9978, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66.6906, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0698555, -0.0425288, 0.251769, -0.146165, 0.0398469, 0.0603849, -0.0709948, 0.0525762, 0.0905565, -0.139356, -0.0768467, -0.106491, 0.272678, -0.00842083, -0.151077, -0.077257, 0.244632, 0.191824, 0.12273, -0.0615471, 0.0851486, -0.0163544, -0.0207504, -0.110528, 0.364326, -0.011672, -0.0672661, 0.095574, 0.195031, -0.0513856, 0.148535, 0.0222169, 0.0812069, -0.0500609, 0.167545, 0.0214508, -0.21389, 0.134474, -0.00433608, 0.00449877, -0.0329179, -0.0248889, -0.10523, -0.0159402, -0.156834, -0.0937081, 0.112361, 0.104516, 0.0367866, 0.0374388, -0.158911, -0.0944544, -0.119521, 0.1857

  2. If this function from tf_utils is used, the results are quite different; there are some NaNs and very high values:
    const auto data = tf_utils::GetTensorsData<float>(output_tensor);

the output is:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22.0965, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 77.3298, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 85.9351, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 190.352, 0, 0, 0, 47.3773, 0, 0, 0, 0, 114.226, 0, 0, 0, 0, 0, 0, 57.9978, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66.6906, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.91582e+25, 0, -2.43938e+35, 0, 5.5223e+11, 0, -2.63315e+35, 0, 0, 0, -nan, 0, -2.93408e+35, 0, 3.09013e+25, 0, 0, 0, 0, 0, 1.91574e+25, 0, -2.43936e+35, 0, 0.00164726, 0, 1.75442e+12, 0, 0, 0, 0, 0, 1.91574e+25, 0, 0.00164726, 0, 0, 0, -nan, 0, -2.9095e+35, 0, 3.09015e+25, 0, 0, 0, -nan, 0, -2.92585e+35, 0, 3.09014e+25, 0, 0, 0, -2.43937e+35, 0, 1.91582e+25, 0, 5.5223e+11, 0, 0, 0, -nan, 0, -2.91753e+35, 0, 3.09014e+25, 0, 0, 0, 0, 0, 1.9159e+25, 0, -2.4394e+35, 0, 8.35422e-10, 0, -2.63228e+35, 0, 0, 0, 0, 0, 1.9159e+25, 0, 8.35422e-10, 0, 0, 0, 0, 0, 0, 0, -2.43987

Is there an issue in the GetTensorsData function?
Does anybody know why the output could have a different size?

Issue in running sample c_api example

Hello @Neargye :
I have downloaded the pre-built TensorFlow lib file from your GitHub link and am trying to run the sample program on Windows 10 64-bit.
I am getting the following error:
main.obj : error LNK2019: unresolved external symbol __imp_TF_Version referenced in function main
I am running the program using Qt Creator; please find the details below.
Pro file (linking the library and defining the include path):

INCLUDEPATH +=C:\Users\jhon\Downloads\sample\include
LIBS += -LC:\Users\jhon\Downloads\sample\lib -ltensorflow

Sample Program:

#include <QCoreApplication>
#if defined(_MSC_VER) && !defined(COMPILER_MSVC)
#  define COMPILER_MSVC // Set MSVC visibility of exported symbols in the shared library.
#endif

#if defined(_MSC_VER)
#  pragma warning(push)
#  pragma warning(disable : 4996)
#  pragma warning(disable : 4190)
#endif

#include <tensorflow/c/c_api.h> // TensorFlow C API header
#include <iostream>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    std::cout << "TensorFlow Version: " << TF_Version() << std::endl;

    return a.exec();
}

Actually, I am facing this issue with all the DLLs which I have downloaded.
I want to know whether it is an issue with the DLLs or whether I am making some silly mistake running the program.

How to create Tensor of TF_BOOL?

I want to create a tensor of 1-dimensional boolean type.
I tried:

const std::vector<std::int64_t> bool_dims = { 1 };
std::vector<bool> bool_vals(1);
bool_vals[0] = false;

std::vector<TF_Tensor*> input_tensors = { CreateTensor(TF_BOOL, bool_dims, bool_vals) }

Unfortunately, I found that std::vector<bool> does not support the data() member function:
http://www.cplusplus.com/reference/vector/vector-bool/

So I modified it to

std::vector<TF_Tensor*> input_tensors = { CreateTensor(TF_BOOL, bool_dims.data(), bool_dims.size(), &bool_vals[0], bool_vals.size() * sizeof(bool)) }

and then I can successfully create input_tensors,
but when I call TF_SessionRun, I get TF_Code = TF_INVALID_ARGUMENT.

I tried the same model with the boolean-type placeholder disabled, and it works fine,
but for some reason I need this bool placeholder.

So, how can I correctly create a tensor of boolean type?

Thanks for the help!

Linking errors after following all the steps

Hello,

I followed all of the steps that you mentioned in the readMe file, but I am still facing linking errors.
I tried the hello_tf_win_c_api example and I get the following errors in Visual Studio 2017: <cannot open source file "c_api.h"> and also <identifier "TF_Version" is undefined>.

I followed all the steps including adding the path to tensorflow.dll to the PATH environment variable.

I am on windows 10, visual studio 2017.

Problem solved! It was just a dumb mistake, I needed to use x64 and Release mode. Thanks for the tutorials!

Windows GPU library

From the CMakeLists.txt file, I found that the library only provides the CPU C API.

Do you have the GPU version of the Windows C API library?

Multiple models inference

Hi, thanks for the repo. It was very useful, and I was able to successfully load a model and predict with it. Now I'm looking for a way to predict using several models. Is it possible to load a second model using the C API and use both of them for prediction?

Running target operation for training in C++

Hello! Before I proceed - I would like to thank you for this excellent repository. I have managed to set up an inference ecosystem in C++ with far less pain than I imagined. I am currently attempting to run a very simple training operation in C++ and was having some trouble.

I started by looking at the function definition of TF_SessionRun() which is

void TF_SessionRun(TF_Session* session, const TF_Buffer* run_options,
                   const TF_Output* inputs, TF_Tensor* const* input_values,
                   int ninputs, const TF_Output* outputs,
                   TF_Tensor** output_values, int noutputs,
                   const TF_Operation* const* target_opers, int ntargets,
                   TF_Buffer* run_metadata, TF_Status* status)

Accordingly I devised the following code to run my training step

// Some tensorflow pointer requirements
TF_Status* status_ = TF_NewStatus();
TF_SessionOptions* options_ = TF_NewSessionOptions();
TF_Session* sess_ = TF_NewSession(graph_, options_, status_);

// This is where the code breaks
const TF_Operation* const* train_ = TF_GraphOperationByName(graph_, "train_step"); 
// Inputs and targets
TF_Tensor* output_tensor_ = tf_utils::CreateTensor(TF_FLOAT,
                                                  output_dims.data(), output_dims.size(),
                                                  target_vals.data(), target_vals.size() * sizeof(float));
TF_Tensor* input_tensor_ = tf_utils::CreateTensor(TF_FLOAT,
                                                  input_dims.data(), input_dims.size(),
                                                  input_vals.data(), input_vals.size() * sizeof(float));
TF_SessionRun(sess_, nullptr, // Run options.
	                &input_op_, &input_tensor_, 1, // Input tensors, input tensor values, number of inputs.
	                &out_op_, &output_tensor_, 1, // Output tensors, output tensor values, number of outputs.
	                train_, 1, // Target operations, number of targets.
	                nullptr, // Run metadata.
	                status_ // Output status.
	                );
printf("Performing backpropagation within C++");

Now, I know that TF_GraphOperationByName returns a pointer to a TF_Operation, but the target-operations parameter of TF_SessionRun asks for a const TF_Operation* const* (a pointer to a constant pointer to a constant TF_Operation, by my reckoning). Do you have any ideas/examples on how to tackle this issue?

This would be super helpful! Thanks :)

What is this actually doing?

From the code, it seems that it only reads the graph definition and then runs data through it. However, I do not see pretrained weights being loaded anywhere. Am I mistaken? Is there somewhere I can find code for reading pretrained weights?

Inference is running very slow on CPU

// Load graph
auto graph = tf_utils::LoadGraph("C:\\Users\\Soumya Mohanty\\Documents\\Dev\\TensorflowApplication\\frozen_inference_graph_10K_Steps.pb");

SCOPE_EXIT{ tf_utils::DeleteGraph(graph); };
if (graph == nullptr) {
//std::cout << "Can't load graph" << std::endl;
}
else {
//std::cout << "Graph loaded successfully" << std::endl;
}

// Setup session to run inference
auto session = tf_utils::CreateSession(graph);
SCOPE_EXIT{ tf_utils::DeleteSession(session); }; // Auto-delete on scope exit.
if (session == nullptr) {
	std::cout << "Can't create session" << std::endl;
}
else{
	std::cout << "Session setup successfully " << std::endl;
}

cv::Mat image = stack[0]; // Taking one image
int num_dims = 4;
std::int64_t input_dims[4] = { 1, image.rows, image.cols, 3 }; //1 is number of batch, and 3 is the no of channels.
int num_bytes_in = image.cols * image.rows * 3; //3 is the number of channels.
// Input tensor, and assign node that will accept input
const std::vector<TF_Output> input_ops = { {TF_GraphOperationByName(graph, "image_tensor"), 0} };
//const std::vector<TF_Tensor*> input_tensors = { tf_utils::CreateTensor(TF_UINT8, input_dims, num_dims, image.data, num_bytes_in) }; // Tensor with data
const std::vector<TF_Tensor*> input_tensors = { tf_utils::CreateEmptyTensor(TF_UINT8, input_dims, num_dims, num_bytes_in) }; // Does not have image data yet
SCOPE_EXIT{ tf_utils::DeleteTensors(input_tensors); }; // Auto-delete on scope exit.


// Output tensor, and assign node to get output from
std::vector<TF_Output> out_ops;
std::vector<TF_Tensor*> output_tensors;

out_ops.push_back({ TF_GraphOperationByName(graph, "num_detections"), 0 });
output_tensors.push_back(nullptr);

out_ops.push_back({ TF_GraphOperationByName(graph, "detection_classes"), 0 });
output_tensors.push_back(nullptr);

out_ops.push_back({ TF_GraphOperationByName(graph, "detection_boxes"), 0 });
output_tensors.push_back(nullptr);

out_ops.push_back({ TF_GraphOperationByName(graph, "detection_scores"), 0 });
output_tensors.push_back(nullptr);

SCOPE_EXIT{ tf_utils::DeleteTensors(output_tensors); }; // Auto-delete on scope exit.


for (int i = 0; i < stack.size(); i++) {
	cv::Mat image = stack[i]; // Taking one image
	//std::cout << "rows -  " << image.rows << " cols-  " << image.cols << " channels = " << image.channels() << std::endl;
	
	tf_utils::SetTensorData(input_tensors[0], image.data, num_bytes_in);

	// Run the session
	auto code = tf_utils::RunSession(session, input_ops, input_tensors, out_ops, output_tensors);

	if (code == TF_OK) {

		float* num_detections = (float*)TF_TensorData(output_tensors[0]);
		float* detection_classes = (float*)TF_TensorData(output_tensors[1]);
		float* boxes = (float*)TF_TensorData(output_tensors[2]);
		float* scores = (float*)TF_TensorData(output_tensors[3]);

		int number_detections = (int)num_detections[0];
		int box_cnt = 0;

		frame_detection_boxes frameData; // Will hold all the data for this frame
		frameData.frameNo = i; // set frame number

		//std::cout << "Frame No: " << i << std::endl;
		for (int i = 0; i < number_detections; i++) {
			if (scores[i] >= 0.5) {
				int xmin = (int)(boxes[i * 4 + 1] * image.cols);
				int ymin = (int)(boxes[i * 4 + 0] * image.rows);
				int xmax = (int)(boxes[i * 4 + 3] * image.cols);
				int ymax = (int)(boxes[i * 4 + 2] * image.rows);
				//std::cout << "Box_" << box_cnt << "(" << scores[i] << ", " << detection_classes[i] << "): [" << xmin << ", " << ymin << ", " << xmax << ", " << ymax << "]" << std::endl;
				box_cnt++;

				
				if (detection_classes[i] == 1) { // Guide-Wire
					frameData.guideWire_raw.push_back({ (float)xmin, (float)ymin, (float)xmax, (float)ymax }); // Convert to float for non-max suppression and other operations
				}
				else if (detection_classes[i] == 2) { // Strut
					frameData.strut_raw.push_back({ (float)xmin, (float)ymin, (float)xmax, (float)ymax });
				}
			}
		}
		resultVec.push_back(frameData);
	}
	else {
		std::cout << "Error run session TF_CODE: " << code;
	}

}

I have an i9 CPU and inference runs very slowly.
Any suggestions?

How to set session option?

Hi, how do I set the session options using the TensorFlow C API? For example, how do I set gpu_memory_fraction, or choose which GPU my program uses?

Specify device

Thanks for this, it is very helpful. Is it possible to specify a particular device for the graph (i.e. gpu:0), as with the C++ API?

read checkpoint instead of pb file

I know I can freeze the checkpoint and get a .pb model. But there is a possibility the freezing procedure can go wrong (I read this somewhere else).

The case I have: I trained a model in Python TF and froze it into .pb. It cannot be loaded by TF_GraphImportGraphDef(graph, buffer, opts, status). There is no error message, and the model visualized by Netron looks OK to me. I prepared the model in OneDrive: checkpoint, meta, index and pb.
OneDrive link

Try:

  1. export the model directly into .pb;
  2. read checkpoint via tf_c api.

thanks.

how to turn off verbose and idle threads?

Hi, we are using this for the NIST FRVT submission and have the following issues:

  1. Need to run on a single thread (TensorFlow spins up multiple threads according to the CPU count)
  2. Need to turn off all debugging messages (i.e. to run quietly, it should not write messages to standard output)

For issue 1, we already tried:
std::array<std::uint8_t, 13> config = {{ 0x0a ,0x07, 0x0a, 0x03, 0x43, 0x50, 0x55, 0x10, 0x01, 0x10, 0x01, 0x28, 0x01}};
TF_SetConfig(options, config.data(), config.size(), status);
But we still get 9 threads (7 idle, 2 running), and NIST only permits 2 threads:
[WARNING] We've detected that your software may be threading or using other multiprocessing techniques during template creation. The number of threads detected was 9 and it should be 2. Per the API document, implementations must run single-threaded. In the test environment, there is no advantage to threading, because NIST will distribute workload across multiple blades and multiple processes. We highly recommend that you fix this issue prior to submission.
(NIST uses the command: "top -H -b -n1 | grep validate11" to monitor thread usage, validate11 is the program task's name):
[joytsay@localhost ~]$ top -H -b -n1 | grep validate11
30300 joytsay 20 0 600744 233972 11060 R 70.6 1.5 3:51.85 validate11
30469 joytsay 20 0 600744 233972 11060 S 35.3 1.5 0:09.37 validate11
30220 joytsay 20 0 480640 240168 30444 S 0.0 1.5 0:00.78 validate11
30266 joytsay 20 0 480640 240168 30444 S 0.0 1.5 0:00.00 validate11
30463 joytsay 20 0 600744 233972 11060 S 0.0 1.5 0:00.00 validate11
30466 joytsay 20 0 600744 233972 11060 S 0.0 1.5 0:00.00 validate11
30467 joytsay 20 0 600744 233972 11060 S 0.0 1.5 0:00.00 validate11
30468 joytsay 20 0 600744 233972 11060 S 0.0 1.5 0:00.00 validate11

As for issue 2, we can't find where to turn off the messages produced by TensorFlow:
2020-03-02 17:25:32.458659: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-03-02 17:25:32.458815: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1c27740 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-02 17:25:32.458846: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Any suggestions will help, thank you!

TF C++ Wrapper

Add with C++ wrappers over TF functions C API
Status, Graph, Tensor, Session, ...

what's the relation between c++ client session and c_api TF_Session and core/public/session in tensorflow source code?

This is a question mentioned in stackoverflow.

I am also confused why people use Session* session, while TF_Session* session is used here.

The question comes down to: why do we need a tensorflow.dll?

What's the relation between the C++ client session, the c_api TF_Session, and core/public/session in the tensorflow source code? I'm reading the source code of tensorflow, but I'm confused to find that there are many parts about sessions:

1. tensorflow/c/c_api_internal.cc defined TF_Session
2. tensorflow/c/c_api_test.cc defined a class CSession
3. tensorflow/cc/client/client_session.cc
4. tensorflow/core/public/session.h

Then what exactly is a 'session'? And where is the concrete 'run' function of a session?

unresolved external symbol "struct TF_Graph *

Hi.

I have posted several questions, sorry for that. I am trying to solve them by myself.

After I was able to build the code, I created a new cpp file (simply copy-pasting hello_tf.cpp):

#if defined(_MSC_VER) && !defined(COMPILER_MSVC)
#  define COMPILER_MSVC // Set MSVC visibility of exported symbols in the shared library.
#endif

#if defined(_MSC_VER)
#  pragma warning(push)
#  pragma warning(disable : 4996)
#  pragma warning(disable : 4190)
#endif

#include <c_api.h> // TensorFlow C API header
#include <iostream>
#include "tf_utils.hpp"
#include <vector>
#include <string>

int main() {
  std::cout << "TensorFlow Version: " << TF_Version() << std::endl;
  TF_Graph* graph = tf_utils::LoadGraph("graph.pb");
  if (graph == nullptr) {
    std::cout << "Can't load graph" << std::endl;
    return 1;
  }

  return 0;
}

#if defined(_MSC_VER)
#  pragma warning(pop)
#endif

It complains about segnet.obj : error LNK2019: unresolved external symbol "struct TF_Graph * __cdecl tf_utils::LoadGraph(char const *)"
[screenshot]

I am not sure where the error is.

Thank you in advance.

graph_info crash

Hi,

graph_info crashes like this:

21: Preprocessor/map/TensorArray_1 type: TensorArrayV3 device:  number inputs: 1 number outputs: 2
Number inputs: 1
0 type : TF_INT32
Number outputs: 2
0 type : TF_RESOURCE dims: 1 [2]
1 type : TF_FLOAT dims: 0 []

22: Preprocessor/map/while/Enter type: Enter device:  number inputs: 1 number outputs: 1
Number inputs: 1
0 type : TF_INT32
Number outputs: 1
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

with the model here:

https://github.com/yeephycho/tensorflow-face-detection/tree/master/model

here is the backtrace:

#0  0x00007ffff0f79d7f in raise () from /usr/lib/libc.so.6
#1  0x00007ffff0f64672 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff132e58e in __gnu_cxx::__verbose_terminate_handler ()
    at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff1334dfa in __cxxabiv1::__terminate (handler=<optimized out>)
    at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007ffff1334e57 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007ffff13350ac in __cxxabiv1::__cxa_throw (obj=<optimized out>, 
    tinfo=0x7ffff1425d10 <typeinfo for std::bad_alloc>, dest=0x7ffff13331e0 <std::bad_alloc::~bad_alloc()>)
    at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007ffff135e748 in std::__throw_bad_alloc ()
    at /build/gcc/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/exception.h:63
#7  0x0000555555559836 in __gnu_cxx::new_allocator<long>::allocate(unsigned long, void const*) ()
#8  0x0000555555559719 in std::allocator_traits<std::allocator<long> >::allocate(std::allocator<long>&, unsigned long)
    ()
#9  0x000055555555953a in std::_Vector_base<long, std::allocator<long> >::_M_allocate(unsigned long) ()
#10 0x00005555555593dd in std::_Vector_base<long, std::allocator<long> >::_M_create_storage(unsigned long) ()
#11 0x000055555555921f in std::_Vector_base<long, std::allocator<long> >::_Vector_base(unsigned long, std::allocator<long> const&) ()
#12 0x000055555555907a in std::vector<long, std::allocator<long> >::vector(unsigned long, std::allocator<long> const&)
    ()
#13 0x0000555555558987 in PrintOpOutputs(TF_Graph*, TF_Operation*) ()
#14 0x0000555555558cfb in PrintOp(TF_Graph*) ()
#15 0x0000555555558d87 in main ()

I would like to run a session with this model; the input tensor name is image_tensor, and these are the output tensors:

detection_boxes
detection_scores
detection_classes
num_detections

do you have any suggestions? thanks

3D input to model returns different output than python

Hello,
I have a 3D image segmentation model. It takes 4 images at a time as input; the input shape is (1, 4, 256, 512, 1), where 1 is the batch size, 4 is the number of images per input, and 256×512×1 are the image dimensions.
All images are single channel images.

To feed the input data I do the following:

for (int frame = 0; frame < stack.size() - depth + 1; frame++) {
	std::vector<cv::Mat> lumen_images(depth); // 4 images at a time
	for (int num = 0; num < 4; num++) {
		int frame_num = frame + num;
		cv::Mat img_resized;
		cv::Mat img = stack[frame_num];
		cv::resize(img, img_resized, cv::Size(512, 256), 0, 0, cv::INTER_LINEAR);
		lumen_images[num] = img_resized;
	}

		cv::Mat lumen_input[4] = { lumen_images[0], lumen_images[1], lumen_images[2], lumen_images[3] };

		const std::vector<std::int64_t> dims = { 1, 4, 256, 512, 1 };
		const auto data_size = std::accumulate(dims.begin(), dims.end(), sizeof(float), std::multiplies<std::int64_t>{});
		auto lumendata = static_cast<float*>(std::malloc(data_size));
		//std::cout << "datasize " << data_size << std::endl;
		for (int i = 0; i < 4; i++) {
			for (int j = 0; j < 256; j++) {
				for (int k = 0; k < 512; k++) {
					//std::cout << i << " " << j << " " << k << " " << lumen_input[i].at<unsigned char>(j, k);//(float) 1.0 * ((unsigned char *)lumen_input[i].data)[j * 512 + k];
					lumendata[i*(256 * 512) + j * 512 + k] = (float) 1.0*lumen_input[i].at<float>(j, k);
				}
			}
		}

		// Updating input tensor with current iteration data
		tf_utils::SetTensorData(input_tensors[0], lumendata, num_bytes_in);

		// Run the session
		auto code = tf_utils::RunSession(session, input_ops, input_tensors, out_ops, output_tensors);
		//std::cout << "Code = " << code << std::endl;
		if (code == TF_OK) {
			//size_t output_size = TF_TensorByteSize(output_tensors[0]) / TF_DataTypeSize(TF_FLOAT);
			float* model_output = (float*)TF_TensorData(output_tensors[0]);
			for (int frameOffset = 0; frameOffset < 4; frameOffset++) {
				cv::Mat outputslice(cv::Size(512, 256), CV_32F);
				for (int i = 0; i < 256; i++) {
					for (int j = 0; j < 512; j++) {
						outputslice.at<float>(i, j) = model_output[512 * 256 * frameOffset + 512 * i + j];
					}
				}
				cv::namedWindow("Display window", cv::WINDOW_AUTOSIZE);
				cv::imshow("Display window", outputslice);                   
				cv::waitKey(0);

				predictionsPerSlice[frame + frameOffset].push_back(outputslice);
			}
		}
		else {
			std::cout << "Code = " << code << std::endl;
		}
		lumen_images.clear(); // Clearing the vector
	}

But the output slice is different from the output I get using python and the same weights.

I made sure that after pre-processing the image, the pixel values of the input stack were exactly the same as that in python. But the same weights yield different results.

What am I missing? kindly help.

Soumya

TF_INVALID_ARGUMENT

When I run session_run.cpp, I get a "TF_INVALID_ARGUMENT" error.
What's wrong?

how to compile tensorflow.dll

Hi.
I am not sure if I am asking a silly question: how can I compile the tensorflow source into a dll?

I googled for a while and tried:

System:

  • Python: python 3.7
  • Compiler: Visual Studio 2019
  • Tensorflow: r1.6 and r1.13
cmake .. -G"Visual Studio 16 2019" -Thost=x64 ^
-DCMAKE_BUILD_TYPE=Release ^
-Dtensorflow_VERBOSE=ON ^
-Dtensorflow_ENABLE_GRPC_SUPPORT=OFF ^
-Dtensorflow_BUILD_PYTHON_BINDINGS=OFF ^
-Dtensorflow_BUILD_CC_EXAMPLE=OFF ^
-Dtensorflow_BUILD_SHARED_LIB=ON ^
-Dtensorflow_ENABLE_GPU=ON ^
-DPYTHON_EXECUTABLE=C:\Users\wang\AppData\Local\Programs\Python\Python37\python.exe ^
-Dtensorflow_BUILD_SHARED_LIB=ON ^ 
-DCUDA_HOST_COMPILER="C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64\cl.exe"

Then, I build

MSBuild /p:Configuration=Release /verbosity:detailed tensorflow.vcxproj

There are errors, and I have no clue where to start fixing them.

cuda_driver.cc:175] Check failed

hi,

My environment is:
Windows 10, VS2017, GPU 2080Ti, Driver Version: 442.50, CUDA Version: cuda_10.1.243_426.00_win10, cudnn-10.1-windows10-x64-v7.6.3.30.

  1. I have a trained weights model of deep learning neural network. The model could be correctly run on my computer on tensorflow-gpu 1.14.0 with GPU driver, CUDA, and cudnn.
  2. On the same computer. I could run model correctly on CPU with C++ and without GPU. My C++ code is almost the same as the code in interface.cpp file in Neargye's git repo.
  3. The above two experiments prove that my GPU driver, CUDA, and cudnn are compatible, and that the model/C++ code works well. Tensorflow-gpu 1.14.0 is also compatible with my GPU. My next step is to run the C++ code with GPU support, but I encountered the issue below.

Do you have more ideas about the issue I am facing?
I suspect the GPU dll 1.14.0 (https://github.com/Neargye/tensorflow/releases) may not support my GPU (2080Ti, Driver Version: 442.50, CUDA Version: 10.1).

2020-03-11 15:04:53.921472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
2020-03-11 15:04:53.928998: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-11 15:04:53.933142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-03-11 15:04:54.058685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-11 15:04:54.063230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2020-03-11 15:04:54.067249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2020-03-11 15:04:54.070680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5632 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-03-11 15:04:57.051935: F tensorflow/stream_executor/cuda/cuda_driver.cc:175] Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error: invalid argument

Thanks,
Ardeal

Why GPU is slower than CPU

Hello, I tried this project. I have trained an LSTM model and load it to predict on my data. I find the GPU is slower than the CPU.
gpu:nvidia TITAN X
CUDA:9.0
cudnn:7.0
cpu: intel E5
I predicted about 200 samples, and each one calls the function:
TF_SessionRun(sess,
nullptr, // Run options.
&input_op, &input_tensor, 1, // Input tensors, input tensor values, number of inputs.
&out_op, &output_tensor, 1, // Output tensors, output tensor values, number of outputs.
nullptr, 0, // Target operations, number of targets.
nullptr, // Run metadata.
status // Output status.
);
Every time, the GPU is slower than the CPU.
Is my method wrong? Is there a way to increase the speed? And can I enter data in batches for prediction?
thanks

cmake error Unsupported protocol"Protocol "https"

CMake Error at CMakeLists.txt:40 (message):
error downloading tensorflow lib: "Unsupported protocol"Protocol "https"
not supported or disabled in libcurl

Hello, I am confused by this error. How can I solve this problem? Can you help me?

C library for Android

Your repo helps me a lot. I can run Tensorflow models in Ubuntu and Windows now.
My problem is that I have a system running Android; how can I generate libtensorflow.so for Android?
Could you give me some instructions?

Thanks

TF_INVALID_ARGUMENT

TF_GraphImportGraphDef(graph, buffer, opts, status);
TF_GetCode(status) = 3 (TF_INVALID_ARGUMENT)
What should I do? Thanks.
