
java.lang.UnsatisfiedLinkError: dlopen failed: library "/Users/junchenzhao/Dist-CPU-Learn/android/distributed_inference_demo/test1/src/main/cpp/lib/libonnxruntime4j_jni.so" not found (onnxruntime-training-examples, 12 comments, closed)

microsoft commented on May 20, 2024
java.lang.UnsatisfiedLinkError: dlopen failed: library "/Users/junchenzhao/Dist-CPU-Learn/android/distributed_inference_demo/test1/src/main/cpp/lib/libonnxruntime4j_jni.so" not found


Comments (12)

baijumeswani commented on May 20, 2024

What is the reason behind using the `libonnxruntime4j_jni.so` library directly?

If you're intending to use the Java bindings in your application, you could directly leverage the ORT Java bindings via the AAR file here: https://central.sonatype.com/artifact/com.microsoft.onnxruntime/onnxruntime-training-android/1.15.1

I'll try to see if I can reproduce the error on my end.

baijumeswani commented on May 20, 2024

I just tried it on my end and was able to successfully build the application. Not sure what might be going on with your env.

zjc664656505 commented on May 20, 2024

Hi Baiju,

I solved this issue just this morning and forgot to update the thread. My fix was to create a folder under the directory src/main/, as shown below:

(screenshot: project tree showing the new folder containing the native libraries under src/main/)

Then I load the native libonnxruntime4j_jni.so library in MainActivity.kt like this:

    companion object {
        // Load the native libraries on application startup.
        init {
            System.loadLibrary("onnxruntime4j_jni")
            System.loadLibrary("distributed_inference_demo")
        }
    }

The reason for using it is that we want to convert a Java OnnxTensor into a C++ Ort::Value tensor.

Currently, we are deploying quantized LLMs to Android devices using ONNX, at around 1.2 GB per model. However, if we create the session directly with ONNX Runtime in the Java environment, it results in an OutOfMemoryError because of the JVM memory limit. So we used the Android NDK with the ONNX Runtime C++ API to load the model and run inference, which avoids that OutOfMemoryError.
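
For context, the C++ side itself is just the standard ONNX Runtime C++ API. A simplified sketch of what we do through the NDK looks like this (the model path and the tensor names "input_ids"/"logits" here are placeholders, not the real ones):

#include <onnxruntime_cxx_api.h>
#include <vector>

// Simplified sketch of the C++-side setup used through the NDK.
// The model path and the tensor names are placeholders.
std::vector<Ort::Value> RunOnce(Ort::Env& env, const char* model_path,
                                Ort::Value& input_tensor) {
    Ort::SessionOptions options;
    options.SetIntraOpNumThreads(4);  // tune for the target device

    // The model is loaded and run entirely outside the JVM heap,
    // which is what avoids the OutOfMemoryError on the Java side.
    Ort::Session session(env, model_path, options);

    const char* input_names[] = {"input_ids"};
    const char* output_names[] = {"logits"};
    return session.Run(Ort::RunOptions{nullptr},
                       input_names, &input_tensor, 1,
                       output_names, 1);
}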

New Issue - Tensor Conversion between Java and C++

We tried running inference directly in the C++ backend by creating a random int64 Ort::Value tensor, and that worked. However, the remaining problems are:

  1. The input is an OnnxTensor created on the Java side (which requires the onnxruntime package in Java); it needs to be converted into a C++ Ort::Value tensor so that inference can run in the C++ backend.
  2. The resulting Ort::Value tensor from the C++ backend needs to be converted back into an OnnxTensor in Java.

We are wondering whether there is an API that lets us do this automatically, or whether we have to write our own.
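
For concreteness, the Java-to-C++ direction we have in mind would look roughly like the sketch below (the JNI class and function names are made up, and it assumes a float32 tensor whose data comes from Java as a direct ByteBuffer together with its shape):

#include <jni.h>
#include <onnxruntime_cxx_api.h>
#include <vector>

// Hypothetical JNI entry point: wrap a float buffer coming from Java into an
// Ort::Value without copying, so it can be fed to session.Run().
extern "C" JNIEXPORT jlong JNICALL
Java_com_example_demo_TensorBridge_wrapFloatTensor(JNIEnv* env, jobject /*thiz*/,
                                                   jobject data_buffer,
                                                   jlongArray shape_array) {
    float* data = static_cast<float*>(env->GetDirectBufferAddress(data_buffer));
    jlong byte_len = env->GetDirectBufferCapacity(data_buffer);

    jsize rank = env->GetArrayLength(shape_array);
    std::vector<int64_t> shape(rank);
    env->GetLongArrayRegion(shape_array, 0, rank,
                            reinterpret_cast<jlong*>(shape.data()));

    Ort::MemoryInfo mem_info =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    // The Ort::Value does not own "data"; the Java buffer must outlive it,
    // and the returned handle must eventually be deleted on the native side.
    auto* value = new Ort::Value(Ort::Value::CreateTensor<float>(
        mem_info, data, static_cast<size_t>(byte_len) / sizeof(float),
        shape.data(), shape.size()));
    return reinterpret_cast<jlong>(value);  // opaque handle returned to Java
}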

Please let me know.

Thanks a lot!

Best,
Junchen

baijumeswani commented on May 20, 2024

Hi Junchen.

I am not aware of any API that does this directly. But you can refer to how we do it for our train step function in the java bindings:

  1. The java bindings
  2. The corresponding jni code

Let me know if there are any further questions.

zjc664656505 commented on May 20, 2024

Thanks Baiju. We have fixed the issue. I will close this issue.

zjc664656505 commented on May 20, 2024

Hi Baiju,

I have a further question, if that's okay. I'm currently trying to serialize and deserialize the ONNX model output in C++, which is a std::vector<Ort::Value>. So far I have not found any related C++ API in ONNX Runtime. Is there a recommended way to do this?

baijumeswani commented on May 20, 2024

Hi Junchen,

You could get the underlying float buffer from each of the Ort::Values inside the vector. Then you can choose to serialize it however you wish.
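
Something along these lines, for example (a rough sketch assuming float tensors; adapt the element type as needed):

#include <onnxruntime_cxx_api.h>
#include <cstring>
#include <vector>

// Copy the raw float data of one Ort::Value into a byte buffer.
std::vector<char> TensorBytes(const Ort::Value& value) {
    Ort::TensorTypeAndShapeInfo info = value.GetTensorTypeAndShapeInfo();
    const float* data = value.GetTensorData<float>();
    size_t num_bytes = info.GetElementCount() * sizeof(float);

    std::vector<char> bytes(num_bytes);
    std::memcpy(bytes.data(), data, num_bytes);
    return bytes;
}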

zjc664656505 commented on May 20, 2024

Hi Baiju,

Thanks a lot for your help!

I have successfully serialized the tensor vector using float buffer.

Here is my approach to serializing the std::vector<Ort::Value>:

std::vector<char> SerializeTensorVectorToBytes(const std::vector<Ort::Value>& tensors) {
        std::vector<char> bytes;

        size_t numTensors = tensors.size();
        const char* dataPtr = reinterpret_cast<const char*>(&numTensors);
        bytes.insert(bytes.end(), dataPtr, dataPtr + sizeof(size_t));

        for (const auto& tensor : tensors) {
            if (!tensor.IsTensor()) {
                std::cerr << "Skipping non-tensor Ort::Value." << std::endl;
                continue;
            }

            const float* floatArr = tensor.GetTensorData<float>();
            Ort::TensorTypeAndShapeInfo info = tensor.GetTensorTypeAndShapeInfo();
            size_t elementCount = info.GetElementCount();

            // Get the shape of the tensor
            std::vector<int64_t> shape = info.GetShape();
            size_t numDimensions = shape.size();

            const char* elementCountPtr = reinterpret_cast<const char*>(&elementCount);
            bytes.insert(bytes.end(), elementCountPtr, elementCountPtr + sizeof(size_t));

            const char* tensorDataPtr = reinterpret_cast<const char*>(floatArr);
            bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(float));

            // Write the number of dimensions to the bytes
            const char* numDimensionsPtr = reinterpret_cast<const char*>(&numDimensions);
            bytes.insert(bytes.end(), numDimensionsPtr, numDimensionsPtr + sizeof(size_t));

            // Write each dimension to the bytes
            for (int64_t dimension : shape) {
                const char* dimensionPtr = reinterpret_cast<const char*>(&dimension);
                bytes.insert(bytes.end(), dimensionPtr, dimensionPtr + sizeof(int64_t));
            }
        }

        return bytes;
}
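
(The byte vector can then be stored or sent however you like; for example, dumping it to a file is just the following sketch, with an arbitrary path:)

#include <fstream>
#include <string>
#include <vector>

// Write the serialized tensors to disk.
void WriteBytesToFile(const std::vector<char>& bytes, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}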

I encountered an issue during deserialization of the model's output vector, which contains multiple Ort::Values with different data types. To deserialize the data, I need to recreate the Ort::Values with Ort::Value::CreateTensor based on the serialized tensorType information. However, the tensorType values come from TensorProto, and I need to map them to the corresponding C++ data types (int8_t, uint8_t, etc.) in order to create the Ort::Values.

At the moment, I'm not sure if there's an alternative way to handle this, or if I need to handle the data types manually during deserialization. Here is the code snippet of my current deserialization implementation where I'm facing this problem:

  std::vector<Ort::Value> DeserializeTensorVectorFromBytes(const std::vector<char>& bytes) {
      std::vector<Ort::Value> tensors;

      const char* dataPtr = bytes.data();
      const char* endPtr = bytes.data() + bytes.size();

      if (endPtr - dataPtr < sizeof(size_t)) {
          std::cerr << "Not enough data to deserialize." << std::endl;
          return tensors;
      }

      size_t numTensors = *reinterpret_cast<const size_t*>(dataPtr);
      dataPtr += sizeof(size_t);

      for (size_t i = 0; i < numTensors; ++i) {
          if (endPtr - dataPtr < sizeof(size_t)) {
              std::cerr << "Not enough data to deserialize tensor." << std::endl;
              return tensors;
          }

          size_t elementCount = *reinterpret_cast<const size_t*>(dataPtr);
          dataPtr += sizeof(size_t);

          if (endPtr - dataPtr < elementCount * sizeof(float)) {
              std::cerr << "Not enough data to deserialize tensor data." << std::endl;
              return tensors;
          }

          const float* floatArr = reinterpret_cast<const float*>(dataPtr);
          dataPtr += elementCount * sizeof(float);

          if (endPtr - dataPtr < sizeof(size_t)) {
              std::cerr << "Not enough data to deserialize tensor shape." << std::endl;
              return tensors;
          }

          size_t numDimensions = *reinterpret_cast<const size_t*>(dataPtr);
          dataPtr += sizeof(size_t);

          if (endPtr - dataPtr < numDimensions * sizeof(int64_t)) {
              std::cerr << "Not enough data to deserialize tensor shape." << std::endl;
              return tensors;
          }

          std::vector<int64_t> shape(numDimensions);
          for (size_t j = 0; j < numDimensions; ++j) {
              shape[j] = *reinterpret_cast<const int64_t*>(dataPtr);
              dataPtr += sizeof(int64_t);
          }

          Ort::AllocatorWithDefaultOptions allocator;

          // ? indicates where I'm currently stuck
          Ort::Value tensor = Ort::Value::CreateTensor< ? >(allocator, shape.data(), shape.size());
          float* tensorData = tensor.GetTensorMutableData< ? >();
          std::copy(floatArr, floatArr + elementCount, tensorData);

          tensors.push_back(std::move(tensor));
      }

      return tensors;
  }

Any suggestions or insights on how to efficiently handle the data types during deserialization would be greatly appreciated. Thank you!

zjc664656505 commented on May 20, 2024

I think I have solved the issue:

Here is my updated code:

For serialization:

std::vector<char> SerializeTensorVectorToBytes(const std::vector<Ort::Value>& tensors) {
    std::vector<char> bytes;

    size_t numTensors = tensors.size();
    const char* dataPtr = reinterpret_cast<const char*>(&numTensors);
    bytes.insert(bytes.end(), dataPtr, dataPtr + sizeof(size_t));

    for (const auto& tensor : tensors) {
        if (!tensor.IsTensor()) {
            std::cerr << "Skipping non-tensor Ort::Value." << std::endl;
            continue;
        }

        Ort::TensorTypeAndShapeInfo info = tensor.GetTensorTypeAndShapeInfo();
        size_t elementCount = info.GetElementCount();

        // Record the current size of bytes to calculate the size of the added data
        size_t initialSize = bytes.size();

        // Write the tensor type to the bytes
        ONNXTensorElementDataType tensorType = info.GetElementType();
        const char* tensorTypePtr = reinterpret_cast<const char*>(&tensorType);
        bytes.insert(bytes.end(), tensorTypePtr, tensorTypePtr + sizeof(ONNXTensorElementDataType));


        // Get the shape of the tensor
        std::vector<int64_t> shape = info.GetShape();
        size_t numDimensions = shape.size();


        // Write the number of dimensions to the bytes
        const char* numDimensionsPtr = reinterpret_cast<const char*>(&numDimensions);
        bytes.insert(bytes.end(), numDimensionsPtr, numDimensionsPtr + sizeof(size_t));

        // Write each dimension to the bytes
        for (int64_t dimension : shape) {
            const char* dimensionPtr = reinterpret_cast<const char*>(&dimension);
            bytes.insert(bytes.end(), dimensionPtr, dimensionPtr + sizeof(int64_t));
        }

        size_t elementSize;
        // Write the tensor data to the bytes
        switch (tensorType) {
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: {
                const float* tensorData = tensor.GetTensorData<float>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(float));
                elementSize = sizeof(float);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: {
                const int8_t* tensorData = tensor.GetTensorData<int8_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int8_t));
                elementSize = sizeof(int8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: {
                const uint8_t* tensorData = tensor.GetTensorData<uint8_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint8_t));
                elementSize = sizeof(uint8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: {
                const uint16_t* tensorData = tensor.GetTensorData<uint16_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint16_t));
                elementSize = sizeof(uint16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: {
                const int16_t* tensorData = tensor.GetTensorData<int16_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int16_t));
                elementSize = sizeof(int16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: {
                const int32_t* tensorData = tensor.GetTensorData<int32_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int32_t));
                elementSize = sizeof(int32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: {
                const int64_t* tensorData = tensor.GetTensorData<int64_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(int64_t));
                elementSize = sizeof(int64_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: {
                const bool* tensorData = tensor.GetTensorData<bool>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(bool));
                elementSize = sizeof(bool);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: {
                const double* tensorData = tensor.GetTensorData<double>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(double));
                elementSize = sizeof(double);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: {
                const uint32_t* tensorData = tensor.GetTensorData<uint32_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint32_t));
                elementSize = sizeof(uint32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: {
                const uint64_t* tensorData = tensor.GetTensorData<uint64_t>();
                const char* tensorDataPtr = reinterpret_cast<const char*>(tensorData);
                bytes.insert(bytes.end(), tensorDataPtr, tensorDataPtr + elementCount * sizeof(uint64_t));
                elementSize = sizeof(uint64_t);
                break;
            }
            default:
                std::cerr << "Unsupported tensor type for serialization: " << tensorType << std::endl;
                elementSize = 0;  // nothing was written for this tensor's data
                break;
        }


        // Calculate the expected size
        size_t expectedSize = sizeof(ONNXTensorElementDataType) // size of the tensor type
                              + sizeof(size_t) // size of the number of dimensions
                              + (sizeof(int64_t) * numDimensions) // size of the tensor shape
                              + (elementSize * elementCount); // size of the tensor data

        // Verify the total size of the serialized data for each tensor
        size_t actualSize = bytes.size() - initialSize;  // size of the added data for the current tensor
        if (actualSize != expectedSize) {
            std::cerr << "Error: Serialized tensor size (" << actualSize
                      << ") does not match expected size (" << expectedSize << ")." << std::endl;
        }
    }
    return bytes;
}

For Deserialization:

std::vector<Ort::Value> DeserializeTensorVectorFromBytes(const std::vector<char>& bytes) {
    std::vector<Ort::Value> tensors;

    const char* dataPtr = bytes.data();

    size_t numTensors = *reinterpret_cast<const size_t*>(dataPtr);
    dataPtr += sizeof(size_t);

    for (size_t i = 0; i < numTensors; ++i) {
        ONNXTensorElementDataType tensorType = *reinterpret_cast<const ONNXTensorElementDataType*>(dataPtr);
        dataPtr += sizeof(ONNXTensorElementDataType);

        size_t numDimensions = *reinterpret_cast<const size_t*>(dataPtr);
        dataPtr += sizeof(size_t);

        std::vector<int64_t> shape(numDimensions);
        size_t elementCount = 1;
        for (size_t j = 0; j < numDimensions; ++j) {
            shape[j] = *reinterpret_cast<const int64_t*>(dataPtr);
            dataPtr += sizeof(int64_t);
            elementCount *= shape[j];
        }

        Ort::AllocatorWithDefaultOptions allocator;
        // Placeholder value (Ort::Value has no default constructor); it is
        // replaced by a correctly typed tensor in the switch below.
        Ort::Value tensor = Ort::Value::CreateTensor<float>(allocator, shape.data(), numDimensions);

        switch (tensorType) {
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: {
                tensor = Ort::Value::CreateTensor<float>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<float>(), dataPtr, elementCount * sizeof(float));
                dataPtr += elementCount * sizeof(float);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: {
                tensor = Ort::Value::CreateTensor<int8_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int8_t>(), dataPtr, elementCount * sizeof(int8_t));
                dataPtr += elementCount * sizeof(int8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: {
                tensor = Ort::Value::CreateTensor<uint8_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint8_t>(), dataPtr, elementCount * sizeof(uint8_t));
                dataPtr += elementCount * sizeof(uint8_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: {
                tensor = Ort::Value::CreateTensor<uint16_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint16_t>(), dataPtr, elementCount * sizeof(uint16_t));
                dataPtr += elementCount * sizeof(uint16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: {
                tensor = Ort::Value::CreateTensor<int16_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int16_t>(), dataPtr, elementCount * sizeof(int16_t));
                dataPtr += elementCount * sizeof(int16_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: {
                tensor = Ort::Value::CreateTensor<int32_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int32_t>(), dataPtr, elementCount * sizeof(int32_t));
                dataPtr += elementCount * sizeof(int32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: {
                tensor = Ort::Value::CreateTensor<int64_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<int64_t>(), dataPtr, elementCount * sizeof(int64_t));
                dataPtr += elementCount * sizeof(int64_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: {
                tensor = Ort::Value::CreateTensor<bool>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<bool>(), dataPtr, elementCount * sizeof(bool));
                dataPtr += elementCount * sizeof(bool);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: {
                tensor = Ort::Value::CreateTensor<double>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<double>(), dataPtr, elementCount * sizeof(double));
                dataPtr += elementCount * sizeof(double);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: {
                tensor = Ort::Value::CreateTensor<uint32_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint32_t>(), dataPtr, elementCount * sizeof(uint32_t));
                dataPtr += elementCount * sizeof(uint32_t);
                break;
            }
            case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: {
                tensor = Ort::Value::CreateTensor<uint64_t>(allocator, shape.data(), numDimensions);
                std::memcpy(tensor.GetTensorMutableData<uint64_t>(), dataPtr, elementCount * sizeof(uint64_t));
                dataPtr += elementCount * sizeof(uint64_t);
                break;
            }
            default:
                std::cerr << "Unsupported tensor type for deserialization: " << tensorType << std::endl;
                break;
        }

        tensors.push_back(std::move(tensor));
    }

    return tensors;
}
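
For anyone following along, here is the kind of round-trip sanity check I use with the two functions above (my own test sketch, assuming a single float tensor):

#include <cassert>
#include <cstring>

// Serialize one float tensor and verify the deserialized copy matches.
void RoundTripCheck() {
    Ort::AllocatorWithDefaultOptions allocator;
    std::vector<int64_t> shape = {2, 3};

    Ort::Value original = Ort::Value::CreateTensor<float>(allocator, shape.data(), shape.size());
    float* data = original.GetTensorMutableData<float>();
    for (int i = 0; i < 6; ++i) data[i] = static_cast<float>(i);

    std::vector<Ort::Value> tensors;
    tensors.push_back(std::move(original));

    std::vector<char> bytes = SerializeTensorVectorToBytes(tensors);
    std::vector<Ort::Value> restored = DeserializeTensorVectorFromBytes(bytes);

    Ort::TensorTypeAndShapeInfo info = restored[0].GetTensorTypeAndShapeInfo();
    assert(info.GetShape() == shape);
    assert(std::memcmp(restored[0].GetTensorData<float>(),
                       tensors[0].GetTensorData<float>(), 6 * sizeof(float)) == 0);
}

One possible cleanup I haven't done yet is factoring the repeated per-type cases into a small template helper, since the only thing that changes between the cases is the element type.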

zjc664656505 commented on May 20, 2024

May I send a PR to the ONNX Runtime codebase to add this functionality, since I think other people may need it in the future?

zjc664656505 commented on May 20, 2024

Also, my current approach does not support bfloat16, std::string, float16, or the complex types. I may add them later on.
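
If it helps, I think float16 could be handled without a dedicated C++ element type by copying the raw 2-byte elements and using the non-templated CreateTensor overload that takes an explicit ONNXTensorElementDataType. An untested sketch of such a helper (same includes as the code above) would be:

// Untested sketch: deserialize one float16 tensor by copying its raw 2-byte
// elements, using the CreateTensor overload that takes an explicit element type.
Ort::Value DeserializeFloat16Tensor(Ort::AllocatorWithDefaultOptions& allocator,
                                    const char*& dataPtr,
                                    const std::vector<int64_t>& shape,
                                    size_t elementCount) {
    Ort::Value tensor = Ort::Value::CreateTensor(
        allocator, shape.data(), shape.size(),
        ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16);
    // float16 elements are 2 bytes; copy them without interpreting the values.
    std::memcpy(tensor.GetTensorMutableData<uint16_t>(), dataPtr,
                elementCount * sizeof(uint16_t));
    dataPtr += elementCount * sizeof(uint16_t);
    return tensor;
}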

wschin commented on May 20, 2024

This issue has had no updates for 6+ months. I am going to close it; feel free to reopen.
