
ONNX Runtime Ruby

🔥 ONNX Runtime - the high performance scoring engine for ML models - for Ruby

Check out an example


Installation

Add this line to your application’s Gemfile:

gem "onnxruntime"

Getting Started

Load a model and make predictions

model = OnnxRuntime::Model.new("model.onnx")
model.predict({x: [1, 2, 3]})
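
predict returns a hash mapping each output name to its values. A minimal sketch, assuming a model with a single output named y (both the name and the values are hypothetical):

result = model.predict({x: [1, 2, 3]})
result.keys # => ["y"] (hypothetical output name)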

Download pre-trained models from the ONNX Model Zoo

Get inputs

model.inputs
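
This returns an array of hashes describing each input's name, type, and shape, along these lines (illustrative; -1 marks a dynamic dimension):

# => [{:name=>"x", :type=>"tensor(float)", :shape=>[-1, 3]}]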

Get outputs

model.outputs

Get metadata

model.metadata
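
This returns a hash of model metadata with keys along the lines of the following (a sketch; the exact keys and values depend on the model and gem version):

# => {:custom_metadata_map=>{}, :description=>"", :domain=>"", :graph_name=>"main_graph", :producer_name=>"pytorch", :version=>0}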

Load a model from a string or other IO object

io = StringIO.new("...")
model = OnnxRuntime::Model.new(io)
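
For example, to load model bytes you have already read into memory:

require "stringio"

io = StringIO.new(File.binread("model.onnx"))
model = OnnxRuntime::Model.new(io)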

Get specific outputs

model.predict({x: [1, 2, 3]}, output_names: ["label"])

Session Options

OnnxRuntime::Model.new(path_or_io, {
  enable_cpu_mem_arena: true,
  enable_mem_pattern: true,
  enable_profiling: false,
  execution_mode: :sequential,    # :sequential or :parallel
  free_dimension_overrides_by_denotation: nil,
  free_dimension_overrides_by_name: nil,
  graph_optimization_level: nil,  # :none, :basic, :extended, or :all
  inter_op_num_threads: nil,
  intra_op_num_threads: nil,
  log_severity_level: 2,
  log_verbosity_level: 0,
  logid: nil,
  optimized_model_filepath: nil,
  profile_file_prefix: "onnxruntime_profile_",
  session_config_entries: nil
})
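
For example, to write an optimized copy of the graph to disk while capping the intra-op threadpool (the option values here are illustrative):

model = OnnxRuntime::Model.new("model.onnx", {
  graph_optimization_level: :all,
  intra_op_num_threads: 2,
  optimized_model_filepath: "model.opt.onnx"
})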

Run Options

model.predict(input_feed, {
  output_names: nil,
  log_severity_level: 2,
  log_verbosity_level: 0,
  logid: nil,
  terminate: false,
  output_type: :ruby       # :ruby or :numo
})
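
For example, to get predictions back as Numo::NArray objects instead of Ruby arrays (this assumes the numo-narray gem is installed):

require "numo/narray"

model.predict({x: [1, 2, 3]}, output_type: :numo)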

Inference Session API

You can also use the Inference Session API, which follows the Python API.

session = OnnxRuntime::InferenceSession.new("model.onnx")
session.run(nil, {x: [1, 2, 3]})
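
The first argument selects which outputs to fetch; nil fetches all of them. To request specific outputs by name (the name label here is illustrative):

session.run(["label"], {x: [1, 2, 3]})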

The Python example models are included as well.

OnnxRuntime::Datasets.example("sigmoid.onnx")
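
Assuming this helper mirrors Python's get_example, it returns the path to a bundled model file, which loads like any other model:

path = OnnxRuntime::Datasets.example("sigmoid.onnx")
model = OnnxRuntime::Model.new(path)
model.inputs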

GPU Support

To enable GPU support on Linux and Windows, download the appropriate GPU release and set:

OnnxRuntime.ffi_lib = "path/to/lib/libonnxruntime.so" # onnxruntime.dll for Windows

and use:

model = OnnxRuntime::Model.new("model.onnx", providers: ["CUDAExecutionProvider"])
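
To confirm which execution providers the loaded library exposes, the Inference Session API has a providers method (also shown in the GPU issue below):

session = OnnxRuntime::InferenceSession.new("model.onnx")
session.providers
# => e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"], depending on the build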

History

View the changelog

Contributing

Everyone is encouraged to help improve this project, whether by reporting bugs, suggesting new features, or improving the documentation and code.

To get started with development and testing:

git clone https://github.com/ankane/onnxruntime-ruby.git
cd onnxruntime-ruby
bundle install
bundle exec rake vendor:all
bundle exec rake test


onnxruntime-ruby's Issues

execution_mode argument raises ArgumentError

Hey @ankane, thank you for this awesome gem! It works perfectly with all the other session options except execution_mode.

model = OnnxRuntime::Model.new('model.onnx', { execution_mode: :sequential })

For some reason, this fails with the error message:

ArgumentError (wrong number of arguments (2 for 0))

Is that a bug? I tried to look into it, but it seems to be linked to FFI, which is beyond my skills at the moment.

Can't get inference running on GPU

First off, great work! I have loaded the yolov8n.onnx model from Ultralytics, and it runs on the CPU no problem.

When attempting to run on my GPU, I can see that the CUDAExecutionProvider is available:

require "onnxruntime"
require "mini_magick"
require "numo/narray"
OnnxRuntime.ffi_lib = "/app/onnxruntime-linux-x64-gpu-1.15.1/lib/libonnxruntime.so"

session = OnnxRuntime::InferenceSession.new "yolov8n.onnx"
session.providers

=> ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]

But when I run predictions, it only uses the CPU.

When input is bool tensor

I am trying to make it work with a network that accepts the input as a boolean tensor, but something is wrong.

In inference_session.rb:226 there is this code:

if tensor_type == :bool
  tensor_type = :uchar
  flat_input = flat_input.map { |v| v ? 1 : 0 }
end

So it detects a bool type from the ONNX model, which means the model is designed to accept bool. Then it sets the type to uchar.
What happens next for me is that inference produces the error OnnxRuntime::Error: type 17 is not supported in this function, which, as I understand it, kind of makes sense.

One workaround would be, I guess, to make the ONNX model accept the tensor as uchar and convert it back to bool inside its forward function. But for some reason that gives me weird and inconsistent Gather errors. Worse, it requires changing the model's architecture, which means I can't use it with models that were trained before the architecture change.

The other thing I tried is doing this there (I took the code from FFI::Pointer#write_array_of_type):

if tensor_type == :bool
  size = ::FFI.type_size(::FFI::TYPE_BOOL)
  flat_input.each_with_index { |val, i|
    break unless i < input_tensor_values.size
    input_tensor_values.write(::FFI::TYPE_BOOL, val)
  }
end

But that totally doesn't work.

I would love to hear if you have any experience with this. What I don't understand is why it's not possible to just send the array of bools natively. Why did you even have to add this case for tensor_type == :bool and convert them to bytes?

Ideas

  • Add support for TensorRT - tensor-rt branch
  • Add support for CoreML (waiting for C API)
  • Add support for global threadpool - global_threadpool branch

Calling predict with input_feed of wrong length doesn't raise an error?

It seems the Ruby API is inconsistent with the Python API, for which this is an error condition. In Ruby we can do things like model.predict(float_input: []), whereas in Python this would raise an error when the model expects inputs.

Note: this is true for any input length that doesn't match the expected size.

Happy to take a stab at a PR for this, but wanted feedback as this would be a breaking change...

Inquiry About Merging the TensorRT Branch

I see there is a tensor-rt branch that adds TensorRT support, but it hasn't been merged. May I ask whether any issues were encountered, and is there an ETA for when it might be merged? Thank you.

Instantiated model occasionally... goofs? Requiring re-instantiation

Hello! I wish I could better describe the problem I'm encountering, but hopefully I can demonstrate it effectively.

I find myself scoring a whole bunch of things serially. The dummy approach of loading a new model every time was a waste of time, so I decided to create a lil singleton that would load the model once and then run predictions as needed. A problem I'm encountering is that after some indeterminate number of predictions the model will kinda... stop working right. I'll get errors like Invalid Feed Input Name:��Y� or input name cannot be empty.

If I then reload the model and run the same inputs that just failed it'll carry on successfully for a while.

I wrote up a little test scenario to demonstrate. Forgive me if I don't attach the model as it's, you know, the corporate sauce. Hopefully this will be useful to you without that.

require "singleton"
require "onnxruntime"

class TrainedModel
  include Singleton

  MODEL_NAME = "trained_model.onnx"

  attr_reader :model
  attr_accessor :consecutive_successes

  def self.predict(input)
    results = instance.model.predict(input)
    instance.consecutive_successes += 1
    results
  rescue OnnxRuntime::Error => e
    puts "Reloading model due to exception: #{e}. Consecutive successes: #{instance.consecutive_successes}"
    instance.consecutive_successes = 0
    instance.load_model
    instance.model.predict(input)
  end

  def initialize
    @consecutive_successes = 0
    load_model
  end

  def predict(input)
    model.predict(input)
  end

  def load_model
    @model = OnnxRuntime::Model.new(model_path)
  end

  def model_path
    File.join(File.dirname(__FILE__), self.class::MODEL_NAME)
  end
end

# Our model accepts an array of 10 #s betwixt 0.0 and 1.0
input = Array.new(10) { 1.0 }

n = 100_000

n.times do
  TrainedModel.predict({"input:0" => [input]})
end

puts "Final consecutive successes: #{TrainedModel.instance.consecutive_successes}"

What's curious is that the failures seem to be in the earliest set of predictions, as shown in a few runs of the test:

➜  onnxruntime_test ruby script.rb
Reloading model due to exception: input name cannot be empty. Consecutive successes: 1669
Final consecutive successes: 98330
➜  onnxruntime_test ruby script.rb
Reloading model due to exception: Invalid Feed Input Name:��Wb�. Consecutive successes: 80
Final consecutive successes: 99919
➜  onnxruntime_test ruby script.rb
Reloading model due to exception: input name cannot be empty. Consecutive successes: 761
Reloading model due to exception: Invalid Feed Input Name:��F�. Consecutive successes: 1674
Final consecutive successes: 97563
➜  onnxruntime_test ruby script.rb
Reloading model due to exception: Invalid Feed Input Name:�����. Consecutive successes: 216
Reloading model due to exception: Invalid Feed Input Name:�����. Consecutive successes: 43
Final consecutive successes: 99739

Anyways, this retry strategy is gonna get my feature delivered, but it sure is curious.

I'm on an M1 Mac Mini but a coworker saw similar results on an Intel-based Mac.

YAMNet error

I'm getting the following error when trying to run a prediction using a converted YAMNet TFLite model.

[E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Slice node. Name:'yamnet_frames/tf_op_layer_StridedSlice_2/StridedSlice_2;StatefulPartitionedCall/yamnet_frames/tf_op_layer_StridedSlice_2/StridedSlice_2' Status Message: slice.cc:197 FillVectorsFromInput Starts and axes shape mismatch

I started by downloading the TFLite model from TensorFlow Hub, then used tf2onnx to convert the model to ONNX from the terminal:

python -m tf2onnx.convert --tflite lite-model_yamnet_tflite_1.tflite --output yamnet.onnx

Here's the Ruby code that I used to invoke the model. I've used this exact same sample array to run predictions in Node.js using the YAMNet TensorFlow.js model, so I'm expecting the same results here in Ruby/ONNX.

model = OnnxRuntime::Model.new("yamnet.onnx")
samples = [-0.000396728515625, 0, 0.000457763671875, 0.000457763671875, ...]
model.predict({ "waveform" => samples })

This is the shape of the inputs:

[{:name=>"waveform", :type=>"tensor(float)", :shape=>[-1]}]

Do you have any thoughts as to why I'm getting this error? Any suggestions on how to resolve it? Thanks!
