laurentmazare / ocaml-torch

OCaml bindings for PyTorch

License: Apache License 2.0

OCaml 52.22% Makefile 0.05% C++ 24.34% Jupyter Notebook 0.18% Python 0.04% C 23.17%
ocaml pytorch machine-learning artificial-intelligence neural-network tensor gpu deep-learning

ocaml-torch's Introduction

Warning

Development for this repo has moved to https://github.com/janestreet/torch. As of 2023-05-10, the version in the new repo only supports PyTorch 1.13.0, so you should continue using the old repo for PyTorch 2.0.0 support.

ocaml-torch

ocaml-torch provides OCaml bindings for the PyTorch tensor library. This brings NumPy-like tensor computations with GPU acceleration and tape-based automatic differentiation to OCaml.

These bindings use the PyTorch C++ API and are mostly automatically generated. The current GitHub tip and the opam package v0.7 correspond to PyTorch v2.0.0.

On Linux, note that you will need a PyTorch build that uses the cxx11 ABI: either the cpu version or the cuda 11.7 version.

Opam Installation

The opam package can be installed using the following command. This automatically installs the CPU version of libtorch.

opam install torch

You can then compile some sample code; see the instructions below. ocaml-torch can also be used in interactive mode via utop or ocaml-jupyter.

Here is a sample utop session.

[image: utop session]

Build a Simple Example

To build a first torch program, create a file example.ml with the following content.

open Torch

let () =
  let tensor = Tensor.randn [ 4; 2 ] in
  Tensor.print tensor

Then create a dune file with the following content:

(executables
  (names example)
  (libraries torch))

Run dune exec example.exe to compile the program and run it!

Alternatively, you can first compile the code via dune build example.exe and then run the executable _build/default/example.exe (note that building the bytecode target example.bc may not work on macOS).

Tutorials and Examples

Some more advanced applications from external repos:

Sample Code

Below is an example of a linear model trained on the MNIST dataset (full code).

  (* Create two tensors to store model weights. *)
  let ws = Tensor.zeros [image_dim; label_count] ~requires_grad:true in
  let bs = Tensor.zeros [label_count] ~requires_grad:true in

  let model xs = Tensor.(mm xs ws + bs) in
  for index = 1 to 100 do
    (* Compute the cross-entropy loss. *)
    let loss =
      Tensor.cross_entropy_for_logits (model train_images) ~targets:train_labels
    in

    Tensor.backward loss;

    (* Apply gradient descent, disable gradient tracking for these. *)
    Tensor.(no_grad (fun () ->
        ws -= grad ws * f learning_rate;
        bs -= grad bs * f learning_rate));

    (* Compute the validation error. *)
    let test_accuracy =
      Tensor.(argmax ~dim:(-1) (model test_images) = test_labels)
      |> Tensor.to_kind ~kind:(T Float)
      |> Tensor.sum
      |> Tensor.float_value
      |> fun sum -> sum /. test_samples
    in
    printf "%d %f %.2f%%\n%!" index (Tensor.float_value loss) (100. *. test_accuracy);
  done
  • Some ResNet examples on CIFAR-10.
  • A simplified version of char-rnn illustrating character level language modeling using Recurrent Neural Networks.
  • Neural Style Transfer applies the style of an image to the content of another image. This uses some deep Convolutional Neural Network.

Models and Weights

Various pre-trained computer vision models are implemented in the vision library. The weight files can be downloaded at the following links:

Running the pre-trained models on some sample images can then easily be done via the following command.

dune exec examples/pretrained/predict.exe path/to/resnet18.ot tiger.jpg

Alternative Installation Option

This alternative way to install ocaml-torch could be useful to run with GPU acceleration enabled.

The libtorch library can be downloaded from the PyTorch website (2.0.0 cpu version).

Download and extract the libtorch library, then run the following commands to build all the examples:

export LIBTORCH=/path/to/libtorch
git clone https://github.com/LaurentMazare/ocaml-torch.git
cd ocaml-torch
make all

ocaml-torch's People

Contributors

arulselvanmadhavan, crackcomm, laurentmazare, mirca, mossbanay, mreppen, tachukao

ocaml-torch's Issues

backward pass with manually specified gradient

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py has an example showing:

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

The key line here is y.backward(v)

The OCaml bindings for backward only take a single tensor:

grep "val backward" . -R
./torch/optimizer.mli:val backward_step : ?clip_grad:Clip_grad.t -> t -> loss:Tensor.t -> unit
./wrapper/wrapper.mli:  val backward : ?keep_graph:bool -> ?create_graph:bool -> t -> unit

If we search for "t -> t -> unit", nothing useful shows up:

grep "t -> t -> unit" . -R
./torch/tensor.mli:val ( += ) : t -> t -> unit
./torch/tensor.mli:val ( -= ) : t -> t -> unit
./torch/tensor.mli:val ( *= ) : t -> t -> unit
./torch/tensor.mli:val ( /= ) : t -> t -> unit
./torch/optimizer.mli:val step : ?clip_grad:Clip_grad.t -> t -> unit

What is the OCaml way of running a backward pass with a custom gradient?
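
One possible workaround, offered as a sketch rather than a confirmed feature of the bindings: for a non-scalar y, y.backward(v) accumulates the vector-Jacobian product of v with dy/dx, which is also the gradient of the scalar sum(y * v) when v is treated as a constant. The single-tensor backward shown above can therefore be applied to that scalar. The snippet assumes Tensor.float_vec is available in your version of the bindings, and it leaves out the doubling loop from the Python example.

```ocaml
open Torch

(* Leaf tensor, as in the Python snippet. *)
let x = Tensor.randn [ 3 ] ~requires_grad:true

(* y = 2 * x, so dy/dx is 2 times the identity matrix. *)
let y = Tensor.(x * f 2.)

(* The custom upstream gradient; no gradient is tracked for it. *)
let v = Tensor.float_vec [ 0.1; 1.0; 0.0001 ]

let () =
  (* sum (y * v) is a scalar, so the single-tensor backward applies. *)
  Tensor.(backward (sum (y * v)));
  (* The gradient of x should be 2 * v, up to float32 rounding. *)
  Tensor.print (Tensor.grad x)
```

If v itself depended on x, it would have to be detached first. Otherwise the gradient of the sum would pick up an extra term that y.backward(v) does not include.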

Confused about types

I think this should work:

open Torch
Tensor.(arange1 ~start:(f 0.) ~end:(f 1.))

But I get

Error: This expression has type Tensor.t but an expression was expected of type
         Torch_core.Wrapper.Scalar.t

RL examples no longer compile (again)

E.g., running ocaml-torch/_build/default/examples/reinforcement-learning/dqn_atari.exe produces

actions: NOOP,FIRE,RIGHT,LEFT,RIGHTFIRE,LEFTFIRE
0 0 (0/0 frames)
zsh: segmentation fault

EDIT: Tried with char_rnn and it works fine, so possibly unique to RL?

examples/mnist/dune torch_core dependency

https://github.com/LaurentMazare/ocaml-torch/blob/master/examples/mnist/dune refers to torch_core

cat dune; echo "==="; dune build mnist_linear.exe
(executable
 (name mnist_linear)
 (libraries base torch_core torch stdio))

===
File "dune", line 3, characters 17-27:
3 |  (libraries base torch_core torch stdio))
                     ^^^^^^^^^^
Error: Library "torch_core" not found.
Hint: try: dune external-lib-deps --missing mnist_linear.exe

However, when I try to find torch_core

opam search torch_core
# Packages matching: match(*torch_core*)
# No matches found

It seems this module does not exist.

vgg19.ot file

Dearest maintainer,

I have spent a while searching around the internet for a compatible vgg19.ot file for https://github.com/LaurentMazare/tch-rs . I was wondering if you had plans to provide the file or could give advice on how I can generate it? I have found .pth, .caffemodel, .h5 and .ckpt files, but I honestly have little clue as to what I am doing.

Thank you for any advice!
Becker

Support for probability distributions

Hi,

It seems to me that ocaml-torch does not provide an API for torch.distributions.
Specifically, it does not implement various probability distributions with some common functions such as sample and log_prob.
I think they are important enough and should be implemented here.

Best,
Gwonsoo

Reading a .pt file gives an error

Hi! I have been trying to use ocaml-torch to load a PyTorch model that has already been trained. I have installed PyTorch 1.9.0 which is what is compatible with ocaml-torch. Upon trying to load the file, it throws the following error

Fatal error: exception (Failure
"version_number <= kMaxSupportedFileFormatVersion ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:131, please report a bug to PyTorch. Attempted to read a PyTorch file with version 4, but the maximum supported version for reading is 1. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:131)\

Can you suggest what can be done in this case? Thank you!

Edit: Source code
python file -

net = CNN()
net.eval()
example = torch.rand(16384)
traced = torch.jit.trace(net, example)
traced.save("./models/feat.pt")

ocaml file -

open Base
open Torch

let model = Module.load "../models/feat.pt" ;;

Installing for GPU acceleration

Hello,

When I first installed ocaml-torch earlier, I used opam install torch, and I was running the CPU version.
But now I want to accelerate this with a GPU, and to do that I did the following:

cd ~
wget https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.7.0.zip
unzip libtorch-cx11.....zip   // then now I have ~/libtorch
sudo rm -r ocaml-torch

export LIBTORCH=~/libtorch
git clone https://github.com/LaurentMazare/ocaml-torch.git
cd ocaml-torch
make all

In my code, I used

let device = T.Device.cuda_if_available () in
let vs = T.Var_store.create ~name:"my-project" ~device () in ...

But while running my code, when I typed nvidia-smi, the result was below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:18:00.0 Off |                  N/A |
| 31%   41C    P8    17W / 250W |      1MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:3B:00.0 Off |                  N/A |
| 31%   41C    P8    21W / 250W |      1MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:86:00.0 Off |                  N/A |
| 32%   40C    P8    16W / 250W |      1MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Maybe my code is still relying on the CPU version. What else should I do to use GPU acceleration?

Thanks,
Gwonsoo
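
A quick diagnostic for the situation described above, as a sketch: print what the bindings report about CUDA before creating the variable store. It assumes the Cuda module of the bindings exposes is_available, cudnn_is_available and device_count. If is_available returns false, one plausible cause is that the libtorch being linked is a CPU-only build, or that its CUDA version does not match the installed driver.

```ocaml
open Torch

let () =
  (* These report what the linked libtorch sees, not what nvidia-smi sees. *)
  Printf.printf "CUDA available:  %b\n" (Cuda.is_available ());
  Printf.printf "cuDNN available: %b\n" (Cuda.cudnn_is_available ());
  Printf.printf "CUDA devices:    %d\n" (Cuda.device_count ());
  (* Same calls as in the issue; falls back to the CPU when CUDA is missing. *)
  let device = Device.cuda_if_available () in
  let vs = Var_store.create ~name:"my-project" ~device () in
  ignore vs
```

Tensors created outside the variable store, such as input batches, are not covered by this check and would still need to be placed on the same device.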

Linking error when building using the conda PyTorch package

First of all, thanks for the great work on ocaml-torch. :-)

In order to use GPU acceleration, I tried the Alternative Installation Options proposed in the README:

  • The first alternative (using the pre-built binaries) works for me.
  • However, I get a linking error if I try to build it using the conda package (see details at the end of this message).

In both cases, I get the following warning:

ocamlmklib src/wrapper/dlltorch_core_stubs.so,src/wrapper/libtorch_core_stubs.a
Unknown option -Wl,--no-as-needed

Details on the linking error

My configuration: Ubuntu 18.04, CUDA 10.2, OCaml 4.10+flambda.

Running

conda create -n torch
source activate torch
conda install pytorch=1.5.0 -c pytorch

git clone https://github.com/LaurentMazare/ocaml-torch.git
cd ocaml-torch
make all

results in the following error.

(torch) jonathan@jonathan-aurora:~/Software/ocaml-torch$ make all
dune build examples/basics/basics.exe examples/char_rnn/char_rnn.exe examples/cifar/cifar_train.exe examples/gan/began.exe examples/gan/gan_stability.exe examples/gan/mnist_cgan.exe examples/gan/mnist_dcgan.exe examples/gan/mnist_gan.exe examples/gan/progressive_growing_gan.exe examples/gan/relativistic_dcgan.exe examples/jit/load_and_run.exe examples/mnist/conv.exe examples/mnist/linear.exe examples/mnist/nn.exe examples/neural_transfer/neural_transfer.exe examples/pretrained/finetuning.exe examples/pretrained/predict.exe examples/yolo/yolo.exe examples/vae/vae.exe examples/translation/seq2seq.exe examples/transformer/transformer.exe bin/tensor_tools.exe
  ocamlmklib src/wrapper/dlltorch_core_stubs.so,src/wrapper/libtorch_core_stubs.a
Unknown option -Wl,--no-as-needed
    ocamlopt examples/basics/basics.exe (exit 2)
(cd _build/default && /home/jonathan/.opam/default/bin/ocamlopt.opt -w @[email protected]@30..39@[email protected]@[email protected] -strict-sequence -strict-formats -short-paths -keep-locs -g -o examples/basics/basics.exe /home/jonathan/.opam/default/lib/base/base_internalhash_types/base_internalhash_types.cmxa -I /home/jonathan/.opam/default/lib/base/base_internalhash_types /home/jonathan/.opam/default/lib/base/caml/caml.cmxa /home/jonathan/.opam/default/lib/sexplib0/sexplib0.cmxa /home/jonathan/.opam/default/lib/base/shadow_stdlib/shadow_stdlib.cmxa /home/jonathan/.opam/default/lib/base/base.cmxa -I /home/jonathan/.opam/default/lib/base /home/jonathan/.opam/default/lib/ocaml/unix.cmxa -I /home/jonathan/.opam/default/lib/ocaml /home/jonathan/.opam/default/lib/ocaml/bigarray.cmxa -I /home/jonathan/.opam/default/lib/ocaml /home/jonathan/.opam/default/lib/ocaml/threads/threads.cmxa -I /home/jonathan/.opam/default/lib/ocaml /home/jonathan/.opam/default/lib/integers/integers.cmxa -I /home/jonathan/.opam/default/lib/integers /home/jonathan/.opam/default/lib/ctypes/ctypes.cmxa -I /home/jonathan/.opam/default/lib/ctypes /home/jonathan/.opam/default/lib/ctypes/ctypes-foreign-base.cmxa -I /home/jonathan/.opam/default/lib/ctypes /home/jonathan/.opam/default/lib/ctypes/ctypes-foreign-threaded.cmxa -I /home/jonathan/.opam/default/lib/ctypes -I /home/jonathan/.opam/default/lib/ctypes /home/jonathan/.opam/default/lib/ocaml/str.cmxa -I /home/jonathan/.opam/default/lib/ocaml /home/jonathan/.opam/default/lib/ctypes/cstubs.cmxa -I /home/jonathan/.opam/default/lib/ctypes src/wrapper/torch_core.cmxa -I src/wrapper /home/jonathan/.opam/default/lib/stdio/stdio.cmxa src/torch/torch.cmxa examples/basics/.basics.eobjs/native/basics.cmx)
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::dispatchKeyToBackend(c10::DispatchKey)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/Backend.h:98: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::backendToDeviceType(c10::Backend)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/Backend.h:153: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/Backend.h:155: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::scalarTypeToTypeMeta(c10::ScalarType)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/ScalarType.h:202: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::typeMetaToScalarType(caffe2::TypeMeta)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/ScalarType.h:228: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `at_save_multi':
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:201: undefined reference to `torch::serialize::OutputArchive::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor const&, bool)'
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:201: undefined reference to `torch::serialize::OutputArchive::save_to(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `at_load_multi':
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:210: undefined reference to `torch::serialize::InputArchive::load_from(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>)'
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:210: undefined reference to `torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `at_load_callback':
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:224: undefined reference to `torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `at_load_multi_':
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:234: undefined reference to `torch::serialize::InputArchive::load_from(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>)'
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:234: undefined reference to `torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool)'
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:234: undefined reference to `torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `atm_load':
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:464: undefined reference to `torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `atm_load_str(char*, unsigned long)':
/home/jonathan/Software/ocaml-torch/_build/default/src/wrapper/torch_api.cpp:471: undefined reference to `torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::Device::validate()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/Device.h:96: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/Device.h:98: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::operator<<(std::ostream&, c10::Layout)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/Layout.h:37: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::intrusive_ptr_target::~intrusive_ptr_target()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:96: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:99: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o):/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/core/DispatchKeySet.h:61: more undefined references to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::IValue::IValue(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h:748: undefined reference to `c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::QualifiedName::QualifiedName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/core/qualified_name.h:16: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/core/qualified_name.h:23: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/core/qualified_name.h:31: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::slice(unsigned long, unsigned long) const':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/ArrayRef.h:167: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::ClassType::numAttributes() const':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/core/jit_type.h:1551: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o):/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/core/jit_type.h:1556: more undefined references to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `at::Context::defaultGenerator(c10::Device)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/Context.h:37: undefined reference to `c10::DeviceTypeName[abi:cxx11](c10::DeviceType, bool)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/ATen/Context.h:37: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::autograd::AutogradMeta::set_requires_grad(bool, c10::TensorImpl*)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:215: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::autograd::AutogradMeta::AutogradMeta(c10::TensorImpl*, bool, torch::autograd::Edge)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:243: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:246: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::jit::Node::inBlockList() const':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/jit/ir/ir.h:837: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o):/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/jit/ir/ir.h:1185: more undefined references to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::jit::Object::get_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/jit/api/object.h:90: undefined reference to `torch::jit::Object::find_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/jit/api/object.h:93: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::arange(c10::Scalar, c10::TensorOptions const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:154: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::arange(c10::Scalar, c10::Scalar, c10::TensorOptions const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:181: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::arange(c10::Scalar, c10::Scalar, c10::Scalar, c10::TensorOptions const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:209: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::bartlett_window(long, c10::TensorOptions const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:238: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::bartlett_window(long, bool, c10::TensorOptions const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:265: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o):/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:293: more undefined references to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/jit/api/module.h:112: undefined reference to `torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::operator<<(torch::serialize::OutputArchive&, at::Tensor const&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/serialize/tensor.h:10: undefined reference to `torch::serialize::OutputArchive::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor const&, bool)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::operator>>(torch::serialize::InputArchive&, at::Tensor&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/serialize/tensor.h:17: undefined reference to `torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `torch::optim::Adam::Adam(std::vector<torch::optim::OptimizerParamGroup, std::allocator<torch::optim::OptimizerParamGroup> >, torch::optim::AdamOptions)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/adam.h:53: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/adam.h:54: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/adam.h:56: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/adam.h:57: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/adam.h:58: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o):/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/optim/rmsprop.h:58: more undefined references to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `void torch::save<at::Tensor, char*&>(at::Tensor const&, char*&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/serialize.h:44: undefined reference to `torch::serialize::OutputArchive::save_to(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `void torch::load<at::Tensor, char*&>(at::Tensor&, char*&)':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/serialize.h:108: undefined reference to `torch::serialize::InputArchive::load_from(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::retain_()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:188: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >::retain_()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:188: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::intrusive_ptr<c10::detail::ListImpl, c10::detail::intrusive_target_default_null_type<c10::detail::ListImpl> >::retain_()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:188: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::intrusive_ptr<c10::detail::DictImpl, c10::detail::intrusive_target_default_null_type<c10::detail::DictImpl> >::retain_()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:188: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o): In function `c10::intrusive_ptr<c10::ivalue::ConstantString, c10::detail::intrusive_target_default_null_type<c10::ivalue::ConstantString> >::retain_()':
/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:188: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
src/wrapper/libtorch_core_stubs.a(torch_api.o):/home/jonathan/Software/anaconda3/envs/torch/lib/python3.8/site-packages/torch/include/c10/util/intrusive_ptr.h:188: more undefined references to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
collect2: error: ld returned 1 exit status
File "caml_startup", line 1:
Error: Error during linking

Minor error in A2C example, probably?

In your RL examples in rollout.ml you create the initial observation by E.reset and put it into the frame stack but not into the s_states.

For A2C, IMHO, this means that the very first computation of the model outputs after the rollout is created is made over all zeros.
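To make the concern concrete, here is a minimal sketch in plain OCaml with no dependency on ocaml-torch. The names `reset_env`, `create_storage`, `make_rollout_buggy` and `make_rollout_fixed`, and the use of float arrays for observations, are hypothetical stand-ins and not the actual contents of rollout.ml. The sketch only illustrates how a state buffer that is never seeded with the initial observation keeps zeros in its first slot.

```ocaml
(* Hypothetical stand-in for E.reset: returns the initial observation. *)
let reset_env () = [| 0.5; -1.0; 2.0 |]

(* Hypothetical rollout storage: one row per step plus one for the
   initial state, all initialised to zero. *)
type rollout = { s_states : float array array }

let create_storage ~num_steps ~obs_size =
  { s_states = Array.make_matrix (num_steps + 1) obs_size 0.0 }

(* Behaviour described in the report: the initial observation is obtained
   but never written into s_states, so row 0 stays at zero. *)
let make_rollout_buggy ~num_steps =
  let obs = reset_env () in
  create_storage ~num_steps ~obs_size:(Array.length obs)

(* One possible fix: copy the initial observation into row 0. *)
let make_rollout_fixed ~num_steps =
  let obs = reset_env () in
  let r = create_storage ~num_steps ~obs_size:(Array.length obs) in
  Array.blit obs 0 r.s_states.(0) 0 (Array.length obs);
  r
```

With the second variant, the first forward pass of the model would see the real initial observation instead of a zero vector.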

installation does not seem to properly install libtorch?

According to the README, libtorch should be installed automatically. However, I get:

utop # #require "torch.toplevel";;
Cannot load required shared library dlltorch_core_stubs.
Reason: /Users/nbecker/.opam/4.07.1+flambda/lib/stublibs/dlltorch_core_stubs.so: dlopen(/Users/nbecker/.opam/4.07.1+flambda/lib/stublibs/dlltorch_core_stubs.so, 10): Library not loaded: @rpath/libc10.dylib
  Referenced from: /Users/nbecker/.opam/4.07.1+flambda/lib/stublibs/dlltorch_core_stubs.so
  Reason: image not found.
Error: Reference to undefined global `Torch_core__Wrapper'

after installing via opam reinstall torch.

Compile from source with pre-built libtorch

Hi, Thank you for a great library.

I've tried to build with the PyTorch pre-built binaries (for GPU support). It was working a few days ago. However, this morning I got the following messages. My guess is that there are some changes in the latest pre-built libtorch.

### output ###
#   ocamlmklib src/wrapper/dlltorch_core_stubs.so,src/wrapper/libtorch_core_stubs.a (exit 2)
# (cd _build/default && /root/.opam/default/bin/ocamlmklib.opt -g -o src/wrapper/torch_core_stubs src/wrapper/torch_stubs.o src/wrapper/torch_api.o -lstdc++ -Wl,-rpath,/root/.opam/default/lib/libtorch/lib -L/root/.opam/default/lib/libtorch/lib -lc10 -lcaffe2 -ltorch -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcuda -lnvrtc)
# /usr/bin/ld: src/wrapper/torch_api.o: relocation R_X86_64_PC32 against symbol `_ZN3c10eqERKNS_12TensorTypeIdES2_' can not be used when making a shared object; recompile with -fPIC
# /usr/bin/ld: final link failed: Bad value
# collect2: error: ld returned 1 exit status

How to know the device on which a tensor is stored?

There is a function Tensor.to_device to move a tensor to another device but I could not find a function Tensor.get_device: Tensor.t -> Device.t to know the device on which a tensor is stored.

Is there any workaround and do you think it might be worth adding such a function?

Stack overflow from the compiler.

When trying to build the current master (corresponding to pytorch 1.7), I get a stack overflow from the compiler for src/wrapper/torch_bindings_generated.ml. It is indeed a very long file, but still I'm surprised. Here is the backtrace:

Fatal error: exception Stack overflow
Raised by primitive operation at Mach.instr_cons_debug in file "asmcomp/mach.ml", line 137, characters 2-185
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Asmgen.(++) in file "asmcomp/asmgen.ml" (inlined), line 79, characters 15-18
Called from Asmgen.compile_fundecl in file "asmcomp/asmgen.ml", line 84, characters 2-624
Called from Stdlib__list.iter in file "list.ml", line 110, characters 12-15
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Asmgen.(++) in file "asmcomp/asmgen.ml" (inlined), line 79, characters 15-18
Called from Asmgen.end_gen_implementation in file "asmcomp/asmgen.ml", line 153, characters 2-128
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Asmgen.compile_unit.(fun) in file "asmcomp/asmgen.ml", line 134, characters 7-231
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Optcompile.clambda.(fun) in file "driver/optcompile.ml", line 78, characters 7-336
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Compile_common.implementation.(fun) in file "driver/compile_common.ml", line 121, characters 71-113
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Misc.try_finally in file "utils/misc.ml", line 31, characters 8-15
Re-raised at Misc.try_finally in file "utils/misc.ml", line 45, characters 10-56
Called from Compenv.process_action in file "driver/compenv.ml", line 596, characters 6-59
Called from Stdlib__list.iter in file "list.ml", line 110, characters 12-15
Called from Compenv.process_deferred_actions in file "driver/compenv.ml", line 672, characters 2-61
Called from Optmain.main in file "driver/optmain.ml", line 55, characters 6-163
Re-raised at Location.report_exception.loop in file "parsing/location.ml", line 926, characters 14-25
Called from Optmain.main in file "driver/optmain.ml", line 133, characters 6-37
Called from Optmain in file "driver/optmain.ml", line 137, characters 2-9

I'm a bit puzzled because the backtrace is short, while I'd expected it to be very long since there's a stack overflow. Have you also met this problem at some point? There might be something fishy with my environment that I don't know about, but at least if I comment out a part of the file it compiles fine.

Update for PyTorch 1.5

The Rust version of these bindings has been updated for the latest PyTorch 1.5 release through this PR. We should make the same changes in ocaml-torch.

freeing up temporary CUDA tensors w/o Caml.Gc.full_major() ?

Here's the issue I'm running into:

  1. Rust has RAII. Thus, when a Tensor no longer has references, its Drop is called, which I suspect triggers calling cuda_free.

  2. OCaml has GC. Each Tensor has a very small footprint (pointer?) on the CPU but a possibly huge one on the GPU (the entire contents of the tensor).

  3. The only way I know to tell OCaml to do cuda_free is to call "Caml.Gc.full_major()" which seems to slow down training quite a bit (doing a full GC on every training step.)

  4. Is there a way in OCaml to do "do a GC of all tensors, but without GCing the entire OCaml VM" ?
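One possible compromise, sketched below in plain OCaml using only the standard `Gc` module, is to force a full major collection every few steps instead of on every step. The names `train_with_periodic_gc`, `gc_every` and `train_step` are hypothetical, and the sketch assumes that tensor memory is released by finalizers that run when the OCaml value is collected. It does not collect tensors selectively; it only reduces how often the full collection runs.

```ocaml
(* Run [train_step] for steps 1..num_steps and force a full major
   collection only every [gc_every] steps. Returns how many collections
   were forced, which is convenient for checking the schedule. *)
let train_with_periodic_gc ~num_steps ~gc_every ~(train_step : int -> unit) =
  if gc_every <= 0 then invalid_arg "gc_every must be positive";
  let forced = ref 0 in
  for step = 1 to num_steps do
    train_step step;
    if step mod gc_every = 0 then begin
      (* Finalizers attached to unreachable values get a chance to run. *)
      Gc.full_major ();
      incr forced
    end
  done;
  !forced
```

For example, `train_with_periodic_gc ~num_steps:50 ~gc_every:10 ~train_step:(fun _ -> ())` forces five collections instead of fifty. The right value of `gc_every` depends on how much GPU memory each step allocates.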

Tensor.randn fails for small matrices

Hi. I have a problem with creating small matrices using randn:

let t = Tensor.randn [ 3; 4 ] in
Tensor.print t

It does not work for sizes smaller than 16. Built with libtorch 1.5.0 that's linked in README.

Strace:

openat(AT_FDCWD, "/dev/urandom", O_RDONLY) = 3
read(3, "N*\273\201\7\204\226z", 8)     = 8
close(3)                                = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x7fdd32008823} ---
+++ killed by SIGILL +++
Illegal instruction

Entire valgrind log.

In the meantime, the following works:

let from = 0.0 and to_ = 1.0 in
Tensor.zeros ~requires_grad:false [ 3 ]
|> Tensor.uniform_ ~from ~to_
|> Tensor.print

Also a question, how can I run tests in ocaml-torch?

Edit: It's highly likely that libtorch is compiled with SSE4 or another instruction set extension that my CPU doesn't support.

Exception thrown on CUDA with Tensor.max of Tensor.f

Works

Tensor.(max (f 2.) (ones [3] ~device:Device.Cpu))
Tensor.(add (f 2.) (ones [3] ~device:Device.Cpu))
Tensor.(add (f 2.) (ones [3] ~device:(Device.Cuda 0)))

Fails

Tensor.(max (f 2.) (ones [3] ~device:(Device.Cuda 0)))
Exception:
(Failure
   "iter.device(arg).is_cuda() INTERNAL ASSERT FAILED at \"/pytorch/aten/src/ATen/native/cuda/Loops.cuh\":94, please report a bug to PyTorch. \
  \nException raised from gpu_kernel at /pytorch/aten/src/ATen/native/cuda/Loops.cuh:94 (most recent call first):\
  \nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f42b5d9ab89 in ~/opt/libtorch/lib/libc10.so)\
  \nframe #1: void at::native::gpu_kernel<__nv_hdl_wrapper_t<false, true, __nv_dl_tag<void (*)(at::TensorIterator&), &at::native::maximum_kernel_cuda, 8u>, float (float, float)> >(at::TensorIterator&, __nv_hdl_wrapper_t<false, true, __nv_dl_tag<void (*)(at::TensorIterator&), &at::native::maximum_kernel_cuda, 8u>, float (float, float)> const&) + 0x204 (0x7f4261cb8c64 in ~/opt/libtorch/lib/libtorch_cuda.so)\
  \nframe #2: at::native::maximum_kernel_cuda(at::TensorIterator&) + 0xfc (0x7f4261c9bf6c in ~/opt/libtorch/lib/libtorch_cuda.so)\
  \nframe #3: <unknown function> + 0xcdd4a4 (0x7f42a5cf14a4 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #4: at::native::maximum(at::Tensor const&, at::Tensor const&) + 0x93 (0x7f42a5ce4fe3 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #5: <unknown function> + 0x3259a3d (0x7f4262d88a3d in ~/opt/libtorch/lib/libtorch_cuda.so)\
  \nframe #6: <unknown function> + 0xaf1c34 (0x7f42a5b05c34 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #7: at::Tensor c10::Dispatcher::callWithDispatchKey<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKey, at::Tensor const&, at::Tensor const&) const + 0x1ce (0x7f42a64ee39e in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #8: at::maximum(at::Tensor const&, at::Tensor const&) + 0xb7 (0x7f42a63d23d7 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #9: <unknown function> + 0x29d3070 (0x7f42a79e7070 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #10: <unknown function> + 0xaf1c34 (0x7f42a5b05c34 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #11: at::Tensor c10::Dispatcher::callWithDispatchKey<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKey, at::Tensor const&, at::Tensor const&) const + 0x1ce (0x7f42a64ee39e in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #12: at::maximum(at::Tensor const&, at::Tensor const&) + 0xb7 (0x7f42a63d23d7 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #13: at::native::max(at::Tensor const&, at::Tensor const&) + 0x1d (0x7f42a5cdbb9d in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #14: <unknown function> + 0x15af0cd (0x7f42a65c30cd in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #15: <unknown function> + 0xaf1c34 (0x7f42a5b05c34 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #16: at::Tensor c10::Dispatcher::callWithDispatchKey<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKey, at::Tensor const&, at::Tensor const&) const + 0x1ce (0x7f42a64ee39e in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #17: at::max(at::Tensor const&, at::Tensor const&) + 0xb7 (0x7f42a63d2517 in ~/opt/libtorch/lib/libtorch_cpu.so)\
  \nframe #18: atg_max1 + 0x3f (0x55ebd562aa28 in _build/default/src/torch/.utop/utop.exe)\
  \nframe #19: caml__929_atg_max1 + 0x21 (0x55ebd55ccf11 in _build/default/src/torch/.utop/utop.exe)\
  \nframe #20: caml_interprete + 0x905 (0x55ebd56cbfd5 in _build/default/src/torch/.utop/utop.exe)\
  \nframe #21: caml_startup_code + 0x81 (0x55ebd56a7cb1 in _build/default/src/torch/.utop/utop.exe)\
  \nframe #22: main + 0x3b (0x55ebd55bd43b in _build/default/src/torch/.utop/utop.exe)\
  \nframe #23: __libc_start_main + 0xf2 (0x7f42a0726152 in /usr/lib/libc.so.6)\
  \nframe #24: _start + 0x2e (0x55ebd55bd47e in _build/default/src/torch/.utop/utop.exe)\
  \n")

It tells me to report to PyTorch, but after trying to imitate Tensor.f in C++, I could not cause the crash.

I noticed that max is treated differently in the following file. Is that related?

let max = max1

Tensor.f defaults to Cpu, but Tensor.add appears to understand/convert. I am at a loss as to what is going on.

Directly constructing a Tensor

In PyTorch, we can do:

x = torch.tensor([5.5, 3])
print(x)
Out:

tensor([5.5000, 3.0000])

Is there a corresponding function in OCaml-Torch? I don't see one in Torch or Torch.Tensor.

Thanks!

opam reinstalls libtorch unconditionally

Even directly after a successful opam update && opam upgrade, opam will try to reinstall libtorch because of 'upstream changes'. To reproduce, just execute the above command twice in a row.

Add some ImageNet training example

Run some training example on the whole ImageNet dataset. There may be some performance implications, e.g. loading images should be done in parallel.

Warning when running the char_rnn example

When running the char_rnn example:

dune exec examples/char_rnn/char_rnn.exe

I get the following warning repeatedly (probably once per training batch):

Warning: RNN module weights are not part of single contiguous chunk of memory.
This means they need to be compacted at every call, possibly greatly increasing memory usage.
To compact weights again call flatten_parameters().
(_cudnn_impl at ../aten/src/ATen/native/cudnn/RNN.cpp:1269)

I am not sure how much it impacts training speed. Right now, each training epoch takes about 2.5 minutes on my Nvidia RTX 2070. (By the way, I think it would be useful to add an indication about training time in the README.)

My configuration

  • OCaml 4.10+flambda
  • CUDA 10.2
  • ocaml-torch installed using the PyTorch binaries (version 1.5.0, CXX11).

Status of the Transformer Example

I was wondering what the status of the transformer example was. Indeed, the model seems to be implemented in full but it comes with no demo task. From the comments, it seems that the plan was to train it on a simple copy task but this hasn't been implemented yet.

More concretely, here are my questions:

  • Has the current implemented model been tested?
  • Would you be interested in a demo task PR?

Update for PyTorch 1.3.1

The newly released PyTorch version is mostly a bugfix release so hopefully there shouldn't be much work involved.

The simple example does not build

I can run the simple example from the README.md in the toplevel:

utop # #require "torch.toplevel";;
─( 14:21:46 )─< command 1 >────────────────────────────────────────────────────────{ counter: 0 }─
utop # Torch_toplevel.register_all_pps () ;;
- : unit = ()
─( 14:22:02 )─< command 2 >────────────────────────────────────────────────────────{ counter: 0 }─
utop # open Torch;;
─( 14:22:15 )─< command 3 >────────────────────────────────────────────────────────{ counter: 0 }─
utop # let () =
let tensor = Tensor.randn [ 4; 2 ] in
Tensor.print tensor ;;
-0.8721 -0.6984
1.2961 -0.3749
0.9753 -0.9037
-0.4380 1.5404
[ CPUFloatType{4,2} ]

However, when I try to build it according to the instructions (created example.ml and dune file):

$ dune build example.exe
gcc src/wrapper/torch_api.o (exit 1)
(cd _build/default/src/wrapper && /usr/bin/gcc -I /home/phil/.opam/4.10.0/lib/ocaml -I /home/phil/.opam/4.10.0/lib/bytes -I /home/phil/.opam/4.10.0/lib/ctypes -I /home/phil/.opam/4.10.0/lib/integers -I /home/phil/.opam/4.10.0/lib/ocaml/threads -std=c++14 -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /home/phil/.opam/4.10.0/lib/libtorch/include -isystem /home/phil/.opam/4.10.0/lib/libtorch/include/torch/csrc/api/include -g -o torch_api.o -c torch_api.cpp)
In file included from torch_api.cpp:6:0:
torch_api.cpp: In function ‘torch::optim::Optimizer* ato_adam(double, double, double, double)’:
torch_api.cpp:326:10: error: ‘struct torch::optim::AdamOptions’ has no member named ‘betas’; did you mean ‘beta1’?
.betas(std::tuple<double, double>(beta1, beta2))
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp: In function ‘void ato_add_parameters(optimizer, at::Tensor**, int)’:
torch_api.cpp:372:10: error: ‘class torch::optim::Optimizer’ has no member named ‘param_groups’; did you mean ‘parameters’?
t->param_groups()[0].params().push_back((tensors[i]));
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp: In function ‘void ato_set_learning_rate(optimizer, double)’:
torch_api.cpp:378:19: error: ‘OptimizerOptions’ is not a member of ‘torch::optim’
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:378:19: note: suggested alternative: ‘Optimizer’
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:378:37: error: ‘d’ was not declared in this scope
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:378:46: error: ‘class torch::optim::Optimizer’ has no member named ‘defaults’
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:379:63: error: cannot dynamic_cast ‘d’ (of type ‘’) to type ‘struct torch::optim::AdamOptions*’ (source is not a pointer)
if (auto adam = dynamic_cast<torch::optim::AdamOptions*>(d))
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:381:70: error: cannot dynamic_cast ‘d’ (of type ‘’) to type ‘struct torch::optim::RMSpropOptions*’ (source is not a pointer)
else if (auto rms = dynamic_cast<torch::optim::RMSpropOptions*>(d))
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:383:66: error: cannot dynamic_cast ‘d’ (of type ‘’) to type ‘struct torch::optim::SGDOptions*’ (source is not a pointer)
else if (auto sgd = dynamic_cast<torch::optim::SGDOptions*>(d))
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp: In function ‘void ato_set_momentum(optimizer, double)’:
torch_api.cpp:392:19: error: ‘OptimizerOptions’ is not a member of ‘torch::optim’
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:392:19: note: suggested alternative: ‘Optimizer’
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:392:37: error: ‘d’ was not declared in this scope
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:392:46: error: ‘class torch::optim::Optimizer’ has no member named ‘defaults’
torch::optim::OptimizerOptions* d = &(t->defaults());
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:393:63: error: cannot dynamic_cast ‘d’ (of type ‘’) to type ‘struct torch::optim::AdamOptions*’ (source is not a pointer)
if (auto adam = dynamic_cast<torch::optim::AdamOptions*>(d)) {
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:397:70: error: cannot dynamic_cast ‘d’ (of type ‘’) to type ‘struct torch::optim::RMSpropOptions*’ (source is not a pointer)
else if (auto rms = dynamic_cast<torch::optim::RMSpropOptions*>(d))
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp:399:66: error: cannot dynamic_cast ‘d’ (of type ‘’) to type ‘struct torch::optim::SGDOptions*’ (source is not a pointer)
else if (auto sgd = dynamic_cast<torch::optim::SGDOptions*>(d))
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’
x
^
torch_api.cpp: In function ‘int ati_tag(ivalue)’:
torch_api.cpp:641:17: error: ‘struct c10::IValue’ has no member named ‘isList’; did you mean ‘isInt’?
else if (i->isList()) return 12;
^
torch_api.h:14:5: note: in definition of macro ‘PROTECT’

... and lots more.

Programs that use ocaml-torch with GPU acceleration segfault right before terminating

Programs I write using ocaml-torch that use GPU acceleration segfault right before terminating:

Segmentation fault (core dumped)

This is not a huge deal as it happens when the program is about to terminate anyway but I was wondering if you had observed the same phenomenon.

In particular, I replicated the problem on your mnist/conv and char_rnn examples.

two ocaml <-> lib torch syntax questions

Here is a small example for doing back prop:

module Tch = Torch;;
module Ten = Torch.Tensor ;;
module Tchc = Torch_core ;;

let x = Ten.ones [2; 2] ~requires_grad:true ~kind:Tchc.Kind.Float in
let y = Ten.(x + Ten.f 2.) in
let z = Ten.(y * y * f 3.) in
let out = Ten.mean z in
let _ = Ten.backward out in

Ten.grad x ;;

There are two things I find slightly annoying about this code and want to see if there is a way to fix:

  1. the let x = ... in expressions. I don't mind typing in let x = ..., as we do that all the time in Rust. However, in Rust, we do let x = ... ; it's a single line on its own, defining a var in the local context. In OCaml, it seems that every let ... in is creating a new environment, containing precisely one binding. Is there a way around this, or is this a fundamental OCaml design?

  2. The other thing I don't quite like is this line here: let _ = Ten.backward out in. When I first look at this, I'm expecting Ten.backward to be a pure function and to be returning a result. But what it's doing is modifying the memory associated with the gradients for out (and nodes it depends on). If it was something like out.do_backward_pass() it would be a bit more obvious that it's modifying the Tensor. Is there a way around this, or is this also the way it is when using OCaml?

In particular, I'm expecting (perhaps this is unreasonable -- and I'm overly sensitive to this due to Rust's ref vs mut ref semantics) for functions to be pure, and for things that modify to be member functions.
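Both points can be illustrated with a small sketch in plain OCaml (no ocaml-torch involved). At the top level of a module, `let` definitions need no `in`, and `let () = ...` is a common way to signal that an expression is evaluated only for its side effect: the type checker rejects it if the expression returns anything other than unit. The names `total`, `accumulate` and `run` are hypothetical and only stand in for a mutating call such as a backward pass.

```ocaml
(* At the top level of a module, [let] needs no [in]: each definition is
   visible to everything that follows it. *)
let x = 2.0
let y = x +. 2.0
let z = y *. y *. 3.0

(* A mutable cell stands in for state modified by a side-effecting call. *)
let total = ref 0.0

(* Hypothetical side-effecting function: returns unit and mutates [total]. *)
let accumulate v = total := !total +. v

(* [let () = ...] documents "run this for its effect"; it fails to type
   check if the right-hand side is not of type unit. *)
let () = accumulate z

(* Inside a function body, effects are sequenced with [;] rather than
   with a chain of [let _ = ... in]. *)
let run () =
  accumulate 1.0;
  accumulate 2.0;
  !total
```

Inside a function body, local bindings still use `let ... in`, since there it is an expression form and not a definition.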

Build (gcc) error for MNIST linear example

Hi,

I have succeeded in installing the library through opam ("opam install torch") and in running the simple print example described in https://github.com/LaurentMazare/ocaml-torch using dune ("dune exec src/example.exe").
The problem is that when I try to run the linear classifier for MNIST (examples/mnist/linear.ml), again using dune ("dune exec examples/mnist/linear.exe"),
I see some errors related to gcc and the torch C++ API.

The relevant specs are as follows:

  • Ubuntu 16.04 LTS
  • Opam version: 2.0.5
  • I tried two OCaml compiler versions, 4.06.0 and 4.07.0 (both led to nearly the same error messages).
  • gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 (* When I used gcc 7.4.0, similar error messages were printed.)

The error messages are too long to paste here, so I attach two separate files, one for OCaml 4.06.0 and the other for OCaml 4.07.0:
error_log_4.06.0.txt
error_log_4.07.0.txt

Thanks in advance!

Best,
Gwonsoo

linking problem in opam install

I get the following linking error when doing opam install torch
Does the message happen to ring a bell?
I'm on opam 2.02, testing opam switches 4.06.1+fp+flambda as well as 4.07.0+fp+flambda, linux Ubuntu 18.10.
I had got quite excited by the simplicity of the install :-)

∗ installed sexplib.v0.11.0
[ERROR] The compilation of torch failed at "/home/graham-lengrand/.opam/opam-init/hooks/sandbox.sh build dune build -p torch -j 4".

#=== ERROR while compiling torch.0.3 ==========================================#
# context     2.0.2 | linux/x86_64 | base-bigarray.base base-threads.base base-unix.base ocaml-variants.4.06.1+fp+flambda | https://opam.ocaml.org/#421079ff
# path        ~/.opam/4.06.1+fp+flambda/.opam-switch/build/torch.0.3
# command     ~/.opam/opam-init/hooks/sandbox.sh build dune build -p torch -j 4
# exit-code   1
# env-file    ~/.opam/log/torch-29671-3e450d.env
# output-file ~/.opam/log/torch-29671-3e450d.out
### output ###
#   ocamlmklib src/wrapper/dlltorch_core_stubs.so,src/wrapper/libtorch_core_stubs.a (exit 2)
# (cd _build/default && /home/graham-lengrand/.opam/4.06.1+fp+flambda/bin/ocamlmklib.opt -g -o src/wrapper/torch_core_stubs src/wrapper/torch_stubs.o src/wrapper/torch_api.o -lstdc++ -Wl,-rpath,/home/graham-lengrand/.opam/4.06.1+fp+flambda/lib/libtorch/lib -L/home/graham-lengrand/.opam/4.06.1+fp+flambda/lib/libtorch/lib -lc10 -lcaffe2 -ltorch)
# /usr/bin/ld: src/wrapper/torch_api.o: relocation R_X86_64_PC32 against symbol `_ZN3c105ErrorD1Ev' can not be used when making a shared object; recompile with -fPIC
# /usr/bin/ld: final link failed: bad value
# collect2: error: ld returned 1 exit status

Base < 0.14

Hey, is "base" {>= "v0.11.0" & < "v0.14"} a big deal? I tried the following patch and there is no issue with the base toolchain v0.14.0.

diff --git a/torch.opam b/torch.opam
index 2523b59..8621f70 100644
--- a/torch.opam
+++ b/torch.opam
@@ -11,7 +11,7 @@ run-test: [["dune" "runtest" "-p" name "-j" jobs]]
 build: [["dune" "build" "-p" name "-j" jobs]]
 
 depends: [
-  "base" {>= "v0.11.0" & < "v0.14"}
+  "base" {>= "v0.11.0" & < "v0.15"}
   "cmdliner"
   "ctypes" {>= "0.5"}
   "ctypes-foreign"
@@ -21,11 +21,11 @@ depends: [
   "npy"
   "ocaml" {>= "4.07"}
   "ocaml-compiler-libs"
-  "ppx_custom_printf" {< "v0.14"}
-  "ppx_expect" {< "v0.14"}
-  "ppx_sexp_conv" {< "v0.14"}
-  "sexplib" {< "v0.14"}
-  "stdio" {< "v0.14"}
+  "ppx_custom_printf" {< "v0.15"}
+  "ppx_expect" {< "v0.15"}
+  "ppx_sexp_conv" {< "v0.15"}
+  "sexplib" {< "v0.15"}
+  "stdio" {< "v0.15"}
 ]
 
 available: os = "linux" | os = "macos"

How to flush output while training?

I have the following code:

module Tch = Torch;;
module Ten = Torch.Tensor ;;
module Tchc = Torch_core ;;

let dtype = Tchc.Kind.Float in
let device = Tchc.Device.Cuda 0 in
let (d_n, d_in, d_h, d_out) = (64, 1000, 100, 10) in
let x = Ten.randn [d_n; d_in] ~device:device ~kind:dtype in
let y = Ten.randn [d_n; d_out] ~device:device ~kind:dtype in
let w1 = ref (Ten.randn [d_in; d_h] ~device:device ~kind:dtype) in
let w2 = ref (Ten.randn [d_h; d_out] ~device:device ~kind:dtype) in
let learning_rate = Ten.f 1e-6 in
let _ =
  for i = 1 to 50 do
    let h = Ten.mm x !w1 in
    let h_relu = Ten.clamp_min h ~min:(Tch.Scalar.float 0.) in
    let y_pred = Ten.mm h_relu !w2 in
    let loss = Ten.( sum (pow (y_pred - y) (Tch.Scalar.float 2.) )) in
    let lf = Ten.float_value loss in
    let _ = Stdio.eprintf "Time step: %d, loss: %f\n" i lf in
    let _ = Stdio.prerr_endline in
    (* let _ = Stdio.eprint_endline in *)
    let grad_y_pred = Ten.( f 2.0 * (y_pred - y)) in
    let grad_w2 = Ten.mm (Ten.transpose h_relu 0 1) grad_y_pred in
    let grad_h_relu = Ten.mm grad_y_pred (Ten.transpose !w2 0 1) in
    let grad_h = Ten.( clamp_min grad_h_relu (Tch.Scalar.float 0.)) in
    let grad_w1 = Ten.(mm (Ten.transpose x 0 1) grad_h) in
    w1 := Ten.( !w1 - learning_rate * grad_w1);
    w2 := Ten.( !w2 - learning_rate * grad_w2);
    Caml.Gc.full_major ()
  done
in 
(dtype, device) ;; 

The training works fine. However, the output is buffered.

It does not matter whether I run this with 'dune exec' or 'dune utop'. In both cases, all the output comes out at once, when the for loop is done. Instead, I'd like the output to be flushed on every training iteration, so that the time step and loss are printed as soon as they are computed.

How can I fix this?
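One common way to get per-iteration output is to flush the channel after every write. Note that `let _ = Stdio.prerr_endline in` only binds the function without calling it, so it prints and flushes nothing. The following is a minimal sketch using only the standard library's Printf; the loop and the loss values are stand-ins for the tensor code, and `step_line` is a hypothetical helper. The `%!` directive is assumed to behave the same way with Stdio.eprintf.

```ocaml
(* Format one progress line; kept separate so the printing is easy to see. *)
let step_line i loss = Printf.sprintf "Time step: %d, loss: %f" i loss

let () =
  for i = 1 to 3 do
    (* Stand-in for the real loss computed from tensors. *)
    let loss = 1.0 /. float_of_int i in
    (* "%!" flushes stderr right after this line is written. *)
    Printf.eprintf "%s\n%!" (step_line i loss)
  done;
  (* Equivalent explicit form: write, then flush the channel. *)
  prerr_string "training finished\n";
  flush stderr
```

Either form should make each line appear as soon as the iteration finishes, rather than when the buffer fills or the program exits.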

Linking error during "make all" (during alternative installation Option 1)

Hi,

I am trying to follow the instructions for alternative installation option 1 described in the GitHub README.

I downloaded libtorch from pytorch.org, with the setting Stable(1.3) / Linux / LibTorch / C++ / 10.1 and by clicking https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.3.1.zip.
Then, I unzipped the directory at ~/lib/libtorch. Then,

export LIBTORCH=~/lib/libtorch
git clone https://github.com/LaurentMazare/ocaml-torch.git
cd ocaml-torch
make all

When I run "make all", linking errors are printed. The messages are too long to paste here, so I have attached them as a file:
error_gpu_makeall.txt

The current setup is as follows:

  • OS: Ubuntu 16.04 LTS
  • Opam version: 2.0.5
  • OCaml compiler version: 4.07.0
  • gcc version: 8.3.0

Thanks in advance!

Best,
Gwonsoo

Use a GADT to add type constraints for tensor elements

The current tensor type is Tensor.t. However, tensors can hold multiple kinds of elements, and calling a function like to_float0_exn on a tensor containing integers is likely to raise.
We could try adding some type information to Tensor.t in the same way this is done for bigarray or the tensorflow tensors from tensorflow-ocaml. This would involve a type like 'a Tensor.t where 'a is the type of underlying element.

Then functions could have the following type:

type 'a t
type 'a kind
val float_kind : float kind
val int_kind : int kind
val create : 'a kind -> shape:int list -> 'a t
val to_elem0_exn : 'a t -> 'a

The wrapper code for tensor operations is automatically generated from Declarations.yaml (this file is generated when compiling PyTorch). This file describes all the operations but does not provide much type information. Some can still be recovered: IndexTensor is used for tensors holding integers, multiple tensors involved in the same op should have the same type, etc.
There is some ongoing work on cleaning up Declarations.yaml, which is also likely to help (see pytorch/pytorch#12562).
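To make the proposal concrete, here is a small self-contained sketch of the idea on a toy tensor backed by an OCaml array. It is not the ocaml-torch implementation: the record layout, the constructors `Float` and `Int`, and the helper `zero` are all assumptions made for illustration.

```ocaml
(* A kind is a GADT whose type parameter records the OCaml element type. *)
type _ kind =
  | Float : float kind
  | Int : int kind

(* Toy tensor: a kind witness, a shape, and flat row-major data. *)
type 'a t = { kind : 'a kind; shape : int list; data : 'a array }

(* The zero element depends on the kind; the GADT refines the type per branch. *)
let zero : type a. a kind -> a = function
  | Float -> 0.0
  | Int -> 0

(* Allocate a zero-filled tensor whose element type is fixed by the kind. *)
let create (type a) (kind : a kind) ~shape : a t =
  let n = List.fold_left ( * ) 1 shape in
  { kind; shape; data = Array.make n (zero kind) }

(* Extract the only element of a one-element tensor, typed by 'a. *)
let to_elem0_exn (t : 'a t) : 'a =
  if Array.length t.data <> 1 then invalid_arg "to_elem0_exn: not a scalar";
  t.data.(0)
```

With this shape of API, `to_elem0_exn (create Int ~shape:[])` has type int, so reading an integer tensor as a float is rejected at compile time instead of raising at run time.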

adding pre-existing Tensor to a VarStore

Sample code:

  open Base;;
  module Tch = Torch;;
  module Ten = Torch.Tensor ;;
  module Tchc = Torch_core ;;
  module L = Tch.Layer ;;
  module Opt = Tch.Optimizer ;;
  
  
  let dtype = Tchc.Kind.Float in
  let device = Tchc.Device.Cuda 0 in
  let x_train = Ten.view (Ten.of_float1 [| 1.; 2.; 3. |] ~device:device) [3; 1] in
  let y_train = Ten.of_float1 [| 1.; 2.; 3. |] ~device:device in
  let w = Ten.randn [1] ~device:device ~kind:dtype ~requires_grad:true in
  let b = Ten.randn [1] ~device:device ~kind:dtype ~requires_grad:true in
  let vs = Tch.Var_store.create ~device:device ~name:"MyVarStore" () in
  let optim = Opt.adam vs ~learning_rate:1e-4 in
  for i = 0 to 100 do
    let y_pred = Ten.(mm x_train w + b) in
    let loss = Ten.mean (Ten.(pow (y_pred - y_train) ~exponent:(Tch.Scalar.float 2.))) in
    let _ = Stdio.eprintf "Time step: %d, loss: %f\n%!" i (Ten.float_value loss) in
    Opt.zero_grad optim;
    Ten.backward loss;
    Opt.step optim;
  done ;;   

  1. This code does not work because w and b are not part of Var_store vs.

  2. I know how to use

val linear
  :  Var_store.t
  -> ?activation:activation (* default: no activation *)
  -> ?use_bias:bool (* default: true *)
  -> ?w_init:Var_store.Init.t
  -> input_dim:int
  -> int
  -> t
  3. Now here's my question: without using Layer.linear, is there a way to directly construct a Tensor that is part of a Var_store?

When I look at randn, I see:

val randn : create

then, when we look for create we see:

type create =
  ?requires_grad:bool
  -> ?kind:Torch_core.Kind.packed
  -> ?device:Device.t
  -> ?scale:float
  -> int list
  -> t

So directly creating a randn as part of a Var_store looks unlikely.

Citing this project

Hi Laurent,

We used ocaml-torch in our research, and wrote a paper: https://arxiv.org/abs/2103.00737.
Right before Section 6.1, the paper cites ocaml-torch, but I want to double-check whether that is how you would like this project to be cited.

Thanks,
Gwonsoo

Newbie question (building example) (Library not loaded: @rpath/libc10.dylib)

Sorry I'm new to most of this (besides torch) but I wanted to give OCaml a try.

On my mac I get this while building the example.

Any clue what I'm doing wrong?

Thanks

❯ dune build example.bc

      ocamlc example.bc (exit 2)
(cd _build/default && /Users/pmanning/.opam/4.06.0/bin/ocamlc.opt -w @a-4-29-40-41-42-44-45-48-58-59-60-40 -strict-sequence -strict-formats -short-paths -keep-locs -g -o example.bc -I /Users/pmanning/.opam/4.06.0/lib/base -I /Users/pmanning/.opam/4.06.0/lib/base/caml -I /Users/pmanning/.opam/4.06.0/lib/base/shadow_stdlib -I /Users/pmanning/.opam/4.06.0/lib/bytes -I /Users/pmanning/.opam/4.06.0/lib/ctypes -I /Users/pmanning/.opam/4.06.0/lib/integers -I /Users/pmanning/.opam/4.06.0/lib/ocaml/threads -I /Users/pmanning/.opam/4.06.0/lib/sexplib0 -I /Users/pmanning/.opam/4.06.0/lib/stdio -I /Users/pmanning/.opam/4.06.0/lib/torch -I /Users/pmanning/.opam/4.06.0/lib/torch/core /Users/pmanning/.opam/4.06.0/lib/base/caml/caml.cma /Users/pmanning/.opam/4.06.0/lib/base/shadow_stdlib/shadow_stdlib.cma /Users/pmanning/.opam/4.06.0/lib/sexplib0/sexplib0.cma /Users/pmanning/.opam/4.06.0/lib/base/base.cma /Users/pmanning/.opam/4.06.0/lib/stdio/stdio.cma /Users/pmanning/.opam/4.06.0/lib/ocaml/unix.cma /Users/pmanning/.opam/4.06.0/lib/ocaml/bigarray.cma /Users/pmanning/.opam/4.06.0/lib/integers/integers.cma /Users/pmanning/.opam/4.06.0/lib/ctypes/ctypes.cma /Users/pmanning/.opam/4.06.0/lib/ocaml/threads/threads.cma /Users/pmanning/.opam/4.06.0/lib/ctypes/ctypes-foreign-base.cma /Users/pmanning/.opam/4.06.0/lib/ctypes/ctypes-foreign-threaded.cma /Users/pmanning/.opam/4.06.0/lib/ocaml/str.cma /Users/pmanning/.opam/4.06.0/lib/ctypes/cstubs.cma /Users/pmanning/.opam/4.06.0/lib/torch/core/torch_core.cma /Users/pmanning/.opam/4.06.0/lib/torch/torch.cma .example.eobjs/example.cmo)
File "_none_", line 1:
Error: Error on dynamically loaded library: /Users/pmanning/.opam/4.06.0/lib/stublibs/dlltorch_core_stubs.so: dlopen(/Users/pmanning/.opam/4.06.0/lib/stublibs/dlltorch_core_stubs.so, 10): Library not loaded: @rpath/libc10.dylib
  Referenced from: /Users/pmanning/.opam/4.06.0/lib/stublibs/dlltorch_core_stubs.so
  Reason: image not found

Request for Advice :-)

Hi Laurent,

I'd like to play with reinforcement learning in OCaml, with the aim of creating a working A3C model for my task. Which bindings would you recommend for it: TensorFlow or Torch?

Thanks in advance :-)

OS X?

Is it a problem in principle to support OS X, e.g. due to GPU issues?

torch unmet availability conditions: os = "linux"

Installing the GPU accelerated version of ocaml-torch

Hello!
Earlier I installed the CPU version of ocaml-torch. I am now trying to install ocaml-torch with GPU acceleration using the following commands (after downloading and unzipping libtorch 1.10).

export LIBTORCH=./libtorch
git clone https://github.com/LaurentMazare/ocaml-torch.git
cd ocaml-torch
make clean
make all

make fails with the following error trace.

File "src/stubs/torch_bindings_generated.ml", line 8309, characters 2-16:
Error: Multiple definition of the type name t.
       Names must be unique in a given structure or signature.
File "src/wrapper/torch_bindings_generated.ml", line 8309, characters 2-16:
Error: Multiple definition of the type name t.
       Names must be unique in a given structure or signature.
Makefile:34: recipe for target 'all' failed
make: *** [all] Error 1

PS: I have deactivated the conda environment before running make.

Can you kindly help me with this?

Thank you!

Unexpected results related to creation/initialization of layers and optimizer (when step function does nothing)

Hi,

I borrowed the example code in ocaml-torch/src/tests/optimization_tests.ml, changed it a little, and ran it to see whether optimization works. The code is below:

module T = Torch
let main () =
  Torch_core.Wrapper.manual_seed 42;
 
  let batch_size = 1 in
  let learning_rate = 1e-3 in
  let xs = T.Scalar.float 1.0 |> T.Tensor.mul1 (T.Tensor.ones [batch_size; 2]) in
  let ys = T.Tensor.f 10.0 in
  let device = T.Device.cuda_if_available () in
  let vs = T.Var_store.create ~name:"minimal" ~device () in
    
  (* TWO LINES BELOW *)
  let optimizer = T.Optimizer.sgd vs ~learning_rate in  (* (1) *)
  let linear = T.Layer.linear vs ~input_dim:2 1 in  (* (2) *)
  
  for _ = 1 to 100 do
    T.Optimizer.zero_grad optimizer;
    let ys_ = T.Layer.forward linear xs in
    let loss = T.Tensor.((ys_ - ys) * (ys_ - ys)) in
    Printf.eprintf "--> %f\n" (loss |> T.Tensor.float_value);
    T.Tensor.backward loss;
    T.Optimizer.step optimizer
  done

When I ran this code, it kept printing the same, initial loss value.

Actually the code I am working on is more complex than this example, and it took several days to figure out what's happening, since I thought it was because I did something wrong with my domain-specific part of the code.

The problem was that I created optimizer BEFORE creating linear in the example code above. When creating an optimizer, the procedure registers trainable parameters stored in variable store vs, but the trainable parameters are registered at vs only when we create and initialize layers. So, what I should have done is create layers first (so that their trainable parameters are registered at vs first) and create an optimizer later to properly grab the parameters (corresponding to layers) already listed in vs.
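The ordering effect described above can be reproduced with a toy model in plain OCaml. This is only a sketch of the described behaviour, not the ocaml-torch implementation: every type and function here (`var_store`, `new_param`, `sgd`, `step`, `final_value`) is a hypothetical stand-in, and the optimizer is assumed to copy the parameter list once at creation.

```ocaml
(* Toy stand-ins: a variable store is a growing list of parameters, and an
   optimizer copies that list once, at creation time. *)
type param = { mutable value : float; mutable grad : float }
type var_store = { mutable params : param list }
type optimizer = { tracked : param list; learning_rate : float }

let create_store () = { params = [] }

(* Creating a "layer" registers its parameter in the store. *)
let new_param vs value =
  let p = { value; grad = 0.0 } in
  vs.params <- p :: vs.params;
  p

(* The optimizer snapshots whatever is registered right now. *)
let sgd vs ~learning_rate = { tracked = vs.params; learning_rate }

(* One gradient-descent update over the tracked parameters only. *)
let step opt =
  List.iter
    (fun p -> p.value <- p.value -. (opt.learning_rate *. p.grad))
    opt.tracked

(* Run one update with either creation order and return the parameter value. *)
let final_value ~optimizer_first =
  let vs = create_store () in
  let w, opt =
    if optimizer_first then begin
      let opt = sgd vs ~learning_rate:0.1 in
      let w = new_param vs 1.0 in
      (w, opt)
    end else begin
      let w = new_param vs 1.0 in
      let opt = sgd vs ~learning_rate:0.1 in
      (w, opt)
    end
  in
  w.grad <- 1.0;
  step opt;
  w.value

let () =
  Printf.printf "optimizer first: %f\n" (final_value ~optimizer_first:true);
  Printf.printf "layer first:     %f\n" (final_value ~optimizer_first:false)
```

In the optimizer-first case the snapshot is empty, so `step` leaves the parameter untouched, which matches the constant loss seen in the report.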

The two lines ((1) and (2) in the example code) do not appear to depend on each other syntactically, but there is an important dependency between them from the perspective of the optimization process inside the implementation. This is not an issue in PyTorch, since we must pass the parameters explicitly (e.g., torch.optim.Adam(model.parameters(), lr=learning_rate)) when we create an optimizer, so there is no hidden dependency. The current ocaml-torch documentation says nothing about this hidden dependency, and it may trip up other users in the future too.

Yes, the easiest fix is to swap the order of (1) and (2), and that is what I'm doing now.
But it would help users if the documentation mentioned this point explicitly.

Best,
Gwonsoo

cat_out unexpected result on views

I am not familiar with the libtorch C++ API, so maybe this is not unexpected:

I noticed that cat_out behaves unexpectedly when combined with views:

# let x = T.ones [3; 2];;
val x : T.t =
 1  1
 1  1
 1  1

# let y = T.narrow ~dim:1 ~start:0 ~length:1 x;;
val y : T.t =
 1
 1
 1

# T.(y.%.{[2;0]}<-2.);;
- : unit = ()

# y;;
- : T.t =
 1
 1
 2

# x;;
- : T.t =
 1  1
 1  1
 2  1

# T.zeros_out ~out:y ~size:(T.size y);;
- : T.t =
 0
 0
 0

# x;;
- : T.t =
 0  1
 0  1
 0  1

So the view y and zeros_out are working as expected. But then with cat_out:

# T.cat_out ~out:y ~dim:1 [T.ones ~scale:3. (T.size y)];;
- : T.t =
 3
 3
 3

# x;;
- : T.t =
 3  3
 3  1
 2  1

In fact, ones_out and zeros_out work as expected before cat_out, but their behavior changes after cat_out has been called:

# T.zeros_out ~out:y ~size:(T.size y);;
- : T.t =
 0
 0
 0

# x;;
- : T.t =
 0  0
 0  1
 0  1

# T.ones_out ~out:y ~size:(T.size y);;
- : T.t =
 1
 1
 1

# x;;
- : T.t =
 1  1
 1  1
 0  1

So it seems like cat_out is modifying the view. The ~dim:1 in cat_out is not relevant; the same thing happens with 0.

Is this expected? At least to me it was not.

Full sequence of similar steps to copy into a REPL:

open Torch;;
module T = Tensor;;

let x = T.zeros [ 3; 2 ];;
let y = T.narrow ~dim:1 ~start:0 ~length:1 x;;
T.(y.%.{[2;0]}<-2.);;
T.ones_out ~out:y ~size:(T.size y);;
T.cat_out ~out:y ~dim:1 [T.ones ~scale:3. (T.size y)];;
T.ones_out ~out:y ~size:(T.size y);;
T.zeros_out ~out:y ~size:(T.size y);;

let y = T.narrow ~dim:1 ~start:0 ~length:1 x;;
T.zeros_out ~out:y ~size:(T.size y);;

Happy Holidays and thanks for releasing this! 🎄
