
diffsharp / diffsharp


DiffSharp: Differentiable Functional Programming

Home Page: http://diffsharp.github.io

License: BSD 2-Clause "Simplified" License

F# 89.90% Dockerfile 0.01% Python 0.06% HTML 9.45% R 0.58%
machine-learning tensor dotnet autodiff gpu deep-learning neural-network

diffsharp's Introduction



This is the development branch of DiffSharp 1.0.

NOTE: This branch is undergoing development. It has incomplete code, functionality, and design that are likely to change without notice. When using the TorchSharp backend, only the x64 platform is currently supported out of the box; see DEVGUIDE.md for more details.

DiffSharp is a tensor library with support for differentiable programming. It is designed for use in machine learning, probabilistic programming, optimization and other domains.

Key features

  • Nested and mixed-mode differentiation
  • Common optimizers, model elements, differentiable probability distributions
  • F# for robust functional programming
  • PyTorch familiar naming and idioms, efficient LibTorch CUDA/C++ tensors with GPU support
  • Linux, macOS, Windows supported
  • Use interactive notebooks in Jupyter and Visual Studio Code
  • 100% open source

Documentation

You can find the documentation here, including information on installation and getting started.

Release notes can be found here.

Communication

Please use GitHub issues to share bug reports, feature requests, installation issues, suggestions etc.

Contributing

We welcome all contributions.

  • Bug fixes: if you encounter a bug, please open an issue describing the bug. If you are planning to contribute a bug fix, please feel free to do so in a pull request.
  • New features: if you plan to contribute new features, please first open an issue to discuss the feature before creating a pull request.

The Team

DiffSharp is developed by Atılım Güneş Baydin, Don Syme and other contributors, having started as a project supervised by the automatic differentiation wizards Barak Pearlmutter and Jeffrey Siskind.

License

DiffSharp is licensed under the BSD 2-Clause "Simplified" License, which you can find in the LICENSE file in this repository.

diffsharp's People

Contributors

adelarsq, barak, cgravill, dsyme, gbaydin, jonsequitur, kevmal, migueldeicaza, mrakgr, nhirschey, oluwandabira, pkese, rwe, sir-deenicus, smoothdeveloper, soma-kurisu, sporring, visualmelon


diffsharp's Issues

Tangents are not being propagated through the relu and sigmoid activations.

static member ReLU (a:D) =
    let inline ff(a) = max 0.f a
    let inline fd(a) = D.ReLU(a)
    let inline df(cp, ap, at) = (1.f + D.Sign(ap)) / 2.f
    let inline r(a) = ReLU_D(a)
    D.Op_D_D (a, ff, fd, df, r)

static member Sigmoid (a:D) =
    let inline ff(a) = 1.f / (1.f + exp -a)
    let inline fd(a) = D.Sigmoid(a)
    let inline df(cp:D, ap, at) = cp * (1.f - cp)
    let inline r(a) = Sigmoid_D(a)
    D.Op_D_D (a, ff, fd, df, r)

In the df functions there should be an at in there somewhere. The DV and DM versions are similarly affected.
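For reference, the corrected rules presumably just scale by the incoming tangent at, as the report suggests (a sketch of the fix being asked for, not necessarily the exact upstream patch):

// in ReLU: local derivative is (1 + sign(x)) / 2
let inline df(cp, ap, at) = at * ((1.f + D.Sign(ap)) / 2.f)
// in Sigmoid: chain rule with sigmoid'(x) = s(x) * (1 - s(x)) = cp * (1 - cp)
let inline df(cp:D, ap, at) = at * cp * (1.f - cp)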

Not threadsafe with sufficient load

Running 0.8.4 I get exceptions trying to run too much in different threads. This does not affect 0.7.7.

open DiffSharp.AD.Float64;;
let sphere (xs:DV) = xs * xs;;
Array.Parallel.map (DV >> grad sphere) (Array.replicate 10000 (Array.replicate 3 1.));;
System.AggregateException: One or more errors occurred. ---> System.InvalidCastException: Unable to cast object of type 'D' to type 'DV'.
   at Microsoft.FSharp.Core.LanguagePrimitives.IntrinsicFunctions.UnboxGeneric[T](Object source) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\prim-types.fs:line 598
   at DiffSharp.AD.Float64.DOps.adjoint[T](Adjoints adjoints, T d)
   at Microsoft.FSharp.Collections.ArrayModule.Parallel.Map@1317-3.Invoke(Int32 i) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\array.fs:line 1318
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, Action`1 body)
   at Microsoft.FSharp.Collections.ArrayModule.Parallel.Map[T,TResult](FSharpFunc`2 mapping, T[] array) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\array.fs:line 1319
   at <StartupCode$FSI_0080>.$FSI_0080.main@()
---> (Inner Exception #0) System.InvalidCastException: Unable to cast object of type 'D' to type 'DV'.
   at Microsoft.FSharp.Core.LanguagePrimitives.IntrinsicFunctions.UnboxGeneric[T](Object source) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\prim-types.fs:line 598
   at DiffSharp.AD.Float64.DOps.adjoint[T](Adjoints adjoints, T d)
   at Microsoft.FSharp.Collections.ArrayModule.Parallel.Map@1317-3.Invoke(Int32 i) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\array.fs:line 1318
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

My real scenario has a different stack trace, and reliably occurs with only 2 threads, but I imagine it has a similar cause:

The given key was not present in the dictionary.
   at System.ThrowHelper.ThrowKeyNotFoundException()
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at DiffSharp.AD.Float64.Adjoints.ApplyDelta(Int32 uniq, Delta x)
   at DiffSharp.AD.Float64.DOps.pushRec@3592-8(Adjoints adjoints, Dictionary`2 fanouts, FSharpList`1 ds)
<my gradhessian' call>

reversePush results in Stack overflow for larger than small networks

Since reversePush is not tail recursive, it results in a stack overflow for even simple networks when you push in a lot of data/use a bunch of memory. Here's the code I used to produce this (backprop is modified to allow a linear output layer and different activation functions but that's not relevant to the issue).

   let trainLine = [|
      for i in 0.0..0.01..18. -> 
            i, cos i|] 

   let testLine = [|
      for i in 0.0..0.01..24. -> 
            i, cos i|] 

   let train = trainLine |> Array.map (fun (x,y) -> vector [D x] , vector [D y])

   let net2 = createNetwork [|1;3; 1|]
   let train2 = backprop true sigmoid net2 1e-3 0.005 10000 train 

   let errs = train2 |> Seq.takeOrMax 600 |> Seq.toArray
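For context, the usual shape of the fix is to replace the non-tail-recursive traversal with an explicit work list, so the recursion depth no longer depends on the size of the graph. A generic sketch of that pattern (illustrative only, not DiffSharp's actual reversePush):

let pushAll (roots: 'a list) (children: 'a -> 'a list) (visit: 'a -> unit) =
    // an explicit stack keeps the traversal off the call stack, so deep graphs cannot overflow it
    let stack = System.Collections.Generic.Stack<'a>(roots)
    while stack.Count > 0 do
        let d = stack.Pop()
        visit d
        for c in children d do stack.Push c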

Missing some operations on vectors/matrices C#

Hello,

I am unable to find the following operations on vectors and matrices when coding in C#:

let v = v1 &* v2 // Vector outer (dyadic, tensor) product
let v = v1 .* v2 // Element-wise (Hadamard) product
let v = v1 ./ v2 // Element-wise division
let v = v1 ** v2 // Element-wise exponentiation

And:

let m = m1 .* m2 // Element-wise (Hadamard) product
let m = m1 ./ m2 // Element-wise division
let m = m1 ** m2 // Element-wise exponentiation

Thank you.

Enhancement: Way to use it in FSI

When trying to use DiffSharp in FSI (F# Interactive) I always get: Unable to load DLL 'libopenblas'.
But when I compile and run it, everything is fine.

FSI is running in x64 with debugging enabled.

New release?

There hasn't been a release of https://www.nuget.org/packages/DiffSharp/ since 2015 and lots of useful changes have been made since. Are there any plans for a release? Blockers you'd like help with?

I was able to build from source so I can use DiffSharp with .NET Core but I'd really like to share the work with others who are unlikely to build manually.

separate treatment for [Adj * float -> Adj] operations

Hi, just a quick remark: it seems that (in some applications) you could save quite a bit by having separate AddFloat / SubFloat / MulFloat / DivFloat variants in Reverse.fs, to treat all four [Adj * float -> Adj] operations. Indeed, the adjoint for those float constants is never going to be used; I understand those floats are currently first converted into Adj and then passed to the Add/Sub/Mul/Div constructs, so their adjoints will actually get computed in the reverse sweep. [I haven't thoroughly read your code so I could well be missing something...]
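To illustrate the suggestion, a hypothetical trace type with dedicated constant-operand cases might look like the sketch below (these are not the actual Reverse.fs types, just an illustration of the idea); the reverse sweep would match AddFloat/MulFloat and push an adjoint only for the Adj operand, skipping the float entirely.

type Trace =
    | Add      of Adj * Adj
    | AddFloat of Adj * float     // constant operand: no adjoint node is ever created for the float
    | Mul      of Adj * Adj
    | MulFloat of Adj * float
and Adj =
    { value: float
      mutable adjoint: float
      trace: Trace option }       // None for leaves and constants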

conv2d overloads ambiguous because of `stride` parameter

These overloads are ambiguous unless an explicit stride is given:

    static member conv2d(a:Tensor, b:Tensor, ?stride:seq<int>, ?padding:seq<int>, ?dilation:seq<int>) = a.conv2d(b, ?stride=stride, ?padding=padding, ?dilation=dilation)
    static member conv2d(b:Tensor, ?stride:seq<int>, ?padding:seq<int>, ?dilation:seq<int>) = fun (a:Tensor) -> a.conv2d(b, ?stride=stride, ?padding=padding, ?dilation=dilation)
    static member conv2d(a:Tensor, b:Tensor, ?stride:int, ?padding:int, ?dilation:int) = a.conv2d(b, ?stride=stride, ?padding=padding, ?dilation=dilation)
    static member conv2d(b:Tensor, ?stride:int, ?padding:int, ?dilation:int) = fun (a:Tensor) -> a.conv2d(b, ?stride=stride, ?padding=padding, ?dilation=dilation)

e.g.

            let x = dsharp.ones([1;1;4;4])
            let y = dsharp.ones([1;1;4;4])
            let z = dsharp.conv2d(x,y)

gives

Error FS0041: A unique overload for method 'conv2d' could not be determined based on type information prior to this program point. A type annotation may be needed. Candidates: static member DiffSharp.conv2d : a:Tensor * b:Tensor * ?stride:int * ?padding:int * ?dilation:int -> Tensor, static member DiffSharp.conv2d : a:Tensor * b:Tensor * ?stride:seq<int> * ?padding:seq<int> * ?dilation:seq<int> -> Tensor (DiffSharp.Tests, TestTensor.fs, line 1957)

Support the dMatrix type for the Cuda backend

Speed tests on the experimental Cuda backend show that when the functions use the native array parameters such as: let saxpy(alpha:float32, x:float32[], y:float32[]) ... they are 14x slower. The cost of constantly transferring the data back and forth from GPU to host is prohibitive. For the Cuda backend to give any speedup the functions have to use the native type such as: let saxpy(alpha:float32, x:dMatrix, y:dMatrix) ... where the dMatrix type is defined as such:

type dMatrix(num_rows: int, num_cols: int, dArray: CudaDeviceVariable<float32>) =
    new(num_rows: int, num_cols: int) =
        let q = (num_rows * num_cols) |> SizeT
        let t = new CudaDeviceVariable<float32>(q)
        new dMatrix(num_rows, num_cols, t)

    member t.num_rows = num_rows
    member t.num_cols = num_cols
    member t.dArray = dArray

    interface IDisposable with
        member t.Dispose() = dArray.Dispose()

This would probably entail a significant change to the library.

Later on, for convolutional nets an extended class will also be required, as the cuDNN library takes 4D parameters. I intend to leave this as a long-standing request for when I finish the backend.

Edit: (12/22/2015) The first (non working) version of this can be seen at my branch. The backend is done while the frontend remains in a half finished state.

Edit2: (12/26/2015) I am now sure that the reason the frontend is not functioning correctly is related to the finalizer. This is not something that is actually fixable. In my own code, I've discovered an unbelievable bug that made me decide to not use them at all.

I did not want to believe it when I first asked on Stack Overflow, but until Cuda programming considerably matures, either users will have to manually manage memory or the library is going to have to implement GC on its own.

The main loop gives NaNs when in a computation expression

In the sequence recall program below, when I uncomment the 'let train =' line and the related code at the end, I get NaN values after around 2000-3000 iterations. When I leave them commented out as shown, it optimizes just fine through the whole 10k iterations.

This looks like a bug to me, as there is nothing to indicate that the results should differ between the two runs.

Also, although not related to this issue, one thing that sticks out to me is the lack of a map for the DM matrices. It would really help in implementing various activation functions, not to mention the clipping function for the final sigmoid layer. Would that be something that is difficult to add to the AD library?


#I @"C:\Users\Marko\Documents\Visual Studio 2015\Projects\Automatic Differentiation\packages\DiffSharp.0.7.4\lib\net46"
#r @"DiffSharp.dll"

#I @"C:\Users\Marko\Documents\Visual Studio 2015\Projects\Automatic Differentiation\packages\FSharp.Quotations.Evaluator.1.0.6\lib\net40"
#r @"FSharp.Quotations.Evaluator.dll"

//#I @"C:\Users\Marko\Documents\Visual Studio 2015\Projects\Automatic Differentiation\packages\FSharp.Charting.0.90.13\lib\net40"
//#r "FSharp.Charting.dll" 
//#r @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.6\System.Windows.Forms.DataVisualization.dll"

//open FSharp.Charting

open DiffSharp.AD.Float32
open DiffSharp.Util

open System.IO

let rng = System.Random()

// A layer of neurons
type Layer =
    {mutable W:DM  // Input weight matrix
     mutable U:DM  // Recurrent weight matrix
     mutable b:DV  // Bias vector
     a:DM->DM}     // Activation function

let createRandomLayer hidden_size input_size act =
    {
    W = DM.init hidden_size input_size (fun _ _ -> (rng.NextDouble()-0.5) / sqrt(float hidden_size) |> float32)
    U = DM.init hidden_size input_size (fun _ _ -> (rng.NextDouble()-0.5) / sqrt(float hidden_size) |> float32)
    b = DV.init hidden_size (fun _ -> (rng.NextDouble()-0.5) / sqrt(float hidden_size) |> float32)
    a = act
    }

// A feedforward network of neuron layers
type Network =
    {layers:Layer[]} // The layers forming this network

// For the section with no previous hidden state.
let runLayerNoH (x:DM) (l:Layer) =
    l.W * x + l.b |> l.a

// For the section with no input
let runLayerNoI (y:DM) (l:Layer) =
    l.U * y + l.b |> l.a

// For the section with previous hidden state
let runLayer (x:DM) (y:DM) (l:Layer) =
    l.W * x + l.U * y + l.b |> l.a

// To me these two problems look roughly similar but to the network they are worlds apart it seems.
let sequence_recall_data batch_size seq_length =
    [|
    for k = 1 to batch_size do
        let t = [|for i=1 to 7 do yield if rng.NextDouble() > 0.5 then 1.0f else 0.0f|]
        yield t
        for i=2 to seq_length-1 do
            let t = [|for i=1 to 7 do yield if rng.NextDouble() > 0.5 then 1.0f else 0.0f|]
            yield t
        yield t |]

let target_length = 3
let batch_size = 50
let training_data = sequence_recall_data batch_size target_length
let training_data_transposed =
    [|
    for i=0 to target_length-1 do
        let t = 
            [|
            for k=0 to batch_size-1 do
                let ind = k*target_length+i
                yield training_data.[ind] |] |> Array.map Array.toSeq |> Array.toSeq |> toDM
        yield t |]

let hidden_size = 10
let input_size = 7
let l1 = createRandomLayer hidden_size input_size DM.Tanh
let l2 = createRandomLayer input_size hidden_size DM.Sigmoid

let layers = [|l1;l2|]

let learning_rate = 0.1f / float32 batch_size


//let train =
    //[|
for i=1 to 10000 do
    let tag = DiffSharp.Util.GlobalTagger.Next
    for l in layers do
        l.W <- l.W |> makeReverse tag
        l.U <- l.U |> makeReverse tag
        l.b <- l.b |> makeReverse tag

    let a1 = runLayerNoH training_data_transposed.[0] l1
    let a2 = runLayer training_data_transposed.[1] a1 l1
    let a3 = runLayerNoI a2 l1
    let b3 = runLayerNoH a3 l2
    //let cost = -(training_data_transposed.[2] * log b3 + (1.0f-training_data_transposed.[2]) * log (1.0f-b3)) |> DM.Sum // Does not work. Probably because I have not clipped the outputs.
    let cost = b3 .* b3 |> DM.sum

    cost |> reverseProp (D 1.0f)

    for l in layers do
        l.W <- l.W.P - learning_rate*l.W.A
        l.U <- l.U.P - learning_rate*l.U.A
        l.b <- l.b.P - learning_rate*l.b.A

    let t = float32 cost

    printfn "The cost at iteration %i is %f" i t
        //yield 0.0f |]

//(Chart.Line train).ShowChart()

Symbolic differentiation fails at a specific value

Hi,

Symbolic differentiation of the power operator (**) fails at 0.0.

#I "../packages"
#r "DiffSharp/lib/net46/DiffSharp.dll"
#r "FSharp.Quotations.Evaluator/lib/net40/FSharp.Quotations.Evaluator.dll"

open DiffSharp.Symbolic.Float64

[<ReflectedDefinition>]
module m =
  let y x = x**2.0

let dy = diff <@ m.y @>

// following statement will return nan (0.0 is expected)
dy 0.0
// val it : float = nan

// but this is OK
dy 1.0
// val it : float = 2.0
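A possible workaround sketch (hypothetical and untested here): writing the square as repeated multiplication avoids the general power rule, which is presumably what introduces the term that is undefined at 0.0.

[<ReflectedDefinition>]
module m2 =
  let y x = x * x          // same function, differentiated via the product rule

let dy2 = diff <@ m2.y @>
dy2 0.0                    // expected: 0.0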

Check and fix dilations in libtorch backend

Dilation code in the libtorch backend seems incomplete for some cases. We need to check the behavior and fix it and ensure all tests are passing without excluding the libtorch backend. scatter can be helpful to reimplement dilations.

See here: #106 (comment)

Forward on forward mode question

let inline r x y = (x*x)*(y*y)

let a = D 3.0f |> makeForward 0u (D 1.0f)
let a' = a |> makeForward 1u (D 1.0f)

let b = D 2.0f |> makeForward 0u (D 0.0f)
let b' = b |> makeForward 1u (D 0.0f)

let q = r a' b'

I am wondering whether it would be possible to take second derivatives using forward on forward. I already know how to do it using reverse over forward, but I've read that forward on forward should be possible as well.
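For a quick sanity check, the same nesting can be written with the high-level diff operator, which handles the tagging internally; a one-variable sketch, assuming DiffSharp.AD.Float32 as in the snippet above:

let g (x: D) = (x * x) * (x * x)      // x^4, so the second derivative is 12 * x^2
let d2 = diff (diff g) (D 3.0f)       // forward-on-forward; expected 108.0f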

Notes on BERT on DiffSharp

@moloneymb has ported BERT to work over Tensorflow.NET, https://github.com/moloneymb/BERTInFSharp/. Just a sample and not for production.

We're doing an investigation of what it would mean to do this over DiffSharp instead of TF.NET

This is a discussion thread about what additions/changes/updates/... would be needed to DiffSharp 1.0 (dev branch) to allow this.

Unexpected behaviour in C# when negation operator is used in the very first term

Running the following C# code:

var fgrad = AD.Grad(v => -v[0] - v[1] * 2.0);
var fgradValue = fgrad(new DV(new double[] {1, 1}));

gives the unexpected result (0, -2.0) [first value should be -1.0]
The result is the same when the first line is modified to:

var fgrad = AD.Grad(v => -v[0] * 5.0 - v[1] * 2.0); // still gives (0, -2.0) at (1, 1)

The correct result is given when the code is changed to:

var fgrad = AD.Grad(v => -1.0 * v[0] - v[1] * 2.0); // and
var fgrad = AD.Grad(v => -5.0 * v[0] - v[1] * 2.0);

DiffSharp version: 0.7.7.0

Support more target frameworks in DiffSharp.fsproj

Right now only the netstandard2.0 target framework is supported by the DiffSharp project file. This causes build failures in libraries that use DiffSharp and must support net45 and older .NET frameworks and toolchains. Both .NET Standard 2.0 and the other targets can be supported by using this line in DiffSharp.fsproj:

<TargetFrameworks>netstandard2.0;netcoreapp2.0;net45</TargetFrameworks>

I built DiffSharp successfully for these targets without any changes to the source code and the tooling will take care of sorting out the dependencies and packaging the libraries correctly for NuGet.

Design and implement a C#-friendly API for non-F# consumers

This can be added to the dev branch once it's fairly stable. It should not influence the base DiffSharp API, but can be implemented as a lightweight extra layer with C#-friendly design. We can implement a unit testing project in C# to test this API.

Discussion: sparse tensors

Torch has support for sparse tensors, I don't think it will be hard to surface it up to DiffSharp (once we successfully codegen the necessary TorchSharp API, which @moloneymb is working on).

Do we want this mapped up, and what priority is this?

duplicate implementation of AD.Float32 and AD.Float64

I was looking at the implementation files AD.Float32.fs and AD.Float64.fs and was thinking I could contribute a refactoring so the code is 100% identical except for a dozen lines at the top of each file.

It would basically consist of defining a type alias and a few literals. I think it would make extending and maintaining the library easier than it is now.

Does that sound like a good idea?
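For illustration, the shared prefix might look something like the sketch below (hypothetical names, not an actual patch); the rest of each AD.FloatXX.fs would then be written purely against the alias.

#if FLOAT32
type number = float32
let inline N (x: float) : number = float32 x   // helper for numeric literals
#else
type number = float
let inline N (x: float) : number = x
#endif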

k-means example in DiffSharp

k-means is an unsupervised algorithm, but the SGD example uses the labels of the data (the y in the function below). Does this mean that SGD is suitable for k-means?

let sgd f w0 (eta:D) epsilon (t:(DV*DV)[]) =
    let rec desc w =
        let x, y = t.[rnd.Next(t.Length)]
        let g = grad (fun wi -> DV.l2norm (y - (f wi x))) w
        if DV.l2norm g < epsilon then w else desc (w - eta * g)
    desc w0

Sorry, I am very curious about this. Could anyone explain? Thank you.
The example I saw was at http://diffsharp.github.io/DiffSharp/
@gbaydin

OpenBLAS x86 also works with DiffSharp

I have tried using the native providers' x86 builds and used "Any CPU" as the build configuration. It builds and runs.

I have tested vector and matrix multiplication, transpose, etc.
Gradient and Hessian also work.

I obtained the x86 builds of the native providers from the MathNet distribution. They have both x64 and x86 providers.

Oddity in diff type inference

I was a bit surprised by the following:

#r @"..\packages\DiffSharp.0.5.7\lib\DiffSharp.dll"
open DiffSharp.AD.Forward

let f x = sqrt x
let f' = diff f 

Why is f' of type val f' : (int -> float) and not val f' : (float -> float)? Is this a bug, or is there a way to give hints to obtain the proper signature?
Cheers,
Mathias

Implement DV for non-AD modules?

I'm porting an autodiff benchmark to DiffSharp 0.7, see at https://github.com/awf/ADBench/blob/master/tools/DiffSharp/ba.fs#L80

I'm using the "DV" vectors, which are great and clean up the code, but I would like to switch between the AD, Numerical, and Symbolic versions of DiffSharp without rewriting the code. It looks as if "DV" is implemented only in AD.Float{32,64}. I can happily port it to Numerical.*, but that becomes a lot of code duplication -- is that the way we should do it?

No, of course I could try to write generic code, but that worsens my error messages, and I need a linear algebra library anyway, so I may as well use this particular set of method names.

DiffSharp 0.7: Hard crash inverting large nan matrices

open DiffSharp.AD.Float64;;

This is nice:

Array.replicate 49 (D nan) |> DM.ofArray 7 |> DM.inverse;;
val it : DM = DM [[nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan]]

This is odd:

Array.replicate 64 (D nan) |> DM.ofArray 8 |> DM.inverse;;
val it : DM =
DM
[[nan; 6.952457011e-310; 3.952525167e-323; 4.243991586e-314;
8.487983165e-314; 1.273197475e-313; 1.697596633e-313; 0.0]
[nan; nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan; nan]
[nan; nan; nan; nan; nan; nan; nan; nan]]

This is distressing:

Array.replicate 81 (D nan) |> DM.ofArray 9 |> DM.inverse;;
PS C:\Windows\System32>

Faulting application name: fsiAnyCpu.exe, version: 10.800.20.18106, time stamp: 0xd0e394f5
Faulting module name: clr.dll, version: 4.8.4150.0, time stamp: 0x5e176fc2
Exception code: 0xc0000005
Fault offset: 0x00000000000810c3
Faulting process id: 0x56ec
Faulting application start time: 0x01d622d92cbff942
Faulting application path: C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\Common7\IDE\CommonExtensions\Microsoft\FSharp\fsiAnyCpu.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Report Id: 5bbbc2e5-8eed-4109-8842-b2f3ee2daef5
Faulting package full name:
Faulting package-relative application ID:

Application: fsiAnyCpu.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFBD63110C3 (00007FFBD6290000) with exit code 80131506.

This affects both the current stable and beta versions - though the message refers to libopenblas.dll in the stable version.

ForwardG module errors on function that works with Forward module

Great work so far with this library. Hopefully this can replace some of the by-hand derivations that I've had to do in the past on optimization problems. I was hoping I could use this library to replace some onerous gradient and Hessian formulas. I was able to reproduce the gradient calculations I had previously done by hand with the Forward module, but was unable to use the ForwardGH module to calculate the Hessian. I thought I would then try the ForwardG module to see if I could once again reproduce the result I got using the Forward module, but I got a System.IndexOutOfRangeException: Index was outside the bounds of the array. I was unable to trace the error due to the numerous inline functions, but thought I would ask to see if there was some way I was misusing the library.

Thanks,

Dave

The following code works:

#r "../packages/DiffSharp.0.5.2/lib/DiffSharp.dll"

open DiffSharp.AD.Forward

let g = 
    grad (fun theta ->
        let n = (Array.length theta - 1) / 2
        let weights = theta.[0 .. n-1]
        let lambdas = theta.[n .. 2*n-1]
        let parameters = Array.zip weights lambdas
        let kappa = theta.[2*n]
        let bins = [| 0., 2500., 58. ; 2500., 7500., 61. |]
        let binValue (low,high,count) = 
            count * log(parameters |> Array.sumBy(fun (w,l) -> w * (exp(-l*low) - exp(-l*high))))
        -(bins |> Array.sumBy binValue) - kappa * (1. - Array.sum weights))     

let test =  g [| 0.8; 0.2; 1./10000.; 1./100000.; 336. |]

// returns test = [|192.0666263; 316.7334947; -802447.6527; -372023.8047; 0.0|]

Switching the opened namespace from DiffSharp.AD.Forward to DiffSharp.AD.ForwardG throws an error.

Bugs in reverse mode for matrix division-by-constant

There is something wrong with these two rules in reverse mode AD for M / v (matrix-division-by-constant)

                            | Div_DM_D(a, b) -> pushRec ((bx (dA / b.P) a) :: (bx (dA * (-a.P / (b.P * b.P))) b) :: t)
                            ...
                            | Div_DMCons_D(cons, b) -> pushRec ((bx (dA * (-cons / (b.P * b.P))) b) :: t)

The problem is that in both cases b is a scalar, but the adjoint being pushed for b is a matrix, in each case (dA * (-a.P / (b.P * b.P))) and (dA * (-cons / (b.P * b.P))).

The case seems to have been copied from the case for vectors, where b.P * b.P is the inner dot-product. But for matrices this is the matrix product.

I believe the correct fix is to replace the above expressions by (DM.Sum (dA .* (-a.P / (b.P * b.P)))) and (DM.Sum (dA .* (-cons / (b.P * b.P)))) respectively.

Discussion: immutable/mutable tensors

Currently tensors are immutable, though may share the same underlying data.

We do have some localized mutation through .reverseDiff() that creates a new tensor with a mutable register chain to receive adjoints from a later matching .reverse(). However I'll ignore that as it's "nice" mutation, because it is localised in the sense that (1) if you don't use these constructs, everything is immutable, and (2) if you do use them according to the rules then they won't interfere.

Now, I'm sure that at some point we will need to introduce localized mutation on tensors. Indeed we already use localized mutation internally in Tensor.fs, but again it is local, not global.

I propose we do it something like this:

  1. Tensors are by default immutable, and we check for this and raise exceptions if you try to mutate.
  2. If you do tensor.mutableLike() then you get a mutable tensor - still of type Tensor, but mutation operations no longer throw exceptions. An isMutable flag can also be passed into zerosLike and friends to get a tensor that starts off mutable.
  3. A tensor.unsafeImmutable() gets you back to an immutable tensor, sharing the same underlying data.

Like reverse, this API is safe if you play by the rules.

I also propose that we eventually keep a flag reporting when we know that a tensor has a zero value, basically tensor.isKnownZero, and likewise tensor.isKnownOne. I expect these will be just too useful for peep-hole optimization purposes not to propagate them, and they can help us simplify how we write extensions (see this). But we can only trust these flags if tensors are immutable...
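As a toy illustration of the flag-based scheme on a plain buffer (the member names mirror the proposal above; this is not the DiffSharp Tensor API):

type Buf(data: float[], isMutable: bool) =
    member this.IsMutable = isMutable
    member this.Item
        with get (i) = data.[i]
        and set (i) (v) =
            // rule 1: mutation throws unless the buffer was created mutable
            if not isMutable then invalidOp "this buffer is immutable"
            data.[i] <- v
    member this.MutableLike() = Buf(Array.copy data, true)      // rule 2: opt in to mutation
    member this.UnsafeImmutable() = Buf(data, false)            // rule 3: back to immutable, same underlying data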

Add boundary checking.

Literally every professionally done numerical library has this in some form. It caused me some hilarious mistakes when I first discovered DiffSharp, and leaving it out for 'performance' is not a valid reason in my mind. This is one area where one should not skimp.

DiffSharp should definitely have had this from day one.

Edit: It also bears mentioning that in the Cuda version, as all the kernel launches are asynchronous, the CPU's job is literally to act as an accountant while it waits for the GPU to finish. It is no issue at all from the performance standpoint.

Implement generic samplers

Implement some generic samplers (MCMC, Hamiltonian MC, etc.) that sample from a given logprob:Tensor->Tensor
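For concreteness, here is a minimal random-walk Metropolis sketch over a plain float[] -> float log-density; the real implementation would take a logprob: Tensor -> Tensor and work on tensors (with gradients for the Hamiltonian variant), so treat the names and types here as placeholders.

let metropolis (logprob: float[] -> float) (x0: float[]) (stepSize: float) (n: int) =
    let rng = System.Random()
    // Box-Muller draw from a standard normal
    let gauss () =
        let u1 = 1.0 - rng.NextDouble()
        let u2 = rng.NextDouble()
        sqrt (-2.0 * log u1) * cos (2.0 * System.Math.PI * u2)
    let samples = ResizeArray<float[]>()
    let mutable x = Array.copy x0
    let mutable lp = logprob x
    for _ in 1 .. n do
        let candidate = x |> Array.map (fun xi -> xi + stepSize * gauss ())
        let lpCand = logprob candidate
        // accept with probability min(1, exp(lpCand - lp))
        if log (rng.NextDouble()) < lpCand - lp then
            x <- candidate
            lp <- lpCand
        samples.Add(Array.copy x)
    samples.ToArray()

For example, metropolis (fun v -> -0.5 * Array.sumBy (fun vi -> vi * vi) v) [| 0.0 |] 0.5 1000 draws an approximate chain from a standard normal.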

Failure in tupled tensor creation

Tupled tensor creation fails if the leaves are size-1 lists, e.g.

    let v = Tensor.Create [[2.], [3.], [4.]]

I'll send the fix shortly

conv1d and conv2d should support further batching

It looks like conv1d and conv2d expect only one dimension of batching (NxCxI to conv1d). In torch you can do additional dimensions of batching (N1xN2xCxI to conv1d), compare:

import torch
import torch.nn.functional as F
x = torch.zeros(3,1,4,4, dtype=torch.float32)
y = torch.zeros(3,1,4,4, dtype=torch.float64);
F.conv1d(x,y)

and

let x = dsharp.zeros([3;1;4;4])
let y = dsharp.zeros([3;1;4;4])
dsharp.conv1d(x,y)

gives

System.Exception: Expecting two 3d Tensors t1, t2 where t1 is input (NxCxI: batchSize x inputChannels x inputLength) and t2 is filters (KxCxF: outputChannels x inputChannels x kernelLength), received Tensors with shapes [|3; 1; 4; 4|], [|3; 1; 4; 4|]

Bug in Backend.OpenBLAS.fs

Hello,

First of all, congratulations on this library, but I've found two little bugs in the LAPACK module, in the functions ssysv and dsysv, which make these functions useless.
You must correct line 487 from: use arg_b = new PinnedArray(b)
to: use arg_b = new PinnedArray(b')
and similarly (variable b -> b') in line 580.

Enrico

Bug in reverse AD for matrix determinant operation

When doing

    let fmsD (x:DM) = x * (log (x / 2.)) |> DM.det

    let xm = DM (Array2D.init 10 10 (fun _ _ -> rnd.NextDouble()))
    computeAdjoints (fmsD (makeReverse 100u xm))

We get an exception during reverse AD

Unhandled Exception: System.Exception: Cannot get tangent value of DR.
   at DiffSharp.AD.Float64.D.get_T() in C:\GitHub\dsyme\DiffSharp\src\DiffSharp\AD.Float64.fs:line 111

Basically the use of d.T (the forward tangent) is failing in this code (note the Check this comment)

                        | Det_DM(a) -> pushRec ((bx (d.T * d.P * DM.Transpose(DM.Inverse(a))) a) :: t) // Check this

We should not be using the tangent of d here; by the standard identity ∂det(A)/∂A = det(A) · A⁻ᵀ, the rule should scale by the adjoint of d instead.

proposal for user choice of packages

Follow on from #106

After discussing with @gbaydin today, my proposal is that we remove the DiffSharp package and have these three "top-level-entry-point" packages:

  • package DiffSharp-torch-cpu
    (references DiffSharp.Core, DiffSharp.Backends.Torch, libtorch-cpu)

  • package DiffSharp-torch-cuda-10.2-win-x64
    (references DiffSharp.Core, DiffSharp.Backends.Torch, libtorch-cuda-10.2 for windows)

  • package DiffSharp-torch-cuda-10.2-linux-x64
    (references DiffSharp.Core, DiffSharp.Backends.Torch, libtorch-cuda-10.2 for linux)

If we have other backends that are high quality and fully feature-complete and tested we can add those too.

With this, the user on a Jupyter notebook simply references one of these and a version number, e.g.

#r "nuget: DiffSharp-torch-cpu, 1.0.3"

That's a nice and simple choice, isn't it?

Note we really have to split the Windows/Linux CUDA packages here because the downloads are very large.

Note I haven't listed an accumulation package for the reference backend. I think people wanting to use the reference backend are likely to be building new DiffSharp library components (e.g. derived algorithms) or extensions (new backends etc.) and should reference the individual packages they need. For example, they reference DiffSharp.Core in their library project and DiffSharp.Backends.Reference in their test project (and a few more packages if they want to test on Torch). That actually gives a really nice, light experience for creating a library project that adds some stuff to DiffSharp - no need to download anything Torch related if you're happy to trust the reference backend for testing.

Aside: With this, the user gets whatever version of Torch the TorchSharp we depend on happens to work with. In some far future I could see people wanting to vary the Torch version between two or three recent releases, so you can imagine this:

  • package DiffSharp-torch-1.5-cpu
  • package DiffSharp-torch-1.6-cpu
  • package DiffSharp-torch-1.5-cuda-10.2-linux-x64
  • package DiffSharp-torch-1.6-cuda-10.3-linux-x64

etc. It's an interesting question which of these the user is most likely to want to vary on. (and will they want to use nightly or modified or patched versions of Torch? That would be hard...)

Roadmap to DiffSharp 1.0

This is a catalog issue to track what needs doing for DiffSharp 1.0, based on 1:1 discussions with @gbaydin. There will be a long list of other things; we'll extend this as necessary.

  • Typed Backend.None CPU tensors (draft)

  • Add keep_dims on Mean (done)

  • Fix CompareTo in RawTensorFloat32CPU.fs (done)

  • Broadcasting. Full pytorch-style broadcasting for Add (see TODO here). The design principle expected here is "we should do the same thing as PyTorch". Similarly full pytorch-style broadcasting for Mul and other operations. (done)

  • Convolutions (@gbaydin)

  • Transposed convolutions

  • Batchnorm

  • Dropout

  • Switch to Python-style casing on all operations to align with SciSharp

  • Remove excess overloads and use optional arguments instead

  • Finalize API dsharp.abc

  • libtorch and cuda backends

  • Add Reshape (similar code to View)

  • Add OneHot

  • Differentiation API

  • Optimizers

  • Fix possible memory leak on Linux

  • Tensor save/load

  • General transpose

  • probability distributions

  • Generalization and batching of Transpose.

  • Zero-size tensors #150

  • Docs tooling #134

  • Docs #167

  • PyTorch Half support

  • Batching conv1d/2d/3d/... #98

  • Batching for MatMul. Currently no batching is supported, only 2D x 2D done

  • Einstein summation #92

  • norm #93

  • matrix inverse

Things out of scope for 1.0

  • Strided views

  • Quantized

  • Complex

  • Sparse

potential overwriting of [xa]

In Reverse.ReverseOps.grad', is there anything that prevents the user from modifying some elements of [xa] within the call to [f xa]? (I come from OCaml, and I would guess that with the current formulation, elements of [xa] could be modified, which would then lead to [Array.map adjoint xa] reading the wrong adjoint values; but perhaps there's something specific to F# I'm missing here)

dev branch - model definition notes

The model definition in Tests.fs could be like this:

let MakeModel ps f =
    let model =
        { new Model() with
            override l.Forward(x) = f x }
    model.AddParameters(ps)
    model

let FeedforwardNet(p2: int) =
    let fc1 = Linear(2, p2)
    let fc2 = Linear(p2, 1)
    MakeModel ["fc1", fc1; "fc2", fc2] 
        (fc1.Forward >> 
         Tensor.LeakyRelu >> 
         fc2.Forward >> 
         Tensor.LeakyRelu)

let Compose (model1: Model) (model2: Model) =
    MakeModel [ for KeyValue(k,v) in model1.Parameters do yield (k,v)
                for KeyValue(k,v) in model2.Parameters do yield (k,v) ]
        (model1.Forward >> model2.Forward)
        

Test issue: Assert.AreEqual

Just to mention that in all our calls Assert.AreEqual(t0, t0Correct) the arguments are the wrong way around; the expected value should come first for Assert.AreEqual.

This affects the error messages when tests fail.

In any case I'd like to change this to a helper assertEqual: T * T -> unit that enforces that the types of the two things are the same. The Assert.AreEqual(obj,obj) overload can hide obscure errors.
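A sketch of that helper, assuming NUnit's Assert.AreEqual(expected, actual) convention (hypothetical, not yet in the test project):

let assertEqual (expected: 'T, actual: 'T) : unit =
    // Forcing both arguments to the same type 'T avoids the obj/obj overload
    // silently comparing values of different types; expected comes first.
    NUnit.Framework.Assert.AreEqual(box expected, box actual)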
