
JuliaOptimalTransport / OptimalTransport.jl

92 stars · 6 watchers · 8 forks · 25.42 MB

Optimal transport algorithms for Julia

Home Page: https://juliaoptimaltransport.github.io/OptimalTransport.jl/dev

License: MIT License

Language: Julia 100.00%

Topics: julia, optimal-transport, wasserstein-distance, sinkhorn-algorithm, wasserstein, optimal-transport-algorithms, optimization, scaling-algorithms

optimaltransport.jl's People

Contributors

davibarreira, devmotion, github-actions[bot], zsteve


optimaltransport.jl's Issues

Contribution Workflow

Hey guys, it's been a while. Thankfully I'm done with my PhD qualification, and I'm starting to work again on some Optimal Transport stuff.
I just submitted a small PR, and I realized I forgot some of the workflow for contributing (e.g., the commands for adjusting the code style and building the documentation). I was thinking it would be nice to have a small section in the README (or the docs) describing the contribution workflow, along the lines of "remember to run this, then run that", etc. What do you think?
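
For reference, a rough sketch of the kind of commands meant here, assuming the project uses JuliaFormatter for the code style and Documenter for the docs (the actual setup may differ):

using JuliaFormatter
format("src"; verbose=true)    # hypothetical: apply the project's code style to the source tree
format("test"; verbose=true)

# hypothetical: build the documentation locally
# julia --project=docs docs/make.jl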

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Question about optimal transport and possibilities of OptimalTransport.jl

Sorry if this is not the right place to ask this kind of question.

I am looking for an algorithm that has the following characteristics (see below).
Is there any functionality in OptimalTransport.jl that could be used for this?

Many thanks!

INPUT:

  • histogram H,
  • desired mean and variance (M,V)

OUTPUT:

  • a histogram H' that has the desired mean and variance (M,V), but whose shape is obtained by adapting H while minimizing some cost that penalizes large modifications w.r.t. H.

(I can of course give more information about the problem if the description is not clear to you.)

ForwardDiff errors on differentiating through the output of `sinkhorn2`

I'm currently trying to autodiff through sinkhorn2 via Optim.jl, but I'm running into the following error:

julia> opt_primal = optimize(u -> f_primal(softmax(u), ε, K, interp_frac), zeros(size(μ0)), LBFGS(), Optim.Options(store_trace = true, 
show_trace = true, iterations = 250); autodiff = :forward)
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11})
Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  (::Type{T})(::T) where T<:Number at boot.jl:760
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  ...               
Stacktrace:    
  [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11})
    @ Base ./number.jl:7
  [2] setindex!
    @ ./array.jl:841 [inlined]
  [3] setindex!
    @ ./multidimensional.jl:639 [inlined]
  [4] macro expansion
    @ ./broadcast.jl:984 [inlined]
  [5] macro expansion
    @ ./simdloop.jl:77 [inlined]
  [6] copyto!
    @ ./broadcast.jl:983 [inlined]
  [7] copyto!
    @ ./broadcast.jl:936 [inlined]
  [8] materialize!
    @ ./broadcast.jl:894 [inlined]
  [9] materialize!
    @ ./broadcast.jl:891 [inlined]
 [10] sinkhorn_gibbs(μ::Vector{Float64}, ν::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11}}, K::Matrix{Float64}; tol::Nothing, atol::Nothing, rtol::Nothing, check_marginal_step::Nothing, check_convergence::Nothing, maxiter::Int64)
    @ OptimalTransport ~/OptimalTransport.jl/src/OptimalTransport.jl:194
 [11] sinkhorn_gibbs
    @ ~/OptimalTransport.jl/src/OptimalTransport.jl:161 [inlined]
 [12] sinkhorn(μ::Vector{Float64}, ν::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11}}, C::Matrix{Float64}, ε::Float64; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OptimalTransport ~/OptimalTransport.jl/src/OptimalTransport.jl:262
 [13] sinkhorn
    @ ~/OptimalTransport.jl/src/OptimalTransport.jl:259 [inlined]
 [14] sinkhorn2(μ::Vector{Float64}, ν::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11}}, C::Matrix{Float64}, ε::Float64; regularization::Bool, plan::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OptimalTransport ~/OptimalTransport.jl/src/OptimalTransport.jl:286
 [15] #ot_smooth_primal#1
    @ ./REPL[18]:1 [inlined]
 [16] f_primal(μ::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11}}, ε::Float64, K::Matrix{Float64}, f::Float64)

The relevant code appears to be OptimalTransport.jl:194:

187     norm_μ = sum(abs, μ) # for convergence check
188     isconverged = false
189     check_step = check_convergence === nothing ? 10 : check_convergence
190     for iter in 0:maxiter
191         if iter % check_step == 0
192             # check source marginal
193             # do not overwrite `tmp1` but reuse it for computing `u` if not converged
194             @. tmp2 = u * tmp1                                                                                                     
195             norm_uKv = sum(abs, tmp2)
196             @. tmp2 = μ - tmp2
197             norm_diff = sum(abs, tmp2)

I'm not sure why 194 causes this issue. I tried looking at the types of u, tmp1 and tmp2:
[ Info: (Matrix{Float64}, Matrix{ForwardDiff.Dual{ForwardDiff.Tag{var"#4#5", Float64}, Float64, 11}}, Matrix{Float64})

So it appears that the in-place assignment here is resulting in a type incompatibility. I suppose this could be mitigated by avoiding explicit in-place assignments?
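
For what it's worth, here is a minimal standalone sketch of the same failure mode and one possible mitigation; it reproduces the broadcast issue in isolation and is not the package's actual code:

using ForwardDiff

u = rand(3)                          # Float64 buffer, analogous to `u`/`tmp2` above
d = ForwardDiff.Dual.(rand(3), 1.0)  # Dual numbers, as produced during autodiff
tmp2 = similar(u)                    # Float64 storage
# @. tmp2 = u * d                    # errors: a Dual cannot be converted to Float64

# allocating the buffer with the promoted element type avoids the in-place conversion
tmp2 = similar(u, promote_type(eltype(u), eltype(d)))
@. tmp2 = u * d                      # works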

Calculating the Wasserstein distance

Hello, how can I calculate the Wasserstein distance if I only have two histograms?
I noticed DiscreteNonParametric; however, it needs a true probability vector which must sum to 1.
How can I calculate the distance despite this constraint?
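
In case it helps, a minimal sketch of one way to do this, assuming the two histograms share the same bin centres (h1, h2, and x below are made-up placeholder data):

using OptimalTransport, Distances, Tulip

h1 = [1.0, 2.0, 3.0]              # raw histogram counts
h2 = [2.0, 2.0, 2.0]
x = [0.0, 1.0, 2.0]               # bin centres shared by both histograms
p = h1 ./ sum(h1)                 # normalise to probability vectors
q = h2 ./ sum(h2)
C = pairwise(SqEuclidean(), reshape(x, 1, :), reshape(x, 1, :); dims=2)  # cost between bins
emd2(p, q, C, Tulip.Optimizer())  # exact transport cost between the normalised histograms

If the histograms genuinely carry different total mass, simple renormalisation changes the problem, and an unbalanced formulation would be needed instead.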

Optimal transport between dataset and discrete bivariate distribution.

Hey,

I have a bivariate dataset and a bivariate distribution defined as:

using Distributions
n=100
data = randn((n,2))
MarginDist(data,i) = DiscreteNonParametric(data[:,i],ones(size(data,1))/size(data,1))
D = product_distribution(MarginDist(data,2),MarginDist(data,1))

How should I go about computing (or at least approximating) the Wasserstein distance (cost = squared Euclidean norm) between the dataset data and the distribution D? Note that the marginals are exchanged (so that, when the distance is minimized, they match each other) and that the dependence structure of D is independence; all of this is on purpose.
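
One possible way to make this concrete (a sketch only, continuing from the snippet above; it enumerates the n² support points of D explicitly, so it will not scale to large n):

using OptimalTransport, Tulip

n = size(data, 1)
# support of D: first coordinate drawn from column 2, second from column 1, with uniform product weights
grid = vec([[data[i, 2], data[j, 1]] for i in 1:n, j in 1:n])
w = fill(1 / n^2, n^2)
C = [sum(abs2, data[i, :] .- grid[j]) for i in 1:n, j in 1:n^2]   # squared Euclidean cost
emd2(fill(1 / n, n), w, C, Tulip.Optimizer())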

Transferring the repo to a GitHub organisation?

Hi @davibarreira and @devmotion ,

I notice there's been quite some activity on the repo lately, and I'm delighted that the repo has had contributions!
Unfortunately, in recent months this project has been on the backburner for me for personal and academic reasons, so apologies for my lack of presence (my research shifted to a Python codebase for compatibility reasons, and I have been thesis-writing).

Although I've added @devmotion as a contributor to the repo, I was thinking it might be a good idea to create a GitHub organisation to host the repo instead of keeping it on my personal GitHub account. Perhaps this would make the contributing process smoother?

Edit: I imagine that this would be something that could be done perhaps not immediately, but in the near future.

Keen to hear any thoughts!

Function `emd` returning an error for a use case

I'm trying to run the following case:

n = 2
distm = collect(Iterators.product(1:n,1:n));
μ = reshape(distm,n*n);
ν = reshape(distm,n*n);
C = Distances.pairwise(Distances.SqEuclidean(), μ, ν)
u = ones(n^2)/n^2
v = ones(n^2)/n^2
emd2(u,v,C,Tulip.Optimizer())

This is returning the following error, which I haven't been able to figure out yet:
MathOptInterface.UnsupportedAttribute{MathOptInterface.ObjectiveFunction{MathOptInterface.ScalarAffineFunction{Int64}}}: Attribute MathOptInterface.ObjectiveFunction{MathOptInterface.ScalarAffineFunction{Int64}}() is not supported by the model.

Any ideas?
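
One detail that may be relevant (an observation, not a confirmed diagnosis): the error mentions ScalarAffineFunction{Int64}, and the cost matrix built from the integer grid above has Int64 entries, which the LP objective apparently inherits. Converting the cost matrix to floating point might be worth trying:

emd2(u, v, float.(C), Tulip.Optimizer())   # same call, but with a Float64 cost matrix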

New Pull Requests

Hey @zsteve, I just submitted some pull requests to the library. I've implemented the Earth Mover's Distance in Julia, but the function that solves the LP problem efficiently requires the JuMP.jl, Tulip.jl, and Clp.jl packages, which were added as dependencies.

I was wondering how to contribute to the documentation, since it seems to be hosted on your personal page.

Also, the second pull request is just a Jupyter Notebook using the new function and doing some nice visualizations.

Documentation setup incorrect?

It seems the documentation is not deployed correctly since the gh-pages branch does not exist. Maybe you did not add a deploy key and a secret to GitHub?

Error in emd2

Hello,
I wanted to report that the following produces an error for me:

thetas = range(0, step = pi/6, length = 12)
costmat = [1 - cos(th1 - th2) for th1 in thetas, th2 in thetas]
p = [0.0, 0.0, 0.0, 0.0, 0.0, 0.3249465782947873, 0.0, 0.0, 0.3501068434104255, 0.0, 0.0, 0.3249465782947873]
q = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
OptimalTransport.emd2(p, q, costmat, Tulip.Optimizer())

However, it works when using OptimalTransport.POT.emd2 or if I slightly tweak the q vector (reducing 1.0 to .99).
The error message reads: "ERROR: BoundsError: attempt to access 1-element Array{Float64,1} at index [21]"

I am going to use OptimalTransport.POT.emd2, but I thought it's better to report it.

Separate Python even more?

I am implementing a Python interface for one of my packages with pyjulia. The package depends on OptimalTransport but since POT was quite annoying to install (there's a bug in pip and therefore one always has to install cython and numpy first, otherwise installation will fail) I completely switched to OptimalTransport + Tulip, eliminating all Python dependencies. However, pyjulia requires PyCall and installs and loads it automatically, and therefore automatically also the POT submodule in OptimalTransport is loaded even though I don't want to use it. And now Python users either have to install POT which is quite annoying and also problematic for CI (and it's not even used at all!), or they'll get an error message when OptimalTransport tries to load it in https://github.com/zsteve/OptimalTransport.jl/blob/9ad8f733770b7cd446c84ebfc90e2a558d6d120b/src/pot.jl#L8.

Maybe the Python interface to POT could be moved to a completely separate package or one could require users to load, and possibly install, it explicitly by making the __init__ function a proper function that could be called e.g. load_POT!.

Compute Wasserstein distance between a density and a sum of Diracs

I have two distributions in d-dimensional space, between which I want to compute Wasserstein distance. One distribution is a sum of Dirac delta functions (i.e. an empirical distribution), and the other is given by a density (e.g. a Gaussian). Is my best option to compute histograms of both and compute the distance between the histograms? I don't like this approach because the result will depend on the bin width, and bin width choice is a hard problem. Is there a better way?

Here is what I have so far:

using Distributions, LinearAlgebra, StatsBase
σ = MvNormal(I(2))
μ = rand(σ, 1000)
μ_hist = fit(Histogram, (μ[1,:], μ[2,:])) # make histogram of the empirical distribution
μ_mass = reshape(μ_hist.weights ./ 1000, :)
support = (μ_hist.edges[1][1:end-1] .+  Float64(μ_hist.edges[1].step), μ_hist.edges[2][1:end-1] .+  Float64(μ_hist.edges[2].step))
σ_mass = reshape([pdf(σ, [x,y]) for x in support[1], y in support[2]], :)
σ_mass ./= sum(σ_mass) # normalize to 1
sum(μ_mass) ≈ sum(σ_mass) # make sure the OT problem is balanced
C = reshape([sum(abs2, [x1,y1] .- [x2,y2]) for x1 in support[1], y1 in support[2], x2 in support[1], y2 in support[2]], length(μ_mass), length(σ_mass)) # |x-y|²
transport_plan = sinkhorn(μ_mass, σ_mass, C, 0.01)

Questions:

  1. How to obtain the cost of the plan now? (A minimal sketch follows after this list.)
  2. What if I want to do exact regularized OT, how can I do it?
  3. Can I circumvent making histograms and compute W₂(μ,σ) directly?
  4. Do I have to do the reshaping into vectors? Seems annoying but I was getting errors from the statement size(C) == (size(μ, 1), size(ν, 1)) in checksize. I don't quite understand what C should be when μ and ν are not vector-valued.
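
Regarding question 1, a minimal sketch, assuming transport_plan, C, μ_mass, and σ_mass as defined above:

using LinearAlgebra

cost = dot(transport_plan, C)                   # ⟨γ, C⟩, the transport cost of the computed plan
cost_reg = sinkhorn2(μ_mass, σ_mass, C, 0.01)   # or compute the regularised cost in one call
# the exact (unregularised) cost can be obtained from the LP solver instead:
# cost_exact = emd2(μ_mass, σ_mass, C, Tulip.Optimizer())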

Performance and scalability

Thanks for a great package.
I was hoping to use the Sinkhorn algorithm, which I believe scales better than the EMD solver.
I did some quick benchmarks but found that this was not the case.

using Distances, LinearAlgebra, OptimalTransport
using BenchmarkTools, Plots

function make_noisyprob(L, errorlevel)
    prob = ones(L)
    # Mimic Hamiltonian error
    prob += errorlevel * rand(L)
    return prob / norm(prob, 1)
end

function make_fakedata(dim, L, errorlevel)
    p = make_noisyprob(L, errorlevel)
    q = make_noisyprob(L, errorlevel)
    C = pairwise(SqEuclidean(), randn(dim, L), randn(dim, L))
    return p, q, C
end

dim = 10
Ls = 5:5:100
errorlevel = 0.01
ϵ = 0.1

ts_sinkhorn = map(Ls) do L
    p, q, C = make_fakedata(dim, L, errorlevel)
    @belapsed sinkhorn($p, $q, $C, $ϵ)
end

ts_emd = map(Ls) do L
    p, q, C = make_fakedata(dim, L, errorlevel)
    @belapsed emd($p, $q, $C)
end

p = plot()
plot!(p, Ls, ts_sinkhorn; label="Sinkhorn")
plot!(p, Ls, ts_emd;      label="EMD")
xlabel!("#{data points}")
ylabel!("Time (s)")

[benchmark plot: runtime vs. number of data points for sinkhorn and emd]

Did I do anything wrong here?

Separating code in multiple packages

The idea of splitting the code into multiple packages and using OptimalTransport.jl as a "unifying interface" was discussed. Are we going to follow this route? I've started creating the ExactOptimalTransport.jl package with the algorithms for the exact OT problem, but I have some questions about things like documentation. I mean, are we going to have documentation for each package, or just for OptimalTransport.jl? Also, should we set up all the badges, such as Zenodo, for each package, or just leave them in the main package?

Compatibility Issues

I was trying to install version 0.3.9 of our package, and I'm getting compatibility issues with UMAP.jl. Here is the error message:

(@v1.6) pkg> add UMAP
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package LsqFit [2fda8390]:
 LsqFit [2fda8390] log:
 ├─possible versions are: 0.5.0-0.12.0 or uninstalled
 ├─restricted by compatibility requirements with UMAP [c4f8c510] to versions: 0.6.0-0.11.0
 │ └─UMAP [c4f8c510] log:
 │   ├─possible versions are: 0.1.0-0.1.8 or uninstalled
 │   ├─restricted to versions * by an explicit requirement, leaving only versions 0.1.0-0.1.8
 │   └─restricted by compatibility requirements with Distances [b4f34e82] to versions: 0.1.6-0.1.8 or uninstalled, leaving only versions: 0.1.6-0.1.8
 │     └─Distances [b4f34e82] log:
 │       ├─possible versions are: 0.7.0-0.10.3 or uninstalled
 │       └─restricted to versions 0.9-0.10 by OptimalTransport [7e02d93a], leaving only versions 0.9.0-0.10.3
 │         └─OptimalTransport [7e02d93a] log:
 │           ├─possible versions are: 0.3.9 or uninstalled
 │           └─OptimalTransport [7e02d93a] is fixed to version 0.3.9
 ├─restricted by compatibility requirements with StatsBase [2913bbd2] to versions: [0.5.0-0.6.0, 0.11.0-0.12.0] or uninstalled, leaving only versions: [0.6.0, 0.11.0]
 │ └─StatsBase [2913bbd2] log:
 │   ├─possible versions are: 0.24.0-0.33.8 or uninstalled
 │   └─restricted to versions 0.33.8-0.33 by OptimalTransport [7e02d93a], leaving only versions 0.33.8
 │     └─OptimalTransport [7e02d93a] log: see above
 └─restricted by compatibility requirements with Distributions [31c24e10] to versions: uninstalled — no versions left
   └─Distributions [31c24e10] log:
     ├─possible versions are: 0.16.0-0.25.2 or uninstalled
     └─restricted to versions 0.25 by OptimalTransport [7e02d93a], leaving only versions 0.25.0-0.25.2
       └─OptimalTransport [7e02d93a] log: see above

Do any of you guys know a way around this? I'm doing some work where I need to use both UMAP.jl and OptimalTransport.jl.

How to handle the pull requests with many modifications?

So, after I sent the pull requests, many alterations were recommended, and it was even suggested to change the names of the functions. I'm new to open-source contributing. What is the proper way to handle such cases? Should I cancel my pull request and send another one with the changes made?

Deprecating and renaming additional functions

Similar to #45, a lot of functions are currently named either mirroring Python OT's naming choices or just ad hoc as they were originally added. (For the record, I agree with ot_cost and ot_plan for unregularised OT as per #45.)

  • Along those lines, for the sinkhorn routines we should use sinkhorn_cost and sinkhorn_plan, etc.
  • What to rename quadreg to? This solves OT regularised with an L2 cost on the transport plan. Currently it's solved using one particular algorithm, but there are other approaches, so I'd prefer not to refer to any particular algorithm in the function name.
    One possibility is to have a general ot_reg_cost and ot_reg_plan wrapper for generalised regularisations. Then the user can specify the regularisation functional and algorithm upon calling. So something like
    ot_reg_cost(mu, nu, C, 0.05; reg_func = "L2", method = "lorenz")

Adding support to parallelization with CUDA

I'm not very versed in CUDA and code parallelization, but some of the stuff that I'm doing is starting to become too demanding. Hence, enabling parallelization and running on CUDA would be an interesting feature. What do you guys think? Is this a feature we should support in the package?

Dynamic Formulation for OT

I'm starting to work on a solver using the dynamic formulation of OT (the Benamou-Brenier numerical method). I was wondering whether such a method should go in the main OptimalTransport.jl package, or if I should create a new package.

License

I was very surprised when I just noticed that you use the GPL license for this package. Is there any particular reason for this? Julia packages commonly use the MIT license since, in contrast to GPL, it is not viral and does not "infect" derivative work.

Process for adding new repos to the Julia Optimal Transport project

Guys, I'd like to know what the process is for adding new repos to the Julia Optimal Transport project, for example, what the criteria are for a repo to be added.
I have some repos that I'm working on, such as a repo with VegaLite visualizations for Optimal Transport and another on Hierarchical Clustering with OT. In the future, I'll probably do another one with Transfer Learning using OT (I think POT.py has something similar).

Contributing

Hey there, I'm doing my dissertation on Optimal Transport, and will probably be writing some code in Julia. So I'm very interested in contributing. Do you have any guidelines on how to contribute?

Best regards,

Convention about using greek letters/unicode or ASCII variable naming

Currently most of the code uses ASCII variable names (e.g. mu, nu). The benefit is that they can be easier to deal with. However, I was thinking that perhaps it is more idiomatic Julia to use Greek/Unicode names (e.g. μ) for the internals? In any case, we should decide on a convention, since currently both ASCII and Unicode names are used.

Move Python implementation to separate package?

I ran into quite a few problems when I tried to set up CI for a package that depended on OptimalTransport.jl, due to the Python dependency and the requirement to install POT. I think it could be helpful to either make the Python dependency optional or move it to a separate package.

`sinkhorn2` should return loss, not `dot(gamma, C)`

I noticed a problem with the current implementation of sinkhorn2: currently it is only returning the transport cost
dot(gamma, C), not the true Sinkhorn loss which includes the entropy penalty, i.e. dot(gamma, C) + eps*dot(gamma, log.(gamma)). It seems that PythonOT does the same thing (hence our unit tests work)?

What's strange is that consulting the PythonOT docs, they say:

Solve the entropic regularization optimal transport problem and return the loss.

@devmotion I think your docs in PythonOT.jl say "return the OT cost", which I interpret as being only the dot(gamma, C) component.

I think we should include the entropy penalty, since it's really part of what gives entropic OT its nice properties. For instance, the reason I caught this issue today was that I was trying to solve a variational problem using dot(gamma, C) as the loss and differentiating through sinkhorn2, but not having the entropy term makes the optimisation fail (I think due to lack of smoothness).
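
To make the distinction concrete, a small sketch of the two quantities (assuming gamma is the plan returned by sinkhorn for marginals mu and nu, cost C, and regularisation eps; the entries of gamma are strictly positive, so the log is well defined):

using LinearAlgebra

gamma = sinkhorn(mu, nu, C, eps)
transport_cost = dot(gamma, C)                                   # what sinkhorn2 currently returns
sinkhorn_loss  = dot(gamma, C) + eps * dot(gamma, log.(gamma))   # cost plus the entropy penalty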

Who is Tim Matsumoto... and we should probably update the README

This issue is to deal with the README. I see that our documentation is already far ahead of the README, so we should probably update it. I've been thinking about doing it, but I was wondering what you guys think it should contain (usage examples with figures? a list of functionalities? a bit of Optimal Transport theory?).

P.S: Who is Tim Matsumoto?

Should I extend the `ot_cost` and `ot_plan` functions to work with FiniteDiscreteMeasure?

We've exported the discretemeasure function, which creates either a DiscreteNonParametric distribution from Distributions.jl or a FiniteDiscreteMeasure, which we've implemented. As of now, if we do something like

mu = discretemeasure(rand(5))
nu = discretemeasure(rand(10))
wasserstein(mu,nu)

It'll work, but only for the 1D case. I'd like to extend this to something like:

mu = discretemeasure(RowVecs(rand(5,2)))
nu = discretemeasure(RowVecs(rand(10,2)))
wasserstein(mu,nu,  Tulip.Optimizer())

Here, the cost matrix would be internally constructed.
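
Roughly, the "internally constructed" step could look something like the following sketch (not the actual implementation; support and probs are assumed to be the usual Distributions.jl accessors for the support points and weights):

using Distances, OptimalTransport, Tulip

C = [sqeuclidean(x, y) for x in support(mu), y in support(nu)]   # cost matrix between support points
emd2(probs(mu), probs(nu), C, Tulip.Optimizer())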

It would be nice to have similar functionality for the other functions in the package, where the user could pass
a cost function and the measures:

mu = discretemeasure(RowVecs(rand(5,2)))
nu = discretemeasure(RowVecs(rand(10,2)))
sinkhorn(sqeuclidean, μ, ν, 0.1);

What do you guys think?

Should we create a separate Examples Repository?

Hey guys, I just saw the new Variational Problem example. Loved it! I was studying Wasserstein Gradient Flows these days, but more on the theoretical side, and the implemented example is quite neat! I was wondering what you think of creating a separate repository with notebooks (Pluto, maybe?), or do you think this would be redundant? Sorry that I've been inactive these months; I'm quite busy with the PhD right now, and hopefully will be able to do more stuff by the end of the year. I'd really like to update our examples using Makie for plots.

With Pluto I've created some very neat interactive Optimal Transport examples that I presented at school (e.g. a Wasserstein solution flowing).

Extend compat and bump version

Hello,

I noticed there are a lot of pull requests for CompatHelper. Would it be possible to merge those pull requests and register a new version?

Thanks!

Creating 1D Versions - Questions about Dependencies

Hey guys, I'm writing the 1D versions of the algorithms. Although our functions already work for the 1D case, I think we should implement a more efficient version, since the solution for the 1D case might be used in a future SlicedWasserstein implementation (note that POT.py also has a 1D version).

My implementations use functions such as quantile from Distributions.jl and quadgk from QuadGK.jl, so I'd like your thoughts on making these two packages dependencies. What do you think?
This would allow us to use these two very neat functions for the 1D case:

using Distributions
using QuadGK

# transport cost between two 1D distributions via the quantile coupling
function otc1d(μ::Distributions.UnivariateDistribution, ν::Distributions.UnivariateDistribution, c)
    g(μ, ν, x) = c(quantile(μ, x), quantile(ν, x))
    f(x) = g(μ, ν, x)
    return quadgk(f, 0, 1)[1]
end

# optimal (Monge) map between two 1D distributions
function otp1d(μ::Distributions.UnivariateDistribution, ν::Distributions.UnivariateDistribution, c)
    Tmap(x) = Distributions.quantile(ν, Distributions.cdf(μ, x))
    return Tmap
end

Note that this allows us to calculate the distance for any 1D parametric distribution.
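
For example, a quick (hypothetical) check against the closed-form W₂² between two Gaussians, (μ₁ - μ₂)² + (σ₁ - σ₂)²:

c(x, y) = abs2(x - y)
otc1d(Normal(0, 1), Normal(1, 2), c)   # ≈ 2.0 = (1 - 0)² + (2 - 1)²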

I've done some quick benchmarking, and the code seems quite efficient as is. Also, I think we should change the names of the functions emd and emd2 to something more accurate, because they are not necessarily the Earth Mover's Distance (but we can discuss this on another occasion).

Also, if we make Distributions.jl a dependency, we can multithread the emd function to accept ::Distributions.UnivariateDistribution.DiscreteNonParametric.

Problem PreCompiling with Distributions

I was updating my 1D Optimal Transport code, and before submitting, I pulled our updated master branch. After that, for some reason, when I try using the Distributions.jl types, such as mu::UnivariateDistribution, I get an error saying:

ERROR: LoadError: UndefVarError: ContinuousUnivariateDistribution not defined

Here is how you can replicate it. Activate the environment in the REPL and add Distributions.jl as a dependency. After that, modify the OptimalTransport.jl file by adding any function using a Distributions.jl type, for example:

function optimal_transport_plan(
    c, μ::ContinuousUnivariateDistribution, ν::UnivariateDistribution
)
    # Use T instead of γ to indicate that this is a Monge map.
    T(x) = Distributions.quantile(ν, Distributions.cdf(μ, x))
    return T
end

And then you will get the error I just mentioned. Note that this was not happening in my first PR for the 1D case; it only happens now with the new master. I'm speculating it has something to do with the new dependencies, but I couldn't figure it out.
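
A guess (not verified): the error suggests the Distributions names are simply not in scope inside the OptimalTransport module, so the new method's signature cannot be resolved when the package is loaded. An explicit import at the top of src/OptimalTransport.jl might be enough:

using Distributions   # hypothetical fix: bring ContinuousUnivariateDistribution etc. into the module's namespace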
