sciml / datainterpolations.jl
A library of data interpolation and smoothing functions
License: Other
See DifferentialEquations.jl
For the case that the step size in the abscissa is constant, the following zero_hold interpolant is some 150 times faster than ZeroSpline on a test case. However, I'm no super coder, so my code uses some 7 times more memory on my test case.
function zero_hold(x, dx, y)
    if x < 0
        return y[1]
    elseif x >= length(y)*dx
        return y[end]
    else
        return y[trunc(Int, x/dx) + 1]
    end
end
The added speed is important (the test case itself is not; I used a simple sine function). But because of the added memory use, in practice my code is slower when used in an ODE solver (due to garbage collection).
Would it be possible to extend the ZeroSpline interpolant to allow for a constant abscissa step? With your coding expertise, can you make it memory efficient, too? That would be super!
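A minimal sketch of how a constant-step zero-order hold could be packaged so that it stays allocation-free. The type name UniformZeroHold is hypothetical and not part of DataInterpolations.jl. One possible cause of the extra memory use (an assumption, since the calling code is not shown) is that y and dx are read from non-constant globals; storing them in a struct with concrete field types avoids that.

```julia
# Hypothetical type for illustration; it is not part of DataInterpolations.jl.
struct UniformZeroHold{V<:AbstractVector,S<:Real}
    y::V    # samples taken at 0, dx, 2dx, ...
    dx::S   # constant abscissa step
end

# Zero-order hold: return the sample at or just before x, clamped at both ends.
function (h::UniformZeroHold)(x::Real)
    n = length(h.y)
    x < 0 && return h.y[1]
    x >= n * h.dx && return h.y[n]
    i = floor(Int, x / h.dx) + 1
    return h.y[min(i, n)]   # min guards against rounding in x / dx
end

hold = UniformZeroHold([1.0, 2.0, 3.0], 0.5)
hold(0.7)   # picks the sample at index 2
```

Because the fields are concretely typed, calling hold(x) inside an ODE right-hand side should not allocate, although that would need to be confirmed with a benchmark on the real problem.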
Hi,
while implementing a solution to a radiative transfer problem, I encountered a problem when trying to use a CubicSpline with SIMD instructions.
Basically I have an integration routine using the @turbo macro from LoopVectorization.jl, where the integrand contains an interpolator from DataInterpolations.jl.
Here is a small example reproducing the problem:
using DataInterpolations
using LoopVectorization

u = [14.7, 11.51, 10.41, 14.95, 12.24, 11.22]
t = [0.0, 62.25, 109.66, 162.66, 205.8, 252.3]
A = CubicSpline(u, t)

function simd_example(A)
    f = x -> A(x)
    x_arr = collect(range(10.5, 14.0, 100))
    y_arr = zero(x_arr)
    @turbo for i in eachindex(x_arr)
        # evaluate at the i-th sample point (x alone is not defined here)
        y_arr[i] = A(x_arr[i])
    end
    return y_arr
end

simd_example(A)
Here is the output:
ERROR: TypeError: non-boolean (Mask{4, UInt8}) used in boolean context
Stacktrace:
[1] signless(x::MM{4, 1, Int64}, y::Float64)
@ Base ./operators.jl:143
[2] isless(x::MM{4, 1, Int64}, y::Float64)
@ Base ./operators.jl:185
[3] searchsortedlast
@ ~/Schreibtisch/Test_simd.jl:19 [inlined]
[4] searchsortedlast
@ ./sort.jl:295 [inlined]
[5] #searchsortedlast#5
@ ./sort.jl:297 [inlined]
[6] searchsortedlast
@ ./sort.jl:297 [inlined]
[7] _interpolate(A::CubicSpline{Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, true, Float64}, t::MM{4, 1, Int64})
@ DataInterpolations ~/.julia/packages/DataInterpolations/Al4Ib/src/interpolation_methods.jl:146
[8] AbstractInterpolation
@ ~/.julia/packages/DataInterpolations/Al4Ib/src/DataInterpolations.jl:37 [inlined]
[9] macro expansion
@ ~/.julia/packages/LoopVectorization/ORJxS/src/reconstruct_loopset.jl:947 [inlined]
[10] _turbo_!
@ ~/.julia/packages/LoopVectorization/ORJxS/src/reconstruct_loopset.jl:947 [inlined]
The vectorization fails in searchsortedlast(...), when calling isless with vectorized arguments.
I tried to define an adapted version of searchsortedlast, but to no avail so far.
Would be great if you could share your opinion on how to approach or circumvent this problem.
Judging from my benchmarks with non-interpolated functions I would get a massive 3x speedup from using LoopVectorization.jl.
#84 should have failed, as DoesNotExist is no longer defined by ChainRulesCore (renamed to NoTangent; fixed in #85), but there are no actual tests of that directly.
Which makes JuliaDiff/ChainRulesCore.jl#292 kinda pointless.
It should be relatively straightforward to set them up using ChainRulesTestUtils.
The URL of this package does not match that stored in METADATA.jl.
cc: @julia-tagbot[bot]
This package should highlight what its benefits are over the existing https://github.com/JuliaMath/Interpolations.jl
Not only in terms of performance but also in terms of features, compatibility, etc., because as of right now there is no guideline on why one should use this package over the existing Interpolations.jl.
I am not sure if github issues is the place for this. I am opening this following this post that suggested "From now on we should treat compile time issues and compile time regressions just like we treat performance issues".
On my machine (Macbook 16 inch 2019, MacOS 13.0, Julia 1.8.2, DataInterpolations v3.10.1) I get:
julia> @time using DataInterpolations
8.627113 seconds (14.34 M allocations: 898.668 MiB, 4.95% gc time, 26.15% compilation time: 69% of which was recompilation)
The interpolations need the ability to not just interpolate but also estimate time-varying functions and data. For this, we want the interpolation object to act as a continuous function x(t) while having x[i] be the mutable pieces of data that it's interpolating from.
I deal with timeseries a lot, so for me interpolation involves constructing the interpolant once with thousands of data points, then interpolating to thousands of (monotonic) time steps, and repeating that interpolation many times. In particular, for me, the LU decomposition in constructing a cubic spline takes a small fraction of the total time. (But interpolation is by far the largest bottleneck for my larger code.)
At least on my laptop, the step that takes the longest is the call to searchsortedlast. The problem is that each time this is called, it starts with no information: the required index might be anywhere in the input data. However, if successive interpolations are for correlated t values, it can vastly increase the speed to assume correlated indices. I guess this improves the cache locality a lot, too. So much so that, at least for my use cases, the code speeds up by a factor of 3 or 4.
FWIW, here's the implementation I threw together with inspiration from Numerical Recipes's hunt function and the built-in searchsortedlast.
@inbounds function searchsortedlastcorrelated(v::AbstractVector, x, guess::T)::keytype(v) where T<:Integer
    u = T(1)
    bottom = firstindex(v)
    top = lastindex(v)
    if guess < bottom || guess > top
        # For a bad guess, just jump to standard bisection
        lo = bottom
        hi = top
    else
        increment = u
        lo = guess
        hi = guess+u
        if x ≥ v[lo] # Hunt upwards
            # lo stays at guess, since v[lo] ≤ x is already known;
            # advancing it here would skip the interval [guess, guess+1)
            while true
                hi = lo + increment
                if hi ≥ top
                    hi = top
                    break
                elseif x < v[hi]
                    break
                else
                    lo = hi
                    increment += increment
                end
            end
        else # Hunt downwards
            hi = lo
            while true
                lo -= increment
                if lo ≤ bottom
                    lo = bottom
                    break
                elseif x ≥ v[lo]
                    break
                else
                    hi = lo
                    increment += increment
                end
            end
        end
    end
    # Begin standard bisection, à la `searchsortedlast`
    while lo < hi - u
        m = Base.midpoint(lo, hi)
        if x < v[m]
            hi = m
        else
            lo = m
        end
    end
    return lo
end
I just loop through the vector of times to which I have to interpolate, keeping track of the index output by this routine, and feeding it back in with each call to interpolation. With this change, the search drops to taking a fraction of the time that it takes just to do the arithmetic of actually evaluating the spline. (My next goal is to improve this to be able to use SIMD, along the lines of what happened in #123.)
It might be nice to add a method like (interp::AbstractInterpolation)(t::Number, i::Integer), or even (interp::AbstractInterpolation)(t::AbstractVector{<:Number}), that would take advantage of this approach.
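The loop described above, in which the previous index is fed back as the next guess, can be sketched with a simpler stand-in for the search. The names walk_search and linear_interp_sorted are hypothetical, and the search here is a plain linear walk rather than the hunt-and-bisect routine, so it is only a good fit when consecutive queries land in the same or neighbouring intervals.

```julia
# Hypothetical helper: index i with v[i] <= x < v[i+1], found by walking from `guess`.
# Unlike searchsortedlast, it never returns an index below firstindex(v).
function walk_search(v::AbstractVector, x, guess::Integer)
    i = clamp(guess, firstindex(v), lastindex(v))
    while i < lastindex(v) && v[i + 1] <= x
        i += 1
    end
    while i > firstindex(v) && v[i] > x
        i -= 1
    end
    return i
end

# Piecewise-linear evaluation at many query points, reusing the last index as the guess.
function linear_interp_sorted(t, u, ts)
    out = similar(ts, Float64)
    idx = firstindex(t)
    for (k, x) in pairs(ts)
        idx = min(walk_search(t, x, idx), lastindex(t) - 1)
        w = (x - t[idx]) / (t[idx + 1] - t[idx])
        out[k] = (1 - w) * u[idx] + w * u[idx + 1]
    end
    return out
end

t = [0.0, 1.0, 2.0, 4.0]
u = 10 .* t
vals = linear_interp_sorted(t, u, [0.5, 1.5, 3.0, 4.0])
```

For monotonic query times the total search cost is proportional to the number of knots plus the number of queries, instead of one bisection per query.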
The coverage report reveals the LOESS based interpolation doesn't have test coverage. Ref #36
As used in https://github.com/UMCTM/DataInterpolations.jl/blob/c6cd37975eb4473ef32d358396b1176365ef8e40/src/interpolation_utils.jl#L60 and https://github.com/UMCTM/DataInterpolations.jl/blob/c6cd37975eb4473ef32d358396b1176365ef8e40/src/interpolation_utils.jl#L66. It prevents the compiler from inferring the types of vectors.
Will this library support extrapolations at some point? I generally want to buy in to SciML and related packages, but need extrapolating capabilities.
The Readme links to
which includes examples for Loess and GaussianProcess interpolations, but it seems those features have been removed from the package.
Also, may I ask why they have been removed? I am interested in contributing data smoothing/interpolation by Tikhonov regularization (aka ridge regression) and thought this might be a good place to include it. But that seems less likely if Loess and GP are out. Have they moved to another package, or are they simply done using the parent packages? I personally think it would be nice to collect the different interpolating and smoothing methods together with a common interface, and thought this might be a good place.
We need a way to test that the order of accuracy is correct.
Can just convert to a DiffEqArray with a high plot density.
Right now this library just errors if the time is before or beyond A.t. We should extend this to have a flag for allowing extrapolation (default true?), and then add the extrapolation for each method.
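A minimal sketch of what such a flag could look like for the linear case. The function linear_eval and its extrapolate keyword are hypothetical and are not the package's API; the default is set to false here only for illustration.

```julia
# Hypothetical standalone function; u holds the data values, t the sorted times.
function linear_eval(u, t, x; extrapolate::Bool = false)
    if !extrapolate && !(first(t) <= x <= last(t))
        throw(DomainError(x, "outside the data range and extrapolation is disabled"))
    end
    # Clamp to the first or last interval so that points outside the range
    # reuse the slope of the nearest segment.
    i = clamp(searchsortedlast(t, x), firstindex(t), lastindex(t) - 1)
    slope = (u[i + 1] - u[i]) / (t[i + 1] - t[i])
    return u[i] + slope * (x - t[i])
end

u = [0.0, 2.0, 4.0]
t = [0.0, 1.0, 2.0]
linear_eval(u, t, 3.0; extrapolate = true)   # continues the last segment
```

Higher-order methods would need their own rule (for example, continuing the end polynomial), which is why the flag would have to be implemented per method.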
Not every algorithm has a test. Every interpolation needs some sort of test.
using DataInterpolations
kVs = [80, 100, 120] # independent variable
HUs = [177, 145, 130] # dependent variable
itp = QuadraticInterpolation(kVs, HUs)
itp(kVs[1]) == HUs[1]
The code above returns false when it should return true. Not sure what is wrong, but also attached a screenshot after plotting this example
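One thing worth checking, judging only from the other snippets on this page (which call constructors as CubicSpline(u, t) and LinearInterpolation(u, t)), is the argument order: the dependent values appear to come first. The sketch below does not use the package; quadratic_lagrange is a hypothetical helper that evaluates the quadratic through three points, to show that with the data in the (u, t) roles the interpolant reproduces HUs[1] at kVs[1].

```julia
# Hypothetical helper: quadratic Lagrange polynomial through (t[k], u[k]), k = 1:3.
function quadratic_lagrange(u, t, x)
    l1 = (x - t[2]) * (x - t[3]) / ((t[1] - t[2]) * (t[1] - t[3]))
    l2 = (x - t[1]) * (x - t[3]) / ((t[2] - t[1]) * (t[2] - t[3]))
    l3 = (x - t[1]) * (x - t[2]) / ((t[3] - t[1]) * (t[3] - t[2]))
    return u[1] * l1 + u[2] * l2 + u[3] * l3
end

kVs = [80.0, 100.0, 120.0]   # independent variable
HUs = [177.0, 145.0, 130.0]  # dependent variable

quadratic_lagrange(HUs, kVs, kVs[1]) ≈ HUs[1]
```

If the order is indeed the cause, QuadraticInterpolation(HUs, kVs) would be the call to try; this is an assumption and has not been run against the package.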
I would find it helpful for there to be docstrings for types and/or constructors.
Integral is defined for any AbstractInterpolation but then calls samples on that interpolation, which only has methods for ::LinearInterpolation{<:AbstractVector} types.
It would be useful to be able to get the integral of multi-dimensional interpolations. The derivative returns a vector with one element for each data dimension. Integral could do the same, though this may not be a useful output for every use case.
I'd like to be able to use this to calculate means and standard deviations for each signal in an interpolation.
Example code:
using DataInterpolations
u = rand(3,20)
t = sort(rand(20))
X1 = LinearInterpolation(u[1,:],t)
DataInterpolations.integral(X1, 0, 0.5) # Works
X = LinearInterpolation(u,t)
# MethodError: no method matching samples(::LinearInterpolation{Matrix{Float64}, Vector{Float64}, true, Float64})
DataInterpolations.integral(X, 0, 0.5)
Thanks,
Aaron.
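As a rough illustration of the requested output shape, the sketch below integrates each row of a matrix over the full time range with the trapezoid rule, which is exact for a piecewise-linear interpolant. The function trapz_rows is hypothetical, covers only the whole data range rather than arbitrary limits, and is not the package's integral.

```julia
# Hypothetical helper: one integral per row of u, over [t[1], t[end]].
function trapz_rows(u::AbstractMatrix, t::AbstractVector)
    dt = diff(t)
    return [sum(dt .* (u[r, 1:end-1] .+ u[r, 2:end]) ./ 2) for r in axes(u, 1)]
end

u = [1.0 1.0 1.0;
     0.0 1.0 2.0]
t = [0.0, 1.0, 2.0]
areas = trapz_rows(u, t)
means = areas ./ (t[end] - t[1])   # time-averaged value of each signal
```

Dividing by the interval length gives the per-signal mean mentioned above; a standard deviation could be built the same way from the integral of the squared deviation.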
If function (A::LinearInterpolation{<:AbstractVector{<:Number}})(t::Number) were generalized beyond <:Number to perhaps <:AbstractVector, then data of type Vector{SVector} could be interpolated, and function (A::LinearInterpolation{<:AbstractMatrix{<:Number}})(t::Number) could be written in terms of a reinterpret/reshape and the previous definition. The package would have to come to depend on StaticArrays.jl, as far as I can tell, however. And so on for the interpolation methods that only require vector space operations on the data.
I think it might be a good idea to test all the interpolators in the package using the function defined in this paper.
Feedback is welcome.
The README:
(A::LinearInterpolation{<:AbstractVector{<:Number}})(out,t::AbstractVector)
(A::LinearInterpolation{<:AbstractMatrix{<:Number}})(out,t::Number)
(A::LinearInterpolation{<:AbstractMatrix{<:Number}})(out,t::AbstractVector)
I know how to use DataInterpolations to get the derivative of an interpolation, like below, but is there any way to get the second derivative?
using DataInterpolations
using DataInterpolations: derivative
N = 10
x = collect(range(0.0, 1.0, length=N))
y = x.^2
yi = CubicSpline(y,x)
println(yi(0.5)) # 0.5^2 = 0.25
println(derivative(yi, 0.5)) # 2*0.5 = 1.00
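Absent a built-in second derivative, one workaround is to difference the first derivative numerically. This is only a sketch: second_derivative_fd is a hypothetical helper, the step h is an arbitrary choice, and the stand-in df below replaces x -> derivative(yi, x) so that the example runs without the package.

```julia
# Hypothetical helper: central difference of a first-derivative function df.
second_derivative_fd(df, x; h = 1e-4) = (df(x + h) - df(x - h)) / (2h)

# Stand-in for x -> derivative(yi, x); for y = x^2 the first derivative is 2x.
df(x) = 2x

d2 = second_derivative_fd(df, 0.5)   # close to 2 for this stand-in
```

For a cubic spline the first derivative is piecewise quadratic, so the difference is accurate inside an interval but smears across knots when x is within h of one.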
Using a symbolic interpolation variable, interpolator(t::Num), recently broke:
ERROR: TypeError: non-boolean (Num) used in boolean context
Stacktrace:
[1] searchsortedlast(a::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, x::Num, o::Base.Order.ForwardOrdering)
@ Base.Sort ./sort.jl:231
[2] #searchsortedlast#5
@ ./sort.jl:293 [inlined]
[3] searchsortedlast
@ ./sort.jl:293 [inlined]
[4] _interpolate(A::LinearInterpolation{Vector{Float64}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, true, Float64}, t::Num)
@ DataInterpolations ~/.julia/packages/DataInterpolations/1u3Xy/src/interpolation_methods.jl:3
[5] (::LinearInterpolation{Vector{Float64}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, true, Float64})(t::Num)
@ DataInterpolations ~/.julia/packages/DataInterpolations/1u3Xy/src/DataInterpolations.jl:32
[6] (::ControlSystemsMTK.var"#74#85"{Num, Matrix{Any}})(i::Int64)
@ ControlSystemsMTK ./array.jl:0
[7] iterate
@ ./generator.jl:47 [inlined]
[8] collect_to!(dest::Vector{Equation}, itr::Base.Generator{Base.OneTo{Int64}, ControlSystemsMTK.var"#74#85"{Num, Matrix{Any}}}, offs::Int64, st::Int64)
@ Base ./array.jl:845
[9] collect_to_with_first!(dest::Vector{Equation}, v1::Equation, itr::Base.Generator{Base.OneTo{Int64}, ControlSystemsMTK.var"#74#85"{Num, Matrix{Any}}}, st::Int64)
@ Base ./array.jl:823
[10] collect(itr::Base.Generator{Base.OneTo{Int64}, ControlSystemsMTK.var"#74#85"{Num, Matrix{Any}}})
@ Base ./array.jl:797
[11] GainScheduledStateSpace(systems::Vector{StateSpace{Continuous, Float64}}, vt::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}; interpolator::Type, x_start::Vector{Float64}, name::Symbol, u0::Vector{Float64}, y0::Vector{Float64})
@ ControlSystemsMTK ~/.julia/dev/ControlSystemsMTK/src/ode_system.jl:480
[12] top-level scope
@ REPL[20]:1
Could this be due to Symbolics being moved to a conditional dependency but the glue code fails to load for julia v1.8?
It satisfies y(t)=y(t_{i}) at every data point, and this is something that the traditional interpolating polynomials all do, like:
and that also serves as a good list of interpolants to implement. In addition, there are other methods which do not satisfy y(t)=y(t_{i}) and instead smooth the data. This can be useful on noisy data. Methods include:
Those are the fancy ways to do it correctly, but the older packages have some methods which are lower parameterizations using simple functions. These can be better when you have fewer data points. The functions include:
In these cases, a simple curve fitting / parameter estimation is done to find the best parameters for the curve through the data, and that is taken as the form of the interpolation.
t = Float64[0, 1, 2]
y = [i*ones(2) for i in 0:2]
interp = LinearInterpolation(y, t)
interp(1.0) # errors
interp(1.001) # works
The error that I see is:
ERROR: MethodError: no method matching isnan(::Vector{Float64})
It seems like the argument to interp may not be equal to any entry in t.
Under Quadratic and Cubic Spline in this file, there are two dead links to https://www.math.uh.edu/~jingqiu/math4364/spline.pdf.
Using the following code
using DataInterpolations, Interpolations, BenchmarkTools
N = 400
u = rand(N)
t = range(0.0, 1.0, length = N)
x = sort(rand(100)) # sorting required for Interpolations
println("cubic spline with DataInterpolations")
interp = DataInterpolations.CubicSpline(u,t)
@btime $interp.($x)
@btime $interp(0.5)
println("cubic spline with Interpolations")
interp2 = Interpolations.CubicSplineInterpolation(t, u)
@btime $interp2.($x)
@btime $interp2(0.5)
println("linear with DataInterpolations")
interplin = DataInterpolations.LinearInterpolation(u,t)
@btime $interplin.($x)
@btime $interplin(0.5)
println("linear with Interpolations")
interplin2 = Interpolations.LinearInterpolation(t, u)
@btime $interplin2.($x)
@btime $interplin2(0.5)
I get the output
cubic spline with DataInterpolations
10.900 μs (1 allocation: 896 bytes)
113.312 ns (0 allocations: 0 bytes)
cubic spline with Interpolations
2.167 μs (1 allocation: 896 bytes)
31.770 ns (0 allocations: 0 bytes)
linear with DataInterpolations
10.499 μs (1 allocation: 896 bytes)
108.972 ns (0 allocations: 0 bytes)
linear with Interpolations
3.290 μs (1 allocation: 896 bytes)
26.851 ns (0 allocations: 0 bytes)
Lots of room for mistakes in the crude benchmarking and eliminated code, but worth taking a look at. For versions
I recently ran a script mixing DataInterpolations and Zygote but I'm not able to precompile DataInterpolations anymore. Here is the error message
Warning: Package DataInterpolations does not have Zygote in its dependencies:
│ - If you have DataInterpolations checked out for development and have
│ added Zygote as a dependency but haven't updated your primary
│ environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with DataInterpolations
└ Loading Zygote into DataInterpolations from project dependency, future warnings for DataInterpolations are suppressed.
I'm comparing Loess with the Loess.jl package and results differ a bit. Steps to reproduce:
import DataInterpolations, Loess
t = sort(10 .* rand(100));
u = sin.(t) .+ 0.5 * randn(100);
l1 = DataInterpolations.Loess(u, t, 2, 0.75);
l2 = Loess.loess(t, u, degree=2, span=0.75);
using Plots
scatter(t, u, label="")
plot!(t, l1.(t), label="DataInterpolations.Loess")
plot!(t, Loess.predict(l2, t), label="Loess.loess")
Concerning performance, it's a bit puzzling as fitting here is much faster than Loess.jl and prediction is much slower:
julia> using BenchmarkTools
julia> @btime DataInterpolations.Loess($u, $t, 2, 0.75);
1.706 μs (4 allocations: 4.31 KiB)
julia> @btime Loess.loess($t, $u, degree=2, span=0.75);
4.058 ms (77694 allocations: 3.82 MiB)
But performance on the prediction degrades when the data is a bit bigger:
julia> t = sort(10 .* rand(1000));
julia> u = sin.(t) .+ 0.5 * randn(1000);
julia> l1 = DataInterpolations.Loess(u, t, 2, 0.75);
julia> l2 = Loess.loess(t, u, degree=2, span=0.75);
julia> @btime $l1.($t);
43.769 ms (17003 allocations: 94.37 MiB)
julia> @btime Loess.predict($l2, $t);
4.545 ms (48492 allocations: 1.48 MiB)
It looks like you are using the textbook Lagrange interpolation formula. Trefethen has some articles on why this is numerically unstable and you should use the alternative barycentric formula instead. (This evaluates exactly the same interpolating polynomial in a different way.)
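For reference, the barycentric form mentioned above can be sketched as follows. The function names are hypothetical and the weights are computed with the straightforward O(n^2) product; this illustrates the formula and is not the package's implementation.

```julia
# Barycentric weights w[j] = 1 / prod_{k != j} (t[j] - t[k]).
function barycentric_weights(t::AbstractVector)
    n = length(t)
    w = ones(float(eltype(t)), n)
    for j in 1:n
        for k in 1:n
            if k != j
                w[j] /= (t[j] - t[k])
            end
        end
    end
    return w
end

# Evaluate the interpolating polynomial through (t[j], u[j]) at x.
function barycentric_eval(u, t, w, x)
    num = 0.0
    den = 0.0
    for j in eachindex(t)
        d = x - t[j]
        d == 0 && return u[j]   # x coincides with a node: return the data value
        c = w[j] / d
        num += c * u[j]
        den += c
    end
    return num / den
end

t = [0.0, 1.0, 2.0]
u = t .^ 2
w = barycentric_weights(t)
barycentric_eval(u, t, w, 1.5)   # interpolates x^2, so close to 2.25
```

The weights depend only on t, so they can be computed once when the interpolant is constructed and reused for every evaluation.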
@JuliaRegistrator register()
If there are NaN values in the u vector, then NaNs bleed into the interpolation values more than they should:
julia> t = 1:5
1:5
julia> u = [-20, -10, NaN, 10, 20]
5-element Vector{Float64}:
-20.0
-10.0
NaN
10.0
20.0
julia> [t u]
5×2 Matrix{Float64}:
1.0 -20.0
2.0 -10.0
3.0 NaN
4.0 10.0
5.0 20.0
julia> map(t->t => li(t), 1:0.5:5)
9-element Vector{Pair{Float64, Float64}}:
1.0 => -20.0
1.5 => -15.0
2.0 => NaN
2.5 => NaN
3.0 => NaN
3.5 => NaN
4.0 => 10.0
4.5 => 15.0
5.0 => 20.0
Above, li(2.) should equal -10.0.
(The code is here: https://github.com/PumasAI/DataInterpolations.jl/blob/master/src/interpolation_methods.jl#L2-L6)
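A plausible explanation (an assumption, since the linked lines are not reproduced here) is that the value is formed as a weighted sum of both neighbours, so a weight of zero still multiplies a NaN and yields NaN. The sketch below returns the data value directly when x sits on a knot; linear_eval_nan_safe is a hypothetical name.

```julia
# Hypothetical standalone linear interpolation that does not let a NaN
# neighbour contaminate the value at an exact knot.
function linear_eval_nan_safe(u, t, x)
    i = clamp(searchsortedlast(t, x), firstindex(t), lastindex(t) - 1)
    x == t[i] && return u[i]
    x == t[i + 1] && return u[i + 1]
    θ = (x - t[i]) / (t[i + 1] - t[i])
    return (1 - θ) * u[i] + θ * u[i + 1]
end

t = 1:5
u = [-20, -10, NaN, 10, 20]
linear_eval_nan_safe(u, t, 2.0)   # data value at the knot, not NaN
```

Points strictly between a finite value and a NaN, such as 2.5, still evaluate to NaN, which matches the expectation stated above.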
I'm attempting to use DataInterpolations with a ModelingToolkitStandardLibrary.Blocks.TimeVaryingFunction in ModelingToolkit. When I run structural_simplify it seems the derivative is needed. I supply this as
f = LinearInterpolation(x, time)
Symbolics.derivative(::typeof(f), args::NTuple{1, Any}, ::Val{1}) = DataInterpolations.derivative(f, args[1])
@named system = System(false, f)
sys = structural_simplify(system)
However this gives: ERROR: MethodError: no method matching derivative(::LinearInterpolation{Vector{Float64}, Vector{Float64}, true, Float64}, ::SymbolicUtils.BasicSymbolic{Real}, ::Int64)
I also get this error if I simply run
julia> f(Symbolics.unwrap(t))
ERROR: MethodError: no method matching (::LinearInterpolation{Vector{Float64}, Vector{Float64}, true, Float64})(::SymbolicUtils.BasicSymbolic{Real})
Seems DataInterpolations is not supporting some latest Symbolics features?
julia> akima = DataInterpolations.AkimaInterpolation(sin.(0:10), 0:10);
julia> DataInterpolations.derivative(akima, 10.0)
0.0
julia> DataInterpolations.derivative(akima, prevfloat(10.0))
-1.1455895135058767
@JuliaRegistrator register()
It would be helpful to have methods for interpolating an n-dimensional array along a given axis as was discussed in #67 (comment).
This is currently possible just by "vectorizing" data along that dimension. Example:
data_3d = collect(reshape(1:125,5,5,5))
time = 1:5
vectorized_data = collect.(eachslice(data_3d,dims=3))
spline = CubicSpline(vectorized_data, time)
Is there an interest in having these conveniences or should that be left to the user?
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
I recently tried to use Rasters.jl rasters (basically AbstractArrays) with DataInterpolations, only for it to fail.
This turned out to be because it essentially called typeof(input_array)(multiplied_array), which wasn't defined for Rasters in such exactness (Raster(::Raster) was defined, but not with all the type parameters written out).
I defined it and it worked perfectly, but I wanted to understand why this method is called in the first place.
Wouldn't multiplication give you the correct type anyway, or is this for e.g. Float64-Float32 conversion concerns?
If so, since AbstractArrays are typed as AbstractArray{T, N}, would it be possible to just run T.(multiplied_array)?
See #82 and SciML/DiffEqFlux.jl#380 (comment)
Essentially the issue is that in some cases differentiability w.r.t. the coefficients of the spline is needed for function fitting, like in SplineLayer of DiffEqFlux. Currently we just have DoesNotExist(), which gives zero gradients in Zygote. While it's preferred to use the overload functions because they ensure numerical stability, on some interpolations differentiating through the code directly is okay, and the ones that are okay are just the most basic ones. So for now those are excluded from the derivative overload by #82 until analytical solutions for the coefficients are added, at which point the overload can be added right back. That fixes the DiffEqFlux tests while retaining the more numerically stable overloads on B-splines and such. It isn't the best solution, because if you use the wrong interpolation you'll get zeroed gradients on the coefficients, but it's the best for now.
Currently, the show method for the interpolants concatenates the vectors (i.e. shows vcat(u, t)):
julia> u = rand(5);
julia> t = 0:4;
julia> LinearInterpolation(u, t)
10-element LinearInterpolation{Vector{Float64}, UnitRange{Int64}, true, Float64}:
0.5483587984330037
0.26039471258152735
0.19683677241504127
0.7099479665944533
0.08217814764401155
0
1
2
3
4
which is a consequence of having AbstractInterpolation <: AbstractVector. I wonder if it would be cleaner to overwrite this, showing e.g. hcat(u, t) instead? e.g.
julia> function Base.show(io::IO, mime::MIME"text/plain", F::DataInterpolations.AbstractInterpolation)
           Base.show(io, mime, hcat(F.u, F.t))
       end
julia> interp
5×2 Matrix{Float64}:
0.0589206 0.0
0.115624 1.0
0.862211 2.0
0.330109 3.0
0.430882 4.0
(or whatever is the best way for defining a new show like this to also include 10-element LinearInterpolation{Vector{Float64}, UnitRange{Int64}, true, Float64}:).
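One way to keep a summary line while printing the two columns side by side is sketched below on a toy type. ToyInterp is hypothetical and only mimics an interpolant that subtypes AbstractVector; Base.print_matrix is an unexported Base helper, so relying on it is an assumption that may not hold across Julia versions.

```julia
# Hypothetical stand-in for an interpolant that is also an AbstractVector.
struct ToyInterp <: AbstractVector{Float64}
    u::Vector{Float64}
    t::Vector{Float64}
end
Base.size(A::ToyInterp) = size(A.u)
Base.getindex(A::ToyInterp, i::Int) = A.u[i]

# Print a one-line summary, then u and t as two columns.
function Base.show(io::IO, mime::MIME"text/plain", A::ToyInterp)
    println(io, length(A.u), "-point ", nameof(typeof(A)), " (columns: u, t):")
    Base.print_matrix(io, hcat(A.u, A.t))
end

text = sprint(show, MIME("text/plain"), ToyInterp([1.0, 2.0], [0.0, 1.0]))
```

Reporting the number of data points rather than the combined length of u and t avoids the misleading 10-element count in the output above.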