Comments (29)

ViralBShah commented on May 27, 2024

I would be ok with forbidding creation of sparse matrices where zero(T) is not defined. Would be good to hear other thoughts.

from sparsearrays.jl.

StefanKarpinski commented on May 27, 2024

> I wonder whether Base should maybe define zero(::String) == "", which would at least make that specific case work.

This should absolutely not be defined. The empty string is the multiplicative unit of the string semigroup, which is why we have one(String) == "". There is no additive identity because we do not use + for concatenation. If we did, then "foo" + "bar" should produce the regular expression r"foo|bar". Strings and regular expressions form a semiring under the operations of * for concatenation and + for alternation. So the correct value for zero(String), if any, should be a regular expression that matches nothing, so we could have zero(::Type{String}) = r"(?!)".
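
As a small illustration of the argument above, the following sketch checks the two facts it relies on using only Base functions. The variable names (`unit`, `never`, and so on) are illustrative and not part of any proposal in this thread.

```julia
# The empty string is the identity element for string concatenation (*).
unit = one(String)
concat_ok = ("foo" * unit == "foo") && (unit * "foo" == "foo")

# An empty negative lookahead can never succeed, so this regex matches nothing.
# That is the behaviour an additive identity under alternation would need.
never = r"(?!)"
matches_anything = any(s -> occursin(never, s), ["", "foo", "foo|bar"])
```

Here `concat_ok` is true and `matches_anything` is false, which is consistent with "" being a multiplicative unit rather than a zero.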

KlausC commented on May 27, 2024

It looks like we must either commit a "breaking change" or provoke a "type piracy" by the user, if we want to extend the element types of Sparse... beyond Numbers.
Defending my proposal of sparse_unstored etc.: it seems to be a crime of second degree, if we pirate these new functions compared to iszero and zero :-) .

Thinking more about adding type parameters, and about why I dislike adding two functions as replacements for iszero and zero, I believe these two functions jointly represent one single concept (of storage). For economy and extensibility, it may be better to encode this concept in a single parameter. In other words, can't we pass a single trait name to encompass what we want? That could look like this:

abstract type DefaultNullityTrait end

struct SparseMatrixCSC{Tv,Ti,NullityConcept} ... end

SparseMatrixCSC{Tv,Ti}(...) = SparseMatrixCSC{Tv,Ti,DefaultNullityTrait}(...)

Base.iszero(x, ::Type{<:DefaultNullityTrait}) = iszero(x)
Base.zero(T::Type, ::Type{<:DefaultNullityTrait}) = zero(T)

That would allow the use with different concepts to define

abstract type StringNullityTrait end
Base.iszero(x::AbstractString, ::Type{<:StringNullityTrait}) = isempty(x)
Base.zero(::Type{T}, ::Type{<:StringNullityTrait}) where T<:AbstractString = T("")

and inject this behavior into the sparse matrix using constructor SparseMatrixCSC{Tv,Ti,StringNullityTrait}(...)

Suavesito-Olimpiada commented on May 27, 2024

I agree with that, but the problem remains that we cannot change the type of the current SparseArray.

I think this solves most of the problems without breaking anything.

It also keeps things extensible: for example, someone could define a DefaultArray with a general default value (not strictly zero). That would allow something like what @StefanKarpinski proposed elsewhere (I cannot find the link right now); the idea was that some operations over matrices (including map with a pure function) are closed if we allow a general default value.

Again, this would be more elegant (and general) if we didn't single out the zero case, but right now that is the accepted and stable interface for the stdlib SparseArray.

SobhanMP commented on May 27, 2024

can you provide an MWE?

Suavesito-Olimpiada commented on May 27, 2024

Sorry for the noise. I just noticed that although I had dev'ed the main branch from here, Julia was still using the version in the sysimage. After downloading Julia nightly, it works as expected.

ViralBShah commented on May 27, 2024

Happy to accept a PR and see this work.

abraunst commented on May 27, 2024

Tried, with mixed results ;-) Doing so will forbid creating sparse vectors or matrices of types T with no defined zero(T) for instance in SparseVector{Tv,Ti}(s::AbstractVector{Tv}) as it needs to determine if the values are non-zero. That would be fine I think, except that in line 1246 of stdlib/SparseArrays/test/sparse.jl there is this test

   A = sparse(["a", "b"])
   @test_throws MethodError findmin(A, dims=1)

This test now fails when the vector is created. I don't know what to think: is SparseVector{String,Ti} something that should be supported? All remaining tests pass (for both vectors and matrices).
All in all I think that the best would be to be able to specify the non-stored value for sparse arrays.

stevengj commented on May 27, 2024

See also JuliaLang/julia#19561, JuliaLang/julia#11408.

abraunst commented on May 27, 2024

> See also JuliaLang/julia#19561, JuliaLang/julia#11408.

Thanks @stevengj. I was unaware of JuliaLang/julia#11408. If there is no consensus on dropping support for types T without zero(T), then maybe it is better to close this (and the PR)?

andreasnoack commented on May 27, 2024

I'm in favor of restricting SparseMatrixCSC to elements with a zero but, regardless of that discussion, I think JuliaLang/julia#30580 improves the situation for StaticArrays, right?

abraunst commented on May 27, 2024

> I'm in favor of restricting SparseMatrixCSC to elements with a zero but, regardless of that discussion, I think JuliaLang/julia#30580 improves the situation for StaticArrays, right?

Yes, but I admit that I found the example after I ran into the "bug" in the code. The current code uses != 0 in several places, which is of course wrong for non-numerical types that define zero; but at least it doesn't give an error for non-numerical types that do not define zero (although != 0 is always true there, so for instance the sparse-dense-sparse round-trip gives a matrix full of stored zeros). Using iszero would mean really dropping support for those types (e.g. String) even for simple storage (which I don't know if people really use, but at least it appears in some tests).
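
The difference between the two checks can be seen without SparseArrays at all. The sketch below is only an illustration: `Tag` is a hypothetical non-numeric element type invented for this example, and it assumes no package has added a `zero` method for `String`.

```julia
# Generic == between a String and an Int is simply false, so `!= 0` never
# errors, but it is true for every string, including the empty one.
string_ne_zero = ("" != 0)

# iszero falls back to x == zero(x), and Base defines no zero for String,
# so the call is expected to throw a MethodError.
iszero_throws = try
    iszero("")
    false
catch err
    err isa MethodError
end

# Hypothetical non-numeric element type that does define its own zero.
struct Tag
    id::Int
end
Base.zero(::Type{Tag}) = Tag(0)
Base.zero(::Tag) = Tag(0)

tag_ne_zero = (Tag(0) != 0)   # compares against the integer 0
tag_iszero = iszero(Tag(0))   # compares against zero(Tag)
```

For `Tag(0)`, the `!= 0` test reports a non-zero value while `iszero` reports a zero, which is the mismatch described above.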

yurivish commented on May 27, 2024

D4M sometimes uses sparse arrays to store non-numeric data.

I found this issue after being tripped up by the fact that broadcasting non-numeric data didn't work, and was a little surprised that you couldn't broadcast an Any[1, 2] with a sparse matrix either, though I see now why that's the case. Regardless of the conclusions here, it would be nice if that error message were improved to be a bit more specific. Thanks to everyone for taking the time to thoroughly map out the territory. I learned a lot reading through some of the issues just now.

I wonder whether Base should maybe define zero(::String) == "", which would at least make that specific case work.

abraunst commented on May 27, 2024

> I wonder whether Base should maybe define zero(::String) == "", which would at least make that specific case work.

I get your point, but at least from the mathematical point of view it seems an awkward stretch IMHO: "" is the identity element of the multiplicative (non-abelian) semigroup (*,String), whereas zero(T) should be the identity of the additive group (+,T). I think all this awkwardness could be avoided if one could specify the "non-stored" element of sparse arrays. On numerical data, linear operations could be fast-tracked if the non-stored element iszero so as to retain the current performance (and a sensible default could mimic current or desired behavior). I'm not saying all this is easy to implement though of course, but it seems like a fun project.

yurivish commented on May 27, 2024

Right, yeah, I realized later that I was confused. Returning regular expressions seems mathematically correct but not very nice for performance (or type-stability).

lr4d commented on May 27, 2024

I ran into the following error today when trying to use sparse arrays with non-numeric data types. I could create a SparseVector{Symbol} but couldn't use it as intended:

julia> v = SparseArrays.sparsevec([1;22], [:a, :b])
22-element SparseVector{Symbol,Int64} with 2 stored entries:
  [1 ]  =  :a
  [22]  =  :b

julia> findfirst(isequal(:b), v)
ERROR: MethodError: no method matching zero(::Type{Symbol})

If these types do not really support working with non-numeric values, then I would also suggest forbidding their creation.

KlausC commented on May 27, 2024

As far as I know, the SparseVector and SparseMatrixCSC types have minimal requirements, namely support for iszero and zero. Both are essential for the intended functionality of the structures, and do not make much sense for Symbol or String.
Nevertheless, it is not desirable to restrict the element types to (for example) Numbers, because there are other sensible types that have both iszero and zero.

abraunst commented on May 27, 2024

I wonder if we could simply add two new parameters, SparseMatrixCSC{Tv,Ti,Z,IZ}, and have the constructors default them to zero and iszero respectively. This would make it possible to use SparseMatrixCSC as storage for non-numerical types, on par with Matrix.

KlausC commented on May 27, 2024

I would prefer not to add functions as type parameters.
Alternatively introduce new functions like sparse_unstored and is_sparse_unstored, which default to zero and iszero for Number, and which can be redefined for other types.

abraunst commented on May 27, 2024

Is there a particular reason why functions as type parameters are to be avoided? (honest question)
I don't like the sparse_unstored and is_sparse_unstored solution, because then user-defining it e.g. for String would be type piracy...

KlausC commented on May 27, 2024

> Is there a particular reason why functions as type parameters are to be avoided? (honest question)

What made me uncomfortable with this was that it introduces a breaking change into the type system.

> would be type piracy...

Not, if we defined default implementation for String and other relevant types.

abraunst commented on May 27, 2024

> Not, if we defined default implementation for String and other relevant types.

For every type? This doesn't feel right to me, as in a sense it would make String aware of sparse structures. And the problem would still stand for types defined in packages, no?

Suavesito-Olimpiada commented on May 27, 2024

Hi! I was redirected to this issue from JuliaLang/julia#42536, since I'm using SparseArrays with Num from Symbolics.jl and all the == 0 and != 0 comparisons get in the way, because they return an expression instead of a Bool.

Is there any consensus as to whether the iszero and zero interface for a type is required to work with SparseArrays?

XR: https://discourse.julialang.org/t/is-there-a-defined-minimal-interface-for-a-type-to-work-with-sparsearrays/69602

jlapeyre commented on May 27, 2024

An alternative solution, with other benefits is here: https://github.com/JuliaLang/julia/issues/41036
This looks like it may be related to proposals involving isnullable, although I can't find them at the moment.

As far as I can tell https://github.com/JuliaLang/julia/issues/41036 involves no breakage, inefficiency, or pirating. In fact it clarifies the role of the unstored element.

abraunst commented on May 27, 2024

> An alternative solution, with other benefits is here: JuliaLang/julia#41036
> This looks like it may be related to proposals involving isnullable, although I can't find them at the moment.
>
> As far as I can tell JuliaLang/julia#41036 involves no breakage, inefficiency, or pirating. In fact it clarifies the role of the unstored element.

This is similar, I think, to the sparse_unstored proposal above in this thread by @KlausC and the nullable solution by @Suavesito-Olimpiada in https://discourse.julialang.org/t/is-there-a-defined-minimal-interface-for-a-type-to-work-with-sparsearrays/69602, if I'm not mistaken.

Suavesito-Olimpiada commented on May 27, 2024

I think that although the core idea of sparse arrays for linear algebra and for storage is the same, they are quite different in their semantics.

Under the current implementation, for linear algebra you want an additive "zero" element of a ring. But for a storage sparse array, all you want is a "nullable" (unstored) element with no other properties attached to it. Under these circumstances, the sparse array for linear algebra is a special case of the more general idea of a storage sparse array.

One solution that comes to mind, and that allows the current implementation to keep working, is to define an AbstractStorageSparseArray type with Unstored and IsUnstored type parameters: functions that define the "default" unstored element and the check for whether an element is unstored.

Then base AbstractSparseArray on this new type and set these parameters to zero and iszero by default.

The basic idea is to have something as the following

abstract type AbstractStorageSparseArray{Tv, Ti, N, Unstored, IsUnstored} <: AbstractArray{Tv, N} end

abstract type AbstractSparseArray{Tv, Ti, N} <: AbstractStorageSparseArray{Tv, Ti, N, zero, iszero} end

const AbstractStorageSparseVector{Tv, Ti, Unstored, IsUnstored} = AbstractStorageSparseArray{Tv, Ti, 1, Unstored, IsUnstored}

const AbstractSparseVector{Tv, Ti} = AbstractSparseArray{Tv, Ti, 1}

struct SparseVector{Tv, Ti<:Integer} <: AbstractSparseVector{Tv, Ti}
    # fields...
end

Then base the basic interface for sparse arrays on the new AbstractStorageSparseArray. Using Unstored and IsUnstored as type parameters lets you define, in the most convenient way, which elements are not stored and what the unstored value is, and the definition of AbstractSparseArray does not change the current type.

This also avoids type piracy, allowing one to write, for example, StorageSparseArray{String, Int, emptystring, isempty} (with emptystring() = "" defined in the corresponding module).

Maybe I'm missing things, I've tried to go to read all the places where this came up, but it is really hard to find them all and remember all the points made.
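
To make the type-parameter idea more tangible, here is a minimal, self-contained sketch of a one-dimensional storage-only sparse vector. Everything in it is hypothetical and chosen for illustration: the names `StorageSparseVector`, `storagesparse`, and `emptystring` are not part of SparseArrays, and the sketch assumes the unstored function is called with the element type (as `zero` is), so `emptystring` takes a type argument here.

```julia
# Hypothetical storage-only sparse vector. Unstored and IsUnstored are
# functions carried as type parameters (singleton functions are allowed there).
struct StorageSparseVector{Tv,Ti<:Integer,Unstored,IsUnstored} <: AbstractVector{Tv}
    n::Int               # logical length
    nzind::Vector{Ti}    # sorted indices of stored entries
    nzval::Vector{Tv}    # stored values
end

# Hypothetical "unstored" function for strings, called like zero(T).
emptystring(::Type{<:AbstractString}) = ""

# Build from a dense vector, keeping only entries that are not "unstored".
function storagesparse(dense::AbstractVector{Tv}, unstored, isunstored) where {Tv}
    keep = Vector{Int}(findall(!isunstored, dense))
    return StorageSparseVector{Tv,Int,unstored,isunstored}(length(dense), keep, dense[keep])
end

# Default to the current linear-algebra convention.
storagesparse(dense::AbstractVector) = storagesparse(dense, zero, iszero)

Base.size(v::StorageSparseVector) = (v.n,)

function Base.getindex(v::StorageSparseVector{Tv,Ti,Unstored}, i::Int) where {Tv,Ti<:Integer,Unstored}
    @boundscheck checkbounds(v, i)
    k = searchsortedfirst(v.nzind, i)
    found = k <= length(v.nzind) && v.nzind[k] == i
    return found ? v.nzval[k] : Unstored(Tv)
end

# Usage: strings with "" as the unstored value, floats with the default zero.
sv = storagesparse(["a", "", "", "b"], emptystring, isempty)
fv = storagesparse([0.0, 1.5, 0.0])
```

With this sketch, `sv` stores only the entries at indices 1 and 4 and returns "" everywhere else, while `fv` behaves like an ordinary zero-based sparse vector. No method is added to `zero` or `iszero` for `String`, so no type piracy is involved.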

abraunst commented on May 27, 2024

I agree with almost everything (I proposed something similar some comments above), except that I think that having a generic unstored would be useful also for algebraic elements (e.g. Float64), so I would not separate between "storage" and "linear algebra" arrays. The distinction will come automatically because some algebraic operations will fail if the corresponding operations on elements are not defined...

SobhanMP commented on May 27, 2024

this has been fixed

julia> nnz(sparse(Matrix(s)))
0

Suavesito-Olimpiada commented on May 27, 2024

Still not working with Symbolics. I think it should not be closed yet.
