Comments (17)
I see - good point, this wouldn't be good :-(
(If string concatenation was addition, this wouldn't have been a problem...)
from sparsearrays.jl.
In theory the empty string should be one(String)
julia> one(String)
""
You can always add these at your own risk, as I have no idea what they will break.
julia> Base.zero(::String) = ""
julia> Base.iszero(x::String) = length(x) == 0
julia> sparse(["123", "", "234"])
3-element SparseVector{String, Int64} with 2 stored entries:
[1] = "123"
[3] = "234"
IMHO, it might make sense to have an internal zero and iszero so that it's possible to overwrite them for sparse arrays only.
If I were writing an application, sure. But I'm writing a library, so changing Base behavior this way is probably not a good idea.
Having SparseArrays.zero and SparseArrays.iszero which by default forward to Base but allow overriding for additional types would be nice.
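To make the suggestion concrete, here is a minimal sketch of such forwarding hooks. The names sparse_zero, sparse_iszero and compress are hypothetical and are not part of SparseArrays.jl; the toy compress function only stands in for whatever the package does internally.

```julia
# Hypothetical hooks (assumed names): by default they forward to Base.
sparse_zero(::Type{T}) where {T} = zero(T)
sparse_iszero(x) = iszero(x)

# A library can opt in for String without redefining Base.zero or Base.iszero.
sparse_zero(::Type{String}) = ""
sparse_iszero(x::String) = isempty(x)

# Toy stand-in for sparse storage: keep only the entries that are not "zero".
function compress(v::AbstractVector{T}) where {T}
    idx = Int[]
    vals = T[]
    for (i, x) in pairs(v)
        if !sparse_iszero(x)
            push!(idx, i)
            push!(vals, x)
        end
    end
    return idx, vals
end

idx, vals = compress(["123", "", "234"])
```

Numeric element types keep going through Base, so existing behavior would be unchanged unless a method is added for a new type.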
Nice workaround, should have thought of that.
It would bubble up to everywhere in the client code (e.g. eltype would return Wrap instead of String) but that might be acceptable overhead.
Interesting...
I think the empty string still makes sense in sparse string vectors though.
I agree.
@ViralBShah @dkarrasch @LilithHafner Does it make sense to you?
I'd like to have an overlay structure similar to Jutho/SparseArrayKit.jl#13. However, SparseArrays.jl can't depend on FillArrays.jl, and I don't know how to work around that nicely.
> Having SparseArrays.zero and SparseArrays.iszero which by default forward to Base but allow overriding for additional types would be nice.
That sounds very sensible. It provides more flexibility at seemingly no cost. I wonder though what applications one has in mind, if one defines zero as something that is not the neutral element w.r.t. addition. Is it just about storing sparse rectangular data in the sense that "most" elements equal a fixed ("trivial") element? Or does some sort of algebra also come into play?
First, zero (whether for Base or specialized in SparseArrays) works based on the type of the array element. This makes it impractical to use for the "most common value" of an array.
Second, SparseArrays code assumes that the zero value is zero w.r.t. addition and multiplication, and breaking this assumption would require modifying it in many places - not to mention breaking client code which also makes this assumption.
Having CompressedArrays working similarly to SparseArrays and allowing for an arbitrary "most common" value per array instance would have been awesome (and would have reduced the need for things like nullable/masked arrays). However, somehow this idea never caught on...
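As a rough illustration of the per-instance "most common value" idea, a vector type could carry its own default alongside the entries that differ from it. DefaultVector below is a hypothetical type written for this sketch, not an existing package, and it assumes 1-based input vectors.

```julia
# Hypothetical type for illustration only.
struct DefaultVector{T} <: AbstractVector{T}
    n::Int               # logical length
    default::T           # per-instance "most common" value
    stored::Dict{Int,T}  # only the entries that differ from `default`
end

# Build from a dense, 1-based vector and a chosen default value.
function DefaultVector(v::AbstractVector{T}, default::T) where {T}
    stored = Dict{Int,T}()
    for (i, x) in enumerate(v)
        isequal(x, default) || (stored[i] = x)
    end
    return DefaultVector{T}(length(v), default, stored)
end

Base.size(v::DefaultVector) = (v.n,)

function Base.getindex(v::DefaultVector, i::Int)
    @boundscheck checkbounds(v, i)
    return get(v.stored, i, v.default)
end

labels = DefaultVector(["a", "n/a", "n/a", "b"], "n/a")
```

Because the default lives in the instance rather than in the element type, two vectors of strings can use different defaults. Arithmetic that relies on the default being an additive or multiplicative zero is deliberately left out of this sketch.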
The problem is one of abstractions. The SparseMatrix data type and its numeric types are designed for numerical sparse matrix linear algebra calculations. So extending to non-numeric types is always going to feel unnatural. It would be nice to get as much as we can get for free within this design - I do get that.
For strings, isn't it better to use a different data structure like a dictionary?
There is a proposal floating around somewhere to support sparse arrays with custom "zero" values. Implementing that (either here or in a different package) would make a sparse array of strings fit naturally into the abstraction.
I don't like this
SparseArrays.zero(::Type{String}) == ""
because SparseArrays assumes zero * x == zero, and so a sparse array of strings will behave oddly under concatenation. A structural empty string concatenated with anything results in the empty string, while a stored empty string concatenated with anything (x) results in that thing (x).
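The mismatch can be checked directly in Base Julia: under * the empty string acts as the identity, while a numeric zero annihilates. A small sketch, using only Base functions:

```julia
x = "abc"

# The empty string is the identity for `*` (string concatenation).
@assert one(String) == ""
@assert "" * x == x
@assert x * "" == x

# A numeric zero annihilates under `*`; this is the property the comment
# above says sparse code relies on when it skips structural zeros.
@assert zero(Float64) * 2.5 == zero(Float64)
```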
> It would be nice to get as much as we can get for free within this design - I do get that.
That's the idea.
> For strings, isn't it better to use a different data structure like a dictionary?
Depends on the use case (of course). In my case, definitely not. Vectors have properties like length, and can be sliced, and masked, and so on. NamedArrays makes them dictionary-like, if that's needed.
> SparseArrays assumes zero * x == zero, and so a sparse array of strings will behave oddly under concatenation.
(As an aside, I would have been happier if string concatenation was considered to be addition rather than multiplication, commutativity be damned. E.g. sum(strings) would have worked for concatenating all the strings in a vector which makes sense. But that's just me.)
In what context does the above matter? Note that you can't add strings so trying to do linear algebra on strings would fail anyway.
> There is a proposal floating around somewhere to support sparse arrays with custom "zero" values. Implementing that (either here or in a different package) would make a sparse array of strings fit naturally into the abstraction.
That would be nice.
> In what context does the above matter?
When performing broadcasted .*:
julia> x = sprandstring(10, .1)
10-element SparseVector{String, Int64} with 2 stored entries:
[8 ] = "VlnHeUDF"
[10] = "LumRtFcf"
julia> y = sprandstring(10, .1)
10-element SparseVector{String, Int64} with 1 stored entry:
[7] = "SlumPG6z"
julia> x .* y
10-element SparseVector{String, Int64} with 0 stored entries
That looks like a bug, and it would be hard to avoid without adding special cases to support the zero element not being a multiplicative zero.
(Note: this example does not run; it is extrapolated from the results of sprand.)
There's a lot to consider when deciding whether string concatenation is +, *, or something else. When/if I develop a new language or stdlib spec from scratch this sort of thing will be worth considering. Within Julia, though, it's way too late to reconsider. You're right though. This would "just work" if that language design decision were different.
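Maybe the solution is adding a wrapper that does the job? Something like:
using SparseArrays
# Wrapper whose "zero" is a missing string (nothing) rather than ""
struct Wrap
    s::Union{Nothing,String}
end
Wrap() = Wrap(nothing)
# Type-based method, since generic code typically calls zero(eltype(A))
Base.zero(::Type{Wrap}) = Wrap()
Base.zero(::Wrap) = Wrap()
Base.iszero(x::Wrap) = x.s === nothing
sparse(Wrap.(["123", nothing, "234"]))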