Comments (4)
Moving code from Zarr here sounds like a good strategy. I've also just fixed the HDF5 indexing interface and written one for ArchGDAL, which could help integrate those packages too at some stage. We should aim high for this :)
Yes, I was hoping this could help more packages. As soon as we demonstrate that this has some value, we can try to get more packages backed by it.
In terms of extra features, I was thinking it would be great to handle `broadcast` (?), reducing methods, and `show` in sane ways, so you can run code written for `Array` on `AbstractDiskArray` and it works without crashing or stalling. For example, summing a disk-based array should sum chunk by chunk, so larger-than-RAM files just work without you ever having to think about it.
Yes, this would definitely be on the roadmap. To tackle the low-hanging fruit first, I would start with `show` and `reduce`. I think `broadcast` can already get a bit tricky, because multiple arrays are involved and you have to start thinking about chunk alignment and other tricky things. Since I have already implemented a lot of this functionality in ESDL (though in a much less principled way), I would not prioritize it, but I would definitely support anyone else who wants to tackle it.
I would also be happy to move any other pieces from GeoData that you think might be useful to a broader class of disk-based arrays. So far I was a bit hesitant to subtype `AbstractArray`, but maybe it would make sense to be ambitious and simply show that we finally want to provide a complete array experience, so I don't mind doing this.
from diskarrays.jl.
Thanks for starting this. I haven't done anything yet: I have a lot of packages midway to release, so my capacity will be pretty low for a while. But I would like to contribute, especially if it means code can be removed from GeoData.jl.
Moving code from Zarr here sounds like a good strategy. I've also just fixed the HDF5 indexing interface and written one for ArchGDAL, which could help integrate those packages too at some stage. We should aim high for this :)
In terms of extra features, I was thinking it would be great to handle `broadcast` (?), reducing methods, and `show` in sane ways, so you can run code written for `Array` on `AbstractDiskArray` and it works without crashing or stalling. For example, summing a disk-based array should sum chunk by chunk, so larger-than-RAM files just work without you ever having to think about it.
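Chunk-by-chunk summing can be illustrated with a minimal sketch, in Python rather than Julia. `FakeDiskArray`, `read_block`, and `chunked_sum` are hypothetical stand-ins, not DiskArrays.jl API; the array is assumed to expose its shape and a regular chunk shape. Each read covers at most one chunk, so memory use is bounded by the chunk size rather than by the size of the array.

```python
class FakeDiskArray:
    """Stand-in for a disk-backed 2-D array (hypothetical, for illustration).

    The data is kept in a nested list here, but callers only access it through
    read_block, which records the largest block ever requested.
    """

    def __init__(self, data, chunk_shape):
        self._data = data
        self.shape = (len(data), len(data[0]))
        self.chunk_shape = chunk_shape
        self.max_elements_read = 0

    def read_block(self, r0, r1, c0, c1):
        """Read rows r0:r1 and columns c0:c1 (half-open), like one disk I/O."""
        self.max_elements_read = max(self.max_elements_read, (r1 - r0) * (c1 - c0))
        return [row[c0:c1] for row in self._data[r0:r1]]


def chunked_sum(arr):
    """Sum every element while holding at most one chunk in memory."""
    nrows, ncols = arr.shape
    cr, cc = arr.chunk_shape
    total = 0
    for r0 in range(0, nrows, cr):
        for c0 in range(0, ncols, cc):
            # min() clips the shorter last chunk along each dimension
            block = arr.read_block(r0, min(r0 + cr, nrows), c0, min(c0 + cc, ncols))
            total += sum(sum(row) for row in block)
    return total


data = [[r * 10 + c for c in range(7)] for r in range(5)]
arr = FakeDiskArray(data, chunk_shape=(2, 3))
print(chunked_sum(arr))        # 805
print(arr.max_elements_read)   # 6, i.e. one 2x3 chunk
```

Because addition is associative, the partial sums can be combined in any order; a mean could be built the same way by accumulating a (sum, count) pair per chunk.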
`view` is also interesting... I've had to write a bunch of windowing code in GeoData.jl to handle lazy loading (propagating views from stacks to arrays) and `SubArray`s not working with non-`Array`s. So it would be good if `DiskArray <: AbstractArray` and we had methods covering the places where `Array` methods break with disk-based arrays.
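The lazy-window idea behind `view` can be sketched as follows, in Python and purely for illustration: `LazyView`, `CountingParent`, `read_block`, and `materialize` are hypothetical names, not GeoData.jl or DiskArrays.jl code. The design choice shown is that a view of a view is stored as a single window on the original parent with composed index ranges, so nothing is read until `materialize` is called.

```python
class CountingParent:
    """Hypothetical disk-backed parent that counts how many elements it reads."""

    def __init__(self, nrows, ncols):
        self.shape = (nrows, ncols)
        self.elements_read = 0

    def read_block(self, rows, cols):
        # Values are synthesized as r * 100 + c so results are easy to check.
        self.elements_read += len(rows) * len(cols)
        return [[r * 100 + c for c in cols] for r in rows]


class LazyView:
    """A lazy 2-D window onto a parent; slicing composes ranges, never copies."""

    def __init__(self, parent, rows, cols):
        self.parent = parent
        self.rows = rows  # Python range of parent row indices
        self.cols = cols  # Python range of parent column indices

    def __getitem__(self, key):
        # Only a tuple of two slices is supported in this sketch.
        rs, cs = key
        # Slicing a range returns a new range in parent coordinates.
        return LazyView(self.parent, self.rows[rs], self.cols[cs])

    def materialize(self):
        """Read just the selected elements from the parent."""
        return self.parent.read_block(self.rows, self.cols)


parent = CountingParent(1000, 1000)
full = LazyView(parent, range(1000), range(1000))
window = full[10:20, 100:200][2:4, ::50]   # a view of a view
print(parent.elements_read)   # 0, nothing read yet
print(window.rows, window.cols)   # range(12, 14) range(100, 200, 50)
print(window.materialize())   # [[1300, 1350], [1400, 1450]]
print(parent.elements_read)   # 4
```

The non-unit step in the second slice shows that a window need not be contiguous; a real disk backend might then read the bounding block and subsample it.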
from diskarrays.jl.
I forgot to mention: in order to implement reductions, we would need some concept of chunking. Do you think we should make that part of this package (i.e. move code from ChunkedArrayBase here), or shall we depend on ChunkedArrayBase?
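The "concept of chunking" a reduction needs can be fairly small. Below is a minimal sketch in Python, assuming regular chunks (a fixed size along each dimension, with a possibly shorter last chunk); `chunk_ranges` and `iter_chunks` are hypothetical names, not ChunkedArrayBase or DiskArrays.jl API. Irregular chunk layouts would need explicit boundary lists in place of `chunk_ranges`.

```python
from itertools import product


def chunk_ranges(length, chunk_size):
    """Half-open (start, stop) pairs covering 0:length; the last may be shorter."""
    return [(s, min(s + chunk_size, length)) for s in range(0, length, chunk_size)]


def iter_chunks(shape, chunk_shape):
    """Yield one tuple of per-dimension (start, stop) pairs for every chunk."""
    per_dim = [chunk_ranges(n, c) for n, c in zip(shape, chunk_shape)]
    return product(*per_dim)


# A 5x4 array with 2x3 chunks: rows split 0:2, 2:4, 4:5 and columns 0:3, 3:4.
for chunk in iter_chunks((5, 4), (2, 3)):
    print(chunk)
```

A reduction then only needs to read each chunk yielded here and fold it into an accumulator, which is why this small piece could live either in this package or in a shared dependency.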
from diskarrays.jl.
I agree `broadcast` will be the hardest part; it's a good idea to leave it until last.
I was imagining chunking would be integral to a lot of this too, but I'm not sure how your packages work. Just depending on ChunkedArrayBase could be fine? Maybe a lot of these methods would even live in ChunkedArrayBase? I'm not sure what the best plan is there.
Related Issues (20)
- DiskArrays interface for files that are not yet open
- Linear indexing?
- common_chunks fails in broadcast with mixed ndims
- ConstructionBase.jl dep
- Lazy mean, sum etc?
- Chunks do not align in dimension 1
- Implement chunked `foreach`/non allocating chunked iteration.
- Does this strictly have anything to do with Disk?
- Chunked broadcast with objects of smaller dimensions is broken.
- Implement PermutedDimsArray for DiskArrays
- Iteration is broken with chunking
- What is the most efficient way to access discrete columns of a chuncked array?
- With changes to chunking, `copyto!` now fails in places it used to succeed.
- Changes to iteration over an `Unchunked` SubDiskArray?
- unlimited dimensions in NetCDF
- `view` with an AbstractRange (not AbstractUnitRange) is broken
- Batch getindex is not typesafe
- Fix method ambiguities
- Handle concatenation with `cat`
- Docs link is broken