About this package (diskarrays.jl, open issue)

meggart commented on July 17, 2024
About this package

Comments (4)

meggart commented on July 17, 2024

> Moving code from Zarr here sounds like a good strategy. Also I've just fixed HDF5 indexing interface and written one for ArchGDAL, which could help integrate those too at some stage. We should aim high for this :)

Yes I was hoping this could help more packages, and as soon as we demonstrate this has some value we can try to make more packages backed by this.

> In terms of extra features, I was thinking it would be great to handle broadcast (?), reducing methods, and show in sane ways, so that code written for Array runs on an AbstractDiskArray without crashing or stalling. For example, summing a disk-based array should proceed chunk by chunk, so larger-than-RAM files just work without you ever having to think about it.

Yes, this would definitely be on the roadmap. To tackle the low-hanging fruit first, I would start with show and reduce. Broadcast can get tricky quickly, because multiple arrays are involved and you have to start thinking about chunk alignment and similar issues. Since I have already implemented a lot of this functionality in ESDL (though in a much less principled way), I would not prioritize it myself, but I would definitely support anyone else trying to tackle it.

I would also be happy to move any other pieces from GeoData that you think might be useful to a broader class of disk-based arrays. So far I was a bit hesitant to subtype AbstractArray, but maybe it would make sense to be ambitious and simply show that we finally want to provide a complete array experience, so I don't mind doing this.

rafaqz commented on July 17, 2024

Thanks for starting this. I haven't done anything yet, I have a lot of packages mid-way to release so my capacity is pretty low for a while, but I would like to contribute to this, especially if it can remove code from GeoData.jl.

Moving code from Zarr here sounds like a good strategy. Also I've just fixed HDF5 indexing interface and written one for ArchGDAL, which could help integrate those too at some stage. We should aim high for this :)

In terms of extra features, I was thinking it would be great to handle broadcast (?), reducing methods, and show in sane ways, so that code written for Array runs on an AbstractDiskArray without crashing or stalling. For example, summing a disk-based array should proceed chunk by chunk, so larger-than-RAM files just work without you ever having to think about it.

view is also interesting... I've had to write a bunch of windowing code in GeoData.jl to deal with lazy loading propagating views from stacks to arrays, and SubArrays not working with non-Arrays. So it would be good if DiskArray <: AbstractArray, with methods covering the cases where Array methods break on disk-based arrays.

meggart commented on July 17, 2024

I forgot to mention: in order to implement reductions, we would need some concept of chunking. Do you think we should make that part of this package (i.e. move code from ChunkedArrayBase here), or should we depend on ChunkedArrayBase instead?
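
To make the idea of a chunk-aware reduction concrete, here is a minimal sketch in plain Julia. It is only an illustration of summing chunk by chunk, not the DiskArrays.jl or ChunkedArrayBase API: the names `chunk_ranges` and `chunked_sum` are hypothetical, and an in-memory `Array` stands in for a disk-backed array.

```julia
# Hypothetical helper (not part of any package): split each axis into
# index ranges of at most `chunksize[d]` elements and iterate over all
# combinations of those ranges.
function chunk_ranges(dims::NTuple{N,Int}, chunksize::NTuple{N,Int}) where {N}
    per_axis = ntuple(N) do d
        [i:min(i + chunksize[d] - 1, dims[d]) for i in 1:chunksize[d]:dims[d]]
    end
    return Iterators.product(per_axis...)
end

# Sum `a` one chunk at a time. Only `size`, `eltype` and range indexing
# are used, so at most one chunk has to be held in memory at once.
function chunked_sum(a, chunksize)
    total = zero(eltype(a))
    for idx in chunk_ranges(size(a), chunksize)
        # For a disk-backed array, this indexing would be a single read.
        total += sum(a[idx...])
    end
    return total
end

a = reshape(collect(1:24), 4, 6)          # stand-in for a disk-backed array
@assert chunked_sum(a, (3, 4)) == sum(a)  # chunks need not divide the size evenly
```

Whether the chunk iteration lives in this package or in ChunkedArrayBase, a reduction like this only needs the chunk layout and range-based reads from the backing array.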

rafaqz commented on July 17, 2024

I agree broadcast will be the hardest part; leaving it until last is a good idea.

I was imagining chunking would be integral to a lot of this too, but I'm not sure how your packages work. Just depending on ChunkedArrayBase could be fine? Maybe a lot of these methods would even live in ChunkedArrayBase? I'm not sure what the best plan is there.
