Comments (6)
This issue was finally resolved by merging #59. Starting from your example:
z = rand(100,100,1000)
z[3,5,:]
z[5,9,:]
z[10,6,:]
you can now write this as:
z[[CartesianIndex((3,5)), CartesianIndex((5,9)),CartesianIndex((10,6))]]
and DiskArrays will make sure that every affected chunk is accessed only once, so you get optimal read performance from remote sources. Alternatively, you can do:
mask = falses(100,100)
mask[3,5] = true
mask[5,9] = true
mask[10,6] = true
z[mask,:]
which ultimately goes through the same machinery as the example above.
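As a quick sanity check, the two forms can be compared on a plain in-memory `Array` (used here as a stand-in; this sketch does not touch DiskArrays, and the trailing `:` on the index is an assumption based on the array being three-dimensional). One thing it illustrates is that the mask form returns the points in column-major order of the `true` entries, not in the order the points were listed:

```julia
# Plain in-memory Array standing in for a disk-backed array (assumption).
z = rand(100, 100, 1000)

idx = [CartesianIndex(3, 5), CartesianIndex(5, 9), CartesianIndex(10, 6)]
a = z[idx, :]                 # 3×1000, rows in the order given in `idx`

mask = falses(100, 100)
mask[idx] .= true
b = z[mask, :]                # 3×1000, rows in column-major order of the mask

# Order in which the mask visits the selected points
mask_order = findall(mask)

# Reorder `a` to the mask's traversal order before comparing
perm = sortperm(LinearIndices(mask)[idx])
@assert a[perm, :] == b
```

So the two approaches select the same data, but rows may need to be permuted if the original point order matters.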
from diskarrays.jl.
Hi @alex-s-gardner, I would really like to dig into this right now, but due to private issues and some project meetings this week I have to postpone this until next week. In case you don't hear anything back, feel free to ping me to remind me of the issue.
from diskarrays.jl.
Digging a bit more, I've come up with the following:
mask = falses(size(z))
mask[[CartesianIndex((1,5)),CartesianIndex((5,9)),CartesianIndex((10,6))],:] .= true
z[mask]
Is this the most efficient approach?
from diskarrays.jl.
In my specific case, indexing with a logical array is extremely inefficient, which leaves me without a practical solution:
mask = falses(size(foo["var"]))
mask[1, 1, :] .= true
foo["var"][mask]
takes 30 seconds to read, whereas:
foo["var"][1,1,:]
takes 0.5 seconds to read.
from diskarrays.jl.
Just a note that the above should have been written as:
z[[CartesianIndex((3,5)), CartesianIndex((5,9)),CartesianIndex((10,6))], :]
from diskarrays.jl.
@meggart I noticed that the dimensions get moved around in unintuitive ways:
dc = Zarr.zopen(path2zarr)
size(dc["v"])
(834, 834, 84396)
r = [1 4 6 1]
c = [7 8 9 3]
cartind = CartesianIndex.(r,c)
size(dc["v"][cartind, :])
(1, 84396, 4)
cartind = CartesianIndex.(r',c')
size(dc["v"][cartind, :])
(4, 84396, 1)
This differs from the behavior of regular (non-DiskArrays) arrays:
z = rand(100,100,1000)
cartind = CartesianIndex.(r,c)
size(z[cartind,:])
(1, 4, 1000)
cartind = CartesianIndex.(r',c')
size(z[cartind,:])
(4, 1, 1000)
from diskarrays.jl.