Comments (5)
I ended up implementing the second, distributed approach, i.e. one pb file per sample. Saving now works for a big gs:
> save_gs(gs_big, tmp)
Done
To reload it, use 'load_gs' function
> list.files(tmp)
[1] "90b6757a-26ab-4158-bfd2-fb4272fd1054.pb" "s1.h5"
[3] "s1.pb" "s10.h5"
[5] "s10.pb" "s100.h5"
[7] "s100.pb" "s11.h5"
...
[195] "s96.pb" "s97.h5"
[197] "s97.pb" "s98.h5"
[199] "s98.pb" "s99.h5"
[201] "s99.pb"
Sub-loading is also more efficient than before:
> system.time(gs1 <- load_gs(tmp, select = c("s1", "s100")))
user system elapsed
2.290 0.068 2.382
> sampleNames(gs1)
[1] "s1" "s100"
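For illustration, the one-file-per-sample layout and select-style sub-loading can be sketched in C++, the language cytolib is written in. This is a minimal sketch, not cytolib's implementation. Each sample is treated as an opaque, already-serialized byte string. The helper names save_per_sample and load_selected are hypothetical. The extra GatingSet-level .pb file and the .h5 files seen in the listing above are not modeled.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: write each sample's serialized bytes to its own
// "<sample>.pb" file under dir (one pb file per sample).
void save_per_sample(const fs::path& dir,
                     const std::map<std::string, std::string>& samples) {
  fs::create_directories(dir);
  for (const auto& [name, bytes] : samples) {
    std::ofstream out(dir / (name + ".pb"), std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    if (!out) throw std::runtime_error("failed to write sample: " + name);
  }
}

// Hypothetical helper: read back only the requested samples. Files for the
// other samples are never opened or parsed.
std::map<std::string, std::string> load_selected(
    const fs::path& dir, const std::vector<std::string>& select) {
  std::map<std::string, std::string> result;
  for (const auto& name : select) {
    std::ifstream in(dir / (name + ".pb"), std::ios::binary);
    if (!in) throw std::runtime_error("missing sample file: " + name);
    result[name].assign(std::istreambuf_iterator<char>(in),
                        std::istreambuf_iterator<char>());
  }
  return result;
}
```

Because load_selected opens only the requested files, its cost scales with the selection rather than with the whole set. Avoiding a parse of one message covering all samples is what makes a select-based reload cheaper.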
This buffer size limitation was introduced by the switch to protobuf-lite (RGLab/RProtoBufLib#6 (comment)), which doesn't support iostream and therefore imposes the size restriction that comes from using a StringOutputStream wrapped around a single string buffer.
I see. Is it worth switching back, then? I kept fairly detailed notes on minimizing the protobuf bundle, so if we want to do that again after moving back to the full library, it should be reasonably quick.
Yeah, switching back to the full version of protobuf would be one quick solution. There are two other alternatives, both of which require changing the existing message format:
- still save to a single pb file, but as multiple string buffer writes to the same file, each preceded by a small int that records that buffer's size (so that they can be reloaded by multiple buffer reads)
- write each gh (i.e. sample) to its own pb file
The second approach is potentially good for concurrent loading as well as for efficient sub-loading through the select argument (e.g. load_gs(path, select = c(1:3))), since it no longer has to load and parse the entire message for all samples.
Either approach could still fail in theory if a single sample reaches the same buffer limit (when the total number of gates is huge and the number of events is large enough). This probably would not happen in practice (though I could be wrong on this, given the nature of the faust application).
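The first alternative (one file, several size-prefixed buffer writes) can be sketched as follows. This is a hedged illustration with assumed details. The size header is taken to be a fixed 4-byte little-endian unsigned integer; protobuf's own delimited format uses a varint prefix instead. The buffers are opaque strings rather than real protobuf messages. The names write_framed and read_framed are hypothetical. Each individual buffer is still bounded by what its header can describe, which mirrors the per-sample limit mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer: every buffer goes into the same file, preceded by its
// size encoded as a 4-byte little-endian unsigned integer (assumed format).
void write_framed(const std::string& path, const std::vector<std::string>& buffers) {
  std::ofstream out(path, std::ios::binary);
  for (const auto& buf : buffers) {
    // A single buffer still cannot exceed what the size header can hold.
    if (buf.size() > std::numeric_limits<std::uint32_t>::max())
      throw std::length_error("buffer too large for a 32-bit size header");
    auto n = static_cast<std::uint32_t>(buf.size());
    char header[4];
    for (int i = 0; i < 4; ++i)
      header[i] = static_cast<char>((n >> (8 * i)) & 0xFF);
    out.write(header, 4);
    out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
  }
  if (!out) throw std::runtime_error("write failed: " + path);
}

// Hypothetical reader: repeatedly read a size header, then exactly that many bytes.
std::vector<std::string> read_framed(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::vector<std::string> buffers;
  unsigned char header[4];
  while (in.read(reinterpret_cast<char*>(header), 4)) {
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
      n |= static_cast<std::uint32_t>(header[i]) << (8 * i);
    std::string buf(n, '\0');
    if (n > 0 && !in.read(&buf[0], n))
      throw std::runtime_error("truncated buffer in " + path);
    buffers.push_back(std::move(buf));
  }
  // A partial header at the end of the file means the file is corrupt.
  if (in.gcount() != 0) throw std::runtime_error("truncated header in " + path);
  return buffers;
}
```

A reader could skip unwanted buffers by seeking past n bytes instead of reading them. It would still have to walk the headers sequentially, which is one reason per-sample files suit concurrent and selective loading better.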
Anyway, in the short run I will do the switch. The discussion above is for future reference.
This is great @mikejiang!