fail to save big gs about cytolib HOT 5 CLOSED

rglab commented on July 25, 2024

fail to save big gs

from cytolib.

Comments (5)

mikejiang commented on July 25, 2024

I ended up implementing the second, distributed approach, i.e. one pb file per sample. Saving now works for a big gs:

> save_gs(gs_big, tmp)
Done
To reload it, use 'load_gs' function

> list.files(tmp)
  [1] "90b6757a-26ab-4158-bfd2-fb4272fd1054.pb" "s1.h5"                                  
  [3] "s1.pb"                                   "s10.h5"                                 
  [5] "s10.pb"                                  "s100.h5"                                
  [7] "s100.pb"                                 "s11.h5"                                 
...
[195] "s96.pb"                                  "s97.h5"                                 
[197] "s97.pb"                                  "s98.h5"                                 
[199] "s98.pb"                                  "s99.h5"                                 
[201] "s99.pb"

Sub-loading is also more efficient than before:

> system.time(gs1 <- load_gs(tmp, select = c("s1", "s100")))
   user  system elapsed 
  2.290   0.068   2.382 
> sampleNames(gs1)
[1] "s1"   "s100"
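
The listing above pairs each sample's .pb file with an .h5 file of the same name. A minimal sketch of checking that pairing from a vector of file names follows; `check_gs_layout` is a hypothetical helper, not part of flowWorkspace or cytolib, and the role of the UUID-named .pb file is not stated in this thread, so the sketch only reports it as unpaired.

```r
# Hypothetical helper: group archive files by sample name and report
# any .pb or .h5 file that lacks a partner with the same base name.
check_gs_layout <- function(files) {
  pb <- sub("\\.pb$", "", grep("\\.pb$", files, value = TRUE))
  h5 <- sub("\\.h5$", "", grep("\\.h5$", files, value = TRUE))
  list(
    samples     = intersect(pb, h5),  # base names that have both files
    unpaired_pb = setdiff(pb, h5),    # .pb files with no matching .h5
    unpaired_h5 = setdiff(h5, pb)     # .h5 files with no matching .pb
  )
}

# File names copied from the first lines of the listing above.
files <- c("90b6757a-26ab-4158-bfd2-fb4272fd1054.pb",
           "s1.h5", "s1.pb", "s10.h5", "s10.pb")
layout <- check_gs_layout(files)
print(layout$samples)
print(layout$unpaired_pb)
```

On a real archive, `files` could come from `list.files(tmp)`.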

mikejiang commented on July 25, 2024

This buffer size limitation was introduced by switching to protobuf-lite (RGLab/RProtoBufLib#6 (comment)), which doesn't support iostream. Serialization therefore goes through a StringOutputStream wrapped over a single string buffer, and that imposes the size restriction.

jacobpwagner commented on July 25, 2024

I see. Is it worth switching back then? I kept pretty detailed notes on minimization of the protobuf bundle, so if we want to do that again after moving back to the full library, it should be reasonably quick.

mikejiang commented on July 25, 2024

Yeah, switching back to the full version of protobuf would be one quick solution. There are two other alternatives, both of which require changing the existing message format:

1. Still save to a single pb file, but with multiple string buffer writes to the same file, each preceded by a small int that records the buffer size (so that they can be reloaded by multiple buffer reads).
2. Write each gh (i.e. sample) to its own pb file.
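
The first alternative is a form of length-prefixed framing. A minimal base-R sketch of the idea follows; it is not the cytolib implementation. The function names `write_framed` and `read_framed` are hypothetical, raw vectors stand in for serialized protobuf buffers, and the 4-byte little-endian size prefix is an assumption chosen for illustration.

```r
# Write each buffer to one file, preceded by its length in bytes.
# Assumption: a 4-byte little-endian integer is used as the size prefix.
write_framed <- function(chunks, path) {
  con <- file(path, "wb")
  on.exit(close(con))
  for (chunk in chunks) {
    writeBin(length(chunk), con, size = 4L, endian = "little")
    writeBin(chunk, con)
  }
  invisible(path)
}

# Read the buffers back one at a time: size prefix first, then the payload.
read_framed <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  chunks <- list()
  repeat {
    n <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
    if (length(n) == 0L) break  # no more size prefixes: end of file
    chunks[[length(chunks) + 1L]] <- readBin(con, what = "raw", n = n)
  }
  chunks
}

# Round trip with two stand-in buffers.
path <- tempfile(fileext = ".pb")
chunks <- list(charToRaw("buffer for s1"), charToRaw("buffer for s2"))
write_framed(chunks, path)
restored <- read_framed(path)
print(identical(restored, chunks))
```

Each buffer stays below the single-buffer limit as long as no individual chunk exceeds it, but a reader still has to walk the file sequentially to reach a given sample.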

The second approach is potentially good for concurrent loading as well as efficient sub-loading through the select argument (i.e. load_gs(path, select = c(1:3))), since it no longer has to load and parse the entire message for all samples.
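
Why one file per sample makes sub-loading cheap can be sketched in base R. This is an illustration under stated assumptions, not how load_gs is implemented: `save_per_sample` and `load_selected` are hypothetical names, and saveRDS/readRDS stand in for writing and parsing the per-sample protobuf message.

```r
# Stand-in for the per-sample layout: one file per sample.
# saveRDS replaces the protobuf serialization for illustration only.
save_per_sample <- function(samples, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  for (name in names(samples)) {
    saveRDS(samples[[name]], file.path(dir, paste0(name, ".rds")))
  }
  invisible(dir)
}

# Load only the requested samples; files of other samples are never opened.
load_selected <- function(dir, select = NULL) {
  available <- sub("\\.rds$", "", list.files(dir, pattern = "\\.rds$"))
  if (is.null(select)) {
    select <- available
  } else if (is.numeric(select)) {
    select <- available[select]  # numeric select indexes the sorted file list
  }
  unknown <- setdiff(select, available)
  if (length(unknown) > 0L) {
    stop("unknown sample(s): ", paste(unknown, collapse = ", "))
  }
  sapply(select,
         function(name) readRDS(file.path(dir, paste0(name, ".rds"))),
         simplify = FALSE)
}

# Save three toy samples, then reload two of them by name.
dir <- tempfile("gs_demo")
save_per_sample(list(s1 = 1:3, s2 = 4:6, s3 = 7:9), dir)
sub_gs <- load_selected(dir, select = c("s1", "s3"))
print(names(sub_gs))
```

Because each sample is an independent file, the reads could also be distributed across workers, which is the concurrent-loading benefit mentioned above.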

Either of the two could still fail in theory if a single sample reaches the same buffer limit (when the total number of gates is huge and the number of events is large enough). This probably would not happen in practice. (Or I could be wrong on this, given the nature of the faust application.)

Anyway, in the short run, I will do the switching. The discussion above is recorded here for future reference.

DillonHammill commented on July 25, 2024

This is great @mikejiang!
