
Comments (7)

benjeffery commented on September 26, 2024

I'm hitting this exact same exception; the encode keeps going, which isn't ideal.

from bio2zarr.

jeromekelleher commented on September 26, 2024

I pushed an update in #80 which should help debug this @shz9. If you run encode with -v it should give you helpful messages about the minimum RAM required per array.

Doing things better will require some refactoring, which we should probably do as part of making the encode job work in parallel over a cluster.


jeromekelleher commented on September 26, 2024

I just hit the same issue and it was due to the worker getting killed by the OOM killer.

I suspect what happened here is that you had just enough memory reserved for 4 workers for all the fields except PL. These fields are huge (each chunk is nearly 1GB), so I'm not surprised the cluster killed it.
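For intuition on why a PL-like field is so large, a back-of-envelope estimate of one chunk's memory footprint looks like this (the chunk dimensions and dtype below are purely illustrative, not bio2zarr's actual defaults):

```python
# Back-of-envelope memory for one chunk of a PL-like field with shape
# (variants, samples, genotypes). Numbers here are illustrative only.
variant_chunk = 10_000   # variants per chunk
sample_chunk = 10_000    # samples per chunk
genotypes = 3            # PL entries per call (diploid, biallelic site)
itemsize = 4             # bytes per value for int32

chunk_bytes = variant_chunk * sample_chunk * genotypes * itemsize
print(f"{chunk_bytes / 1024**3:.2f} GiB per chunk")  # ~1.12 GiB
```

Multiply that by the number of concurrent workers and it is easy to blow past a typical cluster memory reservation.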

This is not obvious, so we should potentially intercept the BrokenProcessPool exception in the main process and add an informative message like "you probably ran out of memory".
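The interception could look something like this sketch. The worker and wrapper names are hypothetical, and SIGKILL stands in for the OOM killer; the "fork" context just keeps the example self-contained on Linux:

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def fragile_worker():
    # Stand-in for an encode task: simulate the OOM killer's SIGKILL.
    os.kill(os.getpid(), signal.SIGKILL)


def run_with_oom_hint(fn, *args):
    # Hypothetical wrapper, not bio2zarr's actual worker setup.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        try:
            return pool.submit(fn, *args).result()
        except BrokenProcessPool as exc:
            # Translate the opaque pool failure into an actionable hint.
            raise RuntimeError(
                "encode worker died unexpectedly: you probably ran out of "
                "memory (try fewer workers, or drop large fields such as PL)"
            ) from exc
```

The key point is that the main process sees only BrokenProcessPool when a child is killed, so any OOM diagnosis has to be a guess attached to that exception.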

I think the simplest thing for now is to just remove PL from your experiments. Edit the schema JSON and delete the PL field, and it should all work fine.
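As a sketch, deleting a field from the schema JSON could be scripted like this, assuming the schema keeps its fields in a top-level "fields" list with per-field "name" entries (check the generated schema, as the real layout may differ):

```python
import json


def drop_schema_field(path, field_name):
    # Assumed layout: {"fields": [{"name": ...}, ...]} -- adjust the key
    # names to match the schema file bio2zarr actually generated.
    with open(path) as f:
        schema = json.load(f)
    before = len(schema["fields"])
    schema["fields"] = [
        fld for fld in schema["fields"] if fld["name"] != field_name
    ]
    with open(path, "w") as f:
        json.dump(schema, f, indent=2)
    return before - len(schema["fields"])  # how many fields were removed
```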

Also a general question related to this: Do you think it's possible to pick up the encoding work from where it left off if things like this happen instead of starting over?

I think that's closely related to how we're going to split this up into manageable bits for cluster scheduling. See #71 and #77 for discussions on how we're doing this for explode (and I think some high-level discussion about encode too).


jeromekelleher commented on September 26, 2024

Good to know, I think I know how to fix this.


shz9 commented on September 26, 2024

In addition to the informative exception, do you think it'd be possible to let the user set a --max-memory flag, from which we could determine memory-friendly chunk sizes for the encoding stage? We could re-chunk afterwards for optimal compression if needed. Alternatively, if we don't want to change the chunk sizes, we could automatically reduce the number of workers for arrays that may have large chunks?

If it's of interest, I have this function that determines chunking patterns based on the number of cores / data type:
https://github.com/shz9/magenpy/blob/579504c7cd8a61808ab8b880e1627ef3ffe5fc8d/magenpy/stats/ld/utils.py#L547

def optimize_chunks_for_memory(chunked_array, cpus=None, max_mem=None):
    """
    Rechunk `chunked_array` so that one chunk per CPU fits within `max_mem`.
    `max_mem` is given in GiB; both arguments default to what psutil
    reports for the current machine.
    Modified from: Sergio Hleap
    """

    import psutil
    import dask.array as da

    if cpus is None:
        cpus = psutil.cpu_count()

    if max_mem is None:
        # Available (not total) memory, converted from bytes to GiB.
        max_mem = psutil.virtual_memory().available / (1024.0 ** 3)

    # Give each worker an equal share of the memory budget.
    chunk_mem = max_mem / cpus
    chunks = da.core.normalize_chunks(f"{chunk_mem}GiB", shape=chunked_array.shape, dtype=chunked_array.dtype)

    # dask arrays use .rechunk(); .chunk() is the xarray spelling.
    return chunked_array.rechunk(chunks)


jeromekelleher commented on September 26, 2024

Ooh, max-memory is a great idea! We could associate a memory value with each future (say 3 times the number of bytes in one chunk of the array) and then stop submitting when the total for the outstanding futures exceeds this. I expect this would work quite well, especially if we try to mix the big chunks in with smaller ones.

We should follow this up in a separate issue.
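A minimal sketch of that budgeted submission loop (illustrative names, not bio2zarr's actual scheduler; a thread pool is used here only for brevity):

```python
from concurrent.futures import FIRST_COMPLETED, wait


def submit_within_budget(executor, jobs, max_memory):
    """Submit (fn, args, cost_bytes) jobs, keeping the summed cost of
    outstanding futures at or below max_memory."""
    outstanding = {}  # future -> estimated memory cost in bytes
    results = []
    for fn, args, cost in jobs:
        # Block until enough in-flight work finishes to free the budget.
        # The first job always submits, so an oversized job can't deadlock.
        while outstanding and sum(outstanding.values()) + cost > max_memory:
            done, _ = wait(outstanding, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())
                outstanding.pop(fut)
        outstanding[executor.submit(fn, *args)] = cost
    # Drain whatever is still in flight.
    for fut in list(outstanding):
        results.append(fut.result())
    return results
```

With per-future costs derived from chunk sizes (e.g. 3x one chunk's bytes), this naturally lets many small-field futures run alongside a few PL-sized ones.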


jeromekelleher commented on September 26, 2024

Closing this as we've added the max-memory argument as well.

