The current Python implementation of the HeapManager behaviour under memory pressure is a clear bottleneck of the system.
Although this only affects applications that are memory-bound (which, arguably, are not many right now), it may become relevant at some point.
The current strategy is to look at the system memory usage and react accordingly by serializing objects one by one, checking after each one whether the system is still under memory pressure.
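The sequential strategy can be sketched roughly as follows (the function name, the 0.75 threshold, and the callbacks are hypothetical; `usage` and `store` stand in for the real system-memory probe and the serialization/persistence code):

```python
from typing import Callable

PRESSURE_THRESHOLD = 0.75  # hypothetical fraction of memory in use

def evict_until_eased(heap: list,
                      usage: Callable[[], float],
                      store: Callable[[object], None]) -> int:
    """Serialize and evict objects one by one, re-checking the
    memory usage after each store (the current sequential loop)."""
    evicted = 0
    while heap and usage() > PRESSURE_THRESHOLD:
        obj = heap.pop()
        store(obj)  # serialize + persist: I/O-bound, done serially
        evicted += 1
    return evicted
```

The re-check after every single store is exactly what makes this a bottleneck: each iteration pays the full serialization and I/O latency before the next object is even considered.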
- We may want to use Python interpreter memory usage instead of the system memory usage.
This is especially relevant in deployments with several Execution Environments: if there is an imbalance between them, it doesn't make sense for all of them to start evicting objects at the same time.
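A per-interpreter check could be sketched with the stdlib `tracemalloc` module, which only tracks allocations made by this Python process (the budget value and function name are hypothetical, and `tracemalloc.start()` must have been called beforehand):

```python
import tracemalloc

PER_EE_BUDGET_BYTES = 512 * 1024 * 1024  # hypothetical per-EE budget

def ee_under_pressure() -> bool:
    # Compare *this* interpreter's allocations against its own budget,
    # so only the overloaded Execution Environment starts evicting,
    # instead of every EE reacting to system-wide memory usage.
    current_bytes, _peak = tracemalloc.get_traced_memory()
    return current_bytes > PER_EE_BUDGET_BYTES
```

Note that `tracemalloc` adds tracing overhead; a cheaper (but coarser, POSIX-only) alternative would be `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss`.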
- We may improve performance by storing several objects in parallel.
When there is memory pressure, we will probably need to serialize (in order to evict from memory) a bunch of objects, so parallelizing that is an obvious HPC strategy, given that there are a lot of I/O overheads and locking calls.
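Since each store is dominated by I/O, a thread pool is a natural first sketch (function names are hypothetical; the GIL is not a problem here because the workers spend most of their time in I/O waits):

```python
from concurrent.futures import ThreadPoolExecutor

def flush_parallel(objects, store_one, max_workers: int = 8) -> None:
    # Serialize and persist several objects concurrently; each
    # store_one() call is I/O-bound, so threads overlap the waits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator, forcing completion and
        # re-raising any exception from a worker.
        list(pool.map(store_one, objects))
```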
- We should improve the stop-I-don't-have-enough-memory behaviour.
When memory is full, some applications may generate new data faster than the eviction rate. This can result in OOM errors (it has happened to me). I did some tests adding blocking on the server side, but a proper gRPC "not enough resources" error, such as a RESOURCE_EXHAUSTED status code, may be cleaner and more scalable.
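A minimal sketch of such an admission check (the threshold, the function name, and the use of `MemoryError` as a stand-in are all hypothetical); in a real gRPC servicer the `raise` would instead be `context.abort(grpc.StatusCode.RESOURCE_EXHAUSTED, ...)`, so the client receives a retryable status rather than the server going OOM:

```python
def admit_store(used_fraction: float,
                pressure_threshold: float = 0.85) -> None:
    """Reject new stores while under memory pressure.

    In the gRPC handler this would become:
        context.abort(grpc.StatusCode.RESOURCE_EXHAUSTED,
                      "evicting objects, retry later")
    letting well-behaved clients back off and retry.
    """
    if used_fraction >= pressure_threshold:
        raise MemoryError("store rejected: under memory pressure")
```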
- We may want to add hysteresis to the memory pressure threshold
I have a dirty workaround which consists in using a MEMORY_EASE threshold: when the eviction of objects starts, instead of comparing against the memory pressure threshold, it compares against MEMORY_EASE. This ensures that there is more margin. Also, if we implement the RESOURCE_EXHAUSTED approach (see above), we may use that status code for the memory pressure threshold but keep evicting until the memory ease threshold is reached.
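The two-threshold hysteresis could look like this (the class name and threshold values are hypothetical; `MEMORY_EASE` is the name from the workaround above):

```python
class PressureGovernor:
    # Hysteresis: start evicting above MEMORY_PRESSURE, and keep
    # evicting until usage drops below MEMORY_EASE, so eviction
    # doesn't flap on and off right at the pressure threshold.
    MEMORY_PRESSURE = 0.85  # hypothetical
    MEMORY_EASE = 0.70      # hypothetical

    def __init__(self) -> None:
        self.evicting = False

    def update(self, used_fraction: float) -> bool:
        if used_fraction >= self.MEMORY_PRESSURE:
            # This is also where a RESOURCE_EXHAUSTED reply could
            # be triggered for incoming stores (see above).
            self.evicting = True
        elif used_fraction < self.MEMORY_EASE:
            self.evicting = False
        return self.evicting
```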