Comments (11)
The current API looks like this:

    model.RunModel(100)
    model.OptimizeMemoryAndReset()
    model.data = bigger_data
    model.RunModel(100)

This issue proposes changing it to:

    model.RunModel(100)
    model.GenerateActions()
    model.data = bigger_data
    model.ExecuteActions()
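The proposed split can be sketched with a toy record-and-replay cache. Everything below is illustrative, not heavylight's implementation; only the GenerateActions/ExecuteActions naming comes from this issue, and `pols_if` is a made-up recursive projection. The first pass records the access order, each key's last access becomes its eviction point, and the replay pass drops values as soon as they are dead.

```python
class ReplayCache:
    """Toy record-and-replay cache; illustrative, not heavylight's code."""

    def __init__(self, fn):
        self.fn = fn            # fn(cache, t) -> value for timestep t
        self.cache = {}
        self.trace = []         # access order recorded on the first pass
        self.last_use = None    # key -> index of its final access (replay)
        self.clock = 0          # access counter, mirrors trace positions
        self.peak = 0           # max live cache entries seen on replay

    def __call__(self, t):
        if self.last_use is None:
            self.trace.append(t)        # "GenerateActions": record access
        i = self.clock
        self.clock += 1
        if t not in self.cache:
            self.cache[t] = self.fn(self, t)
        value = self.cache[t]
        if self.last_use is not None:   # "ExecuteActions": evict dead keys
            self.peak = max(self.peak, len(self.cache))
            if self.last_use[t] == i:
                del self.cache[t]
        return value

    def generate_actions(self):
        """Turn the recorded trace into an eviction plan and reset."""
        self.last_use = {t: i for i, t in enumerate(self.trace)}
        self.cache.clear()
        self.clock = 0

# a recursive projection: each step only reads its predecessor
def pols_if(cache, t):
    return 1000.0 if t == 0 else cache(t - 1) * 0.9

model = ReplayCache(pols_if)
first = [model(t) for t in range(50)]   # recording run: 50 values live
model.generate_actions()
second = [model(t) for t in range(50)]  # replay run: evicts as it goes
```

Because the replay issues exactly the same access sequence as the recording run, results are identical, but at most one cached value is live at a time instead of fifty.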
Is the status quo fine?
Yes. There will be cache misses if you mess up, but they will not affect the actual results. It is also reasonably easy to tell when you have messed up: check the cache and the cache misses and it will be clear.
How beneficial is the proposed change?
I think it is nice to have, but it doesn't let the user do anything they couldn't do otherwise.
Why not fix it?
It is nice but not necessary.
from heavylight.
My view (and I've been pretty deliberate about this in the basic heavylight library) is that clearing caches is dangerous and hard to get right, so I have simply not allowed it. Instead, if I need a lot of modelpoints run, I plan to run in batches (spread over CPUs using ray or similar), so for me the problem is more one of batch optimisation.
If we did want to clear the cache: since every model is generally expressed in terms of t and t - 1 (with t = 0 being the initial data), we could just flush t - 2 values, excluding those we want to keep for storage (i.e. those with the @store or @agg decorators)?
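A sketch of that t - 2 flush, under the stated assumption that the model only ever looks back one step. The `stored` set stands in for whatever an @store/@agg decorator would register; none of this is heavylight's actual code.

```python
def run_with_flush(step, n, stored=frozenset()):
    """step(cache, t) computes one projection step from cache[t - 1]."""
    cache, kept = {}, {}
    for t in range(n):
        cache[t] = step(cache, t)
        old = t - 2
        if old in cache:
            if old in stored:
                kept[old] = cache[old]   # values marked for storage survive
            del cache[old]               # everything else is flushed
    return cache, kept

# survivorship example: only t = 0 and t = 5 are kept long-term
step = lambda cache, t: 1.0 if t == 0 else cache[t - 1] * 0.99
live, kept = run_with_flush(step, 10, stored={0, 5})
```

At the end, `live` holds only the last two timesteps and `kept` holds just the explicitly stored values.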
(thanks for re-opening!) I think a lot of model frameworks get tied up trying to project and discount in the same model (e.g. func(t) depends on t+1) - if we don't discount until the cashflow is fully projected, then it makes this kind of issue much simpler to resolve?
Reopen for discussion.
1. Clearing the cache is a great idea
I don't think clearing the cache is dangerous, because you clear the whole cache before runs and everything comes out right. Clearing half the cache and then changing model.data is a bad idea; clearing the whole cache always works.
2. Batches
I've thought about running in batches, but clearing the cache during model execution will be much more performant, because the batches can be something like 1000x bigger. Initial experiments show a 1000x reduction in memory consumption for BasicTerm_ME. Using the big arrays will also make a big difference on GPU: some preliminary tests showed no speedup for 10k modelpoints but good speedups for 100k modelpoints.
With the optimizations I can run 10 million modelpoints on my Mac with no memory pressure; the cache size is 0.4 GB. That is preferable to setting up a computing cluster that can handle 400 GB at once, or to running ~30 batches without the optimization.
3. Clearing caches based on t-2
The problem with flushing t - 2 values is that it isn't really a general strategy. When the user provides a func(t: int, timing: str), it becomes hard to clear. It will also be no fun to implement (I have implemented this before), so I would rather do it in a way that is general.
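To see why, consider a hypothetical cache keyed by (t, timing) tuples; the timing labels here are invented. A blanket "flush everything at t - 2" now has to pattern-match on key structure, and the right rule may differ per function.

```python
# hypothetical cache entries for a two-argument cached function
cache = {
    (3, "start"): 100.0,
    (3, "end"): 97.0,
    (4, "start"): 97.0,
    (4, "end"): 94.1,
}
t = 5
# the flush must now inspect each key to find its timestep component
dead = [k for k in cache if k[0] <= t - 2]
for k in dead:
    del cache[k]
```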
4. Reopening the ticket
If having functions clear the caches of other functions after execution, based on the internal state of model.cache_graph.can_clear, is very icky (it is) and you want something better, we can migrate to an approach that has no impact on the caching of functions when they are executed normally.
I will probably reset the cache before RunModel or ExecuteActions so that the results always reflect the model state at the time of the call.
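A minimal illustration of why that reset matters, using a hypothetical model rather than heavylight's API: without clearing, results computed from the old data survive a data swap.

```python
class Model:
    """Hypothetical toy model with a per-run cache."""

    def __init__(self, data):
        self.data = data
        self.cache = {}

    def run(self, n, reset=True):
        if reset:
            self.cache.clear()   # results now reflect current model state
        for t in range(n):
            if t not in self.cache:
                self.cache[t] = self.data * (t + 1)
        return [self.cache[t] for t in range(n)]

m = Model(data=1)
m.run(3)                        # caches results computed from data=1
m.data = 10
stale = m.run(3, reset=False)   # old cached values leak through
fresh = m.run(3)                # reset: reflects the new data
```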
> (thanks for re-opening!) I think a lot of model frameworks get tied up trying to project and discount in the same model (e.g. func(t) depends on t+1) - if we don't discount until the cashflow is fully projected then it makes this kind of issue much simpler to resolve?

Can you elaborate on discounting and how it relates to the size of the cache?
I've done a very crude cache_clear in this branch/code, just to show the rough principle (subsequent cache misses will look horrific rather than regenerating the cache; deleting keys rather than setting them to None would fix this): https://github.com/lewisfogden/heavylight/blob/dev_mem_opt/src/heavylight/examples/protection/run_model_np.py
Discounting: it probably doesn't matter given the way we expect people to model. It's more of an issue if you were to write a model that refers to t+1 as well as t-1; then you can't predict cache misses without building a full model dependency graph, etc.
On 1: I fully agree that clearing the whole cache is good and necessary (using class instances does this automatically and reduces the risk).
From reading generate_actions etc., it looks like this does a single (or small) run to evaluate the run order, then follows that order for the complete run, dropping each value from the cache once it has been read for the last time? This is almost like the reference counting CPython uses to track object use and dereferencing.
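That reference-counting reading can be made concrete. This is a sketch assuming a dry run has produced per-key read counts; the counts, `get`, and `step` below are all invented for illustration and are not heavylight's code.

```python
from collections import Counter

# hypothetical read counts from a dry run: each step is read once by the
# caller and once by its successor, except the final step
reads = Counter({0: 2, 1: 2, 2: 2, 3: 1})
cache = {}

def get(t, compute):
    if t not in cache:
        cache[t] = compute(t)
    value = cache[t]
    reads[t] -= 1
    if reads[t] == 0:
        del cache[t]     # last reader has run: safe to evict
    return value

def step(t):
    return 1.0 if t == 0 else get(t - 1, step) * 0.5

results = [get(t, step) for t in range(4)]
# every value is evicted at its final read, so the cache empties itself
```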
I'm going to have a good play with your optimising code, I've only briefly skimmed it and run a few examples :)
Yeah, people have to write the code a certain way to optimize the cache anyhow: you can't sum(pols_if(t) for t in range), because then nothing will be evicted.
The implementation you provided is pretty sensible, but it isn't super precise about the condition for evicting from the cache, and it might consume something like 2-10x as much memory.
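The sum(...) point in code form (a toy projection, not heavylight's code): aggregating only at the end makes the end of the run the last use of every value, so nothing can be evicted early, whereas accumulating inside the loop keeps a single value live at a time.

```python
def project(n):
    cache = {}
    total = 0.0
    peak = 0
    for t in range(n):
        cache[t] = 1000.0 if t == 0 else cache[t - 1] * 0.9
        total += cache[t]     # aggregate now, while the value is hot
        if t - 1 in cache:
            del cache[t - 1]  # the predecessor's last reader has run
        peak = max(peak, len(cache))
    return total, peak

total, peak = project(100)
# a trailing sum(cache[t] for t in range(100)) would instead have forced
# all 100 values to stay live until the end of the run
```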
So, what I made this ticket for is this: instead of making the user call the functions in exactly the same order to avoid unintentional cache misses (if they call RunModel twice with the same projection length this will happen anyway, so no worries?), what if we saved the run order they used and then replayed it?
That is the gist of my intention here.
Yeah, it makes sense, particularly for vectorised calculations where there aren't any branches (due to the use of np.where for conditionals rather than if:elif:else) and the order is completely deterministic.
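A small np.where illustration (the lapse logic is made up, not from heavylight): the conditional is evaluated uniformly across the whole array, so the code path, and hence the recorded call order, is identical for every modelpoint and every run.

```python
import numpy as np

def pols_if(prev, age):
    # vectorised branch: every element takes the same code path,
    # unlike a per-policy if/elif/else
    lapse = np.where(age < 60, 0.02, 0.10)
    return prev * (1.0 - lapse)

age = np.array([30, 45, 70])
prev = np.ones(3)
out = pols_if(prev, age)   # one deterministic call, all three policies
```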
Yep, that is a caveat.
I'm going to prioritize this so that the implementation is as clean as it can be.