Comments (11)
The current API looks like this:

    model.RunModel(100)
    model.OptimizeMemoryAndReset()
    model.data = bigger_data
    model.RunModel(100)

This issue proposes changing it to:

    model.RunModel(100)
    model.GenerateActions()
    model.data = bigger_data
    model.ExecuteActions()
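The proposed split can be sketched with a toy record-and-replay cache. Everything below is illustrative, not heavylight's implementation; only the GenerateActions/ExecuteActions naming comes from this issue, and `pols_if` is a made-up recursive projection. The first pass records the access order, each key's last access becomes its eviction point, and the replay pass drops values as soon as they are dead.

```python
class ReplayCache:
    """Toy record-and-replay cache; illustrative, not heavylight's code."""

    def __init__(self, fn):
        self.fn = fn            # fn(cache, t) -> value for timestep t
        self.cache = {}
        self.trace = []         # access order recorded on the first pass
        self.last_use = None    # key -> index of its final access (replay)
        self.clock = 0          # access counter, mirrors trace positions
        self.peak = 0           # max live cache entries seen on replay

    def __call__(self, t):
        if self.last_use is None:
            self.trace.append(t)        # "GenerateActions": record access
        i = self.clock
        self.clock += 1
        if t not in self.cache:
            self.cache[t] = self.fn(self, t)
        value = self.cache[t]
        if self.last_use is not None:   # "ExecuteActions": evict dead keys
            self.peak = max(self.peak, len(self.cache))
            if self.last_use[t] == i:
                del self.cache[t]
        return value

    def generate_actions(self):
        """Turn the recorded trace into an eviction plan and reset."""
        self.last_use = {t: i for i, t in enumerate(self.trace)}
        self.cache.clear()
        self.clock = 0

# a recursive projection: each step only reads its predecessor
def pols_if(cache, t):
    return 1000.0 if t == 0 else cache(t - 1) * 0.9

model = ReplayCache(pols_if)
first = [model(t) for t in range(50)]   # recording run: 50 values live
model.generate_actions()
second = [model(t) for t in range(50)]  # replay run: evicts as it goes
```

Because the replay issues exactly the same access sequence as the recording run, results are identical, but at most one cached value is live at a time instead of fifty.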
Is the status quo fine?
Yes. There will be cache misses if you mess up, but they will not affect the actual results. It is also reasonably easy to tell when you have messed up: check the cache and the cache misses and it will be clear.
How beneficial is the proposed change?
I think it is nice to have, but it doesn't let the user do anything they couldn't do otherwise.
Why not fix it?
It is nice but not necessary.
from heavylight.
My view (and I've been pretty deliberate about this in the basic heavylight library) is that clearing caches is dangerous and hard to get right, so I have simply not allowed it. Instead, if I need a lot of modelpoints run, I plan to run in batches (spread over CPUs using ray or similar), so for me the problem is more one of batch optimisation.
If we did want to clear the cache: since every model is generally expressed in terms of t and t - 1 (with t = 0 being the initial data), we could just flush t - 2 values, excluding those we want to keep for storage (i.e. those with the @store or @agg decorators)?
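A sketch of that t - 2 flush, under the stated assumption that the model only ever looks back one step. The `stored` set stands in for whatever an @store/@agg decorator would register; none of this is heavylight's actual code.

```python
def run_with_flush(step, n, stored=frozenset()):
    """step(cache, t) computes one projection step from cache[t - 1]."""
    cache, kept = {}, {}
    for t in range(n):
        cache[t] = step(cache, t)
        old = t - 2
        if old in cache:
            if old in stored:
                kept[old] = cache[old]   # values marked for storage survive
            del cache[old]               # everything else is flushed
    return cache, kept

# survivorship example: only t = 0 and t = 5 are kept long-term
step = lambda cache, t: 1.0 if t == 0 else cache[t - 1] * 0.99
live, kept = run_with_flush(step, 10, stored={0, 5})
```

At the end, `live` holds only the last two timesteps and `kept` holds just the explicitly stored values.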
(thanks for re-opening!) I think a lot of model frameworks get tied up trying to project and discount in the same model (e.g. func(t) depends on t+1) - if we don't discount until the cashflow is fully projected, then it makes this kind of issue much simpler to resolve?
Reopen for discussion.
1. Clearing the cache is a great idea
I don't think clearing the cache is dangerous, because you clear the whole cache before runs and everything comes out right. Clearing half the cache and then changing model.data is a bad idea; clearing the whole cache always works.
2. Batches
I've thought about running in batches, but clearing the cache during model execution will be much more performant, because the batches can be something like 1000x bigger. Initial experiments show a 1000x reduction in memory consumption for BasicTerm_ME. Using the big arrays will also make a big difference on GPU: some preliminary tests showed no speedup for 10k modelpoints but good speedups for 100k modelpoints.
With the optimizations I can run 10 million modelpoints on my Mac with no memory pressure; the cache size is 0.4 GB. That is preferable to setting up a computing cluster that can handle 400 GB at once, or to running ~30 batches without the optimization.
3. Clearing caches based on t-2
The problem with flushing t - 2 values is that it isn't really a general strategy. When the user provides a func(t: int, timing: str), it becomes hard to clear. It will also be no fun to implement (I have implemented this before), so I would rather do it in a way that is general.
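To see why, consider a hypothetical cache keyed by (t, timing) tuples; the timing labels here are invented. A blanket "flush everything at t - 2" now has to pattern-match on key structure, and the right rule may differ per function.

```python
# hypothetical cache entries for a two-argument cached function
cache = {
    (3, "start"): 100.0,
    (3, "end"): 97.0,
    (4, "start"): 97.0,
    (4, "end"): 94.1,
}
t = 5
# the flush must now inspect each key to find its timestep component
dead = [k for k in cache if k[0] <= t - 2]
for k in dead:
    del cache[k]
```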
4. Reopening the ticket
If having functions clear the caches of other functions after execution, based on the internal state of model.cache_graph.can_clear, is very icky (it is) and you want something better, we can migrate to an approach that has no impact on the caching of functions when they are executed normally.
I will probably reset the cache before RunModel or ExecuteActions so that the results always reflect the model state at the time of the call.
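A minimal illustration of why that reset matters, using a hypothetical model rather than heavylight's API: without clearing, results computed from the old data survive a data swap.

```python
class Model:
    """Hypothetical toy model with a per-run cache."""

    def __init__(self, data):
        self.data = data
        self.cache = {}

    def run(self, n, reset=True):
        if reset:
            self.cache.clear()   # results now reflect current model state
        for t in range(n):
            if t not in self.cache:
                self.cache[t] = self.data * (t + 1)
        return [self.cache[t] for t in range(n)]

m = Model(data=1)
m.run(3)                        # caches results computed from data=1
m.data = 10
stale = m.run(3, reset=False)   # old cached values leak through
fresh = m.run(3)                # reset: reflects the new data
```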
> (thanks for re-opening!) I think a lot of model frameworks get tied up trying to project and discount in the same model (e.g. func(t) depends on t+1) - if we don't discount until the cashflow is fully projected then it makes this kind of issue much simpler to resolve?

Can you elaborate on discounting and how it relates to the size of the cache?
I've done a very crude cache_clear in this branch/code, just to show the rough principle (subsequent cache misses will look horrific rather than regenerating the cache; deleting keys rather than setting them to None would fix this): https://github.com/lewisfogden/heavylight/blob/dev_mem_opt/src/heavylight/examples/protection/run_model_np.py
Discounting: it probably doesn't matter given the way we expect people to model. It's more of an issue if you were to write a model that refers to t+1 as well as t-1; then you can't predict cache misses without building a full model dependency graph, etc.
On 1: I fully agree that clearing the whole cache is good and necessary (using class instances does this automatically and reduces the risk).
From reading generate_actions etc., it looks like this does a single (or small) run to evaluate the run order, then follows that order for the complete run, dropping each value from the cache once it has been read for the last time? This is almost like the reference counting CPython uses to track object use and dereferencing.
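That reference-counting reading can be made concrete. This is a sketch assuming a dry run has produced per-key read counts; the counts, `get`, and `step` below are all invented for illustration and are not heavylight's code.

```python
from collections import Counter

# hypothetical read counts from a dry run: each step is read once by the
# caller and once by its successor, except the final step
reads = Counter({0: 2, 1: 2, 2: 2, 3: 1})
cache = {}

def get(t, compute):
    if t not in cache:
        cache[t] = compute(t)
    value = cache[t]
    reads[t] -= 1
    if reads[t] == 0:
        del cache[t]     # last reader has run: safe to evict
    return value

def step(t):
    return 1.0 if t == 0 else get(t - 1, step) * 0.5

results = [get(t, step) for t in range(4)]
# every value is evicted at its final read, so the cache empties itself
```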
I'm going to have a good play with your optimising code, I've only briefly skimmed it and run a few examples :)
Yeah, people have to write the code a certain way to optimize the cache anyhow: you can't sum(pols_if(t) for t in range), because then nothing will be evicted.
The implementation you provided is pretty sensible, but it isn't super precise about the condition for evicting from the cache, and it might consume something like 2-10x as much memory.
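The sum(...) point in code form (a toy projection, not heavylight's code): aggregating only at the end makes the end of the run the last use of every value, so nothing can be evicted early, whereas accumulating inside the loop keeps a single value live at a time.

```python
def project(n):
    cache = {}
    total = 0.0
    peak = 0
    for t in range(n):
        cache[t] = 1000.0 if t == 0 else cache[t - 1] * 0.9
        total += cache[t]     # aggregate now, while the value is hot
        if t - 1 in cache:
            del cache[t - 1]  # the predecessor's last reader has run
        peak = max(peak, len(cache))
    return total, peak

total, peak = project(100)
# a trailing sum(cache[t] for t in range(100)) would instead have forced
# all 100 values to stay live until the end of the run
```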
So, what I made this ticket for is this: instead of making the user call the functions in exactly the same order to avoid unintentional cache misses (if they call RunModel twice with the same projection length this will happen anyway, so no worries?), what if we saved the run order they used and then replayed it?
That is the gist of my intention here.
Yeah, it makes sense, particularly for vectorised calculations where there aren't any branches (due to the use of np.where for conditionals rather than if:elif:else) and the order is completely deterministic.
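A small np.where illustration (the lapse logic is made up, not from heavylight): the conditional is evaluated uniformly across the whole array, so the code path, and hence the recorded call order, is identical for every modelpoint and every run.

```python
import numpy as np

def pols_if(prev, age):
    # vectorised branch: every element takes the same code path,
    # unlike a per-policy if/elif/else
    lapse = np.where(age < 60, 0.02, 0.10)
    return prev * (1.0 - lapse)

age = np.array([30, 45, 70])
prev = np.ones(3)
out = pols_if(prev, age)   # one deterministic call, all three policies
```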
Yep, that is a caveat.
I'm going to prioritize this so that the implementation is as clean as it can be.