Comments (10)
So, I ran the test in memray to get better measurements, and in summary I don't see strong evidence of leaks. What I think we're seeing is that memory consumption has two main drivers:
- Short term: the caches, mainly the conjecture data cache, at around 30 kB per example. Bounded, mainly by `CACHE_SIZE` in `engine.py`.
- Long term: `DataTree`, which tracks previously tried data to avoid repeats, at around 1 kB per example. Unbounded.
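Taking the estimates above at face value, a quick back-of-envelope check (hypothetical arithmetic, using the per-example sizes above and the ~70k-example run discussed below):

```python
# Back-of-envelope memory estimate from the per-example sizes above.
CACHE_SIZE = 10_000              # conjecture data cache bound in engine.py
examples = 69_508                # examples in the ~70k stateful run below

cache_mb = CACHE_SIZE * 30 / 1024     # ~30 kB per cached example, bounded
datatree_mb = examples * 1 / 1024     # ~1 kB per example, unbounded

print(f"cache ~= {cache_mb:.0f} MB, DataTree ~= {datatree_mb:.0f} MB")
# Roughly 293 MB + 68 MB, in the same ballpark as the ~440 MB observed.
```

If the numbers were wildly off from observed RSS, that would point at a genuine leak rather than plus-size caches.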
As evidence (running the stateful test above, about 70k examples):
Possible actions:
- Check the sizes above against intuition; if they are much bigger than expected, it may be worth investigating further.
- The `DeadlineExceeded`, which I guess is caused by GC, is an annoying flake. Should we perhaps force periodic `gc.collect()` outside of the tests, or add GC callbacks to account for the time taken?
- Possibly tune down `CACHE_SIZE`, or somehow be more selective in what we cache or retain.

@Zac-HD, @tybug, does this sound reasonable?
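The periodic-collection idea could look something like this (a minimal sketch, not Hypothesis code; the class name and interval are made up):

```python
import gc


class PeriodicGC:
    """Force a full collection every `interval` test-case invocations,
    so GC pauses tend to land between examples rather than inside one."""

    def __init__(self, interval: int = 1000):
        self.interval = interval
        self.calls = 0

    def tick(self) -> None:
        self.calls += 1
        if self.calls % self.interval == 0:
            gc.collect()
```

Whether this actually prevents `DeadlineExceeded` flakes depends on the allocation rate and GC thresholds; an automatic collection can still trigger mid-example.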
---
Good idea to compare with pre-ir! `v6.98.0` below; at first I left it running over a late breakfast and it didn't finish... I guess the exhaustion tracking has improved! Anyway, with the number of examples set equal:

You'll notice immediately that we use a lot less memory in the post-ir world: the (probable) `DataTree` contribution is halved or more. Yay!

Less obviously, we produce more reference cycles, so the GC works harder. It's worth keeping an eye on this during development/migration, since data structures are harder than code to "fix" afterwards.
---
I think this is probably the conjecture data cache in `internal/conjecture/engine.py`. It is bounded but large, at `CACHE_SIZE = 10000` examples.

There is a memory/performance trade-off here, and the benefits of this cache are mainly in the shrinking phase.
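For intuition, a bounded cache of this shape can be sketched as a simple LRU (illustrative only; this is not Hypothesis's actual cache implementation):

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `maxsize` entries, evicting the least recently used."""

    def __init__(self, maxsize: int = 10_000):
        self.maxsize = maxsize
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        value = self._data.pop(key)   # raises KeyError if absent
        self._data[key] = value       # re-insert as most recently used
        return value

    def put(self, key, value) -> None:
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the oldest entry
```

Lowering the bound trades memory for re-execution cost, which mostly bites during shrinking.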
---
FWIW: changing to `CACHE_SIZE = 1000` looks like it caps the `stateful_minimal` test to under 50 MB. But there are other large-ish caches, so to identify leaks, as opposed to plus-size caches, we probably need to run 5-10k examples just to get to steady state.
---
I wanted to see when memory consumption flattens out, so I've modified stateful_minimal.py.gz with `max_examples=5000000` and ran it with:

    $ pytest --hypothesis-show-statistics --hypothesis-explain stateful_minimal.py
Occasionally this fails with:
E hypothesis.errors.Flaky: Inconsistent results from replaying a test case!
E last: VALID from None
E this: INTERESTING from DeadlineExceeded at /usr/lib/python3.12/site-packages/hypothesis/core.py:846
Sometimes it does finish, after about 5 minutes:
stateful_minimal.py::stateful_minimal::runTest:
- during reuse phase (0.00 seconds):
- Typical runtimes: ~ 1ms, of which ~ 1ms in data generation
- 1 passing examples, 0 failing examples, 0 invalid examples
- during generate phase (313.17 seconds):
- Typical runtimes: ~ 0-3 ms, of which ~ 0-3 ms in data generation
- 69508 passing examples, 0 failing examples, 0 invalid examples
- Stopped because nothing left to do
At this point it ate 440 MB of memory.
Here's memory consumption over time, with 10 second sampling interval:
RssAnon: 24320 kB
RssAnon: 93056 kB
RssAnon: 162176 kB
RssAnon: 229632 kB
RssAnon: 291840 kB
RssAnon: 337616 kB
RssAnon: 354128 kB
RssAnon: 355280 kB
RssAnon: 355280 kB
RssAnon: 355360 kB
RssAnon: 355360 kB
RssAnon: 355360 kB
RssAnon: 355360 kB
RssAnon: 354792 kB
RssAnon: 354792 kB
RssAnon: 355816 kB
RssAnon: 361192 kB
RssAnon: 366056 kB
RssAnon: 371176 kB
RssAnon: 375656 kB
RssAnon: 382952 kB
RssAnon: 389480 kB
RssAnon: 393064 kB
RssAnon: 403048 kB
RssAnon: 411624 kB
RssAnon: 421224 kB
RssAnon: 429544 kB
RssAnon: 435688 kB
RssAnon: 439912 kB
RssAnon: 450792 kB
(Generated with a trivial `while true; do grep RssAnon /proc/$(pgrep -f unittest)/status; sleep 10; done`.)
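The same sampling can be done from Python (a Linux-only sketch; assumes the kernel reports RssAnon in /proc/&lt;pid&gt;/status, which needs Linux >= 4.5):

```python
import os
import re


def rss_anon_kb(pid: int) -> int:
    """Return the RssAnon (anonymous resident memory) of `pid`, in kB."""
    with open(f"/proc/{pid}/status") as f:
        match = re.search(r"^RssAnon:\s+(\d+) kB", f.read(), re.MULTILINE)
    if match is None:
        raise RuntimeError("RssAnon not reported (needs Linux >= 4.5)")
    return int(match.group(1))


print(rss_anon_kb(os.getpid()))
```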
Is it possible that it caches all 69508 examples? And why should it still be consuming more memory after the plateau?
Please note I'm not saying this is some huge problem - at least until we get to hypofuzzing state machines (which is the original motivation for my experiment).
---
Hm. I wonder if that `DeadlineExceeded` might be from a GC run. I also wonder whether that post-plateau growth would be influenced by forcing GC occasionally?
---
I think either of you has a better intuition for memory size than I do. I would say it is very possible that memory usage has increased recently due to ir-related churn, and we should do a detailed pass once things settle. In particular, 1 kB/example for `DataTree` sounds a bit high to me. I wonder what these graphs look like pre-ir, say `v6.98.0`.
---
That's larger than I was expecting, but makes sense once I actually think about the data structures and objects involved. I agree that it's plausible the IR migration is causing temporarily higher memory usage, both because we're carrying around two representations now, and because the IR hasn't really been memory-optimized yet. Not worth trying to fix that before we finish migrating though.
Adding `gc.collect()` before each (or each 1000th) invocation of the user code isn't a terrible idea, but I'm not sure it'd avoid occasional GC pauses anyway - for that you'd have to call `gc.freeze()`, and that's a semantics change I don't want.
---
> [...] it's plausible the IR migration is causing temporarily higher memory usage, both because we're carrying around two representations now, and because the IR hasn't really been memory-optimized yet. Not worth trying to fix that before we finish migrating though.

Yep, agree on this (with both of you). We can revisit afterwards - for the record, if nothing else.

> Adding `gc.collect()` before each (or each 1000th) invocation of the user code isn't a terrible idea, but I'm not sure it'd avoid occasional gc pauses anyway - for that you'd have to call `gc.freeze()`, and that's a semantics change I don't want.

Or alternatively, record and account for the pauses using `gc.callbacks`. Worth it? I don't know, but it is likely to hit any test with a high example count and the default deadline setting.
---
> Or alternatively, record and account for the pauses using `gc.callbacks`. Worth it? I don't know, but it is likely to hit any test with a high example count and the default deadline setting.

Oh, that's a better idea; we can attribute GC time correctly in statistics and observability output too.
---