Comments (10)
So, I ran the test in memray to get better measurements, and in summary I don't see strong evidence of leaks. What I think we're seeing is that memory consumption has two main drivers:
- Short term: the caches, mainly the conjecture data cache, at around 30 kB per example. Bounded, mainly by `CACHE_SIZE` in `engine.py`.
- Long term: `DataTree`, which tracks previously tried data to avoid repeats, at around 1 kB per example. Unbounded.
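Taking the estimates above at face value, a quick back-of-envelope check (hypothetical arithmetic, using the per-example sizes above and the ~70k-example run discussed below):

```python
# Back-of-envelope memory estimate from the per-example sizes above.
CACHE_SIZE = 10_000              # conjecture data cache bound in engine.py
examples = 69_508                # examples in the ~70k stateful run below

cache_mb = CACHE_SIZE * 30 / 1024     # ~30 kB per cached example, bounded
datatree_mb = examples * 1 / 1024     # ~1 kB per example, unbounded

print(f"cache ~= {cache_mb:.0f} MB, DataTree ~= {datatree_mb:.0f} MB")
# Roughly 293 MB + 68 MB, in the same ballpark as the ~440 MB observed.
```

If the numbers were wildly off from observed RSS, that would point at a genuine leak rather than plus-size caches.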
As evidence (running the stateful test above, about 70k examples):
Possible actions:
- Check the sizes above against intuition; if they are much bigger than expected, it may be worth investigating further.
- The `DeadlineExceeded`, which I guess is caused by GC, is an annoying flake. Should we perhaps force periodic `gc.collect()` outside of the tests, or add GC callbacks to account for the time taken?
- Possibly tune down `CACHE_SIZE`, or somehow be more selective in what we cache or retain.

@Zac-HD, @tybug, does this sound reasonable?
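The periodic-collection idea could look something like this (a minimal sketch, not Hypothesis code; the class name and interval are made up):

```python
import gc


class PeriodicGC:
    """Force a full collection every `interval` test-case invocations,
    so GC pauses tend to land between examples rather than inside one."""

    def __init__(self, interval: int = 1000):
        self.interval = interval
        self.calls = 0

    def tick(self) -> None:
        self.calls += 1
        if self.calls % self.interval == 0:
            gc.collect()
```

Whether this actually prevents `DeadlineExceeded` flakes depends on the allocation rate and GC thresholds; an automatic collection can still trigger mid-example.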
---
Good idea to compare with pre-ir! `v6.98.0` below; at first I left it running over a late breakfast and it didn't finish... I guess the exhaustion tracking has improved! Anyway, with the number of examples set equal:

You'll notice immediately that we use a lot less memory in the post-ir world: the (probable) `DataTree` contribution is halved or more. Yay!

Less obviously, we produce more reference cycles, so the GC works harder. It's worth keeping an eye on this during development/migration, since data structures are harder than code to "fix" afterwards.
---
I think this is probably the conjecture data cache in `internal/conjecture/engine.py`. It is bounded but large, at `CACHE_SIZE = 10000` examples.

There is a memory/performance trade-off here, and the benefits of this cache are mainly in the shrinking phase.
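For intuition, a bounded cache of this shape can be sketched as a simple LRU (illustrative only; this is not Hypothesis's actual cache implementation):

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `maxsize` entries, evicting the least recently used."""

    def __init__(self, maxsize: int = 10_000):
        self.maxsize = maxsize
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        value = self._data.pop(key)   # raises KeyError if absent
        self._data[key] = value       # re-insert as most recently used
        return value

    def put(self, key, value) -> None:
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the oldest entry
```

Lowering the bound trades memory for re-execution cost, which mostly bites during shrinking.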
---
FWIW: changing to `CACHE_SIZE = 1000` looks like it caps the `stateful_minimal` test to under 50 MB. But there are other large-ish caches, so to identify leaks, as opposed to plus-size caches, we probably need to run 5-10k examples just to get to steady state.
---
I wanted to see when memory consumption flattens out, so I've modified stateful_minimal.py.gz with `max_examples=5000000` and ran it with:

    $ pytest --hypothesis-show-statistics --hypothesis-explain stateful_minimal.py
Occasionally this fails with:
E hypothesis.errors.Flaky: Inconsistent results from replaying a test case!
E last: VALID from None
E this: INTERESTING from DeadlineExceeded at /usr/lib/python3.12/site-packages/hypothesis/core.py:846
Sometimes it does finish, after about 5 minutes:
stateful_minimal.py::stateful_minimal::runTest:
- during reuse phase (0.00 seconds):
- Typical runtimes: ~ 1ms, of which ~ 1ms in data generation
- 1 passing examples, 0 failing examples, 0 invalid examples
- during generate phase (313.17 seconds):
- Typical runtimes: ~ 0-3 ms, of which ~ 0-3 ms in data generation
- 69508 passing examples, 0 failing examples, 0 invalid examples
- Stopped because nothing left to do
At this point it ate 440 MB of memory.
Here's memory consumption over time, with 10 second sampling interval:
RssAnon: 24320 kB
RssAnon: 93056 kB
RssAnon: 162176 kB
RssAnon: 229632 kB
RssAnon: 291840 kB
RssAnon: 337616 kB
RssAnon: 354128 kB
RssAnon: 355280 kB
RssAnon: 355280 kB
RssAnon: 355360 kB
RssAnon: 355360 kB
RssAnon: 355360 kB
RssAnon: 355360 kB
RssAnon: 354792 kB
RssAnon: 354792 kB
RssAnon: 355816 kB
RssAnon: 361192 kB
RssAnon: 366056 kB
RssAnon: 371176 kB
RssAnon: 375656 kB
RssAnon: 382952 kB
RssAnon: 389480 kB
RssAnon: 393064 kB
RssAnon: 403048 kB
RssAnon: 411624 kB
RssAnon: 421224 kB
RssAnon: 429544 kB
RssAnon: 435688 kB
RssAnon: 439912 kB
RssAnon: 450792 kB
(Generated with a trivial `while true; do grep RssAnon /proc/$(pgrep -f unittest)/status; sleep 10; done`.)
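The same sampling can be done from Python (a Linux-only sketch; assumes the kernel reports RssAnon in /proc/&lt;pid&gt;/status, which needs Linux >= 4.5):

```python
import os
import re


def rss_anon_kb(pid: int) -> int:
    """Return the RssAnon (anonymous resident memory) of `pid`, in kB."""
    with open(f"/proc/{pid}/status") as f:
        match = re.search(r"^RssAnon:\s+(\d+) kB", f.read(), re.MULTILINE)
    if match is None:
        raise RuntimeError("RssAnon not reported (needs Linux >= 4.5)")
    return int(match.group(1))


print(rss_anon_kb(os.getpid()))
```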
Is it possible that it caches all 69508 examples? And why should it still be consuming more memory after the plateau?
Please note I'm not saying this is some huge problem - at least until we get to hypofuzzing state machines (which is the original motivation for my experiment).
---
Hm. I wonder if that `DeadlineExceeded` might be from a GC run. I also wonder whether that post-plateau growth would be influenced by forcing GC occasionally?
---
I think either of you has a better intuition for memory size than I do. I would say it is very possible that memory usage has increased recently due to ir-related churn, and we should do a detailed pass once things settle. In particular, 1 kB/example for `DataTree` sounds a bit high to me. I wonder what these graphs look like pre-ir, say `v6.98.0`.
---
That's larger than I was expecting, but makes sense once I actually think about the data structures and objects involved. I agree that it's plausible the IR migration is causing temporarily higher memory usage, both because we're carrying around two representations now, and because the IR hasn't really been memory-optimized yet. Not worth trying to fix that before we finish migrating though.
Adding `gc.collect()` before each (or each 1000th) invocation of the user code isn't a terrible idea, but I'm not sure it'd avoid occasional GC pauses anyway - for that you'd have to call `gc.freeze()`, and that's a semantics change I don't want.
---
> [...] it's plausible the IR migration is causing temporarily higher memory usage, both because we're carrying around two representations now, and because the IR hasn't really been memory-optimized yet. Not worth trying to fix that before we finish migrating though.

Yep, agree on this (with both of you). We can revisit afterwards - for the record, if nothing else.

> Adding `gc.collect()` before each (or each 1000th) invocation of the user code isn't a terrible idea, but I'm not sure it'd avoid occasional gc pauses anyway - for that you'd have to call `gc.freeze()`, and that's a semantics change I don't want.

Or alternatively, record and account for the pauses using `gc.callbacks`. Worth it? I don't know, but it is likely to hit any test with a high example count and the default deadline setting.
---
> Or alternatively, record and account for the pauses using `gc.callbacks`. Worth it? I don't know, but it is likely to hit any test with a high example count and the default deadline setting.

Oh, that's a better idea; we can attribute GC time correctly in statistics and observability output too.
---