Comments (2)
I'd guess the problem here is that you're imposing several interacting constraints by picking a num_rows and num_cols up front. It's more efficient to let Hypothesis handle that internally, where it can incrementally choose whether to add each additional row. I think an efficient version of your strategy would look something like:
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames, indexes

_index_strategy = st.one_of(st.text(), st.integers(), st.floats())
"""A strategy for generating dataframe indexes."""


@st.composite
def random_dataframes(draw: st.DrawFn):
    """Generate pandas dataframes of random shape and content."""
    _col_strategy = st.builds(column, name=_index_strategy, dtype=st.sampled_from([int, float, str]))
    columns = draw(st.lists(_col_strategy, max_size=5, unique_by=lambda c: c.name))
    index_strategy = indexes(elements=_index_strategy, max_size=2)
    return draw(data_frames(columns=columns, index=index_strategy))
and omitting the index_strategy (in favor of setting the first column as the index) and constraints on the number of rows would probably make this even faster.
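A minimal sketch of that simpler variant, as one possible reading of the suggestion rather than a tested recommendation: no index strategy is passed and no row count is constrained, and the first generated column is promoted to the index with pandas' set_index. The name indexed_dataframes, the min_size=1 bound, and the choice to leave floats out of the column labels are assumptions made for this illustration.

```python
import pandas as pd
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames

# Column labels. Floats are left out here (an assumption of this sketch) so
# that NaN labels, which never compare equal, cannot slip past unique_by.
_label_strategy = st.one_of(st.text(), st.integers())


@st.composite
def indexed_dataframes(draw: st.DrawFn) -> pd.DataFrame:
    """Generate dataframes whose first generated column becomes the index."""
    col_strategy = st.builds(
        column,
        name=_label_strategy,
        dtype=st.sampled_from([int, float, str]),
    )
    # min_size=1 guarantees there is a column to promote to the index.
    columns = draw(
        st.lists(col_strategy, min_size=1, max_size=5, unique_by=lambda c: c.name)
    )
    # No index strategy and no row-count constraint: Hypothesis decides
    # incrementally how many rows to generate.
    df = draw(data_frames(columns=columns))
    return df.set_index(df.columns[0])
```

Used with @given(indexed_dataframes()), each generated frame has between zero and four remaining columns, since one of the up-to-five generated columns is consumed as the index.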
Hi @Zac-HD, thanks a lot for your answer. Now that you've explained the problem, it makes a lot of sense to me, and it did indeed solve the issue! I was wondering: wouldn't it make sense to link something like the document you shared from the Hypothesis documentation as well? I looked several times for resources on the shrinking concepts Hypothesis uses, to better understand what is happening behind the scenes (it would probably have helped me prevent the problem in the first place), but I couldn't really find much ...