emdgroup / baybe
Bayesian Optimization and Design of Experiments
Home Page: https://emdgroup.github.io/baybe/
License: Apache License 2.0
When trying to install baybe in an environment managed by Poetry, the installation fails with the error Package 'baybe[telemetry]' is listed as a dependency of itself.
Steps to reproduce on macOS:
poetry --version
> Poetry (version 1.8.2)
poetry init
# create a dummy package with Poetry, Python version ~3.11.0
cat pyproject.toml
> ...
> [tool.poetry.dependencies]
> python = "~3.11.0"
>
>
> [build-system]
> requires = ["poetry-core"]
> build-backend = "poetry.core.masonry.api"
> ...
poetry add baybe
> Using version ^0.8.1 for baybe
>
> Updating dependencies
> Resolving dependencies... (0.3s)
>
> Package 'baybe[telemetry]' is listed as a dependency of itself.
When de-serializing Campaign objects, we sometimes have to deal with inf values (for example, as bounds of the objective target). When these values are serialized to JSON, they are written as Infinity. Infinity is a valid JavaScript literal, but it is not part of the JSON specification (https://web.archive.org/web/20160414190115/http://tools.ietf.org/html/rfc4627).
Python's JSON library has no problem transforming back and forth between JSON and dict objects when Infinity is present. However, some other consumers of the JSON (for example, MongoDB) do not understand Infinity and cast the value to null. The same problem could bite any other downstream consumer that strictly adheres to the JSON specification.
Here are some resources discussing the issue and potential solutions:
https://medium.com/the-magic-pantry/infinity-and-json-cde6df62c17c
https://stackoverflow.com/questions/1423081/json-left-out-infinity-and-nan-json-status-in-ecmascript
This issue is intended to serve as a place where minor visual issues regarding the documentation can be collected. Once enough of them have been identified, they will be resolved in a corresponding PR.
Note in particular that this issue should not be used to collect major changes. Basically, if you see something in our documentation and think "Huh, that just looks a bit weird and would benefit from reformatting", then this is the place to put it 😄
The current user guide for campaigns needs an overhaul.
The docs at https://emdgroup.github.io/baybe/ seem to be built from the main branch instead of the 0.8.2 release. The API has changed on main, so the example on the front page no longer works with the version installed from PyPI.
Maybe this is intended, but as an example, here is the campaign object from the README:
Campaign(searchspace=SearchSpace(discrete=SubspaceDiscrete(parameters=[CategoricalParameter(name='Granularity', _values=('coarse', 'medium', 'fine'), encoding=<CategoricalEncoding.OHE: 'OHE'>), NumericalDiscreteParameter(name='Pressure[bar]', encoding=None, _values=(1.0, 5.0, 10.0), tolerance=0.2), SubstanceParameter(name='Solvent', data={'Solvent A': 'COC', 'Solvent B': 'CCC', 'Solvent C': 'O', 'Solvent D': 'CS(=O)C'}, decorrelate=True, encoding=<SubstanceEncoding.MORDRED: 'MORDRED'>)], exp_rep= Granularity Pressure[bar] Solvent
0 coarse 1.0 Solvent A
1 coarse 1.0 Solvent B
2 coarse 1.0 Solvent C
3 coarse 1.0 Solvent D
4 coarse 5.0 Solvent A
5 coarse 5.0 Solvent B
6 coarse 5.0 Solvent C
7 coarse 5.0 Solvent D
8 coarse 10.0 Solvent A
9 coarse 10.0 Solvent B
10 coarse 10.0 Solvent C
11 coarse 10.0 Solvent D
12 medium 1.0 Solvent A
13 medium 1.0 Solvent B
14 medium 1.0 Solvent C
15 medium 1.0 Solvent D
16 medium 5.0 Solvent A
17 medium 5.0 Solvent B
18 medium 5.0 Solvent C
19 medium 5.0 Solvent D
20 medium 10.0 Solvent A
21 medium 10.0 Solvent B
22 medium 10.0 Solvent C
23 medium 10.0 Solvent D
24 fine 1.0 Solvent A
25 fine 1.0 Solvent B
26 fine 1.0 Solvent C
27 fine 1.0 Solvent D
28 fine 5.0 Solvent A
29 fine 5.0 Solvent B
30 fine 5.0 Solvent C
31 fine 5.0 Solvent D
32 fine 10.0 Solvent A
33 fine 10.0 Solvent B
34 fine 10.0 Solvent C
35 fine 10.0 Solvent D, metadata= was_recommended was_measured dont_recommend
0 False False False
1 False False False
2 False False False
3 False False False
4 False False False
5 False False False
6 False False False
7 False False False
8 False False False
9 False False False
10 True True False
11 False False False
12 False False False
13 False False False
14 False False False
15 True True False
16 False False False
17 False False False
18 False False False
19 False False False
20 False False False
21 False False False
22 False False False
23 False False False
24 False False False
25 False False False
26 False False False
27 False False False
28 False False False
29 True True False
30 False False False
31 False False False
32 False False False
33 False False False
34 False False False
35 False False False, empty_encoding=False, constraints=[], comp_rep= Granularity_coarse Granularity_medium Granularity_fine Pressure[bar] \
0 1 0 0 1.0
1 1 0 0 1.0
2 1 0 0 1.0
3 1 0 0 1.0
4 1 0 0 5.0
5 1 0 0 5.0
6 1 0 0 5.0
7 1 0 0 5.0
8 1 0 0 10.0
9 1 0 0 10.0
10 1 0 0 10.0
11 1 0 0 10.0
12 0 1 0 1.0
13 0 1 0 1.0
14 0 1 0 1.0
15 0 1 0 1.0
16 0 1 0 5.0
17 0 1 0 5.0
18 0 1 0 5.0
19 0 1 0 5.0
20 0 1 0 10.0
21 0 1 0 10.0
22 0 1 0 10.0
23 0 1 0 10.0
24 0 0 1 1.0
25 0 0 1 1.0
26 0 0 1 1.0
27 0 0 1 1.0
28 0 0 1 5.0
29 0 0 1 5.0
30 0 0 1 5.0
31 0 0 1 5.0
32 0 0 1 10.0
33 0 0 1 10.0
34 0 0 1 10.0
35 0 0 1 10.0
Solvent_MORDRED_SpAbs_A Solvent_MORDRED_nHetero Solvent_MORDRED_ATS1dv \
0 2.828427 1.0 12.000000
1 2.828427 0.0 4.000000
2 0.000000 1.0 0.000000
3 3.464102 2.0 5.333333
4 2.828427 1.0 12.000000
5 2.828427 0.0 4.000000
6 0.000000 1.0 0.000000
7 3.464102 2.0 5.333333
8 2.828427 1.0 12.000000
9 2.828427 0.0 4.000000
10 0.000000 1.0 0.000000
11 3.464102 2.0 5.333333
12 2.828427 1.0 12.000000
13 2.828427 0.0 4.000000
14 0.000000 1.0 0.000000
15 3.464102 2.0 5.333333
16 2.828427 1.0 12.000000
17 2.828427 0.0 4.000000
18 0.000000 1.0 0.000000
19 3.464102 2.0 5.333333
20 2.828427 1.0 12.000000
21 2.828427 0.0 4.000000
22 0.000000 1.0 0.000000
23 3.464102 2.0 5.333333
24 2.828427 1.0 12.000000
25 2.828427 0.0 4.000000
26 0.000000 1.0 0.000000
27 3.464102 2.0 5.333333
28 2.828427 1.0 12.000000
29 2.828427 0.0 4.000000
30 0.000000 1.0 0.000000
31 3.464102 2.0 5.333333
32 2.828427 1.0 12.000000
33 2.828427 0.0 4.000000
34 0.000000 1.0 0.000000
35 3.464102 2.0 5.333333
Solvent_MORDRED_AATS0dv Solvent_MORDRED_ATSC2p
0 4.222222 1.072052
1 0.545455 -0.939884
2 5.333333 0.002031
3 3.844444 -3.587209
4 4.222222 1.072052
5 0.545455 -0.939884
6 5.333333 0.002031
7 3.844444 -3.587209
8 4.222222 1.072052
9 0.545455 -0.939884
10 5.333333 0.002031
11 3.844444 -3.587209
12 4.222222 1.072052
13 0.545455 -0.939884
14 5.333333 0.002031
15 3.844444 -3.587209
16 4.222222 1.072052
17 0.545455 -0.939884
18 5.333333 0.002031
19 3.844444 -3.587209
20 4.222222 1.072052
21 0.545455 -0.939884
22 5.333333 0.002031
23 3.844444 -3.587209
24 4.222222 1.072052
25 0.545455 -0.939884
26 5.333333 0.002031
27 3.844444 -3.587209
28 4.222222 1.072052
29 0.545455 -0.939884
30 5.333333 0.002031
31 3.844444 -3.587209
32 4.222222 1.072052
33 0.545455 -0.939884
34 5.333333 0.002031
35 3.844444 -3.587209 ), continuous=SubspaceContinuous(parameters=[], constraints_lin_eq=[], constraints_lin_ineq=[])), objective=Objective(mode='SINGLE', targets=[NumericalTarget(name='Yield', mode='MAX', bounds=Interval(lower=-inf, upper=inf), bounds_transform_func=None)], weights=[100.0], combine_func='GEOM_MEAN'), strategy=TwoPhaseStrategy(allow_repeated_recommendations=False, allow_recommending_already_measured=False, initial_recommender=FPSRecommender(), recommender=SequentialGreedyRecommender(surrogate_model=GaussianProcessSurrogate(model_params={}, _model=None), acquisition_function_cls='qEI', hybrid_sampler='None', sampling_percentage=1.0), switch_after=1), measurements_exp= Granularity Pressure[bar] Solvent Yield BatchNr FitNr
0 medium 1.0 Solvent D 79.8 1 NaN
1 coarse 10.0 Solvent C 54.1 1 NaN
2 fine 5.0 Solvent B 59.4 1 NaN
3 medium 1.0 Solvent D 79.8 2 NaN
4 coarse 10.0 Solvent C 54.1 2 NaN
5 fine 5.0 Solvent B 59.4 2 NaN, numerical_measurements_must_be_within_tolerance=True, n_batches_done=2, n_fits_done=0, _cached_recommendation=Empty DataFrame
Columns: []
Index: [])
This issue is created to make everybody (specifically @Scienfitz and @AdrianSosic) aware that I am starting to re-write the user guide for campaigns.
The corresponding discussion will be closed; any comments that I should be aware of before opening the PR can be posted here.
Thanks for sharing this work! BayBE is an amazing tool for BO and I sincerely enjoy going through the code. Amazing work, highly appreciated!
Here's one issue that I found:
When one includes operators such as ">", "<", "<=", or ">=", a ValueError is raised. This is probably due to the list of valid operators in conditions.py, which is set to _valid_tolerance_operators = ["=", "==", "!="]. Might be worth taking a look into it (or did I miss something?)
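A minimal sketch of the validation path as I understand it (the operator list is quoted from conditions.py above; the validator function here is hypothetical, not BayBE's actual code):

_valid_tolerance_operators = ["=", "==", "!="]

def _validate_operator(operator: str) -> None:
    # Operators are checked against this whitelist only, so any comparison
    # operator outside it is rejected.
    if operator not in _valid_tolerance_operators:
        raise ValueError(f"Operator {operator!r} is not a valid tolerance operator.")

_validate_operator(">")  # raises ValueError, matching the behavior reported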
Cheers!
The user guide for transfer learning needs to be written.
Steps to reproduce:
pip install baybe
python -c "from baybe.objectives import SingleTargetObjective"
(from README)
I do see the module in the repo, but it appears it is not packaged in the version on PyPI?
The test that ensures an invalid config is rejected fails in Python 3.12 (but not in other versions); this seems related to cattrs.
Also, it is possible to write a config with recomEnDer (spelling mistake), and it will silently fall back to a default instead of throwing an error.
PR for the Python 3.12 upgrade: #153
The user guide for targets needs an overhaul.
Hello! I get an error when trying to run simulate_experiment. The code errors here, in simulation.py:
searchspace = campaign.searchspace.discrete.exp_rep
missing_inds = searchspace.index[
    searchspace.merge(lookup, how="left", indicator=True)["_merge"]
    == "left_only"
]
campaign.searchspace.discrete.metadata.loc[
    missing_inds, "dont_recommend"
] = True
The error is IndexError: boolean index did not match indexed array along dimension 0; dimension is 131220 but corresponding boolean dimension is 131227
I'm a bit confused, since the error happens because the lengths of missing_inds and campaign.searchspace.discrete.metadata don't match up, yet they are both derived from campaign.searchspace.discrete?
What might be going on here?
Hi team,
I'm running baybe in a container under a non-root user, with BAYBE_TELEMETRY_ENABLED=false set. However, some telemetry code still seems to be executed:
File "/workdir/.venv/lib/python3.12/site-packages/baybe/telemetry.py", line 109, in <module>
--
hashlib.sha256(getpass.getuser().upper().encode()).hexdigest().upper()[:10]
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/getpass.py", line 169, in getuser
return pwd.getpwuid(os.getuid())[0]
^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'getpwuid(): uid not found: 1000'
As a workaround, I will create a user in the container so that this call doesn't throw an exception. However, I would expect that no telemetry code is executed as part of loading the baybe module when telemetry is disabled.
In general, the telemetry code seems pretty heavy: it makes suspicious syscalls to retrieve the hostname and uid, and it tries to make an HTTP request, which will likely raise flags when scanning code bases for malicious behavior or backdoors.
I also wanted to mention that the claim "this hash is irreversible and cannot identify the user or their machine" is not necessarily true: you can easily pre-compute a rainbow table of the hashes of all valid usernames and then do a reverse lookup.
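A sketch of that reverse lookup, using the exact hashing recipe from the traceback above (the candidate username list is of course hypothetical):

import hashlib

# Usernames come from a small, guessable space, so enumerating them is cheap.
candidates = ["alice", "bob", "jsmith"]  # hypothetical list of valid usernames
table = {
    hashlib.sha256(name.upper().encode()).hexdigest().upper()[:10]: name
    for name in candidates
}

# Given an observed telemetry hash, a single dict lookup recovers the user.
observed = hashlib.sha256("BOB".encode()).hexdigest().upper()[:10]
print(table.get(observed))  # -> "bob"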
The user guide for surrogates needs an overhaul.
I had a nice time reviewing this repository! Overall I think it's a really comprehensive, clean, and well-documented project. Thank you for open-sourcing it! Find below some questions and suggestions:
A Colab notebook or similar would be really good, I think. See e.g. https://colab.research.google.com/drive/1VEHXBLVkn5NZ7N-Oj6-dc_hkIfwFcUE-?usp=sharing. I needed to %pip install 'baybe[chem,simulation]' numpy==1.24.4 on Colab; otherwise it seems to work OK (see numpy/numpy#25150 (comment)). Consider moving "Quick Start" into a tutorial notebook and providing a Colab link. It looks like you already have the setup to convert Jupyter notebooks into HTML pages (e.g., https://emdgroup.github.io/baybe/examples/Constraints_Discrete/mixture_constraints.html).
It appears that the README example does only a single iteration. I would have expected to see an optimization loop (see the sketch below) and some information about the best parameters, though I get that this is geared more towards wetlab scientists.
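A hedged sketch of the loop I had in mind, using only method names that appear elsewhere in this repo's issues (campaign.recommend and campaign.add_measurements; note the recommend keyword has changed names across versions, batch_size vs. batch_quantity, and run_experiments is a hypothetical stand-in for the lab/lookup step):

for _ in range(10):  # number of DOE iterations
    recommendation = campaign.recommend(batch_size=3)
    recommendation["Yield"] = run_experiments(recommendation)  # hypothetical lookup
    campaign.add_measurements(recommendation)
# ...followed by reporting the best measured parameters so far.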
Maybe clarify in the README that people can choose from different scaling methods with a link to the docs? I eventually happened upon that part of the docs.
I think the detailed installation information could go into an "Advanced Installation" section (either a separate README that gets incorporated into the docs, or near the end of the main README). The main README would then have a "quick installation" section with a link to the advanced installation instructions. See #95
The README example doesn't have much by way of outputs (e.g., print statements and expected output). See #95
Same for visual representations, such as an optimization trace using BayBE on a task. Are there any built-in visualization methods? If not, consider including at least some examples of visualizing performance.
It would be nice to have an "Edit on GitHub" link on your documentation pages -- it makes it a lot easier for others to contribute I think. See #94
It would be nice if the user guide linked to a corresponding tutorial or section of tutorials. For example, linking https://emdgroup.github.io/baybe/userguide/strategy.html to https://emdgroup.github.io/baybe/examples/Basics/strategies.html#
At a glance, this was difficult to parse: "Similar to the SequentialStrategy, the StreamingSequentialStrategy enables the utilization of arbitrary iterables to select recommender. Note that this strategy is however not serializable." (https://emdgroup.github.io/baybe/userguide/strategy.html#the-streamingsequentialstrategy). I think I kind of get it, but not necessarily when or how I would want to use it.
I think there is too much granularity on some of your docs pages, e.g. https://emdgroup.github.io/baybe/examples/Constraints_Discrete/Constraints_Discrete.html (i.e., lots of repetition, and not a whole lot of valuable information gained from the bottom-most headings). No worries if this would be difficult to change.
It would be nice to get some more details about each of the "Examples" sections rather than needing to click into each one to better understand what it's about. I.e., https://emdgroup.github.io/baybe/examples/examples.html could have some text at the top.
As I'm going into more of the tutorials, I'm seeing that it's really comprehensive. For example, a demonstration of adding existing data https://emdgroup.github.io/baybe/examples/Backtesting/full_initial_data.html. I think there needs to be a better way to highlight/organize/point people to the tutorials they care about most. Happy to discuss more.
I don't think "Backtesting" is common terminology for chem/materials informatics communities, at least in North America. It seems to be more common in finance, for example: https://en.wikipedia.org/wiki/Backtesting. When I wandered into https://emdgroup.github.io/baybe/_autosummary/baybe.simulation.html#module-baybe.simulation, I finally realized that what you refer to as simulation and backtesting is what I would typically refer to as benchmarking. I was thinking that maybe you implemented multi-task BO, where you could leverage physics-based simulations to help inform wetlab/experimental search campaigns. It took a while before this became clear to me.
Right now, "Transfer learning: Mix data from multiple campaigns and accelerate optimization" is mentioned on https://emdgroup.github.io/baybe/misc/readme_link.html#, but it doesn't seem like this is really implemented yet, other than https://emdgroup.github.io/baybe/_autosummary/baybe.simulation.simulate_transfer_learning.html#baybe.simulation.simulate_transfer_learning. However, it doesn't appear to me that transfer learning is being used here. Even going through the function (https://emdgroup.github.io/baybe/_modules/baybe/simulation.html#simulate_transfer_learning), it was a bit tough to realize what was happening until I looked up TaskParameter
. Suddenly, it made sense to me that what you're referring to as a task parameter is what I refer to as a contextual variable. This is also really good for me to see that contextual variable optimization is supported. However, I don't really consider this as transfer learning. In my mind, transfer learning means using one model to inform another. In contextual Bayesian optimization, certain variables are being fixed at each prediction. Perhaps I misunderstood something though. I imagine this will become clearer once https://emdgroup.github.io/baybe/userguide/transfer_learning.html has been developed.
It seems that Expected Hypervolume Improvement (EHVI) isn't one of the supported options for multi-objective optimization. Could you comment on this? With the DESIRABILITY mode, is each of the targets modeled independently prior to scalarization? If not, I tend to have a hard time referring to something like this as multi-objective optimization; in my mind, it's single-objective optimization of a fixed scalarization of several objectives. As alluded to in https://emdgroup.github.io/baybe/userguide/objective.html#desirability, it's good that a clarification is made about the scales being combined.
Do you perform conditioning on your batches (i.e., compute a joint acquisition function value)? For example, using fantasy point modeling. This is one of the easiest "gotchas" of batch optimization. See facebook/Ax#778 (comment) and https://youtu.be/JzgkSR6FFyM?si=dzv3RVvjKrZlkjlH
What needs does BayBE fulfill that other packages don't? I think the README should clarify what makes BayBE stand apart from others and reference these other packages, too. For example, there's Ax (https://ax.dev), Gauche (https://github.com/leojklarner/gauche), Atlas (https://github.com/aspuru-guzik-group/atlas), Olympus, and https://github.com/experimental-design/bofire.
For example, I keep what is probably an overly inclusive list of GitHub repos at https://github.com/stars/sgbaird/lists/optimization-and-tuning and a shortlist at https://github.com/AccelerationConsortium/awesome-self-driving-labs/blob/main/readme.md#optimization. I added BayBE to these lists recently.
I'm also interested to see an optimization comparison/benchmark of using the Mordred encoding with the solvent vs. treating it as a purely categorical variable.
I can appreciate that BayBE seems well-maintained from a software developer perspective! This is welcome in the fields of chemistry and materials science, which understandably often lack this.
There are a lot of dependencies. I'm glad you split them up into groups!
I notice you have a lot of >= dependencies in https://github.com/emdgroup/baybe/blob/main/pyproject.toml. Is this overly restrictive? It's OK if you don't think so.
The docstrings look really nice, and it's nice to have the function cross-linking across the API docs.
I look forward to seeing how you use hypothesis testing here!
Feel free to convert to a discussion if desired, and happy to refactor into multiple items if that would be better.
Hi! I was wondering if it would be possible to add a feature where simulations still return the results compiled up to the point of an error?
The situation I'm running into when running on larger datasets is a botorch error to the tune of
"All attempts to fit the model have failed."
I am in the process of troubleshooting what about the dataset is causing the failure, but in the meantime it would be nice to see the results up to that point, which should include dozens of batches of experiments (see the sketch below).
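A sketch of the behavior requested, assuming a per-batch loop inside the simulation (everything here is a hypothetical stand-in except the quoted botorch error message):

def run_one_batch(i):
    # Hypothetical per-batch simulation step; pretend fitting fails at batch 12.
    if i == 12:
        raise RuntimeError("All attempts to fit the model have failed.")
    return {"batch": i}

results = []
try:
    for i in range(30):
        results.append(run_one_batch(i))
except RuntimeError:
    pass  # keep the 12 batches collected so far instead of losing them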
Also, if you have any experience with what might be causing an error like this, that would be helpful!
Referring to this comment in a botorch thread: pytorch/botorch#1226 (comment)
I initially wondered if this could be my issue, but baybe should prevent this from being an issue since it identifies duplicate parameter values and randomly picks one.
In baybe/examples/Basics/campaign.py, line 80 reads recommendation = campaign.recommend(batch_size=2).
The keyword argument to campaign.recommend has been changed to batch_quantity.
Please update the example.
This is also a problem in baybe/examples/Serialization/basic_serialization.py on lines 97 and 98.
We currently ignore two ONNX vulnerabilities that appear in all versions <=1.15. Once version 1.16.0 is released (announced for March 18), we need to check these again.
/remind me to deploy on Mar 18
The user guide for search spaces needs an overhaul.
I experienced unexpected behaviour in the validation when using a string instead of a list as the input type for active_values in the TaskParameter.
I created the TaskParameter with a single-char string for active_values, i.e. active_values="A", for testing purposes, and the Campaign object could be created without a problem. Unexpectedly, I got an error when creating the Campaign object in case the string for active_values was longer than one character, for example active_values="C12".
As @Scienfitz already pointed out, it is related to the input type, i.e. list vs. string: the string is interpreted as a list of characters in the validation. Therefore "C" is not in ["C12", ..] fails, whereas "A" is not in ["A", ..] works in the case of a single-char string. To provide robustness, or at least a clearer error message, the validation could check whether active_values is a list or a string and act accordingly.
When I input active_values as a list, i.e. active_values=["C12"], the validation and creation of the Campaign object work without problems.
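A minimal sketch of why the string slips through (this is generic Python behavior; the validator shown here is hypothetical, not BayBE's actual code):

values = ("A", "B", "C12")

def validate_active_values(active_values, values):
    for val in active_values:  # iterating a string yields single characters
        if val not in values:
            raise ValueError(f"{val!r} is not among the parameter values")

validate_active_values(["C12"], values)  # passes
validate_active_values("A", values)      # passes only by accident
validate_active_values("C12", values)    # raises: 'C' is not among the parameter values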
I am trying to run BayBE 0.7.2 with Python 3.11 on a Windows machine, using the example script from basic serialization (examples/Serialization/basic_serialization.py).
When trying to import Campaign, I get the following:
Cell In[2], line 1
    from baybe import Campaign
File ~\code\auto_flow\.venv\Lib\site-packages\baybe\__init__.py:5
    from baybe.campaign import Campaign
File ~\code\auto_flow\.venv\Lib\site-packages\baybe\campaign.py:13
    from baybe.parameters.base import Parameter
File ~\code\auto_flow\.venv\Lib\site-packages\baybe\parameters\__init__.py:14
    from baybe.parameters.substance import SubstanceParameter
File ~\code\auto_flow\.venv\Lib\site-packages\baybe\parameters\substance.py:13
    from baybe.utils import (
ImportError: cannot import name 'get_canonical_smiles' from 'baybe.utils' (C:\Users\M316675\code\auto_flow\.venv\Lib\site-packages\baybe\utils\__init__.py)
I had a look, and the problem seems to be that substance.py unconditionally imports get_canonical_smiles, while in the utils/chemistry.py module get_canonical_smiles is only defined if rdkit is installed.
I don't have rdkit installed, so the import of 'get_canonical_smiles' in substance.py fails.
I would recommend including rdkit in the dependencies for this project.
The user guide for the parameters might need an overhaul but needs to be checked more thoroughly.
Is there a way to calculate an upper bound for the size of the search space? I'm trying to find a way to give the user feedback on the complexity of the search space, and also on how much memory needs to be allocated to proceed with the campaign and generate experiments.
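One back-of-the-envelope upper bound for the discrete part, as a sketch (constraints can only shrink the Cartesian product, so the product of per-parameter value counts bounds the row count; the numbers below are taken from the README example quoted earlier in this page):

import math

# 3 Granularity values x 3 Pressure values x 4 Solvents, as in the README dump
n_values_per_parameter = [3, 3, 4]
max_rows = math.prod(n_values_per_parameter)
print(max_rows)  # 36, matching the 36-row exp_rep shown above

Memory then scales roughly with that row count times the number of columns in the computational representation.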
Hi,
When running a simulation in match mode, the code fails when trying to generate results because it attempts to take np.mean() of an Interval, returning TypeError: unsupported operand type(s) for: 'Interval' and 'int'
The error occurs on line 551 of simulation.py, here:

elif target.mode is TargetMode.MATCH:
    match_val = np.mean(target.bounds)

In the meantime, I am just changing the line locally to

match_val = np.mean([target.bounds.lower, target.bounds.upper])

I assume this is just outdated code, since you mentioned this was an old module in need of a refresh!
The user guide for recommenders needs an overhaul.
The user guide for strategies needs an overhaul.
The user guide for objectives needs an overhaul.
Thank you for the wonderful tool! I have a question about the architecture. Is there a way to add already-observed experimental data to a campaign? Also, is there a way to inspect, as a dataframe, the predicted mean and variance for the recommended experimental points and the exploration space, as well as the evaluation values of the acquisition function?
The user guide for the constraints might need an overhaul but needs to be checked more thoroughly.
I'm wondering if the recommendation times I'm encountering are expected given my setup:
Machine:
MacBook Air, 15 inch, M2, 2023
Memory: 16 GB
OS: Sonoma 14.4.1
Python: 3.11.8
Model:
Single NumericalTarget
Parameters: 4 SubstanceParameters (~140 total SMILES molecules), 4 NumericalContinuousParameters
Constraints: 4 numerical parameters must sum to 1.0
Recommender: TwoPhaseMetaRecommender(
initial_recommender=RandomRecommender(),
recommender=SequentialGreedyRecommender())
So when I add 1000 datapoints via campaign.add_measurements(), it takes ~4 days to make a recommendation with a batch size of 3. I started a test with only 10 datapoints, and it has been running since last night.
Does this sound expected given my machine, model, and data? If so, what would be the recommended ways to improve the speed? For the molecules, I've tried with and without Mordred encoding & decorrelation; it doesn't seem to make a big difference.
If this doesn't sound expected, how would you recommend I troubleshoot what could be causing the issue?
Thanks in advance!
Hi!
I was wondering if there is anything hidden inside baybe that could be locking a random seed somewhere inadvertently?
I am seeing that in a sequence where I call the baybe simulate_experiment method in a loop, I sometimes get the exact same results on multiple iterations, and this carries over to a different method (defined by me) that also relies on calls to np.random. I'm noticing that when I call simulate_experiment in the same loop as my method, my method also returns identical results, but when I comment out simulate_experiment, my method goes back to returning random results.
I also tried manually setting the random_seed input to the iteration integer on every pass of the loop, but this still happens.
I can work on putting together a repro of this, but I just wanted to go ahead and put this out there to see if there's anywhere obvious this might be happening.
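For reference, a minimal sketch of the failure mode being described: if any library call reseeds NumPy's global RNG, unrelated np.random calls afterwards become deterministic (seeded_library_call is a hypothetical stand-in, not baybe code):

import numpy as np

def seeded_library_call(random_seed=1337):
    np.random.seed(random_seed)  # hypothetical internal seeding of the global RNG

def my_method():
    return np.random.rand()

for _ in range(3):
    seeded_library_call()
    print(my_method())  # prints the same value on every iteration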
Thanks for building such a cool abstraction for designing experiments using powerful Bayesian methods.
I have a set of experimental data, each entry containing various parameters. I am not sure whether it is by design, but in practice it is natural for a parameter to hold the same value across multiple experiments (duplicates). However, when defining the parameters as CategoricalParameter, NumericalDiscreteParameter, or NumericalContinuousParameter, I get a traceback; here is an example for one of the parameters:
ValueError: Cannot assign the following values containing duplicates to parameter FeatureName: (1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6).
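If I understand the error correctly, a parameter expects the distinct values it can take rather than the raw observed column, so deduplicating first avoids it. A sketch (the parameter name and data come from the traceback above; whether this matches the intended usage is an assumption):

from baybe.parameters import NumericalDiscreteParameter

observed = [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
unique_values = sorted(set(observed))  # [1, 2, 3, 4, 5, 6]

param = NumericalDiscreteParameter(name="FeatureName", values=unique_values)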
Hi, is it/would it be possible to expose the underlying model of a campaign in order to calculate predicted means and variances for a set of new measurements (not necessarily those recommended by the campaign, but any user-specified measurement that exists within the search space)?
Ideally I'd like to be able to quantify the performance of the model on a set of known measurements as well.
Thanks!
Hi,
I am wondering what the best way is to encode a feature that is a variable-length vector of integers. Would this work out of the box?
If not, the way I would naturally do this is to find the datapoint with the longest vector for this feature, let's call its length N. Then I would define N features, where each element in the vector is a different feature. For datapoints with len(vector) < N, all features between len(vector) and N would be assigned a value of 0.
This seems not ideal though, since the number and identity of the features depend on the data, which will change over time.
Is there a better way to do this, or is specific accommodation of this on the roadmap anywhere? Thanks!!
The user guide for simulations needs to be written.