The following peer review was solicited as part of the Distill review process.
The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to Austin Huang for taking the time to review this article.
General Comments
Missing Tools for Reasoning
Acquisition functions are introduced from a definitional standpoint and their behavior is illustrated for a relatively artificial example. Sometimes the methods are shown to work, sometimes they don't. How does one think about implementation alternatives when working on a new problem? The article provides few conceptual tools for the reader to apply these methods successfully.
There are also serious issues with model misspecification underneath the surface of these implementations (see, for example, the Thompson Sampling discussion). However, the article doesn't even raise the topic - the discussion starts from a fixed model specification and anecdotally shows methods either working or not under a narrow example.
Relatedly, there's a section entitled "Why is it easier to optimize the acquisition function?" This framing may be misleading, since "easiness" isn't the goal. The real question seems to be "Why is it beneficial to optimize the acquisition function?" or perhaps "Is it even beneficial to optimize with respect to an acquisition function?"
Does the Hero Plot Illustrate a Central Aspect of the Discussion?
An interactive visualization communicates how a response depends on the variables that the reader's input can affect. In the hero plot, this corresponds to the response of the acquisition function as a function of the epsilon hyperparameter in a PI acquisition function for fixed data and ground truth. It also shows the CDF for two slices of X (1.0 and 5.0), which are intermediate computations used by the acquisition function.
Is that particular relationship sufficiently central to the article to be front and center? There are other relationships that seem more central to the topic that could have been highlighted (how choices of acquisition function compare, how the acquisition function changes with data). The plot is nice to interact with for thinking about exploration/exploitation in PI, but it doesn't seem to be an obvious choice as the hero plot.
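To make the relationship discussed above concrete, here is a minimal sketch of how PI responds to epsilon. It assumes the standard maximization form PI(x) = Phi((mu(x) - f(x+) - epsilon) / sigma(x)); the candidate names and the posterior means and standard deviations are hypothetical numbers chosen for illustration, not values from the article.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu: float, sigma: float, best: float, epsilon: float) -> float:
    """PI at one location, given posterior mean/std and the best value seen so far."""
    if sigma <= 0.0:
        # Degenerate posterior: improvement is either certain or impossible.
        return 1.0 if mu > best + epsilon else 0.0
    return normal_cdf((mu - best - epsilon) / sigma)


# Hypothetical posterior summaries (mean, std) at two candidate locations.
candidates = {
    "near incumbent": (1.05, 0.05),    # slightly better mean, very certain
    "uncertain region": (0.60, 0.60),  # worse mean, much less certain
}
best_so_far = 1.0  # assumed incumbent value f(x+)


def preferred(epsilon: float) -> str:
    """Name of the candidate with the highest PI for a given epsilon."""
    return max(
        candidates,
        key=lambda name: probability_of_improvement(*candidates[name], best_so_far, epsilon),
    )


for eps in (0.0, 0.5):
    print(f"epsilon={eps}: PI prefers the '{preferred(eps)}' candidate")
```

With a small epsilon the confident candidate next to the incumbent wins; raising epsilon shifts the preference toward the uncertain region, which is the exploration/exploitation trade-off the hero plot lets the reader explore.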
Minor visual issue - the vertical labels look buggy, with 0.00e+0 cutting through the axis line.
Grey backgrounds don't fit Distill's Template
The grey rectangular background behind each figure doesn't fit the aesthetic of the Distill template. The convention in other articles seems to be white-on-white with no boundary, or occasionally a horizontal ribbon that runs the width of the page for visualizations with lots of margin content.
Animations are Overused
Note that in other Distill articles, animations are used sparingly, and usually just at the top figure or concluding figure.
Looping animations were overused and ultimately not a good way to illustrate a dependency relationship compared to a visual with a control.
Even if the content in those figures is kept as is, replacing the loop with a slider (see http://worrydream.com/LadderOfAbstraction/) would be an improvement: it is less distracting and lets the reader examine relationships between iterations more carefully.
Introduction to EI is Confusing
Perhaps the framing using the unknown ground truth was the original motivation but here it just makes the reasoning convoluted without adding much insight. Don't see any reason not to just jump to the definition as described by the name - expected improvement (i.e. the 2nd equation).
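For reference, the name-based definition alluded to above can be written as follows. This is a sketch in assumed notation: h_{t+1} is the model's value at x (as in the article's text), f(x^+) is the best value observed so far, and mu_t, sigma_t are the GP posterior mean and standard deviation; the second line is the standard Gaussian-posterior expression for maximization without an exploration offset, not necessarily the exact form the article prints.

```latex
\mathrm{EI}(x) = \mathbb{E}\left[\max\{0,\; h_{t+1}(x) - f(x^+)\}\right]

\mathrm{EI}(x) =
\begin{cases}
\left(\mu_t(x) - f(x^+)\right)\Phi(Z) + \sigma_t(x)\,\phi(Z), & \sigma_t(x) > 0,\\
0, & \sigma_t(x) = 0,
\end{cases}
\qquad
Z = \frac{\mu_t(x) - f(x^+)}{\sigma_t(x)}
```

Here \Phi and \phi denote the standard normal CDF and PDF. Starting from the first line makes the name self-explanatory and does not require reasoning about the unknown ground truth.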
Thompson Sampling
"It has a low overhead of setting up." - not sure why this is specifically pointed out in the case of TS. Is the setup overhead any lower than for the other acquisition functions?
The statement that "This will ensure an exploratory behaviour." is contradicted by the animation demonstration that follows. From that demo's figures, it would actually seem nearly impossible to reach the global minimum without refining the underlying GP model - there's not enough noise in the function distribution to adequately explore. However, the example is simply left without further comment.
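The concern can be illustrated with a toy sketch. It is not the article's GP: it assumes independent Gaussian beliefs at three hypothetical locations, and it holds the posterior fixed instead of updating it between rounds, purely to isolate how the amount of posterior noise affects which location Thompson Sampling picks.

```python
import random


def thompson_pick(means, sigmas, rng):
    """Draw one sample per location and return the index of the largest draw."""
    draws = [rng.gauss(m, s) for m, s in zip(means, sigmas)]
    return max(range(len(draws)), key=draws.__getitem__)


def pick_counts(means, sigmas, n_rounds=1000, seed=0):
    """Count how often each location is picked over repeated draws."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_rounds):
        counts[thompson_pick(means, sigmas, rng)] += 1
    return counts


# Hypothetical posterior means; suppose the true optimum sits at index 0,
# where the (misspecified) model currently predicts a low value.
means = [0.2, 1.0, 0.4]

confident = pick_counts(means, [0.05, 0.05, 0.05])  # overconfident model
uncertain = pick_counts(means, [1.0, 1.0, 1.0])     # wide posterior

print("picks with narrow posterior:", confident)
print("picks with wide posterior:  ", uncertain)
```

When the posterior is narrow, nearly every draw selects the location with the highest mean, so sampling alone does not guarantee exploration; only the wide posterior spreads picks across locations.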
Hyperparameter Tuning - Axis Labels
Using the horizontal label "# of Hyper-Parameters Tested" is confusing, since it doesn't really refer to the # of hyper-parameters tested, but rather to the # of values that have been evaluated.
Hyperparameter Tuning - Changing colormap scale makes it impossible to track the function evolution
The colormaps should probably not rescale with each iteration - it makes it very difficult to track the evolution of the acquisition function between frames.
As mentioned above, replacing all or most animations with a slider control would also improve the legibility of the figure.
Legend tweaks
- The legend positioning for the top "hero" plot looks buggy. "GT", "GP" and "\epsilon" are glued to the point without any spacing. The alignment looks very off.
- Not sure why "GT" is abbreviated when longer captions like "Acquisition function" are not.
- "Train points" -> "Training points"
- Given the legends are already really busy, "(Tie randomly broken)" would be better as a linked footnote.
Minor Writing Improvements
- "Older problem - Earlier in the active learning problem ... " can remove the preface and start with "In the active learning problem ..."
- "We can write a general form of an acquisition function ..." this sentence could be given more weight and made more explicit about stating that mu(x) models exploitation and sigma(x) represents the value of exploration. It's implied by the phrasing, but could be clearer.
- Don't nest parentheses within parentheses: "(of function values (gold in our case))"
- "We can obtain a closed form solution as below" - an expression in terms of the CDF is not usually considered "closed form" (https://en.wikipedia.org/wiki/Closed-form_expression); would just avoid using the phrase.
- " h_{t+1} is our GP posterior of the ground truth" - guessing this intends to refer to the "posterior mean", since it needs to be a function.
- "first vanilla acquisition function" - reference UCB directly instead of referring to it as "first vanilla acquisition function".
- "(try to find the global maxima that might be near this “best” location)" - this parenthetical remark is confusing and doesn't add to the statement.
- "easily" is used a lot throughout the article, and in almost all cases the sentence improves by the omission of this unnecessary subjective qualifier: "equation can be easily converted...", "One can easily change ...", "We can easily apply the BO for more dimensions", "... can easily be incorporated into BO." (2 times in the same sentence in the last example)
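As an illustration of the point about the general form of an acquisition function, here is a minimal sketch that assumes the UCB-style form alpha(x) = mu(x) + lambda * sigma(x). The candidate names and their posterior values are hypothetical and chosen only to show the two roles of the terms.

```python
def ucb(mu: float, sigma: float, lam: float) -> float:
    """UCB-style score: mu rewards exploitation, lam * sigma rewards exploration."""
    return mu + lam * sigma


# Hypothetical (posterior mean, posterior std) at two candidate locations.
candidates = {
    "exploit": (1.0, 0.1),  # high predicted value, little uncertainty
    "explore": (0.6, 0.8),  # lower predicted value, large uncertainty
}


def best_candidate(lam: float) -> str:
    """Candidate with the highest score for a given exploration weight."""
    return max(candidates, key=lambda name: ucb(*candidates[name], lam))


for lam in (0.0, 1.0):
    print(f"lambda={lam}: choose '{best_candidate(lam)}'")
```

With lambda = 0 the score reduces to the posterior mean, so the high-mean point is chosen; increasing lambda lets the uncertainty term dominate and the choice moves to the uncertain point.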
Concluding Comments
Bayesian optimization and active learning aren't particularly popular to write about currently. I also suspect there's quite a bit of interest in the topic, particularly in industry and applied machine learning contexts.
Given that, this article does contribute to a notable gap in the research distillation space. However, I think more work needs to be put into this manuscript to raise the quality of communication to be comparable to other Distill articles.
Distill employs a reviewer worksheet as a help for reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results
| Advancing the Dialogue | Score |
|---|---|
| How significant are these contributions? | 3/5 |

| Outstanding Communication | Score |
|---|---|
| Article Structure | 2/5 |
| Writing Style | 3/5 |
| Diagram & Interface Style | 3/5 |
| Impact of diagrams / interfaces / tools for thought? | 3/5 |
| Readability | 3/5 |

| Scientific Correctness & Integrity | Score |
|---|---|
| Are claims in the article well supported? | 3/5 |
| Does the article critically evaluate its limitations? How easily would a lay person understand them? | 3/5 |
| How easy would it be to replicate (or falsify) the results? | 4/5 |
| Does the article cite relevant work? | 4/5 |
| Does the article exhibit strong intellectual honesty and scientific hygiene? | 3/5 |