I ran the source code with gpt-3.5-turbo. While the tree is being expanded, after generate_program executes, out['next_step'] sometimes contains explanatory prose around the actual step, for example: 'Since the input consists of only two numbers, there are no possible forbidden steps. To obtain 24, we can use the following step: 20 - 4 = 16'. The regular expression "[^0-9+\-*/.(),=\s]" then matches the unrelated text in that string, so the next check jumps to continue and the correct branch 20 - 4 = 16 is ignored.
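To make the problem easier to reproduce, here is a minimal sketch of the behaviour described above. The pattern is the one quoted in this report; the way it is applied here (a plain `re.search`) is my assumption about the check, and `extract_step` is a hypothetical helper that does not exist in the repository. It only illustrates one possible workaround: pulling the last "a op b = c" expression out of a verbose reply before validating it.

```python
import re

# Pattern quoted in the report: matches any character that is NOT a digit,
# an arithmetic operator, a parenthesis, '.', ',', '=', or whitespace.
INVALID_CHARS = re.compile(r"[^0-9+\-*/.(),=\s]")

# Hypothetical pattern for a single step of the form "a op b = c".
NUMBER = r"-?\d+(?:\.\d+)?"
STEP = re.compile(NUMBER + r"\s*[+\-*/]\s*" + NUMBER + r"\s*=\s*" + NUMBER)


def extract_step(text):
    """Return the last 'a op b = c' expression in text, or None if absent."""
    matches = STEP.findall(text)
    return matches[-1] if matches else None


verbose = (
    "Since the input consists of only two numbers, there are no possible "
    "forbidden steps. To obtain 24, we can use the following step: 20 - 4 = 16"
)

# The prose triggers the "invalid character" pattern, a clean step does not.
print(INVALID_CHARS.search(verbose) is not None)
print(INVALID_CHARS.search("20 - 4 = 16") is None)

# Extracting the step first leaves only the arithmetic expression.
print(extract_step(verbose))
```

With something like this, the validity check could be run on the extracted step instead of the raw reply, so a correct branch wrapped in prose is not dropped. Tightening the prompt so the model returns only the step would be an alternative.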
Thanks for your excellent work! May I ask a few questions about your experiments on the MATH dataset?
Which parts of your prompt correspond to the Proposer, Verifier, and Reporter, respectively? It looks like the hints, intermediate questions and answers, and final answer are all part of "proposing", but I cannot find anything about "verifying" or "reporting". Did I miss anything?
Is the evaluation also done by GPT-4, as this function suggests? How well does it correlate with oracle judgements?