madaan / self-refine
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
Home Page: https://selfrefine.info
License: Apache License 2.0
Hi,
Thank you for your fantastic work!
It seems like the instructions for conducting PIE evaluation are absent. Would you be able to provide instructions on how to use the pie_eval.py script? I'm particularly uncertain about the process of obtaining the .report file. Thanks!
Will you release the CommonGen-Hard dataset soon? It looks like it has 20-30 concepts for each sentence. Very curious about this dataset.
Hello. For code-related tasks, do you plan to update your code to replace the Codex with another model? Do you have a suggestion for the alternative model & do you plan to push the updated code?
Would you please provide the instructions for evaluating model responses in the Dialogue Response Generation task?
I want to use Self-Refine for reasoning tasks, such as open-book QA.
Regarding the few-shot examples for the initial generation: do the examples have to be bad ones?
If I have good examples, could I use them for the initial stage and hope that, through iterations, the output gets even better?
However, if I were to use already-good examples, wouldn't it be tough to come up with even better ones for the few-shot examples in the refine stage?
Hi All,
Thank you for your lovely work.
There is no run.py in the PIE folder.
Thank you
Great work! We are currently trying to reproduce your results so that we can build on top of your insights. I see that you are still working on this codebase. Are there already some benchmarks that are fully implemented in this repository and that you do not intend to develop further in the near future? When I run the different benchmarks I sometimes run into errors, and I am not sure whether the error is on my side or just because some functionality is missing.
For example: when I run the CommonGen benchmark on a reduced test set, I observe a lot of errors in the output file. The main source of errors seems to be that the feedback from GPT does not have the intended structure, which causes exceptions in the code. I did not change any of the training prompts/instruction prompts you provide in this file, and I use "gpt-3.5-turbo". Did you also observe this behaviour?
I also noticed two other things that I thought I should notify you about:
Thanks a lot!
Could this repo get an MIT or Apache license to fully free up anyone to take and adapt work found here and innovate further?
Hello, I am really interested in your fine work and am trying to reproduce the results!
Can you share the prompts and examples for the code optimization?
I am having a hard time reproducing the code optimization results.
I have an additional question regarding code optimization in your paper. Specifically, I'm interested in how you calculated the percentage of programs that were optimized. When reproducing the results, I noticed that some programs performed worse, while others actually showed improvement.
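For reference, one plausible way to count this is below. This is my own sketch, not necessarily the paper's exact metric; the 1.1x speedup threshold and the function name `percent_optimized` are illustrative assumptions.

```python
def percent_optimized(before, after, min_speedup=1.1):
    """Percentage of programs whose refined version runs at least
    `min_speedup` times faster.

    `before` and `after` are lists of runtimes in seconds, paired by index.
    The 1.1x threshold is an illustrative assumption, not the paper's value.
    """
    assert len(before) == len(after), "runtime lists must be paired"
    optimized = sum(
        1 for b, a in zip(before, after) if a > 0 and b / a >= min_speedup
    )
    return 100.0 * optimized / len(before)
```

Under a definition like this, a program whose refined runtime regressed simply does not count toward the numerator, which would explain seeing both regressions and improvements while still reporting a single "percent optimized" figure.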
Was there an attempt to test this library with the LLaMA 2 model?
In the Yelp benchmark, the file for the task_measure is missing, i.e., the class SentimentTransferMeasurement can't be found. Can you upload this file? Thanks!
There is never any feedback.
Hello! First of all this is a super nice paper.
I am trying to wrap my head around the concept of the paper. What I don't understand is this:
No matter what the output from the LM is, the LM is prompted again with the same question and the generated code/text (code + comments) until the LM itself says "it is correct", up to a maximum of max_attempts per question?
The paper reports improvements over 5 iterations, so if the model outputs "it is correct", is the same output used for the next iteration? Just want to make sure I understood this correctly.
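The loop as I understand it can be sketched as follows. This is a minimal sketch of the control flow only; `generate`, `get_feedback`, and `refine` are hypothetical stand-ins for the actual LM calls in the repo.

```python
def self_refine(question, generate, get_feedback, refine, max_attempts=5):
    """Iteratively refine an answer until the feedback says it is correct.

    `generate`, `get_feedback`, and `refine` are placeholder callables for
    the LM prompts; this only illustrates the stopping behaviour asked
    about above.
    """
    output = generate(question)
    for _ in range(max_attempts):
        feedback = get_feedback(question, output)
        if "it is correct" in feedback.lower():
            # Stop early: the last output is carried forward unchanged
            # for any remaining iterations.
            break
        output = refine(question, output, feedback)
    return output
```

Under this reading, once the model declares the output correct, refinement stops and that same output is what gets reported for all later iterations.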
Amazing work! We are interested in building on this work. Are there any plans for releasing the long-form sentiment reversal Yelp dataset used in this work? Thanks!
How can I get an ANTHROPIC_API_KEY?
Hello! I'm running python -u src/gsm/run.py
with "gpt-3.5-turbo" and got the following error. This happens because "def solution():" is not in entire_output in feedback.py.
1%|▏ | 8/1319 [03:04<8:42:28, 23.91s/it]
An error occurred: list index out of range. Traceback (most recent call last):
File "/home/ubuntu/code/hideodeo/self-refine/src/utils.py", line 39, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/code/hideodeo/self-refine/src/gsm/run.py", line 40, in iterative_gsm
fb_and_maybe_soln = task_feedback(solution=solution)
File "/home/ubuntu/code/hideodeo/self-refine/src/gsm/feedback.py", line 42, in __call__
solution = entire_output.split("def solution():")[1]
IndexError: list index out of range
. Left retries: 2.
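One way to avoid the crash is to guard the split before indexing into it. This is a defensive sketch, not the repo's actual fix; the helper name `extract_solution` and the fallback behaviour are my assumptions.

```python
def extract_solution(entire_output: str) -> str:
    """Return the code after 'def solution():', or the raw output if the
    marker is absent (e.g. when the model replies in free-form text).

    Guards against the IndexError raised at feedback.py line 42 when
    entire_output.split("def solution():")[1] is indexed unconditionally.
    """
    marker = "def solution():"
    parts = entire_output.split(marker, 1)
    if len(parts) < 2:
        # Model did not produce the expected function header;
        # fall back to the raw output instead of crashing.
        return entire_output
    return marker + parts[1]
```

With a guard like this, runs where gpt-3.5-turbo answers in prose instead of a `solution()` function would be skipped or retried gracefully rather than aborting the loop.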