Comments (4)
I have met the same problem as you. Change line 313 in self-rag/retrieval_lm/run_short_form.py; I think the author made a mistake in this code:
def generate(prompt, evidences, max_new_tokens):
    return call_model_rerank_w_scores_batch(
        prompt, evidences=evidences, model=model, max_new_tokens=max_new_tokens,
        rel_tokens=rel_tokens, ret_tokens=ret_tokens, grd_tokens=grd_tokens,
        ut_tokens=ut_tokens, threshold=args.threshold, use_seqscore=args.use_seqscore,
        w_rel=args.w_rel, w_sup=args.w_sup, w_use=args.w_use,
        mode=args.mode, closed=args.task in ["fever", "arc_c"])
@fate-ubw I have done the same thing, but so far I am only able to run short_form in always_retrieve mode; the other modes throw errors. Did you get it to run?
I have some issues reproducing the paper numbers. While the Self-RAG numbers are in line, I get strange values for Llama-2 7B: a very low value for PUB (0) and a very high value for ARC (0.91).
Thank you so much for reporting! I was changing the codebase before releasing and it seems I forgot to fix the variable name. I will fix it.
@carlosandrea Would you mind sharing your exact evaluation command? I can help debugging. I haven't seen that issue on my side, so more info would help me dig into it!
I fixed the beam_search argument in the script. Thanks again for reporting the issue!
@carlosandrea could you create a separate issue for the Llama-2 performance, and include the command you used? One possible reason: in some previous issues, people got strange numbers when using a script written for Self-RAG to evaluate baselines. Self-RAG embeds retrieved context in a different way from the other baselines, and some models show extremely low performance when the context is not given in front of the prompt.
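To illustrate the point above, here is a minimal sketch of the two prompt layouts. The function names and exact templates are illustrative assumptions, not the repo's actual code: the idea is only that baselines expect passages prepended before the question, while Self-RAG embeds each passage inline after a retrieval token.

```python
def build_baseline_prompt(question, passages):
    # Baselines typically expect retrieved passages *in front of* the
    # question; feeding them in another position can tank accuracy.
    context = "\n".join(passages)
    return f"{context}\n\nQuestion: {question}\nAnswer:"

def build_selfrag_prompt(question, passage):
    # Self-RAG instead wraps a passage in <paragraph> tags after a
    # [Retrieval] token, so its prompts are not interchangeable with
    # the baseline format above.
    return f"{question}[Retrieval]<paragraph>{passage}</paragraph>"
```

Evaluating a baseline model with the second layout (or vice versa) is one plausible cause of the anomalous PUB/ARC numbers.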