
Comments (5)

jonhue commented on July 17, 2024

I would also be very curious to hear how this file can be reproduced. I'm assuming that https://github.com/AkariAsai/self-rag/tree/main/data_creation/generator summarizes the process, but the README seems a bit outdated and there is no reference to where the initial input file comes from. I would really appreciate a pointer to a place that has more guidance on this!

from self-rag.

AkariAsai commented on July 17, 2024

@Gera001 Hi, could you tell me which eval task's data you would like to learn the creation process for?
@jonhue Thanks for the question! I guess you are interested in the training data creation? Our initial instruction-tuning data comes from processed open-instruct data, as well as KILT and some other knowledge-intensive task data that we processed manually. I can upload the source data without any Self-RAG processing if that helps! I apologize that the README for the training data creation is outdated... The training and evaluation were mostly done by me, I am the only one who maintains this repository, and I've been busy with teaching duties and other projects as a Ph.D. student...
I hope I can make more time to clean up and nicely package the code base before ICLR. Sorry for the inconvenience!


jonhue commented on July 17, 2024

Thank you for your answer @AkariAsai! I totally understand the time constraints, and really appreciate that you are trying to make this project easy to reproduce! I myself am interested in trying out some different retrievers, so if you are able to share the source data as it was before the retriever was run to retrieve similar passages, it would be very helpful! Thank you :)


innovation64 commented on July 17, 2024

@AkariAsai Hi, I would like to know how to reproduce the contents of eval_data/popqa_longtail.jsonl. Using your retriever, I can't reproduce the same top-20 documents. The passages (the TSV file) and the embedding files are all the same, but I can't reproduce the same results. In your eval_data, for example popqa_longtail.jsonl, the top-1 document is a correct article for about 0.49035025017869904 of the questions, and the top 20 contain a correct article for about 0.6426018584703359. With the embeddings and TSV file you provide, I only get 0.23802716225875625 and 0.3173695496783417. This seems unreasonable; why would this happen?
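For anyone comparing numbers like these, here is a minimal sketch of one way such a top-k hit rate could be computed. It is not the repository's evaluation code. It assumes each JSONL record has an `answers` list and a ranked `ctxs` list whose entries carry a `text` field, and it counts a passage as correct when any answer string appears in it (case-insensitive substring match). The field names, the matching rule, and the helper names `has_answer`, `recall_at_k`, and `load_jsonl` are all assumptions made for illustration.

```python
import json
from typing import Iterable


def has_answer(passage_text: str, answers: Iterable[str]) -> bool:
    """Return True if any answer string occurs in the passage (case-insensitive)."""
    text = passage_text.lower()
    return any(ans.lower() in text for ans in answers)


def recall_at_k(records: list[dict], k: int) -> float:
    """Fraction of questions whose top-k passages contain at least one answer."""
    if not records:
        return 0.0
    hits = 0
    for rec in records:
        # Assumed layout: "ctxs" is ranked best-first, each entry has a "text" field.
        top_k = rec["ctxs"][:k]
        if any(has_answer(ctx["text"], rec["answers"]) for ctx in top_k):
            hits += 1
    return hits / len(records)


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `recall_at_k(load_jsonl("eval_data/popqa_longtail.jsonl"), 1)` and again with `k=20`, once on the released file and once on the locally retrieved output, would give comparable numbers, provided both files actually follow the assumed layout. A different matching rule (for example exact title match) would give different values, so the rule should be the same on both sides of the comparison.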


innovation64 commented on July 17, 2024

I would also like to know about popqa_longtail_w_gs.jsonl: what kind of manual processing did you do to create it? @AkariAsai, could you share the details? Thanks.

