
Comments (5)

mohdsanadzakirizvi commented on August 27, 2024

The following is my experiment script. I am using only prompt_set 1 and skipping test evals for quicker prototyping:

#!/usr/bin/env bash

# Expected command line argument values.
valid_systems=("ircot" "ircot_qa" "oner" "oner_qa" "nor_qa")
valid_models=("codex" "flan-t5-xxl" "flan-t5-xl" "flan-t5-large" "flan-t5-base" "none")
valid_datasets=("hotpotqa" "2wikimultihopqa" "musique" "iirc")

# Function to check if an argument is valid
check_argument() {
    local arg="$1"
    local position="$2"
    local valid_values=("${!3}")
    if ! [[ " ${valid_values[*]} " =~ " $arg " ]]; then
        echo "argument number $position is not a valid. Please provide one of: ${valid_values[*]}"
        exit 1
    fi

    if [[ $position -eq 2 && $arg == "none" && $1 != "oner" ]]; then
        echo "The model argument can only be 'none' only if the system argument is 'oner'."
        exit 1
    fi
}

# Check the number of arguments
if [[ $# -ne 3 ]]; then
    echo "Error: Invalid number of arguments. Expected format: ./reproduce.sh SYSTEM MODEL DATASET"
    exit 1
fi

# Check the validity of arguments
check_argument "$1" 1 valid_systems[*]
check_argument "$2" 2 valid_models[*]
check_argument "$3" 3 valid_datasets[*]

# The model can be 'none' only when the system is 'oner'. This is checked here against the
# script's positional arguments; inside check_argument, $1 would refer to the function's own
# first argument rather than the system.
if [[ "$2" == "none" && "$1" != "oner" ]]; then
    echo "The model argument can be 'none' only if the system argument is 'oner'."
    exit 1
fi

echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1

echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 predict --prompt_set 1

echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py $1 $2 $3 summarize --prompt_set 1

echo ">>>> Pick the best HP and save the config with that HP. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1 --best
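
For reference, an example invocation with placeholder argument values taken from the valid lists above:

./reproduce.sh ircot_qa flan-t5-xl hotpotqa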


mohdsanadzakirizvi commented on August 27, 2024

@HarshTrivedi did you get a chance to look at this issue? Any help is greatly appreciated!


HarshTrivedi commented on August 27, 2024

@mohdsanadzakirizvi Sorry for the late response.

What you are doing seems correct. But to see whether the prompt is affected or not, you should put a breakpoint/print statement elsewhere. Put a breakpoint here and ensure that self.prompt is what you expect it to be.
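
If it helps to locate where self.prompt is set before adding that breakpoint/print, a plain repo-wide search is enough (run from the repo root; nothing below assumes anything beyond the self.prompt attribute mentioned above):

# Find every place self.prompt is assigned or read, to decide where the breakpoint/print should go.
grep -rn "self\.prompt" --include="*.py" .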

I left some notes about how to navigate the code/flow better here some time ago. They should help you figure out where the relevant code is based on the config.

Lastly, note that the command python runner.py $1 $2 $3 write --prompt_set 1 is responsible for instantiating the base config with HPs and dumping them in instantiated_configs/. So if you make any change in the base config, make sure the write command is called again. If the above doesn't work as expected, a good sanity check is to see whether the instantiated configs reflect the change you made in the base config.
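
As a rough sketch of that sanity check (assuming the base-config change introduces some distinctive text; "my new instruction" below is just a placeholder for it):

# After re-running the write step, confirm the dumped configs picked up the base-config change.
grep -rl "my new instruction" instantiated_configs/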


mohdsanadzakirizvi commented on August 27, 2024

Thanks for your response! Another question I had: how do you pick the dev and eval sets? I have seen the subsampled files in the folders, but where do you read them? It seems to me that inference/ircot.py's "StepByStepCOT..." only gets executed when I call predict on eval:

python runner.py $1 $2 $3 predict --prompt_set 1 --best --eval_test --official

Which doesn't make sense. Shouldn't it also run during predict on dev? If so, why am I not getting any print in the output? Are we not making an LLM prediction on the dev set unless eval is involved?


HarshTrivedi commented on August 27, 2024

Just to clarify: if you leave out --best --eval_test in that command, it'll run prediction for all instantiated configs on the dev set. The --best flag refers to the best HP on the dev set, and --eval_test refers to the test set (as opposed to the default dev set). The --official flag uses the officially released eval scripts of the respective datasets (instead of the internally developed ones).
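
For concreteness, the two prediction modes described above look like this (the ircot_qa flan-t5-xl hotpotqa values are just placeholders for SYSTEM MODEL DATASET):

# Dev-set prediction over all instantiated HP configs (what the script above runs):
python runner.py ircot_qa flan-t5-xl hotpotqa predict --prompt_set 1

# Test-set prediction with the best dev HP and the official eval scripts:
python runner.py ircot_qa flan-t5-xl hotpotqa predict --prompt_set 1 --best --eval_test --official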

It seems you already know this. So if you drop these additional flags and still don't hit any log/breakpoint in StepByStepCOTGen, it might mean that there are no HPs set up to run.

Note that runner.py is a wrapper around run.py for running a batch of experiments; whenever you run runner.py, it prints out the run.py commands it runs. run.py is in turn a wrapper around a collection of scripts like predict.py, evaluate.py, etc., and it also prints out the exact script it calls, which for the predict command will be something like python predict.py {file_path} {evaluation_path} all the way down. For debugging purposes, going down to that lowest level is probably easier.
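
One simple way to get to that lowest level is to capture what runner.py prints and pull out the underlying commands. The argument values and the log-file name below are illustrative, and this assumes the emitted commands go to stdout/stderr:

python runner.py ircot_qa flan-t5-xl hotpotqa predict --prompt_set 1 2>&1 | tee runner_output.log
grep -E "python (run|predict)\.py" runner_output.log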

