
Comments (5)

mohdsanadzakirizvi commented on August 27, 2024

The following is my experiment script. I am using only prompt_set 1 and skipping test evals for quicker prototyping:

#!/usr/bin/env bash

# Expected command line argument values.
valid_systems=("ircot" "ircot_qa" "oner" "oner_qa" "nor_qa")
valid_models=("codex" "flan-t5-xxl" "flan-t5-xl" "flan-t5-large" "flan-t5-base" "none")
valid_datasets=("hotpotqa" "2wikimultihopqa" "musique" "iirc")

# Function to check if an argument is valid
check_argument() {
    local arg="$1"
    local position="$2"
    local valid_values=("${!3}")
    if ! [[ " ${valid_values[*]} " =~ " $arg " ]]; then
        echo "argument number $position is not a valid. Please provide one of: ${valid_values[*]}"
        exit 1
    fi

    if [[ $position -eq 2 && $arg == "none" && $1 != "oner" ]]; then
        echo "The model argument can only be 'none' only if the system argument is 'oner'."
        exit 1
    fi
}

# Check the number of arguments
if [[ $# -ne 3 ]]; then
    echo "Error: Invalid number of arguments. Expected format: ./reproduce.sh SYSTEM MODEL DATASET"
    exit 1
fi

# Check the validity of arguments
check_argument "$1" 1 valid_systems[*]
check_argument "$2" 2 valid_models[*]
check_argument "$3" 3 valid_datasets[*]

# The model can be 'none' only when the system is 'oner'. This is checked here against the
# script's positional arguments; inside check_argument, $1 would refer to the function's own
# first argument rather than the system.
if [[ "$2" == "none" && "$1" != "oner" ]]; then
    echo "The model argument can be 'none' only if the system argument is 'oner'."
    exit 1
fi

echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1

echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 predict --prompt_set 1

echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py $1 $2 $3 summarize --prompt_set 1

echo ">>>> Pick the best HP and save the config with that HP. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1 --best
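
For reference, an example invocation with placeholder argument values taken from the valid lists above:

./reproduce.sh ircot_qa flan-t5-xl hotpotqa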


mohdsanadzakirizvi commented on August 27, 2024

@HarshTrivedi did you get a chance to look at this issue? Any help is greatly appreciated!


HarshTrivedi commented on August 27, 2024

@mohdsanadzakirizvi Sorry for the late response.

What you are doing seems correct. But to see whether the prompt is affected or not, you should put a breakpoint/print statement elsewhere. Put a breakpoint here and ensure that self.prompt is what you expect it to be.
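
If it helps to locate where self.prompt is set before adding that breakpoint/print, a plain repo-wide search is enough (run from the repo root; nothing below assumes anything beyond the self.prompt attribute mentioned above):

# Find every place self.prompt is assigned or read, to decide where the breakpoint/print should go.
grep -rn "self\.prompt" --include="*.py" .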

I left some notes about how to navigate the code/flow better here some time ago. They should help you figure out where the relevant code is based on the config.

Lastly, note that the command python runner.py $1 $2 $3 write --prompt_set 1 is responsible for instantiating the base config with HPs and dumping them in instantiated_configs/. So if you make any change in the base config, make sure the write command is called again. If the above doesn't work as expected, a good sanity check is to see whether the instantiated configs reflect the change you made in the base config.
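
As a rough sketch of that sanity check (assuming the base-config change introduces some distinctive text; "my new instruction" below is just a placeholder for it):

# After re-running the write step, confirm the dumped configs picked up the base-config change.
grep -rl "my new instruction" instantiated_configs/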


mohdsanadzakirizvi commented on August 27, 2024

Thanks for your response! Another question I had: how do you pick the dev and eval sets? I have seen the subsampled files in the folders, but where do you read them? It seems to me that inference/ircot.py's "StepByStepCOT..." only gets executed when I call predict on eval:

python runner.py $1 $2 $3 predict --prompt_set 1 --best --eval_test --official

Which doesn't make sense. Shouldn't it also run during predict on dev? If so, why am I not getting any print in the output? Are we not making an LLM prediction on the dev set unless eval is involved?


HarshTrivedi commented on August 27, 2024

Just to clarify: if you leave out --best --eval_test in that command, it'll run prediction for all instantiated configs on the dev set. The --best flag refers to the best HP on the dev set, and --eval_test refers to the test set (as opposed to the default dev set). The --official flag uses the officially released eval scripts of the respective datasets (instead of the internally developed ones).
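
For concreteness, the two prediction modes described above look like this (the ircot_qa flan-t5-xl hotpotqa values are just placeholders for SYSTEM MODEL DATASET):

# Dev-set prediction over all instantiated HP configs (what the script above runs):
python runner.py ircot_qa flan-t5-xl hotpotqa predict --prompt_set 1

# Test-set prediction with the best dev HP and the official eval scripts:
python runner.py ircot_qa flan-t5-xl hotpotqa predict --prompt_set 1 --best --eval_test --official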

It seems you already know this. So if you drop these additional flags and still don't hit any log/breakpoint in StepByStepCOTGen, it might mean that there are no HPs set up to run.

Note that runner.py is a wrapper around run.py for running a batch of experiments; whenever you run runner.py, it prints out the run.py commands it runs. run.py is in turn a wrapper around a collection of scripts like predict.py, evaluate.py, etc., and it also prints out the exact script it calls, which for the predict command will be something like python predict.py {file_path} {evaluation_path} all the way down. For debugging purposes, going down to that lowest level is probably easier.
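
One simple way to get to that lowest level is to capture what runner.py prints and pull out the underlying commands. The argument values and the log-file name below are illustrative, and this assumes the emitted commands go to stdout/stderr:

python runner.py ircot_qa flan-t5-xl hotpotqa predict --prompt_set 1 2>&1 | tee runner_output.log
grep -E "python (run|predict)\.py" runner_output.log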

