We explore the cognitive capabilities of OpenAI's GPT models. Our work builds on "Using cognitive psychology to understand GPT-3" (Binz & Schulz, 2023), which assessed GPT-3 using canonical experiments from cognitive psychology. Two kinds of investigations were conducted: vignette-based experiments, which present the model with short, predetermined descriptions of hypothetical scenarios, and task-based experiments, which generate scenarios programmatically on a trial-by-trial basis. In each investigation, GPT-3 was assessed in four well-known domains: decision-making, information search, deliberation, and causal reasoning. GPT-3 achieved 50% accuracy on the vignette-based experiments, showed near-human performance on the task-based experiments, and even exhibited signatures of model-based reinforcement learning.
Do newer large language models engage with the world more effectively? With the advent of ChatGPT, GPT-4, and prompt engineering, we are interested in whether these models exhibit improved cognitive abilities.
We replicated the vignette-based and task-based experiments that Binz & Schulz (2023) ran on GPT-3. For the vignette-based experiments, we also extended the evaluation to GPT-3.5 and GPT-4 and designed our own adversarial vignettes. For both sets of experiments, we applied prompt engineering.
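As a rough sketch of how such an evaluation might be driven, the snippet below assembles a vignette prompt and sends it to a chat model through the OpenAI Python SDK. The `build_prompt` and `ask_vignette` helpers, the placeholder vignette, and the default model name are our illustrative assumptions, not the exact setup used in the experiments.

```python
def build_prompt(vignette: str, question: str) -> str:
    """Combine a fixed vignette with its question into a single prompt."""
    return f"{vignette}\n\nQ: {question}\nA:"

def ask_vignette(vignette: str, question: str, model: str = "gpt-4") -> str:
    """Send a vignette prompt to a chat model and return the answer text.

    Hypothetical helper; requires the `openai` package and an
    OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI  # deferred so prompt helpers work without the SDK
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic answers for evaluation
        messages=[{"role": "user",
                   "content": build_prompt(vignette, question)}],
    )
    return response.choices[0].message.content

# Example call (placeholder vignette, shown here without executing):
# answer = ask_vignette(
#     "Linda is 31, single, outspoken, and very bright...",
#     "Which is more probable: (a) Linda is a bank teller, or "
#     "(b) Linda is a bank teller and is active in the feminist movement?",
# )
```

Prompt engineering in this setting amounts to varying how `build_prompt` frames the scenario (e.g., adding instructions or answer formats) while keeping the underlying vignette fixed.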
Please refer to Final Report.pdf in the Final Report folder for the full paper.