
Awesome-LLM-in-Social-Science

Below we compile awesome papers that

  • evaluate Large Language Models (LLMs) from the perspective of Social Science.
  • align LLMs from the perspective of Social Science.
  • employ LLMs to create simulation environments, facilitating research or addressing issues in diverse fields of Social Science.
  • contribute surveys or perspectives on the above topics.

Evaluation, alignment, and simulation are by no means orthogonal; for example, many evaluations rely on simulated interactions. We categorize papers according to their primary focus.

Welcome to contribute and discuss!

1. 📚 Survey

  • The Rise and Potential of Large Language Model Based Agents: A Survey, 2023, [paper], [repo].
  • A Survey on Large Language Model based Autonomous Agents, 2023, [paper], [repo].
  • AI Alignment: A Comprehensive Survey, 2023.11, [paper], [website].
  • Aligning Large Language Models with Human: A Survey, 2023, [paper], [repo].
  • Large Language Model Alignment: A Survey, 2023, [paper].
  • Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives, 2023.12, [paper].
  • A Survey on Evaluation of Large Language Models, 2023.07, [paper], [repo].
  • From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models, 2023.08, [paper], [repo].
  • Large Language Model based Multi-Agents: A Survey of Progress and Challenges, 2024.01, [paper], [repo].

2. 🔎 Evaluation

2.1. ❤️ Value

  • Heterogeneous Value Evaluation for Large Language Models, 2023.03, [paper], [code].

    TL;DR: This paper introduces the A2EHV method to assess how well LLMs align with a range of human values categorized under the Social Value Orientation (SVO) framework.

  • Measuring Value Understanding in Language Models through Discriminator-Critique Gap, 2023.10, [paper].

    TL;DR: This paper introduces the Value Understanding Measurement (VUM) framework to quantitatively assess an LLM's understanding of values. This is done by measuring the discriminator-critique gap (DCG), which evaluates both the model's knowledge of values ("know what") and the reasoning behind this knowledge ("know why").

  • Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values, 2023.11, [paper].

  • Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties, AAAI24, [paper], [code].

2.2. 🩷 Personality

  • Who is GPT-3? An Exploration of Personality, Values and Demographics, 2022.09, [paper]

  • [BFI] Identifying and Manipulating the Personality Traits of Language Models, 2022.12, [paper]

  • [BFI] Evaluating and Inducing Personality in Pre-trained Language Models, NeurIPS 2023 (spotlight), [paper]

  • [BFI] Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs, 2023.05, [paper]

  • [BFI] Personality Traits in Large Language Models, 2023.07, [paper]

  • [BFI] Revisiting the Reliability of Psychological Scales on Large Language Models, 2023.05, [paper]

  • [BFI] Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation, ACL 2023 workshop, [paper]

  • [BFI] AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories, Journal, 2024.01, [paper]

  • Does GPT-3 Demonstrate Psychopathy? Evaluating Large Language Models from a Psychological Perspective, 2022.12, [paper]

  • Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots, 2023.10, [paper]

  • [MBTI] Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models, 2023.07, [paper]

  • [MBTI] Can ChatGPT Assess Human Personalities? A General Evaluation Framework, 2023.03, EMNLP 2023, [paper], [code].

    TL;DR: (1) Uses an LLM to evaluate the MBTI of different groups of people via prompt engineering. (2) Reduces option-order bias by averaging over randomly permuted options. (3) Converts the original subject of the question statements into a target subject (e.g., men, barbers). (4) Asks the LLM "is it right/wrong" instead of "do you agree/disagree". (5) Proposes metrics to evaluate consistency, robustness, and fairness. (A minimal sketch of the option-permutation idea appears after this list.)

  • [MBTI] Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models, 2024.01, [paper]

  • Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench, ICLR 2024, [paper], [code]

    TL;DR: (1) Uses 13 psychometric scales. (2) Directly prompts LLMs to generate numeric ratings. (3) Discusses reliability and validity.
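
To make the questionnaire-style protocols above concrete (in particular the option-permutation trick from the EMNLP 2023 evaluation framework), here is a minimal, hypothetical sketch; `query_llm`, the prompt template, and the right/wrong option set are illustrative assumptions, not code from any of these papers.

```python
import random
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call; plug in your client."""
    raise NotImplementedError

OPTIONS = ["right", "wrong"]  # "is it right/wrong" framing, not "agree/disagree"

def debiased_judgment(statement: str, subject: str, n_permutations: int = 10) -> str:
    """Query the LLM about a statement rewritten for a target subject,
    majority-voting over randomly permuted answer options to wash out
    option-order bias."""
    votes = Counter()
    for _ in range(n_permutations):
        opts = random.sample(OPTIONS, len(OPTIONS))  # random option order each time
        prompt = (
            f'Consider this statement about {subject}: "{statement}"\n'
            f"Is it {opts[0]} or {opts[1]}? Answer with a single word."
        )
        answer = query_llm(prompt).strip().lower()
        if answer in OPTIONS:  # ignore malformed replies
            votes[answer] += 1
    return votes.most_common(1)[0][0]  # majority answer across permutations
```

The same pattern extends to Likert-style inventories such as the BFI by replacing the binary options with a numeric scale.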

2.3. 🔞 Morality

  • Aligning AI With Shared Human Values, 2020, [paper].

  • Exploring the psychology of GPT-4's Moral and Legal Reasoning, 2023.08, [paper].

    TL;DR: The paper investigates GPT-4's moral and legal reasoning compared to humans across several domains, using vignette-based studies. It reveals significant parallels and differences in GPT-4's responses, offering insights into its alignment with human moral judgments.

  • Probing the Moral Development of Large Language Models through Defining Issues Test

    TL;DR: The Defining Issues Test (DIT), based on Kohlberg's model of moral development, is used to evaluate the ethical reasoning abilities of LLMs. GPT-3 performs at the level of a random baseline, while GPT-4 achieves the highest moral development score, equivalent to that of graduate students.

  • Moral Foundations of Large Language Models, 2023.10, [paper].

  • Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity, 2023.06, [paper]

  • Evaluating the Moral Beliefs Encoded in LLMs, 2023.07, [paper]

2.4. 🎤 Opinion

  • More human than human: measuring ChatGPT political bias, 2023, [paper].

    TL;DR: This paper proposes empirical designs to measure political bias in ChatGPT, showing that ChatGPT exhibits a significant and systematic political bias toward the Democrats in the US, Lula in Brazil, and the Labour Party in the UK.

  • Towards Measuring the Representation of Subjective Global Opinions in Language Models, 2023.07, [paper], [website].

    TL;DR: This study explores how to quantitatively assess the representation of subjective global opinions in LLMs. It introduces a dataset from cross-national surveys to capture diverse global perspectives, and develops a metric to measure the similarity between LLM-generated responses and human responses conditioned on nationality, revealing biases and stereotypes in the model's responses.

2.5. 🧠 Ability

  • Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity, 2021, [paper].

  • Can Large Language Models Transform Computational Social Science?, 2023, [paper], [code].

    TL;DR: This document provides a roadmap for using LLMs as CSS tools, including prompting best practices and an evaluation pipeline. Evaluations show that LLMs can serve as zero-shot data annotators and assist with challenging creative generation tasks.

  • SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents, 2023, [paper], [code].

    TL;DR: The paper introduces SOTOPIA, a novel interactive environment for evaluating social intelligence in language agents through goal-driven social interactions. Experiments using SOTOPIA reveal gaps between SOTA models and human social intelligence, despite models showing some promising capabilities.

  • Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View, 2023, [paper], [code].

    TL;DR: This paper explores collaboration mechanisms among LLMs in a multi-agent system by drawing insights from social psychology. Multi-agent collaboration strategies are more important than scaling up single LLMs; fostering effective collaboration is key for more socially-aware AI.

  • Using large language models in psychology, 2023, [paper].

    TL;DR: This paper explores the potential applications and concerns of using LLMs in psychological research, and recommends investments in high-quality datasets, performance benchmarks, and infrastructure to enable responsible use of LLMs.

  • Playing repeated games with Large Language Models, 2023.05, [paper].

    TL;DR: This paper studies the cooperative and coordinated behavior of Large Language Models (LLMs) by letting them play repeated 2-player games. The key finding is that LLMs like GPT-4 perform well in competitive games but struggle to coordinate and alternate strategies in games requiring more cooperation.

  • Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods, 2023, [paper].

  • Using cognitive psychology to understand GPT-3, 2023.02, PNAS, [paper].

  • Large language models as a substitute for human experts in annotating political text, 2024.02, [paper].

3. ⛑️ Alignment

  • ValueNet: A New Dataset for Human Value Driven Dialogue System, AAAI 2022, [paper], [dataset].

  • Fine-tuning language models to find agreement among humans with diverse preferences, 2022, [paper].

    Keywords: consensus, fine-tuning, diverse preferences, alignment

    TL;DR: This work fine-tunes an LLM to generate statements that maximize expected approval from a group of people with potentially diverse opinions, especially on moral and political issues.

  • Training Socially Aligned Language Models in Simulated Human Society, 2023, [paper], [code].

    Keywords: Stable Alignment, social alignment, societal norms and values, simulated social interactions, contrastive supervised learning

    TL;DR: This paper presents a training paradigm that permits LMs to learn from simulated social interactions for their social alignment. The model trained under such a paradigm better handles “jailbreaking prompts”.

  • [Norm] Align on the Fly: Adapting Chatbot Behavior to Established Norms, 2023.12, [paper], [code].

    TL;DR: Uses retrieval-augmented generation (RAG) to align LLMs with dynamic, diverse human values such as social norms (see the sketch at the end of this section).

  • [MBTI] Machine Mindset: An MBTI Exploration of Large Language Models, 2023.12, [paper], [code].

    TL;DR: Trains LLMs toward a given MBTI type via instruction tuning and direct preference optimization (DPO).

  • Agent Alignment in Evolving Social Norms, 2024.01, [paper].
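
As a rough illustration of the retrieval-based idea behind "Align on the Fly" (see the TL;DR above), here is a minimal sketch of norm-conditioned generation; `embed`, `query_llm`, and the in-memory norm store are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence embedder; plug in a real model."""
    raise NotImplementedError

def query_llm(prompt: str) -> str:
    """Hypothetical LLM client."""
    raise NotImplementedError

def retrieve_norms(query: str, norm_store: list[tuple[str, np.ndarray]], k: int = 3) -> list[str]:
    """Return the k stored norms most similar (by cosine) to the query."""
    q = embed(query)
    def cosine(v: np.ndarray) -> float:
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    ranked = sorted(norm_store, key=lambda item: cosine(item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def norm_aligned_reply(user_msg: str, norm_store: list[tuple[str, np.ndarray]]) -> str:
    """Prepend retrieved norms so the chatbot's answer reflects them."""
    rules = "\n".join(f"- {n}" for n in retrieve_norms(user_msg, norm_store))
    prompt = f"Follow these established norms:\n{rules}\n\nUser: {user_msg}\nAssistant:"
    return query_llm(prompt)
```

Because the norms live in an external store rather than in the model weights, they can be edited at any time, which is what makes this family of approaches suited to dynamic values.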

4. 🚀 Simulation

  • Out of One, Many: Using Language Models to Simulate Human Samples, 2022, [paper].

    TL;DR: This work introduces "algorithmic fidelity" - the degree to which the relationships between ideas, attitudes, and contexts in a model mirror those in human groups. The authors propose four criteria for assessing algorithmic fidelity and demonstrate that GPT-3 exhibits a high degree of fidelity for modeling public opinion and political attitudes in the U.S.

  • Social Simulacra: Creating Populated Prototypes for Social Computing Systems, 2022, [paper].

    Keywords: social computing prototypes, social simulacra, LLMs, system design refinement

    TL;DR: This paper proposes Social Simulacra, a prototyping technique for social computing systems that mimics authentic social interactions among diverse community members, each with distinct behaviors such as posts, replies, and anti-social tendencies.

  • Generative Agents: Interactive Simulacra of Human Behavior, 2023, [paper], [code].

    Keywords: generative agents, sandbox environment, natural language communication, emergent social behaviors, Smallville

    TL;DR: This paper introduces generative agents and their architecture for memory storage, reflection, retrieval, etc. The agents produce believable individual and emergent social behaviors in an interactive sandbox environment.

  • Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies, 2023, [paper], [code].

    TL;DR: This paper presents a methodology for simulating Turing Experiments (TEs) and applies it to replicate well-established findings from economic, psycholinguistic, and social psychology experiments. The results show that larger language models provide more faithful simulations, except for a "hyper-accuracy distortion" (unhumanly high accuracy) present in some recent models.

  • Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?, 2023, [paper], [code].

    TL;DR: LLMs can be used the way economists use homo economicus. Experiments using LLMs yield results qualitatively similar to the original economic research, so it is promising to use LLMs to search for novel social science insights to test in the real world.

  • $S^3$: Social-network Simulation System with Large Language Model-Empowered Agents, 2023, [paper].

    Keywords: social network simulation, agent-based simulation, information/attitude/emotion propagation, user behavior modeling

    TL;DR: This paper introduces the Social-network Simulation System (S3) to simulate social networks via LLM-based agents. Evaluations on two real-world scenarios, gender discrimination and nuclear energy, show high accuracy in replicating individual attitudes, emotions, and behaviors, and successfully model the propagation of information, attitudes, and emotions at the population level.

  • Rethinking the Buyer’s Inspection Paradox in Information Markets with Language Agents, 2023, [paper].

    Keywords: buyer’s inspection paradox, information economics, information market, language model, agent

    TL;DR: This work explores the buyer's inspection paradox in a simulated information marketplace, highlighting enhanced decision-making and answer quality when agents temporarily access information before purchase.

  • SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series, 2023, [paper].

    Keywords: lifelong learning, human society analysis, hyperportfolio, time series investment, Analyst-Assistant-Actuator architecture, Hypothesis and Proof prompting

    TL;DR: The paper introduces SocioDojo, a new environment and hyperportfolio task for training lifelong agents to analyze and make decisions about human society, along with a novel Analyst-Assistant-Actuator architecture and Hypothesis & Proof prompting technique. Experiments show the proposed method achieves over 30% higher returns compared to state-of-the-art methods in the hyperportfolio task requiring societal understanding.

  • Humanoid Agents: Platform for Simulating Human-like Generative Agents, 2023, [paper], [code].

    Keywords: humanoid agents, generative agents, basic needs, emotions, relationships

    TL;DR: This paper proposes Humanoid Agents, a system that guides generative agents to behave more like humans by introducing dynamic elements that affect behavior - basic needs like hunger and rest, emotions, and relationship closeness.

  • When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm, 2023, [paper], [code].

    Keywords: user behavior analysis, user simulation, recommender system, profiling/memory/action module

    TL;DR: This work employs LLM-based agents (RecAgent) for user simulation in recommender systems. Experiments demonstrate the superiority of RecAgent over baseline simulation systems and its ability to generate reliable user behaviors.

  • Large Language Model-Empowered Agents for Simulating Macroeconomic Activities, 2023, [paper].

    Keywords: macroeconomic simulation, agent-based modeling, prompt-engineering, perception/reflection/decision-making abilities

    TL;DR: This work leverages LLM-based agents for macroeconomic simulation. Experiments show that LLM-based agents make realistic decisions, reproducing classic macro phenomena better than rule-based or other AI agents.

  • Generative Agent-Based Modeling: Unveiling Social System Dynamics through Coupling Mechanistic Models with Generative Artificial Intelligence, 2023, [paper].

    Keywords: Generative Agent-Based Modeling, norm diffusion, social dynamics

    TL;DR: The authors demonstrate Generative Agent-Based Modeling (GABM) through a simple model of norm diffusion, where agents decide on wearing green or blue shirts based on peer influence. The results show the emergence of group norms, sensitivity to agent personas, and conformity to asymmetric adoption forces.

  • Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models, 2023.06, NeurIPS 2023, [paper].

    TL;DR: This paper presents a new algorithm for using LLM outputs in downstream statistical analyses while guaranteeing statistical properties -- such as asymptotic unbiasedness and proper uncertainty quantification -- that are fundamental to CSS research (i.e., using LLM outputs as document labels for downstream statistical analysis in social science). A simplified sketch of the bias-correction idea appears at the end of this section.

  • Epidemic Modeling with Generative Agents, 2023.07, [paper], [code].

    Keywords: epidemic modeling, generative AI, agent-based model, human behavior, COVID-19

    TL;DR: The paper presents a new epidemic modeling approach using generative AI to empower individual agents with reasoning ability. The generative agent-based model collectively flattens the epidemic curve, mimicking patterns like multiple waves, through AI-powered decision-making without imposed rules.

  • Emergent analogical reasoning in large language models, 2023.08, Nature Human Behaviour, [paper].

    Keywords: GPT-3, Analogical Reasoning, Zero-Shot Learning, Cognitive Processes, Human Comparison

    TL;DR: This paper investigates the emergent analogical reasoning capabilities of GPT-3, demonstrating its proficiency in various analogy tasks compared to college students. The research highlights GPT-3's potential in zero-shot learning and its similarity to human cognitive processes in problem-solving.

  • MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents, 2023.10, [paper].

    Keywords: agent simulation, job fair environment, task-oriented coordination

    TL;DR: The paper introduces "MetaAgents" to enhance coordination in LLMs through a novel collaborative and reasoning approach, tested in a simulated job fair environment. The study reveals both the potential and limitations of LLM-based agents in complex social coordination tasks.

  • War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars, 2023.11, [paper], [code].

    TL;DR: This paper presents WarAgent, an AI system simulating historical conflicts, revealing how historical and policy factors critically drive the inevitability and nature of wars.

  • Emergence of Social Norms in Large Language Model-based Agent Societies, 2024.03, [paper], [code].
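
As a rough illustration of the bias-correction idea behind the design-based supervised learning entry above: LLM labels cover the whole corpus, expert ("gold") labels cover a uniformly random subsample, and the expert-vs-LLM gap on that subsample corrects the naive LLM-only mean. This is a deliberately simplified difference estimator, not the paper's actual DSL estimator.

```python
import numpy as np

def corrected_mean(llm_labels: np.ndarray, gold_idx: np.ndarray, gold_labels: np.ndarray) -> float:
    """Estimate a population mean from cheap LLM labels on every document
    plus expert labels on a random subsample. The subsample's average
    LLM error corrects the naive LLM-only mean; because the subsample is
    drawn uniformly at random, the correction is unbiased in expectation."""
    naive = llm_labels.mean()                               # LLM-only estimate
    bias_hat = (gold_labels - llm_labels[gold_idx]).mean()  # observed LLM error
    return float(naive + bias_hat)

# Usage sketch: LLM-label 10,000 docs, expert-label 500 randomly chosen ones.
# corrected_mean(llm_labels, gold_idx, llm_labels[gold_idx] + errors)
```

The same subsample can also be used to estimate the variance of the corrected estimator, which is what enables the proper uncertainty quantification the paper emphasizes.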

5. 👁️‍🗨️ Perspective

  • A social path to human-like artificial intelligence, 2023.11, Nature Machine Intelligence, [paper].

    TL;DR: This paper explores the social pathways to human intelligence, highlighting the roles of collective living, social relationships, and key evolutionary transformations in the development of intelligence.
