-
Chat partner
-
Consultant: e.g., Financial planning
-
Teacher: e.g., R Programming
-
Counselor
-
Designer
What you probably have tried…
-
Proof-reading, copy-editing for papers
-
Debugging for code
-
Inspiring research ideas
Today: Using ChatGPT to annotate political documents at scale
How to extract information you need from a large number of political documents using ChatGPT?
Laborious, tedious, BUT ESSENTIAL part of political social data analytics!
What may need annotation for research:
-
Social media posts
-
Newspapers
-
Politicians’ pronouncement (e.g., speeches, campaign leaflets)
-
Laws and regulations
-
Historical archives
-
Manual annotation: Hand-code all the documents following rules
-
Machine annotation
-
Keyword matching
-
Topic modeling
-
Dictionary-based sentiment analysis
-
-
Manual + Machine annotation
-
Hand-code a portion of documents
-
Train machine learning models
-
Use machine learning models to code the remainder of documents
-
A systematic evalation of GPT’s performance in political text annotation
-
Two Benchmarks
-
Trained annotators
-
Crowdsourced annotators
-
-
Two Metrics
-
Accuracy
-
Intercoder agreement
-
-
Various tasks
-
Vary in terms of complexity
-
Vary in terms of novelty (being part of GTP’s training data or not)
-
-
Clearly defined metrics
-
My favorite replication materials
https://doi.org/10.7910/DVN/PQYF6M
-
Wu, P. Y., Nagler, J., Tucker, J. A., & Messing, S. (2023). Large language models can be used to estimate the latent positions of politicians.
-
Argyle, L. P., Bail, C. A., Busby, E. C., Gubler, J. R., Howe, T., Rytting, C., ... & Wingate, D. (2023). Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale. Proceedings of the National Academy of Sciences, 120(41).
-
Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337-351.
An emerging literature!
-
Inputs
-
Transparent
-
Interpretable
-
-
Process
-
Follow rules
-
Be sophisticated
-
-
Outputs
-
Structured
-
Reproducible
-
Getting the input, process, and output right:
-
Give helpful inputs to guide GPT through the process
-
Scientifically evaluate GPT’s outputs
- Step 1: Create instructions and a “training” set
- Step 2: Apply ChatGPT to the “training” set
- Step 3: Evaluate ChatGPT’s performance with the training set
- Step 4: Apply ChatGPT to annotate documents outside the “training” set
Before an AI assistant starts working, you should:
- Figure what you want
- Tell AI precisely what you want
However, AI can help you:
- Clarify what you want
- Better articulate what you want
Example: Using ChatGPT to label a set of tweets’ relevance to “content moderation” (Gilardi et al. 2023)
“Content moderation” refers to the practice of screening and monitoring content posted by users on social media sites to determine if the content should be published or not, based on specific rules and guidelines.”
We will use the authors’ replication materials:
-
Instructions in
data/instructions.txt
-
A “training” annotated set in
data/training_set.csv
Write instructions as if you are writing them for a human assistant.
-
Be clear and specific about:
-
AI’s role and tasks
-
Inputs
-
Desired outputs
-
-
Be organized. But do not underestimate AI’s ability to understand.
Ultimately, your instructions should make sense to both humans and AIs.
instructions <- readLines("data/input/instructions.txt")
instructions <- str_c(instructions, collapse = "\n")
cat(instructions) # Print the instruction (not shown on slides)
Can we improve it? ChatGPT can help!
ChatGPT’s graphic user interface suffices (no programming is required).
-
The comment on the original instruction is exageratively harsh. I just wanted to get GPT’s attention…
-
Excuse me for the grammatical mistake in my prompt.
I made a few more changes to the instructions. The finalized
instructions is saved in data/instructions_improved_2.txt
.
instructions <- readLines("data/input/instructions_improved_2.txt") |>
str_c(collapse = "\n")
cat(instructions) # Print the improved instruction, not shown on slides
-
Clarify concepts
-
Expand elaborations (e.g., possible scenarios)
You can start with half-cooked instructions and ask ChatGPT to help you build it up. You will be pleasantly surprised how well ChatGPT does.
ChatGPT can inspire, but it cannot replace thinking and decision-making. You should know what you want.
A straightforward but tedious step
-
Annotate a small set of documents using the instructions you have developed. This can be called a “training” set or a “gold standard” set.
-
Usually, each document should be annotated by two persons.
-
Conflicts between two coders? It happens a lot!
-
Discard those with disagreement
-
Discuss and reach an agreement
-
Give them to a third “tie-breaker”
-
-
Adjust the instructions based on tricky cases/ disagreement.
d <- read_csv("data/input/tweets_training.csv",
col_types = cols(status_id = col_character()))
# A small text cleaning to replace line breaks with spaces
d <- d |> mutate(text = str_replace_all(text, "(\n|\r)", " "))
Q: How do they deal with conflicts?
For this in-class demonstration, let’s make a smaller “training” set.
set.seed(12)
d_train <- d |>
select(status_id, text, relevant_ra) |>
filter(!is.na(relevant_ra)) |>
sample_n(40) |>
arrange(status_id)
table(d_train$relevant_ra)
##
## 0 1
## 20 20
-
Now, it is time to delegate the work to the AI!
-
Do not rush into this step. But there is no need to do too much marginal improvement on the instructions and the training set.
-
To discuss:
-
How to annotate one example document?
-
How to annotate many documents in batch?
-
First, I will demonstrate how to use ChatGPT to annotate one document.
to_annotate_id <- d_train$status_id[1]
to_annotate_text <- d_train$text[1]
to_annotate_raLabel <- d_train$relevant_ra[1]
cat(to_annotate_id, "\n\n", to_annotate_text, "\n\n", to_annotate_raLabel)
1212265682634604544
ONCE ! Update : the police are still investigating so please dont report his account anymore Tweets are needed for his further actions Just unfollow and block
1
# Required packages
library(httr) # Package to communicate with the ChatGPT API
library(jsonlite) # Package to parse results JSON-format results returned from ChatGPT
api_key <- readLines("data/input/api/api_key.txt")
api_url <- readLines("data/input/api/api_url_gpt3.5.txt")
# Make an API call
response <- POST(
url = api_url,
add_headers(`Content-Type` = "application/json", `api-key` = api_key),
encode = "json",
body = list(
temperature = 0,
# 0 to 1. How "creative" you want ChatGPT to be. Smaller = less creative.
messages = list(
list(role = "system", content = instructions),
list(role = "user", content = to_annotate_text))
)
)
dir.create("data/output_gpt3.5")
write_rds(response, str_c("data/output_gpt3.5/", to_annotate_id, ".rds"))
# Save ChatGPT's outputs
response <- read_rds(str_c("data/output_gpt3.5/", to_annotate_id, ".rds"))
content(response)
## $id
## [1] "chatcmpl-8W48pOVM7shiMPXXqUYI4aHcWg25z"
##
## $object
## [1] "chat.completion"
##
## $created
## [1] 1702653107
##
## $model
## [1] "gpt-35-turbo"
##
## $choices
## $choices[[1]]
## $choices[[1]]$finish_reason
## [1] "stop"
##
## $choices[[1]]$index
## [1] 0
##
## $choices[[1]]$message
## $choices[[1]]$message$role
## [1] "assistant"
##
## $choices[[1]]$message$content
## [1] "0"
##
##
##
##
## $usage
## $usage$prompt_tokens
## [1] 362
##
## $usage$completion_tokens
## [1] 1
##
## $usage$total_tokens
## [1] 363
# This gets ChatGPT's response
content(response)$choices[[1]]$message$content
## [1] "0"
# Compare with RA label
to_annotate_raLabel
## [1] 1
# Note: packages have been loaded, and api_key and api_url have been defined.
for (i in 1:5){ # Only loop through 5 for demo purpose
to_annotate_id <- d_train$status_id[i]
to_annotate_text <- d_train$text[i]
# Above I do a small string operation -- replacing line breaks by spaces
to_annotate_raLabel <- d_train$relevant_ra[i]
response <- POST(
url = api_url,
add_headers(`Content-Type` = "application/json", `api-key` = api_key),
encode = "json",
body = list(
temperature = 0,
# 0 to 1. How "creative" you want ChatGPT to be. Smaller = less creative.
messages = list(
list(role = "system", content = instructions),
list(role = "user", content = to_annotate_text))
)
)
to_annotate_gptLabel <- content(response)$choices[[1]]$message$content
write_rds(response, str_c("data/output_gpt3.5/", to_annotate_id, ".rds"))
Sys.sleep(1) # Sleep for 1 seconds after finishing each doc.
message(i, " of ", nrow(d_train))
# Optional below: Print results to get a "live update"
message("status_id: ", to_annotate_id, "\n", "text: ", to_annotate_text)
message("Human: ", to_annotate_raLabel, "\t", "ChatGPT: ", to_annotate_gptLabel, "\n")
}
# Get file names of ChatGPT's responses. Note that file names are the status_id
# This helps us identify which file is from which tweet
file_names <- list.files("data/output_gpt3.5")
gpt_labels <- rep(NA, length(file_names))
for (i in seq_along(file_names)){
response <- read_rds(file.path("data/output_gpt3.5", file_names[i]))
gpt_labels[i] <-
ifelse(
is.null(content(response)$choices[[1]]$message$content),
NA, content(response)$choices[[1]]$message$content)
# The above ifelse() function handles the situation when the output is "content-mderated" by Microsoft!
}
d_gptLabel <- tibble(
status_id = str_remove(file_names, "\\.rds$"),
relevant_gpt = gpt_labels)
d_gptLabel |> print(n = 3)
## # A tibble: 40 × 2
## status_id relevant_gpt
## <chr> <chr>
## 1 1212265682634604544 0
## 2 1212425759421292546 1
## 3 1217297045108723713 0
## # ℹ 37 more rows
d_train_merge <- d_train |> inner_join(d_gptLabel, by = "status_id")
d_train_merge |> print(n = 5)
## # A tibble: 40 × 4
## status_id text relevant_ra relevant_gpt
## <chr> <chr> <dbl> <chr>
## 1 1212265682634604544 ONCE ! Update : the police a… 1 "0"
## 2 1212425759421292546 The release of these updated ter… 1 "1"
## 3 1217297045108723713 My account would get suspended i… 1 "0"
## 4 1221020543417233408 - Huawei Prepared for Any Furthe… 0 "1\n1\n0\n0"
## 5 1223437508014493696 Here in Ontario there’s many peo… 0 "0"
## # ℹ 35 more rows
Make a Confusion Matrix to compare GPT’s and humans’ results.
confusion_mat <- with(d_train_merge, table(relevant_ra, relevant_gpt, useNA = "ifany"))
confusion_mat
## relevant_gpt
## relevant_ra 0 1 1\n1\n0\n0
## 0 15 4 1
## 1 5 15 0
How good do you think this result is? It agrees with the human annotator in 3/4 of the tasks.
Check anomalies
# Check the tweet with wrong output
d_train_merge |> filter(relevant_gpt == "1\n1\n0\n0") %>% .$text |> print()
[1] “- Huawei Prepared for Any Further U.S. “Attacks” - Google and IBM have both called for global regulations on AI - Social media use not linked to poor mental health - Spotify is testing letting influencers post stories to introduce their own playlists https://t.co/BOv2kYNIjf”
# Recode manually
d_train_merge <- d_train_merge |>
mutate(relevant_gpt =
case_match(relevant_gpt, "1\n1\n0\n0" ~ "1", .default = relevant_gpt))
# Regenerate the confusion matrix
confusion_mat <- with(d_train_merge, table(relevant_ra, relevant_gpt, useNA = "ifany"))
confusion_mat
## relevant_gpt
## relevant_ra 0 1
## 0 15 5
## 1 5 15
library(caret)
confusionMatrix(confusion_mat)
## Confusion Matrix and Statistics
##
## relevant_gpt
## relevant_ra 0 1
## 0 15 5
## 1 5 15
##
## Accuracy : 0.75
## 95% CI : (0.588, 0.8731)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 0.001111
##
## Kappa : 0.5
##
## Mcnemar's Test P-Value : 1.000000
##
## Sensitivity : 0.750
## Specificity : 0.750
## Pos Pred Value : 0.750
## Neg Pred Value : 0.750
## Prevalence : 0.500
## Detection Rate : 0.375
## Detection Prevalence : 0.500
## Balanced Accuracy : 0.750
##
## 'Positive' Class : 0
##
-
Manually examine the tweets where the human annotator disagree with ChatGPT
-
Think: How can you improve the instructions?
-
Do not rule out the possibility that human annotator make mistakes.
-
You are confident about your instructions
-
Apply the model to documents outside the “training” set.
-
Estimate the cost
-
Watch out for rate limit (add enough pauses between iterations)
-
Do it in batch
-
Add error-handling modules
-
Save all raw outputs as individual files with unique identifiers
-
You can start over without redoing what has been done
-
Check for rate limit error code
-
for (i in 1:nrow(d_to_annotate)) {
to_annotate_id <- d_to_annotate$status_id[i]
to_annotate_text <- d_to_annotate$text[i]
file_path <- file.path(dir_output, str_c(to_annotate_id, ".rds"))
# Check if a document has been processed, if so, skipped
if (file.exists(file_path)){
response <- read_rds(file_path)
gpt_label <- content(response)$choices[[1]]$message$content
error_msg <- ifelse(is.null(gpt_label), content(response)$error$code, "")
if (error_msg %in% c("", "content_filter")){
message(i, " of ", nrow(data_batch), " label: ", gpt_label, error_msg, " SKIPPED!")
next
}
}
response <- POST(
url = api_url,
add_headers(`Content-Type` = "application/json", `api-key` = api_key),
encode = "json",
body = list(
temperature = 0,
messages = list(
list(role = "system", content = instructions),
list(role = "user", content = to_annotate_text))))
gpt_label <- content(response)$choices[[1]]$message$content
error_msg <- ifelse(is.null(gpt_label), content(response)$error$code, "")
write_rds(response, file_path)
Sys.sleep(1)
message(i, " of ", nrow(data_batch), " label: ", gpt_label, error_msg)
}
api_url_gpt4 <- readLines("data/input/api/api_url_gpt4.txt")
dir.create("data/api/output_gpt4")
for (i in 1:5){ # Just do first 5 for demo purpose
to_annotate_id <- d_train$status_id[i]
to_annotate_text <- d_train$text[i]
to_annotate_raLabel <- d_train$relevant_ra[i]
response <- POST(
url = api_url_gpt4,
add_headers(`Content-Type` = "application/json", `api-key` = api_key),
encode = "json",
body = list(
temperature = 0,
messages = list(
list(role = "system", content = instructions),
list(role = "user", content = to_annotate_text))
)
)
to_annotate_gptLabel <- content(response)$choices[[1]]$message$content
write_rds(response, str_c("data/output_gpt4/", to_annotate_id, ".rds"))
Sys.sleep(1) # Sleep for 1 seconds after finishing each doc.
message(i, " of ", nrow(d_train))
# Optional below: Print results to get a "live update"
message("status_id: ", to_annotate_id, "\n", "text: ", to_annotate_text)
message("Human: ", to_annotate_raLabel, "\t", "ChatGPT: ", to_annotate_gptLabel, "\n")
}
# Get file names of ChatGPT's responses. Note that file names are the status_id
# This helps us identify which file is from which tweet
file_names <- list.files("data/output_gpt4")
gpt_labels <- rep(NA, length(file_names))
for (i in seq_along(file_names)){
response <- read_rds(file.path("data/output_gpt4", file_names[i]))
gpt_labels[i] <-
ifelse(
is.null(content(response)$choices[[1]]$message$content),
NA, content(response)$choices[[1]]$message$content)
# The above ifelse() function handles the situation when the output is "content-mderated" by Microsoft!
}
d_gptLabel <- tibble(
status_id = str_remove(file_names, "\\.rds$"),
relevant_gpt4 = gpt_labels)
d_gptLabel |> print(n = 5)
## # A tibble: 40 × 2
## status_id relevant_gpt4
## <chr> <chr>
## 1 1212265682634604544 1
## 2 1212425759421292546 1
## 3 1217297045108723713 1
## 4 1221020543417233408 0
## 5 1223437508014493696 0
## # ℹ 35 more rows
d_train_merge_2 <- d_train_merge |> inner_join(d_gptLabel, by = "status_id")
d_train_merge_2 |> print(n = 5)
## # A tibble: 40 × 5
## status_id text relevant_ra relevant_gpt relevant_gpt4
## <chr> <chr> <dbl> <chr> <chr>
## 1 1212265682634604544 ONCE ! Update … 1 0 1
## 2 1212425759421292546 The release of the… 1 1 1
## 3 1217297045108723713 My account would g… 1 0 1
## 4 1221020543417233408 - Huawei Prepared … 0 1 0
## 5 1223437508014493696 Here in Ontario th… 0 0 0
## # ℹ 35 more rows
write_csv(d_train_merge_2, "data/output_clean/labeled.csv")
with(d_train_merge_2, table(relevant_ra, relevant_gpt4, useNA = "ifany"))
## relevant_gpt4
## relevant_ra 0 1
## 0 20 0
## 1 0 20
100% CORRECT!
ChatGPT 4 is much more expensive than GPT 3.5. But it is probably worth the money.
Source: Microsoft Azure
ChatGPT may refuse to give a response if the input violates its content moderation policy.
Below is the error message you get if it happens (for the Azure version of ChatGPT)
The response was filtered due to the prompt triggering Azure OpenAI’s content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766
Source: https://go.microsoft.com/fwlink/?linkid=2198766
A recent student’s project on the Israel-Palestine conflict
-
Used ChatGPT to label tweets containing keywords related to the recent Israel-Palestine conflict
-
15% of the tweets submitted for classification gets NULL results because of content moderation
What words distinguishes content-moderated tweets from the others? (Odds Ratio)
The content moderation policy may limit our ability to use ChatGPT on studies about discourse related to wars, conflicts, and violence!
- Step 1: Create instructions and a “training” set
- Write instructions
- Create a “training” annotated set
- Step 2: Apply ChatGPT to the “training” set
- Give ChatGPT the instruction
- Use ChatGPT to annotate the “training” set
- Clean ChatGPT’s outputs
- Step 3: Evaluate ChatGPT’s performance
- Examine how well ChatGPT’s annotation agrees with the “training” set
- If satisfactory, go to Step 4
- If not satisfactory:
- Examine cases where human coders disagree with the machine
- Modify instructions
- Check the “training” set
- Go back to Step 2
- Step 4: Apply ChatGPT to annotate documents outside the “training” set.
-
Rate limit
-
Empty strings (usually after text pre-processing)
-
Content moderation
-
Usually, the most difficult part is figuring out what YOU want the machine to do for you.
-
Test on small samples to avoid unnecessary costs.
-
Be open to being persuaded by the machine. Ask the machine to explain reasons if you are stuck.
-
Tools come and go
-
Long live Generative AIs
-
Skills with ChatGPT can be generalizable to other Generative AIs
-
You sacrifice things for simpler operation
-
Hand over more control to machines (and the companies selling them)
-
Less interpretability
-
Less reproducibility
-
-
If possible, use smaller, open-sourced, and locally-executable models