🎪 Life is a Circus and We are the Clowns 🤡: Automatically Finding Analogies between Situations and Processes
This repository contains the code for the paper: https://arxiv.org/abs/2210.12197.
Authors: Oren Sultan, Dafna Shahaf, The Hebrew University of Jerusalem, Israel.
Conference: The Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
The code is implemented in Python 3.8.12. To run it, please install the requirements:
pip install -r minimalrequirements.txt
Explore the paper_experiments_results folder to reproduce the results of the experiments
(each inner folder contains a separate README file).
Run runner.py to run our algorithm on specific pairs of texts.
Note that you don't need to run coreference and QA-SRL, as the output files already exist in the repo.
(You should run coreference and QA-SRL only if you use new input text files,
by setting run_coref=True, run_qasrl=True in the analogous_matching_algorithm function.)
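The reuse-or-recompute behaviour described above can be sketched as follows. This is only an illustration of the pattern, not the repository's code: `resolve_output`, `preprocess`, the directory layout, the `.out` suffix, and the placeholder steps are all hypothetical. Only the flag names `run_coref` and `run_qasrl` come from this README, and the sketch assumes that a flag set to True means "run this step" and that both flags default to False.

```python
from pathlib import Path
from typing import Callable


def resolve_output(src: Path, out_dir: Path, run_step: bool,
                   step: Callable[[str], str]) -> Path:
    """Return the step's output file for `src` (hypothetical helper).

    run_step=True  -> (re)compute the output and write it to disk.
    run_step=False -> reuse the existing output; fail if it is missing.
    """
    out_path = out_dir / f"{src.stem}.out"
    if run_step:
        out_dir.mkdir(parents=True, exist_ok=True)
        out_path.write_text(step(src.read_text()))
    elif not out_path.exists():
        raise FileNotFoundError(
            f"No cached output for {src.name}; rerun with the step enabled."
        )
    return out_path


def preprocess(src: Path, work_dir: Path,
               run_coref: bool = False, run_qasrl: bool = False) -> Path:
    """Chain two placeholder steps in the coreference -> QA-SRL order."""
    # str.strip and str.lower are stand-ins for the real models, which live
    # in the s2e-coref and qasrl-modeling folders.
    coref_out = resolve_output(src, work_dir / "coref", run_coref, str.strip)
    return resolve_output(coref_out, work_dir / "qasrl", run_qasrl, str.lower)
```

With this layout, a text that has already been processed can be passed again with both flags left at False, while a new input text raises `FileNotFoundError` until the steps are enabled for it.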
paper_experiments_results:
Contains the datasets, the annotators' labels, and the data used to generate the results in the figures
and tables of the three experiments. Each inner folder contains a separate README file.
data:
Includes the following folders:
original_text_files -- all the original text files (including the stories and paragraphs from ProPara).
coref_text_files -- all the text files after coreference (including the stories and paragraphs from ProPara).
propara -- data files relevant to ProPara dataset, output files of the ranking lists for the different models
(see Section 4.1 in the paper), and some code files to read and print stats on ProPara and the methods.
s2e-coref:
Contains the implementation code for the coreference model that we used (see Section 3.1 in the paper).
qasrl-modeling:
Contains the implementation code for the QA-SRL model that we used (see Section 3.2 in the paper).
runner.py -- runner of our analogous matching algorithm on given pairs.
find_mappings.py -- run the FMQ method on a given pair of texts (invoked externally through the generate_mappings function).
find_mappings_verbs.py -- run the FMV method on a given pair of texts (invoked externally through the generate_mappings function).
sentence_bert.py -- run SBERT on a given pair of texts.
coref.py -- run our coreference implementation on input files.
qa-srl.py -- run our QA-SRL implementation on text files (after coref).
run_propara_all_pairs_exp.py -- run experiment 1 (see Section 4.1 in the paper).
analogies_mining_exp_annotators_consistency.py -- compute the annotator-consistency confusion matrix
(see Section 4.1 in the paper).
run_mappings_evaluation_exp.py -- run experiment 2 (see Section 4.2 in the paper).
run_robustness_to_paraphrases_exp.py -- run experiment 3 (see Section 4.3 in the paper).
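To illustrate what an annotator-consistency confusion matrix computes, here is a minimal standard-library sketch. It is not the code in analogies_mining_exp_annotators_consistency.py: the function names `confusion_matrix` and `observed_agreement` and the label names used in the note below are hypothetical, and no data from the paper is reproduced here.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def confusion_matrix(annotator_a: Sequence[Hashable],
                     annotator_b: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Count label pairs: rows follow annotator A, columns follow annotator B."""
    if len(annotator_a) != len(annotator_b):
        raise ValueError("Both annotators must label the same items.")
    counts = Counter(zip(annotator_a, annotator_b))
    return [[counts[(row, col)] for col in labels] for row in labels]


def observed_agreement(matrix: List[List[int]]) -> float:
    """Fraction of items on which both annotators chose the same label."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total if total else 0.0
```

For example, with the made-up labels `["analogy", "not_analogy"]`, annotator A giving `analogy, analogy, not_analogy, not_analogy, analogy` and annotator B giving `analogy, not_analogy, not_analogy, not_analogy, analogy`, the matrix is `[[2, 1], [0, 2]]` and the observed agreement is 0.8.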
@article{sultan2022life,
title={Life is a Circus and We are the Clowns: Automatically Finding Analogies between Situations and Processes},
author={Sultan, Oren and Shahaf, Dafna},
journal={arXiv preprint arXiv:2210.12197},
year={2022}
}
For inquiries, please send an email to [email protected].