[ECIR 2022] Automation of Citation Screening for Systematic Literature Reviews Using Neural Networks: A Replicability Study

License: Apache License 2.0

CitationScreeningReplicability

This repository is the official implementation of the ECIR 2022 paper "Automation of Citation Screening for Systematic Literature Reviews Using Neural Networks: A Replicability Study".

Citing

If you find our code useful, please cite our paper:

@inproceedings{kusa2022automation,
  title={Automation of Citation Screening for Systematic Literature Reviews Using Neural Networks: A Replicability Study},
  author={Kusa, Wojciech and Hanbury, Allan and Knoth, Petr},
  booktitle={European Conference on Information Retrieval},
  pages={584--598},
  year={2022},
  organization={Springer}
}

Installation

Tested with Python 3.8.

Install requirements with pip:

$ pip install -r requirements.txt

Datasets

Clinical

The original Clinical review datasets can be downloaded from here. Use the src/data/prepare_clinical_data.py script to prepare the datasets. Make sure that the variable repository_path is set to the root of the bwallace/citation-screening/ repository.

Drug

The original Drug review datasets can be downloaded from here.

This dataset does not contain abstract and title information, so these must be downloaded from PubMed using each article's PubMed ID. Place the epc-ir.clean.tsv input file in the data/external/drug/ folder and run the src/data/prepare_drug_data.py script.

SWIFT

The original SWIFT review datasets can be downloaded from here.

  • OHAT datasets (PFOA/PFOS, Bisphenol A (BPA), Transgenerational, and Fluoride and neurotoxicity) are stored as four sheets in one Excel file.

  • CAMRADES dataset (Neuropathic pain) is stored as a separate Excel file.

The Fluoride and neurotoxicity and the Neuropathic pain datasets already contain title and abstract data, so the only preparation step needed is converting the Label column into a common format.
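The label conversion could look roughly like the sketch below. It is not taken from the repository's scripts: the target format (1 for included, 0 for excluded) and the accepted label spellings in `POSITIVE_LABELS` and `NEGATIVE_LABELS` are assumptions made for illustration, so check them against the actual values in the downloaded files.

```python
import csv
import io

# Hypothetical label vocabulary: the real SWIFT files may use other values.
POSITIVE_LABELS = {"1", "included", "include", "yes", "true"}
NEGATIVE_LABELS = {"0", "excluded", "exclude", "no", "false"}


def normalise_label(raw):
    """Map a raw label value to 1 (included) or 0 (excluded)."""
    value = str(raw).strip().lower()
    if value in POSITIVE_LABELS:
        return 1
    if value in NEGATIVE_LABELS:
        return 0
    # Fail loudly instead of silently guessing an unknown label.
    raise ValueError(f"Unrecognised label: {raw!r}")


def convert_labels(tsv_text, label_column="Label"):
    """Parse TSV text and return its rows with the label column rewritten as 0/1."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        row[label_column] = normalise_label(row[label_column])
        rows.append(row)
    return rows
```

Raising on unknown values keeps a misspelled or unexpected label from being counted as an exclusion by accident.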

The other datasets consist only of PubMed IDs and assigned labels, so the abstract and title data must be downloaded using biopython.

The src/data/prepare_swift_data.py script accepts .tsv files, so you need to convert each dataset into a separate .tsv file and place it in the data/external/SWIFT/ folder.
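One possible way to do the Excel-to-.tsv conversion is sketched below, assuming pandas (plus openpyxl for .xlsx files) is installed. The helper names `tsv_name` and `excel_to_tsv` are hypothetical, and the file names they produce are an assumption: the names that prepare_swift_data.py actually expects are not stated here, so rename the outputs if the script looks for specific files.

```python
import re
from pathlib import Path


def tsv_name(sheet_name):
    """Turn a sheet name into a file-system-friendly .tsv file name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", sheet_name).strip("_")
    return f"{slug}.tsv"


def excel_to_tsv(excel_path, out_dir="data/external/SWIFT"):
    """Write every sheet of an Excel workbook to its own .tsv file."""
    import pandas as pd  # imported lazily; reading .xlsx also needs openpyxl

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # sheet_name=None loads all sheets as a dict: sheet name -> DataFrame
    sheets = pd.read_excel(excel_path, sheet_name=None)
    written = []
    for name, frame in sheets.items():
        target = out_dir / tsv_name(name)
        frame.to_csv(target, sep="\t", index=False)
        written.append(target)
    return written
```

Calling `excel_to_tsv` on the OHAT workbook would produce one file per sheet, and calling it on the CAMRADES workbook would produce a file for each sheet that workbook contains.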


For the Drug and SWIFT datasets, you need to set the Entrez.email variable to your email address in order to download documents from PubMed.
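For orientation, a minimal sketch of fetching titles and abstracts by PubMed ID with Biopython follows. It is an illustration under stated assumptions, not the code used by the repository's preparation scripts: the function names and the batch size of 200 are hypothetical choices, and only the `Entrez.email`, `Entrez.efetch` and `Medline.parse` calls come from Biopython itself.

```python
def batched(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_titles_and_abstracts(pmids, email, batch_size=200):
    """Return {pmid: (title, abstract)} for the given PubMed IDs."""
    from Bio import Entrez, Medline  # Biopython, imported lazily

    Entrez.email = email  # contact address requested by NCBI
    results = {}
    for chunk in batched([str(p) for p in pmids], batch_size):
        handle = Entrez.efetch(db="pubmed", id=",".join(chunk),
                               rettype="medline", retmode="text")
        try:
            for record in Medline.parse(handle):
                pmid = record.get("PMID")
                if pmid:
                    # "TI" is the title field, "AB" the abstract (may be missing)
                    results[pmid] = (record.get("TI", ""), record.get("AB", ""))
        finally:
            handle.close()
    return results
```

Requesting IDs in batches keeps the number of calls to the NCBI servers low. Records without an abstract come back with an empty string rather than being dropped.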

Results

Detailed results are stored in the reports/ directory:

  • results-document_features.csv contains detailed results on the influence of input document features for all models and datasets.
  • results-precision_at_95recall.csv contains detailed precision@95% recall results for all models and datasets.
  • results-time.csv contains training time measurements for all models and datasets.
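As a reminder of what precision@95% recall measures, here is a small sketch of one common way to compute it: rank documents by predicted score, walk down the ranking until at least 95% of the relevant documents have been found, and report precision at that cutoff. This is an illustrative definition, and the function name `precision_at_recall` is hypothetical. The exact procedure used to produce the reported numbers may differ.

```python
def precision_at_recall(labels, scores, target_recall=0.95):
    """Precision at the shallowest ranking cutoff that reaches `target_recall`.

    labels: 0/1 relevance judgements; scores: predicted relevance, higher is better.
    """
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("at least one relevant document is required")
    # Rank documents from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        found += label
        if found / total_relevant >= target_recall:
            return found / rank
    # Not reachable for target_recall <= 1: the full ranking has recall 1.0.
    return found / len(ranked)


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
result = precision_at_recall(labels, scores)
```

In this toy ranking both relevant documents are found within the top three, so `result` is 2/3.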

Figures

To recreate the figures, run the Jupyter notebook notebooks/plotting.ipynb.

Dataset statistics

To calculate dataset statistics, run the src/data/dataset_statistics.py script.
