ref2ref2ref

Find citations recursively from DOIs and CrossRef.

Introduction

The ref2ref2ref project aims to automate the retrieval of citations from an initial list of DOIs. It fetches related citations recursively up to a specified depth using the CrossRef API.
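The layer-by-layer retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes the public CrossRef REST endpoint `https://api.crossref.org/works/{doi}`, whose records may include a `reference` list; the function names `crossref_references` and `expand_layers` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def crossref_references(doi):
    """Return the DOIs cited by `doi`, as listed in its CrossRef record."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        record = json.load(response)["message"]
    # Not every reference entry carries a DOI, so skip those without one.
    return [ref["DOI"].lower() for ref in record.get("reference", []) if "DOI" in ref]


def expand_layers(seed_dois, fetch_references, depth):
    """Follow citations outward from the seeds, one layer per depth level.

    `fetch_references` is any callable mapping a DOI to the DOIs it cites,
    for example `crossref_references`. Returns one list of new DOIs per layer.
    """
    seen = {doi.lower() for doi in seed_dois}
    current = sorted(seen)
    layers = []
    for _ in range(depth):
        next_layer = []
        for doi in current:
            for cited in fetch_references(doi):
                if cited not in seen:  # keep each DOI in its first layer only
                    seen.add(cited)
                    next_layer.append(cited)
        if not next_layer:
            break  # nothing new was found, so deeper layers would be empty
        layers.append(next_layer)
        current = next_layer
    return layers
```

Passing the fetch function as an argument keeps the traversal testable offline; calling `expand_layers(seeds, crossref_references, depth=2)` would hit the live API.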

I started this project to automate finding related citations for a list of DOIs. I wanted to find the most relevant citations and see how they branch out, without doing it manually. I also wanted to learn how to use the CrossRef API.

After collecting the list of relevant references, it's then possible to scrape them from the web (direct, Unpaywall, Sci-Hub, etc.) using my other repository, ref2pdf.

The next use case is to take the downloaded PDFs and use them to train an existing FOSS machine learning model. Maybe make a chatbot that can answer questions about the papers? It would focus on the papers in the first layer of the citation tree, then less so on the second layer, and so on.

At the moment, the DOIs in the output lists are not tagged with the layer they belong to. However, each layer is saved in a separate file, so you can tell which layer a DOI belongs to from the file name.
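If you do want the layer recorded next to each DOI, a small post-processing step could add it. The sketch below is an assumption-laden illustration: it takes the per-layer DOI lists already loaded into memory (how you read them depends on the actual output file names, which are not assumed here), and the helper names `tag_layers` and `to_csv` are hypothetical.

```python
import csv
import io


def tag_layers(layers):
    """Map each DOI to the number of the first layer it appears in (1-based)."""
    tagged = {}
    for number, dois in enumerate(layers, start=1):
        for doi in dois:
            # setdefault keeps the earliest (closest) layer for repeated DOIs
            tagged.setdefault(doi, number)
    return tagged


def to_csv(tagged):
    """Render the DOI-to-layer mapping as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doi", "layer"])
    for doi, layer in tagged.items():
        writer.writerow([doi, layer])
    return buffer.getvalue()
```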

After processing, you can take the list of DOIs and use it to download the PDFs, or import the list into Zotero so it can collect the metadata and possibly the PDFs.

Easy improvements:

  • Connect the DOI list output with CrossRef again to fill in the metadata automatically.
  • Output a proper BibTeX file.
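One possible way to approach both improvements at once is DOI content negotiation: requesting `https://doi.org/<doi>` with an `Accept: application/x-bibtex` header returns a BibTeX entry for DOIs whose registration agency supports it. The sketch below is not part of the project; the function names are hypothetical, and error handling and rate limiting are left out.

```python
import urllib.parse
import urllib.request


def bibtex_request(doi):
    """Build a request asking the DOI resolver for a BibTeX entry."""
    url = "https://doi.org/" + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi):
    """Fetch one BibTeX entry over the network (follows resolver redirects)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=30) as response:
        return response.read().decode("utf-8").strip()


def build_bibliography(dois, fetch=fetch_bibtex):
    """Join the entries for all DOIs into the text of a single .bib file."""
    return "\n\n".join(fetch(doi) for doi in dois) + "\n"
```

The returned text can be written straight to a `.bib` file. The `fetch` parameter exists only so the function can be tried with a stub instead of live requests.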

One cool thing to do would be to automate the production of a graph of the citation tree.
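A lightweight starting point for the graph idea would be to emit Graphviz DOT text and let the `dot` tool render it. The sketch below assumes you have kept (citing DOI, cited DOI) pairs while crawling, which the current output does not necessarily provide; `to_dot` is a hypothetical helper.

```python
def to_dot(edges):
    """Render (citing, cited) DOI pairs as a Graphviz directed graph."""

    def quote(doi):
        # DOT string literals are double-quoted; escape any embedded quotes.
        return '"' + doi.replace('"', '\\"') + '"'

    lines = ["digraph citations {"]
    for citing, cited in edges:
        lines.append(f"    {quote(citing)} -> {quote(cited)};")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Saving the result as `citations.dot` and running `dot -Tsvg citations.dot -o citations.svg` would produce an image, provided Graphviz is installed.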

Installation

  1. Clone this repository:

    git clone https://github.com/Stew-McD/ref2ref2ref.git

  2. Navigate to the project directory:

    cd ref2ref2ref

  3. Install the required packages:

    pip install -r requirements.txt

Usage

  1. Place your .bib file containing the initial DOIs in the data/input directory (create the directory if you have to 😉).

  2. Run the main.py script:

    python src/main.py

  3. The list of related DOIs will be saved in the data/output directory.
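To check which DOIs your .bib file actually contains before running the script, something like the following could help. It is a rough sketch and not how main.py necessarily parses the file: it assumes one BibTeX field per line with the value wrapped in braces or double quotes, and `extract_dois` is a hypothetical name.

```python
import re

# Matches lines such as:  doi = {10.1000/xyz}  or  DOI = "10.1000/xyz"
DOI_FIELD = re.compile(
    r'^\s*doi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]',
    re.IGNORECASE | re.MULTILINE,
)


def extract_dois(bibtex_text):
    """Return the unique DOIs found in BibTeX text, in order of appearance."""
    dois = []
    for match in DOI_FIELD.finditer(bibtex_text):
        doi = match.group(1).lower()
        # Some exports store the full resolver URL; keep only the DOI itself.
        doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
        if doi not in dois:
            dois.append(doi)
    return dois
```

A full BibTeX parser would be more robust for unusual formatting, such as several fields on one line.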

Configuration

The configuration file config.py in the src directory allows you to set the recursion depth and the input/output file names. See the comments in config.py for more details.

Contributing

If you'd like to contribute, please fork the repository and make changes as you'd like. Pull requests are warmly welcomed.

License

This project uses the Unlicense.
