A Python script that uses Selenium and Beautiful Soup to scrape research papers from the web and save them to an Excel sheet.
The website scraped is that of the Institute for Operations Research and the Management Sciences (https://pubsonline.informs.org/).
Research articles from 2011 onwards are scraped.
- Python
- Google ChromeDriver (preferably a stable release)
- The Selenium and Beautiful Soup libraries for Python
Pro Tip: Put the chromedriver executable in the /usr/local/bin directory.
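Before running the scraper, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of web_scrape.py: the function name check_environment is hypothetical, and it assumes the libraries are installed under their usual import names (selenium and bs4) and that the driver executable is named chromedriver and is reachable through PATH.

```python
import importlib.util
import shutil


def check_environment():
    """Report which prerequisites can be found (hypothetical helper)."""
    return {
        # find_spec returns None when a top-level package is not installed
        "selenium": importlib.util.find_spec("selenium") is not None,
        "bs4": importlib.util.find_spec("bs4") is not None,
        # shutil.which returns the full path of the executable, or None
        "chromedriver": shutil.which("chromedriver"),
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {found if found else 'NOT FOUND'}")
```

If chromedriver is reported as not found, check that the directory holding the executable is on your PATH.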
Run the Python file in a terminal using the following command:
python3 web_scrape.py
The output is an Excel sheet named research_articles.xlsx. I have also posted my own output for reference in a file called research_articles_final.xlsx.
It contains the following columns:
- Volume: Volume of the research article. Usually each volume corresponds to one year. For example, 2024 is Volume 70, 2023 is Volume 69, and so on.
- Issue: The issue within that particular volume. Usually each issue corresponds to one month. For example, January is Issue 1, February is Issue 2, and so on.
- Title: Title of the research article
- Author: The list of authors of the research article
- Abstract: The complete abstract text of the research article
- Accepted_By: The name of the person who accepted the research article
- Accepted_Dept: The department of the person who accepted the research article
- URL: The link that was scraped to get all the research article information
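The column layout and the year-to-volume relationship described above can be sketched in a few lines. This is an illustration only, not code from web_scrape.py: the names COLUMNS, FIRST_YEAR, and volume_for_year are hypothetical, and the arithmetic assumes exactly one volume per year, with the offset taken from the two examples given (2024 is Volume 70, 2023 is Volume 69). It would not hold for a journal numbered differently.

```python
# Column names as listed above, in the same order.
COLUMNS = (
    "Volume", "Issue", "Title", "Author",
    "Abstract", "Accepted_By", "Accepted_Dept", "URL",
)

# Earliest year covered by the scrape.
FIRST_YEAR = 2011

# Assumption: one volume per year, anchored on "2024 is Volume 70".
_YEAR_OFFSET = 2024 - 70


def volume_for_year(year):
    """Return the volume number expected for a given publication year."""
    return year - _YEAR_OFFSET


if __name__ == "__main__":
    print(volume_for_year(2023))        # 69, matching the example above
    print(volume_for_year(FIRST_YEAR))  # 57 under the one-volume-per-year assumption
```

Under that assumption, the scrape starting in 2011 would begin at Volume 57.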