TripAdvisor Web Scraping

This project scrapes customer reviews from TripAdvisor for use in sentiment analysis.

Introduction

In this project, we use web scraping to extract customer reviews of tourist attractions in Malaysia from TripAdvisor. The extracted reviews are the input for sentiment analysis, which lets us study the opinions and emotions they express.

Background

Tourism is a significant industry for Malaysia, whose diverse cultural heritage, natural landscapes, and modern attractions draw millions of visitors each year. Understanding the sentiments tourists express in their reviews can provide valuable insights for tourism stakeholders, including attraction operators, tourism boards, and policymakers.

However, a significant challenge in sentiment analysis research and applications is the limited availability of openly accessible data. This scarcity hinders the development and testing of sentiment analysis models and tools. To address it, this project collects data from TripAdvisor, a popular platform where users share their experiences and opinions about tourist attractions.

By scraping data from TripAdvisor, we aim to build a dataset that can be used by researchers, developers, and enthusiasts for sentiment analysis tasks. This dataset not only provides valuable insights into the sentiments of travelers visiting Malaysian attractions but also contributes to the advancement of sentiment analysis research in general.
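
One common way to turn scraped reviews into a sentiment dataset is to derive coarse labels from star ratings. The sketch below is a minimal illustration, not this project's code: the `text` and `rating` field names, the function names, and the label thresholds are all assumptions, and the sample reviews are invented for the example.

```javascript
// Hypothetical review shape: { text: string, rating: number }, rating from 1 to 5.
// These field names are assumptions, not the scraper's confirmed output.

// Map a 1-5 star rating to a coarse sentiment label.
function labelFromRating(rating) {
  if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
    throw new RangeError(`rating must be an integer from 1 to 5, got ${rating}`);
  }
  if (rating <= 2) return "negative";
  if (rating === 3) return "neutral";
  return "positive";
}

// Attach a label to each review, skipping entries with no usable text.
function labelReviews(reviews) {
  return reviews
    .filter((r) => typeof r.text === "string" && r.text.trim() !== "")
    .map((r) => ({
      text: r.text.trim(),
      rating: r.rating,
      sentiment: labelFromRating(r.rating),
    }));
}

// Invented sample data, for illustration only.
const sample = [
  { text: "Stunning view from the top.", rating: 5 },
  { text: "   ", rating: 4 },
  { text: "Long queue and overpriced.", rating: 2 },
];

console.log(labelReviews(sample));
```

Rating-derived labels are noisy (a 3-star review can still contain strong opinions), so they are best treated as a starting point rather than ground truth.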

Installation

To get started with this project, make sure Node.js and npm are installed, then follow these steps:

Clone the Repository:

git clone https://github.com/joc-rgb/tripadvisor-web-scraping.git

Navigate to the Project Directory:

cd tripadvisor-web-scraping

Install Dependencies:

npm install

Run the Scraper:

node scraper.js

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. You may modify and distribute it, provided the copyright and license notice are retained.
