This project scrapes review data from TripAdvisor for use in sentiment analysis.
We use web scraping to extract customer reviews of tourist attractions in Malaysia from TripAdvisor. The extracted reviews serve as the foundation for sentiment analysis, giving insight into the opinions and emotions they express.
Tourism is a significant industry for Malaysia: its diverse cultural heritage, natural landscapes, and modern attractions draw millions of visitors each year. Understanding the sentiments tourists express in their reviews can provide valuable insights for tourism stakeholders, including attraction operators, tourism boards, and policymakers.
However, a significant challenge in sentiment analysis research and applications is the limited availability of open data. This scarcity hinders the development and testing of sentiment analysis models and tools. To address it, this project collects review data from TripAdvisor, a popular platform where users share their experiences and opinions about tourist attractions.
By scraping data from TripAdvisor, we aim to build a dataset that can be used by researchers, developers, and enthusiasts for sentiment analysis tasks. This dataset not only provides valuable insights into the sentiments of travelers visiting Malaysian attractions but also contributes to the advancement of sentiment analysis research in general.
To get started with this project, follow these steps:
- Clone the repository:
    git clone https://github.com/joc-rgb/tripadvisor-web-scraping.git
- Navigate to the project directory:
    cd tripadvisor-web-scraping
- Install dependencies:
    npm install
- Run the scraper:
    node scraper.js
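Once reviews have been collected, a first pass at sentiment analysis could look like the sketch below. It is a minimal lexicon-based scorer, not the method used by this repository: the word lists, the `scoreReview` and `labelReview` helpers, the `{ attraction, text }` record shape, and the sample reviews are all made up for illustration, and the scraper's actual output format may differ.

```javascript
// Hypothetical word lists for illustration only; a real analysis would use
// an established sentiment lexicon or a trained model.
const POSITIVE = new Set(["amazing", "beautiful", "great", "friendly", "clean"]);
const NEGATIVE = new Set(["dirty", "crowded", "rude", "expensive", "boring"]);

// Score one review: +1 for each positive word, -1 for each negative word.
function scoreReview(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let score = 0;
  for (const word of words) {
    if (POSITIVE.has(word)) score += 1;
    if (NEGATIVE.has(word)) score -= 1;
  }
  return score;
}

// Map the numeric score to a coarse sentiment label.
function labelReview(text) {
  const score = scoreReview(text);
  if (score > 0) return "positive";
  if (score < 0) return "negative";
  return "neutral";
}

// Made-up sample records; the { attraction, text } shape is an assumption.
const sampleReviews = [
  { attraction: "Petronas Twin Towers", text: "Amazing view and friendly staff!" },
  { attraction: "Batu Caves", text: "Too crowded and a bit dirty." },
];

for (const review of sampleReviews) {
  console.log(`${review.attraction}: ${labelReview(review.text)}`);
}
```

A word-count approach like this ignores negation and context ("not great" still scores as positive), so it is only a starting point before moving to a proper model.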
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License. Feel free to modify and distribute it as you see fit.