I automated the scraping of IMDb movie ratings and their details using Python's BeautifulSoup and Selenium libraries, then stored the scraped data in a CSV file.
Data we need to extract:
- Movie Title
- Year
- Duration
- Genre
- Rating
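The fields listed above could be modelled as one record per movie and written out as CSV rows. The following is only a minimal sketch using the standard library: the names `Movie` and `movies_to_csv`, the field names, and the choice to store duration in minutes are all assumptions made for illustration, and the sample row is a placeholder rather than scraped data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Movie:
    """One scraped row. Field names are illustrative assumptions."""
    title: str
    year: int
    duration_min: int  # assumption: duration stored as whole minutes
    genre: str
    rating: float


def movies_to_csv(movies, out):
    """Write the movies to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Movie)])
    writer.writeheader()
    for movie in movies:
        writer.writerow(asdict(movie))


# Placeholder values just to exercise the code, not real scraped data.
sample = [Movie("Example Movie", 2000, 120, "Drama", 8.5)]
buffer = io.StringIO()
movies_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("movies.csv", "w", newline="", encoding="utf-8")` in place of `buffer`; the file name is likewise only an example.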
Below is the list of modules required to scrape IMDb.
- requests: Requests is a third-party Python library for making HTTP requests to a specified URL. It is a common starting point for working with REST APIs or web scraping. When you make a request to a URI, it returns a response object containing the status code, headers, and body.
- bs4: Beautiful Soup is a Python library for parsing HTML and XML. The bs4 package provides the BeautifulSoup object, which is used to navigate and search the parsed document. Web scraping is the process of extracting data from websites with automated tools rather than by hand.
- pandas: Pandas is a library built on top of NumPy that provides data structures, such as the DataFrame, and operations for manipulating tabular and numerical data.
- Selenium: The Selenium Python bindings provide a convenient API for driving browsers such as Firefox and Chrome through Selenium WebDriver. Selenium WebDriver is a browser automation tool, most often used for automated testing. Automation here means that it runs scripts written with Selenium in a real browser.
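To show how these modules might fit together, here is a rough sketch that parses an HTML string with BeautifulSoup and hands the rows to pandas for CSV output. The markup in `SAMPLE_HTML`, the class names, and the helper `parse_movies` are all hypothetical: IMDb's real HTML is different and changes over time, so the lookups would need to be adapted after inspecting the actual page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only. It is NOT IMDb's real HTML,
# so the class names used below are assumptions.
SAMPLE_HTML = """
<div class="movie">
  <h3 class="title">Example Movie</h3>
  <span class="year">2000</span>
  <span class="duration">120 min</span>
  <span class="genre">Drama</span>
  <span class="rating">8.5</span>
</div>
"""


def text_of(card, css_class):
    """Return the stripped text of the first tag in card with this class."""
    return card.find(class_=css_class).get_text(strip=True)


def parse_movies(html):
    """Extract one dict per movie card from the (assumed) markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="movie"):
        rows.append({
            "title": text_of(card, "title"),
            "year": int(text_of(card, "year")),
            "duration": text_of(card, "duration"),
            "genre": text_of(card, "genre"),
            "rating": float(text_of(card, "rating")),
        })
    return rows


movies = parse_movies(SAMPLE_HTML)
df = pd.DataFrame(movies)
# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
```

In a real run, the HTML string would come from the network instead of a constant, for example `requests.get(url, timeout=10).text` for a static page or `driver.page_source` from a Selenium WebDriver session for a page rendered by JavaScript. Calling `df.to_csv("movies.csv", index=False)` would write the file to disk; the file name is only an example.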