giuseppegambino / Scraping-TripAdvisor-with-Python-2020
Python implementation of web scraping of TripAdvisor with Selenium on the new 2019 website.
License: MIT License
Hello,
I'm using the new code that splits "things to do" from restaurants.
The restaurant scraper runs well 7 times, then I get this error:
"---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
in
19 review = container[j].find_element_by_xpath(".//p[@Class='partial_entry']").text.replace("\n", " ")
20
---> 21 csvWriter.writerow([date, rating, title, review])
22
23 # change the page
~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f604' in position 290: character maps to <undefined>"
please help
Didier
Thanks for your assistance the other time.
It's quite surprising that TripAdvisor has changed so much in their HTML.
Well, I have managed to modify your code to fit the new changes on their website.
Unfortunately for me, my new scraper code ONLY scrapes the first review of each review page.
I would really appreciate it if you could help me finish this code to scrape all reviews on a page.
Could you also look at how to capture the date {MM:YY} too?
Do reach me at esowah2000atgmail.com
LouvreMuseum.zip
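A sketch of the usual cause of "only the first review is scraped": `find_element_by_xpath` (singular) returns one node, while `find_elements_by_xpath` (plural) returns every matching container, and iterating over that list is what covers the whole page. Here `driver` is any Selenium-like object, `extract_reviews` is a hypothetical helper, and both selectors are assumptions that may need updating for TripAdvisor's current markup.

```python
def extract_reviews(driver):
    # plural: one entry per review container on the page
    containers = driver.find_elements_by_xpath(
        "//div[@class='review-container']")
    reviews = []
    for c in containers:
        # singular is fine here: one review body per container
        text = c.find_element_by_xpath(
            ".//p[@class='partial_entry']").text
        reviews.append(text.replace("\n", " "))
    return reviews
```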
I'm thinking of using this (or part of this) code in a project, but couldn't find a license (I just want to make sure what I can and can't do with it :) ).
Hello Giuseppe. I'm currently writing my master thesis and would like to use this code for scraping hotel reviews, is that okay? It's really important for me. If so, would you kindly put a license so I can use it? Or else I can't, and this is the only code for scraping reviews that has been working for me. Thank you very much.
Thanks for your code. I find it simple and straightforward to use.
I tried to make some adjustments to use it in a project, but I ran into these errors:
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//label[@for="filters_detail_checkbox_trating__5"]"}
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@Class="nav next taLnk ui_button primary"]"}
I have attached a copy of my scripts to this.
I will appreciate your help.
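A hedged observation on the two errors above: both failing selectors use `@Class`, but XPath attribute names are case-sensitive, so `@Class` never matches the HTML attribute `class` — lowering it to `@class` is the first thing to try. Beyond that, wrapping lookups in a guard lets the scraper skip elements TripAdvisor has since removed. `safe_click` is a hypothetical helper; `driver` is any Selenium-like object.

```python
def safe_click(driver, xpath):
    """Click the element if it exists; return False when it does not."""
    try:
        driver.find_element_by_xpath(xpath).click()
        return True
    except Exception:  # Selenium raises NoSuchElementException here
        return False
```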
Hey,
I am getting the following errors:
File "C:/Users/kaium/Scraping-TripAdvisor-with-Python-2020-master/restaurants_scraper.py", line 26, in
csvFile = open(path_to_file, 'a', encoding="utf-8")
OSError: [Errno 22] Invalid argument: 'E:\x07liza'
please have a look. Hope you can solve the issue. Thanks in advance!
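A sketch of what likely happened here: in a regular Python string literal, '\a' is the BEL control character (0x07), so a path written as 'E:\aliza' (a hypothetical reconstruction of the path in the error) reaches open() as 'E:\x07liza' and fails with Errno 22. Raw strings or forward slashes keep the backslash intact.

```python
broken = 'E:\aliza'    # '\a' silently becomes chr(7), the BEL character
fixed = r'E:\aliza'    # raw string: the backslash is kept literally
portable = 'E:/aliza'  # forward slashes also work on Windows
```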
Hello @giuseppegambino!
I just wanted to try the code and made some adjustments for Firefox and the new cookie SDK used by TripAdvisor, but I cannot push my branch to the repo. Could you have a look at the permissions?
Thx, and have great EOY celebrations!
Hi, I am a newbie to web scraping and I followed the code to scrape
https://www.tripadvisor.co.za/Restaurant_Review-g294265-d3735954-Reviews-Haidilao_Hot_Pot-Singapore.html
this webpage, and found that the final CSV file contains repetitive reviews. After inspection, I noticed that the code only managed to scrape the first page of reviews and just kept repeating the same page multiple times.
Can you please advise?
Thank you so much!
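A sketch of a pagination loop that stops instead of re-scraping the same page: when clicking "next" fails (the button is missing, or the selector no longer matches TripAdvisor's markup), the loop ends rather than looping forever on page one. The default xpath is an assumption, `scrape_all_pages` is a hypothetical helper, `scrape_page(driver)` is whatever per-page extraction you already have, and `driver` is any Selenium-like object.

```python
import time

def scrape_all_pages(driver, scrape_page,
                     next_xpath='//a[@class="nav next ui_button primary"]',
                     delay=2.0):
    pages = 0
    while True:
        scrape_page(driver)
        pages += 1
        try:
            driver.find_element_by_xpath(next_xpath).click()
        except Exception:  # no "next" button: last page reached
            break
        time.sleep(delay)  # give the next page time to render
    return pages
```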
Hello,
I proceeded to the final part and stopped at this error:
"UnicodeEncodeError: 'charmap' codec can't encode character '\xc8' in position 3: character maps to <undefined>"
# to change the page
if (check_exists_by_xpath('//a[@class="ui_button nav next primary "]')):
File "", line 1
if (check_exists_by_xpath('//a[@Class="ui_button nav next primary "]')):
^
IndentationError: unexpected indent
driver.find_element_by_xpath('//a[@class="ui_button nav next primary "]').click()
File "", line 1
driver.find_element_by_xpath('//a[@Class="ui_button nav next primary "]').click()
^
IndentationError: unexpected indent"
Can you explain what went wrong?
BR
Marek
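On the IndentationError above: pasting the indented statement on its own at the interactive ">>>" prompt raises IndentationError, because the two statements must form one suite. A sketch of the same logic as one properly indented block, wrapped in a function so it is self-contained — `check_exists_by_xpath` stands in for the helper defined in the repo's script, `go_to_next_page` is a hypothetical name, and `driver` is any Selenium-like object (note also that `@class` must be lowercase in XPath):

```python
def go_to_next_page(driver, check_exists_by_xpath):
    xpath = '//a[@class="ui_button nav next primary "]'
    if check_exists_by_xpath(xpath):
        # this line must be indented under the `if` it belongs to
        driver.find_element_by_xpath(xpath).click()
        return True
    return False
```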
Hello Giuseppe,
I am using Python 3 and I have included encoding="utf-8".
The code works better, but after 60 page changes I get another error message:
"ElementClickInterceptedException: Message: element click intercepted: Element ... is not clickable at point (408, 564). Other element would receive the click:
...
any idea?
Thanks
Didier
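A hedged sketch of two common workarounds for ElementClickInterceptedException (some overlay — often a cookie banner or sticky footer — is covering the button): scroll the element into view first, or click it via JavaScript, which bypasses the overlap check. `force_click` is a hypothetical helper and `driver` is any Selenium-like object exposing `execute_script`.

```python
def force_click(driver, xpath):
    element = driver.find_element_by_xpath(xpath)
    # bring the element out from under sticky headers/footers
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    # JavaScript click ignores whatever element overlaps the target
    driver.execute_script("arguments[0].click();", element)
```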
First of all, many thanks for this valuable tool! Although your scraper works perfectly with the Colosseo example, when I try to extract hotel reviews, the following error appears:
[ElementClickInterceptedException: element click intercepted: Element ... is not clickable at point (845, 870). Other element would receive the click:
Thanks in advance for any help.
Hello @giuseppegambino!
I was trying your code (thanks a lot for publishing it), and while the code to scrape restaurants' reviews works, the code to scrape reviews for "things to do" does not work. I get this message when I run it:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//div[contains(@data-test-target, 'expand-review')]"}
Would it be possible for you to fix it? It would be extremely helpful. Thanks a lot and happy 2022!
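A sketch of a fallback strategy for the missing 'expand-review' element: TripAdvisor renames these nodes often, so trying a short list of candidate selectors (the first is the one from the script; the second is an assumption) and clicking whichever matches makes the scraper more resilient. `expand_reviews` is a hypothetical helper; `driver` is any Selenium-like object.

```python
def expand_reviews(driver):
    candidates = [
        ".//div[contains(@data-test-target, 'expand-review')]",
        ".//span[contains(text(), 'More')]",
    ]
    for xpath in candidates:
        matches = driver.find_elements_by_xpath(xpath)
        if matches:
            matches[0].click()
            return xpath  # report which selector worked, for debugging
    return None
```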
Hi Giuseppe!
I premise that I am a very novice user of Python, and for the moment, I am using Google Colab to perform some operations. In particular, I am trying to extract the reviews on TripAdvisor at this link:(https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html).
I tried several attempts using BeautifulSoup:
import requests
from bs4 import BeautifulSoup as soup

url = 'https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html'
html = requests.get(url)
bsobj = soup(html.content, 'html.parser')

reviews = []
for r in bsobj.findAll('q'):
    reviews.append(r.span.text.strip())

for review in reviews:
    print(review)
The code seems to work, but the runtime is too long, and Colab eventually disconnects the session for being idle (I even tried inserting an automatic click to avoid the timeout, but it doesn't work).
After that, I tried following your script but when I run:
driver = webdriver.Safari()
I get this error:
"Exception: SafariDriver was not found; are you using Safari 10 or later? You can download Safari from https://developer.apple.com/safari/download/"
The point is that I have the latest version of Safari (16.5.1), and I have also enabled "Allow Remote Automation" in Safari's Develop menu. How do you think I can download the reviews into a txt file or put them into a dataframe?
Thank you in advance.