Python implementation of TripAdvisor web scraping with Selenium, for the new 2019 website

License: MIT License

Python 100.00%

python webscraping webscraper webscraper-website tripadvisor tripadvisor-scraper tripadvisorreview selenium


scraping-tripadvisor-with-python-2020's Issues

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f604' in position 290: character maps to <undefined>

Hello,
I'm using the new code that splits "things to do" from restaurants.
The restaurant scraper runs well 7 times, then I get this error:

"---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
in
19 review = container[j].find_element_by_xpath(".//p[@class='partial_entry']").text.replace("\n", " ")
20
---> 21 csvWriter.writerow([date, rating, title, review])
22
23 # change the page

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f604' in position 290: character maps to <undefined>"

Please help.
Didier
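The traceback shows the CSV write going through cp1252.py, i.e. the output file was opened with the Windows default encoding, which has no mapping for emoji such as '\U0001f604'. The updated restaurants_scraper.py opens its file with encoding="utf-8" (visible in a later issue's traceback). A minimal sketch of that fix; the helper name append_review_row and the sample row are hypothetical:

```python
import csv
import os
import tempfile


def append_review_row(path, row):
    # Open with an explicit UTF-8 encoding instead of the platform default
    # (cp1252 on many Windows setups), so emoji such as '\U0001f604' can be written.
    # newline="" is what the csv module expects; it avoids blank lines on Windows.
    with open(path, "a", encoding="utf-8", newline="") as csv_file:
        csv.writer(csv_file).writerow(row)


# Usage example with a made-up review that contains an emoji
path = os.path.join(tempfile.mkdtemp(), "reviews.csv")
append_review_row(path, ["2019-12-01", 5, "Great", "Loved it \U0001f604"])

with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

When reading the file back, use the same encoding="utf-8".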

TripAdvisor Massive Changes

Thanks for your assistance the other time.

It's very surprising how much TripAdvisor has changed in their website's HTML.
Still, I have managed to modify your code to fit the new changes on their website.

Unfortunately, the new scraping code I have ONLY scrapes the first review of each review page.

I would really appreciate it if you could help me finish this code so that it scrapes all reviews on a page.
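Without seeing the attached script this is only a guess, but a frequent cause of "one review per page" is a lookup inside the loop that searches the whole page (an XPath starting with //) instead of the current review container (.//), or collecting the containers with find_element instead of find_elements. The sketch below uses the standard library's ElementTree on a simplified, made-up page so it runs without a browser; the container class review-container is hypothetical, while partial_entry comes from the traceback in the first issue. In Selenium the same pattern would be driver.find_elements(By.XPATH, ...) for the containers and container.find_element(By.XPATH, ".//...") for each field.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a review page.
# 'review-container' is an assumed class name; 'partial_entry' is from the issue above.
PAGE = """
<html><body>
  <div class="review-container"><p class="partial_entry">First review</p></div>
  <div class="review-container"><p class="partial_entry">Second review</p></div>
  <div class="review-container"><p class="partial_entry">Third review</p></div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())
# Plural lookup: collect every review container on the page, not just the first.
containers = root.findall(".//div[@class='review-container']")

buggy, fixed = [], []
for container in containers:
    # Bug pattern: searching from the document root inside the loop
    # (the analogue of an XPath starting with '//') always finds the first review.
    buggy.append(root.find(".//p[@class='partial_entry']").text)
    # Fix: search relative to the current container (an XPath starting with './/').
    fixed.append(container.find(".//p[@class='partial_entry']").text)

print(buggy)  # ['First review', 'First review', 'First review']
print(fixed)  # ['First review', 'Second review', 'Third review']
```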

Could you also look at how to capture the date {MM:YY}?
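For the {MM:YY} date, a sketch that assumes the review text contains an English month name followed by a four-digit year, for example "Date of experience: December 2019" (that exact wording is an assumption; check it in the browser's inspector). The helper to_mm_yy is hypothetical:

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Matches e.g. "December 2019": group 1 is the month name, group 2 the year.
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b")


def to_mm_yy(text):
    """Return the first 'Month YYYY' in text as 'MM:YY', or None if there is none.

    Assumes English month names; the wording on the page is an assumption."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month = MONTHS.index(match.group(1)) + 1  # 'December' -> 12
    return f"{month:02d}:{match.group(2)[2:]}"


print(to_mm_yy("Date of experience: December 2019"))  # 12:19
```

Using a fixed month list instead of datetime.strptime("%B") keeps the result independent of the machine's locale.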

Do reach me at esowah2000atgmail.com.

LouvreMuseum.zip

Missing LICENSE

I'm thinking of using this (or part of this) code in a project, but couldn't find a license (I just want to make sure what I can and can't do with it :) ).

Missing license

Hello Giuseppe. I'm currently writing my master's thesis and would like to use this code to scrape hotel reviews; is that okay? It's really important for me. If so, would you kindly add a license so I can use it? Otherwise I can't, and this is the only review-scraping code that has worked for me. Thank you very much.

No such element error

Thanks for your code. I find it simple and straightforward to use.

I tried to adapt it for a project, but I ran into these errors:

Message: no such element: Unable to locate element: {"method":"xpath","selector":"//label[@for="filters_detail_checkbox_trating__5"]"}
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@class="nav next taLnk ui_button primary"]"}

I have attached a copy of my scripts.

I would appreciate your help.

Louvre Museum.zip
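Both messages mean the XPath matched nothing on the page as loaded, usually because TripAdvisor renamed the classes (as another issue here notes) or because that element is not rendered for this listing. The repository's scripts call a helper named check_exists_by_xpath; the sketch below is a hypothetical variant of it (it takes the driver as an argument and is not the repository's code) that relies on find_elements returning an empty list instead of raising. The string "xpath" is the value of Selenium's By.XPATH; FakeDriver only stands in for a real WebDriver so the example runs without a browser:

```python
def check_exists_by_xpath(driver, xpath):
    """Return True if at least one element on the page matches xpath.

    find_elements returns an empty list when nothing matches, instead of
    raising NoSuchElementException like find_element does. "xpath" is the
    value of selenium.webdriver.common.by.By.XPATH."""
    return len(driver.find_elements("xpath", xpath)) > 0


class FakeDriver:
    """Stand-in for a Selenium WebDriver so the example runs without a browser."""

    def __init__(self, present):
        self.present = set(present)

    def find_elements(self, by, value):
        return ["<element>"] if value in self.present else []


next_button = '//a[@class="nav next taLnk ui_button primary"]'
rating_filter = '//label[@for="filters_detail_checkbox_trating__5"]'
driver = FakeDriver(present=[next_button])

if check_exists_by_xpath(driver, rating_filter):
    print("rating filter found")
else:
    print("rating filter missing: update the selector")  # this branch runs here
print(check_exists_by_xpath(driver, next_button))  # True
```

If a selector is reported missing on a page where the element is visible, copy the current class names from the browser's inspector.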

Question: some errors in your code, sir. Can you explain them to me? (This is about the latest code.)

Problem finding the correct path

Hey,
I am getting the following errors:
File "C:/Users/kaium/Scraping-TripAdvisor-with-Python-2020-master/restaurants_scraper.py", line 26, in
csvFile = open(path_to_file, 'a', encoding="utf-8")
OSError: [Errno 22] Invalid argument: 'E:\x07liza'

Please have a look. I hope you can solve the issue. Thanks in advance!
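The path in the error, 'E:\x07liza', shows what happened: in a normal Python string "\a" is the bell escape (character 7), so 'E:\aliza' no longer names the folder. A raw string, forward slashes, or pathlib all keep the backslash. The file name reviews.csv below is only an example, and PureWindowsPath is used so the sketch behaves the same on any OS (on Windows, plain Path works too):

```python
from pathlib import PureWindowsPath

# In a normal string literal "\a" is the BEL escape (chr(7)), so the
# folder name is corrupted before open() ever sees it.
broken = "E:\aliza"
print(repr(broken))  # 'E:\x07liza'

# Three ways to keep the backslash (reviews.csv is only an example name):
raw = r"E:\aliza\reviews.csv"      # raw string: backslashes are taken literally
forward = "E:/aliza/reviews.csv"   # Windows also accepts forward slashes
joined = str(PureWindowsPath("E:/", "aliza", "reviews.csv"))  # pathlib builds it

print("\x07" in broken, "\x07" in raw)  # True False
print(joined == raw)                    # True
```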

Cannot push branch and open PR

Hello @giuseppegambino! 👍
I wanted to try the code and made some adjustments for Firefox and the new cookie SDK used by TripAdvisor, but I cannot push my branch to the repo. Could you have a look at the permissions?
Thanks, and have great EOY celebrations 🍾 🎉

'charmap' codec can't encode character '\U0001f60d' in position 107: character maps to <undefined>

Repetitive reviews scraped

Hi, I am new to web scraping. I followed the code to scrape
https://www.tripadvisor.co.za/Restaurant_Review-g294265-d3735954-Reviews-Haidilao_Hot_Pot-Singapore.html
and found that the final CSV file contains repeated reviews. After inspecting it, I noticed that the code only scraped the first page of reviews and kept scraping that same page over and over.
Can you please advise?
Thank you so much!
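Repeating the first page usually means the script read the reviews again before the next page had loaded, or the "next" click did not take effect; which one applies here is an assumption. As a safety net, a sketch that fingerprints each page's review texts, stops when a page repeats, and drops duplicate rows. page_fingerprint and collect_unique are hypothetical helpers working on lists of review strings:

```python
import hashlib


def page_fingerprint(reviews):
    """Hash the review texts of one page so two reads can be compared cheaply."""
    digest = hashlib.sha256()
    for text in reviews:
        digest.update(text.encode("utf-8"))
        digest.update(b"\0")  # separator, so ['ab', 'c'] and ['a', 'bc'] differ
    return digest.hexdigest()


def collect_unique(pages):
    """Keep reviews in order, skip exact duplicates, and stop when a page repeats."""
    seen_pages, seen_rows, rows = set(), set(), []
    for reviews in pages:
        fp = page_fingerprint(reviews)
        if fp in seen_pages:
            # Same content as a page already read: pagination did not advance,
            # so stop instead of writing the same reviews again.
            break
        seen_pages.add(fp)
        for text in reviews:
            if text not in seen_rows:
                seen_rows.add(text)
                rows.append(text)
    return rows


# Hypothetical reads: page 2 was read twice because the click had not taken effect.
pages = [["A", "B"], ["C", "D"], ["C", "D"], ["E"]]
print(collect_unique(pages))  # ['A', 'B', 'C', 'D']
```

In a live scraper the better fix is to wait after each click until the fingerprint changes; the break above only prevents duplicates and endless loops.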

Can't scrape TripAdvisor

Hello,
I got as far as the final part and stopped at this error:
"UnicodeEncodeError: 'charmap' codec can't encode character '\xc8' in position 3: character maps to <undefined>

# to change the page
if (check_exists_by_xpath('//a[@class="ui_button nav next primary "]')):

File "", line 1
if (check_exists_by_xpath('//a[@class="ui_button nav next primary "]')):
^
IndentationError: unexpected indent

    driver.find_element_by_xpath('//a[@class="ui_button nav next primary "]').click()

File "", line 1
driver.find_element_by_xpath('//a[@class="ui_button nav next primary "]').click()
^
IndentationError: unexpected indent"

Can you explain what went wrong?
BR
Marek
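The `File "", line 1` lines suggest each line was sent to the interpreter on its own (for example by running a selection line by line), so the indented if statement and its body arrived without the loop that encloses them in the script. The sketch compiles the same lines with and without an enclosing loop; the loop header for i in range(num_page) is hypothetical. The '\xc8' encoding error is the same problem as in the first issue and is fixed by opening the output file with encoding="utf-8".

```python
# The two lines as they would appear indented inside the script's page loop.
inner = (
    "    if check_exists_by_xpath(NEXT):\n"
    "        driver.find_element_by_xpath(NEXT).click()\n"
)
# Hypothetical enclosing loop header.
full = "for i in range(num_page):\n" + inner

# Compiles fine: names like num_page and driver are only looked up when run.
compile(full, "<script>", "exec")

error_msg = None
try:
    # The indented lines on their own, as when run one selection at a time.
    compile(inner, "<cell>", "exec")
except IndentationError as exc:
    error_msg = exc.msg
print(error_msg)  # unexpected indent
```

Running the whole script (or the whole loop as one notebook cell) avoids the error.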

ElementClickInterceptedException

Hello Giuseppe,
I am using Python 3 and I have included encoding="utf-8".
The code works better, but after 60 page changes I get another error message:

"ElementClickInterceptedException: Message: element click intercepted: Element ... is not clickable at point (408, 564). Other element would receive the click:

...

(Session info: chrome=87.0.4280.88)"

Any idea?
Thanks,
Didier
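"Other element would receive the click" means something covers the "next" button at that moment; a cookie banner or sticky header is a common culprit, though that is an assumption for this page. A sketch of a click helper that scrolls the button to the middle of the viewport and retries, then falls back to a JavaScript click through execute_script. safe_click is hypothetical; in real use intercepted_exc would be selenium.common.exceptions.ElementClickInterceptedException:

```python
import time


def safe_click(driver, element, intercepted_exc, retries=3, delay=1.0):
    """Click element; if another element intercepts the click, scroll and retry.

    After `retries` failed native clicks, fall back to a JavaScript click,
    which overlapping elements cannot intercept. Returns which path was used."""
    for _ in range(retries):
        try:
            element.click()
            return "native"
        except intercepted_exc:
            # Move the target to the middle of the viewport, away from
            # sticky headers or banners at the top or bottom edge.
            driver.execute_script(
                "arguments[0].scrollIntoView({block: 'center'});", element
            )
            time.sleep(delay)  # give the page a moment to settle
    driver.execute_script("arguments[0].click();", element)
    return "javascript"


# Real use (Selenium installed, driver and next_button already obtained):
# from selenium.common.exceptions import ElementClickInterceptedException
# safe_click(driver, next_button, ElementClickInterceptedException)
```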

Scraping error at a hotel review

First of all, many thanks for this valuable tool! Although your scraper works perfectly with the Colosseo example, the following error appears when I try to extract hotel reviews:

[ElementClickInterceptedException: element click intercepted: Element ... is not clickable at point (845, 870). Other element would receive the click:

...

(Session info: chrome=86.0.4240.111)]

Thanks in advance for any help.

Getting an error: "NoSuchElementException" for "things to do" scraper

Hello @giuseppegambino!

I was trying your code (thanks a lot for publishing it), and while the code to scrape restaurants' reviews works, the code to scrape reviews for "things to do" does not work. I get this message when I run it:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//div[contains(@data-test-target, 'expand-review')]"}

Would it be possible for you to fix it? It would be extremely helpful. Thanks a lot and happy 2022!
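NoSuchElementException here means no "expand review" element was present when the lookup ran: either the page had not finished rendering, or the data-test-target attribute no longer exists in TripAdvisor's markup. Selenium's WebDriverWait(driver, 10).until(...) covers the first case; the plain-Python sketch below shows the same polling idea, with a hypothetical wait_until helper. If it times out, the selector itself most likely needs updating.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns something truthy, or raise TimeoutError.

    Same idea as Selenium's WebDriverWait(driver, timeout).until(...),
    written in plain Python for illustration."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# With Selenium (driver already created): find_elements returns [] until the
# element appears, so the lambda stays falsy until then.
# links = wait_until(lambda: driver.find_elements(
#     "xpath", ".//div[contains(@data-test-target, 'expand-review')]"))
# links[0].click()
```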

Extract TripAdvisor reviews from a specific page with Google Colab

Hi Giuseppe!
I should mention that I am a complete beginner with Python, and for the moment I am using Google Colab to run my code. In particular, I am trying to extract the reviews on TripAdvisor at this link: https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html

I made several attempts using BeautifulSoup:

import requests
from bs4 import BeautifulSoup as soup

# TripAdvisor page URL
url = 'https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html'

# Make the HTTP request to get the page content.
# The timeout stops the request from hanging indefinitely; the browser-like
# User-Agent is an assumption (the site may stall or block the default one).
headers = {'User-Agent': 'Mozilla/5.0'}
html = requests.get(url, headers=headers, timeout=30)
html.raise_for_status()
bsobj = soup(html.content, 'html.parser')

# Find all 'q' tags that contain the reviews
reviews = []
for r in bsobj.find_all('q'):
    if r.span is not None:  # skip <q> tags without a <span> inside
        reviews.append(r.span.text.strip())

# Print the extracted reviews
for review in reviews:
    print(review)

The code seems to work, but it runs for too long and eventually crashes because Colab times out when idle (I even tried inserting an automatic click to avoid the timeout, but it didn't work).

After that, I tried following your script, but when I run:
driver = webdriver.Safari()
I get this error:
"Exception: SafariDriver was not found; are you using Safari 10 or later? You can download Safari from https://developer.apple.com/safari/download/".

The point is that I have the latest version of Safari (16.5.1), and I have also enabled "Allow remote automation" in Safari's Develop menu. How can I download the reviews into a .txt file or put them into a DataFrame?

Thank you in advance.
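Two hedged notes. Colab runs notebooks on a remote Linux machine, and Safari's driver ships only with Safari on macOS, so webdriver.Safari() cannot work there regardless of the local Safari settings; Selenium in Colab would typically need a headless Chrome/Chromium setup instead. For saving, once the reviews are in a Python list (as in the BeautifulSoup code above), a sketch with a hypothetical save_reviews helper writes a .txt file and a CSV that pandas.read_csv can load as a DataFrame:

```python
import csv


def save_reviews(reviews, txt_path, csv_path):
    # One review per line in a plain-text file; newlines inside a review are
    # flattened to spaces so each review stays on a single line.
    with open(txt_path, "w", encoding="utf-8") as f:
        for review in reviews:
            f.write(review.replace("\n", " ") + "\n")
    # The same data as a one-column CSV; the csv module quotes commas,
    # quotes and newlines, so the original text is kept intact.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["review"])
        writer.writerows([review] for review in reviews)


# Usage in a notebook (pandas is assumed to be installed, as it is on Colab):
# save_reviews(reviews, "reviews.txt", "reviews.csv")
# import pandas as pd
# df = pd.read_csv("reviews.csv")
```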

giuseppegambino / scraping-tripadvisor-with-python-2020
