
giuseppegambino / scraping-tripadvisor-with-python-2020

Stars: 87, Watchers: 7, Forks: 46, Size: 1.88 MB

Python implementation of TripAdvisor web scraping with Selenium, adapted to the new 2019 version of the website.

License: MIT License

Python 100.00%
python webscraping webscraper webscraper-website tripadvisor tripadvisor-scraper tripadvisorreview selenium

scraping-tripadvisor-with-python-2020's People

Contributors

giuseppegambino


scraping-tripadvisor-with-python-2020's Issues

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f604' in position 290: character maps to <undefined>

Hello,
I'm using the new code that splits "things to do" from restaurants.
The restaurant scraper runs fine 7 times, then I get this error:

---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
in
19 review = container[j].find_element_by_xpath(".//p[@class='partial_entry']").text.replace("\n", " ")
20
---> 21 csvWriter.writerow([date, rating, title, review])
22
23 # change the page

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f604' in position 290: character maps to <undefined>

Please help.
Didier
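The traceback shows the CSV file being encoded with cp1252, which is what `open()` typically falls back to on Western-locale Windows when no encoding is given, and cp1252 has no code for emoji such as U+1F604. A minimal sketch of the usual fix, assuming the script opens the CSV file itself, is to pass `encoding="utf-8"` explicitly (plus `newline=""`, as the `csv` module documentation recommends). `append_rows` is a hypothetical helper name, not part of the repository.

```python
import csv
from typing import Iterable, Sequence


def append_rows(path: str, rows: Iterable[Sequence[str]]) -> None:
    """Append rows to a CSV file that is always written as UTF-8."""
    # Explicit UTF-8 instead of the platform default, so emoji and accented letters can be stored.
    # newline="" lets the csv module control line endings (avoids blank lines on Windows).
    with open(path, "a", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            writer.writerow(row)
```

In the scraper loop this would replace the bare `csvWriter.writerow(...)` call, for example `append_rows("reviews.csv", [[date, rating, title, review]])`; reopening the file per page is a simplification, and passing the same `encoding` to the script's existing `open()` call works just as well.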

TripAdvisor Massive Changes

Thanks for your help last time.

It's very surprising how much TripAdvisor has changed its HTML.
I have managed to modify your code to fit the new changes on their website.

Unfortunately, my new scraping code ONLY scrapes the first review on each review page.

I would really appreciate it if you could help me finish this code so that it scrapes all reviews on a page.

Could you also look at how to capture the date (MM:YY)?

You can reach me at esowah2000atgmail.com.

LouvreMuseum.zip
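One frequent reason a Selenium scraper returns only the first review on every page is that the per-review lookups use an absolute XPath such as `//p[...]`. Even when called on a review element, an expression starting with `//` searches the whole document and therefore keeps matching the first review. A relative expression such as `.//p[...]` searches only inside each container. The sketch below shows the "collect all containers, then query relative to each one" pattern using the standard library's `xml.etree.ElementTree` on a small hypothetical snippet. The `review`, `title` and `date` class names and the English "Month YYYY" date text are assumptions, not TripAdvisor's real markup. The snippet also converts that date to the MM:YY form requested above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

# Hypothetical, simplified markup: real TripAdvisor class names differ and change often.
PAGE = (
    "<div>"
    "<div class='review'><span class='title'>Great</span><span class='date'>March 2020</span></div>"
    "<div class='review'><span class='title'>Crowded</span><span class='date'>January 2019</span></div>"
    "</div>"
)


def to_mm_yy(text: str) -> str:
    # Assumes English "Month YYYY" text; %B follows the current locale (English under the default C locale).
    return datetime.strptime(text.strip(), "%B %Y").strftime("%m:%y")


def extract_reviews(root: ET.Element) -> List[Tuple[str, str]]:
    rows = []
    # Collect every review container first, then query relative to each one with ".//".
    for container in root.findall(".//div[@class='review']"):
        title = container.find(".//span[@class='title']").text
        date = to_mm_yy(container.find(".//span[@class='date']").text)
        rows.append((title, date))
    return rows


print(extract_reviews(ET.fromstring(PAGE)))
# [('Great', '03:20'), ('Crowded', '01:19')]
```

With Selenium the same idea reads `containers = driver.find_elements(By.XPATH, ...)` followed by `container.find_element(By.XPATH, ".//...")` inside the loop. The leading dot is what keeps each lookup scoped to its own review.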

Missing LICENSE

I'm thinking of using this code (or part of it) in a project, but I couldn't find a license. I just want to make sure what I can and can't do with it :)

Missing license

Hello Giuseppe. I'm currently writing my master's thesis and would like to use this code to scrape hotel reviews. Is that okay? It's really important for me. If so, would you kindly add a license so I can use it? Otherwise I can't, and this is the only review-scraping code that has worked for me. Thank you very much.

No such element error

Thanks for your code. I find it simple and straightforward to use.

I made some adjustments to use it in a project, but I ran into these errors:

  1. Message: no such element: Unable to locate element: {"method":"xpath","selector":"//label[@for="filters_detail_checkbox_trating__5"]"}

  2. Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@class="nav next taLnk ui_button primary"]"}

I have attached a copy of my scripts.

I would appreciate your help.

Louvre Museum.zip
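The second selector above compares the whole `class` attribute with one exact string. It stops matching as soon as TripAdvisor reorders, adds or removes a class. Elsewhere on this page the same next-page button is matched as `ui_button nav next primary `, with a different order and a trailing space. A more tolerant option is the standard XPath 1.0 idiom that tests for individual class tokens. The sketch below only builds that expression string. `xpath_with_classes` is a hypothetical helper, and which classes are stable enough to rely on (here `nav` and `next`) is an assumption.

```python
def xpath_with_classes(tag: str, *classes: str) -> str:
    """Return an XPath 1.0 expression for `tag` elements that carry every class in `classes`.

    Padding the normalized class attribute with spaces and searching for ' name '
    matches whole class tokens only, regardless of their order or extra whitespace.
    """
    if not classes:
        raise ValueError("at least one class name is required")
    tests = " and ".join(
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')" for name in classes
    )
    return f"//{tag}[{tests}]"


print(xpath_with_classes("a", "nav", "next"))
```

With Selenium 4, the result can be passed to `driver.find_elements(By.XPATH, ...)`, which returns an empty list instead of raising `NoSuchElementException` when nothing matches. That makes the last page, where no "next" button exists, easy to detect.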

Problem finding the correct path

Hey,
I am getting the following error:
File "C:/Users/kaium/Scraping-TripAdvisor-with-Python-2020-master/restaurants_scraper.py", line 26, in
csvFile = open(path_to_file, 'a', encoding="utf-8")
OSError: [Errno 22] Invalid argument: 'E:\x07liza'

Please have a look; I hope you can solve the issue. Thanks in advance!
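The value in the error, `'E:\x07liza'`, suggests the path was typed as the ordinary string literal `"E:\aliza"`. Inside a normal Python string, `\a` is the escape for the BEL control character (`\x07`), so Windows receives an invalid file name. A raw string, forward slashes or `pathlib` all avoid the escape. The sketch below just mirrors the name from the error message; the real `path_to_file` value is not shown in the issue.

```python
from pathlib import PureWindowsPath

broken = "E:\aliza"    # "\a" is silently turned into the BEL control character
raw = r"E:\aliza"      # raw string: the backslash is kept literally
forward = "E:/aliza"   # forward slashes are also accepted by Windows

print(repr(broken))                                      # 'E:\x07liza'
print(raw == "E:\\aliza")                                # True
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```

In the script this means writing `path_to_file = r"E:\..."` (or using forward slashes) before calling `open(path_to_file, 'a', encoding="utf-8")`.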

Cannot push branch and open PR

Hello @giuseppegambino!
I wanted to try the code and made some adjustments for Firefox and the new cookie SDK used by TripAdvisor, but I cannot push my branch to the repo. Could you have a look at the permissions?
Thanks, and have a great end-of-year celebration 🍾 🎉

Can't scrap TripAdvisor

Hello,
I got as far as the final part and then got these errors:
UnicodeEncodeError: 'charmap' codec can't encode character '\xc8' in position 3: character maps to <undefined>

# to change the page
if (check_exists_by_xpath('//a[@class="ui_button nav next primary "]')):

File "", line 1
if (check_exists_by_xpath('//a[@class="ui_button nav next primary "]')):
^
IndentationError: unexpected indent

    driver.find_element_by_xpath('//a[@class="ui_button nav next primary "]').click()

File "", line 1
driver.find_element_by_xpath('//a[@class="ui_button nav next primary "]').click()
^
IndentationError: unexpected indent

Can you explain what went wrong?
BR
Marek
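Judging from the `File "", line 1` and "unexpected indent" messages, the lines appear to have been pasted into an interactive prompt one at a time. Each line still carried the indentation it has inside the script's loop, but a new top-level statement may not start with whitespace. The standard-library sketch below reproduces this with `compile()`, which only checks syntax and executes nothing. The snippets are shortened stand-ins for the lines above, and `compiles` is a hypothetical helper. Running the whole script, or pasting the complete loop at once, avoids the error.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid, False if it fails with an IndentationError."""
    try:
        compile(source, "<pasted>", "exec")  # syntax check only; nothing is executed
    except IndentationError:
        return False
    return True


# A loop-body line pasted on its own still starts with four spaces.
body_alone = "    driver.find_element_by_xpath('//a').click()\n"

# The same line pasted together with the statement that introduces it.
whole_block = (
    "if check_exists_by_xpath('//a'):\n"
    "    driver.find_element_by_xpath('//a').click()\n"
)

print(compiles(body_alone))   # False ("IndentationError: unexpected indent")
print(compiles(whole_block))  # True
```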

ElementClickInterceptedException

Hello Giuseppe,
I am using Python 3 and I have added encoding="utf-8".
The code works better now, but after 60 page changes I get another error message:

ElementClickInterceptedException: Message: element click intercepted: Element ... is not clickable at point (408, 564). Other element would receive the click:

...

(Session info: chrome=87.0.4280.88)

Any idea?
Thanks,
Didier
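`ElementClickInterceptedException` means that another element was sitting on top of the target at the moment of the click. This is often a cookie banner, a pop-up, or an overlay that has not finished loading. One common workaround is to wait briefly and retry the click a few times before giving up. The sketch below does this with the standard library only. `click_with_retry` is a hypothetical helper, and the exception types to retry on are supplied by the caller, so the function itself does not depend on Selenium.

```python
import time
from typing import Callable, Tuple, Type


def click_with_retry(
    click: Callable[[], None],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 2.0,
) -> int:
    """Call `click` until it succeeds and return the number of attempts used.

    Exceptions listed in `retry_on` cause a pause and another try; the final
    attempt is made outside the try block, so its exception reaches the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            click()
            return attempt
        except retry_on:
            time.sleep(delay)  # give overlays or pop-ups time to disappear
    click()  # last attempt: any exception propagates unchanged
    return attempts
```

With Selenium 4 this could be called as `click_with_retry(lambda: driver.find_element(By.XPATH, next_xpath).click(), (ElementClickInterceptedException,))`, where `next_xpath` is a placeholder for the next-page selector. Looking the element up inside the lambda gives each retry a fresh reference. If a cookie banner is the obstacle, dismissing it once at the start is usually more reliable than retrying.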

Scraping error at a hotel review

First of all, many thanks for this valuable tool! Your scraper works perfectly with the Colosseo example, but when I try to extract hotel reviews, the following error appears:

[ElementClickInterceptedException: element click intercepted: Element ... is not clickable at point (845, 870). Other element would receive the click:

...

(Session info: chrome=86.0.4240.111)]

Thanks in advance for any help.

Getting an error: "NoSuchElementException" for "things to do" scraper

Hello @giuseppegambino!

I was trying your code (thanks a lot for publishing it), and while the code to scrape restaurants' reviews works, the code to scrape reviews for "things to do" does not work. I get this message when I run it:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//div[contains(@data-test-target, 'expand-review')]"}

Would it be possible for you to fix it? It would be extremely helpful. Thanks a lot and happy 2022!
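In Selenium, `find_element` raises `NoSuchElementException` when nothing matches, whereas `find_elements` (plural) returns an empty list. Newer Selenium 4 releases have also removed the `find_element_by_xpath` style used in the script, in favour of `find_element(By.XPATH, ...)`. If the "expand review" element is not always present, or its `data-test-target` value has changed, one tolerant approach is to use the plural lookup and click only when something was found. `click_first_if_present` is a hypothetical helper. It accepts any lookup function, so it can be exercised without a browser.

```python
from typing import Callable, Protocol, Sequence


class Clickable(Protocol):
    def click(self) -> None: ...


def click_first_if_present(find_all: Callable[[str], Sequence[Clickable]], xpath: str) -> bool:
    """Click the first element matching `xpath`, if there is one; return whether a click happened."""
    matches = find_all(xpath)
    if not matches:
        return False  # nothing to expand on this page: carry on instead of raising
    matches[0].click()
    return True
```

A Selenium call could look like `click_first_if_present(lambda xp: driver.find_elements(By.XPATH, xp), ".//div[contains(@data-test-target, 'expand-review')]")`. If it returns `False` on every page, the selector itself probably no longer matches TripAdvisor's markup and needs to be updated.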

Extract Tripadvisor reviews from a specific page with Google Colab

Hi Giuseppe!
I should mention that I am a complete beginner in Python, and for the moment I am using Google Colab for some tasks. In particular, I am trying to extract the reviews from this TripAdvisor page: https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html

I made several attempts using BeautifulSoup:
import requests
from bs4 import BeautifulSoup as soup

# URL of the TripAdvisor page
url = 'https://www.tripadvisor.it/Attraction_Review-g2173026-d8059630-Reviews-Bungee_Jumping_Asiago_Enego_Foza_175_metri-Foza_Province_of_Vicenza_Veneto.html'

# Make the HTTP request to get the page content
html = requests.get(url)
bsobj = soup(html.content, 'html.parser')

# Find all 'q' tags that contain the reviews
reviews = []
for r in bsobj.findAll('q'):
    reviews.append(r.span.text.strip())
    print(r.span.text.strip())

# Print the extracted reviews
for review in reviews:
    print(review)

The code seems to work, but it runs for too long and eventually crashes because of the long idle time on Colab (I even tried adding an automatic click to avoid the timeout, but it doesn't work).

After that, I tried following your script, but when I run:
driver = webdriver.Safari()
I get this error:
"Exception: SafariDriver was not found; are you using Safari 10 or later? You can download Safari from https://developer.apple.com/safari/download/".

The thing is, I have the latest version of Safari (16.5.1), and I have also enabled "Allow remote automation" in Safari's Develop settings. How can I download the reviews into a txt file or put them into a DataFrame?

Thank you in advance.
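For the last question: once the review texts are in a Python list, such as `reviews` in the BeautifulSoup code above, the standard library is enough to write them to a text file or a CSV file. The CSV can then be loaded into a DataFrame with `pandas.read_csv`, or the list can be passed directly to `pandas.DataFrame({"review": reviews})`. The function names and file names in this sketch are hypothetical.

```python
import csv
from typing import Iterable


def save_as_txt(reviews: Iterable[str], path: str) -> None:
    # One review per line; runs of whitespace (including line breaks) are collapsed to single spaces.
    with open(path, "w", encoding="utf-8") as txt_file:
        for review in reviews:
            txt_file.write(" ".join(review.split()) + "\n")


def save_as_csv(reviews: Iterable[str], path: str) -> None:
    # A header row plus one review per row; csv quotes fields that contain commas or line breaks.
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["review"])
        writer.writerows([review] for review in reviews)
```

In Colab, calling `save_as_txt(reviews, "reviews.txt")` writes the file to the notebook's working directory, from which it can be downloaded through the file browser.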
