
tomslee / airbnb-data-collection


Data collection for Airbnb listings.

License: MIT License

Python 88.60% PLpgSQL 9.57% SQLPL 1.17% Dockerfile 0.09% Shell 0.06% TSQL 0.50%

airbnb-data-collection's People

Contributors

aashishg, cortesimone, dependabot[bot], deroses, jenslaufer, joao, neolithera, romanseidl, tomk32, tomslee, tomslee-sap, xecgr


airbnb-data-collection's Issues

Adding new search parameters/columns?

Great project. (And great README. I am not a programmer and still got it running easily.)

I was wondering if there is an easy way to add a column, like the availability_30 and availability_90 columns that insideairbnb scrapes?

License?

This is really cool work, but is it only here for informational purposes, or do you invite collaboration on it too?

Thanks for sharing!

Price hidden behind Javascript?

First of all, thanks for your amazing work. I noticed that the code for get_price doesn't work for me, since requests.get doesn't fetch the relevant element ("//meta[@itemprop='price']"). Hence the value for price is None. I tried a workaround using Selenium and it worked. Am I doing something wrong, or is the scraper broken at the moment? Thanks!
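The symptom can be checked without Selenium. Below is a minimal sketch using only the standard library, assuming the price is carried in a meta tag with itemprop="price" and a content attribute, as the XPath //meta[@itemprop='price'] implies. If the page adds that tag with JavaScript, the HTML that requests.get receives will not contain it and the function returns None, which matches the behaviour described above. MetaPriceParser and extract_meta_price are hypothetical names, and both sample pages are made up.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaPriceParser(HTMLParser):
    """Collects the content attribute of the first <meta itemprop="price"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.price: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict makes lookup easy
        if tag == "meta" and self.price is None:
            attr_map = dict(attrs)
            if attr_map.get("itemprop") == "price":
                self.price = attr_map.get("content")


def extract_meta_price(html: str) -> Optional[str]:
    """Return the price string found in static HTML, or None if the tag is absent."""
    parser = MetaPriceParser()
    parser.feed(html)
    parser.close()
    return parser.price


# Made-up pages: one with the tag in the served HTML, one that builds it client-side
static_page = '<html><head><meta itemprop="price" content="67"></head></html>'
js_page = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(extract_meta_price(static_page))  # 67
print(extract_meta_price(js_page))      # None
```

Running this over the text that requests.get returns for a real listing page would show whether the tag is present in the served HTML at all.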

Almost there...

I was able to set up the DB and install the Python requirements (using the latest PostgreSQL 10 and Python 3.7, sometimes with newer versions of the requirements), set up the config, and create a bounding box in the DB, but when I run the "python airbnb.py -sb" option, I get this while the script is running:

INFO Rectangle calculated: [42.3232, -71.14235, 42.2946, -71.179]
INFO Searching rectangle: zoom factor = 1, node = [[1, 1]]
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
ERROR Operational error (connection closed): resuming
INFO Page 01 returned 15 listings
INFO Final page of listings for this search
INFO Results: 1 pages, 0 new rooms
INFO Finishing survey 1, for Brookline

In the end, nothing was written to the "rooms" table. Do you have any idea where I went wrong? I am a beginner with PostgreSQL, but this looks like a DB problem. I am wondering whether I need to uninstall Python 3.7, switch to 3.4, and use exactly the same versions of the Python requirements as in your requirements.txt file. Thanks in advance for any advice.
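The repeated "Operational error (connection closed): resuming" lines indicate that the database connection was being closed and the script kept retrying. One common pattern, sketched here as an assumption rather than as the script's own logic, is to open a fresh connection for each attempt and give up after a bounded number of tries so that the underlying error surfaces. run_with_reconnect, connect and work are hypothetical names; with psycopg2, retry_on would typically be (psycopg2.OperationalError,).

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_reconnect(
    connect: Callable[[], object],
    work: Callable[[object], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Run work(conn) on a fresh connection, retrying only on the listed errors."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        conn = connect()  # never reuse a connection that has already failed
        try:
            return work(conn)
        except retry_on as err:
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)
        finally:
            # Close this attempt's connection if the object supports it
            close = getattr(conn, "close", None)
            if callable(close):
                close()
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Bounding the retries means a persistent misconfiguration (wrong host, password or database name) ends with one clear, chained error instead of an endless stream of identical log lines.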

Airbnb change breaking the collection script

I have heard from a couple of users that collection has been broken by an update to the Airbnb web site. I have been unable to work on it this week, but I hope to have it fixed by Sunday, Feb 12, as long as only a small tweak is needed.

get() takes exactly 1 argument (6 given)

INFO:root:----------------------------------------------------------------------
airbnbcollector | INFO:root:Room 14006298: getting from Airbnb web site
airbnbcollector | ERROR:root:Network request exception: type TypeError
airbnbcollector | Traceback (most recent call last):
airbnbcollector | File "/airbnb_ws.py", line 83, in ws_individual_request
airbnbcollector | headers=headers, cookies=cookies, proxies=proxies)
airbnbcollector | TypeError: get() takes exactly 1 argument (6 given)

Any thoughts?
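"takes exactly 1 argument (6 given)" is how Python 2 words a TypeError raised when a function that accepts one argument receives six, keyword arguments included. That suggests the get being called at line 83 of airbnb_ws.py is not the multi-argument function the code expects, possibly because the name is shadowed or an unexpected interpreter is running the script; both are guesses. In Python 3, inspect.signature(...).bind(...) checks whether a call would be accepted without actually making it. call_is_compatible is a hypothetical helper, and the two functions below are stand-ins: one takes a single parameter, the other is shaped like requests.get(url, params=None, **kwargs).

```python
import inspect
from typing import Any, Callable


def call_is_compatible(func: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Return True if func's signature accepts these arguments; func is not called."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


def get(url):  # stand-in for a one-argument get() that shadows the expected one
    return url


def requests_like_get(url, params=None, **kwargs):  # same shape as requests.get
    return url


call_args = ("https://example.com",)
call_kwargs = {"timeout": 10, "headers": {}, "cookies": {}, "proxies": {}}
print(call_is_compatible(get, *call_args, **call_kwargs))                # False
print(call_is_compatible(requests_like_get, *call_args, **call_kwargs))  # True
```

inspect.signature does not exist in Python 2; there, printing the object being called and its __module__ just before the failing line gives similar information.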

[help] How to run schema(s) in postgresql

Hi there,
This is not an issue but more of a help request to get the script running.

The schema is in the two files postgresql/schema.sql and postgresql/functions.sql. You need to run those to create the database tables to start with.

How do I run the schema(s) in order to create the tables?
I think I have everything else set up properly. I am now at the step where it says: database "" does not exist.
Thank you
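The README step quoted above amounts to creating an empty database and then loading the two files into it with psql. Here is a minimal sketch that builds and runs those commands from Python, assuming createdb and psql are on the PATH, the commands are run from the repository root, and the database is called airbnb (a placeholder; use whatever name your config file expects). Loading schema.sql before functions.sql is also an assumption. The -v ON_ERROR_STOP=1 option makes psql stop at the first error instead of carrying on.

```python
import subprocess
from typing import List, Sequence

SCHEMA_FILES = ["postgresql/schema.sql", "postgresql/functions.sql"]


def build_commands(dbname: str, files: Sequence[str]) -> List[List[str]]:
    """Return argv lists: create the database, then load each SQL file in order."""
    commands = [["createdb", dbname]]
    for path in files:
        commands.append(["psql", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-f", path])
    return commands


def run_commands(commands: List[List[str]]) -> None:
    """Run each command, raising CalledProcessError at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Usage (not executed here): run_commands(build_commands("airbnb", SCHEMA_FILES))
```

If PostgreSQL needs a specific user or host, the same -U and -h options would go into each argv list, matching what you use with psql by hand.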

Missing properties in new version

I've run the new version (May 2019, 3.6) alongside an older one (June 2018, 3.4) multiple times using identical search areas, variables, etc., and consistently get 18 fewer listings in the 'new' results than in the 'old'.

Is one 'page' of 18 listings getting dropped somewhere along the line before it's recorded in the May 2019 (3.6) release?

Only 18 items returned

Hi there,

I have an issue retrieving the results of a bounding-box search: I only get 18 items returned:

INFO Searching rectangle: zoom factor = 0, node = []
INFO Page 01 returned 18 listings
INFO Page 02 returned 00 listings
INFO Final page of listings for this search
INFO Results: 2 pages, 0 new rooms

So it looks like the pagination is not working properly.
Has the Airbnb page changed again, or am I missing something? I tried several bounding boxes over cities, states, etc. in Germany.

Thanks for any help.

IP address blocked

Hi,

This looks like a fantastic project. Thanks for sharing!
I am using the bounding box search, I have updated the database,
and I keep getting:

WARNING HTTP status 400 from web site: IP address blocked. Waiting 1.0 minutes.

Is there something I am doing wrong?
How can I resolve this?

Thank you in advance,
Andreas
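The script's own response to a block is a fixed one-minute wait before retrying ("Waiting 1.0 minutes"). If the block persists, spacing requests further apart is one option. Below is a minimal sketch of capped exponential backoff with optional jitter; backoff_delays and its parameters are hypothetical and are not settings of this project, and no delay schedule is guaranteed to lift a block.

```python
import random
from typing import Iterator, Optional


def backoff_delays(
    base_seconds: float = 60.0,
    factor: float = 2.0,
    cap_seconds: float = 1800.0,
    attempts: int = 5,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield waits of base, base*factor, base*factor**2, ... never above the cap."""
    delay = base_seconds
    for _ in range(attempts):
        wait = min(delay, cap_seconds)
        if rng is not None:
            # "Full jitter": choose uniformly between 0 and the capped delay
            wait = rng.uniform(0, wait)
        yield wait
        delay *= factor


print(list(backoff_delays()))  # [60.0, 120.0, 240.0, 480.0, 960.0]
```

In a request loop, each HTTP 400 response would be followed by time.sleep(wait) with the next value from the generator; jitter helps when several collectors would otherwise retry in lockstep.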

Search finishes without starting and problems making a new 'asa'

Hi,

First of all, great job with this project! I'm trying to get your code to work, and after adding a search area and a survey, I start the survey with 'python airbnb.py -s 1', but it never seems to find anything and finishes in less than a second. I've tried creating a survey for London twice and for Paris once, but the result is the same either way.
Is this just a problem on my end, or has Airbnb updated their website since the last commit? (The master code didn't work at all for me, so I'm using the code on the dev branch.)

python airbnb.py -s 1
INFO ======================================================================
INFO Survey 1, for London
INFO Searching by neighborhood
INFO Finishing survey 1, for London

Additionally, it seems to have trouble connecting to Airbnb. Often when I add a new search area with -asa it fails, but after a few retries it magically works.

python airbnb.py -asa "London"
ERROR Error collecting city and neighborhood information
ERROR Error getting city info from website
ERROR Top level exception handler: quitting.
Traceback (most recent call last):
File "airbnb.py", line 444, in main
ws_get_city_info(ab_config, args.addsearcharea, ab_config.FLAGS_ADD)
File "airbnb.py", line 271, in ws_get_city_info
conn.commit()
UnboundLocalError: local variable 'conn' referenced before assignment

Do you have any idea what could cause these problems?

Edit: just FYI, I'm running a Postgres DB in AWS, and I've tried running the scripts both directly on my Mac and through an EC2 instance. Entries in the DB are updated when I add e.g. London, so the connection to the DB definitely works both within AWS and from my local Mac.
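The UnboundLocalError means conn.commit() ran on a code path where conn was never assigned, plausibly because fetching the city information from the website failed before any connection was opened. A common fix is to bind conn = None before the try block and only commit, roll back or close when a connection actually exists. The sketch below uses sqlite3 as a stand-in for psycopg2 (both follow the Python DB-API); save_city, fetch_city_name and the city table are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def save_city(connect: Callable[[], sqlite3.Connection],
              fetch_city_name: Callable[[], str]) -> bool:
    """Fetch a city name and store it, without touching conn before it exists."""
    conn: Optional[sqlite3.Connection] = None  # bound up front on every path
    try:
        name = fetch_city_name()  # may raise before any connection is opened
        conn = connect()
        conn.execute("CREATE TABLE IF NOT EXISTS city (name TEXT)")
        conn.execute("INSERT INTO city (name) VALUES (?)", (name,))
        conn.commit()
        return True
    except Exception:
        # Real code would log the error here
        if conn is not None:
            conn.rollback()
        return False
    finally:
        if conn is not None:
            conn.close()
```

Returning False keeps the sketch short; re-raising after the rollback would be equally valid if the caller should stop.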

airbnb-home pictures-collection

Hey there, how can I get all the home pictures for a given city? I don't know Python at all. Can anyone teach me how to get the result I want? I would really appreciate it!
Cheers,
Cyberlilian

IP address blocked and survey quitting instantly

Hi Tom,

Thank you so much for your instructions and script. They would help my research a lot. I have followed all the steps in the README, including setting up a database through pgAdmin, but I always run into problems when running the survey. Whichever city I choose, the survey ends as soon as I start it, and no data is stored in the database. This happens when I search by both neighborhood and zip code.

/Users/xins/anaconda3/lib/python3.5/site-packages/psycopg2/__init__.py:144:
UserWarning: The psycopg2 wheel package will be renamed from release 2.8;
in order to keep installing from binary please use "pip install psycopg2-binary" instead.
For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
INFO ==============================================================
INFO Survey 8, for atlanda
INFO Searching by zipcode
INFO Finishing survey 8, for atlanda

When I search through bounding box, I constantly receive the message that I am blocked by the website.

INFO ==============================================================
INFO Survey 8, for atlanda
INFO Searching by bounding box, max_zoom=12
INFO ----------------------------------------------------------------------
INFO Rectangle calculated: [33.887618, -84.289389, 33.647808, -84.551819]
INFO Searching rectangle: zoom factor = 0, node = []
WARNING HTTP status 400 from web site: IP address blocked.Waiting 1.0 minutes.
WARNING HTTP status 400 from web site: IP address blocked.Waiting 1.0 minutes.
WARNING HTTP status 400 from web site: IP address blocked.Waiting 1.0 minutes.

This warning message repeats as the survey goes on, and no data is stored. Could you let me know where I might have made mistakes or messed up some steps?

Some advice about license compliance

Hello, this repository has helped me a lot, and it is very kind of you to make it open source!

Question
There are some possible legal issues with the license of your repository, since it combines numerous third-party packages.
For instance, the lxml, argparse and psycopg2 packages you import are licensed under the BSD License, the Python Software Foundation License and the GNU Library or Lesser General Public License (LGPL), respectively.
However, the MIT License of your repository is less strict than these package licenses, which breaks license compatibility across your repository and may bring legal and financial risks.

Advice
You can select another appropriate license for your repository, or write a custom license with a license exception if some license terms can't be reconciled.

Best wishes!

Error with python airbnb.py -asa "Paris"

Hello, I am very interested in your work, but I came across this error and cannot solve it:
python airbnb.py -asa "City Name"

2017-02-17 23:42:19,461 ERROR Failed to add survey for Paris
2017-02-17 23:42:19,462 ERROR Top level exception handler: quitting.
Traceback (most recent call last):
File "airbnb.py", line 2394, in main
db_add_survey(ab_config, args.addsurvey)
File "airbnb.py", line 1613, in db_add_survey
survey_id = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable

Could you help me debug this?

Thanks
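cur.fetchone() returns None when the query produced no row, so cur.fetchone()[0] fails with exactly this TypeError. The real question is why no row came back (possibly because the search area for the city had not been added yet, though that is a guess), but the indexing itself can be guarded so the failure is explicit. The sketch below uses sqlite3 as a stand-in for psycopg2; note that sqlite3 uses ? placeholders where psycopg2 uses %s. fetch_scalar and the sample search_area table are illustrative only.

```python
import sqlite3
from typing import Any, Optional, Sequence


def fetch_scalar(cur: sqlite3.Cursor, sql: str, params: Sequence[Any] = ()) -> Optional[Any]:
    """Return the first column of the first row, or None if there is no row."""
    cur.execute(sql, params)
    row = cur.fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE search_area (search_area_id INTEGER, name TEXT)")
cur.execute("INSERT INTO search_area VALUES (1, 'London')")

area_id = fetch_scalar(cur, "SELECT search_area_id FROM search_area WHERE name = ?", ("Paris",))
if area_id is None:
    print("No search area named 'Paris'; add it before creating a survey")
```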

availability

Hi,
it is not clear to me how to gather availability for a given room.
Thanks,
S.

how to collect reviews textual data?

Hi @tomslee!

Thanks for making this code available. I was looking at the insideairbnb website, where you also have reviews and calendars for each listing ID. How do you collect that data? I can't find it in this script. Thanks for your help!
florian

Script works, but returns low number of listings

First of all, thanks for the great project!

I have been able to get some data, but I'm wondering what the main issue is, as I'm only getting a few listings for Helsinki. When I run python airbnb.py -sb 1 I get the following result:

INFO Retrieved logged progress: None, None guests, price None-None
INFO quadtree node []
INFO median node []
INFO Bounding box: [60.297839, 25.254485, 59.922489, 24.782876]
INFO ======================================================================
INFO Survey 1, for helsinki
INFO Searching by bounding box, max_zoom=10
INFO ----------------------------------------------------------------------
INFO Rectangle calculated: [60.297839, 25.254485, 59.922489, 24.782876]
INFO Searching rectangle: zoom factor = 0, node = []
INFO Page 01 returned 00 listings
INFO Results: 1 pages, 0 new rooms
INFO Finishing survey 1, for helsinki

I ran this manually a few times, and it mostly returns nothing, as above, and at most around 40 results. This seems odd, as the data should contain thousands of listings. The bounding box is correct, as I am getting correct data in PostGIS, just not much of it.

Did I just read the docs badly, or is something not working on Airbnb's side?

How to scrape a city?

Hi Mr. Slee, I'm very interested in your work and Mr. Cox's!

I tried running this command:

python airbnb.py -asa "Tokyo"
and received the following:

ERROR:root:Top level exception handler: quitting.
Traceback (most recent call last):
  File "airbnb.py", line 440, in main
    ws_get_city_info(ab_config, args.addsearcharea, ab_config.FLAGS_ADD)
  File "airbnb.py", line 237, in ws_get_city_info
    cur.execute(sql_check, (citylist[0],))
psycopg2.ProgrammingError: relation "search_area" does not exist
LINE 3:                         from search_area
                   

Default Config selection

Is it possible to make the default config file selection pick up any file with a .config extension? Or do you have any other suggestion?

For the Docker dev-alpine branch, the database host, name, password, and port are prefilled in docker/configs/docker.config.example.

The USER environment variable doesn't seem to be set in Alpine Linux:

ERROR Failed to read config file properly
Traceback (most recent call last):
  File "/home/jovyan/work/collector/airbnb_config.py", line 61, in __init__
    username = os.environ['USER']
  File "/opt/conda/lib/python3.5/os.py", line 725, in __getitem__
    raise KeyError(key) from None
KeyError: 'USER'
Traceback (most recent call last):
  File "airbnb.py", line 563, in <module>
    main()
  File "airbnb.py", line 497, in main
    ab_config = ABConfig(args)
  File "/home/jovyan/work/collector/airbnb_config.py", line 61, in __init__
    username = os.environ['USER']
  File "/opt/conda/lib/python3.5/os.py", line 725, in __getitem__
    raise KeyError(key) from None
KeyError: 'USER'
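os.environ['USER'] raises KeyError whenever the variable is unset, which is common in minimal containers. Below is a minimal sketch of a more tolerant lookup based on getpass.getuser(), which checks LOGNAME, USER, LNAME and USERNAME and then, on Unix, falls back to the password database. It also sketches one possible rule for the first question: prefer <username>.config if it exists, otherwise take the alphabetically first *.config file. Neither function is part of the project, and the <username>.config naming is an assumption; default_username and pick_config are illustrative names.

```python
import getpass
from typing import Iterable, Optional


def default_username(fallback: str = "airbnb") -> str:
    """Best-effort user name that does not raise when USER is missing."""
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        # No user variables set and no passwd entry for this uid
        return fallback


def pick_config(filenames: Iterable[str], username: str) -> Optional[str]:
    """Prefer '<username>.config'; otherwise the alphabetically first '*.config'."""
    configs = sorted(name for name in filenames if name.endswith(".config"))
    preferred = f"{username}.config"
    if preferred in configs:
        return preferred
    return configs[0] if configs else None


# Usage (with os imported): pick_config(os.listdir("."), default_username())
```

A file named docker.config.example does not end in .config, so it is never picked up by accident; it still has to be copied to a .config name first.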

Error json "returning None"

Hi,
Usually the script works very well, but now I get this error on every page scraped, and 0 rooms end up in the database:

Searching 'Private room' (1 guests, prices in [60, 80]), zoom 0
2017-01-16 15:29:47,996 INFO    Page 1...
2017-01-16 15:29:49,728 ERROR   Error in __listing_from_search_page_json: returning None
2017-01-16 15:29:49,728 ERROR   Error in __listing_from_search_page_json: returning None
2017-01-16 15:29:49,729 INFO    Private room (1 guests): zoom 0: 0 new rooms, 1 pages

Do you have an idea what may cause the issue?

Thank you
Claire

Proxy locations and fetched prices

I come from insideairbnb.com where it says "Tom Slee regularly scrapes the Airbnb site .." and in your readme I see "I run the script using a number of proxy IP addresses to ...".

Assuming that insideairbnb.com data was scraped using code from your repository, what are your thoughts on airbnb.com returning different results depending on your proxy's location?

To test this idea, I've used a VPN to switch to different countries and checked prices for the same listing. Results are returned in the local currency, so if multiple proxies scrape a single city, the resulting CSV will also have prices in mixed currencies.

If you agree it's an issue, fetching the currency symbol could partially help. Example: one of the listings I tested shows a price of $55 when accessed from the USA, ¥377 from China, and $74 from Canada.
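If each row carries a currency code, as the currency column of the room table does, mixed currencies within one survey can at least be detected before prices are compared. Below is a minimal sketch over rows represented as dicts; surveys_with_mixed_currency is a hypothetical helper, the sample rows are illustrative values loosely based on the prices quoted above, and converting between currencies would additionally need exchange rates, which this does not attempt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def surveys_with_mixed_currency(rows: Iterable[Mapping[str, object]]) -> Dict[object, Set[str]]:
    """Map survey_id to its set of currencies, keeping only surveys with more than one."""
    seen: Dict[object, Set[str]] = defaultdict(set)
    for row in rows:
        seen[row["survey_id"]].add(str(row["currency"]))
    return {survey: codes for survey, codes in seen.items() if len(codes) > 1}


rows = [
    {"room_id": 1, "survey_id": 7, "price": 55, "currency": "USD"},
    {"room_id": 1, "survey_id": 7, "price": 377, "currency": "CNY"},
    {"room_id": 2, "survey_id": 8, "price": 74, "currency": "CAD"},
]
print(surveys_with_mixed_currency(rows))
```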

Error in schema_current.sql

Hi,

I'm trying to build the database using a Docker PostgreSQL container, but I'm running into schema errors with both schema.sql and schema_current.sql.
I first tried schema_current.sql, but when building the schema I get an error on CREATE TABLE public.city:

ERROR: relation "city_city_id_seq" does not exist
STATEMENT: CREATE TABLE public.city (
    city_id integer NOT NULL DEFAULT nextval('city_city_id_seq'::regclass),
    name character varying(255),
    search_area_id integer,
    CONSTRAINT city_pkey PRIMARY KEY (city_id)
) WITH (
    OIDS=FALSE
);
psql:/docker-entrypoint-initdb.d/3-schema_current.sql:10: ERROR: relation "city_city_id_seq" does not exist

I've noticed a difference between schema_current.sql and schema.sql: in the older file (schema.sql), the table was created first and then the SEQUENCE city_city_id_seq.

Can you help me?

PS: The final goal is to build a Docker service with a series of containers for data (PostgreSQL), code (Python), and frontends (to be determined...).

Thanks,
Pedro
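The error arises because the column default calls nextval('city_city_id_seq') before any CREATE SEQUENCE city_city_id_seq has run. A rough way to find every such ordering problem in a schema file is to scan it in order and record sequences that are used before they are created. This is a regex heuristic sketch, not a SQL parser: it assumes unquoted, optionally schema-qualified names and does not skip comments or string contents. sequences_used_before_created is a hypothetical helper. Typical fixes are moving the CREATE SEQUENCE statements ahead of the tables that use them, or declaring such columns as serial.

```python
import re
from typing import List, Set, Tuple

CREATE_SEQ = re.compile(r"CREATE\s+SEQUENCE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)", re.IGNORECASE)
NEXTVAL = re.compile(r"nextval\(\s*'([\w.]+)'", re.IGNORECASE)


def sequences_used_before_created(sql_text: str) -> List[str]:
    """Return sequence names referenced by nextval() before their CREATE SEQUENCE."""
    events: List[Tuple[int, str, str]] = []
    for match in CREATE_SEQ.finditer(sql_text):
        # Drop any schema prefix such as "public." so both spellings compare equal
        events.append((match.start(), "create", match.group(1).split(".")[-1]))
    for match in NEXTVAL.finditer(sql_text):
        events.append((match.start(), "use", match.group(1).split(".")[-1]))

    created: Set[str] = set()
    problems: List[str] = []
    for _, kind, name in sorted(events):  # process in file order
        if kind == "create":
            created.add(name)
        elif name not in created and name not in problems:
            problems.append(name)
    return problems
```

Passing the full text of the schema file to this function lists each sequence that needs to move earlier.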

Bathrooms and bedrooms may not be integers

In China, some hosts set this value to 0.5, which causes a type conversion error.

So I modified the code to convert to float instead of int:

            self.bathrooms = float(self.bathrooms)
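Converting with float handles fractional values such as 0.5, but a missing or empty value would still raise. A slightly more defensive sketch follows; to_float is a hypothetical helper, and returning None for unparseable input is a design choice that keeps one bad field from aborting the whole listing.

```python
from typing import Any, Optional


def to_float(value: Any) -> Optional[float]:
    """Convert value to float, returning None for None, '' or non-numeric input."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


print(to_float("0.5"), to_float(2), to_float(""), to_float(None))  # 0.5 2.0 None None
```

With this helper the assignment becomes self.bathrooms = to_float(self.bathrooms), and the same applies to bedrooms.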

Not sure the retrieval process is running correctly

Hello Tom! I am getting this problem when searching by bounding box: "Warning HTTP Status 400 from web site: IP address blocked. Waiting 1.0 minutes..." It seems my university IP is blocked. Do you have any recommendation for getting around this issue? When searching by zipcode or neighborhood, the process finishes but no data ends up in the DB. Thanks in advance! I am looking for data within the Lisbon boundaries.

[solved] Error with latest commit and sb option

Hi there,

I updated to the latest commit and now get this error. (I'm not sure which commit I was using before that; it was at least a month old.)

INFO    Found 18 rooms
ERROR   Exception in get_search_page_info_rectangle
Traceback (most recent call last):
  File "./airbnb.py", line 2059, in ws_search_rectangle
    listing.property_type = json_listing["property_type"]
KeyError: 'property_type'
ERROR   Error
Traceback (most recent call last):
  File "./airbnb.py", line 1186, in __search_loop_bounding_box
    rectangle_zoom, flag)
  File "./airbnb.py", line 1271, in __search_rectangle
    rectangle, rectangle_zoom, flag)
  File "./airbnb.py", line 2059, in ws_search_rectangle
    listing.property_type = json_listing["property_type"]
KeyError: 'property_type'
INFO    Searching by bounding box - logged

Any idea what may have caused this issue?

Thank you
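The KeyError means the search-result JSON no longer contained a property_type key for at least one listing, which points to a change in Airbnb's response format. Reading optional fields with dict.get lets the survey continue with None in that column instead of aborting the whole rectangle. This is a minimal sketch: json_listing mirrors the name in the traceback, but the Listing class and the "id" key are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Listing:  # hypothetical stand-in for the script's listing object
    room_id: Optional[int] = None
    property_type: Optional[str] = None


def listing_from_json(json_listing: Mapping[str, Any]) -> Listing:
    """Build a Listing, tolerating keys that the site may have dropped or renamed."""
    listing = Listing()
    listing.room_id = json_listing.get("id")
    # .get returns None instead of raising KeyError when the key is absent
    listing.property_type = json_listing.get("property_type")
    return listing


print(listing_from_json({"id": 123}))  # Listing(room_id=123, property_type=None)
```

Silently storing None can hide a format change, so logging a warning the first time a key is missing is a reasonable addition.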

Can I search through all zipcodes or bounding boxes in the U.S.?

Thanks for the amazing project!! Is there a way for me to search over all zip codes in the U.S.? Or maybe divide the U.S. into several bounding boxes and search over all bounding boxes? It seems that your code is based on cities (regardless of whether the search is being done through bounding boxes, neighborhoods or zipcodes). Thank you very much.
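Dividing a large area into a grid of smaller bounding boxes is simple to compute, and each box could then be stored as its own search area. Below is a minimal sketch, assuming boxes are (north, east, south, west) in decimal degrees, the same order as the logged "Bounding box:" values, and ignoring boxes that cross the antimeridian. grid_boxes is a hypothetical helper, and the extent in the usage line is only a rough outline of the contiguous U.S.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (north, east, south, west)


def grid_boxes(north: float, east: float, south: float, west: float,
               rows: int, cols: int) -> List[Box]:
    """Split a box into rows x cols equal tiles, ordered north-to-south, west-to-east."""
    if north <= south or east <= west:
        raise ValueError("expected north > south and east > west")
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols
    boxes: List[Box] = []
    for r in range(rows):
        tile_north = north - r * lat_step
        for c in range(cols):
            tile_west = west + c * lng_step
            boxes.append((tile_north, tile_west + lng_step, tile_north - lat_step, tile_west))
    return boxes


# Rough extent of the contiguous U.S. (approximate)
tiles = grid_boxes(49.4, -66.9, 24.5, -124.8, rows=5, cols=10)
print(len(tiles))  # 50
```

Each tuple maps onto the bb_n_lat, bb_e_lng, bb_s_lat and bb_w_lng columns used in the search_area update shown elsewhere on this page.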

How to force to make zoom?

I know for sure that in Samara (Russia) there are more than 250 apartments, but the parser cannot find them all. It also stops at zoom 0, even though max zoom is 4. How can I force the parser to use all zoom levels?
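The log lines ("zoom factor", "node", "max_zoom") suggest the bounding-box search is a quadtree: a rectangle can be subdivided into four children, down to at most max_zoom levels. Under that reading, max_zoom is an upper bound rather than a target, and stopping at zoom 0 would mean the first rectangle was never subdivided; what triggers a split is decided by the script itself and is not shown here. The sketch below only illustrates the subdivision step, assuming (north, east, south, west) rectangles and a split at the midpoint; the script may split elsewhere, for example at a median point. split_rectangle and rectangles_at_zoom are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (north, east, south, west)


def split_rectangle(rect: Rect) -> List[Rect]:
    """Return the four quadrants of rect (NW, NE, SW, SE), split at its midpoint."""
    north, east, south, west = rect
    mid_lat = (north + south) / 2
    mid_lng = (east + west) / 2
    return [
        (north, mid_lng, mid_lat, west),  # north-west
        (north, east, mid_lat, mid_lng),  # north-east
        (mid_lat, mid_lng, south, west),  # south-west
        (mid_lat, east, south, mid_lng),  # south-east
    ]


def rectangles_at_zoom(rect: Rect, zoom: int) -> List[Rect]:
    """All rectangles a full subdivision would reach at this zoom (4 ** zoom of them)."""
    level = [rect]
    for _ in range(zoom):
        level = [child for parent in level for child in split_rectangle(parent)]
    return level
```

At max_zoom = 4, a full subdivision would mean 4 ** 4 = 256 rectangles at the deepest level, each needing its own search requests, which is one reason to subdivide only where a search appears to be truncated.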

Deleting values from postgres

I am collecting "Samara" (Russia) and watching how many rows are in the "room" table.
The problem is the following: at some point the number of rows is about 500, but when the script finishes there are only 252 rows. The same thing happened with Saint-Petersburg, which went from 4560 to 4461, but that was not so critical.

IP address blocked when trying to run a search by bounding box.

Hello,

I'm trying to use the crawler to run a bounding box search in Florianopolis, SC, Brazil, but I get the output below when I run the search. Do you have any idea what it could be and how to solve it?

2018-09-21 16:30:43,295 INFO Rectangle calculated: [-27.39, -48.36, -27.83, -48.56]
2018-09-21 16:30:43,295 INFO Searching rectangle: zoom factor = 0, node = []
2018-09-21 16:30:43,952 WARNING HTTP status 400 from web site: IP address blocked. Waiting 1.0 minutes.

I believe I added all the configuration necessary to run it.

Thank you. :)

Conflicts between airbnb-data-collection and prompt-toolkit

Hi, users are unable to run airbnb-data-collection due to a dependency conflict with the prompt-toolkit package.
As shown in the following full dependency graph of airbnb-data-collection, airbnb-data-collection requires prompt-toolkit==1.0.9, while pgcli==0.20.1 requires prompt-toolkit==0.46.
According to pip’s “first found wins” installation strategy, prompt-toolkit==1.0.9 is the version actually installed. However, prompt-toolkit==1.0.9 does not satisfy prompt-toolkit==0.46.

Dependency tree:

airbnb-data-collection-master<version range:>
| +-alabaster<version range:==0.7.6>
| +-anaconda-client<version range:==1.5.4>
| +-apscheduler<version range:==3.0.5>
| +-astroid<version range:==1.3.4>
| +-babel<version range:==2.1.1>
| +-backports-abc<version range:==0.4>
| +-beautifulsoup4<version range:==4.6.0>
| +-boto<version range:==2.45.0>
| +-boto3<version range:==1.4.3>
| +-botocore<version range:==1.4.90>
| +-certifi<version range:==2015.9.6.2>
| +-click<version range:==6.2>
| +-clyent<version range:==1.2.2>
| +-colorama<version range:==0.3.7>
| +-configobj<version range:==5.0.6>
| | +-six<version range:>
| +-decorator<version range:==4.0.10>
| +-docutils<version range:==0.13.1>
| +-folium<version range:==0.2.1>
| | +-jinja2<version range:>
| +-greenlet<version range:==0.4.9>
| +-ipykernel<version range:==4.5.0>
| +-ipython<version range:==5.1.0>
| +-ipython-genutils<version range:==0.1.0>
| +-ipywidgets<version range:==5.2.2>
| +-jedi<version range:==0.9.0>
| +-jinja2<version range:==2.8>
| +-jmespath<version range:==0.9.0>
| +-jsonschema<version range:==2.5.1>
| +-jupyter<version range:==1.0.0>
| +-jupyter-client<version range:==4.4.0>
| +-jupyter-console<version range:==4.0.3>
| +-jupyter-core<version range:==4.2.0>
| +-logilab-common<version range:==0.63.2>
| +-lxml<version range:==3.4.4>
| +-markupsafe<version range:==0.23>
| +-matplotlib<version range:==1.4.3>
| +-mistune<version range:==0.7.3>
| +-nb-anacondacloud<version range:==1.2.0>
| +-nb-conda<version range:==2.0.0>
| +-nb-conda-kernels<version range:==2.0.0>
| +-nbconvert<version range:==4.2.0>
| +-nbformat<version range:==4.1.0>
| +-nbpresent<version range:==3.0.2>
| +-notebook<version range:==4.2.3>
| +-numpy<version range:==1.10.1>
| +-pandas<version range:==0.17.1>
| +-path.py<version range:==0.0.0>
| +-pep8<version range:==1.6.2>
| +-pgcli<version range:==0.20.1>
| | +-click<version range:>=4.1>
| | +-configobj<version range:>=5.0.6>
| | | +-six<version range:>
| | +-pgspecial<version range:>=1.1.0>
| | | +-click<version range:>=4.1>
| | | +-sqlparse<version range:>=0.1.19>
| | +-prompt-toolkit<version range:==0.46>
| | +-psycopg2<version range:>=2.5.4>
| | +-pygments<version range:>=2.0>
| | +-sqlparse<version range:==0.1.16>
| +-pgspecial<version range:==1.2.0>
| | +-click<version range:>=4.1>
| +-pickleshare<version range:==0.7.4>
| +-pillow<version range:==3.0.0>
| +-prompt-toolkit<version range:==1.0.9>
| +-psutil<version range:==3.3.0>
| +-psycopg2<version range:==2.6.1>
| +-pyflakes<version range:==1.0.0>
| +-pygments<version range:==2.1.3>
| +-pylint<version range:==1.4.2>
| +-pyparsing<version range:==2.0.3>
| +-pyreadline<version range:==2.1>
| +-python-dateutil<version range:==2.6.0>
| +-pytz<version range:==2016.7>
| +-pyyaml<version range:==3.12>
| +-pyzmq<version range:==16.0.1>
| +-qtconsole<version range:==4.1.1>
| +-requests<version range:==2.11.1>
| +-rise<version range:==4.0.0b1>
| +-rope-py3k<version range:==0.9.4.post1>
| +-s3transfer<version range:==0.1.10>
| +-scipy<version range:==0.16.0>
| +-seaborn<version range:==0.6.0>
| +-simplegeneric<version range:==0.8.1>
| +-six<version range:==1.10.0>
| +-snowballstemmer<version range:==1.2.0>
| +-sphinx<version range:==1.3.1>
| +-sphinx-rtd-theme<version range:==0.1.7>
| +-spyder<version range:==2.3.8>
| +-sqlanydb<version range:==1.0.8>
| +-sqlparse<version range:==0.1.16>
| +-tabulate<version range:==0.7.5>
| +-tinys3<version range:==0.1.12>
| | +-requests<version range:>=1.2.0>
| +-tornado<version range:==4.4.2>
| +-traitlets<version range:==4.3.1>
| +-tzlocal<version range:==1.2.2>
| | +-pytz<version range:>
| +-wcwidth<version range:==0.1.7>
| +-widgetsnbextension<version range:==1.2.6>
| +-win-unicode-console<version range:==0.5>
| +-xlsxwriter<version range:==0.7.6>

Thanks for your help.
Best,
Neolith
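The conflict reported here is two exact pins (==1.0.9 at the top level and ==0.46 under pgcli) for the same package. Below is a minimal sketch that finds such cases in a list of (required_by, package, specifier) triples taken from the tree above. It only compares == pins and does not evaluate ranges such as >=4.1, for which the packaging library's specifier classes are the right tool. conflicting_pins is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pin = Tuple[str, str, str]  # (required_by, package, version specifier)


def conflicting_pins(pins: Iterable[Pin]) -> Dict[str, Set[str]]:
    """Return packages pinned with '==' to more than one distinct version."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for _required_by, package, spec in pins:
        if spec.startswith("=="):
            versions[package.lower()].add(spec[2:])
    return {pkg: vers for pkg, vers in versions.items() if len(vers) > 1}


pins: List[Pin] = [
    ("airbnb-data-collection", "prompt-toolkit", "==1.0.9"),
    ("pgcli", "prompt-toolkit", "==0.46"),
    ("airbnb-data-collection", "sqlparse", "==0.1.16"),
    ("pgcli", "sqlparse", "==0.1.16"),
    ("pgspecial", "sqlparse", ">=0.1.19"),
]
print(conflicting_pins(pins))
```

The same tree also pairs sqlparse==0.1.16 with pgspecial's sqlparse>=0.1.19, a second conflict that this ==-only check does not report.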

Thank you for this very good work but I have problems with bounding box method

Congratulations on this great work, which Airbnb's random changes regularly undermine.

After doing this:
python airbnb.py -asa "Bordeaux"
python airbnb.py -asv "Bordeaux"
update search_area set bb_n_lat = 44.92, bb_s_lat = 44.81, bb_e_lng = -0.53, bb_w_lng = -0.64 where name = 'Bordeaux';
python airbnb.py -sb 1

I get this:
INFO Bounding box: [44.92, -0.53, 44.81, -0.64]
INFO ===========================================================
INFO Survey 1, for Bordeaux
INFO Searching by bounding box, max_zoom=6
INFO ----------------------------------------------------------------------
INFO Rectangle calculated: [44.92, -0.53, 44.81, -0.64]
INFO Searching rectangle: zoom factor = 0, node = []
INFO Page 01 returned 06 listings
INFO Results: 1 pages, 6 new rooms
INFO Finishing survey 1, for Bordeaux

and 6 records in the room table:
room_id;host_id;room_type;country;city;neighborhood;address;reviews;overall_satisfaction;accommodates;bedrooms;bathrooms;price;deleted;minstay;last_modified;latitude;longitude;survey_id;location;coworker_hosted;extra_host_languages;name;property_type;currency;rate_type
1582859;8426743;"Entire home/apt";"";"";"";"";236;5;4;0.00;1.00;67;0;;"2018-04-24 16:06:20.450171";44.458536;-68.483788;1;"0101000020E6100000A08CF161F61E51C0F304C24EB13A4640";;"";"Coastal Maine Cottage";"";"EUR";"nightly"
10201545;9991820;"Entire home/apt";"";"";"";"";148;5;4;1.00;1.00;50;0;;"2018-04-24 16:06:23.570191";48.190689;16.267038;1;"0101000020E61000000CCA349A5C4430407D5A457F68184840";;"";"Sunny apartment near metro station.";"";"EUR";"nightly"
3993887;20703644;"Entire home/apt";"";"";"";"";142;5;2;0.00;1.00;34;0;;"2018-04-24 16:06:23.570191";4.488137;-75.697931;1;"0101000020E610000055F7C8E6AAEC52C0C6DE8B2FDAF31140";;"";"Romantic Cabana with view";"";"EUR";"nightly"
302695;1530306;"Entire home/apt";"";"";"";"";183;5;4;1.00;1.00;124;0;;"2018-04-24 16:06:23.570191";46.043525;9.252129;1;"0101000020E610000012BF620D178122407AC7293A92054740";;"";"Romantic, Lakeside Home with Views of Lake Como";"";"EUR";"nightly"
5116533;26439805;"Entire home/apt";"";"";"";"";282;5;4;0.00;1.00;57;0;;"2018-04-24 16:06:23.570191";31.250417;121.484245;1;"0101000020E61000001990BDDEFD5E5E40C85C19541B403F40";;"";"#2 SHANGHIGH HOME";"";"EUR";"nightly"
1016153;3937638;"Entire home/apt";"";"";"";"";119;5;2;1.00;1.00;69;0;;"2018-04-24 16:06:23.570191";-8.498757;114.965854;1;"0101000020E61000007DAD4B8DD0BD5C40594DD7135DFF20C0";;"";"BALIAN TREEHOUSE w beautiful pool";"";"EUR";"nightly"

As you can see, there are several problems:

  • address is empty,
  • latitude and longitude are not in the required rectangle,
  • there are only 6 listings.

I know that changes to the Airbnb site are already causing you a lot of problems, but I would be very grateful if you could spare some time to help solve these problems.
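The second problem can be checked mechanically. With the bounding box stored as (north, east, south, west), as in the "Bounding box:" log line, every returned listing should satisfy south ≤ latitude ≤ north and west ≤ longitude ≤ east. Below is a minimal sketch that flags room IDs falling outside; rooms_outside_box is hypothetical, it ignores boxes that cross the antimeridian, and the coordinates are taken from the rows shown above.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (north, east, south, west)
Room = Tuple[int, float, float]          # (room_id, latitude, longitude)


def rooms_outside_box(rooms: Iterable[Room], box: Box) -> List[int]:
    """Return IDs of rooms whose coordinates lie outside the bounding box."""
    north, east, south, west = box
    return [
        room_id
        for room_id, lat, lng in rooms
        if not (south <= lat <= north and west <= lng <= east)
    ]


bordeaux = (44.92, -0.53, 44.81, -0.64)
rooms = [
    (1582859, 44.458536, -68.483788),
    (10201545, 48.190689, 16.267038),
    (5116533, 31.250417, 121.484245),
]
print(rooms_outside_box(rooms, bordeaux))  # [1582859, 10201545, 5116533]
```

All six rows in the output above fail this check, which suggests the search results were not restricted to the rectangle at all, rather than just drifting slightly past its edges.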

exception missing

The function add_survey_log_bb_table in schema_update.py has a try block without a matching except clause. This leads to an "unexpected unindent" error.
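A try block must be followed by at least one except clause or a finally clause. Without either, Python rejects the whole module at compile time; depending on the version, this is reported as an IndentationError (such as "unexpected unindent") or as a SyntaxError about the missing block, and IndentationError is a subclass of SyntaxError. The sketch below reproduces both shapes with compile(); the function bodies are illustrative and are not the actual contents of add_survey_log_bb_table.

```python
BROKEN = (
    "def add_table(cur):\n"
    "    try:\n"
    "        cur.execute('CREATE TABLE t (id integer)')\n"
    "    return True\n"  # no except/finally after the try block
)

FIXED = (
    "def add_table(cur):\n"
    "    try:\n"
    "        cur.execute('CREATE TABLE t (id integer)')\n"
    "    except Exception as err:\n"
    "        print('Could not create table:', err)\n"
    "        return False\n"
    "    return True\n"
)


def compiles(source: str) -> bool:
    """True if source is syntactically valid Python (IndentationError is a SyntaxError)."""
    try:
        compile(source, "<schema_update sketch>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles(BROKEN), compiles(FIXED))  # False True
```

Running python -m py_compile schema_update.py reports this kind of error without executing any of the update steps.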

Error: column "coworker_hosted" of relation "room" does not exist

First-time user here. I just downloaded the latest commit and used the -sb method, and now I'm getting this error:

Searching 'Private room' (1 guests, prices in [0, 40]), zoom 0
Page 1...
ERROR Exception in get_search_page_info_rectangle
Traceback (most recent call last):
File "airbnb.py", line 206, in save
self.__insert()
File "airbnb.py", line 362, in __insert
cur.execute(sql, insert_args)
psycopg2.ProgrammingError: column "coworker_hosted" of relation "room" does not exist
LINE 7: coworker_hosted, extra_host_languages, n...
^

All help appreciated, thank you kindly,

Peter

Running by bounding box

Hello, I'm trying to run a bounding-box search, following the latest recommendation, but it doesn't seem to work. I've tried 3 cities ('SAO PAULO', 'LISBON', 'MIAMI'), manually adding the city name and the bounding-box info to search_area. Then, when I run the script, it loops seemingly forever trying to connect to the Airbnb server.

Thanks

Airbnb API key

Hi,

Is there any way I can get an API key from Airbnb? The website says they are not accepting any requests at the moment. :(

Thanks.
Josh

Error python airbnb.py -dbp -c root.config

Hello, how can I solve this problem? I have already set up the config file for PostgreSQL.

root@ubuntu-srv1:/var/lib/tomcat7/airbnb-data-collection-master# python airbnb.py -dbp -c root.config
No handlers could be found for logger "root"
Traceback (most recent call last):
File "airbnb.py", line 479, in <module>
main()
File "airbnb.py", line 417, in main
ab_config = ABConfig(args)
File "/var/lib/tomcat7/airbnb-data-collection-master/airbnb_config.py", line 69, in __init__
logger.warning("No proxy_list in" + config_file + ": not using proxies")
NameError: global name 'config_file' is not defined
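The NameError shows that the warning message refers to a config_file name that is not defined in the scope where it runs. The "global name" wording and the "No handlers could be found" line are Python 2 messages, which suggests the script was started with Python 2. The general fix is to resolve the config file name once, early, and pass it (or store it on the object) wherever it is used later. Below is a minimal sketch with configparser; the NETWORK section name and the comma-separated proxy_list format are assumptions based only on the warning text, not the project's documented layout, and load_proxy_list is hypothetical.

```python
import configparser
import logging
from typing import List

logger = logging.getLogger(__name__)


def load_proxy_list(config: configparser.ConfigParser, config_file: str) -> List[str]:
    """Return proxies from the config; config_file is a parameter, so it is always defined."""
    # fallback="" also covers a missing section, so no NoSectionError is raised
    raw = config.get("NETWORK", "proxy_list", fallback="").strip()
    if not raw:
        logger.warning("No proxy_list in %s: not using proxies", config_file)
        return []
    return [proxy.strip() for proxy in raw.split(",") if proxy.strip()]
```

Passing the file name explicitly also makes the warning name the file that was actually read, which helps when several .config files are present.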

IndexError with bounding box search

Hi,

Thanks a lot for the latest fixes. The bounding box search now seems to work pretty well. The only thing is that when running the survey, I get the following error after a while:

2018-05-01 15:42:24,933 INFO Retrieved logged progress: quadtree node [[0, 0]]
2018-05-01 15:42:24,933 INFO median node [[60.17343, 24.94219]]
2018-05-01 15:42:24,933 INFO Bounding box: [60.297839, 25.254485, 59.922489, 24.782876]
2018-05-01 15:42:24,933 INFO ======================================================================
2018-05-01 15:42:24,933 INFO Survey 1, for helsinki
2018-05-01 15:42:24,933 INFO Searching by bounding box, max_zoom=8
2018-05-01 15:42:24,933 INFO ----------------------------------------------------------------------
2018-05-01 15:42:24,933 INFO Rectangle calculated: [60.29784, 25.25448, 60.11016, 25.01868]
2018-05-01 15:42:24,933 INFO Searching rectangle: zoom factor = 1, node = [[0, 0]]
2018-05-01 15:42:29,585 INFO Page 01 returned 18 listings
2018-05-01 15:42:34,308 INFO Page 02 returned 18 listings
2018-05-01 15:42:38,998 INFO Page 03 returned 18 listings
2018-05-01 15:42:41,766 INFO Page 04 returned 18 listings
2018-05-01 15:42:47,011 INFO Page 05 returned 18 listings
2018-05-01 15:42:53,118 INFO Page 06 returned 18 listings
2018-05-01 15:42:57,597 INFO Page 07 returned 18 listings
2018-05-01 15:43:01,262 INFO Page 08 returned 18 listings
2018-05-01 15:43:05,579 INFO Page 09 returned 18 listings
2018-05-01 15:43:07,565 INFO Page 10 returned 18 listings
2018-05-01 15:43:07,565 INFO Results: 10 pages, 0 new rooms
2018-05-01 15:43:07,580 ERROR Error in recurse_quadtree
Traceback (most recent call last):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 421, in recurse_quadtree
    if self.subtree_previously_completed(quadtree_node):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 787, in subtree_previously_completed
    for j in range(0, 2)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 788, in <genexpr>
    for i in range(0, len(quadtree_node)))
IndexError: list index out of range
2018-05-01 15:43:07,580 ERROR Error in recurse_quadtree
Traceback (most recent call last):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 454, in recurse_quadtree
    self.recurse_quadtree(quadtree_node, median_node, flag)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 421, in recurse_quadtree
    if self.subtree_previously_completed(quadtree_node):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 787, in subtree_previously_completed
    for j in range(0, 2)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 788, in <genexpr>
    for i in range(0, len(quadtree_node)))
IndexError: list index out of range
2018-05-01 15:43:07,580 ERROR Error in recurse_quadtree
Traceback (most recent call last):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 454, in recurse_quadtree
    self.recurse_quadtree(quadtree_node, median_node, flag)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 454, in recurse_quadtree
    self.recurse_quadtree(quadtree_node, median_node, flag)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 421, in recurse_quadtree
    if self.subtree_previously_completed(quadtree_node):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 787, in subtree_previously_completed
    for j in range(0, 2)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 788, in <genexpr>
    for i in range(0, len(quadtree_node)))
IndexError: list index out of range
2018-05-01 15:43:07,580 ERROR Error
Traceback (most recent call last):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 395, in search
    self.recurse_quadtree(quadtree_node, median_node, flag)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 454, in recurse_quadtree
    self.recurse_quadtree(quadtree_node, median_node, flag)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 454, in recurse_quadtree
    self.recurse_quadtree(quadtree_node, median_node, flag)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 421, in recurse_quadtree
    if self.subtree_previously_completed(quadtree_node):
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 787, in subtree_previously_completed
    for j in range(0, 2)
  File ",python\airbnb-data-collection-master\airbnb_survey.py", line 788, in <genexpr>
    for i in range(0, len(quadtree_node)))
IndexError: list index out of range

So this time it is probably just a bug in the code, not something caused by changes to the Airbnb site? I will also look into whether I can fix it myself.
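This is a minimal sketch of what may be going wrong. It is not the project's actual code. It assumes that `subtree_previously_completed` flattens both the current node and the logged-progress node with the `for j in range(0, 2) for i in range(0, len(quadtree_node))` generator shown in the traceback.

The log shows that the logged progress is the single-level node `[[0, 0]]`. Once the search recurses to a deeper node, indexing the shorter logged node with the deeper node's length would raise `IndexError`. The hypothetical helpers below reproduce that failure. They then show one guarded alternative that compares only the levels both nodes share.

```python
from typing import List

Node = List[List[int]]  # a quadtree path such as [[0, 0], [1, 0]], as in the log


def flatten_node(quadtree_node: Node, depth: int) -> str:
    """Hypothetical: mirrors the traceback's generator.
    Raises IndexError if depth > len(quadtree_node)."""
    return "".join(str(quadtree_node[i][j])
                   for j in range(0, 2)
                   for i in range(0, depth))


def subtree_previously_completed(quadtree_node: Node, logged_node: Node) -> bool:
    """Hypothetical guarded check: a subtree counts as completed if its path
    sorts strictly before the logged path over the levels both paths have."""
    depth = min(len(quadtree_node), len(logged_node))
    if depth == 0:
        return False
    # Python compares nested lists lexicographically, so no explicit
    # indexing (and no IndexError) is needed here.
    return quadtree_node[:depth] < logged_node[:depth]
```

Under this sketch, a deeper node whose ancestors match the logged node is treated as *not* completed, so it is searched again. That choice errs on the side of duplicate requests rather than skipped areas.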

WARNING No response received

Hi,

I've enjoyed working with your project, but when I search for Lisbon I get the following messages at the end:

INFO:root:No progress logged for survey 2
INFO No progress logged for survey 2
INFO:root:Bounding box: [38.795854, -9.090571, 38.691399, -9.229836]
INFO Bounding box: [38.795854, -9.090571, 38.691399, -9.229836]
INFO:root:======================================================================
INFO ======================================================================
INFO:root:Survey 2, for Lisbon--Portugal
INFO Survey 2, for Lisbon--Portugal
INFO:root:Searching by bounding box, max_zoom=6
INFO Searching by bounding box, max_zoom=6
INFO:root:----------------------------------------------------------------------
INFO ----------------------------------------------------------------------
INFO:root:Searching rectangle: Private room, guests = 1, prices in [0, 40], zoom factor = 0
INFO Searching rectangle: Private room, guests = 1, prices in [0, 40], zoom factor = 0
WARNING:root:No response received from request despite multiple attempts: {'sw_lng': '-9.229836', 'ne_lat': '38.795854', 'source': 'filter', 'ne_lng': '-9.090571', 'room_types[]': 'Private room', 'price_min': '0', 'search_by_map': 'True', 'sw_lat': '38.691399', 'price_max': '40', 'page': '1', 'guests': '1'}
WARNING No response received from request despite multiple attempts: {'sw_lng': '-9.229836', 'ne_lat': '38.795854', 'source': 'filter', 'ne_lng': '-9.090571', 'room_types[]': 'Private room', 'price_min': '0', 'search_by_map': 'True', 'sw_lat': '38.691399', 'price_max': '40', 'page': '1', 'guests': '1'}

Is this an issue related to the user agent, or is something wrong on my side?

Thank you,
Pietro
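The warning above shows how the logged bounding box `[38.795854, -9.090571, 38.691399, -9.229836]` maps onto the request parameters `ne_lat`, `ne_lng`, `sw_lat` and `sw_lng`.

The sketch below rebuilds that parameter set so the same request can be tried outside the survey with an explicit `User-Agent` header. The helper names, the example user-agent string and the base URL are hypothetical placeholders, not the project's code or a documented Airbnb endpoint. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def bounding_box_params(bbox, room_type, price_min, price_max, guests, page=1):
    """Hypothetical: map a logged bounding box [ne_lat, ne_lng, sw_lat, sw_lng]
    to the query parameters shown in the warning above."""
    ne_lat, ne_lng, sw_lat, sw_lng = bbox
    return {
        "ne_lat": str(ne_lat),
        "ne_lng": str(ne_lng),
        "sw_lat": str(sw_lat),
        "sw_lng": str(sw_lng),
        "room_types[]": room_type,
        "price_min": str(price_min),
        "price_max": str(price_max),
        "guests": str(guests),
        "page": str(page),
        "search_by_map": "True",
        "source": "filter",
    }


def build_search_request(base_url, params, user_agent):
    """Build (but do not send) a GET request with an explicit User-Agent."""
    return Request(base_url + "?" + urlencode(params),
                   headers={"User-Agent": user_agent})
```

To test the user-agent theory, the request could be sent with `urllib.request.urlopen(req, timeout=30)` while checking the HTTP status. A 403 (Forbidden) or 429 (Too Many Requests) would suggest the request is being refused, rather than that the parameters are wrong.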

Bounding box survey broken?

For the last few months, I've successfully used the same process each month to run a survey over the same bounding box.

However, when I attempt the same survey this month, the process finishes after just two pages (see the log screenshot below).

Has there been a change to the Airbnb interface that might have broken this?

[Image: log screenshot]

Debugging/experimenting on individual property

I've had good success using this library so far, simply running a survey over a bounding box.

I wanted to dive into the code to see whether I could understand it and potentially extract different information from listings. However, I'm having trouble running the process on a single property so that I can experiment.

Below is the code I am attempting to run. It returns a "Room 834190: found" message but fails to extract any information (e.g., the price printed below is None). Using debug print statements, I can also see that the website returns an HTML response. But when I search that HTML manually, I can't find the price (e.g., using CTRL-F for '140' in the case of the property below).

I'm sure I am misunderstanding something very simple! If anyone could provide any help, that would be fantastic.

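from airbnb_listing import ABListing
from airbnb_config import ABConfig

config = ABConfig()
# Create a listing object for a single room, not attached to any survey
x = ABListing(config=config, room_id=834190, survey_id=None)
# Fetch the room page from the web site (flag taken from ABConfig)
y = x.get_room_info_from_web_site(config.FLAGS_PRINT)
print(x.price)  # prints None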

same page of listings returned

Hello Tom, thanks for your great work.

I was wondering whether Airbnb changed something recently (in the past week or so). Your code used to work perfectly, but now it seems to return the same 18 rooms for pages 01, 02, etc. in a given geographic area. Did an address in the API change? I'm not sure. Thanks in advance.
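One way to confirm this symptom from the survey side is to compare the room IDs returned by consecutive pages. The sketch below is hypothetical: `first_repeated_page` is not part of the project, and it assumes each page has already been reduced to a list of room IDs.

It reports the first page that exactly repeats the previous one. A survey could treat that as a signal to stop paging instead of requesting pages that add no new rooms.

```python
from typing import Iterable, Optional


def first_repeated_page(pages: Iterable[Iterable[int]]) -> Optional[int]:
    """Return the 1-based number of the first page whose room IDs exactly
    match the previous page's (order ignored), or None if no page repeats."""
    previous = None
    for number, room_ids in enumerate(pages, start=1):
        current = frozenset(room_ids)
        # Empty pages are skipped so an exhausted result set is not reported
        # as a repeat.
        if previous is not None and current and current == previous:
            return number
        previous = current
    return None
```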
