instapy / instagram-profilecrawl

📝 Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile.

License: MIT License

Python 95.69% Shell 2.60% PowerShell 1.71%

instagram crawler python instapy selenium simple information python-script automation

instagram-profilecrawl's Introduction

InstaPy

Tooling that automates your social media interactions to “farm” Likes, Comments, and Followers on Instagram. Implemented in Python using the Selenium module.

Newsletter: Sign Up for the Newsletter here!
Guide to Bot Creation: Learn to Build your own Bots

Find the full documentation in Docs


Credits

Community

An active and supportive community is what every open-source project needs to sustain itself. Together we reached every continent and most of the countries in the world!
Thank you all for being part of the InstaPy community ✌️

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

Disclaimer: Please note that this is a research project. I am by no means responsible for any usage of this tool. Use it at your own risk. I'm also not responsible if your accounts get banned due to extensive use of this tool.

instagram-profilecrawl's People

Contributors

Stargazers

Watchers

Forkers

lkfgroup tojen haykinz timetraveler90 boomshanker victorsc cheepo2109 nanospeck willgatto cfirmo33 alonecuzzo vanova danyalsh kurozone tonoli mushoffa omarrr newyorkdev jayphen javiplav pesouza bernadsatriani kaidmml shantanuj danieleidan knightth0r phazz hanjinda userfine digimix christinecardoso joozz mght sangkwun mrscrog rittamdebnath mayuragg estebancortero mohamedelephant nyon-one giovannipaganini95 david-miron tuanlha sabaoon96 ezrawilliam gabrielmacedoo alantanlc ericpinedo sonitywolf davasu andrewjburnett justdvl mookimooki diegocaldeira iamarif pysky m4z3n alexrossello amahajavon andreyhartung timmoh tranvansang brianzhou13 thomasviennet-zz gregblt rahul15495 khanof89 simasima121 oomree77 lbenassi nicolomantini kyuhwas albererre imansh77 pttnx vuongeric astu9880 kusw3 anubhav-jangra rtpharry acubaniti 0rc0 indexnotfound404 deenadayalans excelenz valentin0h maminh piperer astrobenhart adityabohra007 noke8868 alexperegrina mark1002 flerov yujuanjiang artcmd bhargav4892 anki92 romanwixinger mukira

instagram-profilecrawl's Issues

Unable to locate element: class name "_mesn5"

Trying to scrape a public Instagram profile, I got this error:


Extracting information from ???
Traceback (most recent call last):
  File "crawl_profile.py", line 28, in <module>
    information = extract_information(browser, username)
  File "C:\Users\Stefan\git-repos\instagram-profilecrawl\util\extractor.py", line 89, in extract_information
    = get_user_info(browser)
  File "C:\Users\Stefan\git-repos\instagram-profilecrawl\util\extractor.py", line 11, in get_user_info
    container = browser.find_element_by_class_name('_mesn5')
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 485, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 855, in find_element
    'value': value})['value']
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 308, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"_mesn5"}
  (Session info: headless chrome=63.0.3239.132)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)

Am I missing something? Why is it using headless Chrome? I actually appreciate headless mode, but I read in another issue that it apparently isn't supported yet.
Or is this an issue with Python 3.6?

IndexError: list index out of range - Video post

Hi @timgrossmann

Thanks for your amazing work!!!
For some reason, your app stops working when there are video posts.
The error is at img = imgs[1].get_attribute('src'):
IndexError: list index out of range

Do you have any idea about this issue?
Regards

Laurent
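The failing line assumes every post has at least two `img` elements, which video posts don't. A minimal defensive sketch, assuming `imgs` and `videos` are lists of Selenium elements (anything with `get_attribute`); the function name `pick_media_src` and the fallback to a `video` element's `src` are illustrative, not the project's actual code:

```python
def pick_media_src(imgs, videos=()):
    """Return the post's media URL, or None if nothing usable was found.

    `imgs` and `videos` are lists of elements exposing get_attribute(),
    e.g. the results of Selenium's find_elements_by_tag_name(...).
    """
    # Photo posts: the original code reads the second <img> (index 1).
    if len(imgs) >= 2:
        return imgs[1].get_attribute('src')
    # Video posts have no second <img>; fall back to the first <video>.
    if videos:
        return videos[0].get_attribute('src')
    # Neither found: let the caller decide how to record a missing image.
    return None
```

Returning `None` instead of raising lets the crawler keep going and store an empty image field for that post.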

Caption and location information

I have modified my local repo to extract caption, location name, and location url for each post. I would love to contribute if this can be considered as an enhancement.

extractor.py

Please help!

  File "crawl_profile.py", line 27, in <module>
    information = extract_information(browser, username)
  File "/home/kurozone/instagram-profilecrawl/util/extractor.py", line 127, in extract_information
    img, tags, likes, comments = extract_post_info(browser)
  File "/home/kurozone/instagram-profilecrawl/util/extractor.py", line 80, in extract_post_info
    return img, tags, int(likes), int(len(comments) - 1)
TypeError: object of type 'int' has no len()

Error code 127: Selenium and chromedriver issues on Ubuntu 17.04

Hi, this is what I'm getting (testing on both my DigitalOcean droplet and my local Ubuntu desktop); both show the same message:

root@ubuntu:/home/aria/instagram-profilecrawl# python3.5 crawl_profile.py ashishegaran
Traceback (most recent call last):
  File "crawl_profile.py", line 17, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service ./assets/chromedriver unexpectedly exited. Status code was: 127

At first it said this:

root@ubuntu:/home/aria/instagram-profilecrawl# python3.5 crawl_profile.py ashishegaran
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 74, in start
    stdout=self.log_file, stderr=self.log_file)
  File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1282, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: './assets/chromedriver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "crawl_profile.py", line 17, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

It seemed to need the "assets" folder, which wasn't there, so I created it and then got the first error. I don't know what is happening or why it shows this. I installed selenium with both pip and pip3, and Google Chrome is installed, as well as the x64 Linux chromedriver.
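Exit status 127 generally means the chromedriver binary could not be executed at all (a shared library it needs is missing, for example), while the second traceback means the file simply wasn't at `./assets/chromedriver`. A small pre-flight check, as a sketch: `resolve_chromedriver` is a hypothetical helper that only catches the missing-file and missing-permission cases, not library or architecture problems.

```python
import os
import shutil


def resolve_chromedriver(preferred='./assets/chromedriver'):
    """Return a usable chromedriver path, or raise with a clear message.

    Checks the bundled location first, then falls back to whatever
    `chromedriver` is found on PATH.
    """
    if os.path.isfile(preferred):
        # The file exists but may have lost its executable bit (e.g. after unzip).
        if not os.access(preferred, os.X_OK):
            raise PermissionError(
                '%s exists but is not executable (try chmod +x)' % preferred)
        return preferred
    on_path = shutil.which('chromedriver')
    if on_path:
        return on_path
    raise FileNotFoundError(
        'chromedriver not found at %s or on PATH' % preferred)
```

The returned path would replace the hard-coded `'./assets/chromedriver'` in the `webdriver.Chrome(...)` call. If the file exists and is executable but still exits with 127, running `ldd` on it is a reasonable next step to look for missing libraries.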

Followers list?

I used your tool, but it didn't output the followers list. I checked the settings file and there was nothing in there about followers either. But the GitHub description says the followers list is supported?!

Does this project still work?

URL is empty!

The script doesn't get past the posts and shows this error:

root@ubuntu:/home/aria/instagram-profilecrawl# python3.5 crawl_profile.py sabaasafari
Extracting information from sabaasafari
BEFORE IMG
- Could not get information from post: https://www.instagram.com/p/BWc05WaHn8u/?taken-by=sabaasafari
BEFORE IMG
	- Could not get information from post: https://www.instagram.com/p/BWYS2inHYsN/?taken-by=sabaasafari
BEFORE IMG
- Could not get information from post: https://www.instagram.com/p/BWUkEu5HAEe/?taken-by=sabaasafari
BEFORE IMG
- Could not get information from post: https://www.instagram.com/p/BWTDPuxnG79/?taken-by=sabaasafari

Can you test this and see whether it happens on your side as well? If so, can you provide a fix?

We can't use pyvirtualdisplay on Windows

Hi,
On Windows we have a problem with the Display function (from pyvirtualdisplay import Display):
we can't use pyvirtualdisplay on Windows.
It is just a wrapper that calls Xvfb, a headless display server for the X Window System, and Windows does not use the X Window System.

How can I use this on Windows?
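One way around this, sketched under the assumption that Chrome runs headless on Windows instead: only start `pyvirtualdisplay` where Xvfb can exist, and import it lazily so the module isn't needed on Windows at all. `should_use_virtual_display` and `start_display_if_needed` are hypothetical helper names; `Display(visible=0, size=...)` and `start()` are pyvirtualdisplay's own API.

```python
import platform


def should_use_virtual_display(system=None):
    """pyvirtualdisplay wraps Xvfb, which is unavailable on Windows."""
    system = system or platform.system()
    return system != 'Windows'


def start_display_if_needed():
    """Start an Xvfb-backed display where possible; return it (or None)."""
    if not should_use_virtual_display():
        return None  # on Windows, rely on Chrome's headless mode instead
    # Imported lazily so Windows users don't need pyvirtualdisplay installed.
    from pyvirtualdisplay import Display
    display = Display(visible=0, size=(800, 600))
    display.start()
    return display
```

The caller should call `display.stop()` at the end of the crawl whenever `start_display_if_needed()` returned a display.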

WebDriverException: DevToolsActivePort file doesn't exist

I've installed Chrome on EC2 running the Amazon Linux AMI.
When I run crawl_profile.py, a WebDriverException pops up and the script stops.
What's the problem, and how do I fix it?
Thank you in advance.
Here's the error message:

Traceback (most recent call last):
  File "crawl_profile.py", line 21, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: DevToolsActivePort file doesn't exist
  (Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 4.1.7-15.23.amzn1.x86_64 x86_64)
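A workaround often reported for this error on headless servers and containers is to pass a few extra flags to Chrome. The sketch below only builds the argument list, so it runs without Selenium; whether these flags fix a given EC2 setup is an assumption, and `container_chrome_args` is a hypothetical name. The commented lines show how they would be applied with Selenium's `ChromeOptions.add_argument`.

```python
def container_chrome_args(headless=True):
    """Chrome flags commonly suggested for DevToolsActivePort errors on servers."""
    args = [
        '--no-sandbox',             # the sandbox often can't start as root or in containers
        '--disable-dev-shm-usage',  # /dev/shm is small on many VMs and containers
    ]
    if headless:
        args.append('--headless')   # there is no X display on a plain EC2 instance
    return args

# Applying them (requires selenium):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# for arg in container_chrome_args():
#     options.add_argument(arg)
# browser = webdriver.Chrome('./assets/chromedriver', chrome_options=options)
```

`--no-sandbox` weakens Chrome's isolation, so it is best reserved for throwaway machines that only run the crawler.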

Unable to locate element.

Here's my error. The script ran fine a few times but is now unable to locate this element.

[selenium.webdriver.remote.remote_connection] DEBUG: Finished Request {'sessionId': '202cae480ffc0f2e7f76665155ad5760', 'status': 7, 'value': {'message': 'no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(@Class, "_1cr2e _epyes")]"}\n (Session info: chrome=64.0.3282.140)\n (Driver info: chromedriver=2.35.528157 (4429ca2590d6988c0745c24c8858745aaaec01ef),platform=Mac OS X 10.12.6 x86_64)'}}

absolute import instead of a relative one?

File "crawl_profile.py", line 5, in
from .settings import Settings
SystemError: Parent module '' not loaded, cannot perform relative import

I ran into this issue and, after googling a bit, changed line 5 to use an absolute import:

#!/usr/bin/env python3.5
"""Goes through all usernames and collects their information"""
import json
from util.settings import Settings

This works for me.

TypeError: 'NoneType' object is not iterable

Getting the following error after scrolling the profile and scraping the first link:

Traceback (most recent call last):
  File "crawl_profile.py", line 33, in <module>
    information, user_commented_list = extract_information(browser, username, limit_amount)
  File "/Users/kevinleahey/Git/instagram-profilecrawl/util/extractor.py", line 225, in extract_information
    caption, location_url, location_name, location_id, lat, lng, img, tags, likes, comments, date, user_commented_list = extract_post_info(browser)
TypeError: 'NoneType' object is not iterable

I looked at the extract_post_info method, but nothing stuck out to me. Any thoughts?
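That TypeError means `extract_post_info` returned `None`, which in Python happens whenever a function reaches a bare `return` or falls off its end, typically on an error path. A sketch of a wrapper that always returns a tuple of the right length, so one bad post doesn't abort the crawl; `safe_post_info` and the default values are assumptions, and the field order is copied from the traceback:

```python
import logging

# Field order copied from the unpacking in the traceback above.
POST_FIELDS = ('caption', 'location_url', 'location_name', 'location_id',
               'lat', 'lng', 'img', 'tags', 'likes', 'comments', 'date',
               'user_commented_list')


def safe_post_info(extract, browser):
    """Call `extract(browser)` and always return a tuple of len(POST_FIELDS)."""
    try:
        result = extract(browser)
    except Exception:
        logging.exception('extract_post_info raised')
        result = None
    if result is None or len(result) != len(POST_FIELDS):
        # A bare `return` (or falling off the end of a function) yields None,
        # which is exactly what makes the caller's tuple unpacking fail.
        defaults = dict.fromkeys(POST_FIELDS)
        defaults.update(tags=[], likes=0, comments=0, user_commented_list=[])
        return tuple(defaults[name] for name in POST_FIELDS)
    return tuple(result)
```

In `extract_information` the call would become `safe_post_info(extract_post_info, browser)`, leaving the 12-name unpacking unchanged.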

WebDriver Issue

Excuse me, I've updated chromedriver to the newest version provided by Chrome,
but the same error still happens. How do I fix it?

Traceback (most recent call last):
  File "/Users/edward/instagram-profilecrawl/crawl_profile.py", line 11, in <module>
    browser = webdriver.Chrome('./assets/chromedriver')
  File "/Library/Python/2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/Library/Python/2.7/site-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Script error - chromedriver

css is dynamically generated, not hard coded

https://github.com/timgrossmann/instagram-profilecrawl/blob/e657d6c4b443641053c8da4812593fb3abb30596/util/extractor.py#L98
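Because those class names are generated and change with Instagram's frontend builds, one mitigation is to try several locators in order and fall back to something more structural. A sketch, assuming `find` is `browser.find_element` (which takes a `By` strategy and a value); `find_first` is a hypothetical helper, and which fallback locators stay stable is a guess:

```python
def find_first(find, locators, missing=(Exception,)):
    """Try each (by, value) locator in order; return the first element found.

    `find` is e.g. browser.find_element; in real use `missing` should be
    (NoSuchElementException,) so that unrelated errors still propagate.
    """
    for by, value in locators:
        try:
            return find(by, value)
        except missing:
            continue  # this selector is gone; try the next candidate
    raise LookupError('none of the locators matched: %r' % (locators,))

# Example with Selenium (class names taken from the issues on this page):
# from selenium.webdriver.common.by import By
# from selenium.common.exceptions import NoSuchElementException
# header = find_first(browser.find_element,
#                     [(By.CLASS_NAME, 'v9tJq'), (By.CLASS_NAME, '_mesn5'),
#                      (By.TAG_NAME, 'header')],
#                     missing=(NoSuchElementException,))
```

Raising `LookupError` with the full locator list makes the "Couldn't get user profile" failures easier to diagnose when every selector has changed.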

[Feature Request] - help needed: crawl only the N most recent posts

Hello,
I need your assistance.
How can I download only the N most recent posts? (It takes a long time to process all of a profile's posts when there are many.)

Thanks.
Noam
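The profile grid generally lists posts newest first, so limiting the crawl amounts to keeping the first N unique post links. A sketch (`most_recent_links` is a hypothetical name; a later version of `crawl_profile.py` passes a `limit_amount` to `extract_information`, per a traceback elsewhere on this page, but how it is applied isn't shown here):

```python
def most_recent_links(links, limit=None):
    """Deduplicate post links (keeping order) and keep at most `limit`.

    Assumes `links` were collected top-to-bottom from the profile grid,
    i.e. newest post first. limit=None keeps everything.
    """
    seen = set()
    unique = []
    for link in links:
        if link not in seen:  # scrolling can surface the same link twice
            seen.add(link)
            unique.append(link)
    return unique if limit is None else unique[:limit]
```

Most of the time saved comes from stopping the scroll loop as soon as enough unique links have been collected, rather than scrolling the whole profile and slicing afterwards.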

Broken after Instagram updated profiles.

The error message is suppressed, so I can't post anything beyond this:

$python crawl_profile.py john
Waiting 10 sec
Extracting information from john

Error: Couldn't get user profile.
Terminating

Instagram Update - Class Name Issue

Hello,

I keep getting an error when I run extractor.py, specifically at line 14. I think Instagram updated their class names, because this is the error I get after changing the error-handling code in lines 301-304. I'm not sure how to find the correct class name. All help is greatly appreciated!

Message: no such element: Unable to locate element: {"method":"class name","selector":"v9tJq"}

Error-handling code I put in:

except Exception as e:
    print(e)
    print("\nError: Couldn't get user profile.\nTerminating")
    quit()
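Printing the exception helps, but the traceback is what shows which lookup actually failed. A sketch of the same idea using the standard `logging` module; `run_or_exit` is a hypothetical helper, not part of the project:

```python
import logging
import sys


def run_or_exit(step, description):
    """Run `step()`; on failure log the full traceback and exit with status 1."""
    try:
        return step()
    except Exception:
        # logging.exception records the message *and* the traceback, so the
        # real cause (e.g. NoSuchElementException for a renamed class) is visible.
        logging.exception("Couldn't %s", description)
        sys.exit(1)
```

For example: `information = run_or_exit(lambda: extract_information(browser, username), 'get user profile')`. A non-zero exit status also lets shell scripts or cron jobs notice the failure.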

[Feature Request] Add the date, hour, and timestamp of each post

In extract_post_info, is it possible to add the date and/or timestamp of each post? It would be very useful for properly analyzing the account :)
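A sketch for the parsing half, assuming the post page exposes the timestamp as the `datetime` attribute of a `<time>` element in ISO-8601 UTC form (read, for example, with `browser.find_element_by_tag_name('time').get_attribute('datetime')`). That markup is an assumption about Instagram's HTML at the time, and `parse_post_datetime` is a hypothetical name:

```python
from datetime import datetime, timezone


def parse_post_datetime(value):
    """Parse an ISO-8601 UTC string such as '2018-03-01T12:00:00.000Z'."""
    # Try with and without fractional seconds; both end in a literal 'Z'.
    for fmt in ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ'):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError('unrecognised timestamp: %r' % value)
```

Storing `int(dt.timestamp())` alongside `dt.isoformat()` in the JSON output would give both a sortable number and a readable date.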

[Feature Request] any plans for headless crawling? PhantomJS or similar?

Do you have any plans to add PhantomJS integration, so that the script is actually usable in production instead of needing a GUI to run?

Run using Firefox on RPi 3

I'd like to run the crawler headless on my RPi3, ideally just using Firefox/geckodriver. Installing chrome on RPi is always kind of a mess. Is there a simple workaround? It should be possible with selenium, right?

unknown error: call function result missing 'value'

Traceback (most recent call last):
  File "C:/Users/kk703.DESKTOP-J939SLP/PycharmProjects/POM/TestScripts/Login_Test.py", line 7, in <module>
    loginPage.login("admin","manager")
  File "C:\Users\kk703.DESKTOP-J939SLP\PycharmProjects\POM\PageClasses\LoginPage.py", line 11, in login
    self.__username.send_keys (user_name)
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 479, in send_keys
    'value': keys_to_typing(value)})
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 628, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: call function result missing 'value'
  (Session info: chrome=65.0.3325.181)
  (Driver info: chromedriver=2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f),platform=Windows NT 10.0.16299 x86_64)

Process finished with exit code 1

Couldn't get user profile.

Hello,

Now I'm running straight into errors.

When I execute this script, I get:

Couldn't get user profile.

Does anyone have any idea?

fork of project on nodeJS

Hello, I made a fork of your project, but since I don't know Python, I wrote it in Node.js and tried to make some improvements. It is still under development, so there are still bugs.
You can see it here : https://github.com/nacimgoura/instagram-profilecrawl

Bug report

I got this error. I must be the most annoying user of your script, but I can't figure out what I did wrong:

Traceback (most recent call last):
  File "crawl_profile.py", line 27, in <module>
    information = extract_information(browser, username)
  File "/Users/Desktop/instagram-profilecrawl/util/extractor.py", line 117, in extract_information
    img, tags, likes, comments = extract_post_info(browser)
  File "/Users/Desktop/instagram-profilecrawl/util/extractor.py", line 34, in extract_post_info
    img = imgs[1].get_attribute('src')
IndexError: list index out of range

Thanks for the support.

Can't start running

I ran it using nohup and got this error:
/Users/phongyewtong/Desktop/InstaPy-master/chainingExample.py: line 3: syntax error near unexpected token `username='test','
/Users/phongyewtong/Desktop/InstaPy-master/chainingExample.py: line 3: `InstaPy(username='test', password='test')'

Should add user id in json file.

Terminate

I'm encountering
"bio:
Error: Couldn't get user profile.
Terminating"
How can I solve this?
I will be grateful for any help you can provide.

Having problem crawling a complete profile

root@ubuntu:/home/aria/instagram-profilecrawl# ./crawl_profile.py behzadshishegaran
Extracting information from behzadshishegaran
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
Traceback (most recent call last):
  File "./crawl_profile.py", line 27, in <module>
    information = extract_information(browser, username)
  File "/home/aria/instagram-profilecrawl/util/extractor.py", line 128, in extract_information
    img, tags, likes, comments = extract_post_info(browser)
  File "/home/aria/instagram-profilecrawl/util/extractor.py", line 68, in extract_post_info
    while (comments[1].text == 'load more comments'):
IndexError: list index out of range

The script quits without any further information about what happened. What's the problem here?
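The loop indexes `comments[1]` without checking the list length, so any post with fewer than two comment entries raises IndexError. A guarded version, as a sketch: `expand_comments` is hypothetical, `get_comments` stands for whatever re-queries the comment elements, and `max_clicks` is an added safety cap (in a real browser you would probably also pause briefly after each click):

```python
def expand_comments(get_comments, label='load more comments', max_clicks=50):
    """Click the 'load more comments' entry until it disappears.

    `get_comments` returns the current list of comment elements (objects
    with .text and .click()). The length check prevents the IndexError on
    posts with fewer than two entries; max_clicks prevents an endless loop.
    """
    clicks = 0
    comments = get_comments()
    while clicks < max_clicks and len(comments) > 1 and comments[1].text == label:
        comments[1].click()
        clicks += 1
        comments = get_comments()  # re-query: old elements may be stale
    return comments
```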

Logging In?

Sorry if this is spelled out in the documentation (or InstaPy), but I was wondering if there is a way to log in during the session. I'd like to get the likes on a friend's pictures, but his profile is private. I could access it if I could figure out a way to log in during the headless session. Thanks!

'Service' object has no attribute 'process'

Hey guys, it's me again...

I'm trying to run it on a DigitalOcean Ubuntu machine, but I get that error when I try to run the script. Any tips would be helpful.

The page says to run it with Python 3.5, etc. Is that mandatory?

If that's the problem, I'm sorry, but I want to try it with version 2.7.

Thanks guys.

Cannot find Chrome Binary

Hello, I'm trying to run this with Python 3.5 and chromedriver 2.40, but I'm getting the following error:

Traceback (most recent call last):
  File "crawl_profile.py", line 21, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
    self.error_handler.check_response(response)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
  (Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 4.13.0-45-generic x86_64)

Does anyone know something I can try to fix it?
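chromedriver looks for Chrome in standard install locations; if Chrome isn't installed, or is installed under another name, Selenium's `ChromeOptions.binary_location` can point at it explicitly. A sketch that searches `PATH` for common Linux executable names; the candidate list is an assumption to adjust for your distribution, and `find_chrome_binary` is a hypothetical name:

```python
import shutil

# Common Chrome/Chromium executable names on Linux; adjust as needed.
CHROME_CANDIDATES = ('google-chrome', 'google-chrome-stable',
                     'chromium-browser', 'chromium')


def find_chrome_binary(candidates=CHROME_CANDIDATES, which=shutil.which):
    """Return the first Chrome/Chromium executable found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

With Selenium this would be used as `options = webdriver.ChromeOptions()` followed by `options.binary_location = find_chrome_binary()`. A `None` result means no browser is on `PATH` under those names, so Chrome itself still needs to be installed.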

Minor error: replacing k,m with '000'/'000000'

Inside the code block in utils/extractor.py,
k is replaced with '00', but it should be replaced with '000'.
Similarly, m is replaced with '00000', but it should be replaced with '000000'.

  followers = int(followers.replace('k', '00').replace('m', '00000'))
  following = infos[2].text.split(' ')[0].replace(',', '').replace('.', '')
  following = int(following.replace('k', '00'))

fixed issue - comments problem

Sorry, I'm not great with GitHub, but there was a problem where comments = 0 on line 61 of extractor.py, and the code later tried to take its len(), which caused:

 return img, tags, int(likes), int(len(comments) - 1)
TypeError: object of type 'int' has no len()

If you just change comments = 0 to comments = [], it works fine.
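The same fix as a small self-contained sketch: initialise `comments` as a list so `len()` always works, and clamp the count at zero. `extract_comment_count` is illustrative, and `LookupError` stands in for Selenium's `NoSuchElementException`:

```python
def extract_comment_count(find_comments):
    """Count comments, treating a missing comment list as zero comments."""
    comments = []  # was `comments = 0`, which breaks len() below
    try:
        comments = find_comments()
    except LookupError:
        pass  # keep the empty list when the comment elements can't be found
    # The original code subtracts 1, presumably for the caption entry;
    # max() keeps the result from going negative on an empty list.
    return max(len(comments) - 1, 0)
```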

Could not get information from post...

Everything works great. After the script runs, it creates the JSON file and outputs the profile information. However, when it starts crawling individual posts, it fails to grab any information from them, even though I can see the driver visiting the correct post.

To troubleshoot, I added some print statements in the extract_post_info function to see if it was grabbing info. It's able to perform well up to here:

 if len(imgs) >= 2:
    img = imgs[1].get_attribute('src')
    print(img) #added print statement

After that, I tried adding some print statements here:

  likes = likes.split(' ')

  print("likes is: ", likes) #my addition

  #count the names if there is no number displayed
  if len(likes) > 2:
    likes = len([word for word in likes if word not in ['and', 'like', 'this']])
    print(likes) #my addition
  else:
    likes = likes[0]
    likes = likes.replace(',', '').replace('.', '')
    likes = likes.replace('k', '00')

But nothing prints out. I assume this is related to the function not returning anything, which leads to the except NoSuchElementException: print('- Could not get information from post: ' + link) branch.

What do you think could be wrong? I don't want to alter the code too much, since I'm not particularly familiar with Selenium.

Any help would be awesome!

Thanks!

Empty Caption

Hello.
Lines 50 to 57 of extractor.py are where the post's caption is read,
but with Instagram's new HTML this no longer seems to work properly.
I fixed it by replacing all of the code in the try block with this line:
caption = post.find_element_by_class_name('gElp9').find_element_by_tag_name('span').text
Good luck :-)

Scrolling Profile no stop

Good morning, everyone.

I started using instagram-profilecrawl, but I've run into a problem.

It doesn't load all the posts.

The message at the prompt is:

...
Scrolling profile 324/380
Scrolling profile 336/380
Scrolling profile 348/380
Scrolling profile 360/380
Scrolling profile 372/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380

I already tried increasing the sleep in extractor.py, but it didn't help.
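The log suggests the loop keeps waiting for all 380 posts while only 379 ever load, so it never exits. A sketch of a stop condition that also gives up after several scrolls without progress; `scroll_until_loaded`, `scroll_once`, `count_loaded` and the default `max_stalls=5` are hypothetical:

```python
def scroll_until_loaded(scroll_once, count_loaded, expected, max_stalls=5):
    """Scroll until `expected` posts are loaded or the count stops growing.

    `scroll_once()` scrolls the page (e.g. sends Keys.END, then sleeps);
    `count_loaded()` returns how many post links are currently loaded.
    Gives up after `max_stalls` consecutive scrolls without progress.
    """
    loaded = count_loaded()
    stalls = 0
    while loaded < expected and stalls < max_stalls:
        scroll_once()
        now = count_loaded()
        # Reset the stall counter whenever new posts appeared.
        stalls = stalls + 1 if now <= loaded else 0
        loaded = max(loaded, now)
    return loaded
```

Comparing the returned count with the profile's post total afterwards lets the crawler log the gap instead of hanging on it.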

TypeError on save_profile_json()

Traceback (most recent call last):
  File "crawl_profile.py", line 33, in <module>
    Datasaver.save_profile_json(username,information)
TypeError: unbound method save_profile_json() must be called with Datasaver instance as first argument (got str instance instead)

I haven't changed any of the code.
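"Unbound method ... must be called with Datasaver instance" is a Python 2 error: Python 2 requires an instance when a plain method is called through the class, while Python 3 allows it. Declaring the method `@staticmethod` works on both. The class below is a hypothetical stand-in for the project's `Datasaver`; only the decorator is the point, and the body (writing `<username>.json` into a `profiles` directory) is an assumption:

```python
import json
import os


class Datasaver(object):
    """Hypothetical stand-in; only the @staticmethod decorator matters here."""

    @staticmethod
    def save_profile_json(username, information, directory='profiles'):
        # Callable as Datasaver.save_profile_json(username, information)
        # on both Python 2 and Python 3, since no instance is required.
        if not os.path.isdir(directory):
            os.makedirs(directory)
        path = os.path.join(directory, username + '.json')
        with open(path, 'w') as fp:
            json.dump(information, fp, indent=4)
        return path
```

Running the crawler with Python 3, as its README suggests, avoids this particular error even without the decorator.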

I'm finding instagram developer

Hello, I am looking for an Instagram developer for my followers and likes script.
Contact on Skype: hiphopahmet

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally

InstaPy runs perfectly; however, I'm having trouble starting profilecrawl. Here's the output after trying to start it:

Traceback (most recent call last):
  File "crawl_profile.py", line 18, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5),platform=Linux 4.4.0-83-generic x86_64)

You got any hints/ideas?

Comments not exceeding 25?

The number of comments is stuck at 25 for me, like so:
"likes": 1029020,
"comments": 25

Can anyone help me to fix this?

Can't work since yesterday

The script hasn't been able to get any information since yesterday. It seems that Instagram has changed the tag and class names in the HTML page?
Error message:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"_de9bg"}

Incorrect followers/following value

The followers/following value is incorrect for IG accounts that have no decimal in their followers/following count.
(e.g. 630k followers becomes 63000 instead of 630000)

followers = infos[1].text.split(' ')[0].replace(',', '').replace('.', '')
followers = int(followers.replace('k', '00').replace('m', '00000'))

This can be corrected with the following lines of code:

followers = str(infos[1].text.split(' ')[0].replace(',', ''))
if followers.find('.') != -1:
  followers = followers.replace('.', '')
  followers = int(followers.replace('k', '00').replace('m', '00000'))
else:
  followers = int(followers.replace('k', '000').replace('m', '000000'))

following = str(infos[2].text.split(' ')[0].replace(',', ''))
if following.find('.') != -1:
  following = following.replace('.', '')
  following = int(following.replace('k', '00').replace('m', '00000'))
else:
  following = int(following.replace('k', '000').replace('m', '000000'))
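The same logic can be folded into one helper that treats a `.` as a decimal point only before a `k`/`m` suffix. A sketch assuming English-style formatting ('630k', '1.2m', '12,345'); `parse_count` is a hypothetical name:

```python
from decimal import Decimal

SUFFIXES = {'k': 1000, 'm': 1000000}


def parse_count(text):
    """Convert an Instagram-style count such as '630k' or '1.2m' to an int."""
    text = text.strip().lower().replace(',', '')
    suffix = text[-1:]
    if suffix in SUFFIXES:
        # '1.2m' -> Decimal('1.2') * 1000000; Decimal avoids float rounding.
        return int(Decimal(text[:-1]) * SUFFIXES[suffix])
    # Plain counts are whole numbers, so any '.' can only be a separator.
    return int(text.replace('.', ''))
```

With this, both `followers` and `following` reduce to `parse_count(infos[1].text.split(' ')[0])` and `parse_count(infos[2].text.split(' ')[0])`.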

Read user's bio?

Is there a way to extract a user's bio text and write it into a file?

.... is not clickable at point (943, 933)

Hi folks,

Just wanted to share a problem I ran into and fixed.

I got this error:

Traceback (most recent call last):
  File "./crawl_profile.py", line 30, in <module>
    information = extract_information(browser, username)
  File "/home/pi/instagram/instagram-profilecrawl/util/extractor.py", line 102, in extract_information
    load_button.click()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 78, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 499, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 297, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Element <a href="/freddyfrog/?max_id=1594149044930567909" class="_1cr2e _epyes">...</a> is not clickable at point (943, 933). Other element would receive the click: <div class="_8c4cy">...</div>
  (Session info: headless chrome=60.0.3112.113)
  (Driver info: chromedriver=2.29 (8e8216e581c512667203931f81c1a1ead47222e5),platform=Linux 4.9.50-v7+ armv7l)

Solution:
The problem seems to be that the page isn't fully loaded yet, so just increase the sleep before this statement in utils/extractor.py:

      # ... (preceding code in extract_information)
      load_button = body_elem.find_element_by_xpath(
          '//a[contains(@class, "_1cr2e _epyes")]')
      body_elem.send_keys(Keys.END)
      sleep(3)  # raise this sleep so the page finishes loading before the click
      load_button.click()
      # ...
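Instead of a fixed sleep, the click can be retried with a growing delay. A generic sketch (`retry` is hypothetical); with Selenium, `exceptions` would be `WebDriverException`, the class raised in the traceback above. Selenium's `WebDriverWait` with `expected_conditions.element_to_be_clickable` is the built-in alternative, though it may not help when another element overlaps the button.

```python
import time


def retry(action, exceptions, attempts=5, delay=1.0, backoff=2.0):
    """Call `action()` until it succeeds, sleeping between failed attempts.

    The delay grows by `backoff` after each failure; the last exception
    is re-raised once `attempts` tries have been used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff
```

In the snippet above, `load_button.click()` would become `retry(load_button.click, WebDriverException)`, which waits only as long as the page actually needs.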

[Feature Request] - adding an option to only receive the public data of each profile instead of crawling it to the end

It would be great to add an option that lets users retrieve only the Username, # of Posts, # of Followers, # of Following, and Bio. This is much more efficient for big crawls when you only need to identify a certain group of people, rather than dumping their whole profile data.

Can't crawl; it always prints "-Only few posts"

The user has more than 12 posts, but after running the program it only prints "-Only few posts".

[Feature Request] - Get a list of the most engaged followers ordered by rank

It would be nice to get some info about who left the likes and not just the number so that it would be easy to understand which followers are more active and interacting more with the posts. Is this even possible without a login? Thanks!
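Likers may not be visible without logging in, but commenters are: a later version of the crawler returns a `user_commented_list` (see the TypeError traceback earlier on this page). A sketch of ranking users by the number of posts they commented on, assuming one list of usernames per post; `rank_engaged_users` is hypothetical:

```python
from collections import Counter


def rank_engaged_users(per_post_users, top=10):
    """Rank usernames by how many posts they interacted with.

    `per_post_users` is an iterable of username lists, one per post.
    A user is counted at most once per post, so one chatty thread
    doesn't outweigh steady engagement across many posts.
    """
    counts = Counter()
    for users in per_post_users:
        counts.update(set(users))
    return counts.most_common(top)
```

Counting comments instead of posts is a one-line change: drop the `set()` call.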

[Feature Request] - adding a function to the script so that it records the progress on each profile

Add a read/write step so that the script updates profile.json after each page is crawled. Currently it only saves once the whole process has completed successfully, which has happened for me very rarely; most of the time the script crashes for some reason, and therefore nothing is saved.
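A sketch of saving after every post without risking a half-written file: write to a temporary file in the same directory, then `os.replace` it over the target. `save_progress` is hypothetical, and where the crawler's loop calls it (for example after appending each post's data) is up to the implementation:

```python
import json
import os
import tempfile


def save_progress(path, data):
    """Atomically write `data` as JSON to `path`.

    Writes to a temp file in the same directory, then os.replace()s it
    over the target, so a crash mid-write never leaves a truncated file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fp:
            json.dump(data, fp, indent=4)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # don't leave stray temp files behind
        raise
```

On restart, the crawler could load the existing file and skip posts it already contains, so a crash costs at most one post.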
