tufayellus / linkedin-scraper

A LinkedIn scraper that collects up to 1k LinkedIn profiles (due to LinkedIn's limit) from company profile links and saves their e-mail addresses if available!

License: GNU General Public License v3.0

Python 100.00%
linkedin linkedin-scraper linkedin-bot leads email-scraper scraper-engine scraper email-marketing digital-marketing crawler

linkedin-scraper's Introduction

LinkedIn Lead Scraper 🚀

A LinkedIn scraper that collects up to 10k LinkedIn profiles and saves their e-mail addresses if available!
It collects up to 10k profiles from the LinkedIn directory along with details such as name, current position/headline, and location. After all profiles are collected, it starts looking for their email addresses. You can narrow down your searches based on location, role, etc.


Scrape public LinkedIn profile data at scale with Proxycurl APIs.
• Scraping public profiles is battle-tested in court in the hiQ vs. LinkedIn case.
• GDPR, CCPA, SOC2 compliant
• High rate limit: 300 requests/minute
• Fast: APIs respond in ~2s
• Fresh data: 88% of data is scraped in real time; the other 12% is no older than 29 days
• High accuracy
• Tons of data points returned per profile
Built for developers, by developers

Trouble with the base version? Check the "Cookie Based" version here
Need location filter or role filter? Check this version

Installation Guide

  1. First, download Python from Python's official website. Only Python 3.x is supported. Download from here or, for a precise Python version, download this version and scroll to the bottom to pick the installer for your operating system. On Windows machines, make sure to tick "Add to PATH" during installation.
  2. Now, from the Start menu (Windows) or the Applications list (Linux/Mac), open Command Prompt (Windows) or Terminal (Mac/Linux) and copy-paste the command below:
pip3 install requests

This will show some installation progress and eventually install the library. If you see a pip warning, you may ignore it, as acting on it is optional.

  • If pip isn't recognized as a command, re-install Python with "Add python to executable path" enabled (Windows), or, on Linux systems that use apt (such as Debian or Ubuntu), run the command apt-get install python3-pip
  3. An account on LinkedIn is a must! You can create temporary profiles if you want.
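
To confirm the installation steps above worked, a quick check like the following can help. This is only a minimal sketch and not part of the project: the function name check_environment is made up for illustration, and it only verifies that Python 3 is running and that the requests package can be found.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 0), package="requests"):
    """Return a list of problems found; an empty list means ready to run.

    check_environment is a hypothetical helper, not part of the scraper.
    """
    problems = []
    # sys.version_info compares like a tuple, e.g. (3, 10, 2, ...) >= (3, 0)
    if sys.version_info < min_version:
        problems.append("Python %d.%d or newer is required" % min_version)
    # find_spec returns None when a top-level package is not installed
    if importlib.util.find_spec(package) is None:
        problems.append("Missing package: run 'pip3 install %s'" % package)
    return problems


if __name__ == "__main__":
    found = check_environment()
    print("Environment looks ready." if not found else "\n".join(found))
```

Save it in a file of its own and run it with python3; if it reports a missing package, repeat step 2.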

Usage Guide

  1. Assuming that Python and the library required by this project are installed, it is time to run the script. First, download the Python script of your choice and put it inside a folder.
  2. Right-click on the Python script and select the option "Edit with IDLE". A correct installation shows this option in the right-click menu; if you don't see it, you will have to troubleshoot your Python installation yourself.
  3. This option should open up a code window. Locate the linkedin_email and linkedin_password placeholders and put in your login details (don't worry, they won't get leaked to anyone). After that, set the desired company profile URL inside the target_company_link placeholder and save the changes by pressing the Ctrl+S shortcut.
  4. Now, locate the Run menu and select Run Module, and the automation will start processing. When you see a >>> at the bottom of the output screen, the process has finished.
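
If you would rather not type your login details into the script file, one possible alternative is to read the three settings from environment variables. The sketch below is an assumption, not something the scripts do out of the box: the variable names LINKEDIN_EMAIL, LINKEDIN_PASSWORD and TARGET_COMPANY_LINK and the helper load_config are hypothetical.

```python
import os


def load_config(env=None):
    """Read the three placeholders from environment variables.

    The environment variable names below are hypothetical; the original
    scripts expect the values to be typed into the file directly.
    """
    env = os.environ if env is None else env
    names = {
        "linkedin_email": "LINKEDIN_EMAIL",
        "linkedin_password": "LINKEDIN_PASSWORD",
        "target_company_link": "TARGET_COMPANY_LINK",
    }
    # Collect every unset or empty variable so the error lists them all
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned dictionary uses the same names as the placeholders, so for example linkedin_email = load_config()["linkedin_email"] could replace the hard-coded value.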

Other Ways to Run the Script

Windows

  • After setting the email, password, and company profile link and saving the changes, double-clicking the Python file will also run it.

Linux/Mac

  • In the terminal, cd to the script folder and type
python3 Random_Scraper.py
Or,
python3 CompanyWise_Leads.py

Variants Details

Random_Scraper.py is the initial version of the scraper. It collects up to 10k random LinkedIn profiles from the directory and picks info from each profile.
CompanyWise_Leads.py is the revised version of the code. It collects employee profiles company by company for more leads information.
If you're having issues with the login-based version (the base script), you can try either the cookie version (which has no e-mail scraping ability) or the ProxyCurl API integrated version (which allows searching by location and role and has e-mail scraping capabilities).

Limits

  • Do not log in from an IP address you don't usually use for your LinkedIn account; otherwise, it will trigger LinkedIn's security system and you won't be able to log in.
  • Results are limited to 10,000 records (this is a limitation on LinkedIn's side).
  • First-page data is not collected because it is outside what the API endpoint can return. The cookie version has you covered there.
  • To search by location or by role, or if you don't want to use your own LinkedIn account, you have to use the ProxyCurl version, as it offers this ability.

Disclaimer

Do not flood LinkedIn with a large number of requests from your account, to avoid issues with your account. If you want to scrape LinkedIn leads anonymously for your business, you can check Proxycurl APIs to stay on the safe side (they offer FREE trial access too!).

Loved This Open Source Project?

Star the repository and share it with your friends who might need this. More variants of the LinkedIn Scraper will come soon. Keep this on watch for more updates!

linkedin-scraper's Issues

error finding email

Hi, after installing all requirements with pip3 / python3,
I then enter the email and password of a valid LinkedIn account inside linkedin_email = and linkedin_password =, and I then put the link of the company's LinkedIn page inside target_company_link =
and when I run python3 Random_Scraper.py or python3 CompanyWise_Leads.py
I get this:

Logged in to LinkedIn!
Collecting all company member profiles upto 10000!

Traceback (most recent call last):
  File "................/linkedin-email/Random_Scraper.py", line 170, in <module>
    profile_list.extend(connection.listProfiles(company_id, page_no))
  File "................/linkedin-email/Random_Scraper.py", line 116, in listProfiles
    profiles = resp.get('elements')[0].get('elements')
IndexError: list index out of range

What should I do to fix it?
I'm on Python 3.10.2 and pip 22.2.
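
The traceback shows that resp.get('elements') came back as an empty list, so indexing it with [0] fails. Why the response is empty is not something this page answers, but the crash itself can be avoided by checking the shape before indexing. The following is a minimal sketch, assuming the response is a plain dictionary as in the traceback; extract_profiles is a hypothetical helper, and the optional 'data' wrapper is taken from the CompanyWise_Leads.py traceback reported in another issue on this page.

```python
def extract_profiles(resp):
    """Return the nested profile list, or [] when the response has none.

    extract_profiles is a hypothetical helper, not part of the scripts.
    """
    if not isinstance(resp, dict):
        return []
    # One reported variant wraps the payload in a 'data' key; fall back to
    # the top level when that key is absent.
    payload = resp.get("data", resp)
    if not isinstance(payload, dict):
        return []
    outer = payload.get("elements")
    # An empty or missing outer list is what triggers the IndexError
    if not isinstance(outer, list) or not outer:
        return []
    first = outer[0]
    if not isinstance(first, dict):
        return []
    return first.get("elements") or []
```

With a helper like this, an empty page yields an empty list instead of stopping the run, which makes it easier to see whether any page returns data at all.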

this command returns error

for page_no in range(2, page_count + 1):
    profile_list.extend(connection.listProfiles(company_id, page_no))

Unable to find the API Links

Hello Sir, I'm unable to find the API link that you have given in the code. I could find a similar API link after logging into LinkedIn, inside the Network tab from Inspect Element, but when I use that link in Python it throws an error.

Please let me know where to find these two API links; it would be of great help to me.
api_link = 'https://www.linkedin.com/voyager/api/organization/companies?decorationId=com.linkedin.voyager.deco.organization.web.WebCompanyStockQuote-2&q=universalName&universalName={}'.format(quote(company_username))
resp = self.s.get(api_link, headers=headers).json()

resp = self.s.get('https://www.linkedin.com/voyager/api/search/blended?count=10&filters=List(currentCompany-%3E{},resultType-%3EPEOPLE)&origin=COMPANY_PAGE_CANNED_SEARCH&q=all&queryContext=List(spellCorrectionEnabled-%3Etrue)&start={}'.format(company_id,(int(page_no)-1) * 10), headers=headers).json()
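
For readers trying to follow how the two quoted URLs are filled in: the first takes a URL-quoted company_username, and the second takes a company_id plus a start value computed as (page_no - 1) * 10. The sketch below illustrates only those two inputs and makes no network request. How the script actually derives company_username is not shown on this page, so the URL parsing here is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlparse

PAGE_SIZE = 10  # matches count=10 in the search URL quoted above


def company_username(company_link):
    """Return the path segment after /company/ in a company profile URL.

    Assumed derivation; the original script's own logic is not shown here.
    """
    parts = [p for p in urlparse(company_link).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "company":
        raise ValueError("Not a company profile link: %r" % company_link)
    return parts[1]


def start_offset(page_no):
    """Convert a 1-based page number to the 'start' query value."""
    return (int(page_no) - 1) * PAGE_SIZE


if __name__ == "__main__":
    # example-co is a made-up company name used only for illustration
    name = company_username("https://www.linkedin.com/company/example-co/")
    print(quote(name), start_offset(2))
```

Page 1 therefore maps to start=0, page 2 to start=10, and so on.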

1K results or 10K results?

Hello,

Every time I launch this script, I scrape 1000 results. In the About section, you mention that this script can "scrape up to 10k LinkedIn profiles from company profile links". How can we harvest 10K profiles?

Notes:

  • I select companies that have +10K employees (of course)
  • I have changed the Do-For loop (that was set to 1000 iterations) to 10K iterations
  • I am using a newly created Linkedin free account. Shall I use an old warmed up account? One with a Premium subscription?

IndexError: list index out of range

Traceback (most recent call last):
  File "CompanyWise_Leads.py", line 170, in
    profile_list.extend(connection.listProfiles(company_id, page_no))
  File "CompanyWise_Leads.py", line 114, in listProfiles
    profiles = resp.get('data').get('elements')[0].get('elements')
IndexError: list index out of range

Unable to login

I have a problem with login. The email and password are correct, but I am unable to log in to LinkedIn.

No results :

Logged in to LinkedIn!
Collecting all company member profiles upto 10000!
Requesting page 1 for company ID 10953582
Response received: {'status': 404}
No data available in response
Profile list collected and saved! Extracting emails ...
Scanning 0 profiles

Would you know how to solve this?
