emailscraper's Introduction

Email Scraper

This script scrapes email addresses from the results of Google search terms.

scraper.py

The script runs the following steps in order:

  1. Collect a search term (e.g., Developers in London).
  2. Connect to the googlesearch API and create a TXT file with the URLs to be scraped.
  3. Open the TXT file and scrape emails from the listed URLs.
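Step 3 above can be sketched roughly as follows. This is a minimal illustration, not the code in scraper.py: the names extract_emails, fetch_page and scrape_url_file, the one-URL-per-line file layout, and the email pattern are all assumptions made for the example.

```python
import re
import urllib.request

# Simple pattern for illustration only; it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique email-like strings in text, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def fetch_page(url, timeout=10):
    """Download a page and decode it leniently; return '' on common network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="ignore")
    except (OSError, ValueError):
        return ""


def scrape_url_file(path, fetch=fetch_page):
    """Read one URL per line from path (assumed layout) and collect emails from each page."""
    emails = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            for email in extract_emails(fetch(url)):
                if email not in emails:
                    emails.append(email)
    return emails
```

Passing the fetch function as a parameter keeps the parsing logic testable without network access; a stub that returns canned HTML can stand in for fetch_page.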

The script requires the Google module in order to connect to the Google API. When you first execute the script, it tries to import googlesearch. If the module is not installed on your computer, the script starts an automatic download and installation of the dependencies, then reloads itself once installation is complete.

import os

try:
    from googlesearch import search
except ImportError:
    # googlesearch is not installed: install the dependencies, then rerun the script.
    upgrade_pip = lambda: os.system("pip3 install --upgrade pip")
    install_google = lambda: os.system("pip3 install google glob2")
    reload_scraper = lambda: os.system("python3 scraper.py")
    print("Upgrading Pip")
    print("----------------------------------------------------------")
    upgrade_pip()
    print("Downloading Google Library and Glob2")
    print("----------------------------------------------------------")
    install_google()
    print("Installation complete: Ready to start scraping")
    print("----------------------------------------------------------")
    reload_scraper()
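As an alternative sketch of the same idea, the check and the install can be separated and run through the interpreter that is executing the script, which avoids depending on whichever pip3 happens to be on the PATH. This is not how scraper.py does it; missing_modules and install_packages are hypothetical helper names.

```python
import importlib.util
import subprocess
import sys


def missing_modules(names):
    """Return the top-level module names from names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_packages(packages):
    """Install packages with the pip that belongs to the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


# Example use (not executed here). The package names are the ones this README installs:
#     if missing_modules(["googlesearch"]):
#         install_packages(["google", "glob2"])
```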

Install Dependencies from Terminal

To install the Google module, run the following command in your terminal.

pip install google

To install the glob2 module, run the following command in your terminal.

pip install glob2

glob.py

This script combines all the TXT files into a single file called result.txt.

Remember to delete the URL files before running glob.py.
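The merge step might look something like the sketch below, which uses the standard-library glob module rather than glob2 and is not the actual glob.py. The function name merge_txt_files and its parameters are assumptions. Because it picks up every .txt file in the directory, it also shows why the URL files need to be deleted first.

```python
import glob
import os


def merge_txt_files(directory, output_name="result.txt"):
    """Concatenate every .txt file in directory into output_name; return the files merged."""
    output_path = os.path.join(directory, output_name)
    # Escape the directory so characters like "[" in its name are not treated as wildcards.
    pattern = os.path.join(glob.escape(directory), "*.txt")
    sources = sorted(
        path
        for path in glob.glob(pattern)
        if os.path.abspath(path) != os.path.abspath(output_path)  # never merge the output into itself
    )
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sources:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    # Ensure each line ends with a newline so files do not run together.
                    out.write(line.rstrip("\n") + "\n")
    return sources
```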

merge.py

This script checks for repeated emails and merges them so that each address appears only once.
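One way to do this kind of de-duplication is sketched below. It is not the actual merge.py: the function name merge_duplicates is hypothetical, and treating addresses that differ only in letter case as the same address is an assumption.

```python
def merge_duplicates(emails):
    """Return emails without duplicates, compared case-insensitively, in first-seen order."""
    seen = set()
    unique = []
    for raw in emails:
        email = raw.strip()
        key = email.lower()  # assumption: A@X.com and a@x.com count as the same address
        if email and key not in seen:
            seen.add(key)
            unique.append(email)
    return unique
```

The function accepts any iterable of strings, so an open file handle on result.txt can be passed in directly.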

EDIT WARNING

Before running glob.py and merge.py, edit the email files (email - [search-term].txt) and remove all bad lines.
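A small helper can make the manual review quicker by listing candidate bad lines. The README does not define what counts as a bad line, so this sketch assumes it means any non-empty line that is not a single email-like string; find_bad_lines and the pattern are hypothetical. False positives such as logo@2x.png still pass this pattern, so the output is a starting point for editing, not a replacement for it.

```python
import re

# Assumed definition of a "good" line: exactly one email-like string and nothing else.
EMAIL_LINE_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_bad_lines(lines):
    """Return (line_number, text) pairs for non-empty lines that do not look like an email."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text and not EMAIL_LINE_RE.fullmatch(text):
            bad.append((number, text))
    return bad
```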

Update (#1)

  • The result limit was removed from the prompt and now defaults to 1000. To change the search limit, open config.py and edit the variable total.

total = 1000

emailscraper's People

Contributors

wellyington
