jackmcdowell / php-crawler

This project forked from hedii/php-crawler

A crawler written in PHP that finds email addresses on the internet

License: GNU General Public License v2.0

php-crawler

A crawler written in PHP that finds email addresses on the internet. See it in action here (video): https://www.youtube.com/watch?v=rWsb6E_335U

Installation

  1. Put all the files on your server
  2. Create a MySQL database
  3. Create the database tables using the SQL code below
  4. Open Crawler.php and edit the __construct function with your database connection details
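
As a rough illustration of step 4, the sketch below shows what a constructor holding the connection details might look like. It assumes PDO with the MySQL driver; the actual Crawler.php may use a different extension or layout. The class name `CrawlerSketch`, the helper `build_mysql_dsn`, and all credential values are hypothetical placeholders, not taken from the project.

```php
<?php
// Hypothetical sketch: the real Crawler.php may be organised differently.

/**
 * Build a PDO DSN string for a MySQL database.
 * (Illustrative helper, not part of the project.)
 */
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

class CrawlerSketch
{
    /** @var PDO Database handle shared by the crawler methods. */
    private $db;

    public function __construct()
    {
        // Placeholder values: replace them with your own connection details.
        $host     = 'localhost';
        $dbname   = 'crawler';
        $user     = 'crawler_user';
        $password = 'change-me';

        // Throw exceptions on SQL errors instead of failing silently.
        $this->db = new PDO(build_mysql_dsn($host, $dbname), $user, $password, [
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        ]);
    }
}
```

Nothing connects until the class is instantiated, so a wrong host or password only surfaces as a `PDOException` on the first `new CrawlerSketch()`.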

Database table creation

Open an SQL terminal, paste the following and execute it:

-- Email addresses found by the crawler.
CREATE TABLE `emails` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `email` varchar(255) NOT NULL DEFAULT '',
  `date` datetime NOT NULL,
  PRIMARY KEY (`id`)
);

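-- URLs discovered by the crawler. `visited` and `email_visited` presumably
-- record whether a URL has been crawled for links and scanned for emails.
CREATE TABLE `urls` (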
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `url` varchar(1000) NOT NULL DEFAULT '',
  `date` datetime NOT NULL,
  `visited` tinyint(1) NOT NULL DEFAULT '0',
  `email_visited` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
);

Usage

  1. Navigate to index.php
  2. Enter a URL in the form input and submit the form. The crawler scans the page for URLs and stores them in the database. It then visits every unvisited URL in the database and repeats the search for further URLs.
  3. Navigate to emails.php. The crawler then searches for email addresses in the URLs stored in the database.
  4. To get a list of all the emails, export the 'emails' database table and use it however you like.
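
The two searches described in steps 2 and 3 can be sketched roughly as follows. This is not the project's code: the function names `extract_links` and `extract_emails` are hypothetical, the link search uses a simple regular expression (a DOM parser would be more robust on malformed HTML), and the email pattern is deliberately loose and will miss some valid addresses.

```php
<?php
// Hypothetical helpers; names and patterns are illustrative only.

/**
 * Return the unique absolute http(s) URLs found in <a href="..."> attributes.
 * Relative links are skipped to keep the sketch short.
 */
function extract_links(string $html): array
{
    // Capture the quoted href value of each <a> tag (single or double quotes).
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

    $links = [];
    foreach ($matches[2] as $href) {
        $href = html_entity_decode(trim($href));
        if (preg_match('#^https?://#i', $href) === 1) {
            $links[$href] = true; // array keys deduplicate
        }
    }
    return array_keys($links);
}

/**
 * Return the unique email addresses found in a block of text, lowercased.
 */
function extract_emails(string $text): array
{
    // Simplified pattern: local part, "@", domain, and a TLD of 2+ letters.
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $text, $matches);

    // Lowercase first so "Info@Example.com" and "info@example.com" count once.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}
```

In a crawler along these lines, each result of `extract_links` would be inserted into the `urls` table, and each result of `extract_emails` into the `emails` table, before the corresponding flag on the visited URL is set.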

Contributors

hedii, jackmcdowell