Evine

Interactive CLI Web Crawler.

Evine is a simple, fast, and interactive web crawler and web scraper written in Golang. Evine is useful for a wide range of purposes such as metadata and data extraction, data mining, reconnaissance and testing.

Follow the project on Twitter.

If you like the project, give it a star. It motivates me to keep developing it!

Install

From Binary

Pre-built binary releases are also available (recommended).

From source

go get github.com/saeeddhqan/evine
"$GOPATH/bin/evine" -h

From GitHub

git clone https://github.com/saeeddhqan/evine.git
cd evine
go build .
mv evine /usr/local/bin
evine --help

Note: Go 1.13.x is required.

Commands & Usage

Keybinding    Description
Enter         Run crawler (from URL view)
Enter         Display response (from Keys and Regex views)
Tab           Next view
Ctrl+Space    Run crawler
Ctrl+S        Save response
Ctrl+Z        Quit
Ctrl+R        Restore to default values (from Options and Headers views)
Ctrl+Q        Close response save view (from Save view)

evine -h

This displays the help for the tool:

Flag                    Description                                                           Example
-url                    URL to crawl                                                          evine -url toscrape.com
-url-exclude string     Exclude URLs matching this regex (default ".*")                       evine -url-exclude ?id=
-domain-exclude string  In-scope domains to exclude, comma-separated (default: root domain)   evine -domain-exclude host1.tld,host2.tld
-code-exclude string    HTTP status codes to exclude, separated by '|' (default ".*")         evine -code-exclude 200,201
-delay int              Delay between requests, in milliseconds                               evine -delay 300
-depth                  Search depth of the scraper (default 1)                               evine -depth 2
-thread int             Number of concurrent goroutines for resolving (default 5)             evine -thread 10
-header                 HTTP headers for each request; separate fields with \n                evine -header KEY: VALUE\nKEY1: VALUE1
-proxy string           Proxy in the form scheme://ip:port                                    evine -proxy http://1.1.1.1:8080
-scheme string          Scheme for the requests (default "https")                             evine -scheme http
-timeout int            Seconds to wait before timing out (default 10)                        evine -timeout 15
-query string           jQuery expression: a file extension, a key, or a selector (see note)  evine -query url,pdf,txt
-regex string           Regular expression to search for in the page contents                 evine -regex 'User.+'
-max-regex int          Maximum number of regex search results (default 1000)                 evine -max-regex -1
-robots                 Scrape robots.txt for URLs and use them as seeds                      evine -robots
-sitemap                Scrape sitemap.xml for URLs and use them as seeds                     evine -sitemap
-wayback                Scrape WayBackURLs (web.archive.org) for URLs and use them as seeds   evine -wayback

Note: -query accepts a file extension (pdf), a key (url, script, css, ...), or a jQuery selector such as $("a[class='hdr']").attr("hdr").
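
The -header flag takes several fields in one string, separated by \n. As a rough illustration only, the sketch below shows one way such a string could be split into individual headers. It assumes the value arrives as a single string containing either the literal characters \n or a real newline. parseHeaders is a hypothetical helper written for this example and is not taken from Evine's source.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseHeaders is a hypothetical helper, not Evine's actual code.
// It splits raw on "\n" (the literal two characters or a real newline)
// and stores each "Key: Value" field in an http.Header.
func parseHeaders(raw string) http.Header {
	h := http.Header{}
	// Turn the literal sequence `\n` into a real newline first.
	raw = strings.ReplaceAll(raw, `\n`, "\n")
	for _, line := range strings.Split(raw, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue // skip fields that have no colon
		}
		h.Add(strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]))
	}
	return h
}

func main() {
	h := parseHeaders(`User-Agent: evine\nAccept: text/html`)
	fmt.Println(h.Get("User-Agent"))
	fmt.Println(h.Get("Accept"))
}
```

Splitting on the first colon only keeps values that themselves contain a colon, such as URLs, intact.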

VIEWS

  • URL, enter the URL string in this view.
  • Options, this view is for setting options.
  • Headers, this view is for setting the HTTP headers.
  • Query, this view is used after crawling. It extracts data (docs, URLs, etc.) from the web pages that have been crawled.
  • Regex, this view searches the crawled web pages with a regular expression. Write your regex in this view and press Enter.
  • Response, all results are written to this view.
  • Search, this view searches the content of the Response view with a regular expression.
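
To make the Regex view and the -regex / -max-regex flags more concrete, here is a minimal sketch of a capped regex search over page content. It assumes that a negative limit means "no cap", which is how Go's regexp.FindAllString treats a negative count. Whether Evine implements -max-regex -1 this way is an assumption, and searchPage is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// searchPage is a hypothetical helper: it returns up to limit matches of
// pattern in content. A negative limit returns every match, because
// regexp.FindAllString treats n < 0 as "no limit".
func searchPage(pattern, content string, limit int) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err // invalid regular expression
	}
	return re.FindAllString(content, limit), nil
}

func main() {
	page := "User: alice\nUser: bob\nAdmin: carol"

	all, _ := searchPage(`User.+`, page, -1) // no cap
	fmt.Println(len(all), all)

	first, _ := searchPage(`User.+`, page, 1) // at most one result
	fmt.Println(len(first), first)
}
```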

Extract methods

From Keys

Keys are predefined keywords that select a kind of data, such as in-scope URLs, out-of-scope URLs, emails, etc. List of all keys:

  • url, to extract in-scope URLs. The URLs are fully sanitized.
  • email, to extract in-scope and out-of-scope emails.
  • query_urls, to extract in-scope URLs that contain a GET query, like ?foo=bar.
  • all_urls, to extract out-of-scope URLs.
  • phone, to extract a[href] values that contain a phone number.
  • media, to extract addresses of files that are not web-executable, like .exe, .bat, .tar.xz, .zip, etc.
  • css, to extract CSS files.
  • script, to extract JavaScript files.
  • cdn, to extract Content Delivery Network (CDN) addresses, like //api.foo.bar/jquery.min.js.
  • comment, to extract HTML comments, <!-- .* -->.
  • dns, to extract subdomains that belong to the website.
  • network, to extract social network IDs, like Facebook, Twitter, etc.
  • all, to extract all of the keys (url, query_urls, ...).

Keys are case-sensitive. Two or three keys can also be written together, separated by commas.
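
As an illustration of what two of these keys describe, the sketch below pulls HTML comments out of a page (the comment key) and checks whether a URL carries a GET query (the query_urls key). It only shows the idea with the Go standard library. The names extractComments and hasQuery are hypothetical, and Evine's own extraction rules may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// commentRe matches HTML comments. (?s) lets "." span newlines and the
// lazy ".*?" stops at the first closing "-->".
var commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)

// extractComments returns every HTML comment found in page.
func extractComments(page string) []string {
	return commentRe.FindAllString(page, -1)
}

// hasQuery reports whether rawURL carries a GET query such as ?foo=bar.
func hasQuery(rawURL string) bool {
	u, err := url.Parse(rawURL)
	return err == nil && u.RawQuery != ""
}

func main() {
	page := "<p>hi</p><!-- todo: remove --><a href='/x?id=1'>x</a><!-- v2 -->"
	fmt.Println(extractComments(page))
	fmt.Println(hasQuery("https://toscrape.com/x?foo=bar"))
	fmt.Println(hasQuery("https://toscrape.com/x"))
}
```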

From Extensions

If you need a file type that is not defined in the keys, write its extension in the Query view, for example png, xml, txt, docx, xlsx, a, mp3, etc.
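
The idea of querying by extension can be sketched as a simple filter over collected URLs. The example below is an assumption about how such matching could work, not a description of Evine's internals. filterByExt is a hypothetical helper that compares the extension of each URL path against a comma-separated list.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// filterByExt is a hypothetical helper: it keeps the URLs whose path ends
// in one of the comma-separated extensions (for example "png,xml,txt").
func filterByExt(urls []string, exts string) []string {
	wanted := map[string]bool{}
	for _, e := range strings.Split(exts, ",") {
		e = strings.ToLower(strings.TrimSpace(e))
		if e != "" {
			wanted["."+strings.TrimPrefix(e, ".")] = true
		}
	}
	var out []string
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil {
			continue // ignore URLs that do not parse
		}
		// path.Ext only looks at the path, so "?v=2" does not interfere.
		if wanted[strings.ToLower(path.Ext(u.Path))] {
			out = append(out, raw)
		}
	}
	return out
}

func main() {
	urls := []string{
		"https://toscrape.com/logo.png?v=2",
		"https://toscrape.com/index.html",
		"https://toscrape.com/notes.TXT",
	}
	fmt.Println(filterByExt(urls, "png,txt"))
}
```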

From JQuery selector

If you have basic jQuery skills, this feature is easy to use, and it is not difficult to pick up otherwise. For a quick overview of selectors, w3schools is a great source.
Example (to find source[src]):

$("source").attr("src") // To find all of source[src] urls
$("h1").text() // To find h1 values

Template:

$("SELECTOR").METHOD_NAME("arg")

Queries like the following are not supported (single quotes, or extra spaces around the parentheses):

$('SELECTOR').METHOD("arg")
$('SELECTOR').METHOD('arg')
$("SELECTOR"  ).METHOD("arg" )

Methods are described below:

  • text(), returns the content of the SELECTOR without HTML tags.
  • html(), returns the content of the SELECTOR with HTML tags.
  • attr("ATTR"), returns the given attribute of the SELECTOR, e.g. $("a").attr("href")
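
To show why the template is strict about quotes and spacing, here is a small sketch of a parser that accepts only the $("SELECTOR").METHOD("arg") form with the three methods listed above. It is an illustration under that assumption, not Evine's real parser. parseQuery, the query type, and the regular expression are made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// queryRe accepts only double-quoted selectors and arguments, with no
// extra spaces, and only the methods text, html and attr.
var queryRe = regexp.MustCompile(`^\$\("([^"]+)"\)\.(text|html|attr)\((?:"([^"]+)")?\)$`)

// query holds the parts of a parsed expression (hypothetical type).
type query struct {
	Selector, Method, Arg string
}

// parseQuery splits an expression into selector, method and argument.
func parseQuery(s string) (query, error) {
	m := queryRe.FindStringSubmatch(s)
	if m == nil {
		return query{}, errors.New(`query does not match $("SELECTOR").METHOD("arg")`)
	}
	q := query{Selector: m[1], Method: m[2], Arg: m[3]}
	// attr needs exactly one argument; text and html take none.
	if (q.Method == "attr") != (q.Arg != "") {
		return query{}, errors.New("wrong number of arguments for " + q.Method)
	}
	return q, nil
}

func main() {
	for _, s := range []string{
		`$("source").attr("src")`,
		`$("h1").text()`,
		`$('a').attr("href")`,
		`$("a"  ).attr("href" )`,
	} {
		q, err := parseQuery(s)
		fmt.Printf("%-28s %+v %v\n", s, q, err)
	}
}
```

In this sketch the first two expressions parse, while the single-quoted and extra-space variants are rejected, matching the unsupported forms listed earlier.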

Bugs or Suggestions

To report bugs or suggestions, create an issue.

Evine is heavily inspired by wuzz.
