cov3

This is a web crawler library with three flavors of crawling:

  • the usual crawler: give it a URL and it keeps going until it has visited all of the linked pages (that point to the same domain).
  • a sitemap crawler: give it a sitemap.xml and it visits the URLs it finds in the sitemap.
  • a step crawler: give it a CSV file with the list of URLs (steps) to visit.
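
To make the "usual crawler" idea concrete, here is a minimal sketch of a same-domain crawl in JavaScript. It is not cov3's implementation: it walks an in-memory fake site, and `crawlSameDomain` and `fetchLinks` are hypothetical names made up for this illustration.

```javascript
// Sketch of a same-domain crawl (breadth-first) over a link graph.
// `fetchLinks` is a hypothetical callback: given a URL string it returns
// the href values found on that page. It is not part of cov3.
function crawlSameDomain(startUrl, fetchLinks) {
  const startHost = new URL(startUrl).hostname;
  const visited = new Set();
  const queue = [new URL(startUrl).href];

  while (queue.length > 0) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    for (const href of fetchLinks(url)) {
      let next;
      try {
        next = new URL(href, url); // resolve relative links against the page
      } catch (e) {
        continue; // skip hrefs that are not valid URLs
      }
      next.hash = ""; // treat #fragments as the same page
      if (next.hostname === startHost && !visited.has(next.href)) {
        queue.push(next.href);
      }
    }
  }
  return [...visited];
}

// Usage with a tiny fake site instead of a real browser.
const fakeSite = {
  "http://example.com/": ["/about", "http://other.org/", "/#top"],
  "http://example.com/about": ["/", "contact.html"],
  "http://example.com/contact.html": [],
};
const pages = crawlSameDomain("http://example.com/", (u) => fakeSite[u] || []);
console.log(pages);
```

The link to other.org is dropped because its host differs from the start URL, and the crawl stops once the queue of unvisited same-domain pages is empty.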

On each page it visits, the crawler executes a bit of JavaScript code that we can define as a validation. These validations are useful for testing your site in an automated way. For example, you might want to:

  • check that all pages contain meta tags
  • check that all pages contain a title
  • check that all pages contain web analytics tracking
  • find out which links are broken
  • test your own JavaScript page code on all (or some) pages
  • etc.
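
As a rough idea of what such validations could look like, here is a sketch of a few checks written as JavaScript strings, in the same style as the "document.title" example. The selectors (including the google-analytics.com one) are illustrative assumptions, and `runValidation` and `stubDocument` are hypothetical helpers for trying the expressions outside a browser. Whether cov3 expects a bare expression or a `return` statement is something to verify against its documentation.

```javascript
// Candidate validation expressions, written as strings like "document.title".
// The selectors are examples only; adapt them to your own pages.
const validations = {
  hasTitle: "document.title.trim().length > 0",
  hasMetaDescription: "document.querySelector('meta[name=\"description\"]') !== null",
  hasAnalytics: "document.querySelector('script[src*=\"google-analytics.com\"]') !== null",
};

// Hypothetical helper: evaluates one expression against any object
// standing in for the browser's `document`.
function runValidation(expression, doc) {
  return new Function("document", "return (" + expression + ");")(doc);
}

// A minimal stub of the DOM calls used above (not a real DOM).
function stubDocument(title, selectorsPresent) {
  return {
    title,
    querySelector: (sel) => (selectorsPresent.includes(sel) ? {} : null),
  };
}

// A fake page with a title and a meta description, but no analytics script.
const goodPage = stubDocument("Home", ['meta[name="description"]']);
const results = Object.fromEntries(
  Object.entries(validations).map(([name, expr]) => [name, runValidation(expr, goodPage)])
);
console.log(results);
```

In a real browser the same expression strings would run against the actual `document`, so the stub is only there to make the sketch runnable on its own.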

It’s based on the new version of Selenium that uses WebDriver, thus allowing you to (in theory*) use Firefox, Internet Explorer (when on Windows), Chrome, or HtmlUnit (a GUI-less browser).

*In practice there are a couple of issues (note that Selenium 2 is still in beta): Chrome does not like running JavaScript, and I’ve had a few Out of Memory exceptions with HtmlUnit, so for now I’m mostly using Firefox. As with all beta software this is expected, and with time and new Selenium 2 releases it should all be fine.

Usage

To be used as a library, thus:

(require '[cov3 :as cov3])

;; then: (:ff is short for firefox, use :hu for HtmlUnit,
;; :ch for Chrome, and :ie for Internet Explorer)
(cov3/crawl :ff "http://al3xandr3.github.com/" '("document.title"))

;; or: (10 is the sample size to pick from sitemap.xml)
(cov3/sitemap-crawl :ff "http://al3xandr3.github.com/sitemap.xml" "" "" 10 '("document.title"))

;; or: (assuming you have a csv file with the steps to take; see the documentation for more)
;; for example the line: http://al3xandr3.github.com/,"document.title",,
(cov3/step-crawl :ff "data/steps.csv")
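
To illustrate the "sample size" idea from the sitemap example above, here is a small JavaScript sketch that extracts the `<loc>` URLs from sitemap XML and picks a random subset. The regex-based parsing and the random sampling strategy are assumptions for illustration, not a description of how cov3 picks its sample, and `sitemapUrls` and `sample` are hypothetical names.

```javascript
// Sketch: pull <loc> URLs out of sitemap XML text.
// XML entities such as &amp; are left undecoded in this simplified version.
function sitemapUrls(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Pick up to `size` distinct items at random (assumed strategy).
function sample(items, size) {
  const pool = items.slice();
  const n = Math.min(size, pool.length);
  // Partial Fisher-Yates shuffle: only the first n slots are needed.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Usage with an inline sitemap instead of a downloaded file.
const xml = `<urlset>
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>`;
const urls = sitemapUrls(xml);
const picked = sample(urls, 2);
console.log(picked);
```

Asking for a sample larger than the sitemap simply returns every URL once.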

It’s easily made into a standalone executable .jar: just add a main and run: lein uberjar

Installation

Assuming you have lein installed:

> git clone git://github.com/al3xandr3/cov3.git
> cd cov3/
> lein deps
> lein install

When you run lein install in the root of cov3, it installs the built jar into your local Maven repository, typically: ~/.m2/repository/

Then, in a new project.clj, just add it to the dependencies, like:

(defproject myproj "1.0.0"
  :dependencies [[cov3/cov3 "0.1.0"]]
  ...)

Documentation

> lein autodoc

This creates HTML files in autodoc/ containing the documentation extracted directly from the code.
