web5design / status-jquery-crawler

This project forked from riccardo-forina/status-jquery-crawler

Check for broken links in your website with jQuery

Home Page: http://www.codingnot.es/check-your-website-for-broken-links-with-jquery/

License: BSD 3-Clause "New" or "Revised" License

status-jquery-crawler

Version 0.2.0

(c) 2012 Riccardo Forina. Status.js may be freely distributed under the MIT license. For details and documentation, see http://www.codingnot.es

Status.js, a jQuery crawler

Demo

Demo: See it in action!

How does it work?

Status.js scans the website it is hosted on for links, starting from the root (/).
Internal links are followed (fetched through Ajax) and scanned in turn, so the crawl is recursive.
External links are recorded and used for cross-references. Status.js cannot check whether an external link is broken because of the cross-domain limitation of Ajax calls.
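
The crawl described above can be sketched roughly as follows. This is not the Status.js source: `isInternal`, `crawl`, and the injected `fetchLinks` callback are hypothetical names, the standard `URL` class stands in for whatever the project uses, and an explicit queue replaces recursion so the sketch stays short.

```javascript
// Hypothetical sketch: none of these names come from Status.js itself.

// Decide whether a link stays on the crawled site (same origin) or leaves it.
function isInternal(href, origin) {
  return new URL(href, origin).origin === new URL(origin).origin;
}

// Crawl from the root. `fetchLinks(url)` is an injected function standing in
// for the Ajax request plus link extraction; it returns the hrefs on the page.
function crawl(origin, fetchLinks) {
  const root = new URL("/", origin).href;
  const queue = [root];
  const seen = new Set(queue);
  const internal = [];
  const external = new Set();

  while (queue.length > 0) {
    const pageUrl = queue.shift();
    internal.push(pageUrl);
    for (const href of fetchLinks(pageUrl)) {
      const absolute = new URL(href, pageUrl).href;
      if (!isInternal(absolute, origin)) {
        external.add(absolute); // recorded, never fetched
      } else if (!seen.has(absolute)) {
        seen.add(absolute);
        queue.push(absolute); // followed and scanned in turn
      }
    }
  }
  return { internal, external: [...external] };
}
```

Passing the fetch step in as a function keeps the traversal logic testable without a browser; in a page, that callback would wrap the Ajax request.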

A table of per-page data (URL, title, description, status, and link counts) is populated in real time while Status.js is working.

Sometimes a graphical view is better, so I integrated the JavaScript InfoVis Toolkit (Jit) into Status.js to plot the website as a graph you can interact with.

Last but not least, there is a sitemap.xml generator that reuses the crawler's results. Nothing fancy, but it can be useful if you can't generate a sitemap in a more proper way.
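
As an illustration of how crawl results could be turned into a sitemap, here is a minimal sketch. The function names `escapeXml` and `buildSitemap` are assumptions, not part of Status.js, and only the mandatory `<loc>` element of the sitemap protocol is emitted.

```javascript
// Escape the characters that must be entity-encoded inside XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal sitemap.xml document from a list of crawled page URLs.
function buildSitemap(urls) {
  const entries = urls.map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Only pages that were fetched successfully would normally be passed in; broken and external links do not belong in a sitemap.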

## How is it done?

Status.js is a Backbone application.
jQuery handles Ajax and DOM manipulation.
URL manipulation is powered by jsUri, plotting is done with the JavaScript InfoVis Toolkit (Jit), and the GUI is built with Twitter Bootstrap.

What can I check with Status.js?

Url

The URL of the page. To avoid duplicates, the hash (fragment) is removed.
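
A minimal sketch of that normalization, assuming the standard `URL` class rather than jsUri (which the project actually uses); `stripHash` is a hypothetical helper name.

```javascript
// Return the URL without its fragment, so "/page#top" and "/page"
// collapse into a single entry.
function stripHash(url) {
  const parsed = new URL(url);
  parsed.hash = "";
  return parsed.href;
}

// Deduplicate a list of links after normalization.
function uniquePages(urls) {
  return [...new Set(urls.map(stripHash))];
}
```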

Title

Available for internal pages only. The content of the title tag is fetched; if it is not present, you'll get a {No title} placeholder.

Description

Available for internal pages only. The content of the meta name="description" tag is fetched; if it is not present, you'll get a {No description} placeholder.
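
The title and description lookup with its placeholders might look like the sketch below. It is an assumption-laden illustration: `extractMeta` is a hypothetical name, regular expressions are used only to keep the example dependency-free (in the browser one would query the parsed document, for example with jQuery), and an empty tag is treated the same as a missing one.

```javascript
const NO_TITLE = "{No title}";
const NO_DESCRIPTION = "{No description}";

// Pull the <title> text and the meta description out of raw HTML.
// Regex matching is a simplification and will miss unusual markup.
function extractMeta(html) {
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const metaTag = /<meta\b[^>]*\bname\s*=\s*["']description["'][^>]*>/i.exec(html);
  const contentMatch = metaTag
    ? /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(metaTag[0])
    : null;

  const title = titleMatch ? titleMatch[1].trim() : "";
  const description = contentMatch
    ? (contentMatch[1] ?? contentMatch[2]).trim()
    : "";

  // Fall back to the placeholders when a value is missing or empty.
  return {
    title: title || NO_TITLE,
    description: description || NO_DESCRIPTION,
  };
}
```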

Status

Because of the limitations of JavaScript running in a browser, only these statuses can be handled:

  • Success: available for internal pages only; the page was fetched correctly.
  • External: indicates an external link.
  • Redirect: indicates that there is another page for the same URL but with a trailing slash. It's a hack to work around the browser not returning any 30x HTTP code.
  • Error: _Broken link!_
  • Unfetched: the page has been recorded but is waiting to be crawled.
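
One way to read these rules is as a small decision function. The sketch below is a guess at the logic, not the project's code: the `page` record shape, the `knownUrls` set, and in particular the way the trailing-slash twin is detected are all assumptions made for illustration.

```javascript
// Status labels taken from the list above.
const Status = Object.freeze({
  SUCCESS: "Success",
  EXTERNAL: "External",
  REDIRECT: "Redirect",
  ERROR: "Error",
  UNFETCHED: "Unfetched",
});

// `page` is a hypothetical record: { url, internal, fetched, ok }.
// `knownUrls` is a Set of every URL recorded so far.
function statusOf(page, knownUrls) {
  if (!page.internal) return Status.EXTERNAL; // never fetched (cross-domain)
  if (!page.fetched) return Status.UNFETCHED; // queued, not crawled yet
  if (!page.ok) return Status.ERROR; // the Ajax request failed
  // Assumed heuristic: the same URL with a trailing slash is also known.
  if (!page.url.endsWith("/") && knownUrls.has(page.url + "/")) {
    return Status.REDIRECT;
  }
  return Status.SUCCESS;
}
```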

### Out links

The number of internal and external links present in the page. Click the number to get the full list.

In links

The number of pages that link to the URL. Click the number to get the full list.
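
In links can be derived by inverting the out-link data collected during the crawl. A minimal sketch, assuming a plain object that maps each page URL to the list of URLs it links to (`buildInLinks` and that data shape are hypothetical):

```javascript
// Invert an out-link map into an in-link map.
// `outLinks` looks like { "/": ["/a", "/b"], "/a": ["/b"] }.
function buildInLinks(outLinks) {
  const inLinks = new Map();
  for (const [source, targets] of Object.entries(outLinks)) {
    // A Set counts each linking page once, even if it repeats the link.
    for (const target of new Set(targets)) {
      if (!inLinks.has(target)) inLinks.set(target, []);
      inLinks.get(target).push(source);
    }
  }
  return inLinks;
}
```

The in-link count for a URL is then the length of its array, and the array itself is the full list shown when the number is clicked.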

TODO list

This is a list of some of the things I'll have to work on. Please feel free to contribute with suggestions!

  • Warnings about duplicate/too long/missing titles/descriptions.
  • Verify the presence of Google Analytics.
  • Check for broken images.
  • Warnings about missing/bad alt attributes for images.
  • Pagination
  • Performance tests
  • Let's be honest... do tests!
  • Code cleaning, comments, etc.
