linkgrabber's Introduction

Link Grabber

Link Grabber provides a quick and easy way to grab links from a single web page. This Python package is a simple wrapper around BeautifulSoup that focuses on HTML's hyperlink tag, <a>.

PyPI

GitHub

Dependencies:

BeautifulSoup

How-To

Install from a source checkout:

$ python setup.py install

Or install from PyPI:

$ pip install linkGrabber

Quickie

import re
import linkGrabber

seek = linkGrabber.Links("http://www.google.com")
seek.find()
# limit the number of "a" tags to 5
seek.find(limit=5)
# filter on the "a" tag's href attribute (dots escaped so they match literally)
seek.find({"href": re.compile(r"plus\.google\.com")})

Documentation

find

Parameters:

filters (dict): Beautiful Soup filters, given as a dictionary of attribute names and the values or patterns to match
limit (int): Maximum number of links to return, taken in order
reverse (bool): Reverses the order of the list of <a> tags
sort (function): Key function that takes a link and returns the value to sort on

Find up to five links whose style attribute contains "11px":

import re
from linkGrabber import Links

seek = Links("http://www.google.com")
seek.find({"style": re.compile("11px")}, limit=5)

Reverse the sort before limiting links:

from linkGrabber import Links

seek = Links("http://www.google.com")
seek.find(limit=2, reverse=True)

Sort by a link property:

from linkGrabber import Links

seek = Links("http://www.google.com")
seek.find(limit=3, sort=lambda link: link.text)
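
The four find parameters documented above can be tried without network access. The following is a minimal sketch, not linkGrabber's implementation: it swaps BeautifulSoup for the standard library's html.parser, and the names AnchorCollector and find_links, the sample HTML, and the order in which sort, reverse, and limit are applied are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect attributes and text of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the <a> tag currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_data(self, data):
        # Only text that appears inside an open <a> tag is recorded.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def find_links(html, filters=None, limit=None, reverse=False, sort=None):
    """Mimic the documented parameters on an in-memory HTML string."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    links = parser.links
    # Keep links whose attribute matches each compiled pattern in `filters`.
    for name, pattern in (filters or {}).items():
        links = [link for link in links
                 if pattern.search(link["attrs"].get(name) or "")]
    # Assumed order of operations: sort, then reverse, then limit.
    if sort is not None:
        links = sorted(links, key=sort)
    if reverse:
        links = links[::-1]
    return links if limit is None else links[:limit]


HTML = """
<a href="https://plus.google.com/me" style="font-size:11px">Plus</a>
<a href="https://example.com/about">About</a>
<a href="https://example.com/blog" style="font-size:11px">Blog</a>
"""

small = find_links(HTML, {"style": re.compile("11px")})
last_two = find_links(HTML, limit=2, reverse=True)
by_text = find_links(HTML, sort=lambda link: link["text"])
```

Here small holds the two links styled with "11px", last_two holds the final two links in reverse document order, and by_text holds all three links ordered by their text. Unlike the library examples, this sketch stores each link as a plain dictionary, so the sort key uses link["text"] rather than link.text.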

Link Dictionary

Each link currently exposes only three properties:

text (the text between the <a> and </a> tags)
href (the href attribute, i.e. the hyperlink)
seo (the text after the last "/" in the URL, made human readable)
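
To make the seo property more concrete, here is one way such a value could be derived. The source only states that the text after the last "/" is made human readable, so the function name readable_slug and the specific cleanup rules (dropping a file extension, turning hyphens, underscores, and plus signs into spaces) are assumptions, not the package's actual behaviour.

```python
import re
from urllib.parse import unquote, urlparse


def readable_slug(url):
    """Hypothetical helper: last path segment of `url` as spaced text."""
    # Ignore a trailing slash so ".../post/" and ".../post" behave alike.
    path = urlparse(url).path.rstrip("/")
    segment = unquote(path.rsplit("/", 1)[-1])
    # Drop a trailing file extension such as ".html" (assumed rule).
    segment = re.sub(r"\.[A-Za-z0-9]+$", "", segment)
    # Treat hyphens, underscores, and plus signs as word separators.
    return re.sub(r"[-_+]+", " ", segment).strip()


print(readable_slug("http://example.com/blog/my-first-post.html"))
# prints "my first post"
```

A URL with no path, such as "http://example.com/", yields an empty string under these rules.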
