GithubHelp home page GithubHelp logo

go-web-crawler's Introduction

Dean's Cool Web Crawler

A CLI tool for grabbing anchors from webpages.

Installation

go run cmd/main.go -uri https://www.google.com

Syntax

This program takes one argument: a fuly-qualified URL for the webpage you wish to scrape.

go-crawler -uri <URL>

This will then output a file to your current working directory.

Anchors printed to <wd>/anchors-<timestamp>.txt! Thank you for using my cool tool!

You cna add the flag -outputtoconsole if you would prefer to have the URLs dumped into your current terminal session:

go-crawler -uri <URL> -outputtoconsole

Tests & Benchmarks

Benchmark specification:

goos: linux
goarch: amd64
pkg: github.com/deanfoley/go-web-crawler/internal
cpu: Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz

Benchmark command: go test --bench=. -benchmem -benchtime=10s -count=5 -run=^#

NOTE: this actually didn't work properly on some of the functions and resulted in horrible race conditions. Benchmarking them individually (such as with VSCode's Go extension) works fine

/internal

PageGrabber

Test Average Cycles Average ns/op Bytes/op Allocs/op
GrabWebpage 58,840 204,057 91,433 77

PageParser

Test Average Cycles Average ns/op Bytes/op Allocs/op
ExtractAnchors 134,922 12,168 6,552 25
FormatAnchors 626,671 2718 593 3

UrlParser

Test Average Cycles Average ns/op Bytes/op Allocs/op
ValidURL 725,694 1,834 144 1
InvaldURL 1,000,000 1,128 208 3

pprof

This project supports pprof!

Pass in a -cpuprofile and/or -memprofile flag with a desired output to output a prof file for either.

go run main.go -uri https://www.vortex.com -cpuprofile cpu.prof -memprofile mem.prof

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.