waybackurls's Introduction

waybackurls

Accept line-delimited domains on stdin, fetch known URLs from the Wayback Machine for *.domain and output them on stdout.

Usage example:

▶ cat domains.txt | waybackurls > urls

Install:

▶ go install github.com/tomnomnom/waybackurls@latest

Credit

This tool was inspired by @mhmdiaa's waybackurls.py script. Thanks to them for the great idea!

waybackurls's People

Contributors

tomnomnom

waybackurls's Issues

Bash waybackurl command not found

I tried installing the tool using go get github.com/tomnomnom/waybackurls and it seemed to install with no errors, but when I tried to use the tool with the command cat domains.txt | waybackurls > urls I got this:

naveenj@Saturn:|08:25 AM|~$ cat domains.txt | waybackurls > urls
-bash: waybackurls: command not found

[issue] Urls appended together

Some URLs are concatenated with what seem to be URLs from different domains.

Test:

 > echo "https://dominoweb.draco.res.ibm.com"|waybackurls |grep TCG

Notice .TCG.htmlhttp:/msdn.microsoft.com/en-us/library/ms171339.aspxhttp:/www.opensymphony

NO output

The tool is giving blank output. How can I fix it?

feature request

Very useful! Is there a way to grab dates from the links? It would be handy to know which pages are quite recent as opposed to ones that disappeared years ago.
Thanks

Add archive.is

Hi @tomnomnom

I've checked, and

https://github.com/qvint/archive.is

is not working anymore. Can you please look into it and add archive.is support to this tool?

Regards,

waybackurls not working

I am trying to work with waybackurls but I am not able to execute the command.
It says command not found. I have tried everything I could find on the net but it is still not working. Can someone help me with this?

Update readme to tell users how to add waybackurls to the PATH variable

Currently, the readme does not mention how to add waybackurls to the PATH variable, so people may have difficulty running waybackurls from an arbitrary directory. We just need to update the readme to describe setting the PATH variable in /etc/profile.

One just needs to add export PATH=$PATH:$HOME/go/bin to /etc/profile.

I would like to add this in the readme. Please allow me to do so.
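
A minimal sketch of the PATH setup discussed here, under some assumptions: that binaries installed by Go land in $GOBIN if it is set, and otherwise in the bin directory of the first GOPATH entry (defaulting to $HOME/go/bin). The helper name go_bin_dir is made up for illustration. Note that PATH entries are directories, so the entry to add is the Go bin directory, not the path to the waybackurls binary itself.

```shell
# Hypothetical helper: print the directory where Go places installed binaries.
# Assumption: GOBIN wins if set; otherwise use the first GOPATH entry's bin/,
# falling back to $HOME/go/bin when GOPATH is unset.
go_bin_dir() {
  if [ -n "${GOBIN:-}" ]; then
    printf '%s\n' "$GOBIN"
  else
    gopath=${GOPATH:-$HOME/go}
    # GOPATH may be a colon-separated list; keep only the first entry.
    printf '%s/bin\n' "${gopath%%:*}"
  fi
}

# Append the directory (not the waybackurls binary itself) to PATH.
export PATH="$PATH:$(go_bin_dir)"
```

To make this persistent, the same export line could go in ~/.profile or /etc/profile. After opening a new shell, command -v waybackurls should print the binary's location if the install succeeded.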

Output cannot be placed in a file

I want to run waybackurls for a list of domains and I need the output placed in a text file. I tried using these commands:
cat list.txt | wayback >> results.txt
cat list.txt | wayback > results.txt
Unfortunately, it doesn't work and the help of the wayback command is written in the results.txt file instead of the expected output. It works ok if I run just:
cat list.txt | wayback
Once I try to push the output in a file it's not working anymore. Any idea on how to fix this?
Thank you.

Suggestion

Add options to filter results by status code, mime-type and file extensions. It can allow for better results for specific targets. I've done something on this myself but in Perl.
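
Until such options exist, part of this can be approximated by post-processing the output. The sketch below only covers file extensions, since status codes and MIME types cannot be recovered from a plain list of URLs. The function name filter_by_ext is an assumption for illustration and is not part of waybackurls.

```shell
# Hypothetical helper: keep only URLs whose path ends in the given extension.
# Reads URLs on stdin; $1 is the extension without the dot (for example: js).
# A match requires the extension to be followed by "?", "#", or end of line,
# so ".js" does not also match ".json".
filter_by_ext() {
  grep -E "\.$1(\?|#|$)"
}

# Example (assumes waybackurls is installed and on PATH):
#   cat domains.txt | waybackurls | filter_by_ext js > js-urls
```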

not working

I am not able to use waybackurls for recon. When I try to use it, it just gives blank lines as output. I have tried reinstalling it, but the issue is still not resolved.
Please refer to this link for a screenshot of the issue: https://ufile.io/xri8ne9q

No output

It was working fine, but I'm not getting any output today. The output is blank. Any help would be appreciated. Thanks!

Multiple go install errors

Hi folks,
I am desperately trying to install waybackurls on my Ubuntu machine, but I don't understand what is wrong.
Here are the error messages. If you have any tips, thank you for sharing.
Thanks for your time.

nocomp@RFB0x:~/tools$ export GOROOT=/usr/local/go
nocomp@RFB0x:~/tools$ export GOPATH=$HOME/go
nocomp@RFB0x:~/tools$ export PATH=$PATH:$GOROOT/bin
nocomp@RFB0x:~/tools$ go get github.com/tomnomnom/waybackurls
package bufio: unrecognized import path "bufio" (import path does not begin with hostname)
package encoding/json: unrecognized import path "encoding/json" (import path does not begin with hostname)
package flag: unrecognized import path "flag" (import path does not begin with hostname)
package fmt: unrecognized import path "fmt" (import path does not begin with hostname)
package io/ioutil: unrecognized import path "io/ioutil" (import path does not begin with hostname)
package net/http: unrecognized import path "net/http" (import path does not begin with hostname)
package net/url: unrecognized import path "net/url" (import path does not begin with hostname)
package os: unrecognized import path "os" (import path does not begin with hostname)
package strings: unrecognized import path "strings" (import path does not begin with hostname)
package sync: unrecognized import path "sync" (import path does not begin with hostname)
package time: unrecognized import path "time" (import path does not begin with hostname)
nocomp@RFB0x:~/tools$

Suggestion

I think the resulting URLs (the output) should be piped into a resolver that checks which links return 200, 302, 400, 401, or 500 response codes, along with the content length. It would save the time of checking each link individually.

bash: waybackurls: command not found

Hello,
When I installed this tool using the command go get github.com/tomnomnom/waybackurls, it installed successfully, but running it showed the following error:
bash: waybackurls: command not found
I was able to run the tool when I installed it with git clone, but the problem is that I have to be in the directory where the tool is saved in order to run it, or I have to provide the full path. Am I missing something here? Is there something I can do to run the tool just by typing waybackurls in any directory?

fatal error: runtime: cannot allocate memory

While running
cat domains.txt | waybackurls
it runs for a few seconds, throws the error "fatal error: runtime: cannot allocate memory", and then exits with the following Go errors:

goroutine 5 [running]:
runtime.throw(0x6eb0bc, 0x1f)
/usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc0003e63f8 sp=0xc0003e63c8 pc=0x42d202
runtime.newArenaMayUnlock(0x919e80)
/usr/local/go/src/runtime/mheap.go:1930 +0xda fp=0xc0003e6430 sp=0xc0003e63f8 pc=0x427eda
runtime.newMarkBits(0x200, 0x100)
/usr/local/go/src/runtime/mheap.go:1850 +0xc3 fp=0xc0003e6478 sp=0xc0003e6430 pc=0x427ac3
runtime.heapBits.initSpan(0x7f199c0a9600, 0x20300000000000, 0x7f199c103fff, 0x7f199e77dc10)
/usr/local/go/src/runtime/mbitmap.go:792 +0x75 fp=0xc0003e64f8 sp=0xc0003e6478 pc=0x4164a5
runtime.(*mcentral).grow(0x9205a0, 0x0)
/usr/local/go/src/runtime/mcentral.go:264 +0x12e fp=0xc0003e6540 sp=0xc0003e64f8 pc=0x418b5e
runtime.(*mcentral).cacheSpan(0x9205a0, 0xbf)
/usr/local/go/src/runtime/mcentral.go:106 +0x2ff fp=0xc0003e65a0 sp=0xc0003e6540 pc=0x4185cf
runtime.(*mcache).refill(0x7f199e8c4008, 0x205)
/usr/local/go/src/runtime/mcache.go:135 +0x86 fp=0xc0003e65c0 sp=0xc0003e65a0 pc=0x418066
runtime.(*mcache).nextFree(0x7f199e8c4008, 0x5, 0xc0003e6638, 0x47dbcd, 0x6778e0)
/usr/local/go/src/runtime/malloc.go:786 +0x88 fp=0xc0003e65f8 sp=0xc0003e65c0 pc=0x40ca08
runtime.mallocgc(0x9, 0x0, 0x100, 0x18)
/usr/local/go/src/runtime/malloc.go:915 +0x589 fp=0xc0003e6698 sp=0xc0003e65f8 pc=0x40d139
runtime.slicebytetostring(0x0, 0xc0091dbf70, 0x9, 0x2e23e90, 0x0, 0x0)
/usr/local/go/src/runtime/string.go:102 +0xa1 fp=0xc0003e66c8 sp=0xc0003e6698 pc=0x446291
encoding/json.(*decodeState).literalStore(0xc0000781e0, 0xc0091dbf6f, 0xb, 0x2e23e91, 0x67f320, 0xc0034a82f0, 0x198, 0xc0003e6a00, 0x487fd2, 0x197)
/usr/local/go/src/encoding/json/decode.go:947 +0x236d fp=0xc0003e6a28 sp=0xc0003e66c8 pc=0x4ce71d
encoding/json.(*decodeState).value(0xc0000781e0, 0x67f320, 0xc0034a82f0, 0x198, 0x67f320, 0xc0034a82f0)
/usr/local/go/src/encoding/json/decode.go:395 +0x1ef fp=0xc0003e6a90 sp=0xc0003e6a28 pc=0x4c870f
encoding/json.(*decodeState).array(0xc0000781e0, 0x67bd20, 0xc002cf3f78, 0x197, 0xc000078208, 0x5b)
/usr/local/go/src/encoding/json/decode.go:560 +0x1a9 fp=0xc0003e6b78 sp=0xc0003e6a90 pc=0x4c9179
encoding/json.(*decodeState).value(0xc0000781e0, 0x67bd20, 0xc002cf3f78, 0x197, 0x67bd20, 0xc002cf3f78)
/usr/local/go/src/encoding/json/decode.go:371 +0xff fp=0xc0003e6be0 sp=0xc0003e6b78 pc=0x4c861f
encoding/json.(*decodeState).array(0xc0000781e0, 0x6705a0, 0xc00000e3e0, 0x16, 0xc000078208, 0x5b)
/usr/local/go/src/encoding/json/decode.go:560 +0x1a9 fp=0xc0003e6cc8 sp=0xc0003e6be0 pc=0x4c9179
encoding/json.(*decodeState).value(0xc0000781e0, 0x6705a0, 0xc00000e3e0, 0x16, 0xc0003e6d78, 0x4d8ce2)
/usr/local/go/src/encoding/json/decode.go:371 +0xff fp=0xc0003e6d30 sp=0xc0003e6cc8 pc=0x4c861f
encoding/json.(*decodeState).unmarshal(0xc0000781e0, 0x6705a0, 0xc00000e3e0, 0xc000078208, 0x0)
/usr/local/go/src/encoding/json/decode.go:179 +0x209 fp=0xc0003e6db8 sp=0xc0003e6d30 pc=0x4c7da9
encoding/json.Unmarshal(0xc008000000, 0x28144b8, 0x3fffe00, 0x6705a0, 0xc00000e3e0, 0x3fffe00, 0x0)
/usr/local/go/src/encoding/json/decode.go:106 +0x123 fp=0xc0003e6e00 sp=0xc0003e6db8 pc=0x4c7783
main.getWaybackURLs(0xc000016280, 0x19, 0xc000018300, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/rootx/go/src/github.com/tomnomnom/waybackurls/main.go:127 +0x24f fp=0xc0003e6f10 sp=0xc0003e6e00 pc=0x651d4f
main.main.func1(0xc000018310, 0x6f7598, 0xc000012bf0, 0xc000018309, 0xc000060120)
/home/rootx/go/src/github.com/tomnomnom/waybackurls/main.go:57 +0x90 fp=0xc0003e6fb8 sp=0xc0003e6f10 pc=0x6526f0
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0003e6fc0 sp=0xc0003e6fb8 pc=0x458981
created by main.main
/home/rootx/go/src/github.com/tomnomnom/waybackurls/main.go:55 +0x4c4
goroutine 1 [chan receive]:
main.main()
/home/rootx/go/src/github.com/tomnomnom/waybackurls/main.go:76 +0x5b2
goroutine 7 [semacquire]:
sync.runtime_Semacquire(0xc000018318)
/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc000018310)
/usr/local/go/src/sync/waitgroup.go:130 +0x65
main.main.func2(0xc000018310, 0xc000060120)
/home/rootx/go/src/github.com/tomnomnom/waybackurls/main.go:71 +0x2b
created by main.main
/home/rootx/go/src/github.com/tomnomnom/waybackurls/main.go:70 +0x50c
goroutine 19 [IO wait]:
internal/poll.runtime_pollWait(0x7f199e878f08, 0x72, 0xffffffffffffffff)
/usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc0000ca798, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000ca780, 0xc0000c9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:169 +0x19b
net.(*netFD).Read(0xc0000ca780, 0xc0000c9000, 0x1000, 0x1000, 0x1, 0x42f01c, 0xc000047b88)
/usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0000100d0, 0xc0000c9000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:177 +0x69
net/http.(*persistConn).Read(0xc0000806c0, 0xc0000c9000, 0x1000, 0x1000, 0xc000047c88, 0x406845, 0xc000060a20)
/usr/local/go/src/net/http/transport.go:1524 +0x7b
bufio.(*Reader).fill(0xc000057020)
/usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc000057020, 0x1, 0x0, 0x0, 0x1, 0xc000060900, 0x0)
/usr/local/go/src/bufio/bufio.go:138 +0x4f
net/http.(*persistConn).readLoop(0xc0000806c0)
/usr/local/go/src/net/http/transport.go:1677 +0x1a3
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1357 +0xae8
goroutine 22 [IO wait]:
internal/poll.runtime_pollWait(0x7f199e878e38, 0x72, 0xffffffffffffffff)
/usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc0000ca918, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000ca900, 0xc000115000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:169 +0x19b
net.(*netFD).Read(0xc0000ca900, 0xc000115000, 0x1000, 0x1000, 0x1, 0x42f01c, 0xc000049b88)
/usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0000100d8, 0xc000115000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:177 +0x69
net/http.(*persistConn).Read(0xc0000807e0, 0xc000115000, 0x1000, 0x1000, 0xc000049c88, 0x406845, 0xc000060c00)
/usr/local/go/src/net/http/transport.go:1524 +0x7b
bufio.(*Reader).fill(0xc0000570e0)
/usr/local/go/src/bufio/bufio.go:100 +0x10f
bufio.(*Reader).Peek(0xc0000570e0, 0x1, 0x0, 0x0, 0x1, 0xc000060b00, 0x0)
/usr/local/go/src/bufio/bufio.go:138 +0x4f
net/http.(*persistConn).readLoop(0xc0000807e0)
/usr/local/go/src/net/http/transport.go:1677 +0x1a3
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1357 +0xae8
goroutine 20 [select]:
net/http.(*persistConn).writeLoop(0xc0000806c0)
/usr/local/go/src/net/http/transport.go:1976 +0x113
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1358 +0xb0d
goroutine 23 [select]:
net/http.(*persistConn).writeLoop(0xc0000807e0)
/usr/local/go/src/net/http/transport.go:1976 +0x113
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1358 +0xb0d

Not downloading everything ..

About 2 weeks ago I downloaded the URL list for a website and got 40K links. Now, when I downloaded the URL list for the same website again, I got only 10K links.

Has something changed?

Licence?

Could you please add the license?

Trouble building waybackurls

Steps to reproduce

  • clone the waybackurls git repo
  • change into the waybackurls directory
  • run go build

Error I'm facing right now

[Screenshot: Capture]

go get command not working

Hi, go get is not working, so I changed the command to go install github.com/tomnomnom/waybackurls@latest, but that is also not working for me.

Fix CommonCrawl URLs

The current CommonCrawl fetch url is this:

http://index.commoncrawl.org/CC-MAIN-2018-22-index?url=*.%s&output=json

I would suggest that it should be this:

http://index.commoncrawl.org/CC-MAIN-2018-22-index?url=*.%s/*&output=json

See the difference in results in the following:
http://index.commoncrawl.org/CC-MAIN-2018-22-index?url=blog.innerht.ml/*&output=json
as opposed to how you currently have it:
http://index.commoncrawl.org/CC-MAIN-2018-22-index?url=blog.innerht.ml&output=json

Thanks,
Justin
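
As an illustration of the proposed template, here is a small sketch that fills the %s placeholder with a domain, as the format string above implies. The function name commoncrawl_url is hypothetical, and the index name CC-MAIN-2018-22 is simply the one quoted in this issue.

```shell
# Hypothetical helper: build the suggested CommonCrawl index URL for a domain.
# The trailing "/*" is the change proposed in this issue; "*." keeps the
# subdomain wildcard from the current template.
commoncrawl_url() {
  printf 'http://index.commoncrawl.org/CC-MAIN-2018-22-index?url=*.%s/*&output=json\n' "$1"
}

commoncrawl_url blog.innerht.ml
```

When opening such a URL with a command-line client, quote it so the shell does not expand the * characters or treat & as a control operator.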

Feature request: set a flag to http or https only to remove duplicates

Hey Tom, I've been playing around with waybackurls for a few days. Love it, thanks for writing a cool tool.

I'd like to preface this by saying It's probably just easier to use sort and uniq straight from the start to remove duplicates - or even run sed to filter out anything before '://' to get the domain only, then sort duplicates.

Anyway, here's the crux of the problem and the suggestion. When inputting a domain list through cat ($ cat domains | waybackurls > urls), you can easily duplicate the work waybackurls does, as it seems the Wayback Machine itself indexes pages on both http and https regardless of which is specified.

Running waybackurls with a domain list of:

example.com
http://example.com
https://example.com

Will return the exact same results for each, both with http and https archives. The http(s) archives aren't the issue, that's expected. It's more the fact that it takes twice as long to finish if full links haven't been sorted and filtered out of the input file.

While your example on the readme.md uses a domain name without http(s), it can be used with http(s) prefixed. I think a good solution would be an optional bool flag that filters one or the other.

Thanks for your time 👍
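
The sort/sed workaround mentioned in this issue could look roughly like the sketch below. It is only one way to do it: the function name normalize_domains is made up for illustration, and dropping everything after the first slash is an extra assumption so that entries such as https://example.com/ also collapse to example.com.

```shell
# Hypothetical helper: reduce an input list to unique bare hostnames.
# Reads one entry per line on stdin.
normalize_domains() {
  # 1) strip a leading http:// or https://
  # 2) strip any path after the hostname
  # 3) sort and drop duplicates
  sed -E 's#^https?://##; s#/.*$##' | sort -u
}

# Example (assumes waybackurls is installed and on PATH):
#   normalize_domains < domains | waybackurls > urls
```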

[Feature Request] Switch to enable/disable urls from subdomains of base domain

Hey Tom!

Great tool! I was wondering if it would be possible for you to add a command-line arg to select URLs only from the base domain provided. The new WayBack Machine query would look like this:

http://web.archive.org/cdx/search/cdx?url=%s/*&output=json&collapse=urlkey

as opposed to the one that pulls all subdomains too (currently in your code) which is this:

http://web.archive.org/cdx/search/cdx?url=*.%s/*&output=json&collapse=urlkey

I propose that this param should be called "with_subs" and should be Boolean defaulting to True.

Thanks,
Justin
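
The difference between the two queries can be sketched as follows. This is not how the tool implements anything; wayback_cdx_url and its second argument are hypothetical names that mirror the proposed "with_subs" parameter, defaulting to true.

```shell
# Hypothetical helper: build the Wayback Machine CDX query for a domain.
# $1: domain; $2: "true" (default) to include subdomains, anything else to
# restrict results to the base domain only.
wayback_cdx_url() {
  domain=$1
  with_subs=${2:-true}
  if [ "$with_subs" = "true" ]; then
    prefix='*.'
  else
    prefix=''
  fi
  printf 'http://web.archive.org/cdx/search/cdx?url=%s%s/*&output=json&collapse=urlkey\n' "$prefix" "$domain"
}

wayback_cdx_url example.com        # includes subdomains
wayback_cdx_url example.com false  # base domain only
```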

waybackurls.exe

How do I use waybackurls.exe after downloading it? -h doesn't show any instructions.

Flag provided but not defined

Hi,
If we are using Go 1.13, the order of test initialization has changed.
If we try to use -dates, -get-versions or -h, we get this error: "flag provided but not defined: -XFlag".
-h still works but doesn't show anything.

Urls can't fetch

Hey Tom, first of all, I appreciate your waybackurls tool. It's awesome. I have used it many times and it was working well, but for the past few days it has not been fetching URLs. When I run the following command, it fetches a few URLs, then waits without showing any more.

cat uniq.txt | waybackurls | tee waybackurls.txt

Please update the tool for future reconnaissance.

Thanks in advance,
Sajjad Hosen

Ubuntu - Go Get Error / Go Build Same Error - Can not Build/Execute /main.go:191: u.Hostname undefined (type *url.URL has no field or method Hostname)

go get -v github.com/tomnomnom/waybackurls
github.com/tomnomnom/waybackurls
# github.com/tomnomnom/waybackurls
./main.go:191: u.Hostname undefined (type *url.URL has no field or method Hostname)
/go/src/github.com/tomnomnom/waybackurls# go build
# github.com/tomnomnom/waybackurls
./main.go:191: u.Hostname undefined (type *url.URL has no field or method Hostname)
