laramies / metagoofil
Metadata harvester
License: GNU General Public License v2.0
What steps will reproduce the problem?
1. Execute metagoofil with any target
2. Open the generated HTML file
3. Inspect the code (Firefox even highlights the errors)
What is the expected output? What do you see instead?
The HTML output can be opened with a browser without any visible errors or
warnings, but if the HTML code is parsed, you realise that it is not
well-formed. For example, it starts directly with a <title> tag without an
enclosing root element such as <html>, there are two </head> closing tags one
after the other, and several extra </pre> tags.
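A strict XML parser is one quick way to confirm the malformation described above: well-formed markup parses, while a fragment with doubled closing tags fails. The sample strings below are modeled on the report, not copied from the actual generated file.

```python
# Check well-formedness by attempting a strict parse. HTML is more lenient
# than XML, so this is a demonstration of the principle, not a full validator.
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Modeled on the report: no enclosing root, doubled </head>, stray </pre>.
broken = '<title>results</title></head></head><pre>x</pre></pre>'
fixed = '<html><head><title>results</title></head><body><pre>x</pre></body></html>'
```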
What version of the product are you using? On what operating system?
I've checked out a read-only copy with svn, working on Ubuntu, but I found
the same problem with the version shipped with Kali Linux.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 24 Apr 2014 at 11:32
What steps will reproduce the problem?
1. Download metagoofil, untar it, chmod +x, and run the script on Xubuntu 12
What is the expected output? What do you see instead?
from: can't read /var/mail/discovery
from: can't read /var/mail/extractors
./metagoofil.py: line 3: import: command not found
./metagoofil.py: line 4: import: command not found
./metagoofil.py: line 5: import: command not found
./metagoofil.py: line 6: import: command not found
./metagoofil.py: line 7: import: command not found
./metagoofil.py: line 8: import: command not found
./metagoofil.py: line 9: import: command not found
./metagoofil.py: line 10: import: command not found
./metagoofil.py: line 12: syntax error near unexpected token `"ignore"'
./metagoofil.py: line 12: `warnings.filterwarnings("ignore") # To prevent
errors from hachoir deprecated functions, need to fix.'
What version of the product are you using? On what operating system?
Xubuntu 12.04, 3.2.0-36-generic kernel
metagoofil-2.1 from the BH2011_Arsenal.tar
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 25 Jan 2013 at 8:48
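The "from: can't read /var/mail/..." and "import: command not found" lines mean the script was executed by sh rather than Python: sh resolves `from` to the mail(1) utility and has no `import` builtin. A throwaway demonstration (python3 is used here purely for the demo; metagoofil itself targets Python 2):

```shell
# A Python file with no shebang, executed by sh, produces exactly these errors.
cat > /tmp/demo.py <<'EOF'
from os import path
import sys
print("ok")
EOF
sh /tmp/demo.py 2>&1 || true        # reproduces the 'from:' / 'import:' errors
python3 /tmp/demo.py                # fix 1: invoke the interpreter explicitly
# Fix 2: add a shebang so the kernel picks the interpreter after chmod +x.
{ echo '#!/usr/bin/env python3'; cat /tmp/demo.py; } > /tmp/demo2.py
chmod +x /tmp/demo2.py
/tmp/demo2.py
```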
What steps will reproduce the problem?
1. Ran metagoofil.py
2. Got a module error
What is the expected output? What do you see instead?
Error on the module
What version of the product are you using? On what operating system?
2.2, Ubuntu 12.04
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 4 Apr 2013 at 11:17
What steps will reproduce the problem?
1. Default run of metagoofil, taken from the help screen
What is the expected output? What do you see instead?
expected: lots of results from apple.com
instead: 0 results, due to HTTP 302 from google
What version of the product are you using? On what operating system?
2.2
Please provide any additional information below.
I solved it by inserting the following code at line 28 of discovery/googlesearch.py:
if returncode == 302:
    h = httplib.HTTP(self.server)
    h.putrequest('GET', headers.getheader('Location')[20:])
    h.putheader('Host', self.hostname)
    h.putheader('User-agent', self.userAgent)
    h.endheaders()
    returncode, returnmsg, headers = h.getreply()
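A Python 3 restatement of the redirect handling above, reduced to a pure helper so it can be exercised without the network. The function name and the use of urllib.parse are my additions, not part of the reporter's patch; parsing the Location header avoids the fixed `[20:]` slice, which assumes the URL prefix is exactly 20 characters.

```python
from urllib.parse import urlparse

def redirect_target(status, headers):
    """On a 302, return the path (plus query) to re-request; otherwise None."""
    if status != 302:
        return None
    parsed = urlparse(headers.get('Location', ''))
    path = parsed.path or '/'
    if parsed.query:
        path += '?' + parsed.query
    return path
```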
Best regards, and many thanks to the author for this great piece of software!
Original issue reported on code.google.com by [email protected]
on 15 Apr 2015 at 10:18
Attachments:
What steps will reproduce the problem?
1. Run metagoofil.py as expected
2. Observe that the number of documents found is always 5 more than what was specified
What is the expected output? What do you see instead?
The number of matches reported is always 5 more than requested. This is due
to cruft at the bottom of the Google results page being matched by the
compiled regular expression for "<a href=".
What version of the product are you using? On what operating system?
metagoofil-read-only from SVN dated May 16, 2011 (revision 2)
Please provide any additional information below.
The erroneous matches show up in metagoofil.py's output, produced when
googlesearch.py invokes parser.py's fileurls call. Note the output below,
where "Searching 100 results..." is followed by "Results: 105 files found".
-------
[-] Searching for doc files, with a limit of 10
Searching 100 results...
Results: 105 files found
Starting to download 10 of them..
From inspection, the 5 extra matches are:
'/'
'/intl/en/ads/'
'/services/'
'/intl/en/privacy.html'
'/intl/en/about.html'
and are being matched from URLs in the footer of the google search page.
Patch attached to address the additional (spurious) pattern matches.
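The over-matching can be sketched as follows. The sample HTML is fabricated for illustration; one simple mitigation (not necessarily the approach of the attached patch) is to keep only absolute URLs, which drops Google's relative footer links.

```python
import re

# The naive pattern from parser.py matches every anchor, including Google's
# own footer links ('/', '/intl/en/ads/', '/services/', ...).
html = ('<a href="http://www.example.edu/a.doc">hit</a>'
        '<a href="/intl/en/ads/">Ads</a><a href="/services/">More</a>')
naive = re.findall(r'<a href="(.*?)"', html)
# Keeping only absolute URLs filters out the relative footer cruft.
filtered = [u for u in naive if u.startswith(('http://', 'https://'))]
```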
Original issue reported on code.google.com by [email protected]
on 16 Jun 2011 at 3:00
Attachments:
Hello
I just downloaded metagoofil and cloned this project's URL. Everything went fine, but when I tried to run metagoofil it gave me this error:
File "/usr/share/metagoofil/metagoofil.py", line 14
print "\n******************************************************"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("\n******************************************************")?
I don't know why I'm getting an error and can't find anyone else who has had this problem... What should I do about this?
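The traceback means a Python 2 script is being run by a Python 3 interpreter: `print` became a function in Python 3, so the old statement form is now a SyntaxError. A quick way to tell the two syntaxes apart is to try compiling the source:

```python
def compiles_under_this_python(src):
    # compile() raises SyntaxError for syntax the running interpreter rejects.
    try:
        compile(src, '<snippet>', 'exec')
        return True
    except SyntaxError:
        return False

py2_style = 'print "banner"'    # Python 2 print statement
py3_style = 'print("banner")'   # Python 3 print function
```

The practical fix is to run the tool with a Python 2 interpreter (e.g. `python2 metagoofil.py`) or use a fork that has been ported to Python 3.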
What steps will reproduce the problem?
1. Downloaded metagoofil
2. Untarred it, chmod +x, and ran the script on an updated/upgraded BackTrack 5 R3
What is the expected output? What do you see instead?
from: can't read /var/mail/extractors
Please provide any additional information below.
Updated/upgraded BackTrack 5 R3.
Original issue reported on code.google.com by [email protected]
on 12 Feb 2013 at 6:01
What steps will reproduce the problem?
1. Run the program as-is; there is no support for proxies
What is the expected output? What do you see instead?
n/a
What version of the product are you using? On what operating system?
svn read-only revision 2
Please provide any additional information below.
The patch below adds download proxy support. It's neither vetted nor robust,
but it illustrates the approach we're taking to add proxy support, with one
proxy for downloading files and a separate proxy for queries.
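The attached patch is not reproduced here; this is only a sketch of the idea using Python 3's urllib.request, with separate openers for downloads and search queries. The proxy URLs below are placeholders, not values from the patch.

```python
import urllib.request

def make_opener(proxy_url=None):
    # An empty mapping means "no proxy"; otherwise route http/https through it.
    handler = urllib.request.ProxyHandler(
        {'http': proxy_url, 'https': proxy_url} if proxy_url else {})
    return urllib.request.build_opener(handler)

# Separate openers allow a download proxy distinct from the query proxy.
download_opener = make_opener('http://127.0.0.1:8080')
query_opener = make_opener('http://127.0.0.1:3128')
```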
Original issue reported on code.google.com by [email protected]
on 21 Jun 2011 at 8:38
Attachments:
What steps will reproduce the problem?
1. Download metagoofil-read-only from svn
2. Run the code without defining values for "-l" or "-n"
What is the expected output? What do you see instead?
The expected output is for the default values to be used as defined in
metagoofil.py:
limit=100
filelimit=50
What version of the product are you using? On what operating system?
svn release metagoofil-read-only dated May 16, 2011 (revision 2)
Please provide any additional information below.
The shell output below shows what happens when these values are not provided
on the command line (the code exits at line 126 of metagoofil.py upon
attempting to convert filelimit into a string).
[** This shows a fresh svn checkout, followed by an attempt to run metagoofil
without filelimit and without limit specified on the command line. Result:
Failure to download **]
Checked out revision 2.
gray@fireball:~/metagoofil$ cd metagoofil-read-only/
gray@fireball:~/metagoofil/metagoofil-read-only$ python ./metagoofil.py -d
iastate.edu -f iastate-download.html -o iastate -t doc
*************************************
* Metagoofil Ver 2.0 - Reborn *
* Christian Martorella *
* Edge-Security.com *
* cmartorella_at_edge-security.com *
* BACKTRACK 5 Edition!! *
*************************************
['doc']
[-] Starting online search...
[** Program dies at this point -PG **]
[** This shows a fresh svn checkout, followed by an attempt to run metagoofil
without filelimit specified on the command line. Result: Failure to download
**]
gray@fireball:~/metagoofil/metagoofil-read-only$ python ./metagoofil.py -d
iastate.edu -f iastate-download.html -o iastate -t doc -l 10
[** This shows a fresh svn checkout, followed by an attempt to run metagoofil
with filelimit and with limit specified on the command line. Result: Success
**]
gray@fireball:~/metagoofil/metagoofil-read-only$ python ./metagoofil.py -d
iastate.edu -f iastate-download.html -o iastate -t doc -l 10 -n 10
*************************************
* Metagoofil Ver 2.0 - Reborn *
* Christian Martorella *
* Edge-Security.com *
* cmartorella_at_edge-security.com *
* BACKTRACK 5 Edition!! *
*************************************
['doc']
[-] Starting online search...
[-] Searching for doc files, with a limit of 10
Searching 100 results...
Results: 105 files found
Starting to download 10 of them..
----------------------------------------------------
[0/10] http://www.registrar.iastate.edu/calendar/Deptcal1011.doc
[1/10] http://www.mcdb.iastate.edu/MCDBhandbook20102011.doc
[2/10] http://www.registrar.iastate.edu/courses/changesupdates2011.doc
[3/10] http://www.registrar.iastate.edu/courses/changesupdatef2011.doc
[4/10] http://www.philrs.iastate.edu/Rottlerapp.doc
[5/10] http://www.immunobiology.iastate.edu/IMBIOHandbook20102011.doc
[6/10] http://www.stuorg.iastate.edu/isss/SEDSGuidelines.doc
[7/10] http://www.registrar.iastate.edu/veterans/vabenefits.doc
[8/10] http://lasonline.iastate.edu/pdf_files/Online_CDG.doc
[9/10] http://www.bus.iastate.edu/jmcelroy/McElroy.doc
[10/10] http://www.hrs.iastate.edu/ISUOnLine/VeteransPref.doc
[+] List of users found:
--------------------
mkmcdow
cchulse
Kathryn B. Andre
Katie
adptemp
bjhotch
Janet Krengel
Aragorn
AerE Department
Iowa State University
Molly Helmers
College of Business
McElroy, James C
marleneb
cball
[+] List of software found:
-----------------------
Microsoft Office Word
Microsoft Word 10.0
Microsoft Macintosh Word
[+] List of paths and servers found:
--------------------------------
Normal.dotm
Normal
Normal.dot
Attached patch addresses the issue.
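The gist of the fix can be sketched with argparse defaults, so that omitting -l and -n falls back to the documented values. This is an illustration only; the actual attached patch (and the original getopt-style parsing) may differ.

```python
import argparse

parser = argparse.ArgumentParser()
# Defaults mirror the values documented in metagoofil.py.
parser.add_argument('-l', dest='limit', type=int, default=100)
parser.add_argument('-n', dest='filelimit', type=int, default=50)

# Simulate a command line without -l / -n: the defaults apply.
args = parser.parse_args([])
```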
Original issue reported on code.google.com by [email protected]
on 16 Jun 2011 at 2:46
Attachments:
1. I first had to grant metagoofil.py execute permissions (might not be an
issue).
2. Tried to run the metagoofil.py script, and it reported two errors.
Therefore, I had to edit metagoofil.py and add the #!/usr/bin/python line.
The two errors I received before adding that first line are shown below:
a. from: can't read /var/mail/discovery
b. from: can't read /var/mail/extractors
3. After doing the above steps, I tried to run the script again, and it gave
me the error "ImportError: No module named myparser". I realized that, while
theHarvester has myparser.py in its directory, metagoofil does not; it has a
file called parser.py instead.
Original issue reported on code.google.com by [email protected]
on 20 Feb 2013 at 7:37
What steps will reproduce the problem?
python metagoofil.py -d liquid11.co.uk -t pdf,doc,docx,xls,xlsx,ppt,pptx -l 200
-n 50 -o liquid11files -f metagoofil-liquid11-results.html
What is the expected output? What do you see instead?
[1/50] /webhp?hl=en
[x] Error downloading /webhp?hl=en
[2/50] /intl/en/ads
[x] Error downloading /intl/en/ads
[3/50] /services
[x] Error downloading /services
[4/50] /intl/en/policies/
processing
tuple index out of range
Error creating the file
What version of the product are you using? On what operating system?
2.2 - Ubuntu 12.04.1 LTS
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 19 Mar 2013 at 4:33
What steps will reproduce the problem?
1. python metagoofil.py
What is the expected output? What do you see instead?
~/Downloads/metagoofil-2.2 ⮀ python metagoofil.py
Traceback (most recent call last):
File "metagoofil.py", line 1, in <module>
from discovery import googlesearch
File "/home/lyy/Downloads/metagoofil-2.2/discovery/googlesearch.py", line 3, in <module>
import myparser
ImportError: No module named myparser
What version of the product are you using? On what operating system?
2.2, downloaded from here. Linux amd64.
Please provide any additional information below.
The Black Hat version provided may work, but many modifications are needed to
make it run, especially when Chinese characters are involved.
Original issue reported on code.google.com by [email protected]
on 8 Jul 2013 at 1:03
What steps will reproduce the problem?
1. metagoofil -d microsoft.com -t doc,pdf -l 200 -n 50 -o msfiles -f
results.html
2. metagoofil -d docs.kali.org -t doc,pdf -l 200 -n 50 -o kalifiles -f
results.html
3. metagoofil -d apple.com -t doc,pdf -l 200 -n 50 -o applefiles -f results.html
What is the expected output? What do you see instead?
All searches return the same thing.
[-] Starting online search...
[-] Searching for doc files, with a limit of 200
Searching 100 results...
Searching 200 results...
Results: 0 files found
Starting to download 50 of them:
----------------------------------------
[-] Searching for pdf files, with a limit of 200
Searching 100 results...
Searching 200 results...
Results: 0 files found
Starting to download 50 of them:
----------------------------------------
processing
user
email
[+] List of users found:
--------------------------
[+] List of software found:
-----------------------------
[+] List of paths and servers found:
---------------------------------------
[+] List of e-mails found:
----------------------------
What version of the product are you using? On what operating system?
metagoofil 2.2 on Kali Linux. Same result with a fresh svn checkout.
Please provide any additional information below.
I suspect Google has changed things on you again.
Original issue reported on code.google.com by [email protected]
on 24 Mar 2015 at 3:02
What steps will reproduce the problem?
1. CLI execution
root@bt:/pentest/enumeration/google/metagoofil# ./metagoofil.py -d DOMAINNAM -t
doc -l 200 -n 50 -o /root/Desktop/ -f results.html
2. CLI error
[1/50] /webhp?hl=en
Error downloading /webhp?hl=en
[2/50] /intl/en/ads
Error downloading /intl/en/ads
[3/50] /services
Error downloading /services
[4/50] /intl/en/policies/
tuple index out of range
Error creating the file
What is the expected output? What do you see instead?
Expected data to be gathered; instead I get errors.
What version of the product are you using? On what operating system?
2.1 BackTrack5R3 (ubuntu)
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 31 Jul 2013 at 9:59
What steps will reproduce the problem?
1. I ran this script in China:
> metagoofil.py -d swu.edu.cn -t doc -l 20 -n 20 -o test -f test.html
output:
******************************************************
* /\/\ ___| |_ __ _ __ _ ___ ___ / _(_) | *
* / \ / _ \ __/ _` |/ _` |/ _ \ / _ \| |_| | | *
* / /\/\ \ __/ || (_| | (_| | (_) | (_) | _| | | *
* \/ \/\___|\__\__,_|\__, |\___/ \___/|_| |_|_| *
* |___/ *
* Metagoofil Ver 2.2 *
* Christian Martorella *
* Edge-Security.com *
* cmartorella_at_edge-security.com *
******************************************************
['doc']
[-] Starting online search...
[-] Searching for doc files, with a limit of 20
Searching 100 results...
Results: 0 files found
Starting to download 20 of them:
----------------------------------------
processing
user
email
[+] List of users found:
--------------------------
[+] List of software found:
-----------------------------
[+] List of paths and servers found:
---------------------------------------
[+] List of e-mails found:
----------------------------
2. I tried to modify the file discovery/googlesearch.py, changing:
self.server="www.google.com"
self.hostname="www.google.com"
to:
self.server="www.google.com.hk"
self.hostname="www.google.com.hk"
Re-ran step 1; output:
....
['doc']
[-] Starting online search...
[-] Searching for doc files, with a limit of 20
_
This time the output stops at this point and does not continue (sorry for my
poor English!). I debugged the code and found that execution blocks here:
discovery/googlesearch.py:27 self.results = h.getfile().read()
It looks like Google returns too many results.
3. So I reduced the page-size parameter:
discovery/googlesearch.py:16
    self.quantity="100"  ===>  self.quantity="10"
discovery/googlesearch.py:46
    self.counter+=100  ===>  self.counter+=10
and I also modified this point:
discovery/googlesearch.py:27
    self.results = h.getfile().read()
    h.close()  # Adding this line seems to help
Re-ran step 1; sometimes it works, sometimes it behaves the same as before.
What is the expected output? What do you see instead?
it does not work very well
What version of the product are you using? On what operating system?
metagoofil 2.2 windows7 python2.6
Please provide any additional information below.
If the script runs successfully, the resulting file path list contains some
errors, e.g.:
[1/20] /webhp?hl=en-HK
[x] Error downloading /webhp?hl=en-HK
[12/20] /support/websearch/bin/answer.py?answer=134479
[x] Error downloading /support/websearch/bin/answer.py?answer=134479
....
My solution, at myparser.py:43:
#reg_urls = re.compile('<a href="(.*?)"')
reg_urls = re.compile('<a href="[^">]*?/url\?q=([^">]*?)&sa=U.*?"')
The result looks fine; I don't know another way, and I don't want to change
it again.
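The replacement pattern can be checked against a fabricated Google result anchor. The HTML below is illustrative, not captured from a live results page:

```python
import re

# The /url?q= pattern from myparser.py:43, which captures only the real
# destination URL and ignores Google's own navigation links.
reg_urls = re.compile(r'<a href="[^">]*?/url\?q=([^">]*?)&sa=U.*?"')
html = '<a href="/url?q=http://swu.edu.cn/file.doc&sa=U&ei=abc">result</a>'
urls = reg_urls.findall(html)
```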
Original issue reported on code.google.com by [email protected]
on 23 Sep 2013 at 2:45