fb-crawl's Introduction

fb-crawl.pl is a script that crawls/scrapes Facebook friends and adds their information to a database.
It can be used for social graph analysis and refined Facebook searching.

FEATURES

    - Multithreaded
    - Aggregates information from multiple accounts

REQUIREMENTS

    - Perl 5 or greater
    - MySQL

INSTALLATION

    $ cd fb-crawl
    $ chmod +x fb-crawl.pl
    $ ./fb-crawl.pl
    
    fb-crawl.pl will set up all the required database tables.
    All you have to do is provide it with the MySQL connection information and Facebook account.
    
    $ ./fb-crawl.pl -u email@address -host mysql.host -user fb-crawl -pass mysqlPassword

OPTIONS
    
    -u         Facebook email address.
    -p         Facebook password.
    -host      MySQL server IP address or host name (default: localhost)
    -port      MySQL port (default: 3306)
    -user      MySQL user (default: root)
    -pass      MySQL password.
    -db        MySQL database (default: facebook)
    -tables    MySQL table names for info, wall, and friends, in that order, as a colon-separated list (default: info:wall:friends)
    -info      User info save method (default: append)
                append:  Appends new comma-separated information to the existing row and keeps the old information.
                         Useful when you want to save all user changes and don't care when each one was made.
                insert:  Inserts new user information in a new row (degrades searchability).
                         Useful when you want to save all user changes along with when each one was made.
                replace: Replaces all the current user information in the database with the new information.
                         Useful when you only care about the most recent user information.
    -i         Crawl user's information and add to info table.
    -w         Crawl user's wall posts and add to wall table.
    -f         Crawl user's friends and add to friends table.
    -self      Crawl your profile too.
    -t         Threads (default: 16)
    -https     Use SSL encryption.
    -proxy     Use an HTTP proxy. host[:port]
    -timeout   Timeout in seconds (default: 30)
    -depth     Crawl depth (default: 0)
                0 - only your friends
                1 - friends of friends
                2 - friends of friends of friends
                3 - friendception
    -url       Crawl these url(s) and also crawl the user's friends if -depth > 1
                example: -url http://fb.com/profile.php?id=12345,profile.php?id=54321,john.smith.3
    -name      Search for and crawl these name(s) and also crawl the user's friends if -depth > 1.
               This works by using Facebook's search and taking the first result. For more precision, use -url.
                example: -name "John Smith, Jane Smith"
    -new       Only crawl users that aren't in the database.
    -old       Only crawl users that are in the database.
    -plugins   Plug-ins to include.
                example: birthday2date.pl,location2latlon.pl
    -h         Help
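
The three -info save methods listed above differ only in what happens when a user already has a row. The following is a minimal in-memory sketch of that difference, written in Python for illustration. It is not taken from fb-crawl.pl: the `save_info` function, the list-of-dicts "table", and the rule that append skips values already present are all assumptions, and values that themselves contain commas are not handled.

```python
def save_info(table, user_id, field, value, mode="append"):
    """Apply one of the three save methods to an in-memory table.

    `table` is a list of row dicts; this stands in for the MySQL info
    table and is purely hypothetical.
    """
    if mode not in ("append", "insert", "replace"):
        raise ValueError("unknown save method: " + mode)

    existing = [row for row in table if row["user_id"] == user_id]

    # insert always adds a row; the other methods add one only for new users.
    if mode == "insert" or not existing:
        table.append({"user_id": user_id, field: value})
        return

    row = existing[-1]
    if mode == "replace":
        # Only the most recent value survives.
        row[field] = value
    else:
        # append: keep the old values and add the new one, comma separated.
        # Skipping duplicates is an assumption made for this sketch.
        values = [v for v in row.get(field, "").split(",") if v]
        if value not in values:
            values.append(value)
        row[field] = ",".join(values)


appended, inserted, replaced = [], [], []
for city in ("Boston", "Chicago"):
    save_info(appended, 1, "current_city", city, mode="append")
    save_info(inserted, 1, "current_city", city, mode="insert")
    save_info(replaced, 1, "current_city", city, mode="replace")
```

After both updates, `appended` holds one row with both values, `inserted` holds two rows, and `replaced` holds one row with only the latest value. This is why insert makes searching harder: a query has to decide which of several rows per user counts.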

EXAMPLES

    Crawl your friends' Facebook information, wall, and friends:
    $ ./fb-crawl.pl -u email@address -i -w -f

    Crawl John Smith's Facebook information, wall, and friends:
    $ ./fb-crawl.pl -u email@address -i -w -f -name 'John Smith'

    Crawl Facebook information for friends of friends:
    $ ./fb-crawl.pl -u email@address -depth 1 -i
    
    Crawl Facebook information of John Smith's friends of friends:
    $ ./fb-crawl.pl -u email@address -depth 1 -i -name 'John Smith'
    
    Extreme: Crawl friends of friends of friends of friends with 200 threads:
    $ ./fb-crawl.pl -u email@address -depth 3 -t 200 -i -w -f
    
    
MYSQL EXAMPLES
    
    Find local singles:
    SELECT `user_name`, `profile` FROM `info` WHERE `current_city` = 'My Current City, State' AND `sex` = 'Female' AND `relationship` = 'Single'
    
    Find some Harvard singles:
    SELECT `user_name`, `profile` FROM `info` WHERE `college` = 'Harvard University' AND `sex` = 'Female' AND `relationship` = 'Single'
    
    How many Facebook employees have you crawled? 
    SELECT count(*) FROM `info` WHERE `company` = 'Facebook'
    
    Find John Smith's friends:
    SELECT `friends` FROM `friends` WHERE `name` = 'John Smith'
    
    
PLUG-INS
    
    fb-crawl.pl can load Perl scripts (plug-ins) that analyze and modify user information before it goes into the database.
    The script should contain a function with the same name as the file.
    The function is passed a hash reference with the current user's information in it.
    
    To load a plug-in use the -plugins option:
    $ ./fb-crawl.pl -u email@address -i -plugins location2latlon.pl,birthday2date.pl
    
    location2latlon.pl:
    This plug-in adds the user's coordinates to the database using the Google Geocoding API.
    
    birthday2date.pl:
    This plug-in converts the user's birthday to MySQL date (YYYY-MM-DD) format.
    
    See plugin files for implementation details.
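
To make the birthday2date.pl description concrete, here is a minimal sketch of that kind of conversion, written in Python for illustration only. A real plug-in must be a Perl script as described above, and this is not the plug-in's actual code. The function name `birthday_to_mysql_date` and the assumed input format ("Month D, YYYY", in English) are hypothetical.

```python
import datetime
import re

# English month names mapped to month numbers (the input language is an assumption).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def birthday_to_mysql_date(birthday):
    """Return 'YYYY-MM-DD' for a 'Month D, YYYY' string, or None if it cannot be parsed."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})\s*", birthday or "")
    if not match:
        return None  # e.g. a birthday without a year
    month = MONTHS.get(match.group(1).capitalize())
    if month is None:
        return None
    try:
        # datetime.date rejects impossible dates such as February 30.
        return datetime.date(int(match.group(3)), month, int(match.group(2))).isoformat()
    except ValueError:
        return None
```

Returning None for unparseable input leaves it to the caller to decide whether to store NULL or keep the original text.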
    
FAQ

    It's logging in but won't load my friends?
    You probably have SSL enabled on your account. You need to use the -https option.
    
    Can't locate object method "ssl_opts" via package "LWP::UserAgent"
    You need to install LWP::Protocol::https.
    
    $ sudo perl -MCPAN -e 'install LWP::Protocol::https'
    

fb-crawl's Issues

Info is empty

I added -i, but the info table is empty. What am I doing wrong?

Original issue reported on code.google.com by [email protected] on 10 Apr 2013 at 8:40

Doesn't crawl data from Facebook

What steps will reproduce the problem?
1. run with cmd: ./fb-crawl.pl -u xxx@xxx -p xxx -https -i -w -f
2. display error: 
Use of uninitialized value $fb_user_id in numeric eq (==) at ./fb-crawl.pl line 510.
Argument "\x{6d}\x{6c}..." isn't numeric in numeric eq (==) at ./fb-crawl.pl line 510.

What is the expected output? What do you see instead?
The tables in the database don't contain any data.

What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 1 Dec 2014 at 3:26

No data is crawled

What steps will reproduce the problem?
1. Execute ./fb-crawl.pl -u email@address -p xxxxx -https -i -w -f

Result is:

+ Connecting to [email protected] on port 3306
+ Checking Tables
 | Table "info" exists
 | Table "wall" exists
 | Table "friends" exists
+ Logging in...done
+ Entering depth level: 0 (your friends)
+ Loading My Name's friends. User ID: 123456789
+ Entering depth level: 1 (friends of friends)
+ Entering depth level: 2 (friends of friends of friends)
+ Entering depth level: 3 (friends of friends of friends of friends)
+ 0 profiles crawled in 8 seconds

The DB tables are created, but no data is saved into them.

Original issue reported on code.google.com by [email protected] on 27 Jun 2013 at 3:25

Get error: 0

! Request Failed: https://www.facebook.com/ajax/browser/list/allfriends/?uid=[full HTML of Facebook's mobile "Welcome to Facebook" login page, including the message "You must log in first."]&infinitescroll=1&location=friends_tab_tl&start=0&__user=0&__a=1 - Sorry, something went wrong.
Error: 0

So when I run the line "perl fb-crawl/fb-crawl.pl -u *******@hotmail.com -p 
password -https -f -i -w -self", I get the error above no matter which 
parameters I choose.

Original issue reported on code.google.com by [email protected] on 4 Dec 2013 at 12:28
