kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

License: MIT License

Python 100.00%
facebook facebook-scraper facebook-scraping hacktoberfest scraping

facebook-scraper's People

Contributors

asymness, barakplasma, bipsen, dependabot[bot], ianneee, is3ka1, jocejocejoe, josx, jwesheath, kevinzg, krzygorz, lazanet, lennoxho, lucasmrdt, neon-ninja, nielsoerbaek, nubpro, pierremesure, qdii, rednafi, roma-glushko, rshkunov, salamtamp, sassbalint, senexus, suryashekharc, tbuytaer, themulti0, travelsir, vanguard-52236

facebook-scraper's Issues

How can I download the facebook scraped data in an excel file

This is the code:

for post in get_posts('iamforloveofwater', pages=50, extra_info=True):
    #print (post)
    print (post['text'][:100])
    temp=pd.DataFrame(data=post)
    dj=dj.append(temp,verify_integrity=False)

Error: ValueError: If using all scalar values, you must pass an index
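
pandas raises this ValueError when `pd.DataFrame` is given a single dict whose values are all scalars, which is what one post looks like. A common way around it is to collect the post dicts in a list and build the table once at the end (`pd.DataFrame(rows)` on a list of dicts does not hit this error). The sketch below shows the same idea using only the standard library, writing a CSV that Excel can open. The function name `write_posts_csv`, the chosen field names, and the sample posts are hypothetical stand-ins, not part of facebook-scraper.

```python
import csv
import io

def write_posts_csv(posts, out, fields=("post_id", "time", "text", "likes")):
    """Write an iterable of post dicts to a CSV stream that Excel can open."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for post in posts:
        # Missing keys become empty cells instead of raising.
        writer.writerow({name: post.get(name, "") for name in fields})
        count += 1
    return count

# Stand-in data; in the issue these dicts would come from get_posts(...).
sample_posts = [
    {"post_id": "1", "time": "2020-01-01 10:00", "text": "first post", "likes": 3},
    {"post_id": "2", "time": "2020-01-02 11:30", "text": "second post", "likes": 7, "extra": "x"},
]

buffer = io.StringIO()
rows_written = write_posts_csv(sample_posts, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("posts.csv", "w", newline="", encoding="utf-8-sig")` instead of the in-memory buffer; the `utf-8-sig` encoding helps Excel recognize non-ASCII text.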

Extract search results

Is there a way to query for a keyword across all posts? Or do I have to pick a specific user page?

empty console - no output/error

Hi, I have tried your library, very easy to use. However, while scraping the facebook groups, it doesn't return anything. I've tried with public groups too. Returns nothing and emptiness just like my 2019...xD

can you please look into this? Thanks.

No response when retrieving posts

I just installed the library and I'm trying to run the example in the README, but I get an error:

Traceback (most recent call last):
  File "/Users/fernandagomes/dev/deni_dashbot/main_executer.py", line 49, in <module>
    for p in posts:
  File "/Users/fernandagomes/dev/deni_dashbot/deni_env/lib/python3.6/site-packages/facebook_scraper.py", line 55, in get_posts
    html = response.html
AttributeError: 'Response' object has no attribute 'html'

Any idea on why this is happening? I tried with many different pages.

Image quality is low

Hi, the 'image' link returned by get_posts is different from the link you get with 'show image' in the browser, and it returns an image at a lower resolution.
Would it be possible to change it in order to get the highest resolution?

No response!

I've tried to run this code as it is, providing my login credentials and the Facebook group ID. However, I'm not getting any response or any message on the terminal. It's all blank. Can anyone here guide me?

Changed algorithm?

Since yesterday, I can't extract text from Facebook pages anymore.

Continue Reading not working

When "Continue Reading" is displayed in a post, only a fraction of the text is returned (when "see more" is displayed it works well).

Post Limit

I very much appreciate your work on this. Thank you.

There seems to be a limit of about 300 posts that can be collected. How can I remedy this and collect all of a person's posts?

Thank you

Login does not work

Apparently, after the login function runs, later requests still get responses as if the user hasn't logged in.

Couldn't get any posts. (error)

With one private group, I was able to scrape the posts (there are only 3 or 4 in the group) however in a larger private group with more activity where I am also an admin, I get an error of Couldn't get any posts.

Could it be a facebook group setting that differs between the group that works and the one that doesn't? I know the smaller group was recently created, and basically has the default facebook settings. Also I ran this again and outputted some of the html to a file, and now it is titled "Security Check". So I think it is presenting a captcha now.

AttributeError: 'NoneType' object has no attribute 'html'

Hi there,

Thanks for your great efforts. It's great to see this kind of package since FB changed its API policy last year.

I was trying to leverage your code to collect some data. The following is my code. However, when I executed it, I got "AttributeError: 'NoneType' object has no attribute 'html'". Do you know how to fix the issue?

import pandas as pd
from facebook_scraper import get_posts

d = []
for post in get_posts('Chrysler', pages=1000):
    d.append({'time': post['time'], 'post_id': post['post_id'], 'text': post['text'][:20000],
              'like': post['likes'], 'comment': post['comments'], 'share': post['shares'],
              'URL': post['post_url'], 'link': post['link']})

Obj1 = pd.DataFrame(d)
Obj1.to_csv("fgc_Chrysler.csv")

Thanks a lot!

Only limited number of posts could be scraped when scraping groups

from facebook_scraper import get_posts

for post in get_posts(group='234176430922917', credentials=('xxxx', 'xxxx')):
    print(post)

When executing the above code, only about 20 posts are collected, while the group has hundreds of new posts a day.
Moreover, the time of all posts is None. Is that normal?

Thank you very much.

Installation with pip malfunctions

Hi!

I get the following error when trying to install with pip:

Collecting facebook-scraper
Could not find a version that satisfies the requirement facebook-scraper (from versions: )
No matching distribution found for facebook-scraper

thanks!

Login unsuccessful

for post in get_posts(group='EatsleeprepeatESR/', credentials={'email' : '[email protected]' , 'pass' : '*****************'}):
    print(post['text'][:50])

This is my code. Despite providing my correct credentials, it gives me this warning:

warnings.warn('login unsuccessful')
UserWarning: login unsuccessful
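
One thing worth checking: other snippets on this page pass `credentials` as a tuple and forward it with `_login_user(*credentials)`. If the library really does unpack the argument that way (an assumption based only on those snippets), a dict would hand over its keys rather than the email and password. The sketch below illustrates that Python behavior; `fake_login` is a hypothetical stand-in, not the library's function.

```python
def fake_login(email, password):
    """Hypothetical stand-in for a login helper; just echoes what it receives."""
    return email, password

as_tuple = ("user@example.com", "secret")
as_dict = {"email": "user@example.com", "pass": "secret"}

# Star-unpacking a tuple forwards the values in order.
print(fake_login(*as_tuple))   # ('user@example.com', 'secret')

# Star-unpacking a dict iterates over its KEYS, not its values.
print(fake_login(*as_dict))    # ('email', 'pass')
```

Under that assumption, passing `credentials=('email address', 'password')` as a tuple would be the form to try.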

Capture posts from old pages

I ran the following:

from facebook_scraper import get_posts

for post in get_posts('cezinhademadureira', pages=1):
    print(post['text'])

And for this page the return is empty. I noticed that it's an old page and hasn't been updated in a while. Is there a solution in these cases?

Only a small part of longer post is getting returned

For longer posts, only a fraction of the text is returned. This is different from issue #2, as more than one paragraph is shown, but not the full text. In my use cases, there seems to be a limit of around 700 characters, after which "... more" is displayed at the end of the returned fraction.

Has anyone run into this issue? Is there a fix?
Thank you

Proxies support

Hi!
Cannot find proxy support in the library, am I wrong?

Regards,
Vlad

Scraper only retrieving 2 posts

It seems Facebook changed its code and the scraper is only getting 2 posts now.

I'll try to fix this over the weekend but if anyone has looked into it please share your findings here. PRs with a fix are also welcomed.

Retrieve text from image posts (and one more..)

Firstly, thank you so much for giving us this amazing scraper. I really appreciate the hard work that went into building it.

Now to the issue:

  1. Whenever facebook-scraper gets to a post like this, it returns empty text, post_text, and shared_text.
    For example:
    https://www.facebook.com/littlekiteskerala/posts/553818741897029

  2. Also, sometimes the number of shares returned is 0, and I am unable to figure out why.

Cheers and stay safe

Scrape all images per post

When using get_posts only the first image url gets scraped. Is there a way to get all of them?
Thank you

Feature Request - Video

Is it possible to provide video support with a post type parameter, like in the original Graph API?

e.g. 'type': video,
'source_url': link,
Thank you

Scraping Likes, but no reactions

I have noticed that your script only scrapes the total number of likes and does not take reactions into account at all.

Also, shares are not working at all.

Attribute error _find_and_search

For some posts, in the function _find_and_search the container returned is of type None. The subsequent pattern search explicitly looks for the attribute html which then leads to an exception.

[screenshot: 2019-12-18-002101_655x344_scrot]
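
A minimal sketch of the kind of guard this report suggests, assuming `_find_and_search` first selects a container and then runs a regex over its `html`. The `FakeElement` class, the function signature, and the pattern are all hypothetical illustrations, not the library's actual code.

```python
import re

class FakeElement:
    """Hypothetical stand-in for an HTML element exposing .find() and .html."""
    def __init__(self, html, children=None):
        self.html = html
        self._children = children or {}

    def find(self, selector, first=True):
        # Returns None when the selector matches nothing.
        return self._children.get(selector)

def find_and_search(article, selector, pattern, cast=str):
    """Return the first regex group found in the selected container, or None."""
    container = article.find(selector, first=True)
    if container is None:
        # Selector matched nothing: report "no value" instead of raising.
        return None
    match = pattern.search(container.html)
    return cast(match.group(1)) if match else None

likes_pattern = re.compile(r"(\d+) likes")
article = FakeElement("", {"footer": FakeElement("<span>12 likes</span>")})

found = find_and_search(article, "footer", likes_pattern, int)
missing = find_and_search(article, "header", likes_pattern, int)
print(found, missing)
```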

Shares field issue

Sometimes the shares field returns 0 when in fact a post has been shared multiple times, just FYI.

AttributeError: 'NoneType' object has no attribute 'find'

First, I would like to say thanks for your work. This script is very useful.

I ran get_posts with an absurdly big number of pages just to get all the posts of the page.
After downloading 144 posts I got this:

  File ".../facebook_post_downloader/venv/lib/python3.6/site-packages/facebook_scraper.py", line 75, in _get_posts
    yield _extract_post(article)
  File ".../facebook_post_downloader/venv/lib/python3.6/site-packages/facebook_scraper.py", line 102, in _extract_post
    text, post_text, shared_text = _extract_text(article)
  File ".../facebook_post_downloader/venv/lib/python3.6/site-packages/facebook_scraper.py", line 137, in _extract_text
    nodes = article.find('p, header')
AttributeError: 'NoneType' object has no attribute 'find'

Process finished with exit code 1

I was expecting a graceful exit from the generator.
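
One way a generator can exit gracefully here is to skip articles that cannot be parsed rather than letting the exception escape. The wrapper below is a sketch under that assumption; `safe_extract` and the stand-in extractor are hypothetical names, not part of the library.

```python
def safe_extract(articles, extract):
    """Yield extract(article) for each article, skipping ones that cannot be parsed."""
    for article in articles:
        if article is None:
            # Nothing to parse for this entry.
            continue
        try:
            yield extract(article)
        except AttributeError:
            # A missing sub-element surfaced as None somewhere inside extract().
            continue

def text_of(article):
    """Stand-in extractor: fails with AttributeError when 'body' is None."""
    return article["body"].upper()

articles = [{"body": "first"}, None, {"body": None}, {"body": "last"}]
results = list(safe_extract(articles, text_of))
print(results)
```

Catching `AttributeError` this broadly can also hide genuine bugs, so logging each skipped article would be a sensible addition.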

Use account without logging in on each subsequent request

If I'm not mistaken, at the moment scraping several pages in a row would have _login_user triggered for each _get_posts call. This might be flagged by facebook as suspicious.

In my use case I fixed it this way:

def _get_posts(path, pages=10, timeout=5, sleep=0, credentials=None):
    """Gets posts for a given account."""
    global _session, _timeout

    url = f'{_base_url}/{path}'
    if _session is None:
        _session = HTMLSession()
        _session.headers.update(_headers)

        if credentials:
            _login_user(*credentials)

However, this isn't optimal for people who use several different accounts in their workloads, so I believe it should be discussed.
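
One possible compromise for the multi-account case raised above is to cache one session per account, so each account logs in only once. This is a sketch for discussion; `FakeSession`, `get_session`, and the choice of the email as cache key are hypothetical and do not reflect the library's internals.

```python
_sessions = {}

class FakeSession:
    """Hypothetical stand-in for an HTTP session; counts login attempts."""
    def __init__(self):
        self.logins = 0

    def login(self, email, password):
        self.logins += 1

def get_session(credentials=None, factory=FakeSession):
    """Return a cached session for these credentials, logging in only once."""
    key = credentials[0] if credentials else None  # cache by account email
    session = _sessions.get(key)
    if session is None:
        session = factory()
        if credentials:
            session.login(*credentials)
        _sessions[key] = session
    return session

first = get_session(("a@example.com", "pw1"))
again = get_session(("a@example.com", "pw1"))
other = get_session(("b@example.com", "pw2"))
anonymous = get_session()
print(first is again, first is other, first.logins)
```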

Timestamp option

I get the following info from nintendo, with extra_info=True:

[image]

But, I can see the timestamp from when the text was posted.
How can I see the timestamp?

Feature request - provide login option in order to scrape private/age restricted pages

Thank you for this great package. I need a way to scrape certain Facebook pages that require the user to be logged in. I believe some of these pages (think alcohol and tobacco related products) only need the user to be logged in so they can check the user's age/DOB and restrict young users from viewing the page (though I'm not 100% certain about this). I've forked this repo and modified the code so that I can provide my login information to log the session in before scraping so that I can scrape these restricted pages. If I submit a pull request to add this feature, is this something people would be interested in? I am admittedly not a professional software developer and may not implement the feature in the most optimal way, but I have it working and would be glad to try and share if folks are interested.

Time is not properly formatted for posts from groups

Times for posts from the current year do not include the year. They are in the format May 29 at 1:35.

The code expects the year to be provided.
A second issue is that the times of recent posts are relative to the current time. They show up as 2 mins or 4 hrs.
I am fixing it locally on my version; I am not sure if you accept merge requests, etc.
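
A rough sketch of how both formats from this report could be handled. It assumes full English month names, a 24-hour clock, and only the `min`/`mins`/`hr`/`hrs` units mentioned in this report; the function name `parse_post_time` is hypothetical.

```python
import re
from datetime import datetime, timedelta

_RELATIVE = re.compile(r"^(\d+)\s*(min|mins|hr|hrs)$")

def parse_post_time(text, now):
    """Parse 'May 29 at 1:35', '2 mins' or '4 hrs' relative to `now`."""
    text = text.strip()
    match = _RELATIVE.match(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        if unit.startswith("min"):
            return now - timedelta(minutes=amount)
        return now - timedelta(hours=amount)
    # No year in the string: assume the current year and prepend it, so that
    # strptime does not fall back to its default year of 1900.
    return datetime.strptime(f"{now.year} {text}", "%Y %B %d at %H:%M")

now = datetime(2020, 6, 1, 12, 0)
print(parse_post_time("May 29 at 1:35", now))
print(parse_post_time("2 mins", now))
print(parse_post_time("4 hrs", now))
```

Passing `now` explicitly keeps the function deterministic; in real use it would be `datetime.now()`.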
