
shaikhsajid1111 / facebook_page_scraper


Scrapes Facebook pages' front end with no limitations and provides a feature to turn the data into structured JSON or CSV

Home Page: https://pypi.org/project/facebook-page-scraper/

License: MIT License

Python 100.00%
facebook-scraper facebook-page facebook-page-scraper facebook-page-post web-scraping web-scraper facebook csv python selenium

facebook_page_scraper's People

Contributors

edmond7450, lrudolph333, peter279k, shaikhsajid1111


facebook_page_scraper's Issues

Get Comments content

Dear,
Is there a way to get the content of the comments on the posts? Your code only returns the number of comments (I had to change the index in the list from 0 to 1 in the method that fetches this number), but I would like to know whether it is possible to extract the text and reactions of each comment from a post. Do you have any advice?

Why do I get only one post?

Hello and thank you for this tool.

The script seems to work well, but I receive only one post.

json_data = meta_ai.scrap_to_json()
print(json_data)

Can you explain how I can retrieve all the posts from one Facebook profile?

Many Thanks

"posted_on": "Failed to fetch!"

I'm getting "posted_on": "Failed to fetch!" for specific posts. Other posts in the same feed are fine. Any advice?

Error message when attempting to install: please help a qualitative graduate researcher

Hello,

I am new to coding, especially in Python. For my dissertation, I want to pull tweets and Facebook page data from various organizations. I eventually figured out how to install and run twitter-scraper without issues, but I need help getting the Facebook scraper to install. Every time I run pip install facebook-scraper or pip install git+https://github.com/kevinzg/facebook-scraper.git, I get an error message about either invalid syntax or no parent package being present. At one point I was able to run # !pip install git+https://github.com/kevinzg/facebook-scraper.git, which did not produce an error but also didn't install anything. This is the command I used to install the Twitter scraper, so I thought it was worth a shot. I am using the latest (free) versions of Python and PyCharm on a Mac.

Thanks in advance for any insight!

posts_count bigger than 19 results in only 19 scraped posts

Hi,

When I want to scrape the last 100 posts on a Facebook page:

facebook_ai = Facebook_scraper("facebookai",100,"chrome")
json_data = facebook_ai.scrap_to_json()
print(json_data)

Only 19 posts are scraped. I tried other pages too, with the same result.

Any ideas what goes wrong?

The csv does not reflect the required number of posts.

Hello, first of all thank you for this very useful code :)
I tried to run it requesting 100 posts instead of 10 (posts_count = 100), and everything runs perfectly, but when I open the CSV only 5 to 10 posts appear, seemingly at random. I have run it several times and the result varies (same posts but a different total each time), and it never reaches the requested 100.

No posts were found!

Hey! Thanks for your script.
But when I try to run your example I get the 'No posts were found!' error.
Is it because of the new layout?
Thanks!

Facebook login page popup; facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

Here is my result running this code:

[WDM] - Current google-chrome version is 108.0.5359
[WDM] - Get LATEST driver version for 108.0.5359

[WDM] - Driver [C:\Users\Zoey\.wdm\drivers\chromedriver\win32\108.0.5359.71\chromedriver.exe] found in cache
2023-01-06 22:06:59,049 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

--
I use a proxy in the US in the code.

After the first three lines of output, the Facebook login page pops up in Chrome. Then, after a few seconds of timeout, it reports that no posts were found.

Scraping multiple page_names

I want to scrape multiple pages, so I define a list of page names in a variable:

page_names = ["pagename1", "pagename2", "pagename3"]

and then I iterate Facebook_scraper over the page_names with this code:

results = []
posts_count = 2
browser = "firefox"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True

for name in page_names:
    meta_ai = Facebook_scraper(name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
    json_data = meta_ai.scrap_to_json()
    results.append(json_data)
print(results)

The first iteration succeeds, but every subsequent iteration still scrapes the first page_name. Is this because the cache is not cleared between iterations, or is there another approach that achieves my objective?
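Independent of the caching question, once each iteration returns the right page, the per-page JSON strings can be combined into a single object. A minimal sketch (merge_page_results is a hypothetical helper, assuming scrap_to_json() returns a JSON object keyed by post ID, as in sample outputs elsewhere on this page):

```python
import json

def merge_page_results(json_strings):
    """Combine per-page JSON strings from scrap_to_json() into one dict.

    Assumes each string decodes to an object keyed by post ID; later
    pages overwrite earlier ones on (unlikely) duplicate post IDs.
    """
    merged = {}
    for s in json_strings:
        merged.update(json.loads(s))
    return merged
```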

selenium.common.exceptions.ElementClickInterceptedException: Message: Element is not clickable at point ([x],[y]) because another element <div class=> obscures it

I am using Firefox as the browser.

When trying to connect to the Facebook page, I sometimes get the error mentioned in the issue title.

This IS related to the cookie banner. Searching the internet, I found the following link: https://proxyway.com/guides/how-to-scrape-facebook, which gives advice on code to add to driver_utilities.py.

The weird part is that adding the following code to the module helps, but only sometimes:

allow_span = driver.find_element(
    By.XPATH, '//div[contains(@aria-label, "Allow")]/../following-sibling::div')
allow_span.click()

I'm not sure whether anyone can reproduce this weird behavior.
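One hypothetical way to make the intermittent click more robust is to poll for the element instead of clicking once. The sketch below is driver-agnostic and not part of the library: find_elements is any zero-argument callable, e.g. a lambda wrapping driver.find_elements with the XPath above.

```python
import time

def click_last_when_present(find_elements, attempts=5, delay=1.0):
    """Poll a few times for matching elements, then click the last one.

    find_elements: zero-argument callable returning a list of clickable
    objects. Returns True if a click happened within the given number
    of attempts, False if nothing was ever found.
    """
    for _ in range(attempts):
        elements = find_elements()
        if elements:
            elements[-1].click()
            return True
        time.sleep(delay)
    return False
```

Usage would be something like click_last_when_present(lambda: driver.find_elements(By.XPATH, '...')).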

Allow providing driver's executable_path explicitly without going through .install() method

I have a funny environment: WSLv1 Linux on Windows. Python runs inside the Linux emulation, while Chrome and chromedriver.exe run on Windows. I have a symlink /usr/bin/chromedriver pointing to chromedriver.exe. This all works out well, but the automatic driver installer may get confused. So an option to specify the driver's executable_path explicitly on a Facebook_scraper instance would be nice! Thanks! (When I patch Initializer manually, everything works.)

No results are obtained with the new Facebook page template

Hello, on pages that have migrated to the new template, it is not possible to retrieve the posts.
On pages that keep the old template, it works without problems.

Do you plan to support the new Facebook template in the future?

Script scrapes incomplete Facebook posts

When I scrape longer messages, the script captures only the visible part of the post and not what is hidden under "See more". The resulting message always ends with the string 'see more'.

About parsing the JSON file

Hi,
I am testing your nice project, and after getting the JSON file I am wondering: what is the top-level "key" of the whole JSON file, i.e. the key whose values are the different collected data? I want to parse it in a Flutter app.

For example, what I mean by "key" is shown in the following JSON file, where the key is "items":

{ "items": [ { "id": "p1", "name": "Item 1", "description": "Description 1" }, { "id": "p2", "name": "Item 2", "description": "Description 2" }, { "id": "p3", "name": "Item 3", "description": "Description 3" } ] }

I hope you understand my request; thank you in advance.
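For what it's worth, the top-level JSON object returned by scrap_to_json() appears to be keyed by post ID rather than by a wrapper key such as "items" (see the sample output in the "delete first json's element" issue below). A minimal sketch, under that assumption, that adds an "items" wrapper for clients expecting one (to_items is a hypothetical helper):

```python
import json

def to_items(json_str):
    """Re-shape scraper output under a single "items" key.

    Assumes json_str decodes to an object keyed by post ID, e.g.
    {"1766004853797710": {"username": ...}}; each post is emitted as a
    list entry carrying its former key under "id".
    """
    data = json.loads(json_str)
    return {"items": [{"id": pid, **post} for pid, post in data.items()]}
```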

AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?

Hello,

When I try to import :

from facebook_page_scraper import Facebook_scraper

I get the following error :

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mpl/Desktop/FaceBookPagesScraper/facebook_page_scraper/facebook_page_scraper/__init__.py", line 1, in <module>
    from .driver_initialization import Initializer
  File "/Users/mpl/Desktop/FaceBookPagesScraper/facebook_page_scraper/facebook_page_scraper/driver_initialization.py", line 3, in <module>
    from seleniumwire import webdriver
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/webdriver.py", line 13, in <module>
    from seleniumwire import backend
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/server.py", line 4, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/handler.py", line 5, in <module>
    from seleniumwire import har
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/thirdparty/mitmproxy/connections.py", line 9, in <module>
    from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 43, in <module>
    "SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?

I tried downgrading to pyOpenSSL==22.0.0, but it didn't resolve the issue.

Only likes and loves are scraped properly

Hi,

Great tool, congrats! I am using the following code:

from facebook_page_scraper import Facebook_scraper
import os
import re  # used below in init_msg_handler
import stem.process
SOCKS_PORT = 9050
TOR_PATH = os.path.normpath(os.getcwd()+"\\Tor\\tor\\tor.exe")
tor_process = stem.process.launch_tor_with_config(
  config = {
    'SocksPort': str(SOCKS_PORT),
  },
  init_msg_handler = lambda line: print(line) if re.search('Bootstrapped', line) else False,
  tor_cmd = TOR_PATH
)
page_name = "metaai"
posts_count = 10
browser = "firefox"
proxy = "socks5://127.0.0.1:9050"
timeout = 600
headless = False
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
json_data = meta_ai.scrap_to_json()
print(json_data)

The numbers of likes and loves are correct; shares and other reactions always seem to be zero, and for the number of comments I get a different (lower) number.

Same without proxy.

I am using the tool from Europe with the language set to English (UK), although I am not sure of the correct way to select the language without using authentication.


I would appreciate any advice you may have for me.

ModuleNotFoundError: No module named 'facebook_page_scraper'

Hello, I followed each of the described steps but I get this error. I don't know what I'm doing wrong; any suggestions?
I'm sharing the file I run:
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "Turismogsm"
posts_count = 2
browser = "chrome"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)

#filename = "extraccionminpei" #file name without CSV extension, where data will be saved
#directory = r"C:\Users\cgpal\Desktop\Julieta\IPECD\Web_scraping" #directory where the CSV file will be saved (a raw string, so backslashes are not treated as escapes)
#meta_ai.scrap_to_csv(filename, directory)

Facing some webdriver exceptions

Inside facebook_page_scraper I have run setup.py as per the instructions.
When I try to run the same example I get an error:
[WDM] - Driver [/root/.wdm/drivers/geckodriver/linux64/v0.29.0/geckodriver] found in cache
Traceback (most recent call last):
File "face.py", line 8, in
json_data = facebook_ai.scrap_to_json()..

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

Can anyone figure out this error and provide a running example?

Running the library in Linux Debian 8.11

I have a server built with Debian 8.11 (jessie). When I try to run the code, I get this error:

[WDM] - Driver [/home/cucakrowo/.wdm/drivers/geckodriver/linux64/v0.32.1/geckodriver] found in cache
2023-02-06 18:20:25,350 - facebook_page_scraper.scraper - ERROR - Error at scrap_to_csv : Message: Process unexpectedly closed with status 1
Traceback (most recent call last):
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/scraper.py", line 151, in scrap_to_csv
    data = self.scrap_to_json()  # get the data in JSON format from the same class method
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/scraper.py", line 80, in scrap_to_json
    self.__start_driver()
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/scraper.py", line 60, in __start_driver
    self.browser, self.proxy, self.headless, self.browser_profile).init()
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/driver_initialization.py", line 90, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/driver_initialization.py", line 83, in set_driver_for_browser
    return webdriver.Firefox(executable_path=GeckoDriverManager().install(), options=self.set_properties(browser_option))
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/seleniumwire/webdriver.py", line 75, in __init__
    super().__init__(*args, **kwargs)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 183, in __init__
    keep_alive=True)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 268, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 359, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

Is this because the library is not compatible with my OS, or is there some configuration I have to set?

Scraping posts by date

My question is: is it possible to scrape posts published on a given date, for example 24/04/2023?
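The library does not appear to expose a date filter directly, but since each scraped post carries a posted_on field, the output can be filtered after the fact. A sketch under that assumption (posts_on_date is a hypothetical helper; posted_on is taken to be a timestamp string, so values like "Failed to fetch!" simply never match):

```python
import json

def posts_on_date(json_str, date_prefix):
    """Keep only posts whose posted_on value starts with date_prefix.

    Assumes posted_on is a timestamp string such as "2023-04-24 10:15:32";
    date_prefix would be e.g. "2023-04-24" to select a single day.
    """
    data = json.loads(json_str)
    return {pid: post for pid, post in data.items()
            if str(post.get("posted_on", "")).startswith(date_prefix)}
```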

Logging in

I need to scrape from my own account on Facebook. How can I do that?

Implement login

I'm running into the problem that I quickly hit a login wall and therefore can't scrape much more.
Is it possible to implement a login function? I.e. something like:

facebook.scrap_to_json(credentials = {email: email, pass: pass})

Failure when calling scrap_to_json()

Error:

[WDM] - Current google-chrome version is 112.0.5615
[WDM] - Get LATEST driver version for 112.0.5615
[WDM] - Driver [***] found in cache

DevTools listening on ws://127.0.0.1:56478/devtools/browser/e6c8b2dc-2403-490c-9eeb-36b531efcdea
127.0.0.1:56489: Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\server.py", line 113, in handle
    root_layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\modes\http_proxy.py", line 9, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 285, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http1.py", line 100, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 205, in __call__
    if not self._process_flow(flow):
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 304, in _process_flow
    return self.handle_regular_connect(f)
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 223, in handle_regular_connect
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 278, in __call__
    self._establish_tls_with_client_and_server()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 358, in _establish_tls_with_client_and_server
    self._establish_tls_with_server()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 445, in _establish_tls_with_server
    self.server_conn.establish_tls(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 290, in establish_tls
    self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tcp.py", line 382, in convert_to_tls
    context = tls.create_client_context(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 276, in create_client_context
    context = _create_ssl_context(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 163, in _create_ssl_context
    context = SSL.Context(method)
  File "C:\Python310\lib\site-packages\OpenSSL\SSL.py", line 674, in __init__
    res = _lib.SSL_CTX_set_ecdh_auto(context, 1)
AttributeError: module 'lib' has no attribute 'SSL_CTX_set_ecdh_auto'

[0422/154849.820:ERROR:ssl_client_socket_impl.cc(992)] handshake failed; returned -1, SSL error code 1, net_error -100
127.0.0.1:56492: Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\server.py", line 113, in handle
    root_layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\modes\http_proxy.py", line 9, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 285, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http1.py", line 100, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 205, in __call__
    if not self._process_flow(flow):
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 304, in _process_flow
    return self.handle_regular_connect(f)
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 223, in handle_regular_connect
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 278, in __call__
    self._establish_tls_with_client_and_server()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 358, in _establish_tls_with_client_and_server
    self._establish_tls_with_server()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 445, in _establish_tls_with_server
    self.server_conn.establish_tls(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 290, in establish_tls
    self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tcp.py", line 382, in convert_to_tls
    context = tls.create_client_context(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 276, in create_client_context
    context = _create_ssl_context(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 163, in _create_ssl_context
    context = SSL.Context(method)
  File "C:\Python310\lib\site-packages\OpenSSL\SSL.py", line 674, in __init__
    res = _lib.SSL_CTX_set_ecdh_auto(context, 1)
AttributeError: module 'lib' has no attribute 'SSL_CTX_set_ecdh_auto'

[0422/154849.935:ERROR:ssl_client_socket_impl.cc(992)] handshake failed; returned -1, SSL error code 1, net_error -100
2023-04-22 15:48:50,068 - facebook_page_scraper.scraper - ERROR - Error at scrap_to_csv : Message: unknown error: net::ERR_CONNECTION_CLOSED
  (Session info: headless chrome=112.0.5615.138)
Stacktrace:
Backtrace:
        GetHandleVerifier [0x00B8DCE3+50899]
        (No symbol) [0x00B1E111]
        (No symbol) [0x00A25588]
        (No symbol) [0x00A21D87]
        (No symbol) [0x00A18B45]
        (No symbol) [0x00A19B1A]
        (No symbol) [0x00A18E20]
        (No symbol) [0x00A18275]
        (No symbol) [0x00A1820C]
        (No symbol) [0x00A16F06]
        (No symbol) [0x00A17668]
        (No symbol) [0x00A26D22]
        (No symbol) [0x00A7E631]
        (No symbol) [0x00A6B8FC]
        (No symbol) [0x00A7E01C]
        (No symbol) [0x00A6B6F6]
        (No symbol) [0x00A47708]
        (No symbol) [0x00A4886D]
        GetHandleVerifier [0x00DF3EAE+2566302]
        GetHandleVerifier [0x00E292B1+2784417]
        GetHandleVerifier [0x00E2327C+2759788]
        GetHandleVerifier [0x00C25740+672048]
        (No symbol) [0x00B28872]
        (No symbol) [0x00B241C8]
        (No symbol) [0x00B242AB]
        (No symbol) [0x00B171B7]
        BaseThreadInitThunk [0x75627D49+25]
        RtlInitializeExceptionChain [0x7781B74B+107]
        RtlClearBits [0x7781B6CF+191]
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\facebook_page_scraper\scraper.py", line 151, in scrap_to_csv
    data = self.scrap_to_json()  # get the data in JSON format from the same class method
  File "C:\Python310\lib\site-packages\facebook_page_scraper\scraper.py", line 83, in scrap_to_json
    self.__driver.get(self.URL)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 436, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: net::ERR_CONNECTION_CLOSED
  (Session info: headless chrome=112.0.5615.138)
Stacktrace:
Backtrace:
        GetHandleVerifier [0x00B8DCE3+50899]
        (No symbol) [0x00B1E111]
        (No symbol) [0x00A25588]
        (No symbol) [0x00A21D87]
        (No symbol) [0x00A18B45]
        (No symbol) [0x00A19B1A]
        (No symbol) [0x00A18E20]
        (No symbol) [0x00A18275]
        (No symbol) [0x00A1820C]
        (No symbol) [0x00A16F06]
        (No symbol) [0x00A17668]
        (No symbol) [0x00A26D22]
        (No symbol) [0x00A7E631]
        (No symbol) [0x00A6B8FC]
        (No symbol) [0x00A7E01C]
        (No symbol) [0x00A6B6F6]
        (No symbol) [0x00A47708]
        (No symbol) [0x00A4886D]
        GetHandleVerifier [0x00DF3EAE+2566302]
        GetHandleVerifier [0x00E292B1+2784417]
        GetHandleVerifier [0x00E2327C+2759788]
        GetHandleVerifier [0x00C25740+672048]
        (No symbol) [0x00B28872]
        (No symbol) [0x00B241C8]
        (No symbol) [0x00B242AB]
        (No symbol) [0x00B171B7]
        BaseThreadInitThunk [0x75627D49+25]
        RtlInitializeExceptionChain [0x7781B74B+107]
        RtlClearBits [0x7781B6CF+191]
Code:

from facebook_page_scraper import Facebook_scraper
page_name = "metaai"
posts_count = 10
browser = "chrome"
proxy = "IP:PORT"
timeout = 10 
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
json_data = meta_ai.scrap_to_json()
print(json_data)

Packages:
pyOpenSSL    21.0.0
selenium     4.1.0
cryptography 38.0.4

I'm not getting any reactions

Hello,
I have this problem when I pull information from a page: in every case, the reactions are all zero. Do you have any advice about this issue?

Thanks for all your work!

update number of reactions and comments

Hello, we all know that the number of reactions and comments is updated every day, so does facebook_page_scraper offer the possibility to update them?
My second question: can we scrape comments as text, together with the name of the user who commented?

thanks

selenium.common.exceptions.SessionNotCreatedException - Issues with GeckoDriver

Hi, thanks for the code! I was trying to test this package and ran into this issue when running the sample code in the README. I am not very familiar with web scraping, but it seems to be some issue with GeckoDriver; I'm not sure why this error pops up. Besides that, I would like to check whether this package works with Facebook groups (not pages). Please help!

>>> json_data = facebook_ai.scrap_to_json()
[WDM] - Driver [C:\Users\Jiayi\.wdm\drivers\geckodriver\win64\v0.29.0\geckodriver.exe] found in cache
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\scraper.py", line 56, in scrap_to_json
    self.__start_driver()
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\scraper.py", line 52, in __start_driver
    self.__driver = Initializer(self.browser).init()
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\driver_initialization.py", line 48, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\driver_initialization.py", line 42, in set_driver_for_browser
    return webdriver.Firefox(executable_path=GeckoDriverManager().install(),options=self.set_properties(browser_option))
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 170, in __init__
    RemoteWebDriver.__init__(
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response       
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line

No module named 'seleniumwire' error

I encounter the error shown below when trying to execute the driver_initialization.py file. I have tried installing both selenium and seleniumwire using pip:

!pip install seleniumwire
!pip install selenium

But I am still unable to resolve this error:

#import Facebook_scraper class from facebook_page_scraper
----> 2 from facebook_page_scraper import Facebook_scraper
3
4 #instantiate the Facebook_scraper class
5

1 frames

/content/facebook_page_scraper/facebook_page_scraper/driver_initialization.py in
1 #!/usr/bin/env python3
2
----> 3 from seleniumwire import webdriver
4 # to add capabilities for chrome and firefox, import their Options with different aliases
5 from selenium.webdriver.chrome.options import Options as ChromeOptions

ModuleNotFoundError: No module named 'seleniumwire'

issue with geckodriver when running inside a dockerized python app

I work with a FastAPI application. It worked for me locally, but when I tried to dockerize the app I got this exception:

  raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command li

It seems to be an exception related to the path, possibly of geckodriver.

TypeError: __init__() got an unexpected keyword argument 'proxy'

Hi guys, I'm trying to work with a proxy.
And I get this error:
TypeError: __init__() got an unexpected keyword argument 'proxy'

my setting:

page_name = "FacebookAI"
posts_count = 25
browser = "chrome"
proxy = "proxyIP:9999" #if proxy requires authentication then user:password@IP:PORT
facebook_ai = Facebook_scraper(page_name,posts_count,browser,proxy=proxy)

Getting data of META AI

I am changing the page name, but I still get data for META AI. Could you please resolve this issue?

Does not scrape reactions

Hi, I tried using your scraper.

However, it does not seem to scrape the reactions (the emoticons) accurately.

The key does show up, but the value is always zero, even where the post on Facebook does have reactions.

List index out of range

I followed the GitHub documentation, and nothing more, to get posts, and I am encountering this error:

File "\facebook_page_scraper-4.0.1-py3.11.egg\facebook_page_scraper\element_finder.py", line 374, in __accept_cookies
button[-1].click()
~~~~~~^^^^
IndexError: list index out of range

With Firefox as the browser.
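The traceback points at button[-1].click() running on an empty list when no cookie dialog is present. A hypothetical defensive pattern (not part of the library) that avoids the IndexError:

```python
def click_last_if_any(elements):
    """Click the last element only when the list is non-empty.

    Guards against the IndexError raised by button[-1].click() when the
    cookie dialog (and hence the button list) is absent. Returns True
    if a click happened, False otherwise.
    """
    if not elements:
        return False
    elements[-1].click()
    return True
```

Inside __accept_cookies this would be used as click_last_if_any(button) instead of calling button[-1].click() unconditionally.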

README.MD FileNotFound on install

When installing 0.1.8 either through pip3 or setup.py, it fails with a FileNotFound error on "README.MD".

Workaround - rename README.md to README.MD.

delete first json's element

The result I get after scraping is this:
[{"1766004853797710": {"username": "2M.ma", "shares": 0, "likecount": 101, "replycount": 0,
I want to remove the outer key, which in this case is 1766004853797710, and instead include it inside the post as "id_post": 1766004853797710, next to username, likecount, etc.
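A small post-processing sketch for this transformation, assuming we operate on the JSON object mapping post IDs to post data (the object inside the printed list above); flatten_posts and the id_post field name are hypothetical, matching the request:

```python
import json

def flatten_posts(json_str):
    """Move each post's outer key inside the post as "id_post".

    Assumes json_str decodes to an object keyed by post ID; returns a
    list of post dicts, each carrying its former key as "id_post".
    """
    data = json.loads(json_str)
    return [{"id_post": pid, **post} for pid, post in data.items()]
```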

error at find_elements method : local variable 'status_link' referenced before assignment

I just started getting this error message today and can't find a way around it. I need the data for my research, so any help will be really appreciated.

I ran the code below
#import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "metaai"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

SSL_CTX_set_ecdh_auto Error

I'm trying to get the example in the README to work. I first hit the SSL issue and downgraded pyOpenSSL to version 21.0.0. I then ran the example code with the only change being the browser type: I used "chrome" instead of "firefox". I got the error below. I'm on a Mac (OS 10.15.7), Intel hardware, running Python 3.11 with the following packages installed:

async-generator 1.10
attrs 22.1.0
beautifulsoup4 4.11.1
blinker 1.5
certifi 2022.9.24
cffi 1.15.1
charset-normalizer 2.1.1
colorama 0.4.6
configparser 5.3.0
crayons 0.4.0
cryptography 38.0.3
facebook-page-scraper 4.0.1
google 3.0.0
h11 0.14.0
h2 4.1.0
hpack 4.0.0
html5lib 1.1
hyperframe 6.0.1
idna 3.4
kaitaistruct 0.10
outcome 1.2.0
pip 22.3.1
pyasn1 0.4.8
pycparser 2.21
pyOpenSSL 21.0.0
pyparsing 3.0.9
PySocks 1.7.1
python-dateutil 2.8.2
requests 2.28.1
selenium 4.1.0
selenium-wire 4.3.1
setuptools 65.5.0
six 1.16.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.3.2.post1
termcolor 2.1.0
trio 0.22.0
trio-websocket 0.9.2
urllib3 1.26.12
urllib3-secure-extra 0.1.0
webdriver-manager 3.2.2
webencodings 0.5.1
wsproto 1.2.0

Also, I'm trying to scrape information from the top-level page, specifically the email address. Can this library do that?

[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Get LATEST driver version for 107.0.5304
[WDM] - There is no [mac64] chromedriver for browser 107.0.5304 in cache
[WDM] - Get LATEST driver version for 107.0.5304
[WDM] - Trying to download new driver from http://chromedriver.storage.googleapis.com/107.0.5304.62/chromedriver_mac64.zip
[WDM] - Driver has been saved in cache [/Users/bwright/.wdm/drivers/chromedriver/mac64/107.0.5304.62]
127.0.0.1:64896: Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 113, in handle
root_layer()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/modes/http_proxy.py", line 9, in __call__
layer()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 285, in __call__
layer()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http1.py", line 100, in __call__
layer()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 205, in __call__
if not self._process_flow(flow):
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 304, in _process_flow
return self.handle_regular_connect(f)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 223, in handle_regular_connect
layer()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 278, in __call__
self._establish_tls_with_client_and_server()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 358, in _establish_tls_with_client_and_server
self._establish_tls_with_server()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 445, in _establish_tls_with_server
self.server_conn.establish_tls(
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 290, in establish_tls
self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 382, in convert_to_tls
context = tls.create_client_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 276, in create_client_context
context = _create_ssl_context(
^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 163, in _create_ssl_context
context = SSL.Context(method)
^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/OpenSSL/SSL.py", line 674, in __init__
res = _lib.SSL_CTX_set_ecdh_auto(context, 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'lib' has no attribute 'SSL_CTX_set_ecdh_auto'
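For context, this traceback matches a widely reported incompatibility rather than a bug specific to this script: cryptography 38.x removed the SSL_CTX_set_ecdh_auto binding that seleniumwire's vendored mitmproxy still reaches through pyOpenSSL 21.x (the package list above shows pyOpenSSL 21.0.0 alongside cryptography 38.0.3). A workaround users commonly report, offered here as a sketch for this environment rather than an official fix, is pinning cryptography below 38 to match the downgraded pyOpenSSL:

```text
# requirements pin (reported workaround, not an official fix)
pyOpenSSL==21.0.0
cryptography<38
```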

PHP version?

Without going too much into the background, Facebook has disabled the 'Like' button plugin (for 3rd party websites) in Europe except for users who are logged in and have consented to the relevant cookies.

After two years, Facebook has failed to come up with an alternative (such as a simple link showing the number of followers, as Twitter has).

Small businesses need to use social media to keep pace. A simple 'Like on Facebook' button showing the number of followers/likes is all they need. But Facebook has taken that away out of mindless self-interest (probably disgruntlement at the court rulings, and perhaps whilst continuing to collect data illegally). They do, however, provide a 'brand asset pack' which, in conjunction with your scraper, could be used to recreate the same, with the bonus of not leaking information to Facebook.

However, you've used Python, which is not so convenient to incorporate into a web application, particularly portably as a library. Would it be easy to port the Python code to PHP?

No posts were found - with newest version

Hey,

I get this error:
2022-11-07 20:10:22,447 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

running: python3 posts.py

posts.py file (unchanged from readme.md suggestion):
#import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "metaai"
posts_count = 10
browser = "chrome"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)

Tried the master and 4.x branches with the same result. I checked the code, and I can't find that CSS selector on Facebook myself either, so will this still work, or did Facebook change everything? Thanks

At Import: AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'

When trying to import the module, I get the following:

>>> from facebook_page_scraper import Facebook_scraper as fbscrape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/USER/.local/lib/python3.9/site-packages/facebook_page_scraper/__init__.py", line 1, in <module>
    from .driver_initialization import Initializer
  File "/home/USER/.local/lib/python3.9/site-packages/facebook_page_scraper/driver_initialization.py", line 3, in <module>
    from seleniumwire import webdriver
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/webdriver.py", line 13, in <module>
    from seleniumwire import backend
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/server.py", line 4, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/handler.py", line 5, in <module>
    from seleniumwire import har
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 9, in <module>
    from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 43, in <module>
    "SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'

Is this a known issue? Searching the internet, I didn't find any useful solutions...

Thanks in advance
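SSLv2_METHOD was removed in pyOpenSSL 22.0.0, and seleniumwire's vendored mitmproxy still references it at import time, which matches the traceback above. A workaround commonly reported for this, offered as a sketch rather than an official fix, is pinning pyOpenSSL below 22 (and, to avoid a related SSL_CTX_set_ecdh_auto error from newer cryptography builds, cryptography below 38):

```text
# requirements pin (reported workaround, not an official fix)
pyOpenSSL<22
cryptography<38
```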
