This script allows you to extract links from a website by providing its URL. It uses the `requests` and `bs4` libraries to retrieve the page's source code and parse it to extract the URLs.
- Import the required libraries:

```python
import requests
import bs4
```
- Prompt the user for input:

```python
url = input("Enter the URL (starting with https://): ")
```
The user needs to provide the URL of the website from which they want to extract the links.
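Since `requests` needs a fully qualified URL, a quick sanity check on the user's input can catch typos early. A minimal sketch of such a check (an assumption, not part of the original script; the hard-coded `url` here just stands in for the `input()` call above):

```python
# Hypothetical validation step -- not part of the original script.
url = "https://example.com"  # stands in for: url = input("Enter the URL ...: ")

# Fail fast on input that requests cannot fetch.
if not url.startswith(("http://", "https://")):
    raise ValueError("URL must start with http:// or https://")
print("URL looks OK:", url)  # → URL looks OK: https://example.com
```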
- Define the `extract_links` function:

```python
def extract_links(url):
    site_data = requests.get(url)
    data = bs4.BeautifulSoup(site_data.text, "lxml")
    links = data.find_all('a')
    for link in links:
        href = link.get('href')  # .get() avoids a KeyError on <a> tags without href
        if href:
            print(href)
```
This function takes a URL as input, retrieves the website's source code using `requests.get()`, and parses it using `BeautifulSoup` from `bs4`. It selects all the `<a>` tags on the page and prints the URL found in each tag's `href` attribute.
- Call the `extract_links` function with the provided URL:

```python
extract_links(url)
```
This will execute the function and print the extracted URLs.
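To see what the parsing step produces without making a network request, here is a self-contained sketch of the same technique applied to an in-memory HTML snippet. It uses the built-in `html.parser` backend (so this demo needs no `lxml` install) and `urljoin` to resolve relative links; the HTML snippet and `base_url` are made-up example values:

```python
import bs4
from urllib.parse import urljoin

# A small in-memory page, so the example runs without network access.
html = """
<html><body>
  <a href="https://example.com/about">About</a>
  <a href="/contact">Contact</a>
  <a name="anchor-without-href">Top</a>
</body></html>
"""

base_url = "https://example.com/"  # made-up base for resolving relative links
soup = bs4.BeautifulSoup(html, "html.parser")  # html.parser needs no extra install

links = []
for tag in soup.find_all('a'):
    href = tag.get('href')  # .get() returns None instead of raising KeyError
    if href:
        links.append(urljoin(base_url, href))  # turn relative links into absolute

print(links)  # → ['https://example.com/about', 'https://example.com/contact']
```

Note how the `<a>` tag without an `href` attribute is skipped cleanly, and the relative `/contact` link is resolved against `base_url`.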
Before running the script, make sure you have the following libraries installed:

- requests
- bs4
- lxml (required by the "lxml" parser used above)

You can install them by running the following command:

```shell
pip install requests bs4 lxml
```
This script was created by Vinay Ghate.