This project scrapes GitHub's Topics pages, retrieves the repositories listed under each topic, and writes a separate CSV file of repository data for each topic. The main steps are:
- Scraping the GitHub Topics page to get a list of topics.
- For each topic, gathering topic title, URL, and description.
- For each topic, collecting the top 25 repositories from the topic page.
- Extracting repository name, URL, username, and star count for each repository.
- Generating separate CSV files for each topic with repository information.
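The parsing steps above can be sketched with Beautiful Soup. Note that the CSS class names (`topic-card`, `topic-title`, `topic-desc`) are illustrative placeholders rather than GitHub's actual markup, and the `parse_star_count` helper assumes star counts displayed in GitHub's abbreviated form (e.g. `3.5k`):

```python
# Sketch of the topic-parsing step, assuming Beautiful Soup is installed.
# The selectors below are hypothetical examples, not GitHub's real class names.
from bs4 import BeautifulSoup

BASE_URL = "https://github.com"

def parse_topics(html):
    """Extract topic title, URL, and description from a topics page."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for card in soup.select("a.topic-card"):  # hypothetical selector
        topics.append({
            "title": card.select_one(".topic-title").get_text(strip=True),
            "url": BASE_URL + card["href"],
            "description": card.select_one(".topic-desc").get_text(strip=True),
        })
    return topics

def parse_star_count(text):
    """Convert a displayed star count such as '3.5k' or '912' to an int."""
    text = text.strip().lower()
    if text.endswith("k"):
        return int(float(text[:-1]) * 1000)
    return int(text.replace(",", ""))
```

The same pattern (select a container element, then pull out child elements by class) applies to the repository rows on each topic page.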
To use this project:
- Clone the repository to your local machine.
- Install the required libraries listed in `requirements.txt` using `pip install -r requirements.txt`.
- Run the script `scrape_github_topics.py` to initiate the scraping process.
- The script will create a CSV file for each topic in the `output` directory.
- Each CSV file will contain repository information with columns: "Repo Name", "Username", "Stars", "Repo URL".
- Python - The programming language used for the project.
- Beautiful Soup - Python library for web scraping.
- Requests - Python library for making HTTP requests.
Contributions are welcome! If you have suggestions, improvements, or bug fixes, feel free to open an issue or a pull request.
This project is licensed under the MIT License.
Special thanks to the contributors and libraries that made this project possible.