web-scraping's Introduction

Python Web Scraping

Install Dependencies

python3 packages
pip3 install Pillow requests beautifulsoup4 selenium webdriver-manager opencv-python numpy
pip3 install pytest pytest-cov pytest-xdist pytest-html
# or
pip3 install -r requirements.txt

pip3 may install the pytest executable into ~/.local/bin, which is not always on the system's PATH. If the terminal reports that the pytest command cannot be found, add that directory to PATH as follows.

vim ~/.bashrc
# add this line to .bashrc
# export PATH=$PATH:$HOME/.local/bin
source ~/.bashrc
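
To confirm whether the PATH change is actually needed, a small check like the one below can help. It is a minimal sketch using only the standard library; the helper names `locate_tool` and `user_bin_on_path` are made up for illustration and are not part of this project.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def locate_tool(name: str = "pytest") -> Optional[str]:
    """Return the full path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


def user_bin_on_path() -> bool:
    """Report whether ~/.local/bin (pip's per-user script directory on Linux) is on PATH."""
    user_bin = Path.home() / ".local" / "bin"
    entries = [Path(p) for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return user_bin in entries


if __name__ == "__main__":
    print("pytest found at:", locate_tool("pytest"))
    print("~/.local/bin on PATH:", user_bin_on_path())
```

If editing .bashrc is not an option, running `python3 -m pytest` invokes pytest through the interpreter and does not depend on PATH.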

Chrome on Ubuntu-22.04
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install gdebi-core
sudo gdebi google-chrome-stable_current_amd64.deb

# test run
google-chrome
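
On a machine without a display, launching the browser window may not be practical. The sketch below checks the installation from Python instead by asking the binary for its version. The function name `chrome_version` is hypothetical, and the binary name `google-chrome` is assumed from the test-run command above.

```python
import shutil
import subprocess
from typing import Optional


def chrome_version(binary: str = "google-chrome") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # --version prints the version string and exits without opening a window.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(chrome_version() or "google-chrome not found on PATH")
```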

Unit Test

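pytest # run all tests
pytest -v # verbose output
pytest -vs # verbose output, with print output shown (no capture)
pytest --cov # test coverage report (pytest-cov)
pytest -v -n auto # run tests in parallel, one worker per available CPU (pytest-xdist)
pytest -v --html=report.html --self-contained-html # generate a single-file HTML report (pytest-html)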

Run

python3 run_scraper.py

Build executable file

  • Installation:

    • pip3 install pyinstaller
  • Build

    • pyinstaller -F run_scraper.py
      • -F, --onefile Create a one-file bundled executable.
    • output:
      • ubuntu: dist/run_scraper
      • windows: dist/run_scraper.exe
  • Notice

    • Keep the font file (SourceSerifPro-SemiBold.ttf) in the same directory as the executable when you run it:
      app_folder/
      ├── run_scraper
      └── SourceSerifPro-SemiBold.ttf
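
The layout above matters because a one-file PyInstaller build does not run from the directory of the original script. One common way to locate a file placed next to the executable is sketched below. This is an assumption about how such a lookup could be written, not a copy of what run_scraper.py does; `app_dir` and `font_path` are illustrative names.

```python
import sys
from pathlib import Path

FONT_NAME = "SourceSerifPro-SemiBold.ttf"  # file name taken from the note above


def app_dir() -> Path:
    """Directory containing the running program.

    PyInstaller sets sys.frozen on bundled executables, and sys.executable
    then points at the executable itself rather than the Python interpreter.
    """
    if getattr(sys, "frozen", False):
        return Path(sys.executable).resolve().parent
    # Not bundled: fall back to the directory of the script being run.
    return Path(sys.argv[0]).resolve().parent


def font_path() -> Path:
    """Expected location of the font file, next to the executable."""
    return app_dir() / FONT_NAME


if __name__ == "__main__":
    path = font_path()
    print(path, "(found)" if path.is_file() else "(missing)")
```

Running this from app_folder/ reports whether the font is where the executable would look for it under this scheme.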
