Tutorial for building SEO Monitoring System using Python, Celery, and a SERP Scraper API

License: MIT License

Scraping Experts - Building SEO Monitoring System using Python, Celery, and a SERP Scraper API

Video

Building an SEO Monitoring System with Python, Celery, and SERP Scraper API

Abstract

This solution is based on the data engineering principles of data ingestion and processing with a combination of remote calls for data enrichment.

The features are as follows:

  • Accepts CSV or XLSX files as an input for keyword SERP scraping
  • Moves the input file to a separate directory once it has been processed
  • Cleans the input keywords and prepares them to be submitted to the Oxylabs SERP Scraper API
  • Uses Celery to send parallel requests to the SERP Scraper API (see the Docker Compose configuration for how the --autoscale parameter is used)
  • Aggregates the responses in the same order in which the tasks were submitted to the Celery worker
  • Applies retries and a timeout to the Celery tasks
  • Authenticates each request to the SERP Scraper API
  • Produces a new output file (CSV or XLSX) with the results from the SERP Scraper API
  • Continuously watches for a new input file to be added for processing
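
The order-preserving aggregation listed above can be illustrated with a minimal sketch. This is not the project's Celery code: it swaps in the standard library's ThreadPoolExecutor as a stand-in for the Celery workers, and fetch_serp is a hypothetical placeholder for the remote SERP Scraper API call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_serp(keyword: str) -> dict:
    """Hypothetical stand-in for the remote SERP Scraper API call."""
    return {"keyword": keyword, "length": len(keyword)}


def scrape_in_order(keywords: list[str], max_workers: int = 4) -> list[dict]:
    """Run the calls in parallel and return results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(fetch_serp, keywords))


if __name__ == "__main__":
    for row in scrape_in_order(["sample1", "sample2", "other"]):
        print(row)
```

Because the results come back in input order, each response can be matched to its keyword by position when the output file is written.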

Installation

This project uses Python 3.10.x and runs in a virtual environment (venv), so make sure Python 3.10 is installed on your local system.

Credentials and configuration

To configure the application, copy the bundled dist.env file to .env and update the parameters as needed (see the Oxylabs SERP Scraper API docs):

SERP configuration

Local directories and file watcher poll interval (in seconds)

  • INPUT_KEYWORDS="./input" (directory where keyword input files are placed)
  • INPUT_PROCESSED="./input/processed" (directory where processed input files are moved)
  • OUTPUT_KEYWORDS="./output" (directory where result output files are written)
  • OUTPUT_FILE_TYPE=xlsx (output file type: CSV or XLSX)
  • OUTPUT_FILE_NAME=keywords_serps (name of the output file)
  • INPUT_POLL_TIME=5 (seconds to wait between checks for new input files)

SERP Scraper API authentication

  • OXY_SERPS_AUTH_USERNAME=XXXXX
  • OXY_SERPS_AUTH_PASSWORD=YYYYY
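
As a rough illustration of how these parameters might be consumed, the sketch below reads them from environment variables into a typed object. It assumes the values from .env have already been exported into the process environment; the Settings class and load_settings function are hypothetical names, and the fallback defaults simply mirror the sample values listed above rather than the application's real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    input_keywords: str
    input_processed: str
    output_keywords: str
    output_file_type: str
    output_file_name: str
    input_poll_time: int
    auth_username: str
    auth_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    file_type = env.get("OUTPUT_FILE_TYPE", "xlsx").lower()
    if file_type not in {"csv", "xlsx"}:
        raise ValueError(f"OUTPUT_FILE_TYPE must be csv or xlsx, got {file_type!r}")
    return Settings(
        input_keywords=env.get("INPUT_KEYWORDS", "./input"),
        input_processed=env.get("INPUT_PROCESSED", "./input/processed"),
        output_keywords=env.get("OUTPUT_KEYWORDS", "./output"),
        output_file_type=file_type,
        output_file_name=env.get("OUTPUT_FILE_NAME", "keywords_serps"),
        # Environment values are strings, so the poll time is converted.
        input_poll_time=int(env.get("INPUT_POLL_TIME", "5")),
        # Credentials have no default: a missing value raises KeyError.
        auth_username=env["OXY_SERPS_AUTH_USERNAME"],
        auth_password=env["OXY_SERPS_AUTH_PASSWORD"],
    )
```

Failing early on missing credentials or an unsupported file type surfaces configuration mistakes before any request is sent.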

Local (Mac)

  1. Check out the scraping-experts-seo-monitoring source
  2. Run: cd scraping-experts-seo-monitoring
  3. Run: python3.10 -m venv venv
  4. Run: source venv/bin/activate
  5. Run: pip install --upgrade pip wheel setuptools
  6. Run: pip install -r requirements.txt

The word tokenizer also requires NLTK data packages. After installing the project, download them as follows:

  1. Run: cd scraping-experts-seo-monitoring
  2. Run: source venv/bin/activate
  3. Run: python (this opens the interactive Python prompt)
  4. Run: import nltk; nltk.download('punkt')
  5. Run: import nltk; nltk.download('stopwords')
  6. Use CTRL+D to exit the Python prompt

Now you should be able to develop the project locally in your favourite IDE.

Docker (using Docker Compose)

  1. Check out the scraping-experts-seo-monitoring source
  2. Run: cd scraping-experts-seo-monitoring
  3. Run: docker-compose build
  4. Run: docker-compose up -d --scale worker=5 && docker-compose logs -f
  5. To stop the running services, exit the log view with CTRL+C and run: docker-compose down

INPUT file

Place the input keywords file at the root of the /input directory. The application checks this directory for new files every INPUT_POLL_TIME seconds and starts processing a file as soon as it finds one.
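
A polling watcher of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the project's implementation: process_pending, watch, and the handle callback are hypothetical names, and only the standard library is used.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def process_pending(
    input_dir: Path, processed_dir: Path, handle: Callable[[Path], None]
) -> list[str]:
    """Handle every CSV/XLSX file at the root of input_dir, then move it away."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    done = []
    # iterdir() is not recursive, so files already moved to a nested
    # "processed" directory are not picked up again.
    for path in sorted(input_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in {".csv", ".xlsx"}:
            handle(path)
            shutil.move(str(path), str(processed_dir / path.name))
            done.append(path.name)
    return done


def watch(
    input_dir: Path,
    processed_dir: Path,
    handle: Callable[[Path], None],
    poll_seconds: int = 5,
) -> None:
    """Check for new files forever, sleeping poll_seconds between checks."""
    while True:
        process_pending(input_dir, processed_dir, handle)
        time.sleep(poll_seconds)
```

Moving each file only after handle returns means a file whose processing raised an exception stays in the input directory and is retried on the next poll.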

The application expects the XLSX (or CSV) file to have the following format:

XLSX

Keyword
sample1
sample2
other

CSV (with header)

keyword
sample1
sample2
other
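
Reading the CSV variant of this format can be sketched with the standard csv module. The read_keywords function is a hypothetical helper, and the case-insensitive header check is an assumption made so that both "Keyword" and "keyword" are accepted.

```python
import csv
import io
from typing import TextIO


def read_keywords(handle: TextIO) -> list[str]:
    """Return the non-empty values of the first column below the header."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if not header or header[0].strip().lower() != "keyword":
        raise ValueError("expected a 'keyword' header in the first column")
    keywords = []
    for row in reader:
        # Skip blank lines and rows with an empty first cell.
        if row and row[0].strip():
            keywords.append(row[0].strip())
    return keywords


if __name__ == "__main__":
    sample = io.StringIO("keyword\nsample1\nsample2\nother\n")
    print(read_keywords(sample))  # ['sample1', 'sample2', 'other']
```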
