
uschtwill / re-employment-kraken


re-employment-kraken scrapes (job) sites, remembers what it saw and notifies downstream systems of any new sightings.

Languages: JavaScript 98.89%, Dockerfile 1.11%
Topics: crawling, javascript, scraping

re-employment-kraken's People

Contributors

isensee-bastian, uschtwill

Forkers

isensee-bastian

re-employment-kraken's Issues

Add Telegram bot notification strategy

I suggest Telegram as an alternative way of receiving project alerts for users who are not already using Slack or Notion. The process is simple and straightforward:

  • Pick a library providing Telegram bot API access.
  • Add config settings for enabling or disabling the strategy, the Telegram user ID, and the Telegram bot token.
  • Use the Telegram bot client to send new projects to the configured user.
  • Document how to activate Telegram support in README file.
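
The steps above could be sketched roughly as follows. This is only an illustration, not the implementation from the pull request: instead of picking a client library it calls the Telegram Bot API `sendMessage` method over plain HTTP, and the function names, the config shape (`enabled`, `botToken`, `chatId`) and the message format are assumptions. It relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Minimal sketch of a Telegram notification strategy.
// Assumptions (not from the project): function names, config shape and
// message format. Uses the global fetch of Node.js 18+.

const TELEGRAM_API_BASE = 'https://api.telegram.org';

// Builds the URL and request options for the Bot API sendMessage method.
function buildSendMessageRequest({ botToken, chatId }, text) {
  return {
    url: `${TELEGRAM_API_BASE}/bot${botToken}/sendMessage`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ chat_id: chatId, text }),
    },
  };
}

// Sends one message per new project and returns the number of messages sent.
// fetchImpl is injectable so the function can be exercised without network access.
async function notifyTelegram(config, projects, fetchImpl = fetch) {
  if (!config.enabled) return 0;
  let sent = 0;
  for (const project of projects) {
    const text = `${project.title}\n${project.url}`;
    const { url, options } = buildSendMessageRequest(config, text);
    const response = await fetchImpl(url, options);
    const payload = await response.json();
    if (!payload.ok) {
      // Surface Bot API errors (for example error_code 429) to the caller.
      throw Object.assign(new Error(payload.description), payload);
    }
    sent += 1;
  }
  return sent;
}
```

Sending one message per project keeps each alert short, but it also means a large batch of new projects results in many API calls in a row.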

Note: I will provide a pull request for this as I am interested in using this feature.

Handle '429 too many requests' error gracefully when scraping jobs

How to reproduce:

  • Configure a search term with many results like java
  • Configure plugins freelance-de and freelancer-map-de (could happen with other strategies, too)
  • Use a fresh DB by either removing the old one, changing the path or setting DATABASE_ENABLED to false

Observed error:

An unhandled Promise rejection occurred: {
  ok: false,
  error_code: 429,
  description: 'Too Many Requests: retry after 4',
  parameters: { retry_after: 4 }
}

This happens many times. We should think about adding a backoff delay or some kind of rate limiting to prevent these errors. Right now, we only have an unhandled promise rejection handler in index.js to help with debugging.
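
One possible shape for the backoff delay is sketched below. It is only a starting point for discussion: the helper name `withRateLimitRetry` and its options are hypothetical, and the error shape (`error_code` and `parameters.retry_after`, in seconds) is taken from the log above. If the server states a `retry_after` value, the sketch waits that long; otherwise it falls back to exponential backoff.

```javascript
// Minimal sketch of retrying a rate-limited call with a backoff delay.
// Assumptions: helper name and options are hypothetical; the error shape
// follows the rejection logged in this issue.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRateLimitRetry(
  task,
  { maxRetries = 5, baseDelayMs = 1000, sleepFn = sleep } = {}
) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      const rateLimited = Boolean(error) && error.error_code === 429;
      // Give up on other errors, or once the retry budget is used up.
      if (!rateLimited || attempt >= maxRetries) throw error;
      const retryAfterSec = error.parameters && error.parameters.retry_after;
      // Prefer the server's hint; otherwise double the delay on each attempt.
      const delayMs = Number.isFinite(retryAfterSec)
        ? retryAfterSec * 1000
        : baseDelayMs * 2 ** attempt;
      await sleepFn(delayMs);
    }
  }
}
```

A call site would wrap whatever currently rejects, for example `await withRateLimitRetry(() => sendNotification(project))`, where `sendNotification` stands in for the real function. Retrying alone does not reduce the request rate, so a fixed pause between calls may still be worth adding.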

Build: Enable running as a Docker container

When deploying the application to a server, using a container is preferable since it brings all the required dependencies and runs in isolation.

For this purpose, I suggest the following changes:

  • Provide a Dockerfile for packaging the application as a container image.
  • Mount the .env and database files as Docker volumes to preserve them between application runs.
  • Provide a Docker Compose file as a simple option for running the container with volumes using default paths.
  • Document how to build and run the application as a container.

I will provide a pull request for this suggestion soon.

Performance: Replace text file persistence with a real database

First things first: Nice project, thanks for your effort!

I have seen that the current persistence solution uses plain text files and checks for substrings. While this is good enough for a prototype, every lookup has to scan the whole file, so it will become a performance issue once the application has been in use long enough.

Therefore, I suggest that we:

  • Include a lightweight embedded database like SQLite.
  • Store visited projects in a simple table and set an index on the URL column.
  • Query the database table to determine whether a project is already known.
  • Remove the text file solution.
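
To make the suggestion more concrete, here is a rough sketch of what such a store might look like. The table name, column names and function names are assumptions. The `db` argument is expected to expose `exec()` and `prepare()`, with `run()` and `get()` on prepared statements, which is the shape offered by the better-sqlite3 package; the sketch does not import a driver itself. Declaring `url` as the primary key makes SQLite create the index on that column.

```javascript
// Minimal sketch of a SQLite-backed store for visited projects.
// Assumptions: table, column and function names are hypothetical. `db` is
// any object with exec() and prepare(), where statements offer run()/get().

function createProjectStore(db) {
  // The PRIMARY KEY on url provides the index used for lookups.
  db.exec(`
    CREATE TABLE IF NOT EXISTS projects (
      url TEXT PRIMARY KEY,
      first_seen TEXT NOT NULL
    )
  `);

  const insert = db.prepare(
    'INSERT OR IGNORE INTO projects (url, first_seen) VALUES (?, ?)'
  );
  const find = db.prepare('SELECT 1 AS found FROM projects WHERE url = ?');

  return {
    // True if the URL has been stored before.
    isKnown(url) {
      return find.get(url) !== undefined;
    },
    // Records a URL; storing the same URL twice is a no-op.
    remember(url, now = new Date()) {
      insert.run(url, now.toISOString());
    },
  };
}
```

With better-sqlite3, the store could be created with something like `createProjectStore(new Database('projects.db'))`, where the file name is only a placeholder.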

If we want to be extra convenient, we could add a migration script for moving existing text file content into the database. However, this might be over-the-top.
