miningape / huntress


A job scheduler with materialise and web-scraping jobs, currently used to scan rental prices on Copenhagen's housing market.

JavaScript 1.54% TypeScript 94.39% Dockerfile 4.07%

Huntress

Huntress is a job scheduler built to scale and to run complex workflows from simple configuration. For now the "frontend" is a Postgres database, but complex tasks can still be configured easily. Jobs can be "orchestrated": a job can wait for another job to complete before it runs. This allows complex data flows in which the order of completion is not known in advance.

Each job is described by a JSON job definition that sets the job type, its schedule, and any other parameters it needs. Huntress uses a microservice architecture so that it can be scaled out for large workloads.
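To make the idea of a JSON job definition concrete, here is a minimal sketch in TypeScript. The field names (`name`, `type`, `schedule`, `dependsOn`, `params`), the cron-style schedule string, and the `parseJobDefinition` helper are all illustrative assumptions; the actual Huntress schema may look quite different.

```typescript
// Hypothetical job definition shape. Every field name here is an assumption
// made for illustration, not the real Huntress schema.
type JobType = "pipeline" | "materialise";

interface JobDefinition {
  name: string;
  type: JobType;
  schedule: string; // assumed to be a cron-style expression
  dependsOn: string[]; // names of jobs that must complete first
  params: Record<string, unknown>;
}

function isJobType(value: unknown): value is JobType {
  return value === "pipeline" || value === "materialise";
}

// Parse and validate a JSON string, throwing on anything malformed.
function parseJobDefinition(json: string): JobDefinition {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("job definition must be a JSON object");
  }
  const { name, type, schedule, dependsOn, params } = raw as Record<string, unknown>;

  if (typeof name !== "string" || name === "") {
    throw new Error("job definition needs a non-empty name");
  }
  if (!isJobType(type)) {
    throw new Error("unknown job type");
  }
  if (typeof schedule !== "string") {
    throw new Error("job definition needs a schedule");
  }

  // Optional fields fall back to empty values.
  const deps = dependsOn === undefined ? [] : dependsOn;
  if (!Array.isArray(deps) || !deps.every((d) => typeof d === "string")) {
    throw new Error("dependsOn must be a list of job names");
  }
  const extra = params === undefined ? {} : params;
  if (typeof extra !== "object" || extra === null || Array.isArray(extra)) {
    throw new Error("params must be an object");
  }

  return {
    name,
    type,
    schedule,
    dependsOn: deps as string[],
    params: extra as Record<string, unknown>,
  };
}
```

Validating at the boundary like this means a worker never has to guess whether a definition read from the database is well formed.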

  • Job Scheduler
    • Materialised Views
    • Web Scraper
  • Orchestrate jobs
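
Orchestration, where a job waits for other jobs to complete, can be sketched as a simple readiness check. This is only an illustration: the `OrchestratedJob` shape and the `readyJobs` function are hypothetical names, and Huntress may track completion in a different way (for example in Postgres).

```typescript
// Hypothetical shape: a job and the names of the jobs it waits for.
interface OrchestratedJob {
  name: string;
  dependsOn: string[];
}

// Return the names of jobs that have not run yet and whose
// dependencies have all completed.
function readyJobs(jobs: OrchestratedJob[], completed: Set<string>): string[] {
  return jobs
    .filter((job) => !completed.has(job.name))
    .filter((job) => job.dependsOn.every((dep) => completed.has(dep)))
    .map((job) => job.name);
}

// Example workflow: scrape, then refresh a materialised view, then notify.
const workflow: OrchestratedJob[] = [
  { name: "scrape", dependsOn: [] },
  { name: "materialise", dependsOn: ["scrape"] },
  { name: "notify", dependsOn: ["materialise"] },
];
```

Calling `readyJobs(workflow, new Set())` yields only `scrape`; once `scrape` is in the completed set, `materialise` becomes runnable, and so on. The scheduler never needs to know the completion order in advance.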

The Dockerfile is finicky! Chromium (used by the pipeline-worker / webscraper module) and Docker do not play nicely together, which exposes odd quirks: the underlying architecture (e.g. ARM) can determine whether the app runs at all.

Roadmap:

  • orchestration
  • "jobs" / non pipeline
    • Made workers (see below)
  • notifier
  • break into many services
  • scan for specific conditions
  • scan other websites
  • remove dead listings
  • frontend to see statuses / manage jobs
  • Cycle detection (make sure orchestrated jobs do not have an infinite run time - there should always be a start and end and no loops)
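
The cycle detection item could be approached with a depth-first search over the job dependency graph. The sketch below is one possible approach, not the project's implementation; `DependencyGraph` and `findCycle` are hypothetical names, and the graph is assumed to map each job name to the names of the jobs it waits for.

```typescript
// Assumed representation: job name -> names of jobs it depends on.
type DependencyGraph = Record<string, string[]>;

// Depth-first search that returns one cycle as a list of job names
// (first and last entry are the same job), or null if there is none.
function findCycle(graph: DependencyGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (job: string): string[] | null => {
    if (state.get(job) === "done") return null;
    if (state.get(job) === "visiting") {
      // Reached a job that is still on the current path: that is a loop.
      return stack.slice(stack.indexOf(job)).concat(job);
    }
    state.set(job, "visiting");
    stack.push(job);
    for (const dep of graph[job] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(job, "done");
    return null;
  };

  for (const job of Object.keys(graph)) {
    const cycle = visit(job);
    if (cycle) return cycle;
  }
  return null;
}
```

Running a check like this when a job definition is saved would reject a workflow such as `a -> b -> c -> a` before it is ever scheduled, so every orchestrated flow keeps a clear start and end.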

Planned Workers:

  • pipeline (streaming data: source -> destination)
  • materialise (refresh complex and large tables in postgres as needed)
  • notify
  • analyse (ai / analytics to find desired data)
  • generic (point at a docker container online)

Planned Integrations For pipeline:

  • Files (pipeline source / destination)
  • Postgres (pipeline source / destination)
  • BoligPortal.dk (pipeline source)
  • BoligZonen.dk (pipeline source)
  • FindBoliger.dk (pipeline source)
  • Generic / simple scraper for anything

Possible Ideas:

  • refactor materialise to generic job
  • docker worker (reads any git repo and runs the dockerfile)
