mrc-ide / rrq

:runner::runner::runner: Lightweight Redis queues

Home Page: https://mrc-ide.github.io/rrq/

License: Other

Languages: R 98.76%, Shell 0.58%, Makefile 0.48%, Dockerfile 0.18%
Topics: infrastructure, cluster

rrq's Introduction

rrq

Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Task queues for R, implemented using Redis.

Getting started

library(rrq)

Create an rrq_controller object

obj <- rrq_controller("rrq:readme")
rrq_default_controller_set(obj)

Submit work to the queue:

t <- rrq_task_create_expr(runif(10))
t
#> [1] "0e339d67495c33cde07d5c9784b24f31"

Query the task's status:

rrq_task_status(t)
#> [1] "PENDING"

Run tasks on workers in the background

rrq_worker_spawn()
#> ℹ Spawning 1 worker with prefix 'bossy_americanredsquirrel'
#> <rrq_worker_manager>
#>   Public:
#>     clone: function (deep = FALSE)
#>     id: bossy_americanredsquirrel_1
#>     initialize: function (controller, n, logdir = NULL, name_config = "localhost",
#>     is_alive: function (worker_id = NULL)
#>     kill: function (worker_id = NULL)
#>     logs: function (worker_id)
#>     stop: function (worker_id = NULL, ...)
#>     wait_alive: function (timeout, time_poll = 0.2, progress = NULL)
#>   Private:
#>     check_worker_id: function (worker_id)
#>     controller: rrq_controller
#>     logfile: /tmp/RtmpReX932/file920c215c4c48c/bossy_americanredsquir ...
#>     process: list
#>     worker_id_base: bossy_americanredsquirrel

Wait for tasks to complete

rrq_task_wait(t)
#> [1] TRUE

Retrieve results from a task

rrq_task_result(t)
#>  [1] 0.5660174 0.7272429 0.3753210 0.4081318 0.5124689 0.3461631 0.3997216
#>  [8] 0.7212593 0.9927757 0.3032974

Query what workers have done

rrq_worker_log_tail(n = Inf)
#>                     worker_id child       time       command
#> 1 bossy_americanredsquirrel_1    NA 1713860205         ALIVE
#> 2 bossy_americanredsquirrel_1    NA 1713860205         ENVIR
#> 3 bossy_americanredsquirrel_1    NA 1713860205         QUEUE
#> 4 bossy_americanredsquirrel_1    NA 1713860205    TASK_START
#> 5 bossy_americanredsquirrel_1    NA 1713860205 TASK_COMPLETE
#>                            message
#> 1
#> 2                              new
#> 3                          default
#> 4 0e339d67495c33cde07d5c9784b24f31
#> 5 0e339d67495c33cde07d5c9784b24f31

For more information, see vignette("rrq")

Installation

Install from the mrc-ide package repository:

drat:::add("mrc-ide")
install.packages("rrq")

Alternatively, install with remotes:

remotes::install_github("mrc-ide/rrq", upgrade = FALSE)

Testing

To test, we need a Redis server that can be connected to automatically using the redux defaults. This is satisfied if you have an unauthenticated Redis server running on localhost; otherwise, set the environment variable REDIS_URL to point at a Redis server. Do not use a production server, as the package will create and delete a lot of keys.
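For example, in a shell before running the tests (the URL shown is simply the redux default, given here as an illustration; adjust host and port for your setup):

```shell
# Tell redux (and therefore rrq's tests) which Redis server to use.
export REDIS_URL="redis://localhost:6379"
echo "$REDIS_URL"
```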

A suitable redis server can be started using docker with

./scripts/redis start

(and stopped with ./scripts/redis stop)
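If you prefer not to use the helper script, a throwaway server can be run with plain docker directly (a sketch; the container name here is arbitrary):

```shell
# Start a disposable Redis server on the default port, auto-removed on stop...
docker run --rm -d --name rrq-redis-test -p 6379:6379 redis
# ...and stop it when you are done testing:
docker stop rrq-redis-test
```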

License

MIT © Imperial College of Science, Technology and Medicine

rrq's People

Contributors: m-kusumgar, r-ash, richfitz, weshinsley, yapus

rrq's Issues

Issue with environment resolution while running in bulk mode

This is a stripped-down version of the bug reported by @cwhittaker1000.

with src.R containing:

thing <- R6::R6Class(
  "thing",
  public = list(fn = function(x) x + 1))

f <- function(x) {
  thing$new()$fn(x)
}
id <- ids::random_id(1, 4)
r <- rrq::rrq_controller$new(id)
r$envir(rrq::rrq_envir(sources = c("src.R")))
w1 <- rrq::rrq_worker$new(r$queue_id)

## Works fine:
t1 <- r$enqueue(f(1))
w1$step(TRUE)
r$task_result(t1)

## This works in a blocking worker:
source("src.R")
g1 <- r$lapply(1, f, timeout_task_wait = 0)
w1$step(TRUE)
r$task_result(g1$task_ids)

## Also works fine in a real worker:
w2 <- rrq::rrq_worker_spawn(r)
t2 <- r$enqueue(f(1))
r$task_wait(t2) # 2 as expected

## But this fails in a real worker:
g2 <- r$lapply(1, f, timeout_task_wait = 10)

The error object:

<rrq_task_error>
  from:   `2110a78728ee57f02b634a8498f1845c`(1)
  error:  object 'thing' not found
  queue:  ec732816
  task:   e90ce910790f9964ffbc6ededff04a6c
  status: ERROR
  * To throw this error, use stop() with it
  * This error has a stack trace, use '$trace' to see it

The trace is not very illuminating:

  1. ├─base::tryCatch(...)
  2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
  3. │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
  4. │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
  5. ├─base::withCallingHandlers(...)
  6. ├─base::eval(expr, envir)
  7. │ └─base::eval(expr, envir)
  8. │   └─global `2110a78728ee57f02b634a8498f1845c`(1)
  9. └─base::.handleSimpleError(...) at src.R:6:2
 10.   └─rrq (local) h(simpleError(msg, call))

Fix tests on appveyor

These might be timing related; I'm not sure. There are 14 failed and 55 ok, so at least some of this is working.

redis_command error WRONGTYPE Operation against a key holding the wrong kind of value in rrq_controller$worker_delete_exited()

I'm trying to understand who is supposed to clean up EXITED and LOST workers from the Redis database. After restarting the worker Docker container a few times, I see something like this:

> queue$worker_list_exited()
 [1] "unentertained_poodle"        "unanimating_spider"         
 [3] "indoor_appaloosa"            "marauding_yeti"             
 [5] "abdicative_bluebottle"       "coeducational_wrasse"       
 [7] "corncolored_cricket"         "piggish_lizard"             
 [9] "possible_clumber"            "craniometrical_yellowjacket"
[11] "semidry_coney"       

Calling queue$worker_delete_exited() then throws an error:
Error in redis_command(ptr, cmd) : 
  WRONGTYPE Operation against a key holding the wrong kind of value

Worker process dies with "This is an uncaught error in rrq, probably a bug!" message

Hello again. I have two workers processing a queue of tasks. After processing one task each, both workers die with the following error:

worker_1       | [2021-10-15 17:18:25] TASK_COMPLETE f94a41f63478ebddbd2ac156410f0daa                                                                                                                              
worker_1       | [2021-10-15 17:18:25] TASK_START 2cbf5520762384481743f061af6ae5bf                                                                                                                                 
worker_1       | [2021-10-15 17:18:25] HEARTBEAT stopping                                                                                                                                                          
worker_1       | [2021-10-15 17:18:25] STOP ERROR                                                                                                                                                                  
worker_1       | This is an uncaught error in rrq, probably a bug!                                                                                                                                                 
worker_1       | Error in self$location(hash, TRUE) : Some hashes were not found!                                                                                                                                  
worker_1       | Calls: rrq_worker_main ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>                                                                                                                 
worker_1       | In addition: Warning messages:                                                                                                                                                                    
worker_1       | 1: In max(thisIndexValues) :                                                                                                                                                                      
worker_1       |   no non-missing arguments to max; returning -Inf                                                                                                                                                 
worker_1       | 2: In max(thisIndexValues) :                                                                                                                                                                      
worker_1       |   no non-missing arguments to max; returning -Inf                                                                                                                                                 
worker_1       | 3: In max(thisIndexValues) :                                                          
worker_1       |   no non-missing arguments to max; returning -Inf                                     
worker_1       | 4: In max(thisIndexValues) :                                                          
worker_1       |   no non-missing arguments to max; returning -Inf                                     
worker_1       | 5: In max(thisIndexValues) :                                                          
worker_1       |   no non-missing arguments to max; returning -Inf                                     
worker_1       | 6: In max(thisIndexValues) :                                                          
worker_1       |   no non-missing arguments to max; returning -Inf                                     
worker_1       | 7: In max(thisIndexValues) :       
worker_1       |   no non-missing arguments to max; returning -Inf                                       
worker_1       | 8: In max(thisIndexValues) :       
worker_1       |   no non-missing arguments to max; returning -Inf                                       
worker_1       | Execution halted 

I was playing around with the worker config and found that with heartbeat_period = 3 this error almost always appears and nothing works, whereas with heartbeat_period = 30 it works as expected.

Is this somehow related to high CPU load? For example, could the heartbeat fail to refresh its Redis key in time, so that the worker gets killed?

I also randomly catch other types of errors in my HTTP controller:

api_1          | 2021-10-15 17:59:10.970794 rserve[122] INFO /result task_id = ad21417c5eec438f4d33f8789a6e5bc8
api_1          | 2021-10-15 17:59:10.972284 rserve[94] ERROR Error in redis_pipeline(ptr, cmds): Unknown type [redux bug -- please report]
api_1          |                                    
api_1          | 2021-10-15 17:59:10.973574 rserve[94] DEBUG Outgoing POST response (id 981310c4-2de1-11ec-8c9f-0242ac120006): 500 Internal Server Error
api_1          | {"timestamp":"2021-10-15 17:59:10.972620","level":"ERROR","name":"Application","pid":102,"msg":"","context":{"request_id":"987c44a4-2de1-11ec-8c9f-0242ac120006","message":{"error":"Failure commu
nicating with the Redis server","call":"redis_command(ptr, cmd)","traceback":["FUN(request, response)",".res$set_body(queue$task_status(task_id))","queue$task_status(task_id)","task_status(self$con, self$keys, t
ask_ids)","from_redis_hash(con, keys$task_status, task_ids, missing = TASK_MISSING)","con$HMGET(key, fields)","command(list(\"HMGET\", key, field))","redis_command(ptr, cmd)"]}}}

Integration with {targets}

The targets package currently uses clustermq and future to send tasks to workers running on traditional clusters. As a next step for targets, I aim to support workers running on the cloud (AWS Batch, Fargate, Google Cloud Run, Kubernetes, etc.) just like Airflow, Metaflow, Nextflow, and Prefect. A task queue would be an excellent layer between targets and cloud platforms.

Before I learned about rrq, I started crew to extend https://www.tidyverse.org/blog/2019/09/callr-task-q/ to other types of workers. There are a couple callr-based queues, and the future-based queue seems to make future.batchtools workloads a bit more efficient. That's about as far as I have pursued crew up to this point. Interprocess communication and heartbeating seem like huge challenges given how much more isolated AWS Batch jobs are than jobs on a traditional cluster.

So I am wondering if I can use rrq for targets. Can it support workers on the cloud? There are mentions of AWS in the docs, particularly about the Redis server, and I would like to learn more about how the pieces fit together for a use case like mine.

queuing calls inside other functions

Love this package; I am a big fan of the Python rq package and this feels like the closest I can get in R at the moment. I notice that you have been doing a fair amount of work in this space, so thanks very much and kudos.

I have come across a problem that yields an error:

[2018-04-10 13:30:49] STOP ERROR
worker_1 | This is an uncaught error in rrq, probably a bug!
worker_1 | Error in storr_copy(dest, self, list, namespace, skip_missing) :
worker_1 | Missing values; can't copy:
worker_1 | - from namespace 'objects', key: '6717f2823d3202449301145073ab8719'
worker_1 | Calls: rrq_worker ... tryCatch -> tryCatchList -> tryCatchOne ->
worker_1 | Execution halted

As a reproducible example, I have a script file "myfuns.R" containing:

slowdouble <- function(x) {
  Sys.sleep(x)
  print(x * 2)
  x*2
}

I create a context and save it separately, to be loaded in both the master and the worker:

context_holder = "context"
dir.create(context_holder)
root = paste0(context_holder, "/test_context")
out = paste0(context_holder,"/context.RData")
orig_source = "myfuns.R"

new_source = paste0(context_holder,"/",orig_source)
file.copy(orig_source,new_source)

context = context::context_save(root,sources = new_source)
context = context::context_load(context, new.env(parent = .GlobalEnv))
saveRDS(context, out)

Run the worker in one R process

library(rrq)
context = readRDS("context/context.RData")
rrq_worker(context, redux::hiredis(host = host, port = port))

Then in the master

library(rrq)
context = readRDS("context/context.RData")
obj = rrq_controller(context,redux::hiredis(host = host,port = port))

to create the controller object.

It works fine if I run

obj$enqueue(slowdouble(1))

but breaks if I do

f = function(x){
  obj$enqueue(slowdouble(x))
}
f(1)

Any ideas on why this might be or what I might do to fix it?

CRAN release?

Are there plans for a CRAN release? rrq is the most sophisticated task queue for R that I have been able to find, and I would love to build on it for #64.

Worker garbage logs in redis

First of all thanks for this wonderful package.

While the queue is running, many keys are created in Redis, like this:

queue_id:worker:worker_name_7fd48df4:log

Over time, a lot of them accumulate in the database.

Is there any way to clean up these keys?
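One possible workaround (a sketch, not an official rrq feature) is to delete the log keys directly with redis-cli, matching on the key pattern shown above. The queue id below is a placeholder; preview the matches (run the scan alone, without the xargs) before deleting anything:

```shell
# Hedged sketch: remove accumulated worker log keys for one queue.
# "queue_id" is a placeholder -- substitute your real queue id first.
redis-cli --scan --pattern 'queue_id:worker:*:log' | xargs -r -n 100 redis-cli DEL
```

Note that this removes the worker log history, so only run it for workers you no longer need to inspect.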

recommended redis.conf for rrq

I am beginning a rewrite of crew, and part of my plan is to build on top of rrq and take responsibility for configuring and launching short-lived instances of redis-server (initiate when a pipeline begins, terminate when the pipeline ends or crashes). I am writing to ask your advice on a good redis.conf file. What settings would you suggest to only enable the Redis commands that rrq uses?

I read through https://raw.githubusercontent.com/redis/redis/7.0/redis.conf, and so far I am thinking about the following template for ephemeral conf files. (I do not know how to use the TLS settings, and I do not know where to find the certificates or how to securely deliver them to clients, but it would be nice to have eventually.)

# set dynamically
bind {{insert_local_IP_on_the_LAN_or_VPC}} # 127.0.0.1 -::1
port {{insert_random_ephemeral_port}} # 6379
# requirepass {{insert_random_long_password}}
proc-title-template "{{insert_name}} {title} {port} {tls-port}"
loglevel {{insert_loglevel}} # notice

# logs
set-proc-title yes
logfile ""

# scale
daemonize no
cluster-enabled no
timeout 0
tcp-keepalive 300
maxclients 10000
databases 1
# io-threads 4
# io-threads-do-reads no

# storage
save ""
appendonly no
appendfsync no
shutdown-timeout 0
shutdown-on-sigint nosave
shutdown-on-sigterm nosave

# security
protected-mode yes
enable-protected-configs no
enable-debug-command no
enable-module-command no

# tls
# tls-port 6379
# tls-cert-file redis.crt
# tls-key-file redis.key
# tls-key-file-pass secret
# tls-client-cert-file client.crt
# tls-client-key-file client.key
# tls-client-key-file-pass secret
# tls-ca-cert-file ca.crt
# tls-ca-cert-dir /etc/ssl/certs
# tls-auth-clients no
# tls-auth-clients optional
# tls-protocols "TLSv1.2 TLSv1.3"
# tls-ciphers DEFAULT:!MEDIUM
# tls-ciphersuites TLS_CHACHA20_POLY1305_SHA256
# tls-prefer-server-ciphers yes
