perl-myriad's Issues

Service events not received in `Receiver`

With the service below, I would expect output indicating that events were delivered to the Receiver.

However, only the Emitter appears to be generating anything:

package Service::Spam;

use Myriad::Service;

=head2 relentless

A furious storm of neverending events.

Aside from pausing while blocked, and deferring after each message to the next
event loop iteration, this will continuously emit sequentially-ascending
C<< { id => 123 } >> events.

=cut

async method relentless : Emitter() ($sink) {
    my $id = 1;
    while(1) {
        $log->debugf('Wait for later');
        await $self->loop->later;
        $log->debugf('It is now later');
        my $blocked = $sink->unblocked;
        unless($blocked->is_ready) {
            $log->debugf('We were blocked, and will wait');
            await $blocked;
            $log->debugf('We are no longer blocked after %.2fms, and will resume', 1000.0 * $blocked->elapsed);
        }
        $log->debugf('Next ID will be %d', $id);
        $sink->emit({ id => $id++ });
        $log->infof('Emitted and incremented to %d', $id);
        await $self->loop->delay_future(after => 0.5);
    }
}

=head2 burn

The counterpart to L</relentless>, this will receive those events to clear up
space in the queue.

=cut

async method burn : Receiver(channel => 'relentless') ($src) {
    $src->each(sub {
        $log->infof('Have item %s', $_)
    })->retain;
    return $src;
}

1;

At info level, the output is:

Starting Myriad on ... pid ... at 2021-04-27T11:41:50.349351+08:00
Starting service [service.spam]
Done
Emitted and incremented to 2
Have item {data => {id => 1}}
Emitted and incremented to 3
Have item {data => {id => 2}}
Emitted and incremented to 4
Have item {data => {id => 3}}
Emitted and incremented to 5
Have item {data => {id => 4}}
Emitted and incremented to 6
Have item {data => {id => 5}}

Don't allow Redis RPC's client to add messages to streams that don't exist

We have at least two issues in the current RPC handling:

  • A client sending messages to a service that doesn't exist.
  • A client sending messages to a method that isn't implemented.

Although the client (per spec) will eventually time out and the correctness of the application will be maintained, this leaves garbage in Redis.

We should implement a way to prevent the RPC client from adding messages in these two cases.

  • In Redis 6.2+, XADD has the NOMKSTREAM option (see the sketch below).
  • For Redis 5 and earlier, the key idea is to combine one EXISTS call with monitoring the keyspace for changes on the key.
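A minimal sketch of the first option, assuming a connected Net::Async::Redis style client; add_rpc_message is a hypothetical helper, not Myriad's actual API:

use strict;
use warnings;
use experimental qw(signatures);
use Future::AsyncAwait;

async sub add_rpc_message ($redis, $stream, %fields) {
    # NOMKSTREAM (Redis 6.2+) returns nil instead of implicitly
    # creating the stream when it does not exist
    my ($id) = await $redis->xadd($stream, qw(NOMKSTREAM *), %fields);
    die "no such stream: $stream, is the other service running?\n"
        unless defined $id;
    return $id;
}

This fails on the client side immediately, instead of relying on the timeout and leaving an orphaned stream behind.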

Syntax issue in the "Changes" file

In the v0.006 release changelog, I made a few formatting mistakes, such as a trailing space and a missing blank line between sections.

We should keep the format of that file consistent; this issue is a reminder.

Inaccurate message acknowledgment in Subscriptions

In Myriad subscriptions, emitters continuously check their stream's overflow status and call cleanup on it.
The check uses the oldest_processed_id of that stream to determine how much we need to clean, based on the limit.

Pending info [10398,"1626681092140-0","1626681989722-40",[["coffee.drinker.judge/drinkers_tracker",10398]]]
Pending from 1626681092140-0
Pending check where oldest was 1626681989722-40 and first 1626681092140-0
=> Earliest ID to care about: 1626681092140-0
No point in trimming: oldest is 1626681092140-0 and this compares to 1626681092141-0
No receivers, waiting for a few seconds

As you can see above, right where I added the arrow, it seems like the compare_id method in Myriad's transport here
is not working as intended: it is confusing the second part of Redis stream IDs (at line 182 in the link).

This is from the Redis documentation about IDs:

Both quantities are 64-bit numbers. When an ID is auto-generated, the first part is the Unix time in milliseconds of the Redis instance generating the ID. The second part is just a sequence number and is used in order to distinguish IDs generated in the same millisecond.

So I think there is a need for a better condition than return $first[0] <=> $second[0] || $first[1] <=> $second[1];
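A sketch of the kind of comparison the issue calls for, splitting each ID into its two 64-bit parts and comparing them numerically; this is an illustration, not the actual Myriad implementation:

use strict;
use warnings;

sub compare_id {
    my ($x, $y) = @_;
    # Redis stream IDs are "<ms>-<seq>"; a missing sequence part
    # defaults to 0, and both halves compare as integers
    my ($x_ms, $x_seq) = split /-/, $x;
    my ($y_ms, $y_seq) = split /-/, $y;
    return ($x_ms  // 0) <=> ($y_ms  // 0)
        || ($x_seq // 0) <=> ($y_seq // 0);
}

print compare_id('1626681092140-0', '1626681989722-40'), "\n"; # -1
print compare_id('1626681092140-9', '1626681092140-40'), "\n"; # -1, numeric rather than string comparison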

When a group of services runs as a collection

When we run a group of dependent services configured in a docker-compose file, with their own dedicated Redis cluster configured alongside them, these issues may come up:

  • On start, the Redis cluster will take some time to be up. During this time services will start failing unexpectedly with:
    • A cluster-not-ready error:
Starting Myriad on 2f548cc6f853 pid 7 at 2021-07-26T02:17:08.367528Z
Use of uninitialized value $idx in array element at /opt/perl-5.32.1/lib/site_perl/5.32.1/Net/Async/Redis/Cluster.pm line 281.
Starting service [coffee.manager.stats]
Use of uninitialized value $idx in array element at /opt/perl-5.32.1/lib/site_perl/5.32.1/Net/Async/Redis/Cluster.pm line 281.
Failed while starting up receiver: IO::Async::Future=HASH(0x55c28e388160) is already failed and cannot be ->fail'ed at /opt/perl-5.32.1/lib/site_perl/5.32.1/Myriad.pm line 461.
Done
/opt/perl-5.32.1/bin/myriad.pl failed due to no node found for slot at /opt/perl-5.32.1/lib/site_perl/5.32.1/Net/Async/Redis/Cluster.pm line 287.
    • No such stream exists:
/opt/perl-5.32.1/bin/myriad.pl failed due to There is no such stream, is the other service running? (category=transport_redis , reason=no such stream: service.subscriptions.coffee.drinker.heavy/drink)
    • Occasional calls to as_string on an undefined value in lib/Myriad.pm line 696:
(in cleanup) Can't call method "as_string" on an undefined value at /opt/perl-5.32.1/lib/site_perl/5.32.1/Myriad.pm line 695 during global destruction.

So far I am not able to reproduce this locally, but on the staging environment, when these issues start to happen, especially with a subscriber service, the emitter may go into a stalled mode, with the receiver not getting anything and the emitter not sending anything. (However, it is not yet clear how to trigger this state.)

Redis RPC inefficient reads

There is a bug in Myriad::RPC::Implementation::Redis when Myriad is loaded with more than one service: the current design will attempt to read from a stream with a BLOCK, and if there aren't any messages it will delay the other services' RPCs from getting their messages.

Suggestion: make the BLOCK time optional in the read_from_stream sub in Myriad::Transport::Redis and modify Myriad::RPC::Implementation::Redis accordingly, roughly as sketched below.
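A minimal sketch of that suggestion; the sub name, argument names, and defaults are assumptions based on the monitor output elsewhere in these issues, not the actual Myriad code:

use strict;
use warnings;
use experimental qw(signatures);
use Future::AsyncAwait;

async sub read_from_stream ($redis, $stream, %args) {
    # Only include BLOCK when the caller asks for it, so one idle
    # service does not hold up reads for the other services' streams
    my @block = exists $args{block} ? (BLOCK => $args{block}) : ();
    return await $redis->xreadgroup(
        @block,
        GROUP   => $args{group}, $args{client},
        COUNT   => $args{count} // 50,
        STREAMS => ($stream, '>'),
    );
}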

Memory leak in service `:Emitter`s

We've observed high memory usage even in simple emitter code:

package Example::Service;
use Myriad::Service qw(:v1);
async method relentless : Emitter() ($sink) {
    my $id = 1;
    while(1) {
        await $self->loop->later;
        await $sink->unblocked;
        $sink->emit({ id => $id++ });
    }
}

1;

This shows high CPU usage and rapidly (exponentially?) increasing memory usage.

Batch method stops emitting due to length limit

The batch method silently dies when the number of events (that have no subscribers) reaches the length limit:

127.0.0.1:6379> xinfo stream myriad.service.subscriptions.deriv.service.aggregation.perfectmoney/fetch_updates
 1) "length"
 2) (integer) 10405
 3) "radix-tree-keys"
 4) (integer) 109
 5) "radix-tree-nodes"
 6) (integer) 250
 7) "last-generated-id"
 8) "1668785969317-0"
 9) "groups"
10) (integer) 1
11) "first-entry"
12) 1) "1668523313286-0"
    2) 1) "data"
       2) "{\"default\":3}"
13) "last-entry"
14) 1) "1668785969317-0"
    2) 1) "data"
       2) "{\"default\":0,\"c\":4}"

We need to monitor and detect when this behavior occurs, or provide a log message highlighting that the key in Redis needs cleanup.

Redis transport: Cleanup empty groups on shutdown

Redis will be filled with empty unused groups; to avoid that, we should delete the group on shutdown.

The groups are:

  • RPC groups
  • Subscriptions groups

The process should consider other consumers that are still active and using the group, so it should check the idle time of the consumers in that group.

An arbitrary assumption is that consumers will be active at least every 15000 ms.

A better alternative is to make the service delete its own consumers from Redis, then check the group info before deciding whether to delete the group, as sketched below.
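A minimal sketch of that shutdown path, assuming a Net::Async::Redis style client; $redis, $stream, $group and $consumer are placeholders:

use strict;
use warnings;
use experimental qw(signatures);
use Future::AsyncAwait;

async sub cleanup_group ($redis, $stream, $group, $consumer) {
    # Remove our own consumer from the group first
    await $redis->xgroup(DELCONSUMER => $stream, $group, $consumer);
    # Then destroy the group only if no other consumers remain
    my ($consumers) = await $redis->xinfo(CONSUMERS => $stream, $group);
    await $redis->xgroup(DESTROY => $stream, $group)
        unless $consumers and $consumers->@*;
    return;
}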

A cleanup script should be also available through Myriad's CLI.

Weird behavior in the hash-related functions for `Myriad::Role::Storage` instances backed by Redis

Summary

When the hash-related functions such as hash_set are used, the prefix is apparently prepended to both the key and the hash key in a weird way. Is this intended?

Minimal code example

package ZZZ::BugReproduce;

use Myriad::Service;

=head1 DESCRIPTION

This is not a real service, but a minimal showcase for the code
that reproduces the bug associated with racy access done
against the C<Myriad::Role::Storage> object

=cut

has $storage;
has $iter;

async method startup() {
  # get us the required objects
  $storage = $api->storage;
  $iter = 1;
}

async method diagnostics($level){
  return 'ok'; # nothing to check for its liveness
}

async method bug_reproc :Batch () {
  my @futures = (
    $storage->hash_set('hash','vvv',$iter),
    $storage->hash_get('hash','vvv'),
    $storage->set('xxx', $iter),
    $storage->get('xxx'),
    $storage->orderedset_add('set',$iter,'xxx')
  );
  my @result = await Future->wait_all(@futures);
  die $_->failure for grep { defined $_->failure } @result;
  $iter++;
  return [];
}

1;

What is in Redis

Getting all the keys set by the service:

>>> Calling KEYS *bug*
localhost:6379: 
192.168.64.25:6379: 
172.27.0.3:6379: myriad.service.subscriptions.zzz.bugreproduce/bug_reproc
myriad.storage.service.zzz.bugreproduce/set
service.zzz.bugreproduce/hash
172.27.0.25:6379: myriad.storage.service.zzz.bugreproduce/xxx
172.26.0.25:6379: 

The actual content of the hash in question:

172.27.0.3:6379: myriad.storage.vvv
245

Batch silent failures

The Batch attribute for Myriad services is very limited to one structure: an arrayref containing hashrefs.
There is also no strict checking if we return a wrong value; it will fail, but without errors.

package Service::Test;
use Myriad::Service;

has $count;

async method test_a : Batch() {
    await $self->loop->delay_future(after=>0.5);
    $count++;
    $log->warnf('Publishing %d', $count);
    return $count; # Will not work
    return { count => $count }; # will not work
    return  [ { count => $count } ]; # Only this will work.
}

1;

I believe we should allow both of these return values:

{ count => $count }
[ { count => $count } ]

and otherwise throw an exception.
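A hypothetical normalisation step illustrating that, not Myriad's actual code: accept a hashref or an arrayref, and throw loudly on anything else rather than failing silently.

use strict;
use warnings;
use experimental qw(signatures);

sub normalise_batch_result ($result) {
    return $result     if ref $result eq 'ARRAY';
    return [ $result ] if ref $result eq 'HASH';
    die 'Batch methods must return a hashref or an arrayref of hashrefs';
}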

Wider Perl versions support

In the memory transport we are using the Perl 5.32+ specific isa infix operator.

This is not essential, and it is the only place where we require such a feature; it would be better to replace it with a more compatible approach, as sketched below.
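A minimal sketch of the back-compatible equivalent; the class name here is a placeholder rather than whatever the memory transport actually checks against:

use strict;
use warnings;
use Scalar::Util qw(blessed);

sub is_pending {
    my ($msg) = @_;
    # Perl 5.32+ only:
    #   return $msg isa 'Myriad::Redis::Pending';
    # Works on older Perls:
    return blessed($msg) && $msg->isa('Myriad::Redis::Pending');
}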

Config: The config namespace isn't correct

Because the config is managed through the storage layer, an extra storage component is added to the config namespace; the namespace of the config should ideally be <root_namespace>.config.

Receivers when stream is not created yet

When a receiver is defined in a service, and the stream from which it is configured to create a consumer group has not yet been created, we will see these in the logs:

Starting service deriv.service.dfpublisher
 deriv.service.dfpublisher Service has started!
 
skipped subscription on stream service.subscriptions.deriv.service.doughflow.websocketapi/ewallet_trxns because: There is no such stream, is the other service running? (category=transport_redis , reason=no such stream: service.subscriptions.deriv.service.doughflow.websocketapi/ewallet_trxns) will try again

The thing is, the logic for retrying after some time does exist in Myriad; however, it seems that we also interrupt the process when we reach this step (due to throwing an exception).
When we do so, the service is unable to run its other attributes: because the exception is triggered in an individual Receiver, no other components are able to run or function. A sketch of containing the failure follows.
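A minimal sketch of keeping the retry inside the receiver's own loop so the exception never escapes; $self->create_group is a hypothetical stand-in for the real transport call:

use strict;
use warnings;
use experimental qw(signatures);
use Future::AsyncAwait;
use Syntax::Keyword::Try;

async sub subscribe_with_retry ($self, $stream, $group) {
    while (1) {
        try {
            # Returns from the sub on success
            return await $self->create_group($stream, $group);
        } catch ($e) {
            warn "skipped subscription on stream $stream because: $e, will try again\n";
            await $self->loop->delay_future(after => 5);
        }
    }
}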

Memory leak when using Redis client-side cache

There appears to be a rapid memory leak, taking several GB after running for less than a minute.

This can be reproduced by enabling client-side caching (set MYRIAD_TRANSPORT_REDIS_CACHE=10000 as an environment variable, with #298 applied), then running a simple receiver/emitter pair:

package Service::First;
use Myriad::Service qw(:v1);
async method diagnostics ($level) { 'ok' }
async method gen : Emitter() ($sink) {
    my $src = $sink->source;
    while(1) {
        await $self->loop->delay_future(after => 1);
        $src->emit({ me => $self->loop->time });
    }
    return;
}

1;

and

package Service::Second;
use Myriad::Service qw(:v1);
async method diagnostics ($level) { 'ok' }
async method in : Receiver(service => 'service.first', channel => 'gen') ($src) {
    return $src->map(sub {
        $log->infof('Received: %s', $_);
        return undef;
    });
}
1;

Devel::MAT::Dumper currently segfaults, but the file shows a lot of arrayrefs with size 49428, all elements undef.

Syntax: We should be able to pass config to the sub attributes

For instance, we have the receiver configured like this:

async method get_something :Receiver(service => '', ...) ($src) {

}

Currently, we are unable to pass anything but simple scalars to configure the service; that limits our ability to point the same code to a different service instance (staging/production) through the config.

There are two reasons for this limitation:

  1. The parameters of the sub-attributes are parsed through an eval statement.
  2. The configs are not available at compile time; they only become available at some stage after Myriad is actually running.

Overflowing RPC

When an RPC method has a lengthy step, typically an I/O operation that takes time to finish (e.g. an external API call), a certain behaviour arises when the method is called extensively in bursts of concurrent requests. Due to how Myriad is currently implemented here, it reads all messages from the stream as batches without waiting on any read batch before adding the next one, choking the service and causing wait times to increase exponentially: we didn't use the transport layer to buffer these requests for us, and instead buffered them in the service's own memory, interfering with the process itself.

We would need to finish processing a read batch and only then read the next one, roughly as sketched below.
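A minimal sketch of the intended flow; read_batch and dispatch are hypothetical helpers standing in for the real XREADGROUP and RPC-dispatch code:

use strict;
use warnings;
use experimental qw(signatures);
use Future;
use Future::AsyncAwait;

async sub process_rpc_stream ($self) {
    while (1) {
        # Read one batch, finish it completely, then read the next;
        # any backlog stays buffered in Redis, not in service memory
        my @messages = await $self->read_batch;
        await Future->wait_all(map { $self->dispatch($_) } @messages);
    }
}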
Solved in #157

RPCs wait on each other blocking read command

When running a microservice with multiple RPC methods defined in it, a certain behaviour arises in Myriad where outgoing xreadgroup calls end up waiting on and blocking each other:

while (1) {
    fmap0 {
        xreadgroup(...)   # blocking read for one RPC stream
    } foreach => [ $rpcs->@* ];
}

So we will not loop back to trigger xreadgroup again until every stream's blocking read in the batch has completed.

An example from redis monitor while running a service with test RPC methods:

...
1621644007.286567 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "response" "{}" "deadline" "10" "stash" "{}" "message_id" "1" "trace" "{}" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method"
1621644007.326306 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service.rpc/test_method" ">"
1621644007.329050 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621643997281-0"
1621644007.333831 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service2.rpc/test_method2" ">"

1621644012.287435 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "response" "{}" "deadline" "10" "stash" "{}" "trace" "{}" "message_id" "1" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method"

1621644017.285819 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "deadline" "10" "response" "{}" "message_id" "1" "trace" "{}" "stash" "{}" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method"

1621644022.287674 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method" "deadline" "10" "response" "{}" "trace" "{}" "message_id" "1" "stash" "{}"
1621644022.407103 [0 172.17.0.3:49764] "XPENDING" "myriad.service.testing_service.rpc/test_method" "processors" "-" "+" "50" "561e3b988fc6"
1621644022.408710 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service.rpc/test_method" ">"
1621644022.411166 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621644012287-0"
1621644022.412178 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621644017285-0"
1621644022.413048 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621644022287-0"
1621644022.413993 [0 172.17.0.3:49764] "XPENDING" "myriad.service.testing_service2.rpc/test_method2" "processors" "-" "+" "50" "561e3b988fc6"
1621644022.415379 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service2.rpc/test_method2" ">"

Redis tests have also been modified to cover such cases in t/transport/redis.t.

The worker loses RPC calls when many calls are made to the method

Summary

RPC makes use of a Redis stream to store all the pending calls. In Myriad this stream can be manipulated by the following futures:

  1. The worker that dispatches the RPC call (in the service)
  2. The cleanup worker that removes old items (in the service)
  3. The caller (which can be either inside or outside the service)

These futures run in parallel, and usually this is safe under normal circumstances.
However, under high load the cleanup worker (https://github.com/deriv-com/perl-Myriad/blob/master/lib/Myriad/Transport/Redis.pm#L298-L336) might remove more items than it is supposed to, due to the assumption that the stream length stays constant during the stream-length computation (which introduces the race window).

To reproduce this, the calling rate needs to be higher than the rate at which the calls are picked up.

Possible fixes

  • Redis 6.2.0 added a new trim strategy (MINID) which allows us to trim the stream so that it retains all items after a specified ID; see the sketch after this list.
  • Reduce the race window by accounting for the number of entries added between the stream-size calculation and the trim.
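A minimal sketch of the first option, assuming Redis 6.2+ and a Net::Async::Redis style client; the variable names are placeholders:

use strict;
use warnings;
use experimental qw(signatures);
use Future::AsyncAwait;

async sub trim_processed ($redis, $stream, $oldest_processed_id) {
    # MINID keeps every entry with an ID >= the one given, so entries
    # added during the length computation can never be dropped
    return await $redis->xtrim($stream, MINID => $oldest_processed_id);
}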

Attached is this patch as a workaround for now:

diff --git a/lib/Myriad/Transport/Redis.pm b/lib/Myriad/Transport/Redis.pm
index 97e0b15..bd73f13 100644
--- a/lib/Myriad/Transport/Redis.pm
+++ b/lib/Myriad/Transport/Redis.pm
@@ -329,8 +329,16 @@ async method cleanup (%args) {
         }
         $total = $info->{length} - $total if $direction eq 'xrange';
 
+        # As the cleanup operation is NOT atomic, this introduces a race window
+        # when the caller makes calls at a high rate; we should compensate for
+        # the number of entries added during the calculation
+        my ($info2) = await $self->stream_info($stream);
+        my $entries_added = ($info2->{length} - $info->{length});
+        # Add the entries added during the process and
+        # guard against racy trimming by multiple instances of the service
+        $total += int($entries_added);
 #        my ($before) = await $redis->memory_usage($stream);
-        my ($trim) = await $redis->xtrim($stream, MAXLEN => $total);
+        my ($trim) = await $redis->xtrim($stream, MAXLEN => $total) unless $entries_added < 0;
 #        my ($after) = await $redis->memory_usage($stream);
         $log->tracef('Trimmed %d items from stream: %s', $total, $stream);
     }

Should use class methods instead of plain subs

We have a couple of methods in RPC/Message.pm that are factory-style subs.

As per @tom-binary, this is a problem, as we should not mix plain subs and methods in the same package.

We should make them class methods, like:

use Object::Pad;

class Example {
    sub new_from_something ($class, %args) {
        return $class->new(%args);
    }
}

Example->new_from_something();
