deriv-com / perl-myriad
Microservices framework
License: Other
With this, I would expect output indicating that events were delivered to the Receiver.
However, only the Emitter appears to be generating anything:
package Service::Spam;
use Myriad::Service;

=head2 relentless

A furious storm of neverending events.

Aside from pausing while blocked, and deferring after each message to the next
event loop iteration, this will continuously emit sequentially-ascending
C<< { id => 123 } >> events.

=cut

async method relentless : Emitter() ($sink) {
    my $id = 1;
    while(1) {
        $log->debugf('Wait for later');
        await $self->loop->later;
        $log->debugf('It is now later');
        my $blocked = $sink->unblocked;
        unless($blocked->is_ready) {
            $log->debugf('We were blocked, and will wait');
            await $blocked;
            $log->debugf('We are no longer blocked after %.2fms, and will resume', 1000.0 * $blocked->elapsed);
        }
        $log->debugf('Next ID will be %d', $id);
        $sink->emit({ id => $id++ });
        $log->infof('Emitted and incremented to %d', $id);
        await $self->loop->delay_future(after => 0.5);
    }
}

=head2 burn

The counterpart to L</relentless>, this will receive those events to clear up
space in the queue.

=cut

async method burn : Receiver(channel => 'relentless') ($src) {
    $src->each(sub {
        $log->infof('Have item %s', $_)
    })->retain;
    return $src;
}

1;
At info level, the output is:
Starting Myriad on ... pid ... at 2021-04-27T11:41:50.349351+08:00
Starting service [service.spam]
Done
Emitted and incremented to 2
Have item {data => {id => 1}}
Emitted and incremented to 3
Have item {data => {id => 2}}
Emitted and incremented to 4
Have item {data => {id => 3}}
Emitted and incremented to 5
Have item {data => {id => 4}}
Emitted and incremented to 6
Have item {data => {id => 5}}
We have at least two issues in the current RPC handling:
Although the client (per spec) will eventually time out and the correctness of the application will be maintained, this will leave garbage in Redis.
We should implement a way to prevent the RPC client from adding messages in these two cases.
xadd has the NOMKSTREAM option.
exists call then monitor the keyspace for changes on the key.
In the v0.006 release changelog, I made a few formatting mistakes, such as a trailing space and a missing blank line between the sections.
We should keep the format consistent in that file, and this issue is a reminder.
In Myriad subscriptions, emitters continuously check the overflow status of their stream and call cleanup on it.
The check uses the oldest_processed_id of that stream to determine how much needs to be cleaned, based on the limit.
Pending info [10398,"1626681092140-0","1626681989722-40",[["coffee.drinker.judge/drinkers_tracker",10398]]]
Pending from 1626681092140-0
Pending check where oldest was 1626681989722-40 and first 1626681092140-0
=> Earliest ID to care about: 1626681092140-0
No point in trimming: oldest is 1626681092140-0 and this compares to 1626681092141-0
No receivers, waiting for a few seconds
As you can see above, right where I added the arrow, it seems that the compare_id method in the Myriad transport is not working as intended, as it is confusing the second part of Redis stream IDs (at line 182 in the link).
This is from the Redis documentation about IDs:
Both quantities are 64-bit numbers. When an ID is auto-generated, the first part is the Unix time in milliseconds of the Redis instance generating the ID. The second part is just a sequence number and is used in order to distinguish IDs generated in the same millisecond.
So I think there is a need for a better condition than return $first[0] <=> $second[0] || $first[1] <=> $second[1];
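One way to make the comparison independent of how Perl converts each part to a number is to compare the two parts as decimal strings. The following is only a sketch under that assumption: compare_uint_str and compare_stream_id are hypothetical names, not the existing compare_id, and treating a missing sequence part as 0 is a choice made for this example.

```perl
use strict;
use warnings;

# Compare two non-negative decimal integers given as strings, without
# converting them to floating point (which could lose precision for
# values close to the 64-bit limit).
sub compare_uint_str {
    my ($x, $y) = @_;
    s/^0+(?=\d)// for $x, $y;    # strip leading zeros from the local copies
    return length($x) <=> length($y) || $x cmp $y;
}

# Hypothetical stand-in for compare_id: returns -1, 0 or 1.
sub compare_stream_id {
    my ($first, $second) = @_;
    my ($first_ms,  $first_seq)  = split /-/, $first,  2;
    my ($second_ms, $second_seq) = split /-/, $second, 2;
    # Assumption of this sketch: a missing sequence part counts as 0.
    $_ //= 0 for $first_seq, $second_seq;
    # Millisecond part decides first, the sequence number breaks ties.
    return compare_uint_str($first_ms, $second_ms)
        || compare_uint_str($first_seq, $second_seq);
}
```

With the IDs from the log above, compare_stream_id('1626681092140-0', '1626681989722-40') orders the first ID before the second.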
t/RPC/full-cycle.t ..... Starting service [test.ping]
Done
Starting service [test.pong]
Done
When we run a group of dependent services configured in a docker-compose file, with their dedicated Redis cluster configured alongside them, these issues might come up:
Starting Myriad on 2f548cc6f853 pid 7 at 2021-07-26T02:17:08.367528Z
Use of uninitialized value $idx in array element at /opt/perl-5.32.1/lib/site_perl/5.32.1/Net/Async/Redis/Cluster.pm line 281.
Starting service [coffee.manager.stats]
Use of uninitialized value $idx in array element at /opt/perl-5.32.1/lib/site_perl/5.32.1/Net/Async/Redis/Cluster.pm line 281.
Failed while starting up receiver: IO::Async::Future=HASH(0x55c28e388160) is already failed and cannot be ->fail'ed at /opt/perl-5.32.1/lib/site_perl/5.32.1/Myriad.pm line 461.
Done
/opt/perl-5.32.1/bin/myriad.pl failed due to no node found for slot at /opt/perl-5.32.1/lib/site_perl/5.32.1/Net/Async/Redis/Cluster.pm line 287.
/opt/perl-5.32.1/bin/myriad.pl failed due to There is no such stream, is the other service running? (category=transport_redis , reason=no such stream: service.subscriptions.coffee.drinker.heavy/drink)
calling as_string on undefined in lib/Myriad.pm 696
(in cleanup) Can't call method "as_string" on an undefined value at /opt/perl-5.32.1/lib/site_perl/5.32.1/Myriad.pm line 695 during global destruction.
So far I am not able to reproduce this locally, but on the staging environment, when these issues start to happen, especially with a subscriber service, the emitter might go into stall mode, leaving the receiver not getting anything and the emitter not sending anything. (However, it is not yet clear how to trigger such a state.)
There is a bug in Myriad::RPC::Implementation::Redis: when Myriad is loaded with more than one service, the current design will attempt to read from a stream with a BLOCK, and if there aren't any messages it will delay the other services' RPCs from getting their messages.
Suggestion: make the BLOCK time optional in the read_from_stream sub in Myriad::Transport::Redis and modify Myriad::RPC::Implementation::Redis accordingly.
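As a rough illustration of what an optional BLOCK could look like, the sketch below only builds the argument list for an XREADGROUP call and leaves BLOCK out when no timeout is given, which makes the read return immediately when there are no messages. The helper name xreadgroup_args and its named parameters are hypothetical and are not taken from Myriad::Transport::Redis.

```perl
use strict;
use warnings;

# Hypothetical helper: build the arguments that follow XREADGROUP.
# Named parameters (all assumptions of this sketch):
#   group, client, stream - required
#   count, block          - optional; block is in milliseconds
sub xreadgroup_args {
    my (%args) = @_;
    my @cmd = (GROUP => $args{group}, $args{client});
    push @cmd, COUNT => $args{count} if defined $args{count};
    # Only block when the caller asked for it. Note that in Redis
    # a BLOCK of 0 means "wait indefinitely", not "do not wait".
    push @cmd, BLOCK => $args{block} if defined $args{block};
    push @cmd, STREAMS => $args{stream}, '>';
    return @cmd;
}
```

A caller serving several RPC streams could then poll each one without a block value, and pass one only where waiting is acceptable.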
We've observed high memory usage even in simple emitter code:
package Example::Service;
use Myriad::Service qw(:v1);

async method relentless : Emitter() ($sink) {
    my $id = 1;
    while(1) {
        await $self->loop->later;
        await $sink->unblocked;
        $sink->emit({ id => $id++ });
    }
}
This shows high CPU usage and rapidly (exponential?) increasing memory usage.
The batch method silently dies when the number of events (that have no subscribers) reaches the length limit:
127.0.0.1:6379> xinfo stream myriad.service.subscriptions.deriv.service.aggregation.perfectmoney/fetch_updates
1) "length"
2) (integer) 10405
3) "radix-tree-keys"
4) (integer) 109
5) "radix-tree-nodes"
6) (integer) 250
7) "last-generated-id"
8) "1668785969317-0"
9) "groups"
10) (integer) 1
11) "first-entry"
12) 1) "1668523313286-0"
    2) 1) "data"
       2) "{\"default\":3}"
13) "last-entry"
14) 1) "1668785969317-0"
    2) 1) "data"
       2) "{\"default\":0,\"c\":4}"
We need to monitor and detect when this behavior occurs, or provide a log that highlights that the key in Redis needs clean up.
The group names in the consumer groups model shouldn't be unique per service (they are sometimes UUIDs), and that is a bug affecting our workers model.
Redis will fill up with empty, unused groups. To avoid that, we should delete the group on shutdown.
The groups are:
The process should consider other consumers who are still active and using the group, so it should check the idle time for the consumers in that group.
An arbitrary assumption is that the consumers will be active every 15000 ms.
A better alternative is to make the service delete its consumers from Redis, then check the group info for deletion.
A cleanup script should also be available through Myriad's CLI.
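The idle-time check described above could be sketched as follows. It assumes the consumer list has already been fetched and turned into hashrefs with name and idle keys (idle in milliseconds, as XINFO CONSUMERS reports it). The function name group_is_deletable and the constant are hypothetical, and pending entries are ignored here for brevity.

```perl
use strict;
use warnings;

# Assumption from the text: consumers are active at least every 15000 ms.
use constant ACTIVE_WINDOW_MS => 15_000;

# $consumers: arrayref of hashrefs such as { name => 'abc', idle => 1234 }
# $own_name:  the consumer belonging to the service that is shutting down
# Returns 1 when no other consumer was seen within the active window.
sub group_is_deletable {
    my ($consumers, $own_name) = @_;
    for my $consumer (@$consumers) {
        next if $consumer->{name} eq $own_name;
        # Another consumer was active recently, so keep the group.
        return 0 if $consumer->{idle} <= ACTIVE_WINDOW_MS;
    }
    return 1;
}
```

Deleting the service's own consumer first, as suggested above, would make the own-name check unnecessary.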
When using the hash-related functions such as hash_set, the prefixes are apparently prepended to both the key and the hash key in a weird way. Is this intended?
package ZZZ::BugReproduce;
use Myriad::Service;

=head1 DESCRIPTION

This is not a real service but a minimal showcase for the code
that reproduces the bug associated with racing access
against the C<Myriad::Role::Storage> object.

=cut

has $storage;
has $iter;

async method startup() {
    # get us the required objects
    $storage = $api->storage;
    $iter = 1;
}

async method diagnostics($level){
    return 'ok'; # nothing to check for its liveness
}

async method bug_reproc :Batch () {
    my @futures=(
        $storage->hash_set('hash','vvv',$iter),
        $storage->hash_get('hash','vvv'),
        $storage->set('xxx', $iter),
        $storage->get('xxx'),
        $storage->orderedset_add('set',$iter,'xxx')
    );
    my @result = await Future->wait_all(@futures);
    map {die $_->failure if defined $_->failure} @result;
    $iter++;
    return [];
}

1;
Getting all the keys set by the service
>>> Calling KEYS *bug*
localhost:6379:
192.168.64.25:6379:
172.27.0.3:6379: myriad.service.subscriptions.zzz.bugreproduce/bug_reproc
myriad.storage.service.zzz.bugreproduce/set
service.zzz.bugreproduce/hash
172.27.0.25:6379: myriad.storage.service.zzz.bugreproduce/xxx
172.26.0.25:6379:
The actual content of the hash in question
172.27.0.3:6379: myriad.storage.vvv
245
We still have failures in bootstrap.t:
http://www.cpantesters.org/cpan/report/abd6df3e-accf-11eb-84bc-edd243e66a77
(this is v0.005, which includes the Linux::Inotify2 check from #132 ).
The Batch attribute for Myriad services is limited to one structure: an arrayref containing hashrefs.
There is also no strict checking for a wrong return value: it will fail, but without errors.
package Service::Test;
use Myriad::Service;

has $count;

async method test_a : Batch() {
    await $self->loop->delay_future(after=>0.5);
    $count++;
    $log->warnf('Publishing %d', $count);
    # The three returns below are alternatives, listed together for comparison
    return $count; # Will not work
    return { count => $count }; # will not work
    return [ { count => $count } ]; # Only this will work.
}

1;
I believe we should be allowing those two returns:
{ count => $count }
[ { count => $count } ]
and otherwise throw an exception.
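A minimal sketch of that rule in plain Perl, independent of how Myriad handles Batch results internally. The function name normalize_batch_result is hypothetical: it accepts either of the two shapes, always hands back an arrayref of hashrefs, and dies on anything else.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a hashref or an arrayref of hashrefs and
# return an arrayref of hashrefs; die for every other return value.
sub normalize_batch_result {
    my ($result) = @_;
    # A single hashref is wrapped so callers always see a list of events.
    return [ $result ] if ref($result) eq 'HASH';
    if (ref($result) eq 'ARRAY') {
        my @bad = grep { ref($_) ne 'HASH' } @$result;
        die sprintf("Batch returned %d non-hashref item(s)\n", scalar @bad) if @bad;
        return $result;
    }
    die "Batch must return a hashref or an arrayref of hashrefs\n";
}
```

With this, return $count from the example above would raise an error instead of failing silently.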
In the memory transport we are using the isa operator, which is Perl 5.32+ specific syntax.
This is not essential, and it is the only place where we require such a feature, so it is better to replace it with a more compatible approach.
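One commonly used replacement that also works on older Perl versions combines Scalar::Util::blessed with the isa method. The sketch below is not the memory transport code; is_instance is a hypothetical helper name.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Behaves like `$obj isa Some::Class`: true only for blessed references
# whose class is, or inherits from, $class.
sub is_instance {
    my ($obj, $class) = @_;
    return blessed($obj) && $obj->isa($class) ? 1 : 0;
}
```

Like the isa operator, this returns false for undef, plain strings and unblessed references rather than throwing.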
Because the config is managed through the storage layer, an extra storage is added to the config namespace. The namespace of config should ideally be <root_namespace>.config.
When a receiver is defined in a service and the stream from which its consumer group should be created does not exist yet, we will see these lines in the logs:
Starting service deriv.service.dfpublisher
deriv.service.dfpublisher Service has started!
skipped subscription on stream service.subscriptions.deriv.service.doughflow.websocketapi/ewallet_trxns because: There is no such stream, is the other service running? (category=transport_redis , reason=no such stream: service.subscriptions.deriv.service.doughflow.websocketapi/ewallet_trxns) will try again
The logic for retrying after some time does exist in Myriad. However, it seems that we also interrupt the process when we reach this step (due to throwing an exception).
When that happens, the service is not able to run its other attributes: because the exception is triggered on an individual Receiver, no other components are able to run.
There appears to be a rapid memory leak, taking several GB after running for less than a minute.
This can be reproduced by enabling client-side caching (set MYRIAD_TRANSPORT_REDIS_CACHE=10000 as an environment variable, with #298 applied), then running a simple receiver/emitter pair:
package Service::First;
use Myriad::Service qw(:v1);

async method diagnostics ($level) { 'ok' }

async method gen : Emitter() ($sink) {
    my $src = $sink->source;
    while(1) {
        await $self->loop->delay_future(after => 1);
        $src->emit({ me => $self->loop->time });
    }
    return;
}

1;
and
package Service::Second;
use Myriad::Service qw(:v1);

async method diagnostics ($level) { 'ok' }

async method in : Receiver(service => 'service.first', channel => 'gen') ($src) {
    return $src->map(sub {
        $log->infof('Received: %s', $_);
        return undef;
    });
}

1;
Devel::MAT::Dumper currently segfaults, but the file shows a lot of arrayrefs with size 49428, all elements undef.
For instance, we have the receiver configured like this:
async method get_something :Receiver(service => '', ...) ($src) {
}
Currently, we are unable to pass anything but simple scalars to configure the service.
That limits our ability to point the same code at a different service instance (staging/production) through the config.
Two reasons for this limitation:
eval statement.
When an RPC method has a lengthy step, typically an I/O operation that takes time to finish (e.g. an external API call), a certain behaviour arises when the method is called in bursts of concurrent requests. Due to how Myriad is currently implemented here, it reads all messages from the stream in batches without waiting on any read batch before adding the next one. This chokes the service and causes wait time to increase exponentially, because the requests are buffered in the service's own memory, where they interfere with the process itself, instead of being left for the transport layer to buffer.
We would need to wait for a read batch to be processed and only then read the next one.
Solved in #157
When running a microservice with multiple RPC methods defined in it, a certain behaviour arises in Myriad where outgoing xreadgroup calls end up waiting on or blocking each other.
while (1) {
    fmap0 {
        xreadgroup
    } foreach => [$rpcs->@*]
}
So once one call is done, we do not loop around to trigger xreadgroup again right away.
Example from redis monitor while running a service with testing RPC methods:
...
1621644007.286567 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "response" "{}" "deadline" "10" "stash" "{}" "message_id" "1" "trace" "{}" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method"
1621644007.326306 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service.rpc/test_method" ">"
1621644007.329050 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621643997281-0"
1621644007.333831 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service2.rpc/test_method2" ">"
1621644012.287435 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "response" "{}" "deadline" "10" "stash" "{}" "trace" "{}" "message_id" "1" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method"
1621644017.285819 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "deadline" "10" "response" "{}" "message_id" "1" "trace" "{}" "stash" "{}" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method"
1621644022.287674 [0 172.17.0.3:49760] "XADD" "myriad.service.testing_service.rpc/test_method" "*" "args" "{\"data\":{\"test\":\"HI\"}}" "who" "me" "rpc" "test_method" "deadline" "10" "response" "{}" "trace" "{}" "message_id" "1" "stash" "{}"
1621644022.407103 [0 172.17.0.3:49764] "XPENDING" "myriad.service.testing_service.rpc/test_method" "processors" "-" "+" "50" "561e3b988fc6"
1621644022.408710 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service.rpc/test_method" ">"
1621644022.411166 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621644012287-0"
1621644022.412178 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621644017285-0"
1621644022.413048 [0 172.17.0.3:49760] "XACK" "myriad.service.testing_service.rpc/test_method" "processors" "1621644022287-0"
1621644022.413993 [0 172.17.0.3:49764] "XPENDING" "myriad.service.testing_service2.rpc/test_method2" "processors" "-" "+" "50" "561e3b988fc6"
1621644022.415379 [0 172.17.0.3:49764] "XREADGROUP" "BLOCK" "15000" "GROUP" "processors" "561e3b988fc6" "COUNT" "50" "STREAMS" "myriad.service.testing_service2.rpc/test_method2" ">"
Also redis tests have been modified to test for such cases in t/transport/redis.t
The RPC makes use of the Redis stream to store all the pending calls. In Myriad this stream can be manipulated by the following futures:
These futures run in parallel, which is usually safe under normal circumstances.
However, under high load, the cleanup worker (https://github.com/deriv-com/perl-Myriad/blob/master/lib/Myriad/Transport/Redis.pm#L298-L336) might remove more items than it is supposed to, because it assumes that the stream length stays constant while the length is being computed (which introduces the race window).
To reproduce this, the calling rate needs to be higher than the rate at which the calls are picked up.
This patch is attached as a workaround for now:
diff --git a/lib/Myriad/Transport/Redis.pm b/lib/Myriad/Transport/Redis.pm
index 97e0b15..bd73f13 100644
--- a/lib/Myriad/Transport/Redis.pm
+++ b/lib/Myriad/Transport/Redis.pm
@@ -329,8 +329,16 @@ async method cleanup (%args) {
}
$total = $info->{length} - $total if $direction eq 'xrange';
+ # As the cleanup operation is NOT atomic, this will introduce a race window
+ # when the caller makes calls in a high rate, we should compensate the number of entries
+ # added during the calculation
+ my ($info2) = await $self->stream_info($stream);
+ my $entries_added = ($info2->{length} - $info->{length});
+ # Add the entries added during the process
+ # gaurd against race trimming by multiple instances of the service
+ $total += int($entries_added);
# my ($before) = await $redis->memory_usage($stream);
- my ($trim) = await $redis->xtrim($stream, MAXLEN => $total);
+ my ($trim) = await $redis->xtrim($stream, MAXLEN => $total) unless $entries_added < 0;
# my ($after) = await $redis->memory_usage($stream);
$log->tracef('Trimmed %d items from stream: %s', $total, $stream);
}
We have a couple of methods in RPC/Message.pm that are factory-style subs:
As per @tom-binary, this is a problem, as we should not mix plain subs and methods in the same package:
We should make them class methods, like:
use Object::Pad;
class Example {
    sub new_from_something ($class, %args) {
        return $class->new(%args)
    }
}
Example->new_from_something()
Example report, courtesy of BinGOs:
http://www.cpantesters.org/cpan/report/05daefa6-a6be-11eb-84bc-edd243e66a77
For now, we should skip_all when the module is not available. Eventually we could implement a polling approach as fallback for other platforms.
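A minimal sketch of such a guard is below. The report does not name the module, so Linux::Inotify2 (mentioned in another report on this page) is used purely as an assumed example, and module_available is a hypothetical helper name.

```perl
use strict;
use warnings;

# True if the named module can be loaded in this environment.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Intended use near the top of a test file, after `use Test::More;`
# and before any tests run (the module name is an assumption):
#
#   plan skip_all => 'Linux::Inotify2 is not available'
#       unless module_available('Linux::Inotify2');
```

Test::More's plan skip_all ends the test file immediately with a skip status, so platforms without the module report a skip instead of a failure.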