rabbitmq / tgir
Official repository for Thank Goodness It's RabbitMQ (TGIR)!
License: Apache License 2.0
Regardless of how much guidance & effort we have put into explaining and reasoning about RabbitMQ's use of system memory, we keep finding ourselves in situations like this one:
Hey everyone - I have a decent size rabbitmq server with 61G of RAM, sitting on a 0.4 watermark default, which gives a 24GB High watermark. It's been up for about 7 days now, and memory is hovering around 23G/24G. Our monitoring is picking this up as super high usage (which I guess is correct), but if I use the cli to get stats out, I see allocated_unused: 19.2819 gb (75.78%)
which makes me think this isn't a problem at all? Anything I can do to either drop the RAM usage or get it show the real usage not just allocated?
I don't think there's harm in leaving this alone...
hehe you don't have pagerduty screaming at you
Restart it?
Last time I restarted the node, it crashed on shutdown and I lost all persistent messages
it got stuck on rebuilding indexes
I think it kept getting hit by OOM
so it was in an endless loop
I even doubled the RAM to 122G and tried to start up
What are you running?
Rabbit v3.7.2 with Erlang v20.1.7
Can we see your RabbitMQ Overview, rabbitmq_top & memory breakdown?
[root@liverabbitone centos]# rabbitmqctl eval 'recon:bin_leak(10).'
[{<9202.2247.0>,-22841,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.511.0>,-15892,
[channel_queue_exchange_metrics_metrics_collector,
{current_function,{gen_server,loop,7}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.2667.0>,-15865,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.2635.0>,-15865,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.2127.0>,-2160,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.513.0>,-1443,
[channel_exchange_metrics_metrics_collector,
{current_function,{gen_server,loop,7}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.12165.991>,-955,
[rabbit_mgmt_db_cache_connections,
{current_function,{gen_server,loop,7}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.2043.0>,-558,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.3353.0>,-548,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]},
{<9202.2059.0>,-522,
[{current_function,{gen_server2,process_next_msg,1}},
{initial_call,{proc_lib,init_p,5}}]}]
Well, maybe your best option would be to stand up the latest version and re-direct your traffic there. Then use the shovel to move existing messages. https://www.rabbitmq.com/blue-green-upgrade.html
oh that's cool, thanks for that
@Farkie can you please add a RabbitMQ Overview screenshot from your environment to this issue? I would like to see the global counts. A sanitized rabbitmqctl report attached as a file would be champion.
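The figures quoted in the question above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 75.78% is the share of the node's total reported memory taken up by allocated_unused, and the constant and function names are made up for this example.

```python
# Figures quoted in the question above.
TOTAL_RAM_GB = 61.0
WATERMARK_RATIO = 0.4
ALLOCATED_UNUSED_GB = 19.2819
# Assumption: the percentage is allocated_unused as a share of the node's total reported memory.
ALLOCATED_UNUSED_SHARE = 0.7578


def high_watermark_gb(total_ram_gb, ratio):
    """Memory high watermark for a relative watermark setting."""
    return total_ram_gb * ratio


def memory_in_use_gb(allocated_unused_gb, allocated_unused_share):
    """Total reported memory minus the part that is allocated but unused."""
    total_reported_gb = allocated_unused_gb / allocated_unused_share
    return total_reported_gb - allocated_unused_gb


watermark = high_watermark_gb(TOTAL_RAM_GB, WATERMARK_RATIO)
in_use = memory_in_use_gb(ALLOCATED_UNUSED_GB, ALLOCATED_UNUSED_SHARE)

print(f"high watermark: {watermark:.1f} GB")
print(f"memory actually in use: {in_use:.1f} GB")
```

Under that assumption, only about a quarter of the reported memory is actually in use, which is why the node looks close to its watermark even though most of that memory is allocated but unused.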
HAProxy, load balancer, socket tuning?
How does RabbitMQ handle network latency? What about a clean network partition? And a partial network partition, or Byzantine fault?
We have at our disposal a wide variety of tooling for the Kubernetes infrastructure that will let us make new discoveries about the behaviour of RabbitMQ.
Chaos Mesh - Chaos testing framework for Kubernetes clusters by the CNCF, very recently made GA. Allows for many cluster disturbances to be run continuously, or on a cron schedule, and on subsets of pods. The different chaos events are known as experiments, which consist of:
RabbitTestTool - Tool for orchestration and benchmarking of RabbitMQ clusters in EC2, GKE or EKS. Allows for 'playlists' to be created and run, where a playlist consists of systems, benchmarks and workloads. This allows easy A/B benchmarking of the same workloads against different systems, or different workloads on the same system.
Kubestone - Benchmarking tool for Kubernetes clusters. Implements a number of controllers for various benchmarking tools, such as system performance profiling, HTTP load benchmarks, etc. Can be extended with custom operators to support different benchmarks. For example, we could contribute to this project to provide a RabbitMQ benchmark if we so wished.
Sonobuoy - Security benchmarking & e2e testing of K8s workloads. Extensible with custom plugins. We could also contribute to this project with a RabbitMQ benchmarking plugin.
A new RabbitMQ version comes out. Exciting! Shiny new features, bug fixes, security patches, etc. It's time to upgrade. But hang on a bit: we have tens or even hundreds of applications using RabbitMQ in production. How should we go about upgrading?
This question comes up frequently in the enterprise RabbitMQ community, as part of what we call Day 2 Operations. Every company or team decides which upgrade strategy works best for them: blue-green deployment, rolling (one node at a time) upgrades, etc. But every strategy comes with its advantages and trade-offs, which are not well understood by RabbitMQ users.
What happens to clients during a rolling upgrade? What happens to particular types of queues? What if an alarm gets triggered during an upgrade? When should I expect downtime? When is there a risk of data loss? How do clusters reform after an upgrade? How should RabbitMQ or its clients be configured to be upgrade-resilient? These questions come up over and over again and I think many people would benefit from some literature or guidance around RabbitMQ upgrades.
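One small piece of the client-resilience question can be sketched in code: during a rolling upgrade each node restarts in turn, so clients connected to it are disconnected and have to reconnect. The helper below is a hypothetical, library-agnostic sketch of reconnecting with capped exponential backoff and jitter. The name connect_with_backoff and its parameters are made up for illustration, and catching ConnectionError is an assumption: a real client library raises its own exception types, which would need to be caught instead.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=8, base_delay=0.5,
                         max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is reached.

    Between failed attempts, wait for a capped exponential delay with
    random jitter, so that many clients do not reconnect in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep for between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
```

A caller would pass in a function that opens a connection with whichever client library is in use. This covers reconnection only; it does not address what happens to queues or in-flight messages during the upgrade.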
I know that RabbitMQ has these great Grafana dashboards, and even a RabbitMQ Summit 2019 talk that covers them in more detail. How can I set this up?
Thanks @piomin for your great RABBITMQ MONITORING ON KUBERNETES post that reminded me to cover this explicitly.
The obligatory Commit Strip, which is the admission price for TGIR-ing.
Following up on this amazing first ticket for the topic.
RabbitMQ also runs as a cluster. Kubernetes pods are unpredictable. What should we take care of while operating a RabbitMQ cluster on K8s?
Kubernetes and Istio