mfdlabs / grid-bot

The underlying code used for the MFDLABS Grid Bot.

Home Page: https://grid-bot.ops.vmminfra.net

License: Apache License 2.0

C# 90.83% Lua 8.80% Dockerfile 0.20% Shell 0.16%
grid gridbot bot discord roblox rbx rblx rcc rccservice server

grid-bot's People

Contributors

bkordan, easternbloxxer, jvalara, mfdlabs-ops, nikita-petko, nosyliam

grid-bot's Issues

Migrations to Discord.Net 3.3.2

Is your feature request related to a problem? Please describe.
We need to migrate to the latest Discord.Net, as a maintainer suggests that the latest version fixes the timeouts.

Describe the solution you'd like
A fix to timeouts

Describe alternatives you've considered
Making all callbacks multithreaded


Easy deployment on remote machines.

This issue and milestone aim to integrate the LOVE_ALL_ENVIRONMENTS flag.

Right now it is tedious as hell to set this up on any machine that doesn't already have the requirements.

OpSec: SEC-04-LAE

Integrations of Windows Docker images, AWS images and better tooling scripts for easy deployment.

This also ties into SEC-13-ADP with a possibility of a new script on deployments that bootstraps setups for new devices.

Also tie this into SEC-04-LAP with integration of Linux based environments.

It also ties into SEC-10-ARBITERS to integrate remote management of arbiter instances, which may also integrate its own gRPC service that integrates its own local arbiter service.

TODO: draft implementation.

Fix logger issues.

This will try to fix the major issue with the logger: ANSI escape characters don't render on a non-ANSI console.

I will also migrate a few things to P/Invoke, etc.
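
One possible way to handle the ANSI problem is to strip escape sequences whenever the output stream is not a terminal. The sketch below is in Python rather than the project's C#, and `strip_ansi` and `write_log` are hypothetical names, not part of the grid-bot logger. A terminal check alone does not prove ANSI support (legacy Windows consoles are the obvious exception), so treat the detection as an assumption.

```python
import re
import sys

# Matches CSI sequences such as "\x1b[31m" (set colour) or "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from text."""
    return ANSI_CSI.sub("", text)


def write_log(line: str, stream=None) -> None:
    """Write a log line, keeping colour codes only when the stream is a terminal."""
    stream = stream if stream is not None else sys.stdout
    # Assumption: a TTY can render ANSI; files and pipes get plain text.
    is_terminal = hasattr(stream, "isatty") and stream.isatty()
    stream.write((line if is_terminal else strip_ansi(line)) + "\n")
```

Writing to a file or pipe then produces plain text, while an interactive terminal still receives the coloured line.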

Fix auto release uploader.

There's an issue with the auto release uploader that causes it to upload corrupted archives.

I may fix this by either uploading form-data or raw bytes.

Fix the CODEOWNERS file.

The current codeowners file has the following problems:

  1. The specified teams are not present within this organization
  2. It is out of date

We can fix this by creating the teams and determining on the code-owners registry who owns what.

Grid Server issue

So there’s a new issue with the frontend user that causes all single instance and arbitered grid server requests to time out.

It has to do with the WebServer that drives the backend for the GridServer. The grid server needs this backend to download settings for its own operation, and will crash if that fails (I could do a file-based settings thing to avoid the web server, but it’s there, so just keep it).

Screenshots (images not preserved): 02/11/2021, 03/11/2021

The subdomain api.sitetest4.robloxlabs.com causes it, and it may be due to a request being left open (a middleware not calling next()).

How do I want to go about fixing this? I could remove EVERY API other than ClientSettings, Avatar, and Version Compatibility. That would also speed up the start time of the web server drastically, but it comes with the downside of dropping support for some features such as game persistence.

All in all, if I seriously want to fix it, I could use something extremely streamlined like C++ or C# instead of JavaScript, but I don’t want to go through the pain of rewriting it.

Edit:
MFDLabs.Internal.RbxAvatar.Site
MFDLabs.Internal.RbxClientSettings.Site
MFDLabs.Internal.RbxVersionCompatibility.Site
May become things ✔

Ripped from:
https://backlog.mfdlabs.local/ui/grid/mfdlabs.grid.bot/issues/7/?t=no&focusSummary=true

Grid Server Instance Recovery System.

Currently, when a timeout occurs on the Render Queue instance, the instance is gone forever until you manually kill it. Every subsequent request after the timeout is instantly dropped because that Job will still be running, and you cannot execute multiple Jobs at the same time on the same grid server instance (there may be a setting that allows you to, but who knows 🤷).

What this change will introduce is the following:

There will be a property on GridServerArbiter.GridServerInstance that determines whether the instance can perform auto recovery, along with a Type array of Exception types that invoke the auto recovery. If an auto recovery is invoked, it will close and reopen the instance and execute the command again (if the setting that says to do so is enabled). If the number of times it has recovered exceeds a specific number, it will permanently terminate the instance.

This will make it so HA integrity can be maintained on instances that need a long life cycle such as the render queue, or the up and coming shared user instances.

There may be multiple PRs related to this, so keep an eye on it.
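
The recovery behaviour described above could look roughly like the following. This is a sketch in Python rather than the project's C#. `RecoverableInstance`, its parameters, and the default limit of 3 are all hypothetical stand-ins for GridServerArbiter.GridServerInstance, chosen only to show the control flow (recover on listed exception types, optionally retry, terminate after too many recoveries).

```python
class RecoverableInstance:
    """Hypothetical wrapper around a long-lived grid server instance."""

    def __init__(self, open_instance, close_instance,
                 recoverable=(TimeoutError,), max_recoveries=3,
                 retry_command=True):
        self._open = open_instance        # callable that starts the instance
        self._close = close_instance      # callable that kills the instance
        self.recoverable = tuple(recoverable)  # exception types that trigger recovery
        self.max_recoveries = max_recoveries
        self.retry_command = retry_command
        self.recoveries = 0
        self.terminated = False
        self._open()

    def execute(self, command):
        """Run command(); on a recoverable error, reopen and optionally retry."""
        if self.terminated:
            raise RuntimeError("instance was permanently terminated")
        try:
            return command()
        except self.recoverable:
            if self.recoveries >= self.max_recoveries:
                # Too many recoveries: shut down for good and surface the error.
                self._close()
                self.terminated = True
                raise
            self.recoveries += 1
            self._close()
            self._open()
            if not self.retry_command:
                raise
            return self.execute(command)
```

Exceptions that are not in the `recoverable` tuple pass straight through, so only the failures that are known to leave a Job stuck cause a restart.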

From the WOTS on the matter:

We keep getting spammed on backlog with the exceptions of there being another job running already. If we can sort of fix it by adding something to kill it when it finds these exceptions that would be amazing!!

PvIX Metrics

GRIDBOT-28: Fix discord gateway hang

So we are having a major issue lately where the bot will die and never recover. The cause has been identified as a "Heartbeat missed" exception:

Error Type: Discord.WebSocket.GatewayReconnectException
Error Detail: Server missed last heartbeat
Inner Exception: 
Exception Stack Trace: 
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\git\MFDLABS\MFDLabs.Grid\Dependencies\Discord.Net.WebSocket\ConnectionManager.cs:line 79
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

We've raised this with the developers of Discord.Net here: discord-net/Discord.Net#2126

One fix we may implement is the following: instead of making the initial checks in the MessageReceived handler synchronously, post them to some WorkQueue for background processing so we don't block the gateway task.
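
The work-queue idea can be illustrated with a small sketch. It is written in Python with asyncio rather than C# and Discord.Net, and `MessageWorkQueue`, `post`, and the queue size are hypothetical. The point it shows is that the gateway-side handler only enqueues and returns immediately, while a separate worker does the slow checks.

```python
import asyncio


class MessageWorkQueue:
    """Hypothetical background queue that keeps slow work off the gateway task."""

    def __init__(self, process, max_size=1000):
        self._queue = asyncio.Queue(maxsize=max_size)
        self._process = process  # async callable that handles one message

    def post(self, message) -> bool:
        """Called from the gateway handler; never awaits, so it cannot block."""
        try:
            self._queue.put_nowait(message)
            return True
        except asyncio.QueueFull:
            return False  # drop the message rather than stall the gateway

    async def run(self):
        """Worker loop; run it as its own task next to the gateway connection."""
        while True:
            message = await self._queue.get()
            try:
                await self._process(message)
            except Exception as exc:
                # One failing message must not kill the worker.
                print(f"message handler failed: {exc!r}")
            finally:
                self._queue.task_done()

    async def drain(self):
        """Wait until every queued message has been processed."""
        await self._queue.join()
```

Dropping messages when the queue is full is one design choice among several. A bounded queue at least guarantees that a backlog cannot grow without limit while the heartbeat keeps running.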

Time to address GRIDBOT-3 (HATE_TASK_THREAD)

Within the grid bot's infrastructure, we use 3 Expiring task threads (1 Async). This code is considered legacy and breaks the development rule of HATE_SINGLETON.

If we can get rid of TaskThreads for AsyncWorkQueues, that would be better.

GRIDBOT-34: New rendering features.

Right now we only support the rendering of Avatars and Closeups of users. We would like to extend this functionality by creating new ways of rendering.

Change base namespaces

I want to change the base namespace from MFDLabs to something else because I just do not like the look of this name.

GA4 Client

Right now, MFDLabs.Analytics.Google.Client is UA only. I want to create another client for GA4 (Metrics Protocol) and use both of these so we can be prepared for when they ultimately remove UA in July 2023.

MFDLabs.Analytics.Google.UniversalAnalytics.Client
MFDLabs.Analytics.Google.MetricsProtocol.Client

Better clarity for Copyright Notices

This development branch will aim to add a comment to the top of every file owned proprietarily by MFDLABS and the teams within its own scope.

/* Copyright © MFDLABS Corporation 2001-. All rights reserved. */

This text will also be placed within the C# project file or Assembly Info file Copyright section.

Auto deployer for new releases

So currently we have someone manually deploy files to the nodes.

We are thinking of writing dedicated software that automates this (only for machines not hooked onto our network, else we can just use ARCBD)

The app will poll this repository's releases and determine if there's a new version by reading a registry key. By default it will not deploy releases marked as "pre-release", unless we put a very specific string into the release title that overrides this (only do this in very specific cases, as it will deploy to every available node).

To do this we may also need to create a bridge TCP or UDP server on the bot that can accept maintenance commands from an external source, so that we can enable maintenance automatically before deployment. This will also kill every arbitered instance so we have enough memory to complete the deploy.

When it finds a new release, it will download the release and decompress it into the predefined path from its configuration (check vault). It will then Set-Location to the predefined path, run the unpacker script, and Set-Location into the newly created folder. If the settings say so, it will copy the configuration files from the previous deployment and overwrite the new deployment's configuration files.

Finally it will run a script that:

  1. Kills all arbitered instances (if not killed already)
  2. Kills the currently running bot
  3. Goes into the new directory and starts the ps1 script RunService.ps1

It will then persist this version in the registry.

This auto deployer will have a timespan setting that determines the polling interval.

WARNING: If we can somehow get a webhook for this that listens for changes instead of polling, do that instead.
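
The release-selection rule described above (skip pre-releases unless the title carries an override string, and deploy only versions newer than the persisted one) can be sketched as a pure function. This is Python rather than the project's C# or PowerShell. `Release`, `pick_release`, the `[force-deploy]` marker, and the dotted `v1.2.3` tag format are all assumptions, since the issue names neither the override string nor the tag scheme.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical override marker; the issue does not say what the string is.
OVERRIDE_MARKER = "[force-deploy]"


@dataclass(frozen=True)
class Release:
    tag: str          # assumed to look like "v1.2.3"
    title: str
    prerelease: bool


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in tag.lstrip("vV").split("."))


def pick_release(releases: Iterable[Release], deployed_tag: str) -> Optional[Release]:
    """Return the newest deployable release, or None if nothing is newer."""
    deployed = parse_version(deployed_tag)
    candidates = [
        r for r in releases
        if parse_version(r.tag) > deployed
        and (not r.prerelease or OVERRIDE_MARKER in r.title)
    ]
    return max(candidates, key=lambda r: parse_version(r.tag), default=None)
```

Keeping the decision separate from the polling and download steps means the same function would work unchanged if the poller were later replaced by a webhook.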

Minimal logging prefix.

Currently with logs, there's a massive prefix on the log string, and half the data on it is not cached, such as DNS resolution and IP lookup.

I want to make it so it will cache the data that doesn't change (Anything other than the current date, uptime and thread ID)

An example of this abomination is shown in the following screenshot:
[image: the current log prefix]

That prefix is nearly 200 characters long and contains 13 data sets that are fetched every time a log is called, ultimately making this log inefficient.

I can shorten it to around 66 characters in length.

Please watch for changes on fix/shorter-log-prefixes-and-faster-logging
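
The caching idea can be shown in a few lines: compute the parts that never change once, and rebuild only the date, uptime, and thread ID on each call. This sketch is in Python rather than the project's C#. The field order, the format, and the names `static_prefix` and `log_prefix` are assumptions, and the hostname and process ID merely stand in for whatever static data the real prefix carries.

```python
import os
import socket
import threading
import time
from functools import lru_cache

_START = time.monotonic()  # reference point for the uptime field


@lru_cache(maxsize=1)
def static_prefix() -> str:
    """Fields that never change during the process lifetime; computed once."""
    return f"[{socket.gethostname()}][{os.getpid()}]"


def log_prefix() -> str:
    """Build the full prefix; only date, uptime and thread ID are recomputed."""
    now = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime())
    uptime = time.monotonic() - _START
    thread_id = threading.get_ident()
    return f"[{now}Z][{uptime:.3f}][{thread_id:x}]{static_prefix()}"
```

After the first call, every later log line pays only for a clock read and a string format.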

GRIDBOT-25: Deployment, hostname and log file commands

Is your feature request related to a problem? Please describe.
This will create 2 commands that will be useful for debugging.

Describe the solution you'd like
2 commands to log the deployment ID and debug file name.

Describe alternatives you've considered
Removing the option from support ticket


Better repository labels.

This repository needs more labels that not only reflect opsecs but also reflect a Key ID of work.

SEC-04-REPOOPS

grid-bot labels system.

Platforms, despite the only available platform being Windows, should have their own labels.

OPSECS, despite having milestones, should have their own labels to show relationships. The milestone represents the primary OPSEC for an issue or PR.

Priority should have labels:
P2 - Key Deliverable (Highest priority)
P1 - Deliverable (Mid priority)
P0 - Stretch Goals (Lowest priority, can be spread over multiple quarters without putting into at-risk)

Status should have labels despite the project:
Not Started
Research
Opportunistic
Deferred
On Track
At Risk
Blocked
Delayed
Partial Release
Complete
Cancelled

Motivated Areas should have their own labels. They act like opsecs but cover a major spectrum of implementation rather than an idea.

Issue kind should have its own label:
feature - single feature
fix - bug fix
enhancement - feature modification to existing product
ops - overhaul
dev - team branch
