uninett / argus

Argus is an alert aggregator for monitoring systems

License: GNU General Public License v3.0

Python 98.44% HTML 1.08% Shell 0.18% Dockerfile 0.23% Makefile 0.07%

argus's People

Contributors

hmpf lunkwill42 ddabble jorgenbele katsel stveit johannaengland maxadamo mjsaarin tcr-lux sunet

argus's Issues

Implement an API endpoint for changing individual incident fields

This is mostly nice-to-have for the frontend, and will potentially replace e.g. the /api/v1/incidents/<pk>/active/ endpoint.

Replace generate_fixtures.py

First: there are two types of fixtures:

1. For filling lookup-tables/setting defaults in the database (like a default superuser). Use either
- a data-migration. Experience-wise, a hassle. Data-migrations are best for changing existing data in a production database, and removed whenever migrations are squashed.
- a management-command with lots of manager.objects.get_or_create. Kubernetes-friendly. Best to have one file defining the data and use the management command to check that they exist in the database.
- a json-fixture dumped from a live database with dumpdata and stored in version control. Process/checklist then needed to keep this up to date.
2. For testing. Use factory_boy with faker instead of reinventing the wheel. One file with the definitions, that are run by the tests only when needed.

Second. Any script to be run from the CLI, make it a management command.

Receiving alerts: plugins or glue-services?

Should the backend have a mapper (in plugins) from a sender to native format, or should the sender send native format, or should there be a glue service that translates from sender's format to the backend's native format?

A problem state may be opened which will never be closed by detection

There is another issue which may need to be discussed, related to my experience with NAV.

A problem state may be opened which will never be closed by detection. E.g.:

If an admin physically removes a piece of hardware from a router, NAV will detect this and flag the problem.

However, the hardware was removed on purpose and will never be re-inserted. NAV will only consider the problem resolved once it detects the hardware as having been re-inserted, so the problem will stay open indefinitely.

The NAV admin solves this by manually resolving the problem from NAV's status UI.

However, this closes the problem state in NAV, but never sends an event/alert through the system.

This would cause the problem state to stay open in AAS, while it's closed in NAV, and the two systems will be out of sync.

So, what would be the best way to ensure the updated problem state in NAV is propagated to AAS? That's a discussion we need to have...

Originally posted by @lunkwill42 in #45 (comment)

Sending notifications: plugins or glue-service?

Should the code sending notifications be hosted inside the backend or live as their own services? That is: if hosted in the backend, a plugin system is needed to add more notification recipient types. If hosted as microservices, we need to define a push API that the backend pushes to a microservice that then translates this to SMS/teams/email/rt/what have you.

StackedJSONParser is presumably useless.

The POST handler for the /alerts/ API endpoint uses a custom JSON parser called StackedJSONParser:

https://github.com/Uninett/aas/blob/22e11818400b76e5e4a822595ff4027e678a82bc/src/aas/alert/parsers.py#L10-L29

This is based on a code example I pointed to as a reference for parsing a stream of disparate JSON objects. It was meant as an example of how to parse the stream of disparate JSON blobs that would be emitted by NAV's eventengine, not as a way of parsing blobs posted to the API.

I would rather the endpoint support a proper JSONParser, and require the Content-Type of the request to be application/json. As it stands now, only text/plain works, which maybe makes sense, when you think about the fact that the endpoint will allow things that aren't strict JSON.

The endpoint only need support a single alert with each request, IMHO. A glue service would turn the stream of NAV events into individual API requests.
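To make the distinction concrete, here is a minimal sketch (not the parser from the repository) of how a glue service could split a stream of back-to-back JSON objects using the standard library's json.JSONDecoder.raw_decode, so that each object can be posted to the API as its own strict application/json request. The function name iter_stacked_json is hypothetical.

```python
import json


def iter_stacked_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode() does not accept leading whitespace, so skip it first.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode() returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

With this in the glue service, the API endpoint itself only ever sees one JSON object per request and can use a regular JSON parser.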

Sending notifications to SMS

We will use the 3rd party app django-phonenumber-field to store a phone number for a user.

Currently only one, on the User. OneToOneField or directly?

We will need an endpoint to add/edit/delete the phone number, and add/edit/delete an existing phone number to a notification profile. What else?

Make detail view of an alert work

Clicking on the link to more details currently:

uses an unsupported url.
sends you to the API and needs a token in the header

Break out notification-styling into type-dependent reusable components

There are multiple ways of sending SMSes: SaaSes like Twilio, talking to a locally connected phone etc. The style of the content is the same however, 160 characters. That styling can possibly be reused for slack.

For email, Django has multiple backends, so we hopefully can use that instead of our own plugin system. But: the styling for subject and body can be reused for Teams channels.

Add a severity level to Incident

Version 1.0 of Argus was released without an Incident attribute for "Severity", as this feature was underspecified.

After discussing internally how our service desk should operate, mostly according to ITIL principles, we have arrived at a simple design decision for Argus.

To avoid overloading terminology too much, we shall simply introduce a new attribute to the Incident model to loosely map to the concept of severity:

level

level shall be an integer value, in the range 1 through 5. 1 is the highest level, used for critical incidents. 5 is the lowest level. For compatibility purposes, the attribute shall not be mandatory (i.e. NULL values are allowed).

The 5 levels shall be assigned names for human consumption. These names are subject to change, so ease of modification at a later stage must be considered. Each level must also feature a long-form description, which the frontend can use to help users understand the severity level properly.

The suggested level names for the initial implementation are:

5=Information
4=Low
3=Moderate
2=High
1=Critical

A null value may be termed "None" or "N/A".

How incidents are classified into these levels remains entirely up to the API clients / glue services that report incidents from their corresponding source systems.

The severity level is only ever intended to be set by API clients, not including the frontend UI client.

As a side effect, notification filters must be changed to be able to filter on severity levels (greater than/less than/equals/not set). This also means that the incidents API endpoint needs a new argument to filter on levels as well. We should perhaps post these as separate issues to this one?
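A rough sketch of how the level values, their display names and one of the filter comparisons could look in plain Python. This is only an illustration under assumptions: the names Level, LEVEL_NAMES, level_name and at_least_as_severe are made up here, and the long-form descriptions are left out because their wording has not been decided.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Severity levels: 1 is the most severe, 5 the least."""
    CRITICAL = 1
    HIGH = 2
    MODERATE = 3
    LOW = 4
    INFORMATION = 5


# Display names are kept apart from the integer values so they can be
# renamed later without touching stored data.
LEVEL_NAMES = {
    Level.CRITICAL: "Critical",
    Level.HIGH: "High",
    Level.MODERATE: "Moderate",
    Level.LOW: "Low",
    Level.INFORMATION: "Information",
}


def level_name(level: Optional[int]) -> str:
    """Return the display name; a missing (NULL) level is shown as 'N/A'."""
    if level is None:
        return "N/A"
    return LEVEL_NAMES[Level(level)]  # Level() raises ValueError outside 1..5


def at_least_as_severe(level: Optional[int], threshold: int) -> bool:
    """Filter helper: lower numbers are more severe; unset levels never match."""
    return level is not None and level <= threshold
```

Note that "greater than" is ambiguous with this numbering (a numerically greater level is less severe), so a filter API would probably want to name its comparisons in terms of severity rather than integer order.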

Expose real-time API for active alerts

Should expose a websocket real-time API for the frontend clients.

Implement a model and API for Acknowledgements

Incidents can be Acknowledged, primarily by end users of Argus.

Semantically, an Acknowledgment is a way for a user to comment on an Incident, signalling, for example, to other users that this Incident has the attention of someone, and something is being done about it.

Functionally, in a normal list of active/open Incidents, Incidents that have an active acknowledgment should be suppressed. I.e. an API endpoint that returns a list of active Incident objects, should remove acknowledged Incidents from the list, unless specifically instructed to keep them in the result (by way of a GET parameter).

Multiple acknowledgements can be created on an Incident.

An ack model should contain:

A relation to an Incident
A relation to the User that made the acknowledgement
A timestamp for when the acknowledgement was made.
A message string, entered by the acknowledging user.
An optional expiry time.

For the purposes of filtering (as mentioned above), an Incident should be considered acknowledged when there is at least one non-expired Acknowledgement object associated with it.
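The filtering rule above can be sketched without any Django machinery. This is a plain-Python illustration, not the eventual model; the class and function names are assumptions, and the user is represented by a simple string.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Acknowledgement:
    incident_id: int                   # relation to an Incident
    user: str                          # relation to the acknowledging User
    timestamp: datetime                # when the acknowledgement was made
    message: str                       # entered by the acknowledging user
    expiry: Optional[datetime] = None  # optional expiry time

    def is_expired(self, now: datetime) -> bool:
        # An acknowledgement without an expiry time never expires.
        return self.expiry is not None and self.expiry <= now


def is_acknowledged(acks: Iterable[Acknowledgement], now: datetime) -> bool:
    """An incident is acknowledged if at least one ack has not expired."""
    return any(not ack.is_expired(now) for ack in acks)
```

A list endpoint for open Incidents would then drop every Incident for which is_acknowledged is true, unless the GET parameter asks for them to be kept.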

Rename AAS everywhere to Argus

Name change decided 2020-06-22.

Rename NetworkSystem model to AlertSource

Multiple sources send alerts from multiple subsystems. The source is currently stored in NetworkSystem, which has a name and a type. The type is hard-coded to one of NAV or Zabbix, but should be a lookup table. What the name is is not very clear, it should probably be the hostname/ip-address of the host that sent the alarm.

I think "Source" is a better name for this object than NetworkSystem, better self-documentation.

SMS: where should the phone number be stored?

Notifications through email are easy, since the User-model already has an email-field. It does mean however that a NotificationProfile is locked to the User's email-address.

However, there is no predefined place to store a phone number. We could easily put it on User, and just as easily on NotificationProfile.

Ditto other notification methods: if slack, do we send all to a specific channel, or should it be possible to vary by NotificationProfile or User?

Maybe there should be a NotificationSystem-table, with a json-field with the type-specific data.

More phone number features

Verification of phone numbers (we could reuse the new system to verify email maybe, but less necessary due to fetching known emails from Feide)
Ensure that phone numbers are unique per user, user + phone number should be unique
#344

Important that changing a phone number is possible, since the numbers in the notification-profiles also change then.

Implement an API endpoint for manually resolving an Incident

An Argus end user should be able to resolve an Incident manually (this was discussed in #53). Several things should happen when this operation is selected by a user:

The operation must be logged to the Event table (the API should support adding a message for the event log, so the user can specify why they resolved an incident).
The Incident's end_time attribute must be set to the current time, to indicate it has been closed.
A notification containing the details of the event (who did what, when, and why/message) must be added to the notification queue.

Argus must expect that some message about an already resolved Incident might come in from a source system. This should not change the Incident record, but should be logged to the Event table.
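A minimal sketch of the behaviour described above, using plain Python objects in place of the real models. Everything here (the Incident dataclass, the close_incident function, the dict-shaped event, the list used as a notification queue) is a stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Incident:
    start_time: datetime
    end_time: Optional[datetime] = None
    events: List[dict] = field(default_factory=list)  # stand-in for the Event table


def close_incident(incident, actor, message, now, notification_queue):
    """Log a closing event; close the incident only if it is still open.

    Returns True if the incident was closed by this call.
    """
    event = {"type": "close", "actor": actor, "timestamp": now, "message": message}
    # The event is always logged, even for an already resolved incident.
    incident.events.append(event)
    if incident.end_time is not None:
        # Already resolved: leave the Incident record untouched.
        return False
    incident.end_time = now
    # Who did what, when, and why goes out as a notification.
    notification_queue.append(event)
    return True
```

Whether a late message about a resolved Incident should also trigger a notification is left open here; the sketch only logs it.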

Reporting errors in Argus itself

The types of errors that argus can report about itself, for instance: Failure to send a notification because the notification-endpoint isn't answering (email server down, say), should be reported as an incident.

This means we need a SourceType "argus" and a SourceSystem representing the host argus is running on. Named "self" maybe? "me"? I suspect hostname would be tricky. Also, a function/method argus can use to write to the incidents-table, with SourceType/SourceSystem locked.

(This is very nice, because we can dogfood the system using itself, triggering errors in argus in order to have incidents turn up in argus :) )

Set up CI using GitHub Actions

Properly formatting sent notifications

The styles currently in use for email and sms are fine for dev but not very useful for production.

Email

Example, current email subject, before removal of object and parent_object:

"Incident at 2020-08-04 13:26:09.635804+00:00 [fre: fwfwrf: Object created via "create_fake_incident" (None) <ID >]

An email subject is preferably
- max 50 characters long
- most important things first

Having the timestamp that early is not good. Maybe...

Argus: fre: fwfwrf: Object created via "create_fake_incident" (None) <ID > 2020-08-04 13:26:09.635804+00:00

"Argus" might also not be necessary.

An email has a body where we could, if we wanted, put tons of stuff.

SMS

SMS is preferably no longer than 160 characters, or it will be sent as an MMS.

Example, current experimental sms system, before removal of object and parent_object:

2020-08-04 13:26:09.635804+00:00 [fre: fwfwrf: Object created via "create_fake_incident" (None) <ID >]

I suspect we need to ask anticipated recipients what they would prefer...
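Until the recipients have been asked, the two length limits can at least be captured in small helpers. This is a sketch under assumptions: the function names are invented, and putting the description first with the timestamp last is just one reading of "most important things first".

```python
def truncate(text, limit):
    """Shorten text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def email_subject(description, limit=50):
    # Most important information first; the timestamp is left to the body.
    return truncate(description, limit)


def sms_text(description, timestamp, limit=160):
    # Description first and timestamp last, so that truncation drops the
    # least important part of the message.
    return truncate(f"{description} ({timestamp})", limit)
```

For example, email_subject('fre: fwfwrf: Object created via "create_fake_incident"') is cut down to fit in 50 characters, while a short description passes through unchanged.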

Handle format errors in posted alerts gracefully

Attempting to post an alert JSON structure to the /alerts/ endpoint crashes the endpoint with a KeyError traceback.

Errors like this should be handled more gracefully and be reported back to the client using a proper response code and human-readable detailed error message. In this particular instance, a useful message would explain that the input format is erroneous, possibly enumerating any validation problems with the input:

KeyError at /alerts/
'subid'

Request Method: POST
Request URL: http://aas.labs.uninett.no/alerts/
Django Version: 2.2.12
Python Executable: /usr/local/bin/python
Python Version: 3.7.6
Python Path: ['/usr/local/bin', '/', '/usr/local/lib/python37.zip', '/usr/local/lib/python3.7', '/usr/local/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/site-packages']
Server time: Thu, 30 Apr 2020 12:52:36 +0200
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'corsheaders',
 'social_django',
 'rest_framework',
 'rest_framework.authtoken',
 'aas.auth',
 'aas.alert',
 'aas.notificationprofile']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'corsheaders.middleware.CorsMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'social_django.middleware.SocialAuthExceptionMiddleware']


Traceback:

File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py" in inner
  34.             response = get_response(request)

File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py" in _get_response
  115.                 response = self.process_exception_by_middleware(e, request)

File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py" in _get_response
  113.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/lib/python3.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  54.         return view_func(*args, **kwargs)

File "/usr/local/lib/python3.7/site-packages/django/views/generic/base.py" in view
  71.             return self.dispatch(request, *args, **kwargs)

File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py" in dispatch
  505.             response = self.handle_exception(exc)

File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py" in handle_exception
  465.             self.raise_uncaught_exception(exc)

File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py" in raise_uncaught_exception
  476.         raise exc

File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py" in dispatch
  502.             response = handler(request, *args, **kwargs)

File "/usr/local/lib/python3.7/site-packages/aas/alert/views.py" in post
  42.             for json_dict in request.data

File "/usr/local/lib/python3.7/site-packages/aas/alert/views.py" in <listcomp>
  42.             for json_dict in request.data

File "/usr/local/lib/python3.7/site-packages/aas/alert/mappings.py" in create_alert_from_json
  261.     return mapping.create_model_obj_from_json(json_dict)

File "/usr/local/lib/python3.7/site-packages/aas/alert/mappings.py" in create_model_obj_from_json
  151.             field_mappings.update(choice.based_on(json_dict))

File "/usr/local/lib/python3.7/site-packages/aas/alert/mappings.py" in based_on
  66.             value = dict_[self.if_value_of]

Exception Type: KeyError at /alerts/
Exception Value: 'subid'

(This was from attempting to post an empty JSON blob: {})
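As an illustration of the kind of handling asked for here, a validation step could run before any mapping code touches the payload. This is a framework-free sketch: the function names are hypothetical, and REQUIRED_KEYS only lists 'subid' because that is the one key known from the traceback; the real set of required fields is an assumption left open.

```python
# Assumption: the real list of required fields is longer; 'subid' is the
# only one known from the traceback above.
REQUIRED_KEYS = ("subid",)


def validate_alert(payload):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(payload, dict):
        return [f"Expected a JSON object, got {type(payload).__name__}."]
    return [
        f"Missing required field: '{key}'."
        for key in REQUIRED_KEYS
        if key not in payload
    ]


def handle_post(payload):
    """Return (status code, response body) instead of letting KeyError escape."""
    errors = validate_alert(payload)
    if errors:
        return 400, {"errors": errors}
    return 201, {"created": payload}
```

Posting the empty blob {} would then yield a 400 response that lists the missing field, rather than a 500 with a traceback.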

Implement logging out with token authentication

Currently, only logging in is implemented. Logging out should ideally delete the user's authentication token.

Explain what Argus is

There's nothing publicly anywhere that explains what Argus is and who it's for. We should add this at least to the internal docs.

Detail URL should be fully qualified, since it points to the source's UI

Closes #42

Move the psa keys and secrets out of `aas.site.settings.base`

They don't belong there. "base" is not for site-specific settings at all.

Move them to both aas.site.settings.dev and aas.site.settings.prod, and document that tuned settings-files should start with one of those as a basis.

Store Source System base URLs

The alerts submitted on behalf of an "Alert Source" may contain relative URLs to more information. However, AAS currently has no way of linking back to these, since the base URL/FQDN of the source system is not stored anywhere. These should be stored within the Alert Source record, as that will allow the network system's FQDN to change without the need to update every alert database record related to that network system.

To clarify: An "Alert Source" may have a "name" attribute that often corresponds to a FQDN as well, but this attribute is meant for human consumption and identification. An actual base URL attribute should be machine readable and contain values such as https://nav.example.org/.
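With a base URL stored on the Alert Source, resolving a relative details URL is a one-liner with the standard library. A small sketch; the function name and the example path are hypothetical.

```python
from urllib.parse import urljoin


def absolute_details_url(base_url, details_url):
    """Resolve an alert's (possibly relative) details URL against the
    source system's base URL. Already absolute URLs pass through unchanged."""
    return urljoin(base_url, details_url)


# Example with a made-up relative path:
# absolute_details_url("https://nav.example.org/", "/alerts/42/")
```

If the source system's FQDN changes, only the stored base URL needs updating; the relative URLs on the alert records stay valid.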

Should incidents pushed in through the API be active by default?

.. or maybe all alerts with timestamps equal to now or within five minutes? What about alerts that have timestamps in the future?

Revise Alert model to include both start and end timestamps

In the current incarnation of the codebase, the Alert model has only a single timestamp, and another relation is used to indicate whether an alert is active or not. This is somewhat removed from the original requirements, which were based on NAV - this model does not support the requirement to retain historic information about the time extents of problems (i.e. most problems were detected at a specific time, and were resolved at a later specific time).

To recap the required model, based on NAV's model:

There are two basic types of alerts

Stateful alerts

A stateful alert has a time extent, and represents an ongoing problem. I.e. it has a start timestamp, and either has, or is expected to have at some point in the future, an end timestamp.

The start timestamp indicates when the problem was detected. The end timestamp indicates when the problem was resolved.

A stateful alert should be considered active as long as it has not yet received an end timestamp.
A stateful alert should always be listed when fetching a list of currently active problems.

Stateless alerts

A stateless alert has no time extent. It has only a single timestamp, indicating when the alert was generated.

A stateless alert is never displayed on any lists of active problems, only when searching for historic alerts. Normally, when a stateless alert is generated, it is only logged and any user subscribed to matching criteria receives a one-time notification.

Data model

NAV's internal data model for this heavily relies on PostgreSQL's timestamp data types. These do not always translate well into other databases, such as SQLite or MySQL.

NAV uses the end timestamp to indicate several things:

A NULL value indicates the alert represents a stateless alert. Only the start timestamp is considered.
A value of infinity (which is PostgreSQL specific), indicates the alert represents an ongoing and active problem, that is expected to resolve some time in the future. I.e. this timestamp is updated once the problem is resolved.
Any valid timestamp value indicates that this problem was resolved at that time, and is no longer active.

Using this representation enables uncomplicated time-based SQL queries against the data:

Querying active problems means querying everything that has end_time >= infinity, or even just end_time >= NOW().
Querying any alert that was active at a specific point in time (A) only requires start_time <= A AND end_time >= A
PostgreSQL provides the OVERLAPS operator, allowing for simple query statements to find alerts that overlap with some given time period between A and B: (start_time, end_time) OVERLAPS (A, B)

I would really like for this model to be replicated in AAS, even though it would inexorably link it to using only PostgreSQL as the underlying RDBMS. The same design choice was made for NAV many years ago, and has worked reasonably well so far.

As long as only the API is the acceptable way of accessing AAS data, the actual implementation can change later, without causing breaking API changes.
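To make the three meanings of the end timestamp concrete, here is a plain-Python sketch of the same representation. It is only an approximation for illustration: datetime.max stands in for PostgreSQL's infinity, the class and method names are invented, and the overlap test only mimics the general idea of OVERLAPS, not its exact boundary rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

INFINITY = datetime.max  # stand-in for PostgreSQL's 'infinity' timestamp


@dataclass
class Alert:
    start_time: datetime
    # None: stateless alert. INFINITY: ongoing problem. Otherwise: resolved then.
    end_time: Optional[datetime] = None

    @property
    def stateful(self) -> bool:
        return self.end_time is not None

    def is_active(self) -> bool:
        """Ongoing problem: stateful and not yet given a real end timestamp."""
        return self.end_time == INFINITY

    def active_at(self, moment: datetime) -> bool:
        """Was this a problem in progress at the given point in time?"""
        return self.stateful and self.start_time <= moment <= self.end_time

    def overlaps(self, a: datetime, b: datetime) -> bool:
        """Does the problem's time extent overlap the period from a to b?"""
        return self.stateful and self.start_time < b and a < self.end_time
```

Because an ongoing problem has an end time later than any real timestamp, it needs no special casing in active_at or overlaps, which is the same convenience the infinity value gives the SQL queries.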

Implement authentication token expiration

Without changing Django Rest Framework's Token model, an acceptable solution would be to check the token's age against a TOKEN_EXPIRATION_DAYS setting every time a user logs in, and delete the token and reject the login request if the token is too old.
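The age check itself is small. A sketch, assuming the token's creation time is available as a timezone-aware datetime; the setting value of 14 days and the function name are examples only.

```python
from datetime import datetime, timedelta, timezone

TOKEN_EXPIRATION_DAYS = 14  # example value; the real setting is up to the site


def token_is_expired(created, now=None, max_age_days=TOKEN_EXPIRATION_DAYS):
    """Return True if a token created at `created` is older than the limit."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created > timedelta(days=max_age_days)
```

On login, an expired token would be deleted and the request rejected; otherwise the login proceeds as usual.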

What happens when a Source System is decommissioned?

If a user decommissions a Source System, that system will no longer submit incidents to Argus. However, it is assumed the user still wants to keep the incident history in Argus. How can this be supported by Argus? Thoughts:

Most importantly, Argus should be able to prevent future Incidents from being posted on behalf of the decommissioned Source System - but how?
- User deletes the Source System entry from Argus: This would either cascade-delete all the stored incidents related to the source (a big no-no), or remove the relation from the existing Incidents, thereby fully or partially losing history.
- Signal a "decommissioned state" for a Source System by either
  - Deleting the Source's associated User account.
  - Setting a boolean flag and/or timestamp on the Source System record to indicate its time of decommission.

post_save signal receiver won't get correct Incident instance field values

end_time...

For (at least) dev, it should be possible to log in with Feide, bypassing the frontend entirely

Needed: a single login-page, or changing the admin login-page to also include Feide.

Rename Alert model to Incident

The word has come down from the Uninett Service Center that we should try to be more compliant with ITIL terminology, and so to avoid confusion with existing ITIL terms, the model currently called Alert should be renamed to Incident.

This has implications for the previously renamed AlertSource as well - it should either be renamed IncidentSource, or just Source (as was one of the original suggestions). We could also browse the ITIL terminology dictionary to see if there is terminology also for this (the dictionary has been posted to our internal channels).

Upgrade Django to version 3

Research how Zabbix alerts can be exported

Enable AlertSource to be associated with a (system) User.

For each AlertSource, a User should be created, and the two should be associated. This User should not be able to log in normally through the web interface, but will still be assigned an API token.

This enables the AlertSource to be identified using the API token as it sends events into the API.

Assign source to events/alerts based on the API token used by the client

ATM, the alert POST API endpoint is hardcoded to expect any posted alert to be a NAV alert from the first registered Alert Source instance:

Argus/src/aas/alert/views.py

Lines 37 to 43 in 22e1181

    def post(self, request, *args, **kwargs):
        created_alerts = [
            mappings.create_alert_from_json(
                json_dict, NetworkSystem.NAV
            )  # TODO: interpret network system type from alerts' source IP?
            for json_dict in request.data
        ]

That's just silly. Once #48 is implemented, the API endpoint that receives events from a client can identify the AlertSource from the API token used.

Implement filtering on tags in Filter model

A couple ideas:

The filter strings could take inspiration from Django's query syntax with double underscores, e.g.:
- "tags__key": ["nav_problem", "zabbix_problem"]
- "tags__value": ["Netbox 4", "boxDown"]
  No double underscores would filter on both key AND value:
- "tags": ["object=Netbox 4", "problem_type=boxDown"]
  And we could potentially also add support for e.g. regex:
- "tags__value__regex": ["Netbox [0-9]+", "^https://argus\..*"]

The filter strings could consist of a dictionary, e.g.:

"tags": [
    {
        "key": "nav_problem"
    },
    {
        "key": "zabbix_problem"
    },
]

"tags": [
    {
        "key": "object",
        "value": "Netbox 4"
    },
    {
        "value": "boxDown"
    },
]

Or a string instead of a dictionary, which would filter on both key AND value:

"tags": ["object=Netbox 4", "problem_type=boxDown"]

Both of these ideas could benefit from being more fleshed out.
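As one way of fleshing out the first idea, here is a small matcher for the double-underscore lookups. It is a sketch with assumed semantics that the issue does not settle: the values within one lookup are OR-ed, the different lookups are AND-ed, and the function name is invented.

```python
import re

# One check per supported lookup; each receives a tag's key and value
# plus one wanted string from the filter.
CHECKS = {
    "tags": lambda key, value, wanted: f"{key}={value}" == wanted,
    "tags__key": lambda key, value, wanted: key == wanted,
    "tags__value": lambda key, value, wanted: value == wanted,
    "tags__value__regex": lambda key, value, wanted: re.search(wanted, value) is not None,
}


def tags_match(tags, filter_spec):
    """Return True if the (key, value) tags satisfy every lookup in filter_spec.

    Assumed semantics: a lookup is satisfied when at least one tag matches
    at least one of its listed strings.
    """
    for lookup, wanted_list in filter_spec.items():
        check = CHECKS[lookup]  # unknown lookups raise KeyError
        if not any(check(k, v, w) for k, v in tags for w in wanted_list):
            return False
    return True
```

For an incident tagged object=Netbox 4 and problem_type=boxDown, the filter {"tags__value__regex": ["Netbox [0-9]+"]} would match, while {"tags": ["object=Netbox 5"]} would not.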

Implement a model and API for events

A "start event" would create/open a new incident, an "end event" would close the incident, and other events would simply provide a log of things connected to the incident. The latter would possibly include acknowledgements (see #72), which could automatically create an event that they're connected to.

The Event model should probably include these fields:

A relation to an Incident
A relation to the SourceSystem the event came from
A timestamp
An enum for the type of event (incident_start/incident_end/close/acknowledge/other)
A description of the event
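The field list above could be sketched like this in plain Python, together with a helper showing how an incident's end time might be derived from its events. The names are assumptions, and treating both incident_end and close as closing events is one interpretation of the enum.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, Optional


class EventType(Enum):
    INCIDENT_START = "incident_start"
    INCIDENT_END = "incident_end"
    CLOSE = "close"
    ACKNOWLEDGE = "acknowledge"
    OTHER = "other"


@dataclass
class Event:
    incident_id: int      # relation to an Incident
    source_system: str    # the SourceSystem the event came from
    timestamp: datetime
    type: EventType
    description: str = ""


CLOSING_TYPES = (EventType.INCIDENT_END, EventType.CLOSE)


def end_time_from_events(events: Iterable[Event]) -> Optional[datetime]:
    """Return the timestamp of the earliest closing event, or None if open."""
    closing = [e.timestamp for e in events if e.type in CLOSING_TYPES]
    return min(closing) if closing else None
```

Acknowledgements would show up in the same log as events of type acknowledge without affecting the end time.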

Revise data model for short text fields

During my attempt at bulk loading some alerts into AAS for demo purposes, I received a crash error indicating that some field (unclear which one) only accepts 100 characters. These kinds of restrictions are kind of silly, since it is rare to know in advance what the real world limits on text strings are, if any. This is why we have strived to remove most of these restrictions from NAV over the years.

Here is the traceback to aid in potential debugging (no test data available, as multiple alerts were attempted in the same request, and they consist mostly of confidential internal data):

DataError at /api/v1/alerts/
value too long for type character varying(100)


Request Method: POST
Request URL: https://REDACTED.paas2.uninett.no/api/v1/alerts/
Django Version: 2.2.13
Python Executable: /usr/local/bin/python
Python Version: 3.8.3
Python Path: ['/app/aas', '/usr/local/bin', '/app', '/app/aas/src', '/usr/local/lib/python38.zip', '/usr/local/lib/python3.8', '/usr/local/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/site-packages', '/app/aas/aas/src']
Server time: Thu, 11 Jun 2020 09:05:59 +0200
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'corsheaders',
 'social_django',
 'rest_framework',
 'rest_framework.authtoken',
 'aas.auth',
 'aas.alert',
 'aas.notificationprofile']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'whitenoise.middleware.WhiteNoiseMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'corsheaders.middleware.CorsMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'social_django.middleware.SocialAuthExceptionMiddleware']


Traceback:

File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py" in get_or_create
  538.             return self.get(**kwargs), False

File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py" in get
  406.             raise self.model.DoesNotExist(

During handling of the above exception (Object matching query does not exist.), another exception occurred:

File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py" in _execute
  84.                 return self.cursor.execute(sql, params)

The above exception (value too long for type character varying(100)
) was the direct cause of the following exception:

File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py" in inner
  34.             response = get_response(request)

File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py" in _get_response
  115.                 response = self.process_exception_by_middleware(e, request)

File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py" in _get_response
  113.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/lib/python3.8/site-packages/django/views/decorators/csrf.py" in wrapped_view
  54.         return view_func(*args, **kwargs)

File "/usr/local/lib/python3.8/site-packages/django/views/generic/base.py" in view
  71.             return self.dispatch(request, *args, **kwargs)

File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py" in dispatch
  505.             response = self.handle_exception(exc)

File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py" in handle_exception
  465.             self.raise_uncaught_exception(exc)

File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py" in raise_uncaught_exception
  476.         raise exc

File "/usr/local/lib/python3.8/site-packages/rest_framework/views.py" in dispatch
  502.             response = handler(request, *args, **kwargs)

File "/app/aas/src/aas/alert/views.py" in post
  38.         created_alerts = [

File "/app/aas/src/aas/alert/views.py" in <listcomp>
  39.             mappings.create_alert_from_json(

File "/app/aas/src/aas/alert/mappings.py" in create_alert_from_json
  261.     return mapping.create_model_obj_from_json(json_dict)

File "/app/aas/src/aas/alert/mappings.py" in create_model_obj_from_json
  153.         alert_kwargs = {

File "/app/aas/src/aas/alert/mappings.py" in <dictcomp>
  154.             field_name: field_value_getter.get_value_from_dict(json_dict)

File "/app/aas/src/aas/alert/mappings.py" in get_value_from_dict
  120.         foreign_model_obj, _created = self.foreign_model.objects.get_or_create(

File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py" in manager_method
  82.                 return getattr(self.get_queryset(), name)(*args, **kwargs)

File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py" in get_or_create
  541.             return self._create_object_from_params(kwargs, params)

File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py" in _create_object_from_params
  575.                 obj = self.create(**params)

File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py" in create
  422.         obj.save(force_insert=True, using=self.db)

File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py" in save
  740.         self.save_base(using=using, force_insert=force_insert,

File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py" in save_base
  777.             updated = self._save_table(

File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py" in _save_table
  870.             result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)

File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py" in _do_insert
  907.         return manager._insert([self], fields=fields, return_id=update_pk,

File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py" in manager_method
  82.                 return getattr(self.get_queryset(), name)(*args, **kwargs)

File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py" in _insert
  1186.         return query.get_compiler(using=using).execute_sql(return_id)

File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py" in execute_sql
  1375.                 cursor.execute(sql, params)

File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py" in execute
  99.             return super().execute(sql, params)

File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py" in execute
  67.         return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)

File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py" in _execute_with_wrappers
  76.         return executor(sql, params, many, context)

File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py" in _execute
  84.                 return self.cursor.execute(sql, params)

File "/usr/local/lib/python3.8/site-packages/django/db/utils.py" in __exit__
  89.                 raise dj_exc_value.with_traceback(traceback) from exc_value

File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py" in _execute
  84.                 return self.cursor.execute(sql, params)

Exception Type: DataError at /api/v1/alerts/
Exception Value: value too long for type character varying(100)

Rename "active"->"open" and "inactive"->"closed" (incidents)

"Closed" is in the ITIL terminology list, and "open" could be a fitting antonym.

Nice to have management commands: fake an incident in dev

Make a sender-independent API endpoint for inputting incidents and events

Depends on #49, #58, #52, #45.

This will make glue-services (#44) possible.

Change object hierarchy model to a tag based model

I'm opening this issue to further discuss this specific idea from today's design meeting.

The original purpose of Object (and ParentObject, for that matter) was to have some representation in Argus of objects that Alert Sources are referring to in the reported alerts - for purposes of:

- Being able to correlate objects that are represented in multiple sources.
- Being able to link back to the source system's UI for more information about an object.

However, hierarchies can be fickle things, and Argus can only model what has been part of a received alert. It is also unclear how the hierarchies can be used when constructing Argus filters (one cannot even know in advance whether the hierarchies present when constructing a filter will still be valid at the time an alert comes in). It is also hard to construct a filter based on a hierarchy if there is no hierarchy information in Argus yet (because the source has not submitted any alerts yet).

One idea discussed at today's meeting is to use the age-old tag concept instead. Tags need not be hierarchical at all, and need not carry any specific meaning within Argus. Tags could be variable=value pairs, meaning the glue services can submit alerts with tag combinations that describe an alert in a manner that can be used in filters. E.g.

host=example-gw.example.org, interface=GigabitEthernet3/1, location=Trondheim
service=DNS, customer=example.org

In this case, filters could be constructed for almost any case, making Argus flexible enough to accept any kind of source system.
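
A minimal sketch of how such tags might be parsed and matched against a filter, assuming the comma-separated key=value syntax of the examples above and at most one value per key. The helper names `parse_tags` and `matches` are hypothetical and not part of Argus.

```python
def parse_tags(text):
    """Parse 'key=value, key=value' into a dict (hypothetical helper)."""
    tags = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"not a key=value tag: {part!r}")
        # Assumption: a later tag with the same key overwrites an earlier one.
        tags[key.strip()] = value.strip()
    return tags


def matches(filter_tags, alert_tags):
    """Return True if every tag the filter requires is present on the alert."""
    return all(alert_tags.get(key) == value for key, value in filter_tags.items())


alert = parse_tags(
    "host=example-gw.example.org, interface=GigabitEthernet3/1, location=Trondheim"
)
print(matches({"location": "Trondheim"}, alert))  # filter on a tag the alert has
print(matches({"service": "DNS"}, alert))         # filter on a tag the alert lacks
```

Because a filter is just a set of required pairs, it can be written before any source has submitted an alert, which is the case the hierarchy model struggles with.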

Make an AbstractYesNoFilter for the admin

A YesNoFilter basically checks if a value is null or not. Instances in the code: has_phone_number, is_stateful, is_active.
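
The shared logic could be as small as the sketch below, which turns a yes/no choice into Django-style `__isnull` lookup keyword arguments. The function name `yes_no_lookup` and the field name `phone_number` are assumptions for illustration; an actual admin filter would wrap this in a list-filter class.

```python
def yes_no_lookup(field_name, choice):
    """Translate a yes/no choice into ``<field>__isnull`` filter kwargs."""
    if choice == "yes":
        return {f"{field_name}__isnull": False}  # value is set
    if choice == "no":
        return {f"{field_name}__isnull": True}   # value is null
    return {}  # no choice made: do not filter at all


# Intended use (hypothetical): queryset.filter(**yes_no_lookup("phone_number", "yes"))
print(yes_no_lookup("phone_number", "yes"))
```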

NetworkSystem doesn't distinguish between multiple senders of the same `type` and `name`

Currently, all alerts of the same type are merged into the same stream, with no way to trace them back to the original sender. There can be multiple NAVs sending, multiple Zabbixes, etc. If there is ever an email-to-API translator (for cron, for instance), knowing which host sent the email is rather vital.

Add AlertSourceType model

AlertSource.type is currently a "choice field" with hardcoded choices. We should convert this into a foreign key to a new model - AlertSourceType, which enables users to register their own alert source types. This should be done in conjunction with adding glue services for incoming alerts (see #44), instead of having code in the backend for every alert source type, which is what the current design provides for (see mapping.py).

Allow end-users to set priority depending on a filter

Some problems are more important than others. If Very Important Switch has trouble, that needs a higher severity than Not Very Important Switch for the same kind of trouble. The converter inputting incidents cannot know the end-user's priorities, so there needs to be a way to mark some incidents as more important than others. This way, the next time something resembling that incident happens, it is automatically flagged correctly.
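
One way this could work, sketched under assumptions: incidents are described by key=value pairs, each user-defined rule pairs a set of required values with a priority, and the highest matching priority wins. `PriorityRule`, `priority_for`, the host names, and the convention that a larger number means more important are all hypothetical.

```python
from dataclasses import dataclass

DEFAULT_PRIORITY = 0  # assumed priority when no rule matches


@dataclass
class PriorityRule:
    """Hypothetical user rule: incidents matching `required` get `priority`."""
    required: dict
    priority: int


def priority_for(incident_tags, rules, default=DEFAULT_PRIORITY):
    """Return the highest priority among the rules that match the incident."""
    matching = [
        rule.priority
        for rule in rules
        if all(incident_tags.get(k) == v for k, v in rule.required.items())
    ]
    return max(matching, default=default)


rules = [PriorityRule({"host": "very-important-switch"}, priority=5)]
print(priority_for({"host": "very-important-switch", "event": "linkDown"}, rules))
print(priority_for({"host": "not-very-important-switch", "event": "linkDown"}, rules))
```

The rule is stored once, so the next incident resembling the first one is flagged without the converter needing to know anything about the end-user's priorities.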

Should users be able to set the end_time of an existing incident to a specific time in the past/future?

Use a proper 12 factor environment layer to set deployment-specific settings

There are many packages on PyPI for this. Needs:

- As few dependencies as possible
- Good license
- Bonus: can handle at least DATABASE settings and EMAIL_* settings as urls
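
To illustrate the "settings as urls" bonus, here is a standard-library-only sketch that turns a database URL from the environment into a Django-style `DATABASES` entry. The variable name `DATABASE_URL`, the helper `database_from_url`, the fallback URL, and the restriction to PostgreSQL are assumptions; the existing PyPI packages cover far more cases.

```python
import os
from urllib.parse import unquote, urlsplit

# Assumed mapping from URL scheme to Django database engine.
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgresql": "django.db.backends.postgresql",
}


def database_from_url(url):
    """Split a database URL into the keys Django expects in DATABASES."""
    parts = urlsplit(url)
    if parts.scheme not in ENGINES:
        raise ValueError(f"unsupported database scheme: {parts.scheme!r}")
    return {
        "ENGINE": ENGINES[parts.scheme],
        "NAME": unquote(parts.path.lstrip("/")),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# In settings.py: the deployment sets DATABASE_URL, the code stays unchanged.
DATABASES = {
    "default": database_from_url(
        os.environ.get("DATABASE_URL", "postgresql://localhost/argus")
    )
}
```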

	def post(self, request, *args, **kwargs):
	    created_alerts = [
	        mappings.create_alert_from_json(json_dict, NetworkSystem.NAV)
	        # TODO: interpret network system type from alerts' source IP?
	        for json_dict in request.data
	    ]