The audit from wiggin77

Logging

In order to not reinvent the wheel, I think we should consider Logstash and something like: https://godoc.org/github.com/cheshir/logrus-logstash-hook

Audit/log questions

General Observations:

would’ve been great to have a requirements/“why” section since there is no PM spec/link available. That would have helped clarify the rationale behind some of the items in the Objectives section. Are we doing a redo/v2 of Auditing because of identified perf issues, data completeness issues, spottiness, customer complaints, etc.?
having separate sections for auditing and logging parts would have helped (me) identify the commonalities and differences in requirements/solutions - are the objectives for logging the same as for auditing (in terms of reliability, quickness of access, time length availability of entries, etc.)
would’ve been great to have several design alternatives being presented, with their pros/cons. A lot of emphasis is on making the logging asynchronous. Is that to make the logging quicker, more reliable, reduce the load on the mid-tier, etc.? Queue integration/management introduces its own set of issues, would’ve been great to call out the pros/cons of more than one approach.
is one of the requirements to be able to store the audits in more than one place (file, email, DB table)? If yes, can it be achieved in some other way (e.g. store in the DB first and export to other output types/stores using some ETL job)? Is it a customer/compliance request to reconcile the different audit storages?
wrt events capture, one example of an alternate approach could’ve been to capture the high-frequency or low/system-level audit events in SQL (as part of the same transaction that does the actual action, since most of the audit fields would be available in the SQL transaction). Would that approach solve some of the concerns wrt correctness, immutability, etc. or would that introduce unacceptable performance issues? I’m not advocating one approach over another, just saying that evaluating/measuring alternatives with their pros/cons would help.
would’ve been great to gather some metrics to drive decisions if possible - like the amount of data we expect to accumulate hourly/daily per company size, frequency, burstiness, etc.
do we see the audit/log store as a silo or are we planning to integrate with other event-consuming apps on the customer side (like data analytics/graph apps, etc.)? In that case a producer-consumer kind of approach is prob. more suitable.

Audit event types:

having a more complete list of event types we plan to capture at this moment would help identify differences/commonalities, the level (system or user) + frequency of the action being captured - driving what fields we need to capture in the schema. A higher-level user action can get translated into several low-level events in the SQL - how comprehensive do we want to be?
do we plan to also allow capturing of client-only events (not sure if currently InvitePeople is a client-only action but we could prob. think of user actions that don't always translate into API calls)?
could these events eventually be queried through an API and/or processed and fed back into the MM app (for example # of posts in the last hour), to enhance/build new features on top (e.g. spotlight “hot” channels)? Having them sourced out of the audit/logging data would avoid putting extra load on the main user activity flows.
will we allow plugins to define custom events (describing specific actions they are interested to capture/audit), independently of our API-based set, that we would then store on their behalf?

API:

do we plan to have an API/build queries on top of the audit/log stores (for debugging or by eventid)? If yes, how would that impact the schema of the data being stored (like do we store the metadata as a json blob or do we have a “user->action->object” type of structure, etc.)? It would also help to monitor/build histograms of the most common log warnings/errors in a specific time-frame to detect/alert on regressions.

Operations on the audit entries:

do we plan to add PII scrubbing for user-identifiable data (email, name, etc.)?
data trimming - is there a timeframe for keeping the audit entries or do we plan data-trimming jobs to remove entries periodically?
do we plan to allow turning on/off auditing/logging by entity type (e.g. per team/channel, etc.)?
since you mention that “This means certain code paths will emit multiple audit records for the same event”, are we planning to do any coalescing of audit entries (by time interval or sessionId), to reduce storage space for example?

Schema:

Id- is that a UUID/GUID?
ObjectId - do we need a field capturing the id of the object on which the event occurred (Object.id of the team/channel, etc.), or is that captured by the Meta map?

Cloud:

would help to discuss any potential cloud implications of the current design (e.g. data partitioning, monitoring, etc.). Do we see cloud presence as taking the existing audit/log setup and moving it into a cloud env., or do we plan to make structural changes to make it cloud-first/native?

Some comments

Auditing will be done with a dedicated API but utilize the logging engine for storage.

That means auditing is not going to be affected by the logging engine's level configuration, right?

For example, when logging a struct asynchronously care must be taken to ensure the contents of the data being logged does not change while the log record is being created and formatted

Could you please provide an example of this case so I can understand better?
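
For what it's worth, my understanding of the case is something like the sketch below. The `Record` type and the `snapshot` and `format` helpers are hypothetical, and the queue is drained on the same goroutine so the result is deterministic; with a real background logging goroutine the shared pointer would additionally be a data race.

```go
package main

import "fmt"

// Record is a hypothetical audit record; the field names are illustrative only.
type Record struct {
	UserID string
	Meta   map[string]string
}

// snapshot returns a deep copy so that later changes made by the caller
// cannot leak into the queued record.
func snapshot(r *Record) *Record {
	meta := make(map[string]string, len(r.Meta))
	for k, v := range r.Meta {
		meta[k] = v
	}
	return &Record{UserID: r.UserID, Meta: meta}
}

// format stands in for the formatter that runs later on the logging side.
func format(r *Record) string {
	return fmt.Sprintf("user=%s status=%s", r.UserID, r.Meta["status"])
}

func main() {
	queue := make(chan *Record, 2)
	rec := &Record{UserID: "u1", Meta: map[string]string{"status": "active"}}

	queue <- rec           // unsafe: the queue shares the caller's struct
	queue <- snapshot(rec) // safe: the queue owns a private copy

	// The caller carries on and mutates the struct before the records are formatted.
	rec.Meta["status"] = "deleted"
	close(queue)

	for r := range queue {
		fmt.Println(format(r))
	}
	// Prints:
	// user=u1 status=deleted
	// user=u1 status=active
}
```

The first line shows the problem: the record was queued while the status was "active" but is logged as "deleted".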

what to do when the queue is full.

It'd also be good, if it is not on the list, to know how to react when we are not able to send to the audit/logging API, because discarding the messages doesn't seem like an option. I don't know if we're using message queues that let us persist the messages, but that could be a good idea.
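
A minimal sketch of what I mean, assuming a bounded in-memory queue with a fallback instead of a drop. The `Queue` type and its `spill` slice are hypothetical; the slice stands in for a durable store such as a local file, and it is not safe for concurrent use as written.

```go
package main

import "fmt"

// Queue is a bounded in-memory queue with a fallback for overflow.
// The spill slice is a stand-in for a durable store (an assumption, not part of the design).
type Queue struct {
	ch    chan string
	spill []string
}

// NewQueue creates a queue that holds up to size records in memory.
func NewQueue(size int) *Queue {
	return &Queue{ch: make(chan string, size)}
}

// Enqueue never blocks the caller and never silently drops a record.
// It reports whether the record went to the in-memory queue.
func (q *Queue) Enqueue(rec string) bool {
	select {
	case q.ch <- rec:
		return true
	default:
		// Queue is full: keep the record in the fallback store instead.
		q.spill = append(q.spill, rec)
		return false
	}
}

func main() {
	q := NewQueue(2)
	for _, r := range []string{"a", "b", "c"} {
		fmt.Println(r, q.Enqueue(r))
	}
	fmt.Println("spilled:", q.spill)
}
```

The same fallback path could cover the "cannot reach the audit/logging API" case: records are kept somewhere durable and replayed once the target is reachable again.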

Queue(s) will be monitored for percent full and periodically reported.

I don't know if you have it in mind, but it could be a good idea to add metrics for this as well.
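
As a rough idea of the value such a metric could expose, assuming the queue is a buffered Go channel (the `percentFull` helper is hypothetical):

```go
package main

import "fmt"

// percentFull reports how full a buffered channel is, as a whole-number percentage.
func percentFull(ch chan string) int {
	if cap(ch) == 0 {
		return 0 // unbuffered channel: nothing to measure
	}
	return len(ch) * 100 / cap(ch)
}

func main() {
	queue := make(chan string, 4)
	queue <- "rec1"
	queue <- "rec2"
	queue <- "rec3"
	fmt.Println(percentFull(queue)) // 75
}
```

The same number that is periodically written to the log could be published as a gauge, so alerts can fire before the queue actually fills up.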

Data Model

I've seen that the Meta field is map[string]string. Why not map[string]interface{}? Is it related to the data transformation pre-queue?
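
One thing I noticed while thinking about this, in case it is part of the reasoning: with map[string]interface{} the values do not survive a JSON round trip with their original types, and nested values can still be shared with the caller. A small sketch (the `roundTrip` helper is hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip encodes a metadata map to JSON and decodes it again,
// as would happen when a record is written out and read back.
func roundTrip(meta map[string]interface{}) (map[string]interface{}, error) {
	b, err := json.Marshal(meta)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	err = json.Unmarshal(b, &out)
	return out, err
}

func main() {
	out, err := roundTrip(map[string]interface{}{"count": 3})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", out["count"]) // float64, not int
}
```

With map[string]string the caller has to stringify values up front, but the map is then trivial to copy and encodes the same way for every output target.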
