Logging using ETW and EventSource

This project aims to provide a suite of tools for using .NET's EventSource to perform logging within applications. Along with systems for logging to a variety of destinations (memory, console, disk, network), it provides tools that wrap TraceEvent to streamline parsing ETW data from both files on disk and realtime sessions.

Additional documentation is available in the doc directory.

What's In the Box

The core library provides the following major types to facilitate reading and writing logs:

LogManager: The core type used for managing one or more log destinations. It can either be provided with XML-based configuration or controlled through sets of APIs to create, query, and tear down logging sessions. Available facilities include:
- Activity ID management (setting ETW activity IDs)
- Configuration management (providing either configuration strings or files with configuration contents).
- Session management (creating, querying, and destroying individual logging sessions).
ETWFileProcessor, ETWRealtimeProcessor: Provide facilities for reading event streams from either an existing file or a realtime listening session. All events are wrapped in ETWEvent objects, which wrap the objects presented by TraceEvent with a focus on making uniform access to an event's named parameters trivial, no matter the source of the ETW event.
ExpiringCompositeEventCollection: Provides a wrapper type for building complex views on top of many ETW events with a common key (e.g. the same ActivityID). It handles 'expiration' of multi-event data: the user-defined composite type declares when it is complete, and, even in the case of a lossy event stream, composites that never complete are expired after a given period.
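To make the expiration behavior concrete, here is a minimal, language-neutral sketch in Python of the idea behind ExpiringCompositeEventCollection. It is not the library's C# API. `RequestSummary` and `ExpiringComposites` are hypothetical names. The sketch assumes completion is signaled by a "Stop" event and that the expiry period is measured from the first event seen for a key.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RequestSummary:
    """Hypothetical composite built from atomic events that share one key."""
    events: List[str] = field(default_factory=list)

    def add(self, event_name: str) -> None:
        self.events.append(event_name)

    @property
    def is_complete(self) -> bool:
        # The composite type itself decides when it is done; here, once "Stop" arrives.
        return "Stop" in self.events


class ExpiringComposites:
    """Groups events by key, collects completed composites, and expires stale ones."""

    def __init__(self, timeout_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        # key -> (time the first event for this key was seen, composite)
        self._pending: Dict[str, Tuple[float, RequestSummary]] = {}
        self.completed: List[RequestSummary] = []
        self.expired: List[RequestSummary] = []

    def add(self, key: str, event_name: str) -> None:
        entry = self._pending.get(key)
        if entry is None:
            entry = (self.clock(), RequestSummary())
        first_seen, composite = entry
        composite.add(event_name)
        if composite.is_complete:
            self._pending.pop(key, None)
            self.completed.append(composite)
        else:
            self._pending[key] = (first_seen, composite)

    def expire(self) -> None:
        # Drop composites that never completed (e.g. a lossy stream lost their "Stop").
        now = self.clock()
        for key, (first_seen, composite) in list(self._pending.items()):
            if now - first_seen >= self.timeout_seconds:
                del self._pending[key]
                self.expired.append(composite)


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
collection = ExpiringComposites(timeout_seconds=30, clock=lambda: fake_now[0])
collection.add("activity-a", "Start")
collection.add("activity-a", "Stop")   # completes activity-a
collection.add("activity-b", "Start")  # activity-b never sees a "Stop"
fake_now[0] = 31.0
collection.expire()                    # activity-b is expired after 30 seconds
```

Injecting the clock keeps the expiry logic testable. A long-running consumer would call `expire()` periodically rather than once.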

Logging Philosophy

The following is excerpted from the original Bing-internal documentation for this library and explains why this approach was chosen for this logging implementation.

In many applications logging is treated as an infrequent activity composed of complex, monolithic log statements containing large amounts of disparate data. A typical example would be a statement like "Processed and responded to request <x> in time <y> with <z> byte response." This statement contains three pieces of crucial data, along with an array of metadata about what happened ("processed and responded", along with the actual meanings of x, y, and z).

However, this method of logging presents some challenges and opportunities for improvement. During application debugging it is often very helpful to have a timeline of detailed events. While it is possible to reverse engineer this timeline from a small set of monolithic statements, doing so can be difficult, and some information must either be preserved for the duration of the timeline or lost. For example, to report time <y> above you must keep a start time for the action. You may also wish to know which thread(s) touched a request, or other additional information. You must then store all this data and emit it in a fashion that allows you to reassemble the timeline of activity.

In contrast to the monolithic approach, it is possible to emit a small message for each very small and specific action and combine those actions into a summary or timeline of the activity. This is straightforward given a common key (for example, a GUID) shared by all actions belonging to an overall activity. It also provides a mechanism for "filtering" the timeline down to specific actions of interest (perhaps you do not care about the size of the reply, or perhaps you do not care about the duration of the activity). This library simplifies this through the ExpiringCompositeEventCollection type.
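The join-by-key idea can be sketched in a few lines of Python. This illustrates the approach only and is not this library's API. The `AtomicEvent` shape and the event names `RequestStart`, `ResponseWritten`, and `RequestStop` are hypothetical stand-ins for the request example above. The activity ID plays the role of the common GUID key.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class AtomicEvent(NamedTuple):
    activity_id: uuid.UUID
    timestamp_ms: int
    name: str
    value: Optional[int] = None  # e.g. response size in bytes


def build_timelines(events: Iterable[AtomicEvent],
                    names: Optional[Set[str]] = None) -> Dict[uuid.UUID, List[AtomicEvent]]:
    """Group events by activity ID, optionally keeping only the named actions."""
    timelines: Dict[uuid.UUID, List[AtomicEvent]] = defaultdict(list)
    for event in events:
        if names is None or event.name in names:
            timelines[event.activity_id].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e.timestamp_ms)
    return dict(timelines)


def duration_ms(timeline: List[AtomicEvent]) -> Optional[int]:
    """Derive the 'time <y>' value offline instead of tracking it in the application."""
    times = {e.name: e.timestamp_ms for e in timeline}
    if "RequestStart" in times and "RequestStop" in times:
        return times["RequestStop"] - times["RequestStart"]
    return None


request = uuid.UUID(int=1)
other = uuid.UUID(int=2)
events = [
    AtomicEvent(request, 100, "RequestStart"),
    AtomicEvent(other, 105, "RequestStart"),  # interleaved event from another activity
    AtomicEvent(request, 130, "ResponseWritten", 2048),
    AtomicEvent(request, 142, "RequestStop"),
]
timelines = build_timelines(events)
print(duration_ms(timelines[request]))  # 42
```

Passing `names={"RequestStart", "RequestStop"}` to `build_timelines` drops the response-size event. This corresponds to the "filtering" case where the size of the reply is not of interest.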

The drawback to writing atomic events has always been the cost of writing many events. This cost manifests both as a runtime cost (each event imposes an I/O penalty) and a size cost (it is usually more space-efficient to write a single monolithic event with one copy of all the pertinent data). However, on modern hardware the absolute cost becomes negligible. In particular, on consumer-grade hardware ETW has proven capable of writing over 500,000 distinct events per second with minimal performance penalty to the process performing the writing.

Given this, we are choosing to follow a model of many atomic events with offline processing to "join" them as needed. So while the typical logging practice has long been collect-and-write, we are encouraging users to split their activities into individual actions and write them as they occur.

ETW Primer

This MSDN Magazine article presents a nice in-depth overview of ETW. It is worth reading, though an abbreviated summary is provided here as well.

ETW (Event Tracing for Windows) is an eventing/tracing system composed of providers and trace sessions. Providers emit lightly schematized events composed of zero or more pieces of atomic data (strings, integers), and each event is marked up with metadata describing its severity and 'keywords' (in the form of distinct bits in a 64 bit integer) to help categorize events. Sessions subscribe to events from one or more providers, with per-provider filters on the aforementioned severity and keywords.

Providers

Every ETW provider is uniquely identified by a GUID. The GUID is how providers are subscribed to in tracing sessions, and must be registered with the system when the provider is available.

Within a provider one or more events may be emitted. These events are described by a manifest that covers both the provider as a whole and its individual events.

Individual event definitions consist of the information below. Some additional metadata has been left off this list as it is not commonly used in modern ETW scenarios.

- An ID from 1 to 65535 (unsigned 16-bit integer).
- The severity of the event (verbose, informational, warning, error, or critical).
- A 64-bit field in which each bit is a 'keyword' used for categorizing the event. An event may have zero or more keywords. The upper 16 bits are reserved for the operating system, leaving an effective per-provider space of 48 keywords.
- An optional 8-bit 'Opcode' value that can be used to categorize the event in the scope of a larger operation. For example, a provider may have two events for the same task, one indicating the beginning and one the end, with opcodes of 'start' and 'stop' respectively. Opcodes may also be user-defined.
- An optional 16-bit 'Task' value that maps a specific event to a certain task.
- An optional 8-bit 'Version' value used for coordinating revisions to the above metadata.
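The field ranges above can be captured in a small validating record. This Python sketch is illustrative only; EventSource itself declares this metadata in C# through attributes on event methods. The sketch uses the field widths of the native ETW EVENT_DESCRIPTOR structure, in which Task is a 16-bit value. It models severity with the standard ETW level numbers (1 = critical through 5 = verbose). The class and constant names are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import IntEnum


class Level(IntEnum):
    # Standard ETW level values; a lower number means a more severe event.
    CRITICAL = 1
    ERROR = 2
    WARNING = 3
    INFORMATIONAL = 4
    VERBOSE = 5


RESERVED_KEYWORDS = 0xFFFF << 48    # upper 16 bits belong to the operating system
PROVIDER_KEYWORDS = (1 << 48) - 1   # the 48 bits a provider may define


@dataclass(frozen=True)  # frozen: descriptor metadata must not change at runtime
class EventDescriptor:
    event_id: int
    level: Level
    keywords: int = 0
    opcode: int = 0
    task: int = 0
    version: int = 0

    def __post_init__(self) -> None:
        if not 1 <= self.event_id <= 0xFFFF:
            raise ValueError("event ID must be in 1-65535")
        Level(self.level)  # raises ValueError for an unknown level
        if not 0 <= self.keywords < (1 << 64):
            raise ValueError("keywords must fit in 64 bits")
        if self.keywords & RESERVED_KEYWORDS:
            raise ValueError("the upper 16 keyword bits are reserved")
        for name, bits in (("opcode", 8), ("task", 16), ("version", 8)):
            value = getattr(self, name)
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name} must fit in {bits} bits")


# opcode 2 is the standard 'stop' opcode (1 is 'start').
request_stop = EventDescriptor(event_id=2, level=Level.INFORMATIONAL,
                               keywords=0x1, opcode=2, task=1)

try:
    request_stop.level = Level.ERROR  # attempting to change metadata at runtime
except FrozenInstanceError:
    print("descriptor metadata is immutable")
```

Freezing the record mirrors the ETW contract that an event's declared metadata stays fixed for the provider's lifetime.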

An event has zero or more embedded pieces of data. ETW supports weak structuring using individual arguments with basic types (8- and 16-bit character strings, integers, floating point values, and byte segments). Complex or nested structures are not natively supported; in those cases either text (e.g. JSON) or binary (e.g. Bond) serialization is recommended.
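As a sketch of the text-serialization route, the Python below keeps primitive arguments as-is and turns any nested value into a single JSON string argument. The `to_event_args` helper is hypothetical. In a real EventSource, the resulting string would simply be passed as one of the event's string parameters.

```python
import json
from typing import Any, Tuple

# Argument types ETW can carry directly (strings, integers, floats, byte segments).
PRIMITIVE_TYPES = (str, int, float, bytes)


def to_event_args(*values: Any) -> Tuple[Any, ...]:
    """Pass primitives through; serialize nested structures to compact JSON text."""
    args = []
    for value in values:
        if isinstance(value, PRIMITIVE_TYPES):
            args.append(value)
        else:
            # Sorted keys make identical structures produce identical strings,
            # which keeps payloads easy to compare and index after the fact.
            args.append(json.dumps(value, sort_keys=True, separators=(",", ":")))
    return tuple(args)


args = to_event_args(42, {"tags": ["a", "b"], "route": "/search"})
print(args)  # (42, '{"route":"/search","tags":["a","b"]}')
```

A consumer recovers the structure with `json.loads` on that argument. The cost is that ETW's own parsers see only an opaque string.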

Finally, the ETW contract implies that you will not change the metadata above at runtime. That is, once an event has a particular set of keywords, severity, task, et cetera, these will not change for the duration of that provider's lifetime. This data is all encoded in the manifest, and ETW parsers will break if runtime changes are made that contradict the manifest's declarations.

Sessions

ETW trace sessions come in two broad flavors: realtime and file-backed. In addition sessions may either be kernel-mode or user-mode.

Realtime sessions use a hidden backing file as a circular buffer and expect somebody attached to the session to pick up the events as they are emitted. These are broadly useful for work such as sampling, performance measurement of apps, et cetera.

File-backed sessions use on-disk files to emit binary ETW data, which may be processed once the file has been closed. For user-mode file-backed sessions, only a single session per process may be open. Additionally, if the process exits, the session necessarily terminates. Kernel-mode sessions stay active until they are manually terminated or the operating system shuts down.

Within each session one or more providers may be subscribed to. For each subscribed provider, that provider's events can be filtered based on their severity and keywords. No other metadata (ID, opcode, et cetera) can be used for filtering. In particular, this means you cannot get "only events 1-10", "only events for task 57", or "only events where the first string argument is abcdefg."
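The following Python sketch models that per-provider check. It is a simplified illustration, not a reproduction of ETW internals, and `ProviderFilter` and `session_receives` are hypothetical names. It follows the common rule that an event passes when its level is at or below the session's level and its keywords intersect the session's "match any" mask, while events with no keywords always pass. Edge cases such as zero masks may be handled differently by different provider implementations. Note that the check never looks at the event ID, opcode, task, or payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderFilter:
    """Hypothetical per-provider settings of one session."""
    max_level: int               # 1 = critical ... 5 = verbose; 0 is treated as "all levels"
    match_any_keyword: int       # event needs at least one of these bits...
    match_all_keyword: int = 0   # ...and all of these bits


def session_receives(f: ProviderFilter, level: int, keywords: int) -> bool:
    """Only severity and keywords participate; ID, opcode, task and payload do not."""
    level_ok = f.max_level == 0 or level <= f.max_level
    keywords_ok = keywords == 0 or (
        (keywords & f.match_any_keyword) != 0
        and (keywords & f.match_all_keyword) == f.match_all_keyword
    )
    return level_ok and keywords_ok


# A session asking for warnings and worse, in keyword bits 0x2 or 0x4.
warnings_only = ProviderFilter(max_level=3, match_any_keyword=0x2 | 0x4)
print(session_receives(warnings_only, level=2, keywords=0x2))  # True
print(session_receives(warnings_only, level=4, keywords=0x2))  # False: too verbose
```

Because the check only sees level and keywords, a selection such as "only events 1-10" has to happen in the consumer after the events are collected.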

Reading ETL Logs

The LogTool utility provides code to build an executable called 'ELT' (formerly 'BLT' when this code was Bing-internal). This tool is meant both to demonstrate the various library facilities provided by the codebase and to present an easy-to-use command line interface for interacting with ETW sessions. It can be used to dump the contents of any ETW file complete with parsed arguments (e.g. events from the kernel, any EventSource provider, and any registered ETW provider). The data can be dumped in an easy-to-read text format or as JSON or XML.
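To illustrate what dumping one parsed event in each of the three formats involves, here is a small Python sketch. The event dictionary, the provider name `MyCompany-MyApp`, and the exact output layouts are hypothetical and do not reproduce ELT's actual output.

```python
import json
import xml.etree.ElementTree as ET


def render(event: dict, fmt: str) -> str:
    """Render one parsed event (hypothetical shape) as text, JSON, or XML."""
    if fmt == "text":
        args = " ".join(f"{key}={value}" for key, value in event["payload"].items())
        return f"{event['timestamp']} {event['provider']}/{event['name']} {args}"
    if fmt == "json":
        return json.dumps(event, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("Event", provider=event["provider"], name=event["name"],
                          timestamp=event["timestamp"])
        for key, value in event["payload"].items():
            # Each named argument becomes a child element.
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unknown format: {fmt}")


event = {
    "timestamp": "2016-01-01T00:00:00Z",
    "provider": "MyCompany-MyApp",
    "name": "RequestStop",
    "payload": {"requestId": 42, "durationMs": 17},
}
for fmt in ("text", "json", "xml"):
    print(render(event, fmt))
```

Uniform access to named parameters is what makes this kind of format switch cheap. Every renderer walks the same name/value pairs regardless of which provider emitted the event.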

You can also use the tool to stand up realtime listening sessions that emit to the console for quick run-time debugging.

Building and testing the code

The code is currently developed using Visual Studio 2015, and some C# 6 language features are used. Additionally, the code currently only works on Windows, as it makes extensive use of Event Tracing for Windows.

In order to run all unit tests Visual Studio must be started as an elevated process. If it is not run elevated some tests will exit as 'inconclusive' since they rely on the ability to create ETW sessions or HTTP listeners.

What API should I use to consume ETW?

Which of these APIs should I use to consume ETW?

- System.Diagnostics.Tracing.EventSource (BCL)
- Microsoft.Diagnostics.Tracing.EventSource (NuGet)
- Microsoft.Diagnostics.Tracing.TraceEvent (NuGet)
- krabsetw (NuGet)

And where does Microsoft.Diagnostics.Tracing.Logging fit in?

I read this at https://blogs.technet.microsoft.com/office365security/hidden-treasure-intrusion-detection-with-etw-part-2/:

TraceEvent is a library used by the PerfView tool and has the benefits of being a well-designed .NET API. Unfortunately, it doesn’t perform well for scenarios where we want to keep memory usage to a minimum. System.Diagnostics.Tracing has the advantage of being part of the .NET BCL but we’ve observed intermittent exceptions and unexpected behavior in the past. Additionally, it suffers from the same memory consumption issue that TraceEvent does.
In response to these challenges, Office 365 Security chose to implement our own API with three primary goals:
•Intuitive and flexible API
•High performance – filtering events in the native layer
•Available both in .NET and native C++
The result of this work is krabsetw, a library we’ve open-sourced under the MIT license. It contains both a native C++ API as well as a .NET API. This library is used in production today across Office 365 workloads on more than 100,000 machines. With filtering, we’re able to process more than more than 500 billion events per day, generating more than 7TB of data per day across the machines.

Is this still true? It was written a year ago (May 9, 2017), and the other APIs have been updated since.
I want to know which API I can depend on, if not now, then at least in upcoming updates.

Another question: do I also have to use these NuGet packages?
- Microsoft.Diagnostics.Tracing.TraceEvent.SupportFiles
- Microsoft.Diagnostics.Tracing.EventRegister

microsoft / microsoft.diagnostics.tracing.logging

microsoft.diagnostics.tracing.logging's Introduction

Logging using ETW and EventSource

What's In the Box

Logging Philosophy

ETW Primer

Providers

Sessions

Reading ETL Logs

Building and testing the code

microsoft.diagnostics.tracing.logging's People

Contributors

Stargazers

Watchers

Forkers

microsoft.diagnostics.tracing.logging's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs