autometrics-dev / autometrics-rs Goto Github PK

Easily add metrics to your code that actually help you spot and debug issues in production. Built on Prometheus and OpenTelemetry.

Home Page: https://autometrics.dev

License: Apache License 2.0

Rust 100.00%

metrics monitoring observability opentelemetry prometheus rust telemetry tracing

autometrics-rs's Introduction

Metrics are a powerful and cost-efficient tool for understanding the health and performance of your code in production. But it's hard to decide what metrics to track and even harder to write queries to understand the data.

Autometrics provides a macro that makes it trivial to instrument any function with the most useful metrics: request rate, error rate, and latency. It standardizes these metrics and then generates powerful Prometheus queries based on your function details to help you quickly identify and debug issues in production.

Benefits

✨ #[autometrics] macro adds useful metrics to any function or impl block, without you thinking about what metrics to collect
💡 Generates powerful Prometheus queries to help quickly identify and debug issues in production
🔗 Injects links to live Prometheus charts directly into each function's doc comments
📊 Grafana dashboards work without configuration to visualize the performance of functions & SLOs
🔍 Correlates your code's version with metrics to help identify commits that introduced errors or latency
📏 Standardizes metrics across services and teams to improve debugging
⚖️ Function-level metrics provide useful granularity without exploding cardinality
⚡ Minimal runtime overhead

Advanced Features

🚨 Define alerts using SLO best practices directly in your source code
📍 Attach exemplars automatically to connect metrics with traces
⚙️ Configurable metric collection library (opentelemetry, prometheus, prometheus-client or metrics)

See autometrics.dev for more details on the ideas behind autometrics.

Example + Demo

use autometrics::autometrics;

#[autometrics]
pub async fn create_user() {
  // Now this function produces metrics! 📈
}

Here is a demo of jumping from function docs to live Prometheus charts:

Autometrics.Demo.mp4

Quickstart

Add autometrics to your project:

cargo add autometrics --features=prometheus-exporter

Instrument your functions with the #[autometrics] macro
```
use autometrics::autometrics;

// Just add the autometrics annotation to your functions
#[autometrics]
pub async fn my_function() {
  // Now this function produces metrics!
}

struct MyStruct;

// You can also instrument whole impl blocks
#[autometrics]
impl MyStruct {
  pub fn my_method() {
    // This method produces metrics too!
  }
}
```
Tip: Adding autometrics to all functions using the tracing::instrument macro

You can use a search and replace to add autometrics to all functions instrumented with tracing::instrument.

Replace:
```
#[instrument]
```
With:
```
#[instrument]
#[autometrics]
```
And then let Rust Analyzer tell you which files you need to add use autometrics::autometrics at the top of.
Tip: Adding autometrics to all pub functions (not necessarily recommended 😅)

You can use a search and replace to add autometrics to all public functions. Yes, this is a bit nuts.

Use a regular expression search to replace:
```
(pub (?:async)? fn.*)
```
With:
```
#[autometrics]
$1
```
And then let Rust Analyzer tell you which files you need to add use autometrics::autometrics at the top of.
Export the metrics for Prometheus
For projects not currently using Prometheus metrics

Autometrics includes optional functions to help collect and prepare metrics to be collected by Prometheus.

In your main function, initialize the prometheus_exporter:
```
pub fn main() {
  prometheus_exporter::init();
  // ...
}
```
And create a route on your API (probably mounted under /metrics) that returns the following:
```
use autometrics::prometheus_exporter::{self, PrometheusResponse};

/// Export metrics for Prometheus to scrape
pub fn get_metrics() -> PrometheusResponse {
  prometheus_exporter::encode_http_response()
}
```
For projects already using custom Prometheus metrics

Configure autometrics to use the same underlying metrics library you use with the feature flag corresponding to the crate and version you are using.
```
[dependencies]
autometrics = {
  version = "*",
  features = ["prometheus-0_13"],
  default-features = false
}
```
The autometrics metrics will be produced alongside yours.

Note

You must ensure that you are using the exact same version of the library as autometrics. If not, the autometrics metrics will not appear in your exported metrics. This is because Cargo will include both versions of the crate and the global statics used for the metrics registry will be different.

You do not need to use the Prometheus exporter functions this library provides (you can leave out the prometheus-exporter feature flag) and you do not need a separate endpoint for autometrics' metrics.
Run Prometheus locally with the Autometrics CLI or configure it manually to scrape your metrics endpoint
(Optional) If you have Grafana, import the Autometrics dashboards for an overview and detailed view of the function metrics

API Docs

Examples

To see autometrics in action:

Install prometheus locally or download the Autometrics CLI which will install and configure Prometheus for you locally.
Run the complete example:
```
cargo run -p example-full-api
```
Hover over the function names to see the generated query links (like in the image above) and view the Prometheus charts

Benchmarks

Using each of the following metrics libraries, tracking metrics with the autometrics macro adds approximately:

prometheus-0_13: 140-150 nanoseconds
prometheus-client-0_21: 150-250 nanoseconds
metrics-0_21: 550-650 nanoseconds
opentelemetry-0_20: 1700-2100 nanoseconds

These were calculated on a 2021 MacBook Pro with the M1 Max chip and 64 GB of RAM.

To run the benchmarks yourself, run the following command, replacing BACKEND with the metrics library of your choice:

cargo bench --features prometheus-exporter,BACKEND

Contributing

Issues, feature suggestions, and pull requests are very welcome!

If you are interested in getting involved:

Join the conversation on Discord
Ask questions and share ideas in the Github Discussions
Take a look at the overall Autometrics Project Roadmap

autometrics-rs's People

Contributors

Stargazers

Watchers

autometrics-rs's Issues

`#[autometrics]` does not work well with `#[async_trait]`

When using #[autometrics] to instrument implementations of traits marked with #[async_trait], depending on where you place the macro, either the recorded duration is incorrect or it does not compile.

#[async_trait] is widely used and it would be helpful if it was at least documented that they can't work together.

Some examples...

main

  #[tokio::main]                                                                                                                                                          
  pub async fn main() {                                                                                                                                                   
      prometheus_exporter::init();                                                                                                                                        
                                                                                                                                                                          
      let app = Router::new()                                                                                                                                             
          .route(                                                                                                                                                         
              "/async",                                                                                                                                                   
             get(|| async { <AsyncServiceImpl as AsyncService>::async_function().await }),                                                                               
          )                                                                                                                                                               
          .route(                                                                                                                                                         
              "/metrics",                                                                                                                                                 
              get(|| async { prometheus_exporter::encode_http_response() }),                                                                                              
          );                                                                                                                                                              
      Server::bind(&([127, 0, 0, 1], 8080).into())                                                                                                                        
          .serve(app.into_make_service())                                                                                                                                 
          .await                                                                                                                                                          
          .unwrap();                                                                                                                                                      
  }

1. `#[autometrics]` on `fn`

  #[async_trait::async_trait]                                                                                                                                             
  impl AsyncService for AsyncServiceImpl {                                                                                                                                
      #[autometrics]                                                                                                                                                      
      async fn async_function() -> Result<(), ()> {                                                                                                                       
          tokio::time::sleep(Duration::from_secs(3)).await;                                                                                                               
          Ok(())                                                                                                                                                          
      }                                                                                                                                                                   
  }

This results in the wrong duration reported

function_calls_duration_seconds_sum{function="async_function",module="scratch",objective_latency_threshold="",objective_name="",objective_percentile="",service_name="autometrics"} 0.000000571

2. `#[autometrics]` on `impl` block after `async_trait`

  #[async_trait::async_trait]                                                                                                                                             
  #[autometrics]                                                                                                                                                          
  impl AsyncService for AsyncServiceImpl {                                                                                                                                
      async fn async_function() -> Result<(), ()> {                                                                                                                       
          tokio::time::sleep(Duration::from_secs(3)).await;                                                                                                               
          Ok(())                                                                                                                                                          
      }                                                                                                                                                                   
  }

This also results in the incorrect duration

function_calls_duration_seconds_sum{function="AsyncServiceImpl::async_function",module="scratch",objective_latency_threshold="",objective_name="",objective_percentile="",service_name="autometrics"} 0.000000671

3. `#[autometrics]` on `impl` block before `async_trait`

  #[autometrics]                                                                                                                                                          
  #[async_trait::async_trait]                                                                                                                                             
  impl AsyncService for AsyncServiceImpl {                                                                                                                                
      async fn async_function() -> Result<(), ()> {                                                                                                                       
          tokio::time::sleep(Duration::from_secs(3)).await;                                                                                                               
          Ok(())                                                                                                                                                          
      }                                                                                                                                                                   
  }

This fails to compile with error:

error: expected `fn`
  --> src/main.rs:25:1
   |
25 | impl AsyncService for AsyncServiceImpl {
   | ^^^^

returning `Result<impl Trait>` from a `#[autometrics]`'d function results in compile error

if one has a function that looks like this:

#[autometrics]
async fn hello() -> Result<impl ToString, std::io::Error> {
    // ...
}

the compiler will fail to compile it with:

error[E0562]: `impl Trait` only allowed in function and inherent method return types, not in variable bindings
  --> src/main.rs:11:28
   |
11 | async fn hello() -> Result<impl ToString, std::io::Error> {
   |                            ^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0562`.
error: could not compile `playground` (bin "playground") due to previous error

removing the #[autometrics] macro ends up making the program compile successfully.

playgrounds:

smallest example: https://www.rustexplorer.com/b/52t7cy
with actix-web: https://www.rustexplorer.com/b/4vsrgc

Support `AUTOMETRICS_DOCS_GEN=0` environment variable

The purpose of this environment variable is to disable documentation generation by the macro. The VS Code extension would set this environment variable automatically in the Rust Analyzer settings, so that it becomes solely responsible for generating documentation tooltips.

This is a blocker for autometrics-dev/vscode-autometrics#27

Feel free to suggest a better name still 😅

Feature flag to use the `metrics` crate for creating metrics

Currently, we use the opentelemetry to export metrics. If people are already using the metrics crate, it would be useful to have a feature flag that changes the macro behavior to insert function-level metrics using that crate instead.

Simplify description in readme

Autometrics is a simple library that gives a lot of power. However, the docs currently have a lot of details in them, which makes it feel more complicated. We should shorten the description in the readme to make it seem simpler.

Support aliases for function & module names (to handle refactors)

If you refactor your code and change the names of functions or modules, you might want to keep some continuity with the metric names.

@akesling suggested allowing users to specify an alias for each function such that the metrics would be produced with the new and the old function/module names. That way, you could keep using the old name and then transition to the new name later and remove the alias once you've moved everything over.

I like the idea of having a parameter for the macro along the lines of function_alias="other_function_name" or function_aliases=["function_1", "function_2"] and the equivalent for module_alias and module_aliases. We could then duplicate the tracking code so it produces the metrics under all of the given names.

Should we switch the `module` label to use the file path?

Currently, the value of the module label is the result of calling module_path!(), which returns the crate::module::submodule syntax.

If we wanted to standardize the value of this label across implementations (autometrics-dev/discussions/5), we could switch this library to use the file!() macro and strip off the env!("CARGO_MANIFEST_DIR") prefix to get the path from the crate root.

Add support for the `prometheus-client` crate

https://crates.io/crates/prometheus-client

[bug] too many open files

When I start Autometrics CLI I get an error

cmd

# am start http://localhost:8902/prom_metrics

error log

Checking if provided metrics endpoints work...
Failed to make request to http://localhost:8902/prom_metrics (job am_0)
Now sampling the following endpoints for metrics: http://localhost:8902/prom_metrics
Explorer endpoint: http://127.0.0.1:6789
Prometheus endpoint: http://127.0.0.1:9090/prometheus
Using Prometheus version: 2.45.0
Starting prometheus
Error proxying request: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9090), path: "/prometheus/api/v1/query_range", query: Some("query=sum+by+%28alertname%2C+objective_name%2C+severity%2C+category%2C+objective_percentile%2C+sloth_id%29+%28ALERTS%29&start=2023-10-18T02%3A53%3A47.240Z&end=2023-10-18T02%3A54%3A18.242Z&step=1s"), fragment: None }, source: hyper::Error(Connect, ConnectError("tcp open error", Os { code: 24, kind: Uncategorized, message: "Too many open files" })) }
Error proxying request: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9090), path: "/prometheus/api/v1/query_range", query: Some("query=sum+by+%28alertname%2C+objective_name%2C+severity%2C+category%2C+objective_percentile%2C+sloth_id%29+%28ALERTS%29&start=2023-10-18T02%3A54%3A18.242Z&end=2023-10-18T02%3A54%3A49.238Z&step=1s"), fragment: None }, source: hyper::Error(Connect, ConnectError("tcp open error", Os { code: 24, kind: Uncategorized, message: "Too many open files" })) }

README Tweaks

The "For projects already using custom Prometheus metrics" section could use a few tweaks, noting them here not to forget.

The toml snippet to add autometrics doesn't work. "Inline tables" need to be on a single line, not multiple ones, at least my version of Cargo didn't want to accept the current form as is. we should rather have a single line version

[dependencies]
autometrics = { version = "*", features = ["prometheus-0_13"], default-features = false }

The "Configure autometrics" link goes to https://docs.rs/autometrics/latest/autometrics/#metrics-libraries and as of today the anchor does not exist on the docs.rs page so it only jumps to the top of the documentation which makes the link less helpful.

Make examples start and manage dockerized Prometheus if Prometheus is missing

Oh I see, so the point of this PR is to automatically start Prometheus in a Docker container if it's missing, instead of having to have Prometheus installed on the computer.

The title of the PR made me think the goal was to connect to something that has been running for months on the machine already, with configured targets, that's why I was confused.

If that's the case, then a simple solution would be to write an extra /util/prometheus.docker.internal.yml config file, that configures scraping targets to point to host.docker.internal instead of localhost, and tell MacOS/Windows users to use this file instead of the current one (I haven't tested it yet, but I'm fairly confident it would work)

A (more complicated) solution could be to:

catch when the prometheus binary is missing,

if that's the case, directly use std::process::Command to spawn the container with the command,

use the Drop implementation of the ChildGuard to kill the spawned command (which will kill the Prometheus container)

The more complex solution still needs the simple version as a stepping stone to work, i.e. using a if cfg! statement to choose whether to use util/prometheus.yml or util/prometheus.docker.yml as the volume mount input argument basically. But if we're able to pull that off, it would keep the examples self contained (so no need to read the docs and find and copy/paste the docker run command), while also working on all major OSes even if you don't have Prometheus installed. What do you think?

If you don't want to commit more time to this though it's fine, we can put in the docs that it'll only work on Linux, and I'll patch the missing bits for different OSes.

(I also noticed that the util/prometheus.yml file also references the autometrics recording and alerting rules files, which allows to see how the instrumented code generates alerts (or not); but this file doesn't exist in this repo anymore so we'll tackle this in another issue)

Originally posted by @gagbo in #114 (comment)

This also means we need to figure out the ports for the example apps, and fix the autometrics rules file

Add a way to configure global settings

@akesling brought up that you might want to configure autometrics settings via a config file or flags, rather than only environment variables.

If we wanted to support this, I can imagine two potential APIs for doing this:

Export an init or SettingsBuilder from the autometrics crate root. You would be able to set these settings only once and they would initialize a OnceCell that would be used by all calls to the #[autometrics] macro.
Have an autometrics::settings module with setter methods for each setting we support. This would store the settings in a RwLock. Theoretically, you could set them at any time but it would be strongly recommended to set them before any metrics are actually collected.

So far, the use cases I can think of for these settings would be:

Setting the service.name label, without relying on environment variables as @akesling suggested in #116 (comment)
Adding other labels to all metrics produced (we could potentially limit them to pairs of &'static strs if we want to guarantee that the feature is not used for anything that could be very high cardinality)
Configuring the histogram buckets. Currently you cannot set the buckets when you're using the prometheus or prometheus-client crates, because they are set when the counters are initialized rather than on the exporter like for opentelemetry and metrics. This might be a minor issue but as a result, you cannot use the custom-objective-latencies feature flag with the prometheus or prometheus-client crates. If we had an API for setting the buckets, you would be able to use these feature flags together.
We could potentially allow users to control the functionality related to initializing the call counters to zero #119. We could keep the default behavior such that it's run on debug builds but you could also disable that if you want, or you could run it in release builds as well.
Configuring the metrics registry #20
Maybe others?

Note that this would not be able to configure the PROMETHEUS_URL used to generate the links inserted into the docs, because that needs to be available at macro expansion time.

Incorporate `tokio-metrics`.

Feature Request: Incorporate tokio-metrics into the exported metrics.

Specifically exposing the RuntimeMetrics automatically seems like low hanging fruit that most people who are monitoring their stack would like.

This is done in other runtimes already with most of go prometheus exporters doing this and with python exporters at least exporting the memory usage.

ability to define a trait for what 'ok' or 'err' means

building on this ticket #32

not having to decorate each function with this meaning and defining it once as a trait would be nice

Support applying autometrics to an `impl` block

In addition to adding the #[autometrics] annotation to a single function, we can enable adding it to a whole impl block. This would make it even easier to apply to a group of functions like HTTP or database handlers.

The macro just needs to be extended to take an ItemImpl and iterate through the methods applying the original macro.

Add `error_if` and `ok_if` parameters

Some functions return a type that is not a Result but does contain some notion of whether it was the "ok" or "error" outcome.

We should have parameters that let you call a function to determine whether the outcome was ok or error if you aren't returning a Result.

We might want to add special handling for HTTP responses. Though that might open a big can of worms.

`Label` derive macro

Right now, autometrics adds the return type as a label if the concrete types in a Result<T, E> implement Into<&'static str>.

It would be better if Autometrics had its own Label derive macro that you would use with your enums. That would make it more explicit that you're opting into making that a label.

One thing to consider: if you have a Result type, it currently uses the label ok="variant_name" or error="variant_name". If you wanted to include a label from a function parameter instead of the return value, we'd probably want the label to be something like type_name="variant_name". If we do that, should we change the behavior for Results so you have the same type_name="variant_name" label or is it helpful to have a standard error="variant_name label?

Handling singint for telemetry http endpoint in a Tonic gRPC service?

Hi,

thanks for this great project, it really makes telemetry a lot easier.

I'm porting currently a bunch of Golang services to Tokio/Rust while learning the Tokio/Tonic framework. I tried to add telemetry to one of my gRPC services, but bumped into an issue with handling sigint. Basically, without sigint, everything works just fine and I get the metrics straight into the dashboard. To make the service ready for Kubernetes deployment, a health and sigint handler is required so that k8s can manage the service correctly.

The problem arises, when I add a signint handler to the tonic gRPC service, it is not clear to me how to add the same signal channel to the web server that exposes the metric.

Again, I am still learning all this stuff and might got a few things wrong.

However, I made a fully self-contained example based on one of the online tutorials that illustrates the point.

Full code: https://github.com/marvin-hansen/tonic-autometrics/blob/main/src/main.rs

The relevant section.

    // Construct sigint signal handler for graceful shutdown
    let (signal_tx, signal_rx) = signal_channel();
    spawn(handle_sigterm(signal_tx));

    // Build gRPC server with health service and signal sigint handler
    let server = TonicServer::builder()
        .add_service(svc)
        .add_service(health_svc)
        .serve_with_shutdown(grpc_addr, async {
            signal_rx.await.ok();
        });

    // Start gRPC servedr
    // This one probably blocks the subsequent start of the web server. How do I start them either in Tandem?
    println!("Server listening on {}", grpc_addr);
    server
        .await
        .expect("Failed to start server");

    // Http handler that exposes metrics to Prometheus
    let app = Router::new().route("/", get(handler)).route(
        "/metrics",
        get(|| async { prometheus_exporter::encode_http_response() }),
    );

    // Web server with Axum
    // How do I add a graceful shutdown signal handler
    // that triggers a proper shutdown together with the gRPC server?
    axum::Server::bind(&web_addr)
        .serve(app.into_make_service())
        .await
        .expect("Web server failed");

Apparently, the Axum API doesn't have anything similar to the serve_with_shutdown method, but instead
requires to construct a TCP listener that runs in an infinite loop waiting for a sigint signal.

This seems a bit complicated to me and I just cannot figure out how to make the http server receive the same
sigjnit as the gRPC server.

Repo with code: https://github.com/marvin-hansen/tonic-autometrics/blob/main/src/main.rs

Any help is appreciated on this one.

Feature flags for backend versions

Before we could declare a 1.0 of this library, we probably need to make the feature flags related to specific backends include those crates' versions. All of them are 0.X so we can add feature flags like prometheus-0_13. This way, we can keep semver compatibility even if one of the dependencies introduces a new version.

Enable `Objective` details to be loaded from environment variables

A number of people have described use cases where the details of an SLO may be different for different instances of the same service. For example, a service that has specific instances for each geographical region, where each has its own SLO.

The main thing we would need to change in order to support this would be to make the Objective details use Strings instead of &'static strs and the methods would need to be non-const (this would be a breaking change). Then, you'd be able to define an objective like this:

use std::{env, cell::LazyCell};

static OBJECTIVE: LazyCell<Objective> = LazyCell::new(|| {
    Objective::new(env::var("OBJECTIVE_NAME").unwrap())
        .success_rate(env::var("OBJECTIVE_PERCENTILE").unwrap())
});

We could potentially add helper functions for loading the values from environment variables, or we could just show how to do this in the docs.

Opentelemetry Prometheus & Autometrics Compatibility

Hello! First, I'm a big fan of y'alls work. Thank you again for streaming your talk at Rust NYC @emschwartz 🙏

I am doing a 'zero-to-prod' style conference talk next month on Rust, with an emphasis on observability. I'd like to showcase autometrics, and am able to get autometrics working when I implement AM alone using the the AM prometheus exporter & metrics endpoint handler (i.e. follow the quickstart guide).

However, I also want to show off opentelemetry-prometheus metrics, and expose both metrics together. Is this a supported use case? If so, any pointers as to what I'm doing wrong? The current behavior is that only the OTEL-prom metrics are being exposed, not the autometrics metrics.

Initializing the otel-prom exporter:
https://github.com/ShockleyJE/zero-to-rust-at-neurelo/blob/main/data-plane/authn_service/src/main.rs#L37

Instrumenting one of the functions for autometrics:
https://github.com/ShockleyJE/zero-to-rust-at-neurelo/blob/main/data-plane/authn_service/src/authn/postgres_token_repo.rs#L22

Expectation:

Annotate a function store_token on the token creation code path
Create a token via the API

curl --location 'localhost:8000/' \
--header 'Content-Type: application/json' \
--data '{
    "environment" : "howderino"
}'

View metrics

curl --location 'localhost:9464/metrics'

I would expect to the function-level metrics for store_token, but only see the otel-prom metrics

# HELP http_server_active_requests HTTP concurrent in-flight requests per route
# TYPE http_server_active_requests gauge
http_server_active_requests{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_target="/",net_host_port="8000",service_name="unknown_service"} 0
# HELP http_server_duration HTTP inbound request duration per route
# TYPE http_server_duration histogram
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="1"} 0
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="2"} 0
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="5"} 0
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="10"} 0
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="20"} 0
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="50"} 1
http_server_duration_bucket{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service",le="+Inf"} 1
http_server_duration_sum{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service"} 29.957
http_server_duration_count{http_flavor="HTTP/1.1",http_host="localhost:8000",http_method="POST",http_scheme="http",http_server_name="0.0.0.0:8000",http_status_code="200",http_target="/",net_host_port="8000",service_name="unknown_service"} 1

Trying to implement push OTLP exporter

Hello team,

My colleagues and I are trying to export autometrics macro-generated metrics to an OTLP collector by implementing a push exporter based on the opentelemetry-rust SDK.

Unfortunately, our attempts were unsuccessful 😅. We hope that with your help, we can achieve our goal and perhaps contribute to the implementation of the idea discussed in Push Vs Pull #34.

From our understanding, when importing the autometrics crate with default features, the macro defaults to your OpenTelemetry tracker. According to the code and issue Make the metrics registry configurable #20, it calls opentelemetry_api::global::meter("") to declare and register every metric.
We then implemented a metric push controller from the opentelemetry-rust SDK, as shown in this example; The method set_meter_provider is called inside the build method of the opentelemetry_otlp::new_pipeline() constructor.

We were able to successfully push custom metrics to the OTLP collector, but we couldn't see the autometrics macro-generated ones!

We have published a basic example of our efforts, along with instructions on how to replicate our tests. We hope this will be useful for you or anyone else interested in implementing this feature to understand what we are trying to do.

Perhaps we are missing something from your or the OpenTelemetry SDK implementation...

Thank you for your time. If you decide to spend time on this, we are at your disposal for any clarification or further testing.

Initialize all metrics to zero

@mies made a good point that it would be useful to have all the metrics start out at zero so you'd see them appear in external systems pulling from Prometheus even before the app has seen any traffic

Provide Grafana dashboard

Exemplars are not produced with 'tracing-opentelemetry:0.21.0'

In an experimentation, I tried to produced exemplars and were not able to produce them. Looking at the example in the repository, it appears that I had tracing-opentelemetry:0.21.0, but the example used tracing-opentelemetry:0.20.0.

This issue is easy to reproduce.

Go into autometrics-rs/examples/exemplars-tracing-opentelemetry
cargo run --release
curl --silent localhost:3000/metrics | grep 'trace_id' should produce some entries
Then cargo add [email protected]
cargo run --release
curl --silent localhost:3000/metrics | grep 'trace_id' does not produce entries anymore

By the way, thank you for this wonderful piece of software that autometrics-rs is.

Refactor query generation logic into its own crate

@hatchan suggested refactoring out the logic for building the Prometheus queries into a separate crate so it can potentially be used outside of the macro.

Maybe we call it something like autometrics-query or autometrics-queries?

add some way to skip a function inside of a `#[autometrics]`'d `impl` block

while it is really convenient to just #[autometrics] a whole impl block, it is not always really all blocks one wants to audit. one example in this repo illustrates it very well:

#[autometrics]
impl Database {
    pub fn new() -> Self {
        Self
    }

    pub async fn load_details(&self) -> Result<(), ApiError> {
        // [...]
    }
}

here the new() function is very small and the compiler could even inline it, but by adding the #[autometrics] macro on top of it it'll add the metrics stuff and thus may even disqualify it from being inline'd and thus be a performance penalty.

something like this would be cool:

#[autometrics]
impl Database {
    #[skip_autometrics]
    pub fn new() -> Self {
        Self
    }

    pub async fn load_details(&self) -> Result<(), ApiError> {
        // [...]
    }
}

workarounds

currently, it is possible to workaround this by just splitting it into two impl blocks:

impl Database {
    pub fn new() -> Self {
        Self
    }
}

#[autometrics]
impl Database {
    pub async fn load_details(&self) -> Result<(), ApiError> {
        // [...]
    }
}

internal ref: slack chat with evan

Use `opentelemetry` span context for exemplars

As suggested by @hatchan in #100 (comment)

Rethink how we handle the integrations with multiple metrics libraries

To date, we've abstracted away the differences between the underlying metrics libraries in the tracker module. We implement the TrackMetrics trait for each metrics library and then export the specific implementation as autometrics::__private::AutometricsTracker (renaming it from its library-specific struct name), depending on which feature flag(s) you have enabled. The autometrics macro generates code that uses the AutometricsTracker struct.

This works okay as long as every library has a close enough API that we don't need to expose library-specific things. However, that changes a bit with the prometheus-client support (#25 and #88), because we need to expose a handle to the Registry. We can simply expose that as a top-level export from the crate, but it seems somewhat likely that the list of things in this category will grow.

The other thing that's a bit odd with how we're currently handling things is the precedence order for the feature flags related to the metrics libraries. We currently allow multiple to be set and then use an arbitrary order for which should take precedence. It seems like this can cause somewhat unexpected behavior, because it's not obvious from the outside which library would take precedence.

Some options we have for resolving these issues:

Expose metrics library-specific functionality from submodules such as integrations::prometheus, integrations::prometheus_client, etc
Make the metrics library feature flags mutually exclusive so you can only choose one and it's clear which one you're getting
- This would probably mean we would want to remove opentelemetry as the default and make it clear in the docs that you need to pick which underlying library it's going to use. We can have examples in the docs that use one so that would serve as an informal default
Alternatively, we could make it so that if you have multiple metrics libraries enabled, autometrics would track the metrics with all of them. This would be the least strict option, but it seems somewhat likely that people would end up collecting duplicate metrics

Thoughts?

Document conditional compilation use-cases

I think some of the feedback we got was to know whether it is possible to conditionally compile autometrics, and maybe we could add examples of this in the doc; at least what I think would solve the issue: the cfg_attr attribute macro.

Instrumenting only debug builds

#[cfg_attr(debug_assertions, autometrics::autometrics)]
fn foo() -> Result<(), String> {
}

Optionally instrumenting on a feature flag

# In Cargo.toml

[features]
metrics = [ "autometrics" ]

[dependencies] 
autometrics = { version = "0.6", optional = true }

#[cfg_attr(feature = "metrics", autometrics::autometrics)]
fn foo() -> Result<(), String> {
}

Mixing and matching

# In Cargo.toml

[features]
metrics = [ "autometrics" ]

[dependencies] 
autometrics = { version = "0.6", optional = true }

#[cfg_attr(all(debug_assertions, feature = "metrics"), autometrics::autometrics)]
fn foo_instrumented_on_debug_only() -> Result<(), String> {
}

#[cfg_attr(all(not(debug_assertions), feature = "metrics"), autometrics::autometrics)]
fn foo_instrumented_on_prod_only() -> Result<(), String> {
}

#[cfg_attr(feature = "metrics", autometrics::autometrics)]
fn foo_all_the_time() -> Result<(), String> {
}

Should we add an (optional) listener on a specific port?

Right now, we provide functions that serialize the metrics to a string and expect you to add a route to your API to expose the metrics.

Instead, we might want to provide an optional feature that just exports the metrics on a specific port and path. We would need to decide which port https://github.com/orgs/autometrics-dev/discussions/32 to listen on, and this would probably add hyper as a dependency.

If we add this, should it be enabled by default or opt-in?

Add support for adding exemplars?

Some metrics libraries, such as prometheus-client, support adding OpenMetrics exemplars to counters and histograms. If people are interested in such a feature, we could investigate adding support to the autometrics API for attaching dynamic function parameters as exemplars.

Please 👍 if you would be interested in this.

Suggested by SpudnikV on Reddit.

Make concurrent request gauge optional?

We're currently using a gauge to track the number of concurrent requests to every function. It might add more overhead than is really useful in a lot of cases.

We could either remove the tracking of concurrent requests or make it an optional parameter for the autometrics macro: #[autometrics(track_concurrency)] (or something like that).

Thoughts?

Autometrics `prometheus-exporter` is not compatible with `axum` 0.7

The repro below is a simplified version of the example found in the repo

Cargo.toml

[package]
name = "axum-repro"
version = "0.1.0"
edition = "2021"

[dependencies]
autometrics = {version = "1.0.0", features = ["prometheus-exporter"]}
axum = "0.7.1"
tokio = {version="1.34.0", features = ["full"]}

main.rs

use std::net::SocketAddr;

use axum::{routing::get, Router};
use autometrics::prometheus_exporter;

#[tokio::main]
pub async fn main() {
    prometheus_exporter::init();
    let app = Router::new()
        .route(
            "/metrics",
            get(|| async { prometheus_exporter::encode_http_response() }),
        );

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    let tcp_listen = tokio::net::TcpListener::bind(&addr).await.unwrap();
    axum::serve(tcp_listen, app.into_make_service()).await.unwrap();
}

add branch details to build_info metric

I think it's useful to also include the git branch name to the build_info metric for even easier correlation.

Add `am.toml` files to the example folders

Once you're done setting up autometrics there could be a "now what?" moment, where you might not know what those metrics would bring. Maybe adding a small am.toml file (or even just a am start command to copy-paste, with a link to autometrics-dev/am) could be nice

Add examples showing how to add additional custom metrics

Using the 3 different metrics libraries

Suggested by @axiomatic-aardvark

Missing HTTP feature flag for opentelemetry-otlp

Hey, thanks for such an awesome project!

I followed the Rust Quickstart guide for OpenTelemetry, dropping this line into my Rust project:

otel_push_exporter::init_http("some_url")?; // Note the `?` is missing from your example

When I run my binary, the above line returns the following error:

Error: Metrics exporter otlp failed with no http client, you must select one from features or provide your own implementation

I then tried to run your example-opentelemetry-push project, but it returned a similar error:

Error: ExportErr(NoHttpClient)

I found that if I modified the "otel-push-exporter-http" feature in ./autometrics/Cargo.toml, I could get it working:

otel-push-exporter-http = [
  "otel-push-exporter",
  "opentelemetry-otlp/http-proto",
  "opentelemetry-otlp/reqwest-client" # <-- new line
]

I'm happy to raise a PR but wasn't sure if maybe I'm missing something on my end? Instrumentation is a new subject for me.

Include links to graphs with different time ranges?

It might be useful to include links to the Prometheus graphs with different time ranges, like:

Request Rate: last hour, day, week
Error Rate: last hour, day, week
...

Give better error message with `async_trait`

async_trait is a popular macro to get around the limitation of asynchronous functions in trait definitions in Rust.

autometrics seems currently incompatible with trait level annotations for async_trait traits; and currently gives an incoorect error message. This example comes fronm trying to add metrics to lemmy, which uses async_trait in the ActivityPub-related traits

#[autometrics::autometrics(objective = super::APUB_SLO)]
#[async_trait::async_trait]
impl ActivityHandler for DeleteUser {
  type DataType = LemmyContext;
  //...

won't work with a cryptic error message:

error: expected `fn`
  --> crates/apub/src/activities/deletion/delete_user.rs:50:1
   |
50 | impl ActivityHandler for DeleteUser {
   | ^^^^

But adding the attribute to inner functions will work:

#[async_trait::async_trait]
impl ActivityHandler for DeleteUser {
  type DataType = LemmyContext;
  type Error = LemmyError;

  #[autometrics::autometrics(objective = super::APUB_SLO)]
  fn id(&self) -> &Url {
    &self.id
  }

  #[autometrics::autometrics(objective = super::APUB_SLO)]
  fn actor(&self) -> &Url {
    self.actor.inner()
  }

  #[autometrics::autometrics(objective = super::APUB_SLO)]
  async fn verify(&self, context: &Data<Self::DataType>) -> Result<(), LemmyError> {
    insert_received_activity(&self.id, context).await?;
    verify_is_public(&self.to, &[])?;
    verify_person(&self.actor, context).await?;
    verify_urls_match(self.actor.inner(), self.object.inner())?;
    Ok(())
  }

  #[autometrics::autometrics(objective = super::APUB_SLO)]
  async fn receive(self, context: &Data<Self::DataType>) -> Result<(), LemmyError> {
    let actor = self.actor.dereference(context).await?;
    if self.remove_data.unwrap_or(false) {
      purge_user_account(actor.id, context).await?;
    } else {
      Person::delete_account(&mut context.pool(), actor.id).await?;
    }
    Ok(())
  }
}

Note

The issue might also come from the associated type in the trait now that I think about it, seeing the code. So adding an extra compilation test to make sure we can use #[autometrics] on trait implementation that have an associated type would also be welcome to remove that possible cause from the list of suspects.

Move prometheus-exporter related functions into a submodule

I think we may want to move the functions:

global_metric_exporter --> prometheus_exporter::init
encode_global_metrics --> prometheus_exporter::encode_to_string

Then we would also put the functionality for https://github.com/orgs/autometrics-dev/discussions/7 in this same module

We should probably keep the same exports we have now and mark them as deprecated for a while to avoid a breaking change right away.

Add benchmarks for using autometrics with the different metrics libraries

Refactor error handling

Right now, we're directly returning prometheus::Errors from the prometheus exporter function. However, since we're now using different metrics libraries, it would either make sense to:

Export our own error type that wraps underlying metrics libraries' types as enum variants
Depending on the metric library used (based on the feature flags), export that crate's specific error type

Pick a metrics library if none is selected and the `prometheus-exporter` feature is used

If a user uses the prometheus-exporter feature, there's a good chance they don't care which metrics library they're using. We should use one by default iff the prometheus-exporter feature is enabled and no metrics library is explicitly selected.

The question is which one to use. I'm somewhat tempted to make it prometheus-client because that's the official Rust prometheus client and the only one that currently supports exemplars.

Make the metrics registry configurable

Feature flag to use `prometheus` crate for creating metrics

Currently, we use the opentelemetry to export metrics. If people are already using the prometheus crate, it would be useful to have a feature flag that changes the macro behavior to insert function-level metrics using that crate instead.

Handling methods annotated within an impl block

The issue

There is a small inconsistency that might need to be resolved when annotating a complete impl block vs. annotating a method within the block:

// in src/my_mod.rs
#[autometrics]
impl Fitz {
    fn chivalry() {}
}

will produce metrics with

{
    "module": "my_mod",
    "function": "Fitz::chivalry"
}

whereas

// in src/my_mod.rs
impl Fitz {
    #[autometrics]
    fn chivalry() {}
}

will produce metrics with

{
    "module": "my_mod",
    "function": "chivalry"
}

In the second case we should also have the Fitz:: prefix but it doesn’t appear.

It cannot be solved without user input

The issue in the second case here is that the proc-macro cannot access code outside of its scope to find the parent impl Fitz clause. I don’t think there is a way to still just have the bare #[autometrics] annotation and work out the parent class name in the macro code.

A proposal

Besides documenting this limitation and advising splitting impl blocks so that all autometrics functions live within an annotated impl block, we could also maybe add an extra argument to the macro like

// in src/my_mod.rs
impl Fitz {
    #[autometrics(class = "Fitz")]
    fn chivalry() {}
}

to restore the labelling for these cases. What do you think?

Generate alerts / SLOs

Either generate a sloth or OpenSLO file, which can then be used to create alerts, or directly generate the Prometheus AlertManager alert definitions.

I'm imagining passing parameters to the autometrics macro, such as:

#[autometrics(objectives(success_rate = 99.9, latency_target = 0.2, latency_percentile = 99))]

You would add this to specific important functions like a top-level HTTP request handler on an API.

Generated queries should handle _total suffix

The prometheus-client crate adds the _total suffix to the counter. We should update the queries to handle this.