awslabs / aws-embedded-metrics-node
Amazon CloudWatch Embedded Metric Format Client Library
License: Apache License 2.0
Sorry, I have two questions RE AWS EMF and this library:
Why isn't it part of the AWS JS SDK?
Use case: Measure a fetch. Lambda functions typically depend on external services and I am wondering if EMF logging is a good approach to tracking the performance of these dependencies. Or is that best left to some AWS X-ray instrumentation?
The AWS Publishing custom metrics page mentions standard and high resolution metrics. I have had a look through the documentation and source code for this package and can find no mention of either.
I am wanting to write a blog post about this package, but want to be clear to any reader what the granularity of the resulting metrics will be.
Thanks
FLUENT_HOST
and uses that to configure the sink:
aws-embedded-metrics-node/src/sinks/AgentSink.ts
Lines 92 to 95 in 6132122
We could do this by subclassing AgentSink into GenericAgentSink and CloudWatchAgentSink, or by delegating configuration of the AgentSink to the environment.
Alternatively, we could just make LogGroup entirely optional for agents.
Hi, sorry in advance if this is already something supported by the library and I couldn't find it.
I'm trying to override the default dimensions to remove things that aren't useful to me, such as LogGroup and ServiceType.
I can't find a way to actually do this. I see from the Configuration in the README that I can set namespace which is also useful, but not enough.
I thought there'd be a similar way to handle Dimensions, but there isn't. Ideally, I'd like to be able to create a single MetricsLogger in my application and wire it in as needed. That logger should always have the dimensions I want as a baseline, and some methods might add more.
I recognize that this wouldn't work for an environment variable configuration, but how about something like:
const { Configuration } = require("aws-embedded-metrics");
Configuration.dimensions = [{
version: 'N+1' // Latest Lambda Version specific
},
{} // Whole fleet metrics
];
This would allow me to alarm on both the "latest" Lambda version specific metrics, and the fleet as a whole as I'd be emitting metrics for both at the same rate. This is especially useful during deployments as I want to see how the newest version is doing.
An alternative would be to open up the constructor of the MetricsLogger. I see that it is technically public today, but we don't have access to the EnvironmentProvider (again, as far as I am able to tell).
Let me know what you think, and if there's already a way to do this I apologize in advance.
allowed:
putDimensions({ Method: "GET", StatusCode: "200" })
not allowed:
putDimensions({ Method: "GET", StatusCode: 200 })
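One way to stay on the "allowed" side is to coerce values to strings before calling putDimensions. This is a minimal sketch: `toStringDimensions` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of aws-embedded-metrics): coerce every
// dimension value to a string so numeric values such as status codes
// match the "allowed" form shown above.
function toStringDimensions(dimensions) {
  const result = {};
  for (const [name, value] of Object.entries(dimensions)) {
    result[name] = String(value);
  }
  return result;
}

// Usage, assuming `metrics` is a MetricsLogger:
//   metrics.putDimensions(toStringDimensions({ Method: "GET", StatusCode: 200 }));
console.log(toStringDimensions({ Method: "GET", StatusCode: 200 }));
```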
I have a use case where I want to access one or all of the metrics that have been set with putMetric before sending a response back to the user. Specifically, I'm planning to put some of the metric values into Server-Timing headers so that I can tie client-side behavior back to server-side metrics.
I've found that while there is no public getMetric method, I can get access to previously set metrics by inspecting the data here
My question is whether this is an okay, somewhat stable method to use, or whether this is exclusively private/internal data that I should not be touching. If the latter, is there interest in adding formal getMetric or getProperty methods?
Hi!
It seems that there is a typo in this library for a method on Socket.
Here is the diff that solved my problem:
diff --git a/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js b/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js
index 48c7a37..f49168b 100644
--- a/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js
+++ b/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js
@@ -78,7 +78,7 @@ class TcpClient {
}
waitForOpenConnection() {
return __awaiter(this, void 0, void 0, function* () {
- if (!this.socket.writeable || this.socket.readyState !== 'open') {
+ if (!this.socket.writable || this.socket.readyState !== 'open') {
yield this.establishConnection();
}
});
Add full EKS example
'use strict';
const { metricScope } = require("aws-embedded-metrics");
const myFunc = metricScope(metrics =>
async () => {
var domains = ["google.com"];
domains.forEach(function(domain){
var https = require('https');
var options = {
host: domain,
port: 443
};
var req = https.request(options, function(res) {
req.end();
const days = Math.floor((new Date(res.connection.getPeerCertificate().valid_to) - new Date()) / 86400000)
console.log(days);
console.log()
});
});
});
exports.handler = myFunc();
metricScope is not returning a function; I get this error:
{
"errorType": "Runtime.HandlerNotFound",
"errorMessage": "index.handler is not a function",
"trace": [
"Runtime.HandlerNotFound: index.handler is not a function",
" at Object.module.exports.load (/var/runtime/UserFunction.js:150:11)",
" at Object. (/var/runtime/index.js:43:30)",
" at Module._compile (internal/modules/cjs/loader.js:1015:30)",
" at Object.Module._extensions..js (internal/modules/cjs/loader.js:1035:10)",
" at Module.load (internal/modules/cjs/loader.js:879:32)",
" at Function.Module._load (internal/modules/cjs/loader.js:724:14)",
" at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)",
" at internal/main/run_main_module.js:17:47"
]
}
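A likely cause, judging from the snippet alone: metricScope returns the wrapped handler, and `myFunc()` invokes it at module load, so `exports.handler` receives a Promise rather than a function. The sketch below uses a simplified stand-in for metricScope (an assumption, so the example runs without the library) to show the difference.

```javascript
// Simplified stand-in for the library's metricScope, for illustration only:
// it wraps a handler factory and returns an async function.
function metricScope(factory) {
  const metrics = { putMetric() {}, flush: async () => {} };
  return async (...args) => {
    try {
      return await factory(metrics)(...args);
    } finally {
      await metrics.flush();
    }
  };
}

const myFunc = metricScope(metrics => async () => "done");

const wrong = myFunc(); // invokes the handler: a Promise, not a function
const right = myFunc;   // the wrapped handler itself

console.log(typeof wrong); // "object"
console.log(typeof right); // "function"
// In the Lambda module this would read: exports.handler = myFunc;
```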
See #19
Related issue:
These are the equivalent enhancements related to resetting custom dimensions in the Node library. Similar APIs and features should be implemented in all libraries for consistency. Specifically, the enhancements include:
See also: awslabs/aws-embedded-metrics-python#15
I'm using this library to send metrics via metricScope, and I need to test the functionality. To do so I'm following the example described here: https://github.com/awslabs/aws-embedded-metrics-node/blob/master/examples/testing/tests/module.jest.test.js.
The project is written in TypeScript and I use jest for unit testing.
This example https://github.com/awslabs/aws-embedded-metrics-node/blob/master/examples/testing/tests/module.jest.test.js fails in a TypeScript context with the following error:
Module '"aws-embedded-metrics"' has no exported member 'mockLogger'.ts(2305)
mockLogger to the test function without making TypeScript complain?
package.json contains:
"jest": "^24.8.0",
"npm-pack-zip": "^1.2.7",
"prettier": "^1.19.1",
"ts-jest": "^26.1.1",
npm 7's install, even with --production:
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR!
npm ERR! While resolving: [email protected]
npm ERR! Found: [email protected]
npm ERR! node_modules/jest
npm ERR! dev jest@"^24.8.0" from the root project
npm ERR!
npm ERR! Could not resolve dependency:
npm ERR! peer jest@">=26 <27" from [email protected]
npm ERR! node_modules/ts-jest
npm ERR! dev ts-jest@"^26.1.1" from the root project
npm ERR!
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
You need to bump Jest to 26 to fix this.
We're using this library with our Lambda function for creating embedded metrics, and it works fantastically for that use case, but we've been struggling a little with getting it all to play smoothly during local development. I'm aware that we can set AWS_EMF_ENVIRONMENT to Local at runtime to make the library use stdout, and we've added that to a few of our package.json scripts to make it easier for everyone. However, there are still a few cases where there's no simple route to setting this, or somebody does something a little different and forgets to add that environment variable (we work in a large monorepo project, so the number of contributors who know much detail about EMF is fairly low). This leads to confusing error messages about UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 0.0.0.0:25888 when running in Node 12, and for anyone running Node 14 it seems to crash the local server.
Looking into the library, it looks like Agent is set up as the default environment, and the Local environment's probe method is intentionally a no-op, which means that the only way to use the library in local environment mode is via this environment variable.
Is there a historical reason why it is this way? Is there no other way to detect whether the environment should be Agent, and thus switch the default environment around, or some other way to auto-detect that the environment should be Local (e.g. if NODE_ENV !== 'production')?
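Until the library detects this itself, an application could decide the override at start-up. This is a minimal sketch following the NODE_ENV convention suggested above: `resolveEmfEnvironment` is a hypothetical helper, and assigning its result to Configuration.environmentOverride assumes that setting behaves as described in #19.

```javascript
// Hypothetical helper: pick an EMF environment override from environment variables.
// Returns "Local" outside production unless AWS_EMF_ENVIRONMENT is already set,
// and undefined when the library's own detection should be used.
function resolveEmfEnvironment(env) {
  if (env.AWS_EMF_ENVIRONMENT) {
    return env.AWS_EMF_ENVIRONMENT; // an explicit setting always wins
  }
  return env.NODE_ENV !== "production" ? "Local" : undefined;
}

// Usage, assuming the Configuration object exported by the library:
//   const override = resolveEmfEnvironment(process.env);
//   if (override) Configuration.environmentOverride = override;
console.log(resolveEmfEnvironment({ NODE_ENV: "development" })); // "Local"
```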
Currently, all metrics are filtered against all defined dimensions, resulting in metrics associated to unrelated dimensions with a consequent duplication, unless you flush the metric logger multiple times, resulting in separate JSON logs in CloudWatch.
As a developer I'd like to filter a group of metrics against different dimensions within the same metric logger context (same JSON payload)
I'm currently working on a project where the server sends events rather than individual metrics. This event is a single JSON object containing all relevant metrics, dimensions and properties measured during the course of it (the metric logger is flushed only once per event instance).
As an example, let's take a pretty common event for web applications and call it page-request, which is triggered any time a web page is requested by a user. Let's assume the collected metrics and dimensions are the following:
RequestCount
  Description: Counts the number of HTTP requests the server receives from the user. This metric is used to calculate the RPS.
  Dimension: PageType
  Unit: Count
  Statistics: Sum
ResponseTime
  Description: The response time in milliseconds.
  Dimension: PageType
  Unit: Milliseconds
  Statistics: Avg, 50th percentile, 95th percentile, 99th percentile
UpstreamRequestCount
  Description: Counts the number of HTTP requests the app performs towards its upstream services.
  Dimension: Client
  Unit: Count
  Statistics: Sum
Where:
PageType: the type of page requested by the user (e.g. home, player, etc.)
Client: the name of the upstream service
From the example, RequestCount and ResponseTime share the same dimension, whereas UpstreamRequestCount is applied to a different one.
Let's write an example of metric logger which is called once immediately after the HTTP response has been sent to the user
import { metricScope, Unit } from 'aws-embedded-metrics';
import config from 'config';

const namespace = config.get('namespace');

export const logPageRequest = metricScope(metrics => {
  return async pageRequestEvent => {
    const {
      requestCount,
      responseTime,
      upstreamRequestCount,
      pageType,
      client
    } = pageRequestEvent;
    metrics.setNamespace(namespace);
    metrics.putMetric('RequestCount', requestCount, Unit.Count);
    metrics.putMetric('ResponseTime', responseTime, Unit.Milliseconds);
    metrics.putMetric('UpstreamRequestCount', upstreamRequestCount, Unit.Count);
    // Both dimension sets below are applied to every metric above.
    metrics.setDimensions(
      { PageType: pageType },
      { Client: client }
    );
  };
});
This example generates the following JSON log
and the following metrics are extracted
We can see that UpstreamRequestCount is also applied to the PageType dimension, and RequestCount and ResponseTime to Client, effectively generating unnecessary new metrics (3 in this example).
According to the previous example, I'd like to filter UpstreamRequestCount by Client only, and RequestCount and ResponseTime by PageType, resulting in the following metrics
Here, the PageType group contains only the metrics that we want to apply; the same goes for Client.
According to the EMF specification it is possible to add multiple CloudWatchMetrics objects
"CloudWatchMetrics": [
{
... ...
},
{
... ...
}
]
in order to define different groups of metrics that we want to apply to different dimensions. If we consider the previous example once again, we need to generate a JSON payload like the following
where the two metrics sharing the same dimensions are defined within the same CloudWatchMetrics object.
Generally speaking, each CloudWatchMetrics object contains metrics that are filtered by the same group of dimensions.
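To make the target payload concrete, here is a sketch that assembles one log event with a separate CloudWatchMetrics entry per dimension group. `buildEmfPayload`, the namespace and the sample values are all hypothetical, used only for illustration.

```javascript
// Hypothetical builder: each group pairs a set of metrics with its own
// dimensions and becomes one entry in _aws.CloudWatchMetrics.
function buildEmfPayload(namespace, groups, timestamp = Date.now()) {
  const payload = { _aws: { Timestamp: timestamp, CloudWatchMetrics: [] } };
  for (const { dimensions, metrics } of groups) {
    payload._aws.CloudWatchMetrics.push({
      Namespace: namespace,
      Dimensions: [Object.keys(dimensions)],
      Metrics: metrics.map(({ Name, Unit }) => ({ Name, Unit })),
    });
    // Dimension and metric values are top-level members of the log event.
    Object.assign(payload, dimensions);
    for (const { Name, Value } of metrics) payload[Name] = Value;
  }
  return payload;
}

const payload = buildEmfPayload("example-namespace", [
  {
    dimensions: { PageType: "home" },
    metrics: [
      { Name: "RequestCount", Value: 1, Unit: "Count" },
      { Name: "ResponseTime", Value: 120, Unit: "Milliseconds" },
    ],
  },
  {
    dimensions: { Client: "catalog" },
    metrics: [{ Name: "UpstreamRequestCount", Value: 3, Unit: "Count" }],
  },
], 1574109732004);

console.log(JSON.stringify(payload, null, 2));
```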
To do so, I'm proposing to add a new method to the MetricsContext interface called add. The method will accept only one parameter, which is an object defined in the next section.
{
"Name": String,
"Value": Number,
"Unit": String,
"Metrics": [ MetricItem, ... ],
"Dimensions": Object
}
Name
The metric name.
Required: only if Metrics is undefined or an empty array, optional otherwise.
Type: String
Value
The metric value.
Required: only if Metrics is undefined or an empty array, optional otherwise.
Type: Number
Unit
The metric unit (e.g. Unit.Count, Unit.Milliseconds, etc.)
Required: only if Metrics is undefined or an empty array, optional otherwise.
Type: String
Metrics
An array of objects (see MetricItem type).
Required: only if Name, Value and Unit are undefined, optional otherwise.
Type: Array
Dimensions
The dimensions to filter the defined metrics by. This object is a map of key/value pairs that stores the name and value of each dimension. Each property value must be of type String.
Required: yes
Type: Object
MetricItem
{
"Name": String,
"Value": Number,
"Unit": String
}
Considering the previous example, our metric logger will look like
const namespace = config.get('namespace');
export const logPageRequest = metricScope(metrics => {
return async pageRequestEvent => {
const {
requestCount,
responseTime,
upstreamRequestCount,
pageType,
client
} = pageRequestEvent;
metrics.setNamespace(namespace);
metrics.add({
Metrics: [
{
Name: 'RequestCount',
Value: requestCount,
Unit: Unit.Count
},
{
Name: 'ResponseTime',
Value: responseTime,
Unit: Unit.Milliseconds
}
],
Dimensions: { PageType: pageType }
});
};
});
When we have one metric, we can either do
metrics.add({
Metrics: [
{
Name: 'UpstreamRequestCount',
Value: upstreamRequestCount,
Unit: Unit.Count
}
],
Dimensions: { Client: client }
});
or
metrics.add({
Name: 'UpstreamRequestCount',
Value: upstreamRequestCount,
Unit: Unit.Count,
Dimensions: { Client: client }
});
We could have modified the LogSerializer to optionally generate the multiple CloudWatchMetrics objects by means of a flag, but currently there is no association between groups of metrics sharing the same dimensions. To achieve that, we could have modified the internal data structure by adding a mapping between them, which would have required a new method anyway to let the user express this relationship via the public API.
By creating a brand new method we keep the current data structure as is and the API backward compatible with the previous version. The new method will have a separate data structure to allow the LogSerializer to easily understand how to serialise it.
Currently, if the agent is down or has not started, metrics can be dropped. It's currently up to the caller of logger.flush to handle retries. There are 2 options:
logger.flush. This could negatively impact request latencies.
The symptoms of this are:
(node:1) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 172.17.0.2:25888
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1106:14)
AgentSinkOptions with RetryStrategy parameter where the default value is None for backwards compatibility, with a single option to start with: ExponentialBackoffRetryStrategy (see also: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
AsyncBehavior parameter that controls whether the call should block or not. In the former case we keep the current behavior and in the latter we return immediately, enqueuing to the retry buffer on failure.
constructor(options: AgentSinkOptions, ISerializer: serializer).
NoRetry propagates errors back to the caller of flush, which maintains current behavior today. ExponentialRetry (which can be configured by the application) will block flush on the first attempt, enqueuing to a CircularBuffer (whose size is also configurable) on failures.
setInterval will be set to check the size of the CircularBuffer and retry failed requests asynchronously.
shutdown method to gracefully shut down and block on any outstanding requests.
AWS_EMF_AGENT_RETRY_STRATEGY="ExponentialBackoff"
// or
Configuration.agentRetryStrategy = RetryStrategy.ExponentialBackoff;
// or
Configuration.agentRetryStrategy = (...) => customRetryStrategy();
// ...
await logger.flush();
// execution control is returned when logs have been successfully flushed or enqueued for retry
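The moving parts of the proposal can be sketched as below. None of these names exist in the library today: `backoffDelayMs`, `CircularBuffer` and `flushWithRetry` are hypothetical, and the delay values are arbitrary defaults.

```javascript
// Hypothetical sketch: capped exponential backoff delays and a fixed-size
// buffer that drops the oldest entry when full.
function backoffDelayMs(attempt, baseMs = 100, capMs = 5000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

class CircularBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  push(item) {
    if (this.items.length === this.capacity) this.items.shift(); // drop oldest
    this.items.push(item);
  }
  drain() {
    const drained = this.items;
    this.items = [];
    return drained;
  }
}

// Try to send once; on failure, enqueue the message for a later asynchronous retry.
async function flushWithRetry(send, message, buffer) {
  try {
    await send(message);
    return "sent";
  } catch (err) {
    buffer.push(message);
    return "enqueued";
  }
}

console.log([0, 1, 2, 3].map(attempt => backoffDelayMs(attempt))); // [ 100, 200, 400, 800 ]
```

A background timer (for example via setInterval) would then drain the buffer and resend, waiting `backoffDelayMs(attempt)` between attempts.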
logger.flush() to enqueue and return immediately? This would allow us to make flush() a synchronous operation in all cases.
I will be trying to run this on EC2 (it seems from the examples it can be done). However, when I run my dev server locally I see the error handler(...) is not a function
I can see the event logged to the console with stdout
I see the handler error when trying to load a page in the browser.
This is my setup;
## CW Custom Metrics Configuration
AWS_EMF_SERVICE_NAME=AppName
AWS_EMF_LOG_GROUP_NAME=AppServer
AWS_EMF_ENVIRONMENT=Local
// custom CW metrics
const { metricScope, Unit } = require('aws-embedded-metrics');
const sendCustomMetric = metricScope(metrics => {
async (metric, status, pagetype, url) => {
metrics.putDimensions({ PageType: pagetype, StausCode: status });
metrics.putMetric(metric, 1, Unit.Count);
metrics.setProperty('URL', url);
};
});
await sendCustomMetric('200_Response', status, pageType, url);
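In the snippet above, the callback passed to metricScope has a block body that creates the async function without returning it, so the wrapper has nothing to call. Assuming that is the cause, the sketch below reproduces and fixes it with a simplified stand-in for metricScope (hypothetical, so the example runs without the library).

```javascript
// Simplified stand-in for metricScope (illustration only).
function metricScope(factory) {
  const metrics = { calls: [], putMetric(...args) { this.calls.push(args); } };
  return async (...args) => factory(metrics)(...args);
}

// Broken: the braces form a block body, and the inner function is discarded.
const broken = metricScope(metrics => {
  async (metric) => { metrics.putMetric(metric, 1); };
});

// Fixed: return the inner function from the callback.
const fixed = metricScope(metrics => {
  return async (metric) => {
    metrics.putMetric(metric, 1);
    return "sent";
  };
});

broken("200_Response").catch(err => console.log(err instanceof TypeError)); // true
fixed("200_Response").then(result => console.log(result)); // "sent"
```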
I was just looking at this code for something else and haven't had time to confirm this issue, but it looks like we have a typo here ("writeable" Vs. "writable").
The README mentions that "If more metric values are added than are supported by the format, the logger will be flushed to allow for new metric values to be captured." but this doesn't seem true if I trust the output of my program using the library and the code I could read
This leads output to go above 100 leading to ignored entries.
https://github.com/kaihendry/yt-aws-emf/blob/main/hello-world/app.js#L48
Hi, if I set the requestID in my EMF, how do I trace a putMetric back to the aws request ID?
Expose 3 methods for publishing counters.
metrics.increment('Increment');
metrics.decrement('Decrement');
metrics.count('Count', 10);
{ "Increment": 1, "Decrement": -1, "Count": 10 }
Multiple calls to the same key will be recorded as separate entries. This allows us to preserve the sample count.
metrics.increment('Key');
metrics.increment('Key');
metrics.decrement('Key');
{ "Key": [ 1, 1, -1 ] }
Alternatively, we can use the PMD syntax:
{
"Key": {
"type": "dist",
"buckets": "explicit",
"values": [ 1, -1 ],
"counts": [ 2, 1 ]
}
}
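The step from the raw sample list to the values/counts form can be sketched as follows. `toValueCounts` is a hypothetical helper used only to illustrate the aggregation.

```javascript
// Hypothetical helper: collapse repeated samples into distinct values and
// their counts, preserving first-seen order.
function toValueCounts(samples) {
  const counts = new Map();
  for (const sample of samples) {
    counts.set(sample, (counts.get(sample) || 0) + 1);
  }
  return { values: [...counts.keys()], counts: [...counts.values()] };
}

console.log(toValueCounts([1, 1, -1])); // { values: [ 1, -1 ], counts: [ 2, 1 ] }
```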
metrics.gauge('key', 10);
metrics.time('key', 10);
timedMetricScope('operation', metrics => {
// do things and track how long it takes...
})
metrics.histo('key', 10);
{
"Key": {
"type": "dist",
"buckets": "explicit",
"values": [ 1 ],
"counts": [ 10 ]
}
}
According to #19 you can now override the environment detection by either using
const { Configuration } = require("aws-embedded-metrics");
Configuration.environmentOverride = "Local";
or
export AWS_EMF_ENVIRONMENT=Local
When I export the environment variable it works and I can see the agent logging to stdout, when I try the first approach, I still get the following error:
(node:783) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 0.0.0.0:25888
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
(node:783) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 13)
(node:783) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
I've got the following helper module
import { metricScope, Unit, Configuration } from 'aws-embedded-metrics';
import config from 'config';
const environment = config.get('environment');
const namespace = config.get('namespace');
Configuration.logGroupName = namespace;
Configuration.environmentOverride = 'Local';
export const pageRequestLogger = metrics => {
return async pageRequestEvent => {
const {
requestCount,
errorCount,
responseTime,
pageType,
event,
processPid,
request,
response,
log,
logTrace
} = pageRequestEvent;
metrics.setNamespace(namespace);
metrics.putMetric('RequestCount', requestCount, Unit.Count);
metrics.putMetric('ErrorCount', errorCount, Unit.Count);
metrics.putMetric('ResponseTime', responseTime, Unit.Milliseconds);
metrics.setDimensions({ PageType: pageType });
metrics.setProperty('event', event);
metrics.setProperty('processPid', processPid);
metrics.setProperty('request', request);
metrics.setProperty('response', response);
metrics.setProperty('log', log);
metrics.setProperty('logTrace', logTrace);
};
};
export const logPageRequest = metricScope(pageRequestLogger);
The code runs in a CentOS 7 Docker container
Node: 12.18.0
NPM: 6.14.4
There is no need for duplicate dimension sets like the following.
{
"Dimensions": [
[ "A", "B"],
[ "A", "B"],
[ "B", "A"]
]
}
This would re-create the same metric 3 times and is equivalent to:
{
"Dimensions": [ [ "A", "B"] ]
}
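A minimal sketch of such de-duplication, treating two sets as equal when they contain the same names in any order. `dedupeDimensionSets` is a hypothetical helper, not the library's implementation.

```javascript
// Hypothetical helper: drop dimension sets that contain the same names,
// regardless of order, keeping the first occurrence.
function dedupeDimensionSets(dimensionSets) {
  const seen = new Set();
  return dimensionSets.filter(set => {
    const key = JSON.stringify([...set].sort());
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeDimensionSets([["A", "B"], ["A", "B"], ["B", "A"]])); // [ [ 'A', 'B' ] ]
```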
This is needed for #14. It allows for things like the following while also allowing for re-use of the logger instance.
const doWork = metricScope(metrics => () => {
metrics.putDimensions(dimensions);
// ...
});
// act
doWork();
doWork();
logger.putMetric(metricKey, 0);
await logger.flush();
logger.putMetric(metricKey, 1);
await logger.flush();
It looks like there's a memory leak in TcpClient#sendMessage, making this SDK dangerous for long-running processes.
I wrote some HTTP client code that uses this SDK and makes a request to a simple server every 100ms. I launched the client on ECS, and I saw the memory grow at a steady rate until the task crashed. For a period I ran a revised version of the task that does not use the EMF SDK (~16:00-20:00 in the graph below), and during that period memory did not grow, so I know that the EMF SDK is the culprit.
I ran the server and client locally with node --inspect to see if I could track down the leak. What I found is JSArrayBufferData growing with every snapshot and never being cleaned up. Looking at the list of retainers, I see that the TcpClient seems to be assigning an event listener that is never removed.
This seems to be the responsible code:
aws-embedded-metrics-node/src/sinks/connections/TcpClient.ts
Lines 47 to 58 in 8bc9002
this.socket.once('error', onSendError), specifically.
I don't understand the purpose served by that listener. If this.socket.write fails, there's already code to run onSendError. Can we remove that listener completely? Or is there some edge case that it is meant to address?
Another option is to add this.socket.removeListener('error', onSendError) inside the callback for this.socket.write.
Both of those fixes appeared to cure the memory leak in my local runs. JSArrayBufferData stopped growing indefinitely.
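The accumulation can be reproduced in isolation with a plain EventEmitter. In this sketch `FakeSocket`, `sendLeaky` and `sendFixed` are hypothetical stand-ins rather than the library's code; the fake socket's write always succeeds, so the 'error' listener never fires on its own.

```javascript
const { EventEmitter } = require("events");

// Stand-in for a socket whose write always succeeds (illustration only).
class FakeSocket extends EventEmitter {
  write(_message, callback) {
    callback();
  }
}

// Leaky variant: the 'error' listener stays registered after a successful write.
function sendLeaky(socket, message) {
  return new Promise((resolve, reject) => {
    const onSendError = err => reject(err);
    socket.once("error", onSendError);
    socket.write(message, err => (err ? onSendError(err) : resolve()));
  });
}

// Fixed variant: remove the listener once the write has completed.
function sendFixed(socket, message) {
  return new Promise((resolve, reject) => {
    const onSendError = err => reject(err);
    socket.once("error", onSendError);
    socket.write(message, err => {
      socket.removeListener("error", onSendError);
      if (err) reject(err);
      else resolve();
    });
  });
}

const leaky = new FakeSocket();
const fixed = new FakeSocket();
for (let i = 0; i < 5; i++) {
  sendLeaky(leaky, "m");
  sendFixed(fixed, "m");
}
console.log(leaky.listenerCount("error")); // 5
console.log(fixed.listenerCount("error")); // 0
```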
'use strict';
require('https').globalAgent.keepAlive = true;
const http = require('http');
const { metricScope } = require('aws-embedded-metrics');
const POLL_TIME = 100;
const serverHost = process.env.ServerHost;
const serverPort = process.env.ServerPort;
const sendRequest = metricScope((metrics) => async () => {
metrics.setProperty('Role', 'Client');
metrics.putMetric('ReqCount', 1);
return new Promise((resolve, reject) => {
const start = process.hrtime.bigint();
const handleRes = (res) => {
res.on('data', () => {});
res.once('error', (error) => {
metrics.putMetric('ResError', 1);
metrics.setProperty('StatusCode', res.statusCode);
metrics.setProperty('ResErrorData', error);
reject(error);
});
res.on('end', () => {
const end = process.hrtime.bigint();
const elapsedMs = Number(end - start) * 1e-6;
const elapsedMsRounded = Math.round(elapsedMs * 100) / 100; // Round to 2 decimal places.
metrics.putMetric('ResponseTime', elapsedMsRounded, 'Milliseconds');
metrics.putMetric('ResSuccess', 1);
metrics.setProperty('StatusCode', res.statusCode);
resolve();
});
};
const baseReqOptions = {
method: 'GET',
host: serverHost,
port: serverPort
};
metrics.setProperty('ReqOptions', baseReqOptions);
const req = http.request({ ...baseReqOptions }, handleRes);
req.once('error', (error) => {
metrics.putMetric('ReqError', 1);
metrics.setProperty('ReqErrorData', error);
reject(error);
});
req.end();
});
});
async function main() {
setInterval(() => {
sendRequest().catch((error) => console.error(error));
}, POLL_TIME);
}
exports.main = main;
if (require.main === module) {
main().catch((error) => {
console.error(error);
process.exit(1);
});
process.on('SIGTERM', () => process.exit(0));
}
With the raw embedded metrics format, a single structured log can produce multiple metrics with different dimensions. Here's an example:
{
"_aws": {
"Timestamp": 1574109732004,
"CloudWatchMetrics": [
{
"Namespace": "lambda-function-metrics",
"Dimensions": [["dimension1"]],
"Metrics": [
{
"Name": "metric1"
}
]
},
{
"Namespace": "lambda-function-metrics",
"Dimensions": [["dimension2"]],
"Metrics": [
{
"Name": "metric2"
}
]
}
]
},
"dimension1": "value1",
"dimension2": "value2",
"metric1": 100,
"metric2": 200
}
Currently, this is not possible with the EMF SDK for Node.js. I would need to create a new MetricLogger for each distinct list of dimensions. This issue is a feature request to add an interface to the SDK that enables this kind of configuration.
The reason I'd prefer to produce these metrics from a single log line, instead of multiple, is so that I can create just one structured log per "unit of work," as described in this great AWS Builder's Library article.
When I try to use this library in a Docker container running Node 14, I hit the following error:
events.js:291
throw er; // Unhandled 'error' event
^
Error [ERR_SOCKET_CLOSED]: Socket is closed
at Socket._writeGeneric (net.js:774:8)
at Socket._write (net.js:796:8)
at writeOrBuffer (_stream_writable.js:352:12)
at Socket.Writable.write (_stream_writable.js:303:10)
at /usr/local/src/prauthoxy-platform/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js:57:56
at new Promise (<anonymous>)
at TcpClient.<anonymous> (/usr/local/src/prauthoxy-platform/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js:52:19)
at Generator.next (<anonymous>)
at fulfilled (/usr/local/src/prauthoxy-platform/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js:18:58)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
Emitted 'error' event on Socket instance at:
at emitErrorNT (internal/streams/destroy.js:106:8)
at errorOrDestroy (internal/streams/destroy.js:167:7)
at onwriteError (_stream_writable.js:391:3)
at processTicksAndRejections (internal/process/task_queues.js:82:21) {
code: 'ERR_SOCKET_CLOSED'
}
The exact same code usage works just fine on Node 12.
This issue is to receive feedback on whether or not users want LogGroup as a default dimension on their metrics. This was originally intended to enable deep-linking from metrics to the EMF events, but we no longer believe this is the correct approach to creating this linkage. This is a breaking change, so we want to hear your feedback.
aws-embedded-metrics-node/src/logger/MetricsLogger.ts
Lines 142 to 147 in 2ec8a84
Problems
Config
These are the container definitions:
ContainerDefinitions:
  # [...]
  - Name: app
    Image: redacted
    LogConfiguration:
      LogDriver: awslogs
      Options:
        awslogs-group: !Ref ContainerLogGroup
        awslogs-region: !Ref AWS::Region
        awslogs-stream-prefix: app
    Environment:
      - Name: AWS_EMF_SERVICE_NAME
        Value: !Sub ${AWS::StackName}-app
      - Name: AWS_EMF_SERVICE_TYPE
        Value: "NodeJS-API"
      - Name: AWS_EMF_LOG_GROUP_NAME
        Value: !Ref ContainerLogGroup
      - Name: AWS_EMF_LOG_STREAM_NAME
        Value: metrics
      - Name: AWS_EMF_NAMESPACE
        Value: !Sub ${AWS::StackName}
      - Name: AWS_EMF_ENABLE_DEBUG_LOGGING
        Value: true
  - Name: agent
    Image: amazon/cloudwatch-agent:latest
    LogConfiguration:
      LogDriver: awslogs
      Options:
        awslogs-group: !Ref ContainerLogGroup
        awslogs-region: !Ref AWS::Region
        awslogs-stream-prefix: agent
    Secrets:
      - Name: CW_CONFIG_CONTENT
        ValueFrom: !Ref CloudWatchAgentConfigArn
This is the agent config:
{
"logs": {
"metrics_collected": {
"emf": {}
}
}
}
This is what the agent sidecar logs:
// Log stream: agent/agent/227b1b1f66744e318edb2d5e9bb57e2d
2020/11/09 16:31:03 I! 2020/11/09 16:31:03 E! ec2metadata is not available
--
2020/11/09 16:31:03 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
I! Detected the instance is ECS
2020/11/09 16:31:03 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
Cannot access /etc/cwagentconfig: lstat /etc/cwagentconfig: no such file or directory
2020/11/09 16:31:03 unable to scan config dir /etc/cwagentconfig with error: lstat /etc/cwagentconfig: no such file or directory
2020/11/09 16:31:03 Reading json config from from environment variable CW_CONFIG_CONTENT.
Valid Json input schema.
I! detect region from ecs
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded
2020/11/09 16:31:03 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2020-11-09T16:31:03Z I! Starting AmazonCloudWatchAgent 1.247346.0
2020-11-09T16:31:03Z I! Loaded inputs: socket_listener socket_listener
2020-11-09T16:31:03Z I! Loaded aggregators:
2020-11-09T16:31:03Z I! Loaded processors:
2020-11-09T16:31:03Z I! Loaded outputs: cloudwatchlogs
2020-11-09T16:31:03Z I! Tags enabled:
2020-11-09T16:31:03Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2020-11-09T16:31:03Z I! [inputs.socket_listener] Listening on udp://[::]:25888
2020-11-09T16:31:03Z I! [inputs.socket_listener] Listening on tcp://[::]:25888
2020-11-09T16:31:03Z I! [logagent] starting
2020-11-09T16:31:03Z I! [logagent] found plugin cloudwatchlogs is a log backend
This is what the app container logs:
// Log stream: app/app/227b1b1f66744e318edb2d5e9bb57e2d
Received default dimensions {
LogGroup: 'redactedLogGroupName',
ServiceName: 'my-service-name',
ServiceType: 'NodeJS-API'
}
Sending {} events to socket. 1
opening connection with socket in state: closed
TcpClient connected. { host: '0.0.0.0', port: 25888, protocol: 'tcp:' }
Write succeeded
However, the metric gets logged to another log stream (not the one in AWS_EMF_LOG_STREAM_NAME) and it does not include the AWS_EMF_SERVICE_TYPE or AWS_EMF_SERVICE_NAME.
// Log stream: arn_aws_ecs_eu-north-1_redactedAccountId_task/redactedClusterName/227b1b1f66744e318edb2d5e9bb57e2d
{
"Endpoint": "POST /authentication/v5/login/email",
"Method": "POST",
"Path": "/authentication/v5/login/email",
"StatusCode": 200,
"IP": "redactedIp",
"UserAgent": "Amazon CloudFront",
"containerId": "ip-redactedIp.eu-north-1.compute.internal",
"createdAt": "2020-11-09T16:31:04.591709491Z",
"startedAt": "2020-11-09T16:31:05.226439Z",
"image": "redactedAccountId.dkr.ecr.eu-north-1.amazonaws.com/redactedImage",
"cluster": "arn:aws:ecs:eu-north-1:redactedAccountId:cluster/redactedClusterName",
"taskArn": "arn:aws:ecs:eu-north-1:redactedAccountId:task/redactedClusterName/227b1b1f66744e318edb2d5e9bb57e2d",
"_aws": {
"Timestamp": 1604996779353,
"LogGroupName": "redactedLogGroupName",
"CloudWatchMetrics": [
{
"Dimensions": [
[
"Endpoint"
]
],
"Metrics": [
{
"Name": "Latency",
"Unit": "Milliseconds"
},
{
"Name": "Success",
"Unit": "Count"
}
],
"Namespace": "redactedNamespace"
}
]
},
"Latency": 1415.994263,
"Success": 1
}
Currently, if putMetric is called > 100 times, it will fail silently on the backend. We should automatically flush client-side if this limit is hit.
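One way to think about the client-side behaviour is to split the pending metrics into batches that each fit within the limit, emitting one event per batch. A minimal sketch: `chunkMetrics` is a hypothetical helper, and the limit of 100 is taken from the text above.

```javascript
// The limit of 100 metrics per event is taken from the text above.
const MAX_METRICS_PER_EVENT = 100;

// Hypothetical helper: split metric entries into batches that each fit in one event.
function chunkMetrics(entries, limit = MAX_METRICS_PER_EVENT) {
  const batches = [];
  for (let i = 0; i < entries.length; i += limit) {
    batches.push(entries.slice(i, i + limit));
  }
  return batches;
}

const entries = Array.from({ length: 250 }, (_, i) => [`metric${i}`, i]);
console.log(chunkMetrics(entries).map(batch => batch.length)); // [ 100, 100, 50 ]
```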
I noticed many messages like the following on a production ECS system after upgrading from Node 14 to Node 16.15.0.
Roughly one third of all writes fail with this message.
{ "message": "Cannot call write after a stream was destroyed", "name": "Error", "stack": "Error [ERR_STREAM_DESTROYED]: Cannot call write after a stream was destroyed\n at new NodeError (node:internal/errors:372:5)\n at _write (node:internal/streams/writable:321:11)\n at Socket.Writable.write (node:internal/streams/writable:334:10)\n at /thingregistry/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js:58:56\n at new Promise (<anonymous>)\n at TcpClient.<anonymous> (/thingregistry/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js:53:19)\n at Generator.next (<anonymous>)\n at fulfilled (/thingregistry/node_modules/aws-embedded-metrics/lib/sinks/connections/TcpClient.js:19:58)\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)", "code": "ERR_STREAM_DESTROYED" }
I could not find any obvious issue in the code.
The createCopyWithContext function does not copy the shouldUseDefaultDimensions property to the new context, so the default dimensions are included when they should not be.
Either the default dimensions should be copied to the new context depending on the value of shouldUseDefaultDimensions:
public createCopyWithContext(): MetricsContext {
  return new MetricsContext(
    this.namespace,
    Object.assign({}, this.properties),
    Object.assign([], this.dimensions),
    // Only carry the default dimensions over when they are enabled.
    this.shouldUseDefaultDimensions ? this.defaultDimensions : [],
  );
}
Or shouldUseDefaultDimensions should also be copied into the new context:
public createCopyWithContext(): MetricsContext {
  return new MetricsContext(
    this.namespace,
    Object.assign({}, this.properties),
    Object.assign([], this.dimensions),
    this.defaultDimensions,
    // Pass the flag through so the copy behaves like the original.
    this.shouldUseDefaultDimensions,
  );
}
I noticed that an undocumented change was released as a new major version on npm in the last 24 hours.
dist
.tarball: https://bahnhub.tech.rz.db.de:443/artifactory/api/npm/default-npm-3rdparty/aws-embedded-metrics/-/aws-embedded-metrics-3.0.0.tgz
.shasum: 0ecbd9e1411195ceef289109853ea4ac9e71626d
.integrity: sha512-4SsOynlnrdT9C8NzQFLqyIz/5g/sYPYsKF1yh+VcnIIMliXHt1CHZS4Gw0Gd0IDZLPS+sWrL6jyAzPvsaog7sg==
Security-wise the change does not look critical, but a release containing a change that is not mentioned in the release notes seems like a red flag. https://github.com/awslabs/aws-embedded-metrics-node/releases
I'd like to block the SDK from trying to communicate with a CloudWatch Agent during certain unit tests. I don't see a documented way to do this right now.
Ideally, I'd also be able to run assertions against the information that the SDK would have logged. (Did the dimensions, properties, and metrics logged during this test match my expectations?) So one possible approach to this problem could be to "report" logs to some in-memory object that can be inspected, instead of the CloudWatch Agent. At a minimum, though, I'd like to be able to turn off Agent communication, instead of having to mock the module's API whenever I want to run my code without an Agent.
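To illustrate the in-memory idea, here is a rough sketch. The Sink and RecordedEvent shapes and the InMemorySink class are assumptions made up for this example. The library's real sink interface may look different, and wiring a custom sink into the logger is not shown.

```typescript
// Hypothetical shapes for illustration only; not the library's real types.
interface RecordedEvent {
  namespace: string;
  dimensions: Record<string, string>;
  metrics: Record<string, number[]>;
  properties: Record<string, unknown>;
}

interface Sink {
  accept(event: RecordedEvent): Promise<void>;
}

class InMemorySink implements Sink {
  readonly events: RecordedEvent[] = [];

  async accept(event: RecordedEvent): Promise<void> {
    // Store a deep copy so later mutation by the caller
    // cannot change what was recorded.
    this.events.push(JSON.parse(JSON.stringify(event)));
  }

  clear(): void {
    this.events.length = 0;
  }
}
```

A test could then flush through such a sink and assert on sink.events, with no socket connection to an Agent involved.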
I've found that it's possible to accidentally call putMetric with a string value. The value is saved and output to CloudWatch as a quoted string, which of course doesn't work as a metric.
The TypeScript code checks at compile time that putMetric is called with a number, but this doesn't help if you're writing JavaScript, and it doesn't help at runtime either.
The problem is subtle: if you accidentally push a numeric metric that is typed as a string, it just ends up quoted in the generated JSON, which is hard to spot until it doesn't work, e.g. putMetric('someKey', '200').
The library should explicitly convert all values it's sent with Number() so that string values sent by accident are handled correctly whenever they can easily be converted to a number.
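A minimal sketch of the kind of runtime check this asks for. toMetricValue is a hypothetical helper, not something the library exposes. It accepts numbers and numeric strings and rejects everything else, because a bare Number() call would quietly turn '' or null into 0.

```typescript
// Hypothetical helper; not part of the aws-embedded-metrics API.
function toMetricValue(value: unknown): number {
  // Finite numbers pass through unchanged.
  if (typeof value === "number" && Number.isFinite(value)) {
    return value;
  }
  // Non-empty strings are converted; "" is rejected because Number("") is 0.
  if (typeof value === "string" && value.trim() !== "") {
    const parsed = Number(value);
    if (Number.isFinite(parsed)) {
      return parsed;
    }
  }
  throw new TypeError(`Metric value is not numeric: ${String(value)}`);
}
```

Throwing on unconvertible input is one possible design choice. Logging a warning and dropping the metric would be another.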
START RequestId: 6210176a-a746-425b-843b-ac8e3bba7eb4 Version: $LATEST
2021-10-27T14:14:17.648Z 6210176a-a746-425b-843b-ac8e3bba7eb4 INFO {"level":"info","msg":"Starting request","context":{"callbackWaitsForEmptyEventLoop":true,"functionVersion":"$LATEST","functionName":"HelloWorldFunction","memoryLimitInMB":"128","logGroupName":"aws/lambda/HelloWorldFunction","logStreamName":"$LATEST","invokedFunctionArn":"","awsRequestId":"6210176a-a746-425b-843b-ac8e3bba7eb4"}}
2021-10-27T14:14:17.652Z 6210176a-a746-425b-843b-ac8e3bba7eb4 INFO https://httpstat.us/200?sleep=0
2021-10-27T14:14:18.037Z 6210176a-a746-425b-843b-ac8e3bba7eb4 INFO {"level":"info","msg":"called","urlWithParams":"https://httpstat.us/200?sleep=0","duration":383}
} ServiceType: 'AWS::Lambda::Function'6a-a746-425b-843b-ac8e3bba7eb4 INFO Received default dimensions {
2021-10-27T14:14:18.044Z 6210176a-a746-425b-843b-ac8e3bba7eb4 INFO {"url":"https://httpstat.us/200?sleep=0","status":"200","executionEnvironment":"AWS_Lambda_nodejs14.x","memorySize":"128","functionVersion":"$LATEST","logStreamId":"$LATEST","_aws":{"Timestamp":1635344057646,"CloudWatchMetrics":[{"Dimensions":[["url","status"]],"Metrics":[{"Name":"Size","Unit":"Bytes"},{"Name":"Success","Unit":"Milliseconds"}],"Namespace":"yt-emf1"}]},"Size":26,"Success":383}
END RequestId: 6210176a-a746-425b-843b-ac8e3bba7eb4
REPORT RequestId: 6210176a-a746-425b-843b-ac8e3bba7eb4 Init Duration: 0.21 ms Duration: 755.33 ms Billed Duration: 800 ms Memory Size: 128 MB Max Memory Used: 128 MB
{"statusCode":200,"body":"{\"message\":{\"code\":200,\"description\":\"OK\"}}"
I do not understand why size is missing from my namespace. Any ideas? Thank you in advance
Documentation for flush()
flush()
Flushes the current MetricsContext to the configured sink and resets all properties, dimensions and metric values. The namespace and default dimensions will be preserved across flushes.
In the implementation, however, the namespace is set on the context, and at the end of flush the context is reset to empty, so the namespace is not preserved.
Either the documentation or the behavior of flush should be updated.
Canary currently runs on Node.js v10 but needs to be running on v16.
Hi,
This is more of a question than a bug.
Context:
We have added embedded metrics to our Lambda stack, which uses serverless and serverless-offline to run locally.
Issue:
When run locally (without AWS_LAMBDA_FUNCTION_NAME set) it defaults to DefaultEnvironment and kills the serverless-offline thread when .flush() is called, because of an unhandled promise rejection on another thread. It seems that the default environment expects an AgentSink to be present. Is that expected?
Steps to reproduce:
const defaultEnv = new DefaultEnvironment();
const defaultLogger = createLogger(() => Promise.resolve(defaultEnv));
defaultLogger.flush()
When running in a test environment (which is similar to a setup on EC2 with a CloudWatch Agent), the whole Node.js application crashes when it is unable to push metrics to the Agent, and I'm unable to catch the error.
Steps to reproduce
Expected behavior
Actual behavior
Root Cause
I tried to narrow down the problem and it seems that
System Overview
I'm exploring using embedded metrics and would like to have some metrics sourced from a CLI tool that we have built for our dev community (deployment metrics, and cli tool usage etc).
The default AgentSink is presumably auto-resolved on EC2 instances, and the library detects a Lambda environment to write to STDOUT. However, is it feasible to submit metrics from outside AWS services?
I'm able to submit metrics using the aws-cli, but with this library I keep getting connection refused, for obvious reasons: TCP Client received error Error: connect ECONNREFUSED 0.0.0.0:25888
For context I'm on Direct Connect.
Would there need to be a custom Sink that uses the put-log-events API to achieve this? Or is there a tcp endpoint that I could configure?
I'm using the following code to have only one dimension.
metrics.setDimensions();
In some cases for my Lambda there will not be any metrics. However, I have noticed in my log that an empty metric is published. For example:
{"executionEnvironment":"AWS_Lambda_nodejs12.x","memorySize":"1792","functionVersion":"$LATEST","logStreamId":"2020/07/12/[$LATEST]9a0092bb2cf76b6b90c46bf429a32aef","traceId":"Root=1-dc99d00f-c079a84d433534434534ef0d;Parent=91ed514f1e5c03b2;Sampled=1","_aws":{"Timestamp":1594571647673,"CloudWatchMetrics":[{"Dimensions":[],"Metrics":[],"Namespace":"MyApp"}]}}
Should this library check for dimension/metric presence before sending the output?
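A possible shape for such a check, as a sketch only. The EmfEvent and MetricDirective types mirror the JSON shown above, while hasMetrics and serializeIfNotEmpty are hypothetical helpers and not library functions.

```typescript
// Types modelled on the EMF JSON in this issue; helper names are hypothetical.
interface MetricDirective {
  Dimensions: string[][];
  Metrics: { Name: string; Unit: string }[];
  Namespace: string;
}

interface EmfEvent {
  _aws: { Timestamp: number; CloudWatchMetrics: MetricDirective[] };
  [property: string]: unknown;
}

// True when at least one directive declares at least one metric.
function hasMetrics(event: EmfEvent): boolean {
  return event._aws.CloudWatchMetrics.some((d) => d.Metrics.length > 0);
}

// Returns the serialized event, or undefined when there is nothing to publish.
function serializeIfNotEmpty(event: EmfEvent): string | undefined {
  return hasMetrics(event) ? JSON.stringify(event) : undefined;
}
```

Whether an event that carries only properties should still be written as a plain log line is a separate design question that this sketch does not settle.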
See #32. Prior to releasing new versions, we need to run an extended bake test to validate there are no performance regressions.
Currently, when I set AWS_EMF_ENVIRONMENT=Local and run the app, the agent logs to stdout, which is the expected behaviour.
For small logs this is fine, but in bigger projects the local sandbox terminal is flooded with huge serialised JSON objects.
As a developer, I'd like to propose a new configuration that defines which properties of the JSON object can be logged by the agent, so that I can dynamically customise the amount of detail printed to stdout. This functionality should only be available in development (i.e. when AWS_EMF_ENVIRONMENT is set to Local).
Let's say we have a log structure like the following:
{
"PageType": "player",
"event": {
"id": "9bac0a47-1623-410d-bcd2-f03aa1283669",
"source": "server",
"trigger": "user",
"type": "page-request"
},
"logTrace": [
"[UpstreamName.apifeed]: Empty data, returning with graceful degradation."
],
"processPid": 872,
"requestPath": "/path/to/the/resource",
"requestHeaders": {
},
"requestHeadersList": [
],
"hasCookie": false,
"cookieLength": 0,
"cookieList": [],
"responseStatus": 200,
"responseHeaders": {
},
"upstreams": [
{
"name": "UpstreamName",
"endpoint": "apifeed",
"attempts": [
{
"cache": {
"hit": true,
"miss": false,
"stale": false,
"error": false,
"timeout": false,
"revalidate": false,
"revalidateError": false
},
"response": {
"headers": {
},
"body": {
},
"status": 200,
"time": 86
},
"id": 1
}
],
"attemptCount": 1,
"retryCount": 0,
"requestCount": 0,
"requestErrorCount": 0,
"response5xxCount": 0,
"response4xxCount": 0,
"response3xxCount": 0,
"response2xxCount": 0,
"response1xxCount": 0,
"responseInvalidCount": 0,
"cacheAudit": [
[
"hit"
]
],
"cacheHitCount": 1,
"cacheMissCount": 0,
"cacheStaleCount": 0,
"cacheErrorCount": 0,
"cacheTimeoutCount": 0,
"cacheRevalidateCount": 0,
"cacheRevalidateErrorCount": 0,
"responseTime": 10
}
],
"imageId": "ami-someid",
"instanceId": "i-someid",
"instanceType": "some.instancetype",
"privateIP": "127.0.0.1",
"availabilityZone": "some-aws-region",
"_aws": {
"Timestamp": 2693848470655,
"LogGroupName": "/example/live/player/app",
"CloudWatchMetrics": [
{
"Dimensions": [
[
"PageType"
]
],
"Metrics": [
{
"Name": "RequestCount",
"Unit": "Count"
},
{
"Name": "ResponseTime",
"Unit": "Milliseconds"
},
{
"Name": "ErrorCount",
"Unit": "Count"
},
{
"Name": "PageNotFoundCount",
"Unit": "Count"
}
],
"Namespace": "/example/live/player/app"
}
]
},
"RequestCount": 1,
"ResponseTime": 211,
"ErrorCount": 0,
"PageNotFoundCount": 0
}
and I'm running the app locally. This object is clearly large to print, and since it is serialised and logged every time the user performs a new request, you can imagine how busy the terminal will look.
[UPDATE] the following section has been "quoted" to highlight that an amendment of the following requests has been added in the comments. The section remains here in the description for visibility.
Let's say I only want to log some textual information and not everything else. Among all these properties, which are useful in production but noisy in development, there is an array called logTrace. I'd like to be able to do something similar to:
// in process
const { Configuration } = require("aws-embedded-metrics");
Configuration.somePropertyName = ['logTrace'];
// environment
AWS_EMF_SOME_PROPERTY_NAME="logTrace"
and have the terminal print something like the following:
{ "logTrace": [ "[UpstreamName.apifeed]: Empty data, returning with graceful degradation." ] }
Details
The new configuration should have the following requirements:
- We can define multiple properties (comma separated for the environment variable, or using an array for the in-code variable)
- We can define nested properties a-la lodash#get (e.g. "event.type") by using the dot-notation.
- The agent will print a flattened object where all properties appear at the root and the key is the name of the property. If a selected key is nested (e.g. event.type), the property will use the dot-notation as its key name.
- The new configuration is only active when AWS_EMF_ENVIRONMENT=Local so that it only applies in development.
- If only one property is selected and its value is a string, only log the string. This way, a property that you already use for plain log messages will print much as it did before.
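The requirements above could be sketched roughly as follows. parsePaths and pickForLocalLog are hypothetical names for illustration. Nothing like this exists in the library today, and the check for the Local environment is left out.

```typescript
// Hypothetical helpers illustrating the proposal; not part of the library.

// Split a comma-separated environment value into property paths.
function parsePaths(envValue: string | undefined): string[] {
  return (envValue ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Build the reduced object that would be printed locally.
function pickForLocalLog(event: Record<string, unknown>, paths: string[]): unknown {
  const out: Record<string, unknown> = {};
  for (const path of paths) {
    // Walk the dot-notation path one key at a time.
    let current: unknown = event;
    for (const key of path.split(".")) {
      if (current === null || typeof current !== "object") {
        current = undefined;
        break;
      }
      current = (current as Record<string, unknown>)[key];
    }
    // Flattened output: the full path becomes the key at the root.
    if (current !== undefined) out[path] = current;
  }
  // A single selected string property is logged as the bare string.
  if (paths.length === 1 && typeof out[paths[0]] === "string") {
    return out[paths[0]];
  }
  return out;
}
```

For example, pickForLocalLog(event, parsePaths("logTrace,event.type")) would keep only those two entries, with "event.type" as a root-level key.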
I'm not particularly opinionated on the name of the config. I can give a couple of examples, but I'm open to other suggestions: AWS_EMF_ALLOWED_PROPERTIES, AWS_EMF_FILTERED_PROPERTIES, AWS_EMF_LOCAL_LOG_STRUCTURE, AWS_EMF_LOCAL_LOG_PROPERTIES (or some permutation). The last two names suggest what they are about and that they only work locally.
I have a use case where I want to enable high resolution metrics in CloudWatch. This is done by setting the 'StorageResolution' parameter in the put-metric-data API to 1 (high resolution) or 60 (standard resolution). Currently putMetric only accepts the parameters (metricName, value, unit), so there is no way to publish high resolution custom metrics.
There is currently no way to set MetricsContext.meta.Timestamp via metricScope.
Assuming the rules for PutMetricData also apply, it is valid to submit metrics with a timestamp that is not equal to new Date().
I am currently working around this by reimplementing metricScope(), instantiating MetricsContext and MetricsLogger directly, and passing MetricsContext to the given handler, so they can do context.meta.Timestamp = ...
For my purposes, passing a single Date to metricScope would suffice.
I'd be happy to put together a PR if you think setting the Timestamp should be possible.
Are there plans to publish an ESM and tree-shaken version of the package?
EMF is ~300kb without compression, with 200kb just for the new validator dependency, most of which is not being used.
With SDK V3 being supported in Node18 on Lambda, EMF is now 90% of my bundle.
https://kubernetes.io/docs/concepts/services-networking/service/#discovering-services
Example environment variables for an eks pod named eks-demo.
KUBERNETES_SERVICE_PORT: '443',
KUBERNETES_PORT: 'tcp://10.100.0.1:443',
EKS_DEMO_PORT_80_TCP_PORT: '80',
NODE_VERSION: '10.16.0',
HOSTNAME: 'eks-demo-55f57f865b-l7tcs',
EKS_DEMO_PORT_80_TCP_PROTO: 'tcp',
YARN_VERSION: '1.16.0',
SHLVL: '1',
HOME: '/root',
EKS_DEMO_PORT_80_TCP: 'tcp://10.100.122.110:80',
AWS_EMF_ENABLE_DEBUG_LOGGING: 'true',
KUBERNETES_PORT_443_TCP_ADDR: '10.100.0.1',
PATH:
'/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
AWS_EMF_AGENT_ENDPOINT: 'tcp://127.0.0.1:25888',
KUBERNETES_PORT_443_TCP_PORT: '443',
KUBERNETES_PORT_443_TCP_PROTO: 'tcp',
EKS_DEMO_SERVICE_HOST: '10.100.122.110',
KUBERNETES_PORT_443_TCP: 'tcp://10.100.0.1:443',
KUBERNETES_SERVICE_PORT_HTTPS: '443',
KUBERNETES_SERVICE_HOST: '10.100.0.1',
EKS_DEMO_SERVICE_PORT: '80',
EKS_DEMO_PORT: 'tcp://10.100.122.110:80',
PWD: '/app/src'