Debugging serverless apps is different and difficult.
If you have not properly prepared, debugging some requests may be impossible with your current level of instrumentation and tooling.
So what is the best way to debug serverless apps and serverless requests?
This blog post describes how we debug the serverless backend of our SenseDeep Serverless Developer Studio service and what tools we use with our NodeJS serverless apps.
Some of the suggestions here may not be suitable for all sites, but I hope they give you ideas to improve your ability to debug your serverless apps.
But first, some background.
Serverless Computing Changes Things
The way enterprises design, debug, and ship applications changed forever when serverless computing arrived on the scene. Serverless allows developers to build and ship much faster, concentrating on coding rather than maintenance, auto-scaling, and server provisioning.
Serverless is now mainstream and it offers many compelling advantages, but debugging serverless apps is still a tough problem as the required tools have not kept pace.
What is different about serverless debugging?
Debugging a server-based app (monolith) is well understood and assisted by a suite of refined tools that have been created over decades of evolution. Debugging can be performed using IDEs either locally or remotely over SSH and other protocols. Importantly, server state is typically persisted after requests to permit live or after-the-fact inspection and debugging.
Serverless is not so lucky
Serverless is different. It is often more complex, with multiple interacting, loosely coupled components. Instead of a monolith, it is more like a spider's web, which makes it harder to follow a single request, with no ability to debug live or set breakpoints to intercept code execution.
Debugging serverless via remote shell or execution is not possible. Serverless requests are ephemeral and request state is not available after execution. Once a request has completed, there is no state left behind to examine.
So when a serverless request errors, it often fails silently or any record is buried in a mountain of log data in CloudWatch. You will not be proactively notified and if you are informed by the end user, finding the failed request is a "needle in a haystack" type of problem.
Furthermore, comprehensive local emulation of cloud environments is either not possible or has severe limitations. As cloud services become more evolved, local emulation is becoming increasingly limited in scope and application.
So we need a different technique to debug serverless compared to server-based app debugging.
Serverless Debugging Technique
The primary means of debugging serverless apps is via detailed, intelligent request and state logging, paired with log management to correlate log data and quickly query to locate events of interest.
Whether your code is under development, is being tested as part of unit or integration tests, or in production, detailed logging is the foundation for you to see and understand what is actually happening in your code for each request.
Detailed, intelligent logging is the foundation for serverless observability.
Observability is the ability to understand your system's unknown unknowns: not just expected errors, but also unforeseeable conditions. It is not enough to log simple, plain-text error messages.
Our understanding of all the possible failure modes of serverless apps is limited by the extent of our knowledge today. Serverless apps can fail in many surprising ways that we may not yet understand and cannot foresee.
So to meet the debugging needs of tomorrow, you need to log much more information and context than you might realize. Your app must emit sufficient state and information to permit debugging a wide range of unexpected conditions in the future without having to stop and redeploy code.
Fortunately, this is easy to do.
How to Log? Extensively.
Often developers add diagnostic logging as an afterthought, via sprinkled console.log calls that emit simple messages. This is almost useless in achieving true Observability. Instead, what is required is the ability to emit log messages in a structured way, with extra context that captures detailed request state.
Every time a developer uses console.log, an angel cries in heaven.
In addition to the log message, log events should emit additional context information. To achieve this, the output format must be structured. JSON is an ideal format as most log viewers can effectively parse JSON and understand the context emitted.
This would emit something like:
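As a plain-Node sketch of the idea (illustrative only; the libraries discussed below do this for you, and the field names here are assumptions):

```javascript
// Minimal structured logger sketch: emit a single JSON line containing
// the message plus any context the caller supplies.
function logEvent(level, message, context = {}) {
    const event = {
        timestamp: new Date().toISOString(),
        level,
        message,
        ...context,
    }
    console.log(JSON.stringify(event))
    return event
}

// Log an error together with the request context a future debugger will need
logEvent('error', 'Cannot authorize user', {
    requestId: '1f2e3d4c',      // hypothetical correlation ID
    userId: 'user-42',          // hypothetical user
    path: '/api/login',
})
```

Each log event lands as one JSON line, such as `{"timestamp":"...","level":"error","message":"Cannot authorize user","requestId":"1f2e3d4c",...}`, which a log viewer can parse and query on any field.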
There are many log libraries that are capable of emitting structured log context data. For SenseDeep, we use the SenseLogs library which is an exceptionally fast logging library designed for serverless. It has a flexible, simple syntax that makes adding detailed log events easy.
SenseLogs emits log messages with context and stack traces in JSON. It can also be dynamically controlled as to which log messages are emitted at runtime without having to redeploy code.
Here is a mock Lambda which initializes SenseLogs and emits various sample log messages:
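The shape is sketched below with a minimal, self-contained stand-in that mirrors the call pattern (info/error plus custom channels); consult the SenseLogs documentation for the real API details:

```javascript
// SenseLogs-style logger stand-in: log events go to named channels and
// are emitted as JSON only when the channel is enabled.
class MiniLogs {
    constructor(filter = 'fatal,error,info') {
        this.channels = new Set(filter.split(','))
        this.events = []
    }
    emit(channel, message, context = {}) {
        if (!this.channels.has(channel)) return null    // channel disabled
        const event = {channel, message, ...context}
        console.log(JSON.stringify(event))
        this.events.push(event)
        return event
    }
    info(message, context) { return this.emit('info', message, context) }
    error(message, context) { return this.emit('error', message, context) }
}

const log = new MiniLogs()

// Hypothetical Lambda handler emitting various sample log messages
async function handler(event = {}) {
    log.info('Request received', {path: event.path})
    if (!event.user) {
        log.error('Missing user', {event})
        return {statusCode: 401}
    }
    log.emit('trace', 'User validated', {user: event.user})  // custom channel
    return {statusCode: 200}
}
```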
SenseLogs organizes log messages into channels such as error. These channels can be enabled or disabled via Lambda environment variables. In this way, SenseLogs can dynamically scale the volume of log data up or down as required.
Catch All Errors
It is very important to catch all errors and ensure they are reported. Lambdas should catch all exceptions and not rely on the default Lambda and language runtimes to catch errors. Employing a top-level catch can ensure that the exception report includes request state and correlation IDs.
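A minimal sketch of this pattern (the error shape, field names and business logic here are illustrative):

```javascript
// Top-level catch: every exception is logged with the request state and
// correlation ID before returning a clean response, rather than relying
// on the runtime's default unhandled-error report.
async function handler(event, context = {}) {
    try {
        return await processRequest(event)
    } catch (err) {
        console.log(JSON.stringify({
            channel: 'error',
            message: err.message,
            stack: err.stack,
            requestId: context.awsRequestId,    // Lambda correlation ID
            event,                              // request state for debugging
        }))
        return {statusCode: 500, body: JSON.stringify({error: 'Internal error'})}
    }
}

// Hypothetical business logic that may throw
async function processRequest(event) {
    if (!event.body) {
        throw new Error('Missing request body')
    }
    return {statusCode: 200, body: event.body}
}
```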
All log data has a cost: extensive logging can significantly impact the performance of your Lambdas, and excessive log storage can cause rude billing surprises. Logging extensive state for all requests is typically prohibitively expensive.
A better approach is to enable a solid baseline of log data and to scale up and/or focus logging when required without redeploying code. This is called Dynamic Logging and it is important to be able to do this without the risk of redeploying modified code.
SenseLogs implements Dynamic Logging by listening to a Lambda LOG_FILTER environment variable. This variable may be set to a list of log channels to enable. By default, it is set to enable a standard set of channels.
If the Lambda
LOG_FILTER variable is modified, the next and subsequent Lambda invocations will use
the adjusted channel filter settings. In this manner, you can scale up and down
the volume and focus of your log messages without redeploying code.
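The mechanism can be sketched as follows (a stand-in for what SenseLogs does internally; the default channel list shown is an assumption):

```javascript
// Dynamic logging sketch: the channel filter is rebuilt from the
// LOG_FILTER environment variable on each call, so changing the variable
// takes effect on the next invocation without redeploying code.
function currentChannels() {
    const filter = process.env.LOG_FILTER || 'fatal,error,info'  // assumed default
    return new Set(filter.split(',').map(c => c.trim()))
}

function emit(channel, message, context = {}) {
    if (!currentChannels().has(channel)) return null    // channel disabled
    const event = {channel, message, ...context}
    console.log(JSON.stringify(event))
    return event
}
```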
You can use custom channels via log.emit to provide log coverage for specific paths in your code and then enable those channels on demand.
SenseLogs has two other environment variables which give more control over the log messages that are emitted.
LOG_OVERRIDE is a convenient way to temporarily boost log messages that will automatically revert to the base level of logging when the duration completes. LOG_OVERRIDE is set to a Unix epoch timestamp (seconds since Jan 1 1970) when the override will expire, followed by a list of channels.
LOG_SAMPLE is set to a percentage of requests to sample followed by a list of channels to add to the list of LOG_FILTER for sampled requests.
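The behavior of these two variables can be sketched as follows. The "expire:channels" and "percent:channels" value formats used here are assumptions for illustration; check the SenseLogs documentation for the real syntax:

```javascript
// Sketch of LOG_OVERRIDE and LOG_SAMPLE handling: compute the extra
// channels to enable for this request on top of LOG_FILTER.
function extraChannels(now = Date.now() / 1000) {
    const channels = []
    const override = process.env.LOG_OVERRIDE       // e.g. "1767225600:debug"
    if (override) {
        const [expire, list] = override.split(':')
        if (now < Number(expire)) {                 // override still active?
            channels.push(...list.split(','))
        }
    }
    const sample = process.env.LOG_SAMPLE           // e.g. "5:complete"
    if (sample) {
        const [percent, list] = sample.split(':')
        if (Math.random() * 100 < Number(percent)) { // sample this request?
            channels.push(...list.split(','))
        }
    }
    return channels
}
```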
See the SenseLogs Documentation for more details.
You can modify environment variables with the AWS console:
But our SenseDeep serverless developer studio provides a much more convenient interface to manage Lambda LOG_FILTER settings for a single Lambda or for multiple Lambdas.
Your serverless apps should emit a good base of request state for all requests.
This should include the request parameters, request body and other high priority state information. The volume of log data should be low enough so that the cost of log ingestion and storage is not a significant burden.
For new code modules or less mature code, we include additional logging on custom channels that correspond to each module or feature. This logging can then be enabled on demand. For a small percentage of requests, we log the full request state so that we always have some record of all the state values. We use a custom SenseLogs channel called complete, and we sample those requests at a low percentage rate.
Serverless apps often have multiple Lambda functions or services that cooperate to respond to a single client request. Requests that originate with a single client request may traverse through many services and these services may be in different AWS regions or accounts.
A single request will often traverse multiple AWS services: API Gateway, one or more Lambda functions, Kinesis streams, SQS queues, SNS messages, EventBridge events and others. The request may fan out to multiple Lambdas, and the results may be combined back into a single response.
To trace and debug an individual request, we add a request ID to all our log events. This way we can filter and view a complete client request as it flows through multiple Lambda services.
We add the trace IDs by using the SenseLogs API:
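SenseLogs provides a call to do this in the handler; the logic it performs can be sketched with this self-contained stand-in (exact field names per the SenseLogs docs):

```javascript
// Stand-in mirroring SenseLogs trace-ID mapping: extract the API Gateway,
// Lambda and X-Ray IDs from the Lambda event and context, and map them to
// x-correlation-NAME context variables suitable for logging.
function addTraceIds(event = {}, context = {}) {
    const ids = {}
    if (event.requestContext && event.requestContext.requestId) {
        ids['x-correlation-api'] = event.requestContext.requestId
    }
    if (context.awsRequestId) {
        ids['x-correlation-lambda'] = context.awsRequestId
    }
    if (process.env._X_AMZN_TRACE_ID) {
        ids['x-correlation-trace'] = process.env._X_AMZN_TRACE_ID
    }
    // Stable request ID: X-Correlation-ID header if present, else the
    // API Gateway request ID, else the X-Ray trace ID
    const headers = event.headers || {}
    ids['x-correlation-id'] = headers['x-correlation-id'] ||
        ids['x-correlation-api'] || ids['x-correlation-trace']
    return ids
}
```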
SenseLogs will extract the API Gateway request ID, Lambda request ID and X-Ray trace ID. SenseLogs will map these IDs to x-correlation-NAME SenseLogs context variables suitable for logging. The following variables are automatically detected and mapped:
· x-correlation-api — API Gateway requestId
· x-correlation-lambda — Lambda requestId
· x-correlation-trace — X-Ray X-Amzn-Trace-Id header
· x-correlation-extended — AWS extended request ID
SenseLogs will define a special variable x-correlation-id that can be used as a stable request ID. It will be initialized to the value of the X-Correlation-ID header; if that is not defined, SenseLogs will use (in order) the API Gateway request ID or the X-Ray trace ID.
SenseLogs also supports adding context state to the logger so you don't have to specify it on each logging API call. Thereafter, each log call will add the additional context to the logged event output.
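A stand-in sketch of this behavior:

```javascript
// Persistent logger context: state added once via addContext is merged
// into every subsequent log event.
class ContextLogs {
    constructor() {
        this.context = {}
    }
    addContext(context) {
        Object.assign(this.context, context)
    }
    info(message, context = {}) {
        const event = {channel: 'info', message, ...this.context, ...context}
        console.log(JSON.stringify(event))
        return event
    }
}

const log = new ContextLogs()
log.addContext({'x-correlation-id': 'req-123'})   // hypothetical request ID
```

After the addContext call, every log.info event automatically carries the x-correlation-id property.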
Finally we define a client ID that is passed from the end user so we can track a "user request" through all AWS services that are invoked.
Log Viewing and Querying
Once you have added detailed logging with correlation IDs to your Lambdas, you need an effective log viewer to display and query log events.
CloudWatch just won't cut it.
For our logging needs, we need a log viewer that can aggregate log streams and groups, understand structured JSON log context, and quickly query across requests and Lambdas.
Our product, SenseDeep, has such a log viewer, which we use to monitor and debug the SenseDeep backend service.
Manually monitoring logs for app errors and signs of trouble is not scalable or reliable. An automated alarm mechanism when paired with detailed logging is the ideal platform to provide 24x7 oversight of your serverless apps.
We use such an automated approach. We configure multiple SenseDeep alarms to search for specific flags in log data that indicate errors or potential issues. If an alarm is triggered, an alert will immediately notify the relevant developer with the full context of the app event.
We instrument our code to not only capture and log errors, but also with "asserts" that test expected conditions and will emit log data that will trip alarms if unexpected conditions arise.
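For example, a hypothetical assert helper (the FLAG_ASSERT property name is illustrative):

```javascript
// Hypothetical assert helper: when an expected condition fails, emit a
// flagged log event that an alarm can match.
function assertLog(condition, message, context = {}) {
    if (!condition) {
        console.log(JSON.stringify({
            channel: 'assert',
            FLAG_ASSERT: true,      // unique matchable flag for alarms
            message,
            ...context,
        }))
        return false
    }
    return true
}
```

A call such as `assertLog(order.total >= 0, 'Order total is negative', {orderId: order.id})` is silent in the normal case and trips an alarm when the invariant breaks.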
To reliably trigger alarms, it is necessary to have a unique property value that the alarm mechanism can detect.
The SenseLogs logging library supports a flagging mechanism where log events to specific log channels can add a unique matchable "flag" to the log data. Alarms can then reliably test for the presence of such a flag in log data.
We use SenseLogs to add property flags for errors and asserts. By default, SenseLogs will add a FLAG_ERROR property to error log events.
For other log channels, the flag option can be set to a map of channels, and the nominated channels will be flagged with the associated value string. For example:
This will emit:
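A self-contained stand-in mirroring the idea (the flag map shown and whether the real library emits a boolean or a string value are assumptions; see the SenseLogs docs):

```javascript
// Channel flagging sketch: nominated channels add a unique, matchable
// property to the log data that alarms can reliably test for.
function makeLogger(flags = {error: 'FLAG_ERROR', assert: 'FLAG_ASSERT'}) {
    return {
        emit(channel, message, context = {}) {
            const event = {channel, message, ...context}
            if (flags[channel]) {
                event[flags[channel]] = true    // the matchable flag
            }
            console.log(JSON.stringify(event))
            return event
        },
    }
}

const log = makeLogger()
log.emit('assert', 'Invariant failed: order total mismatch')
```

The emitted event carries a "FLAG_ASSERT" property alongside the message and context, giving alarms a unique value to match.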
We then create a SenseDeep alarm to match FLAG_ERROR and the other flag values.
We configure alarms for generic errors and exceptions as well as specific asserts and state conditions.
CloudWatch Alarms do not have the ability to efficiently monitor log data for specific log data patterns.
Fortunately, SenseDeep was built for this purpose.
SenseDeep developers create alarms to detect these flagged log events when errors and unexpected conditions are encountered.
As SenseDeep ingests log data, it runs configured alarms to match app errors in real-time.
When an alarm triggers it generates an alert to notify the developer via email, SMS or other notification means.
The alert contains the full details of the flagged condition.
Clicking on the notification link will launch SenseDeep and display full details of the alert that triggered the alarm.
From the alert, you can view the log entries for the invocation. If you want to see the preceding or subsequent log entries, click All Logs, which will launch the log viewer homed to the invocation log data.
Using this serverless debugging pattern, we can detect problems early, typically before users are even aware of an issue.
The automated monitoring offloads the burden of "baby-sitting" a service and gives developers the tools they need to ensure their apps are performing correctly.
CloudWatch Insights and X-Ray
You may wonder if you can use the native AWS services, CloudWatch and X-Ray, to implement this serverless debugging pattern.
The short answer is you cannot.
CloudWatch does not aggregate log streams or groups. It is slow and only offers primitive query capabilities.
CloudWatch Insights does have the ability to search for correlation IDs in logs in a single region. But it is very, very slow and you cannot then see the request in context with the requests before or after the event. CloudWatch Insights is most useful for custom one-off queries, but not for repeated debugging efforts.
CloudWatch alarms cannot trigger alerts based on log data patterns. You can pair it with CloudWatch Insights, but this is a slow one-off solution and does not scale to process all log data for matching events.
Similarly, X-Ray has many good use cases, but because it only samples 1 request per second and 5% of additional requests, it cannot be used to consistently debug serverless applications. It is great for observing complete outages when an entire service or component is offline, but not for failing individual requests.