SenseDeep — a case study on building a secure web service
This overview describes how SenseDeep is implemented. It is an example of how you can build a secure web service for a small to medium-sized site. If you see things we could improve, please let us know. We are always learning.
There is nothing remarkable or revolutionary in how SenseDeep is designed and implemented. We have tried to keep the design simple and have used well proven patterns. This helps to reduce the attack surface and eliminate mistakes.
Before starting, if you want an overview of SenseDeep, please read Inside SenseDeep. You may also like to read our Web Developer Checklist which summarizes many of the important items in building a secure web service.
SenseDeep Security Service
SenseDeep is a security service that identifies vulnerabilities in your site and helps you eliminate those vulnerabilities. It detects attacks and compromises in real-time and automatically invokes defenses. It stores important event information and records log files for forensic analysis if required.
The SenseDeep security service manages your AWS site and any on-premises servers you may have. It uses deep hooks into many AWS services in order to provide real-time automated security management.
SenseDeep Site Composition
The SenseDeep service is implemented on an AWS EC2 cluster running over multiple availability zones. The site runs in a single AWS region but uses AWS CloudFront as a CDN.
The service uses CloudFlare in bypass mode as a DNS provider. In the event of a Denial of Service (DoS) attack, CloudFlare is switched to proxy mode to provide a good measure of DoS protection for the duration of the attack.
We use AWS AutoScale and custom monitoring to create a reliable, master-less Docker cluster from EC2 instances. We did not choose to use AWS ECS or other Docker cluster providers, as we wanted a bit more control over the scale and placement of our services. As these providers mature their offerings, we may utilize them in the future.
SenseDeep server instances use AWS instance IAM roles that define and constrain the abilities of the services running on those instances. All instances use SenseDeep and CloudWatch Logs for log capture. CloudTrail is actively managed to monitor account changes.
The SenseDeep service is decomposed into microservices implemented via Docker containers. We don't believe in very small microservices. However, some of our services are currently a little larger than we would like and we will probably split them up a little more in the future.
The SenseDeep microservices are:
- SenseDeep Web App — supports the browser based SenseDeep application.
- SenseDeep Agent Manager — manages the server agents.
- SenseDeep Watch — receives real-time AWS CloudWatch events.
- SenseDeep Audit — performs scheduled and on-demand account configuration audits.
- SenseDeep Admin — performs account and database maintenance.
Each of these microservices is run on our EC2 cluster over multiple server instances spread over several availability zones for reliability and availability.
The SenseDeep database uses MySQL hosted on an AWS Aurora cluster running in multiple availability zones. We chose Aurora for its strong availability and performance characteristics coupled with excellent MySQL compatibility.
We use Aurora encryption at rest for physical security (such as for when drives are decommissioned). To protect sensitive data against breaches via remote database access, we use column level encryption for important data such as access tokens, billing details and email addresses. Passwords are hashed using bcrypt.
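As a rough illustration of column-level encryption, the sketch below encrypts a single value before it is written to the database and decrypts it after it is read back. The cipher (AES-256-GCM), the stored layout, the function names and the in-code key are all assumptions made for this example; the actual SenseDeep scheme and key management are not described here. Password hashing with bcrypt is not shown because it needs a third-party package.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical 256-bit key. In practice it would come from a secrets store, not from code.
const columnKey: Buffer = randomBytes(32);

// Encrypt one column value. Stored layout (an assumption): base64(iv | authTag | ciphertext).
function encryptColumn(plaintext: string, key: Buffer): string {
    const iv = randomBytes(12); // fresh 96-bit nonce for every value
    const cipher = createCipheriv("aes-256-gcm", key, iv);
    const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
    return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Decrypt a stored value. Throws if the value was modified or the key is wrong.
function decryptColumn(stored: string, key: Buffer): string {
    const raw = Buffer.from(stored, "base64");
    const iv = raw.subarray(0, 12);
    const tag = raw.subarray(12, 28); // GCM tag is 16 bytes by default
    const body = raw.subarray(28);
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

const storedEmail = encryptColumn("user@example.com", columnKey);
console.log(decryptColumn(storedEmail, columnKey) === "user@example.com");
```

With a scheme like this, someone who gains remote read access to the database sees only opaque base64 text in the protected columns unless they also obtain the key.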
SenseDeep App Portal
The SenseDeep app is the primary user interface to the SenseDeep service. It provides a security status overview and a management interface. It is also the primary security dashboard we use to monitor the service ourselves.
SenseDeep aggregates the security status for each server and service and provides an overall account security status. This is presented as an Attacks gauge which indicates if the service is being or has been attacked and compromised. The Threats gauge represents latent threats and indicates if the site is vulnerable to future attacks. These status gauges are automatically updated in real-time in response to any changes on the site.
Nginx + NodeJS + TypeScript + Express + Aurelia
The SenseDeep application is a NodeJS Express application written in TypeScript and running in a Docker container on an EC2 cluster.
The Nginx server is responsible for serving static content and for proxying requests to the Node applications. Static files are then cached via the AWS CloudFront CDN.
Content is minified and pre-gzipped as part of the build process. We use the Expansive Static Site Generator for static content preparation.
The Nginx servers are run from Docker containers behind an AWS ALB that terminates TLS client connections. The containers are scaled using AutoScale.
Security HTTP Headers
The Nginx server and Node application define the following HTTP security headers to minimize the degrees of freedom for clients:
- Set-Cookie SameSite HttpOnly Secure
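To make the cookie attributes concrete, here is a minimal sketch of a helper that builds a Set-Cookie header value carrying them. The helper name, the cookie name, the Max-Age and Path attributes, and the choice of SameSite=Strict rather than Lax are assumptions for illustration only.

```typescript
// Hypothetical helper: builds a Set-Cookie header value with the attributes listed above.
function sessionCookie(name: string, value: string, maxAgeSeconds: number): string {
    return [
        `${name}=${encodeURIComponent(value)}`,
        `Max-Age=${maxAgeSeconds}`,
        "Path=/",
        "SameSite=Strict", // assumption: the text does not say whether Strict or Lax is used
        "HttpOnly",        // cookie is not readable from page JavaScript
        "Secure",          // cookie is only sent over HTTPS
    ].join("; ");
}

console.log(sessionCookie("sid", "abc 123", 3600));
```

In a Node response handler the resulting string could be applied with `res.setHeader("Set-Cookie", ...)`.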
Node and Express
The Node application is written in TypeScript using an ES2017 subset. We make extensive use of the async/await pattern and have found that it dramatically simplifies Node programs. In our case, the performance cost is well worth it, and we expect async/await to get faster as implementations are optimized. We believe the simplified calling sequence, which avoids callback hell, results in a more transparent and therefore more secure application.
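The small sketch below shows the style this refers to: two dependent lookups written as straight-line code, with a single try/catch covering every failure path. The types and the `findUser` and `findAccount` helpers are hypothetical stand-ins for real database calls.

```typescript
// Hypothetical record types and data-access helpers, used only for illustration.
type User = { id: number; accountId: number };
type Account = { id: number; plan: string };

async function findUser(id: number): Promise<User> {
    return { id, accountId: id * 10 };
}

async function findAccount(id: number): Promise<Account> {
    if (id <= 0) throw new Error("unknown account");
    return { id, plan: "basic" };
}

// Each step reads top to bottom; errors from either lookup land in the same catch block.
async function describeUser(id: number): Promise<string> {
    try {
        const user = await findUser(id);
        const account = await findAccount(user.accountId);
        return `user ${user.id} is on the ${account.plan} plan`;
    } catch (err) {
        return `lookup failed: ${(err as Error).message}`;
    }
}

describeUser(1).then((line) => console.log(line));
```

With nested callbacks, each lookup would need its own error branch, and a forgotten branch is exactly the kind of mistake that is hard to spot in review.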
We use a limited and audited set of packages and we closely monitor our dependencies. Instead of using an ORM package, we use a custom ORM that also performs extensive data validation and encryption services. It also handles JSON object conversions.
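The following is a minimal sketch of what a validating data layer of this kind might look like. It is not the SenseDeep ORM: the schema format, the field names and the `toRow` function are invented for this example, and encryption is left out.

```typescript
// Hypothetical field description; every name here is illustrative.
type Field = { type: "string" | "number" | "json"; required?: boolean; maxLength?: number };
type Schema = Record<string, Field>;

const eventSchema: Schema = {
    host: { type: "string", required: true, maxLength: 64 },
    severity: { type: "number", required: true },
    details: { type: "json" },
};

// Validate a record against the schema and convert it into column values.
// Unknown properties are rejected rather than silently stored.
function toRow(schema: Schema, record: Record<string, unknown>): Record<string, string | number> {
    for (const key of Object.keys(record)) {
        if (!Object.prototype.hasOwnProperty.call(schema, key)) {
            throw new Error(`unknown field: ${key}`);
        }
    }
    const row: Record<string, string | number> = {};
    for (const [name, field] of Object.entries(schema)) {
        const value = record[name];
        if (value === undefined || value === null) {
            if (field.required) throw new Error(`missing field: ${name}`);
            continue;
        }
        if (field.type === "json") {
            row[name] = JSON.stringify(value); // objects are stored as JSON text
        } else if (typeof value !== field.type) {
            throw new Error(`bad type for ${name}`);
        } else if (typeof value === "string" && field.maxLength !== undefined && value.length > field.maxLength) {
            throw new Error(`value too long: ${name}`);
        } else {
            row[name] = value as string | number;
        }
    }
    return row;
}

console.log(toRow(eventSchema, { host: "web-1", severity: 3, details: { ip: "10.0.0.1" } }));
```

Putting validation in the one place that every write passes through means an individual route handler cannot forget to do it.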
We apply rate limiting on slow APIs using express-rate-limit and we apply canary checks on our APIs to detect illegal or abnormal requests.
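To show the idea behind rate limiting, here is a self-contained fixed-window counter. It is an in-memory illustration of the general technique, not the express-rate-limit implementation or its API; the class name, the limit and the window length are assumptions.

```typescript
// Fixed-window counter keyed by client, for example by IP address.
class FixedWindowLimiter {
    private readonly limit: number;
    private readonly windowMs: number;
    private readonly hits = new Map<string, { count: number; windowStart: number }>();

    constructor(limit: number, windowMs: number) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    // Returns true if the request is allowed, false if the client is over the limit.
    allow(key: string, now: number = Date.now()): boolean {
        const entry = this.hits.get(key);
        if (!entry || now - entry.windowStart >= this.windowMs) {
            // First request from this client, or the previous window has ended.
            this.hits.set(key, { count: 1, windowStart: now });
            return true;
        }
        entry.count += 1;
        return entry.count <= this.limit;
    }
}

// Hypothetical policy: at most 2 requests per client per second on a slow API.
const slowApiLimiter = new FixedWindowLimiter(2, 1000);
console.log(slowApiLimiter.allow("203.0.113.7"));
```

An Express middleware built on a counter like this would call `allow` with the client address and answer with HTTP 429 when it returns false.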
Log files from all the microservices are captured and stored centrally in CloudWatch Logs via the SenseDeep agent. The agent captures all system log data from the systemd journal. We also capture the agent log file and Docker container logs.
We retain log data for one year. Conveniently, CloudWatch automatically purges log events once they expire.
Building the Site
We believe strongly in the benefits of Immutable Infrastructure. This means that once a server is deployed, it is never modified, patched or upgraded. If a change is required, it is simply replaced with a new, updated instance. The main benefit of this approach is that we can immediately detect unauthorized modifications to our infrastructure. Our EC2 instances and Docker containers are immutable. A secondary benefit is that it greatly simplifies our implementation: we never need to do live patching or upgrading.
All infrastructure is created via Terraform. We do not use the AWS console for creating or modifying any cloud configuration. Infrastructure should be defined as "code" and should be re-creatable at the push of a button.
The Terraform configuration files define:
- VPC networks, peers and routing tables
- Security Groups
- IAM users, roles and policies
- Databases and Redis clusters
- EC2 AutoScale groups and launch configurations
- ALB load balancers and target groups
- EC2 instances
- SNS topics
- CloudFlare DNS endpoints
By using an immutable infrastructure as code paradigm, we can audit our cloud configuration for any changes and rapidly regenerate any component without fear of human error.
Using Terraform makes it trivial to replicate production environments for staging and test. To reduce cost, we spin these up and down as required. We automatically turn off all unused servers after hours.
We do not expose any SSH endpoints on any servers. If really required, we temporarily add a "Support" security group to a specific instance. The SenseDeep service will notice this and elevate the security status while it remains. In any case, if this SSH access is forgotten, the next time we deploy and Terraform is run, it will automatically remove this security group from the instance.
Our Security Incident Plan
We have one ;-)