AWS Cloud Cost Checklist V2
For many companies cloud computing is transformational. The advantages are compelling: improved flexibility, increased responsiveness, and let’s not forget, reduced capital expenditure.
However, the ease and speed of creating servers, databases, load balancers and containers in the cloud often lead to a loss of control and increased costs, sometimes with rude sticker shock.
This is version 2 of the checklist. It has been re-organized from Version 1 with new items to track the rapidly evolving cloud landscape. Thank you to those who contributed items. I try to keep the list tight and focused, but please comment if you have an item that should be added to the list.
Periodically audit your resources. Take inventory and check if you need all the resources and services you have created. It is easy to start things in the cloud and then lose track. This includes: EC2 instances, RDS databases, ELBs, snapshots, ECS tasks, VPCs, security groups, etc. Howto.
Run development, test and staging environments in their own accounts -- separate from production. This makes it much easier to control cloud resources and is also more secure. Use AWS Organizations to consolidate billing. Howto.
Prices can vary considerably across regions. For example: an AWS On-Demand m4.large is $73/month in us-east-1 and $91/month in ap-southeast-2. Choose the cheapest region that is still close enough to your customers. Howto.
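To make the regional comparison concrete, here is a minimal sketch of the arithmetic. The hourly rates are assumptions chosen to reproduce the monthly figures quoted above, and the helper names are hypothetical; check the current AWS price list before relying on them.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

# Assumed hourly On-Demand rates for an m4.large, picked to match the
# monthly figures quoted in the checklist. Not an authoritative price list.
HOURLY_RATE = {
    "us-east-1": 0.10,
    "ap-southeast-2": 0.125,
}


def monthly_cost(hourly_rate, instances=1):
    """Approximate monthly cost of running instances around the clock."""
    return hourly_rate * HOURS_PER_MONTH * instances


def cheapest_region(rates, candidates):
    """Pick the cheapest region among those already judged close enough."""
    return min(candidates, key=lambda region: rates[region])


if __name__ == "__main__":
    for region, rate in HOURLY_RATE.items():
        print(f"{region}: ${monthly_cost(rate):.0f}/month")
```

Pass only the regions that meet your latency needs as `candidates`, so that price decides among acceptable options rather than overriding proximity.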
Run AWS Trusted Advisor regularly (perhaps quarterly) to check for excess capacity and security issues. Howto.
Enable billing alerts at 25%, 50% and 75% of your expected monthly budget. That way you’ll quickly be alerted when something gets out of control. Howto.
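As a rough illustration of the thresholds, the sketch below turns a monthly budget into the three alert amounts and reports which ones the current spend has already crossed. The function names and the $2,000 budget are hypothetical; in practice the amounts would be fed into whatever alerting you configure in AWS.

```python
def alert_thresholds(monthly_budget, percents=(25, 50, 75)):
    """Map each alert percentage to the dollar amount that triggers it."""
    return {p: monthly_budget * p / 100 for p in percents}


def crossed_alerts(spend_to_date, monthly_budget, percents=(25, 50, 75)):
    """Return the alert percentages that the current spend has reached."""
    thresholds = alert_thresholds(monthly_budget, percents)
    return [p for p, amount in thresholds.items() if spend_to_date >= amount]


if __name__ == "__main__":
    budget = 2000  # hypothetical monthly budget in dollars
    print(alert_thresholds(budget))
    print(crossed_alerts(1100, budget))
```

Crossing the 50% alert in the first week of the month is a much stronger signal than crossing it in the third, so it helps to read each alert against the calendar.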
Choose and re-evaluate the instance type for each application. Instance types can vary in price by orders of magnitude. Howto.
Migrate to the newer instance types. AWS often encourages movement to newer instance types through pricing. For example: m5.large is $70/month in us-east-1 whereas m4.large is $73/month in the same region.
Monitor your application performance by CPU, memory and disk to locate excess capacity and the opportunity to downsize the instance type and attached storage requirements.
Spot instances are usually the cheapest instances available and can be up to 80% less than the On-Demand price. But Spot servers are ephemeral and can be terminated with only a two-minute warning. Use Spot instances for variable, non-base capacity. Spot pricing is cheapest after hours and on weekends in most regions. Be prepared for AWS to reclaim all your Spot instances. Howto, Howto.
If you are not using Spot instances, move your steady, always-on production capacity from On-Demand to reserved instances. Pre-pay if possible to lock in the lowest price. Check your bill to make sure you are using all your purchased reserved instance capacity. Howto.
Alternatively, use an AWS Savings Plan to reduce costs by up to 70% in exchange for a 1 or 3 year term commitment on EC2 or Fargate. Howto.
Scale your Auto Scaling groups and database replicas based on load rather than over-provisioning. Consider scaling out if CPU is greater than 60% for 5 minutes and scaling in if it is less than 30% for 20 minutes. Howto, Howto.
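The scaling rule above can be sketched as a small decision function. This is only an illustration of the logic, assuming one averaged CPU sample per minute; in AWS you would normally express the same rule as alarms and scaling policies rather than run code like this yourself. The function name and return values are hypothetical.

```python
def scaling_decision(cpu_samples, high=60.0, low=30.0,
                     high_minutes=5, low_minutes=20):
    """Decide what to do from per-minute average CPU %, oldest sample first.

    Scale out when every sample in the last `high_minutes` is above `high`.
    Scale in when every sample in the last `low_minutes` is below `low`.
    Otherwise hold. The longer scale-in window avoids flapping.
    """
    recent_high = cpu_samples[-high_minutes:]
    recent_low = cpu_samples[-low_minutes:]
    if len(recent_high) == high_minutes and all(s > high for s in recent_high):
        return "scale_out"
    if len(recent_low) == low_minutes and all(s < low for s in recent_low):
        return "scale_in"
    return "hold"


if __name__ == "__main__":
    print(scaling_decision([40] * 10 + [75] * 5))
    print(scaling_decision([20] * 20))
```

Reacting quickly to high load but slowly to low load is deliberate: adding capacity late hurts users, while removing it late only costs a few extra minutes of instance time.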
Power down all idle resources. Evaluate when your dev, test and staging environments are not required and stop them outside those hours. You can save up to 70% on those environments with this step alone.
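A figure of about 70% is what you get if an environment runs only 10 hours a day, 5 days a week, instead of around the clock. Here is a minimal sketch of that arithmetic. It assumes cost is proportional to running hours, which holds for On-Demand compute but not for storage that persists while instances are stopped.

```python
HOURS_PER_WEEK = 7 * 24


def schedule_savings(hours_per_day, days_per_week):
    """Fraction of always-on compute cost saved by running on a schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    saving = schedule_savings(hours_per_day=10, days_per_week=5)
    print(f"Saving versus always-on: {saving:.0%}")
```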
Terminate unused ELBs. Use Terraform to destroy and re-create as required.
If using containers, you can reduce your total instance count by packing containers to maximize your CPU and memory utilization on server/cluster instances.
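To illustrate the packing idea, here is a sketch using first-fit decreasing, a simple bin-packing heuristic. It is a stand-in for what a container scheduler does, not a description of how any particular AWS scheduler places tasks. The container names and sizes are hypothetical, with CPU in units where 1024 equals one vCPU and memory in MiB.

```python
def pack_containers(containers, cpu_capacity, mem_capacity):
    """Pack (name, cpu, mem) containers onto identical instances.

    Uses first-fit decreasing: place the largest containers first, each on
    the first instance with enough free CPU and memory, and open a new
    instance only when none fits. Returns a list of name lists, one per
    instance.
    """
    instances = []  # each entry tracks free capacity and placed names
    ordered = sorted(containers, key=lambda c: (c[1], c[2]), reverse=True)
    for name, cpu, mem in ordered:
        if cpu > cpu_capacity or mem > mem_capacity:
            raise ValueError(f"{name} does not fit on a single instance")
        for inst in instances:
            if inst["cpu"] >= cpu and inst["mem"] >= mem:
                break
        else:
            inst = {"cpu": cpu_capacity, "mem": mem_capacity, "names": []}
            instances.append(inst)
        inst["cpu"] -= cpu
        inst["mem"] -= mem
        inst["names"].append(name)
    return [inst["names"] for inst in instances]


if __name__ == "__main__":
    services = [
        ("api", 1024, 2048), ("worker", 1024, 4096), ("cron", 512, 1024),
        ("web", 512, 1024), ("cache", 512, 2048), ("batch", 512, 1024),
    ]
    # Assumed instance size: 2 vCPU and 8 GiB of memory.
    for i, names in enumerate(pack_containers(services, 2048, 8192)):
        print(f"instance {i}: {names}")
```

Six containers that might otherwise each get their own small instance fit on two instances here. Real clusters need some headroom for the operating system and agents, so treat the capacities as upper bounds.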
Use serverless patterns for event-based services that have peaky load. With serverless, you don't pay for idle time or for managing the server.
Don't use serverless for constant, heavy loads. Serverless can be up to 3 times more expensive to operate than a dedicated server for such workloads. Howto.
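A rough way to see where serverless stops being cheap is to estimate the monthly Lambda bill for a steady request rate and compare it with a fixed instance price. The sketch below is an estimate under stated assumptions: the default prices are assumed Lambda list prices (per million requests and per GB-second) that may be out of date, the free tier is ignored, and the $70 server figure is the m5.large price quoted earlier. Whether one such server can actually carry the load is a separate question.

```python
SECONDS_PER_MONTH = 730 * 3600  # average month


def lambda_monthly_cost(requests_per_second, avg_duration_s, memory_gb,
                        price_per_million_requests=0.20,
                        price_per_gb_second=0.0000166667):
    """Estimate monthly Lambda cost for a constant request rate.

    Prices are assumptions for illustration; the free tier is ignored.
    """
    requests = requests_per_second * SECONDS_PER_MONTH
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    server_cost = 70.0  # assumed monthly price of one dedicated instance
    for rps in (1, 10, 50):
        cost = lambda_monthly_cost(rps, avg_duration_s=0.1, memory_gb=0.5)
        verdict = "cheaper" if cost < server_cost else "more expensive"
        print(f"{rps} req/s: ${cost:.2f}/month, {verdict} than the server")
```

With these assumptions a light, bursty load costs a few dollars a month on Lambda, while a constant load of tens of requests per second costs more than the dedicated server.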
ELBs are expensive, especially if you use one ELB per microservice. You can share a single ALB across multiple services by using listener rules that route to different target groups, and it supports SSL with multiple certificates. Howto.
Design your deployments to minimize inter-region and inter-zone traffic if you have very high outbound network data rates. Using multiple AZs is essential for availability, but try to localize traffic within the AZs. Howto.
S3 storage can grow over time to be a significant cost. Have policies to regularly review and remove unwanted S3 data.
Scan for orphaned EBS snapshots and detached EBS volumes. These often accumulate over time because volumes are not always deleted automatically when their instances are terminated.
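The scan itself can be sketched as two small filters. They work on lists of dicts shaped like the `Volumes` and `Snapshots` entries that boto3's EC2 `describe_volumes` and `describe_snapshots` calls return; fetching those lists is left out so the example stays self-contained. The function names and sample IDs are hypothetical, and "orphaned" here is assumed to mean that the snapshot's source volume no longer exists.

```python
def unattached_volumes(volumes):
    """Return IDs of volumes in the 'available' state (not attached)."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def orphaned_snapshots(snapshots, volumes):
    """Return IDs of snapshots whose source volume is no longer present."""
    existing = {v["VolumeId"] for v in volumes}
    return [s["SnapshotId"] for s in snapshots
            if s.get("VolumeId") not in existing]


if __name__ == "__main__":
    # Hypothetical sample data in the shape of the EC2 API responses.
    volumes = [
        {"VolumeId": "vol-aaa", "State": "in-use"},
        {"VolumeId": "vol-bbb", "State": "available"},
    ]
    snapshots = [
        {"SnapshotId": "snap-111", "VolumeId": "vol-aaa"},
        {"SnapshotId": "snap-222", "VolumeId": "vol-gone"},
    ]
    print(unattached_volumes(volumes))
    print(orphaned_snapshots(snapshots, volumes))
```

Treat the output as a review list rather than a delete list: a snapshot can outlive its volume on purpose, for example when it backs a machine image or serves as a long-term backup.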
Set an expiry limit for all CloudWatch logs. The default is to never expire. Howto.
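A sketch of how this could be automated: find the log groups that have no retention set, then apply one. The first function works on dicts shaped like the `logGroups` entries from the CloudWatch Logs `describe_log_groups` call, where `retentionInDays` is absent for groups that never expire. The second expects a boto3 CloudWatch Logs client to be passed in and calls its `put_retention_policy` method. The function names and the 30-day default are assumptions for illustration.

```python
def groups_without_expiry(log_groups):
    """Return names of log groups that have no retention period set."""
    return [g["logGroupName"] for g in log_groups
            if "retentionInDays" not in g]


def set_expiry(logs_client, group_names, days=30):
    """Apply a retention period to each named log group.

    `logs_client` is expected to be a boto3 CloudWatch Logs client,
    created by the caller with boto3.client("logs").
    """
    for name in group_names:
        logs_client.put_retention_policy(logGroupName=name,
                                         retentionInDays=days)


if __name__ == "__main__":
    # Hypothetical sample data in the shape of the API response.
    sample = [
        {"logGroupName": "/aws/lambda/orders", "retentionInDays": 14},
        {"logGroupName": "/aws/lambda/billing"},
    ]
    print(groups_without_expiry(sample))
```

Passing the client in keeps the example free of credentials and makes the logic easy to try against sample data before pointing it at a real account.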
Invest early in tools and automation -- people are expensive. Using the console is great for prototyping, but the payoff for automation in the cloud is large. Automate resource creation, auditing, centralized logging, and security scanning.
Use an automation tool such as CloudFormation or Terraform instead of creating infrastructure manually. This is not only much more secure, but it makes controlling and auditing resources much easier. Howto.
Leverage the 90+ AWS services before reinventing the wheel. AWS building blocks can often be used to get you a solution more quickly and at lower cost than you could by building it yourself.
Know when to leave the cloud. At large consistent scale, on-premises hosting may be preferable to cloud hosting for some services. Dropbox and Stack Overflow migrated services from the cloud for this reason.
To learn more about the SenseDeep Serverless Developer Studio and try it for free, please go to: https://www.sensedeep.com.
Please let me know if you have any comments using similar or different techniques. firstname.lastname@example.org.