

AWS Best Practices Architecting for the Cloud – Concise Summary

The following is a concise summary of Architecting for the Cloud: AWS Best Practices. The original is about 13,000 words; there is an abridged version of about 4,000 words, and this is an even more concise version, at about 1,500 words.


To make the most of the elasticity and agility possible with cloud computing, engineers will have to evolve their architectures to take advantage of AWS capabilities.

The Cloud Computing Difference

With cloud computing, IT Assets Become Programmable Resources. Resources such as servers, databases, and storage can be instantiated and disposed of within seconds.

These resources are Global, Available, and offer Unlimited Capacity.

Global: you can deploy your application to the AWS Region that best meets your requirements

Available: reduce latency to end users around the world by using CloudFront CDN. You can operate across multiple data centers to achieve high availability and fault tolerance

Unlimited Capacity: there is virtually unlimited on-demand capacity available to AWS customers


Further, consider using Higher-Level Managed Services. Some services are managed by AWS, which can reduce the dependency on in-house specialized skills and lower operational complexity and risk.

Design Principles


Scalability

A scalable architecture should support growth in users, traffic, or data size with no drop in performance. It should provide that scale in a linear manner, where adding extra resources results in at least a proportional increase in ability to serve additional load.
There are generally two ways to scale: vertically and horizontally.

Scaling Vertically means increasing the specifications of an individual resource. e.g., resizing to a larger instance type. Not limitless, but easy to implement.


Scaling Horizontally, a.k.a. elasticity, means increasing the number of resources, e.g., adding more servers. Not all architectures, however, are designed to distribute their workload to multiple resources.

A Stateless Application needs no knowledge of previous interactions and stores no session information. It can scale horizontally, since any request can be serviced by any resource. Those resources do not need to be aware of the presence of their peers.

Load can be distributed to stateless nodes in a push or pull manner:

Push model: Use a load balancing solution, e.g., ELB, or DNS round robin (via Route 53).

Pull model: Tasks can be streamed or stored in a queue, and asynchronously consumed in a distributed fashion.
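The pull model above can be sketched with a local work queue standing in for a service like Amazon SQS: producers enqueue tasks, and any available worker pulls and processes them. The queue, worker count, and "processing" step here are all illustrative.

```python
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    # Any worker can pull any task; workers need no knowledge of peers.
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more work for this worker
            break
        with lock:
            results.append(task * 2)   # stand-in for real processing

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(10):               # producer side: enqueue work
    tasks.put(i)
for _ in workers:                 # one sentinel per worker
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results))  # every task was consumed by exactly one worker
```

Because consumption is asynchronous, a slow or temporarily failed worker pool simply drains the queue later rather than dropping requests.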


In practice, most applications need to maintain some kind of state, but you can still make a portion of these architectures stateless using Stateless Components.

e.g., store session info in a cookie that is passed to the server with each request. The cookie can contain a unique session identifier, which maps to more detailed user session information server-side, e.g., in a database.
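A minimal sketch of that pattern, with a dict standing in for the server-side database (function and field names are illustrative): the cookie holds only an opaque session ID, so any web node can serve any request.

```python
import secrets

session_store = {}  # stand-in for a shared server-side store (e.g., a database)

def create_session(user_data):
    # Generate an opaque identifier to place in the cookie.
    session_id = secrets.token_hex(16)
    session_store[session_id] = user_data
    return session_id

def handle_request(cookie_session_id):
    # Any stateless web node can resolve the ID to full session state.
    return session_store.get(cookie_session_id)

sid = create_session({"user": "alice", "cart": ["book"]})
print(handle_request(sid))
```

The web tier stays stateless because the only thing a node needs is the shared store, not memory of prior requests.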


Inevitably, however, there will be layers of your architecture that are stateful and will need Stateful Components. You might still be able to scale those components horizontally by distributing load to multiple nodes with “session affinity” – binding all transactions of a session to a specific compute resource.


Use cases that involve processing very large amounts of data require a Distributed Processing approach. A task and its data are divided into small fragments of work, each executed on any available compute resource.
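The fragment-and-distribute idea can be sketched with a thread pool standing in for a fleet of compute resources (an EMR cluster or Lambda invocations in practice); the fragment size and per-fragment work are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))
fragment_size = 10
# Divide the data set into small fragments of work.
fragments = [data[i:i + fragment_size]
             for i in range(0, len(data), fragment_size)]

def process(fragment):
    return sum(fragment)          # stand-in for real per-fragment work

# Each fragment can be executed on any available worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process, fragments))

total = sum(partials)             # combine partial results
print(total)  # 4950, same answer as processing the data in one piece
```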

Disposable Resources Instead of Fixed Servers

You can take advantage of the dynamically provisioned nature of cloud computing where servers and other components are temporary resources, launched and terminated as needed.

Immutable infrastructure: A server is never updated but instead replaced with a new one that has the latest configuration. Avoids “configuration drift” and ensures resources are always in a consistent (and tested) state.

When it comes to Instantiating Compute Resources, there are a few approaches to achieving an automated and repeatable process for setting up new resources and their configuration and code:

Bootstrapping: Scripts that install software or copy data to bring a new resource to a particular state.

Golden Images: Certain AWS resource types (e.g., EC2 & RDS DB instances) can be launched from an AMI. Compared to the bootstrapping approach, this results in faster start times.

Containers: A Docker Image is a standardized unit for software development, containing everything the software needs to run.

Hybrid: It is possible to use a combination where some parts of the configuration are captured in a golden image, while others are configured dynamically through a bootstrapping action.


You can make your whole infrastructure reusable, maintainable, extensible, and testable by treating Infrastructure as Code. AWS CloudFormation templates, stored in version control, provide a way to create and manage a collection of related AWS resources, and to provision and update them in an orderly and predictable fashion.
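A minimal CloudFormation template illustrating the idea (the logical resource name and AMI ID below are placeholders, not values from the original text); versioned alongside application code, it makes the infrastructure reproducible:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Single EC2 instance managed as code
Resources:
  WebServer:                      # illustrative logical name
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-12345678       # placeholder AMI ID
```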


With AWS there is opportunity for automation to improve reactions to a variety of events:

– Amazon EC2 Auto recovery: You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired.

– Auto Scaling: scale EC2 capacity up or down automatically according to conditions you define.

– Amazon CloudWatch Alarms: You can create a CloudWatch alarm that triggers when a particular metric goes beyond a threshold.

– Amazon CloudWatch Events: Delivers a stream of system events that describe changes in AWS resources.

Loose Coupling

A desirable attribute of an IT system is that it can be broken into smaller, loosely coupled components. A failure in one component should not cascade to other components.

Reduce interdependencies by allowing components to interact only through specific, technology-agnostic Well-Defined Interfaces.

There needs to be a way for each service to be addressed without prior knowledge of its network topology details. Loose coupling is crucial if you want to take advantage of the elasticity of cloud computing. Service discovery can be achieved through ELBs, with their stable hostnames, or through a service registry that allows looking up the IP/port of any service, e.g., HashiCorp Consul.

Rather than direct point-to-point integration, Asynchronous Integration can be used, where one component generates events and another consumes them. If a consumer fails, the message can still be read later, and a less scalable service can be protected from spikes.

Another way to increase loose coupling is to build applications in such a way that they support Graceful Failure – handling component failure in a graceful manner, e.g., with retries, by responding with cached content, or by failing over.
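The retry-then-degrade idea can be sketched as follows (the function names, retry counts, and cache are all illustrative): retry a flaky dependency with exponential backoff, then fall back to cached content instead of propagating the failure.

```python
import time

cache = {"greeting": "hello (cached)"}  # previously stored good response

def fetch_with_fallback(key, fetch, retries=3, base_delay=0.01):
    for attempt in range(retries):
        try:
            value = fetch(key)
            cache[key] = value                       # refresh cache on success
            return value
        except ConnectionError:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return cache.get(key)                            # degrade to stale content

def always_down(key):
    # Simulated failing dependency.
    raise ConnectionError("service unavailable")

print(fetch_with_fallback("greeting", always_down))  # hello (cached)
```

The caller still gets a usable (if stale) response, so the failure does not cascade to dependent components.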

Services, Not Servers

Consider using managed services, or serverless architectures.


Relational databases can scale vertically (e.g., larger instance), or horizontally with read replicas. Data partitioning (aka sharding) can be used to scale write capacity beyond a single DB instance.
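Sharding can be sketched as a hash from partition key to database instance, so writes spread across shards instead of hitting one node. The shards here are plain dicts standing in for real DB instances; the key scheme is illustrative.

```python
import hashlib

shards = [{}, {}, {}]  # three hypothetical DB instances

def shard_for(key):
    # The partition key deterministically selects one shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def put(key, value):
    shard_for(key)[key] = value   # write lands on exactly one shard

def get(key):
    return shard_for(key).get(key)

for user_id in ("u1", "u2", "u3", "u4"):
    put(user_id, {"id": user_id})

print(get("u3"))
print(sum(len(s) for s in shards))  # 4 records, each on exactly one shard
```

The trade-off is that cross-shard queries and rebalancing become the application's problem, which is why sharding is reserved for scaling writes beyond a single instance.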

If your application primarily indexes and queries data with no need for joins or complex transactions, consider NoSQL databases instead, which trade some of the query and transaction capabilities of relational databases for a more flexible data model that scales horizontally.

For analysis and decision-making from large amounts of data, consider a Data Warehouse, which is a specialized type of relational database, combining data from disparate sources. Scalability and efficiencies are achieved through a combination of massively parallel processing (MPP), columnar data storage, and data compression.

Removing Single Points of Failure

Redundancy (multiple resources for the same task) can be implemented in either standby (failover takes time) or active (requests are distributed; on failure, the rest absorb a larger share) mode.

Data replication introduces redundant copies of data. Synchronous replication acknowledges a transaction only after it has been durably stored on both the primary and its replicas. Asynchronous replication decouples the primary node from its replicas at the expense of replication lag.

Optimize for Cost

Weigh many instances of the cheapest adequate AWS instance type against fewer instances of a larger instance type. Benchmark and select the right instance type depending on how your workload utilizes CPU, RAM, network, storage size, and I/O.

Plan to implement Auto Scaling for as many Amazon EC2 workloads as possible, so that you scale out horizontally when needed and scale in automatically to reduce your spend when you don’t need all that capacity anymore. Also, consider which compute workloads you could run serverless.

There are several ways to pay for Amazon EC2 instances that can help you reduce spend.

  • On-Demand
  • Spot Instances
  • Reserved Instances
  • Dedicated Hosts


Caching

Consider using Application Data Caching for I/O-intensive database queries or the outcome of computationally intensive processing, to improve latency for end users and reduce load on back-end systems.
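The usual shape of this is the cache-aside pattern, sketched here with a dict standing in for a cache service such as ElastiCache (function names and the counter are illustrative): check the cache before running the expensive operation, and populate it on a miss.

```python
cache = {}
db_reads = {"count": 0}           # instrumentation to show the effect

def expensive_query(key):
    db_reads["count"] += 1        # stand-in for a slow database query
    return key.upper()

def get_cached(key):
    if key not in cache:          # cache miss: hit the backend once
        cache[key] = expensive_query(key)
    return cache[key]             # cache hit: no backend load

print(get_cached("report"))       # first call populates the cache
print(get_cached("report"))       # second call is served from the cache
print(db_reads["count"])          # 1: the backend was queried only once
```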

Consider Edge Caching (e.g., via the CloudFront CDN) for copies of static and dynamic content, to allow content to be served by the closest infrastructure and to lower latency.


Security

Utilize AWS Features for Defense in Depth, including:

  • A VPC topology that isolates parts of the infrastructure through the use of subnets, security groups, and routing controls.
  • A web application firewall (e.g. AWS WAF) to protect your web applications from SQL injection and other vulnerabilities
  • IAM for access control

Utilize Security as Code to script your security policies and reliably deploy them, so that they become part of your continuous integration pipeline. Perform Real-Time Auditing of your environment by using services like AWS Config, Amazon Inspector, and AWS Trusted Advisor to continually monitor for compliance or vulnerabilities.


Make the most of the AWS platform by considering important principles and design patterns: from how to select the right database for your application, to architecting applications that can scale horizontally and with high availability.
