

AWS Best Practices Architecting for the Cloud – Abridged

The following is an abridged version of Architecting for the Cloud: AWS Best Practices. It is essentially a cut-and-paste of the most salient parts (the original is about 13,000 words; this is about 4,000). For an even more concise version, see the concise summary (about 1,500 words).



The whitepaper provides architectural patterns and advice on how to design systems that are secure, reliable, high performing, and cost efficient. It includes a discussion on how to take advantage of attributes that are specific to the dynamic nature of cloud computing (elasticity, infrastructure automation, etc.).


Migrating applications to AWS, even without significant changes (“lift and shift”), provides a secure and cost-efficient infrastructure. However, to make the most of the elasticity and agility possible with cloud computing, engineers will have to evolve their architectures to take advantage of AWS capabilities.

This paper will highlight the principles to consider whether you are migrating existing applications to AWS or designing new applications for the cloud.

The Cloud Computing Difference

This section reviews how cloud computing differs from a traditional environment and why those new best practices have emerged.

IT Assets Become Programmable Resources

In a non-cloud environment you provision capacity based on a guess of a theoretical maximum peak. This can result in periods where resources are idle or of insufficient capacity. With cloud computing, you can access as much or as little as you need, and dynamically scale to meet actual demand, while only paying for what you use.

On AWS, resources such as servers, databases and storage, can be instantiated and disposed of within seconds.

Global, Available, and Unlimited Capacity

Using the global infrastructure of AWS, you can deploy your application to the AWS Region that best meets your requirements.

For global applications, you can reduce latency to end users around the world by using the Amazon CloudFront CDN.

It is also easier to operate production applications and databases across multiple data centers to achieve high availability and fault tolerance.

And there is virtually unlimited on-demand capacity available to AWS customers.

Higher Level Managed Services

AWS services reduce dependency on in-house specialized skills, and can lower operational complexity, risk and cost.

Security Built In

The AWS cloud provides governance capabilities that enable continuous monitoring of configuration changes.
Your security policy can be formalized and embedded with the design of your infrastructure. Architects can leverage a plethora of native AWS security and encryption features that can help achieve higher levels of data protection and compliance.

Design Principles


Scalability

A scalable architecture should support growth in users, traffic, or data size with no drop in performance. It should provide that scale in a linear manner, where adding extra resources results in at least a proportional increase in ability to serve additional load.
There are generally two ways to scale: vertically and horizontally.

Scaling Vertically

Increase the specifications of an individual resource. On Amazon EC2, this can be achieved by stopping an instance and resizing it to an instance type that has more RAM, CPU, IO, or networking capabilities. This way of scaling can eventually hit a limit. However, it is very easy to implement and can be sufficient for many use cases, especially in the short term.

Scaling Horizontally

Scaling horizontally takes place through an increase in the number of resources (e.g., adding more servers, or more hard drives to a storage array); this is also known as elasticity. Not all architectures are designed to distribute their workload to multiple resources, so let’s examine some of the possible scenarios.

Stateless Applications

An application that needs no knowledge of previous interactions and stores no session information: given the same input, it provides the same response to any user. A stateless application can scale horizontally since any request can be serviced by any resource. Those resources do not need to be aware of the presence of their peers.

How to distribute load to multiple nodes

Push model: Use a load balancing solution like the Elastic Load Balancing (ELB) service. ELBs route requests across multiple EC2 instances. An alternative approach would be DNS round robin (e.g., with Amazon Route 53).

Pull model: Asynchronous event-driven workloads can use a pull model instead. Tasks that need to be performed can be stored as messages in a queue using Amazon SQS, or in a streaming data solution like Amazon Kinesis. Multiple compute nodes can then pull and consume those messages, processing them in a distributed fashion.
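The pull model can be sketched locally in plain Python, with `queue.Queue` standing in for an SQS queue and threads standing in for the consuming compute nodes (the processing step here is a placeholder):

```python
import queue
import threading

# In-memory stand-in for a message queue; on AWS this role would be
# played by an SQS queue or a Kinesis stream.
task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    """Pull messages until the queue is drained, processing each one."""
    while True:
        try:
            msg = task_queue.get_nowait()
        except queue.Empty:
            return
        with results_lock:
            results.append(msg.upper())  # placeholder "processing" step
        task_queue.task_done()

for m in ["alpha", "beta", "gamma"]:
    task_queue.put(m)

# Any number of identical consumers can pull from the same queue.
threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['ALPHA', 'BETA', 'GAMMA']
```

Because consumers are interchangeable, adding capacity is just a matter of starting more of them, which is exactly what makes this model scale horizontally.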

Stateless Components

In practice, most applications need to maintain some kind of state, e.g., to track whether a user is signed in. You can still make a portion of these architectures stateless by not storing anything in the local file system that needs to persist for more than a single request.

For example, web applications can use HTTP cookies to store information about a session. The browser passes that information back to the server at each subsequent request so that the application does not need to store it.
1) The content of the HTTP cookies needs to be treated as untrusted data and validated.
2) HTTP cookies are transmitted with every request, so their size needs to be kept to a minimum.

Consider storing only a unique session identifier in an HTTP cookie and storing more detailed user session information server-side, e.g., in a database.
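A minimal sketch of that pattern, with a plain dict standing in for the server-side session store (in production this would be a database or a managed cache):

```python
import secrets

# Server-side session store; a dict stands in for a database or cache.
sessions = {}

def create_session(user_id):
    """Store session details server-side; only the ID goes in the cookie."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"user_id": user_id, "cart": []}
    return session_id  # this value becomes the cookie, e.g. Set-Cookie: sid=...

def load_session(session_id):
    """Treat the cookie value as untrusted input: look it up, never decode it."""
    return sessions.get(session_id)  # None if forged, expired, or unknown

sid = create_session("user-42")
assert load_session(sid)["user_id"] == "user-42"
assert load_session("forged-id") is None
```

The cookie stays tiny, nothing sensitive leaves the server, and any web node that can reach the session store can serve the request.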

Stateful Components

Inevitably, there will be layers of your architecture that are stateful; databases, by definition, are stateful. In addition, many legacy applications were designed to run on a single server because they rely on local compute resources.

You might still be able to scale those components horizontally by distributing load to multiple nodes with “session affinity” – binding all transactions of a session to a specific compute resource. But there are, of course, limitations:
a) Existing sessions do not directly benefit from newly launched compute nodes.
b) Session affinity cannot be guaranteed, e.g., when a server crashes.

How to implement session affinity

For HTTP/S traffic, session affinity can be achieved through the “sticky sessions” feature of ELB, where it will attempt to use the same server for that user for the duration of the session.

Another option is to use client-side load balancing. This adds extra complexity but can be needed if, for example, you are using a protocol not supported by ELB, or you need full control over how users are assigned to servers.

Distributed Processing

Use cases that involve processing of very large amounts of data (e.g., anything that can’t be handled by a single compute resource in a timely manner) require a distributed processing approach. By dividing a task and its data into many small fragments of work, you can execute each of them in any available compute resource.

How to implement distributed processing

Offline batch jobs can be horizontally scaled by using a distributed data processing engine like Apache Hadoop. On AWS, you can use the Amazon Elastic MapReduce (Amazon EMR) service to run Hadoop workloads on top of a fleet of EC2 instances. For real-time processing of streaming data, Amazon Kinesis partitions data in multiple shards that can then be consumed by multiple resources.
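As an illustration of the divide-and-process idea, here is a minimal local sketch: a thread pool stands in for a fleet of compute nodes (think EMR task nodes or Kinesis consumers), and each worker processes one small fragment of the data before the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(fragment):
    """Process one small fragment of the overall data set."""
    return len(fragment.split())

# The full job, split into fragments that any available worker can take.
fragments = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

# A thread pool stands in for a fleet of distributed compute nodes.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(word_count, fragments))

total = sum(partial_counts)  # combine the partial results
print(total)  # 9
```

This split-map-combine structure is the same shape a Hadoop or EMR job takes, just with processes on many machines instead of threads in one.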

Disposable Resources Instead of Fixed Servers

In a traditional infrastructure environment, you have to work with fixed resources with long lead times. Common practices include manually logging in to servers to configure software (resulting in “configuration drift”) or fix issues, and hardcoding IP addresses. With AWS, you can take advantage of the dynamically provisioned nature of cloud computing: servers and other components can be treated as temporary resources, launched and terminated as needed.

Use the immutable infrastructure pattern: A server is never updated throughout its lifetime. Instead, when needed, it is replaced with a new one that has the latest configuration. In this way, resources are always in a consistent (and tested) state and rollbacks become easier to perform.

Instantiating Compute Resources

It is important to automate the setup of new resources, along with their configuration and code. There are a few approaches to achieving an automated and repeatable process.


Bootstrapping

When you launch an AWS resource, you start with a default configuration. You can then execute automated bootstrapping actions, that is, scripts that install software or copy data to bring that resource to a particular state. You can parameterize configuration details that vary between environments (e.g., production, test) so that the same scripts can be reused without modification.

Golden Images

Certain AWS resource types (e.g., EC2 instances, RDS DB instances, EBS volumes) can be launched from a golden image: a snapshot of a particular state of that resource. Compared to the bootstrapping approach, a golden image results in faster start times and removes dependencies on configuration services or third-party repositories.

You can customize an Amazon EC2 instance and then save its configuration by creating an AMI. Each time you want to change your configuration you will need to create a new golden image.


Another option popular with developers is Docker. Docker allows you to package a piece of software in a Docker Image, which is a standardized unit for software development, containing everything the software needs to run.


It is possible to use a combination where some parts of the configuration are captured in a golden image, while others are configured dynamically through a bootstrapping action.

Bootstrapping vs golden image

Golden image: Items that do not change often or that introduce external dependencies will typically be part of your golden image. e.g., your web server is best placed in an image rather than being downloaded from a third-party repository each time you launch an instance.

Bootstrapping: Items that change often or differ between your various environments, e.g., database configuration or new versions of your application (creating a new AMI for each change might be overkill).

AWS Elastic Beanstalk follows a hybrid model. It provides preconfigured runtime environments from an AMI but allows you to run bootstrap actions (through configuration files).

Infrastructure as Code

The application of all these principles does not have to be limited to the individual resource level. Since AWS assets are programmable, you can make your whole infrastructure reusable, maintainable, extensible, and testable.
AWS CloudFormation templates provide a way to create and manage a collection of related AWS resources, and provision and update them in an orderly and predictable fashion. Your CloudFormation templates can live with your app in VCS, allowing architectures to be reused and production environments to be reliably cloned for testing.


Automation

In a traditional IT infrastructure, you would often have to manually react to a variety of events. With AWS, there is an opportunity to use automation to improve both your system’s stability and the efficiency of your organization:

– AWS Elastic Beanstalk: A simple way to get an application up and running. Upload the application code and the service automatically handles all the details, such as resource provisioning, load balancing, auto scaling, and monitoring.

– Amazon EC2 Auto Recovery: You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired. A recovered instance is identical to the original instance.

– Auto Scaling: You can maintain application availability and scale your Amazon EC2 capacity up or down automatically according to conditions you define.

– Amazon CloudWatch Alarms: You can create a CloudWatch alarm that sends an SNS message when a particular metric goes beyond a threshold for a specified number of periods.

– Amazon CloudWatch Events: Delivers a stream of system events that describe changes in AWS resources. Using simple rules, you can easily route each type of event to one or more targets: Lambda, Kinesis, Amazon SNS etc.

– AWS OpsWorks Lifecycle events: AWS OpsWorks supports continuous configuration through lifecycle events that automatically update your instances’ configuration to adapt to environment changes.

– AWS Lambda Scheduled events: These events allow you to create a Lambda function and direct AWS Lambda to execute it on a regular schedule.

Loose Coupling

A desirable attribute of an IT system is that it can be broken into smaller, loosely coupled components. A failure in one component should not cascade to other components.

Well-Defined Interfaces

A way to reduce interdependencies is to allow components to interact with each other only through specific, technology-agnostic interfaces, e.g., RESTful APIs. Technical implementation detail is hidden so that teams can modify the underlying implementation without affecting other components. As long as those interfaces maintain backwards compatibility, deployments of different components are decoupled.

Service Discovery

Because services can be running across multiple compute resources, there needs to be a way for each service to be addressed. Traditionally, you would hardcode IP addresses. But if those services are meant to be loosely coupled, they should be able to be consumed without prior knowledge of their network topology details. This hides complexity and allows infrastructure details to change at any time. Loose coupling is crucial if you want to take advantage of the elasticity of cloud computing.

How to implement service discovery

A simple way to achieve service discovery is through the Elastic Load Balancing service. Because each load balancer gets its own hostname, you can consume a service through a stable endpoint.

Another option would be to use a service registration and discovery method to allow retrieval of the endpoint IP addresses and port number of any given service. Example open source tools include Netflix Eureka, Airbnb Synapse, or HashiCorp Consul.
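The register/discover idea behind tools like Consul or Eureka can be illustrated with a toy in-process registry (all service names and addresses here are hypothetical):

```python
import random

# Toy in-process registry; real deployments would use Consul, Eureka,
# or simply a stable ELB hostname.
registry = {}

def register(service, host, port):
    """Each instance announces itself when it starts up."""
    registry.setdefault(service, []).append((host, port))

def discover(service):
    """Callers look up an endpoint by service name instead of
    hardcoding IP addresses."""
    endpoints = registry.get(service)
    if not endpoints:
        raise LookupError(f"no instances of {service!r} registered")
    return random.choice(endpoints)  # naive client-side load balancing

# Hypothetical service instances registering themselves:
register("payments", "10.0.1.12", 8080)
register("payments", "10.0.2.7", 8080)
host, port = discover("payments")
```

Because consumers only know the service name, instances can come and go (or move to new IP addresses) without any caller needing to change.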

Asynchronous Integration

Asynchronous integration is another form of loose coupling between services. It is suitable for any interaction that does not need an immediate response and where an acknowledgement that a request has been registered will suffice. One component generates events and another consumes them. Rather than direct point-to-point interaction, the two components integrate through an intermediate durable storage layer (e.g., an SQS queue or a streaming data platform like Kinesis).

This approach decouples the two components and introduces additional resiliency. For example, if a component fails, the message can still be read later; a less scalable service can be protected from spikes, e.g., you don’t need to scale your database to accommodate an occasional peak of write queries if it is acceptable to process those queries asynchronously. Finally, by moving slow operations off interactive request paths you can also improve the end-user experience.

Graceful Failure

Another way to increase loose coupling is to build applications in such a way that they handle component failure in a graceful manner.

Graceful failure in practice

A request that fails can:

  • Be retried with an exponential backoff and jitter strategy.
  • Be stored in a queue for later processing.
  • Receive a response of alternative or cached content, e.g., on database failure.
  • Fail over to a backup site or service, e.g., Amazon Route 53 DNS failover.
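Retry with exponential backoff and jitter can be sketched as follows (this is the “full jitter” variant; the delays and attempt counts are illustrative, and the flaky operation is a stand-in for a real remote call):

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, cap=2.0):
    """Retry a failing call, sleeping up to base_delay * 2**attempt (capped),
    with 'full jitter' so that retrying clients do not synchronize."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            delay = min(cap, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter

# Demo: an operation that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky))  # ok
```

The jitter matters: without it, a fleet of clients that failed at the same moment would all retry at the same moment, re-creating the spike that caused the failure.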

Services, Not Servers

Developing, managing, and operating applications—especially at scale—requires a wide variety of underlying technology components. With traditional IT infrastructure, companies would have to build and operate all those components.

AWS includes services that provide building blocks that developers can consume to power their applications. Another approach that can reduce the operational complexity of running applications is that of serverless architectures.


Databases

The following questions can help you decide which database solution to utilize:

  • Is this a read-heavy, write-heavy, or balanced workload?
  • How much data will you need to store and for how long? Is there an upper limit in the foreseeable future?
  • What are the requirements in terms of durability of data?
  • What are your latency requirements?
  • What is your data model and how are you going to query the data?
  • Do you need strong integrity controls or are you looking for more flexibility (e.g., schema-less data models)?
  • Are your developers more familiar with relational databases than NoSQL alternatives?

Relational Databases

Relational databases normalize data into well-defined tabular structures known as tables, which consist of rows and columns. They provide a powerful query language, flexible indexing capabilities, strong integrity controls, and the ability to combine data from multiple tables in a fast and efficient manner.


Relational databases can scale vertically (e.g., by upgrading to a larger instance or adding more/faster storage).

For read-heavy applications, you can also horizontally scale beyond the capacity constraints of a single DB instance by creating one or more read replicas.

Workloads that need to scale their write capacity beyond the constraints of a single DB instance require data partitioning, also known as sharding. Data is split across multiple database schemas, which introduces some complexity to the application. The application’s data access layer will need to be modified to have awareness of how data is split so that it can direct queries to the right instance. In addition, any schema changes will have to be performed across multiple database schemas, so it is worth automating this process.
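The routing logic of such a data access layer can be as simple as a stable hash over the partition key. A sketch (the shard names are hypothetical):

```python
import hashlib

# Hypothetical shard endpoints; in practice these would be DB hostnames.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(key: str) -> str:
    """Data access layer: map a partition key to the shard holding it.
    A stable hash (not Python's per-process randomized hash()) keeps
    routing consistent across processes and restarts."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]

# Every query for the same customer is routed to the same shard.
assert shard_for("customer-1001") == shard_for("customer-1001")
```

Note that simple modulo routing remaps most keys when the shard count changes; schemes like consistent hashing reduce that cost, which is one reason resharding is worth planning for (and automating) early.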

High Availability

For any production relational database, we recommend the use of the Amazon RDS Multi-AZ deployment feature, which creates a synchronously replicated standby instance in a different Availability Zone (AZ). In case of failure of the primary node, Amazon RDS performs an automatic failover to the standby without the need for manual administrative intervention.


If your application primarily indexes and queries data with no need for joins or complex transactions, consider a NoSQL database instead. If you have large binary files (audio, video, images), consider Amazon S3.

NoSQL Databases

NoSQL describes databases that trade some of the query and transaction capabilities of relational databases for a more flexible data model that scales horizontally. NoSQL databases utilize a variety of data models, including graphs, key-value pairs, and JSON documents.

Data Warehouse

A data warehouse is a specialized type of relational database, optimized for analysis and reporting of large amounts of data. It can be used to combine transactional data from disparate sources, making it available for analysis and decision-making.


Amazon Redshift achieves efficient storage and performance through a combination of massively parallel processing (MPP), columnar data storage, and targeted data compression encoding schemes. It is particularly suited to analytic and reporting workloads against very large data sets.


Search

A search service can be used to index and search both structured data and free-form text, and can support functionality that is not available in other databases, such as customizable result ranking, faceting for filtering, synonyms, and stemming. On AWS, you have the choice between Amazon CloudSearch and Amazon Elasticsearch Service (Amazon ES).

Removing Single Points of Failure

A system is highly available when it can withstand the failure of an individual component or multiple components. This section discusses high availability design patterns.


Introducing Redundancy

Single points of failure can be removed by introducing redundancy: having multiple resources for the same task. Redundancy can be implemented in either standby mode (failover takes time) or active mode (requests are distributed across resources; on failure, the rest absorb a larger share).

Detect Failure

You should aim to build as much automation as possible in both detecting and reacting to failure. You can use services like ELB and Amazon Route 53 to configure health checks and mask failure by routing traffic to healthy endpoints. In addition, Auto Scaling can be configured to automatically replace unhealthy nodes.

Durable Data Storage

It is crucial that your architecture protects both data availability and integrity. Data replication is the technique that introduces redundant copies of data. Replication can take place in a few different modes.

Synchronous replication only acknowledges a transaction after it has been durably stored in both the primary location and its replicas.

Asynchronous replication decouples the primary node from its replicas at the expense of introducing replication lag.

Optimize for Cost

This section discusses the main principles of optimizing for cost with AWS cloud computing.

Right Sizing

In some cases, you should select the cheapest AWS instance type that suits your workload’s requirements. In other cases, using fewer instances of a larger instance type might result in lower total cost or better performance. You should benchmark and select the right instance type depending on how your workload utilizes CPU, RAM, network, storage size, and I/O.

Cost optimization is an iterative process. Your application and its usage will evolve through time. In addition, AWS iterates frequently and regularly releases new options.


Elasticity

Another way you can save money with AWS is by taking advantage of the platform’s elasticity. Plan to implement Auto Scaling for as many Amazon EC2 workloads as possible, so that you horizontally scale up when needed, and scale down to automatically reduce your spend when you don’t need all that capacity anymore. In addition, you can automate turning off non-production workloads when not in use. Finally, consider which compute workloads you could implement on AWS Lambda so that you never pay for idle or redundant resources.

Purchasing Options

(Note: this section has been updated from the original whitepaper, whose pricing details seem out of date.)

There are several ways to pay for Amazon EC2 instances that can help you reduce spend.


On-Demand Instances

With On-Demand instances, you pay for compute capacity per hour or per second, depending on which instances you run. No longer-term commitments or upfront payments are needed, and you can increase or decrease your compute capacity depending on demand.

Spot Instances

Amazon EC2 Spot instances allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price.

Spot instances are recommended for applications that:

  • have flexible start and end times
  • are only feasible at very low compute prices

Reserved Instances

Reserved Instances provide you with a significant discount (up to 75%) compared to On-Demand instance pricing.

Reserved Instances are recommended for:

  • Applications with steady state usage
  • Applications that may require reserved capacity
  • Customers that can commit to using EC2 over a 1 or 3 year term to reduce their total computing costs

Dedicated Hosts

A Dedicated Host is a physical EC2 server dedicated for your use. Dedicated Hosts can help you reduce costs by allowing you to use your existing server-bound software licenses (e.g., Windows Server, SQL Server), and can also help you meet compliance requirements.

Dedicated Hosts can be purchased On-Demand (hourly) or as a Reservation for up to 70% off the On-Demand price.


Caching

Caching is a technique that stores previously calculated data for future use.

Application Data Caching

Applications can be designed so that they store and retrieve information from fast, managed, in-memory caches. Cached information may include the results of I/O-intensive database queries or the outcome of computationally intensive processing. When the result set is not found in the cache, the application can calculate it or retrieve it from a database and store it in the cache for subsequent requests. Caches can improve latency for end users and reduce load on back-end systems.
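The cache-aside flow just described can be sketched with a small in-memory TTL cache standing in for a managed cache such as Amazon ElastiCache (the query function is a placeholder for a real database call):

```python
import time

class TTLCache:
    """Minimal in-memory cache with expiry; a stand-in for a managed
    cache such as Amazon ElastiCache."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

db_reads = {"n": 0}

def expensive_query(key):
    db_reads["n"] += 1  # stands in for an I/O-intensive database query
    return key.upper()

cache = TTLCache(ttl_seconds=60)

def get_value(key):
    """Cache-aside: serve from the cache, falling back to the database."""
    value = cache.get(key)
    if value is None:
        value = expensive_query(key)
        cache.set(key, value)
    return value

assert get_value("item") == "ITEM"
assert get_value("item") == "ITEM"  # second call served from the cache
assert db_reads["n"] == 1           # the database was hit only once
```

The TTL bounds staleness: a cached entry can be at most `ttl_seconds` behind the database, which is the usual trade-off this pattern accepts in exchange for lower latency and back-end load.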

Edge Caching

Copies of static and dynamic content can be cached at Amazon CloudFront, which is a content delivery network (CDN) consisting of multiple edge locations around the world. Edge caching allows content to be served by infrastructure that is closer to viewers, lowering latency and giving you high, sustained data transfer rates. Requests for your content are carried back to Amazon S3 or your origin servers. If the origin is running on AWS, then requests will be transferred over optimized network paths for a more reliable and consistent experience. Amazon CloudFront can be used to deliver your entire website, including non-cacheable content.

The benefit in that case is that Amazon CloudFront reuses existing connections between the Amazon CloudFront edge and the origin server, reducing connection setup latency for each origin request.


Security

AWS is a platform that allows you to formalize the design of security controls in the platform itself, making your environment much easier to audit in a continuous manner. This section gives a high-level overview of AWS security best practices.

Utilize AWS Features for Defense in Depth

AWS provides a wealth of features that can help architects build defense in depth.
Starting at the network level you can build a VPC topology that isolates parts of the infrastructure through the use of subnets, security groups, and routing controls. Services like AWS WAF, a web application firewall, can help protect your web applications from SQL injection and other vulnerabilities in your application code. For access control, you can use IAM to define a granular set of policies and assign them to users, groups, and AWS resources.

Offload Security Responsibility to AWS

With the AWS shared security responsibility model, where AWS is responsible for the security of the underlying cloud infrastructure and you are responsible for securing the workloads you deploy, you can reduce the scope of your responsibility and focus on your core competencies through the use of AWS managed services.

Reduce Privileged Access

Treating servers as programmable resources brings benefits in the security space as well. If an instance experiences an issue, you can automatically or manually terminate and replace it.

Another common source of security risk is the use of service accounts. In a traditional environment, service accounts would often be assigned long-term credentials stored in a configuration file. On AWS, you can instead use IAM roles to grant permissions to applications running on Amazon EC2 instances through the use of short-term credentials. Those credentials are automatically distributed and rotated.

Security as Code

Traditional security frameworks, regulations, and organizational policies define security requirements related to things such as firewall rules, network access controls, internal/external subnets, and operating system hardening.

You can create an AWS CloudFormation script that captures your security policy and reliably deploys it. Security best practices can now be reused among multiple projects and become part of your continuous integration pipeline. You can perform security testing as part of your release cycle, and automatically discover application gaps and drift from your security policy.

Real-Time Auditing

Testing and auditing your environment is key to moving fast while staying safe. On AWS, it is possible to implement continuous monitoring and automation of controls to minimize exposure to security risks. Services like AWS Config, Amazon Inspector, and AWS Trusted Advisor continually monitor for compliance or vulnerabilities.

AWS CloudTrail is a web service that records API calls to supported AWS services in your AWS account and delivers a log file to your Amazon S3 bucket.


Conclusion

This whitepaper provides guidance for designing architectures that make the most of the AWS platform, covering important principles and design patterns: from how to select the right database for your application, to architecting applications that can scale horizontally and with high availability. As each use case is unique, you will have to evaluate how these principles can be applied to your implementation.

