The Server Labs Blog Rotating Header Image

Persistence Strategies for Amazon EC2

At The Server Labs, we often run into a need that comes naturally with the on-demand nature of Cloud Computing. Namely, we’d like to keep our team’s AWS bill in check by ensuring that we can safely turn off our Amazon EC2 instances when not in use. In fact, we’d like to take this practice one step further and automate the times when an instance should be operational (e.g. only during business hours). Stopping an EC2 instance is easy enough, but how do we ensure that our data and configuration persist across server restarts? Fortunately, there are a number of possible approaches to solve the persistence problem, but each one brings its own pitfalls and tradeoffs. In this article, we analyze some of the major persistence strategies and discuss their strengths and weaknesses.

A Review of EC2

Since Amazon introduced their EBS-backed AMIs in late 2009 [1], there has been a great deal of confusion around how this AMI type impacts the EC2 lifecycle operations, particularly in the area of persistence. In this section, we’ll review the often-misunderstood differences between S3 and EBS-backed EC2 instances, which will be crucial as we prepare for our discussion of persistence strategies.

Not All EC2 Instances are Created Equal

An Amazon EC2 instance can be launched from one of two types of AMIs: the traditional S3-backed AMI and the new EBS-backed AMI [2]. These two AMIs exhibit a number of differences, for example in their lifecycle [3] and data persistence characteristics [4]:

Characteristic Amazon EBS Instance (S3) Store
Lifecycle Supports stopping and restarting of instance by saving state to EBS. Instance cannot be stopped; it is either running or terminated.
Data persistence Data persists in EBS on instance failure or restart. Data can also be configured to persist when instance is terminated, although it does not do so by default. Instance storage does not persist on instance shutdown or failure. It is possible to attach non-root devices using EBS for data persistence as needed.

As explained in the table above, EBS-backed EC2 instances introduce a new stopped state, unavailable for S3-backed instances. It is important to note that while an instance is in a stopped state, it will not incur any EC2 running costs. You will, however, continue to be billed for the EBS storage associated with your instance. The other benefit over S3-backed instances is that a stopped instance can be started again while maintaining its internal state. The following diagrams summarize the lifecycles of both S3 and EBS-backed EC2 instances:

s3-lifecycle

Lifecycle of an S3-backed EC2 Instance.

ebs-lifecycle

Lifecycle of an EBS-backed EC2 Instance.

Note that while the instance ID of a restarted EBS-backed instance will remain the same, it will be dynamically assigned a new set of public and private IP and DNS addresses. If you would like assign a static IP address to your instance, you can still do so by using Amazon’s Elastic IP service [5].

Persistence Strategies

With an understanding of the differences between S3 and EBS-backed instances, we are well equipped to discuss persistence strategies for each type of instance.

Persistence Strategy 1: EBS-backed Instances

First, we’ll start with the obvious choice: EBS-backed instances. When this type of instance is launched, Amazon automatically creates an Amazon EBS volume from the associated AMI snapshot, which then becomes the root device. Any changes to the local storage are then persisted in this EBS volume, and will survive instance failures and restarts. Note that by default, terminating an EBS-backed instance will also delete the EBS volume associated with it (and all its data), unless explicitly configured not to do so [6].

Persistence Strategy 2: S3-backed Instances

In spite of their ease of use, EBS-backed instances present a couple of drawbacks. First, not all software and architectures are supported out-of-the-box as EBS-backed AMIs, so the version of your favorite OS might not be available. Perhaps more importantly, the EBS volume is mounted as the root device, meaning that you will also be billed for storage of all static data such as operating systems files, etc., external to your application or configuration.

To circumvent these disadvantages, it is possible to use an S3-backed EC2 instance that gives you direct control over what files to persist. However, this flexibility comes at a price. Since S3-backed instances use local storage as their root device, you’ll have to manually attach and mount an EBS volume for persisting your data. Any data you write directly to your EBS mount will be automatically persisted. Other times, configuration files exist at standard locations outside of your EBS mount where you will still want to persist your changes. In such situations, you would typically create a symlink on the root device to point to your EBS mount.

For example, assuming you have mounted your EBS volume under /ebs, you would run the following shell commands to persist your apache2 configuration:

# first backup original configuration
mv /etc/apache2/apache2.conf{,.orig}
# use your persisted configuration from EBS by creating a symlink
ln –s /ebs/etc/apache2/apache2.conf /etc/apache2/apache2.conf

Once your S3-backed instance is terminated, any local instance storage (including symlinks) will be lost, but your original data and configuration will persist in your EBS volume. If you would then like to recover the state persisted in EBS upon launching a new instance, you will have to go through the process of recreating any symlinks and/or copying any pertinent configuration and data from your EBS mount to your local instance storage.

Synchronizing between the data persisted in EBS and that in the local instance storage can become complex and difficult to automate when launching new instances. In order to help with these tasks, there are a number of third-party management platforms that provide different levels of automation. These are covered in more detail in the next section.

Persistence Strategy 3: Third-party Management Platforms

In the early days of AWS, there were few and limited third-party platforms available for managing and monitoring your AWS infrastucture. Moreover, in order to manage and monitor your instances for you, these types of platforms necessarily need access to your EC2 instance keys and AWS credentials. Although a reasonable compromise for some, this requirement could pose an unacceptable security risk for others, who must guarantee the security and confidentiality of their data and internal AWS infrastructure.

Given these limitations, The Server Labs developed its own Cloud Management Framework in Ruby for managing EC2 instances, EBS volumes and internal SSH keys in a secure manner. Our framework automates routine tasks such as attaching and mounting EBS volumes when launching instances, as well providing hooks for the installation and configuration of software and services at startup based on the data persisted in EBS. It even goes one step further by mounting our EBS volumes using an encrypted file system to guarantee the confidentiality of our internal company data.

Today, companies need not necessarily develop their homegrown frameworks, and can increasingly rely on third-party platforms. An example of a powerful commercial platform for cloud management is Rightscale. For several of our projects, we rely on Rightscale to automatically attach EBS volumes when launching new EC2 instances. We also make extensive use of scripting to install and configure software onto our instances automatically at boot time using Rightscale’s Righscript technology [7]. These features make it easy to persist your application data and configuration in EBS, while automating the setup and configuration of new EC2 instances associated with one or more EBS volumes.

Automating Your Instance Uptimes

Now that we have discussed the major persistence strategies for Amazon EC2, we are in a good position to tackle our original use case. How can we schedule an instance in Amazon so that it is only operational during business hours? After all, we’d really like to avoid getting billed for instance uptime during times when it is not really needed.

To solve this problem, we’ll have to address two independent considerations. First, we’ll have to ensure that all of our instance state (including data and configuration) is stored persistently. Second, we’ll have to automate the starting and stopping of our instance, as well as restoring its state from persistent storage at boot time.

Automation Strategy 1: EBS-backed Instances

By using an EBS-backed instance, we ensure that all of its state is automatically persisted even if the instance is restarted (provided it is not terminated). Since the EBS volume is mounted as the root device, no further action is required to restore any data or configuration. Last, we’ll have to automate starting and stopping of the instance based on our operational times. For scheduling our instance uptimes, we can take advantage of the Linux cron service. For example, in order to schedule an instance to be operational during business hours (9am to 5pm, Monday-Friday), we could create the following two cron jobs:

0 9 * * 1-5 /opt/aws/bin/ec2-start.sh i-10a64379
0 17 * * 1-/opt/aws/bin/ec2-stop.sh i-10a64379

The first cron job will schedule the EBS-backed instance identified by instance ID i-10a64379 to be started daily from Monday to Friday at 9am. Similarly, the second job schedules the same instance to be stopped at 5pm Monday through Friday. The cron service invokes the helper scripts ec2-start.sh and ec2-stop.sh to facilitate the configuration of the AWS command-line tools according to your particular environment. You could run this cron job from another instance in the cloud, or you could have a machine in your office launch it.

The following snippet provides sample contents for ec2-start.sh, which does the setup necessary to invoke the AWS ec2-start-instances command. Note that this script assumes that your EBS-backed instance was previously launched manually and you know its instance ID.

#!/bin/bash
# Name: ec2-start.sh
# Description: this script starts the EBS-backed instance with the specified Instance ID
# by invoking the AWS ec2-start-instances command
# Arguments: the Instance ID for the EBS-backed instance that will be started.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20
export EC2_HOME=/opt/aws/ec2-api-tools-1.3
export EC2_PRIVATE_KEY=/opt/aws/keys/private-key.pem
export EC2_CERT=/opt/aws/keys/cert.pem
# uncomment the following line to use Europe as the default Zone
#export EC2_URL=https://ec2.eu-west-1.amazonaws.com
PATH=$PATH:${EC2_HOME}/bin
INSTANCE_ID=$1

echo "Starting EBS-backed instance with ID ${INSTANCE_ID}"
ec2-start-instances ${INSTANCE_ID}

Similarly, ec2-stop.sh would stop your EBS-backed instance by invoking ec2-stop-instances followed by your instance ID. Note that the instance ID of EBS-backed instances will remain the same across restarts.

Automation Strategy 2: S3-backed Instances

Amazon instances backed by S3 present the additional complexity that the local storage is not persistent and will be lost upon terminating the instance. In order to persist application data and configuration changes independently of the lifecycle of our instance, we’ll have to rely on EBS. Additionally, we’ll have to carefully restore any persisted state upon launching a new EC2 instance.

The Server Labs Cloud Manager allows us to automate these tasks. Among other features, it automatically attaches and mounts a specified EBS volume when launching a new EC2 instance. It also provides hooks to invoke one or more startup scripts directly from EBS. These scripts are specific to the application, and can be used to restore instance state from EBS, including any appropriate application data and configuration.

If you must use S3-backed instances for your solution, you’ll either have to develop your own framework along the lines of The Server Labs Cloud Manager, or rely on third-party management platforms like Rightscale. Otherwise, EBS-backed instances provide the path of least resistance to persisting your instance data and configuration.

Automation Strategy 3: Rightscale

Rightscale provides a commercial platform with support for boot time scripts (via Righscripts) and automatic attachment of EBS volumes. In addition, Rightscale allows applications to define arrays of servers that grow and shrink based on a number of parameters. By using the server array schedule feature, you can define how an alert-based array resizes over the course of a week [8], and thus ensure a single running instance of your server during business hours. In addition, leveraging boot time scripts and the EBS volume management feature enables you to automate setup and configuration of new instances in the array, while persisting changes to your application data and configuration. Using these features, it is possible to build an automated solution for a server that operates during business hours, and that can be shutdown safely when not in use.

Conclusion

This article describes the major approaches to persisting state in Amazon EC2. Persisting state is crucial to building robust and highly-available architectures with the capacity to scale. Not only does it promote operational efficiency by only consuming resources when a need exists; it also protects your application state so that it if your instant fails or is accidentally terminated you can automatically launch a new one and continue where you left off. In fact, these same ideas can also enable your application to scale seamlessly by automatically provisioning new EC2 instances in response to a growth in demand.

References

[1] New Amazon EC2 Feature: Boot from Elastic Block Store. Original announcement from Amazon explaining the new EC2 boot from EBS feature.

[2] Amazon Elastic Compute Cloud User Guide: AMI Basics. Covers basic AMI concepts for S3 and EBS AMI types.

[3] The EC2 Instance Life Cycle: excellent blog post describing major lifecycle differences between S3 and EBS-backed EC2 instances.

[4] Amazon Elastic Compute Cloud User Guide: AMIs Backed by Amazon EBS. Learn about EBS-backed AMIs and how they work.

[5] AWS Feature Guide: Amazon EC2 Elastic IP Addresses. An introduction to Elastic IP Addresses for Amazon EC2.

[6] Amazon Elastic Compute Cloud User Guide: Changing the Root Volume to Persist. Learn how to configure your EBS-backed EC2 instance so that the associated EBS volume is not deleted upon termination.

[7] RightScale User Guide: RightScripts. Learn how to write your own RightScripts.

[8] RightScale User Guide: Server Array Schedule. Learn how to create an alert-based array to resize over the course of the week.

3 Comments

  1. […] changes to their network their servers occasionally restart and will take your instances with them. This page has some great information about persisting data across restarts, specifically, the section called […]

  2. […] and moving data from a backup). If you’re concerned about EBS vs S3 backed instances, check out this article from theserverlabs, they detail the pros and cons of each and offer a number of usage cases. My guess is that most […]

  3. […] changes to their network their servers occasionally restart and will take your instances with them. This page has some great information about persisting data across restarts, specifically, the section called […]

Leave a Reply