The Server Labs Blog


Persistence Strategies for Amazon EC2

At The Server Labs, we often run into a need that comes naturally with the on-demand nature of Cloud Computing. Namely, we’d like to keep our team’s AWS bill in check by ensuring that we can safely turn off our Amazon EC2 instances when not in use. In fact, we’d like to take this practice one step further and automate the times when an instance should be operational (e.g. only during business hours). Stopping an EC2 instance is easy enough, but how do we ensure that our data and configuration persist across server restarts? Fortunately, there are a number of possible approaches to solve the persistence problem, but each one brings its own pitfalls and tradeoffs. In this article, we analyze some of the major persistence strategies and discuss their strengths and weaknesses.

A Review of EC2

Since Amazon introduced their EBS-backed AMIs in late 2009 [1], there has been a great deal of confusion around how this AMI type impacts the EC2 lifecycle operations, particularly in the area of persistence. In this section, we’ll review the often-misunderstood differences between S3 and EBS-backed EC2 instances, which will be crucial as we prepare for our discussion of persistence strategies.

Not All EC2 Instances are Created Equal

An Amazon EC2 instance can be launched from one of two types of AMIs: the traditional S3-backed AMI and the new EBS-backed AMI [2]. These two AMIs exhibit a number of differences, for example in their lifecycle [3] and data persistence characteristics [4]:

Characteristic   | Amazon EBS                                | Instance Store (S3)
-----------------|-------------------------------------------|--------------------------------------
Lifecycle        | Supports stopping and restarting of the instance by saving state to EBS. | Instance cannot be stopped; it is either running or terminated.
Data persistence | Data persists in EBS on instance failure or restart. Data can also be configured to persist when the instance is terminated, although it does not do so by default. | Instance storage does not persist on instance shutdown or failure. It is possible to attach non-root EBS volumes for data persistence as needed.

As explained in the table above, EBS-backed EC2 instances introduce a new stopped state, unavailable for S3-backed instances. It is important to note that while an instance is in a stopped state, it will not incur any EC2 running costs. You will, however, continue to be billed for the EBS storage associated with your instance. The other benefit over S3-backed instances is that a stopped instance can be started again while maintaining its internal state. The following diagrams summarize the lifecycles of both S3 and EBS-backed EC2 instances:

s3-lifecycle

Lifecycle of an S3-backed EC2 Instance.

ebs-lifecycle

Lifecycle of an EBS-backed EC2 Instance.

Note that while the instance ID of a restarted EBS-backed instance will remain the same, it will be dynamically assigned a new set of public and private IP and DNS addresses. If you would like to assign a static IP address to your instance, you can still do so by using Amazon’s Elastic IP service [5].
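With the EC2 API tools, assigning an Elastic IP is a two-step process. The sketch below shows the shape of it; the instance ID and the address are placeholders, and the AWS calls themselves are shown as comments since they require live credentials:

```shell
# Sketch: give an instance a stable public address (IDs are placeholders).
INSTANCE_ID=i-10a64379

# 1. Allocate a new Elastic IP for your account; prints e.g. "ADDRESS 75.101.x.x":
#      ec2-allocate-address
# 2. Associate the allocated address with the instance:
#      ec2-associate-address 75.101.x.x -i ${INSTANCE_ID}
echo "Elastic IP would be associated with ${INSTANCE_ID}"
```

Note that after stopping and starting an instance you may need to re-associate the address, so in a fully automated setup the association step belongs in your start-up scripting.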

Persistence Strategies

With an understanding of the differences between S3 and EBS-backed instances, we are well equipped to discuss persistence strategies for each type of instance.

Persistence Strategy 1: EBS-backed Instances

First, we’ll start with the obvious choice: EBS-backed instances. When this type of instance is launched, Amazon automatically creates an Amazon EBS volume from the associated AMI snapshot, which then becomes the root device. Any changes to the local storage are then persisted in this EBS volume, and will survive instance failures and restarts. Note that by default, terminating an EBS-backed instance will also delete the EBS volume associated with it (and all its data), unless explicitly configured not to do so [6].
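As a sketch, the delete-on-termination behaviour can be changed on a running instance with the EC2 API tools. The instance ID below is a placeholder and the exact flag syntax should be checked against your version of the tools; the script only prints the command it would run:

```shell
# Sketch: keep the root EBS volume after termination (placeholder IDs).
INSTANCE_ID=i-10a64379
ROOT_DEVICE=/dev/sda1

# Mapping format: device=[snapshot-id]:[size]:[delete-on-termination],
# so "::false" leaves snapshot and size untouched and disables deletion.
CMD="ec2-modify-instance-attribute ${INSTANCE_ID} --block-device-mapping ${ROOT_DEVICE}=::false"
echo "Would run: ${CMD}"
```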

Persistence Strategy 2: S3-backed Instances

In spite of their ease of use, EBS-backed instances present a couple of drawbacks. First, not all software and architectures are available out-of-the-box as EBS-backed AMIs, so the version of your favorite OS might not be available. Perhaps more importantly, the EBS volume is mounted as the root device, meaning that you will also be billed for storage of static data external to your application or configuration, such as operating system files.

To circumvent these disadvantages, it is possible to use an S3-backed EC2 instance, which gives you direct control over what files to persist. However, this flexibility comes at a price. Since S3-backed instances use local storage as their root device, you’ll have to manually attach and mount an EBS volume for persisting your data. Any data you write directly to your EBS mount will be persisted automatically. Configuration files, however, often live at standard locations outside of your EBS mount, and you will still want to persist changes to them. In such situations, you would typically create a symlink on the root device pointing to your EBS mount.
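The attach-and-mount sequence might look like the sketch below. The volume ID, device name and mount point are assumptions, and the commands that need live AWS credentials or root access are shown as comments:

```shell
# Sketch: attach an EBS volume and mount it under /ebs (all IDs are placeholders).
VOLUME_ID=vol-12345678
INSTANCE_ID=i-10a64379
DEVICE=/dev/sdf
MOUNT_POINT=/ebs

# 1. Attach the volume (run where the EC2 API tools are configured):
#      ec2-attach-volume ${VOLUME_ID} -i ${INSTANCE_ID} -d ${DEVICE}
# 2. On the instance, create a filesystem on first use only:
#      mkfs.ext3 ${DEVICE}
# 3. Mount it by hand or at boot:
#      mkdir -p ${MOUNT_POINT} && mount ${DEVICE} ${MOUNT_POINT}

# An /etc/fstab entry makes the mount survive reboots of the same instance:
FSTAB_LINE="${DEVICE} ${MOUNT_POINT} ext3 defaults 0 0"
echo "${FSTAB_LINE}"
```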

For example, assuming you have mounted your EBS volume under /ebs, you would run the following shell commands to persist your apache2 configuration:

# first back up the original configuration
mv /etc/apache2/apache2.conf{,.orig}
# use your persisted configuration from EBS by creating a symlink
ln -s /ebs/etc/apache2/apache2.conf /etc/apache2/apache2.conf

Once your S3-backed instance is terminated, any local instance storage (including symlinks) will be lost, but your original data and configuration will persist in your EBS volume. If you would then like to recover the state persisted in EBS upon launching a new instance, you will have to go through the process of recreating any symlinks and/or copying any pertinent configuration and data from your EBS mount to your local instance storage.
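One way to make that recovery repeatable is to keep a list of persisted paths on the EBS volume and recreate the symlinks from it at boot. The helper below is a minimal sketch; the list file and the convention of mirroring paths under the EBS mount are our own assumptions, not part of any AWS tooling:

```shell
#!/bin/bash
# restore_links EBS_ROOT DEST_ROOT LIST_FILE
# Recreates, under DEST_ROOT, one symlink per relative path in LIST_FILE,
# each pointing at the copy kept under EBS_ROOT.
restore_links() {
    local ebs_root=$1 dest_root=$2 list=$3 path
    while read -r path; do
        [ -z "$path" ] && continue            # skip blank lines
        mkdir -p "$(dirname "${dest_root}/${path}")"
        rm -f "${dest_root}/${path}"          # drop any stale copy
        ln -s "${ebs_root}/${path}" "${dest_root}/${path}"
    done < "$list"
}
```

At boot you might invoke `restore_links /ebs / /ebs/persisted-files.txt`, where each line of the list file is a relative path such as `etc/apache2/apache2.conf`.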

Synchronizing between the data persisted in EBS and that in the local instance storage can become complex and difficult to automate when launching new instances. In order to help with these tasks, there are a number of third-party management platforms that provide different levels of automation. These are covered in more detail in the next section.

Persistence Strategy 3: Third-party Management Platforms

In the early days of AWS, there were few third-party platforms available for managing and monitoring your AWS infrastructure, and those that existed were limited. Moreover, in order to manage and monitor your instances for you, these platforms necessarily need access to your EC2 instance keys and AWS credentials. Although a reasonable compromise for some, this requirement could pose an unacceptable security risk for others, who must guarantee the security and confidentiality of their data and internal AWS infrastructure.

Given these limitations, The Server Labs developed its own Cloud Management Framework in Ruby for managing EC2 instances, EBS volumes and internal SSH keys in a secure manner. Our framework automates routine tasks such as attaching and mounting EBS volumes when launching instances, as well as providing hooks for the installation and configuration of software and services at startup based on the data persisted in EBS. It even goes one step further by mounting our EBS volumes using an encrypted file system to guarantee the confidentiality of our internal company data.

Today, companies need not necessarily develop their own homegrown frameworks, and can increasingly rely on third-party platforms. An example of a powerful commercial platform for cloud management is RightScale. For several of our projects, we rely on RightScale to automatically attach EBS volumes when launching new EC2 instances. We also make extensive use of scripting to install and configure software onto our instances automatically at boot time using RightScale’s RightScript technology [7]. These features make it easy to persist your application data and configuration in EBS, while automating the setup and configuration of new EC2 instances associated with one or more EBS volumes.

Automating Your Instance Uptimes

Now that we have discussed the major persistence strategies for Amazon EC2, we are in a good position to tackle our original use case. How can we schedule an instance in Amazon so that it is only operational during business hours? After all, we’d really like to avoid getting billed for instance uptime during times when it is not really needed.

To solve this problem, we’ll have to address two independent considerations. First, we’ll have to ensure that all of our instance state (including data and configuration) is stored persistently. Second, we’ll have to automate the starting and stopping of our instance, as well as restoring its state from persistent storage at boot time.

Automation Strategy 1: EBS-backed Instances

By using an EBS-backed instance, we ensure that all of its state is automatically persisted even if the instance is restarted (provided it is not terminated). Since the EBS volume is mounted as the root device, no further action is required to restore any data or configuration. Finally, we’ll have to automate the starting and stopping of the instance based on our operational times. For scheduling our instance uptimes, we can take advantage of the Linux cron service. For example, in order to schedule an instance to be operational during business hours (9am to 5pm, Monday-Friday), we could create the following two cron jobs:

0 9 * * 1-5 /opt/aws/bin/ec2-start.sh i-10a64379
0 17 * * 1-5 /opt/aws/bin/ec2-stop.sh i-10a64379

The first cron job will schedule the EBS-backed instance identified by instance ID i-10a64379 to be started daily from Monday to Friday at 9am. Similarly, the second job schedules the same instance to be stopped at 5pm Monday through Friday. The cron service invokes the helper scripts ec2-start.sh and ec2-stop.sh to facilitate the configuration of the AWS command-line tools according to your particular environment. You could run this cron job from another instance in the cloud, or you could have a machine in your office launch it.

The following snippet provides sample contents for ec2-start.sh, which does the setup necessary to invoke the AWS ec2-start-instances command. Note that this script assumes that your EBS-backed instance was previously launched manually and you know its instance ID.

#!/bin/bash
# Name: ec2-start.sh
# Description: this script starts the EBS-backed instance with the specified Instance ID
# by invoking the AWS ec2-start-instances command
# Arguments: the Instance ID for the EBS-backed instance that will be started.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20
export EC2_HOME=/opt/aws/ec2-api-tools-1.3
export EC2_PRIVATE_KEY=/opt/aws/keys/private-key.pem
export EC2_CERT=/opt/aws/keys/cert.pem
# uncomment the following line to use Europe as the default Zone
#export EC2_URL=https://ec2.eu-west-1.amazonaws.com
PATH=$PATH:${EC2_HOME}/bin
INSTANCE_ID=$1

echo "Starting EBS-backed instance with ID ${INSTANCE_ID}"
ec2-start-instances ${INSTANCE_ID}

Similarly, ec2-stop.sh would stop your EBS-backed instance by invoking ec2-stop-instances followed by your instance ID. Note that the instance ID of EBS-backed instances will remain the same across restarts.

Automation Strategy 2: S3-backed Instances

Amazon instances backed by S3 present the additional complexity that the local storage is not persistent and will be lost upon terminating the instance. In order to persist application data and configuration changes independently of the lifecycle of our instance, we’ll have to rely on EBS. Additionally, we’ll have to carefully restore any persisted state upon launching a new EC2 instance.

The Server Labs Cloud Manager allows us to automate these tasks. Among other features, it automatically attaches and mounts a specified EBS volume when launching a new EC2 instance. It also provides hooks to invoke one or more startup scripts directly from EBS. These scripts are specific to the application, and can be used to restore instance state from EBS, including any appropriate application data and configuration.

If you must use S3-backed instances for your solution, you’ll either have to develop your own framework along the lines of The Server Labs Cloud Manager, or rely on third-party management platforms like RightScale. Otherwise, EBS-backed instances provide the path of least resistance to persisting your instance data and configuration.

Automation Strategy 3: RightScale

RightScale provides a commercial platform with support for boot-time scripts (via RightScripts) and automatic attachment of EBS volumes. In addition, RightScale allows applications to define arrays of servers that grow and shrink based on a number of parameters. By using the server array schedule feature, you can define how an alert-based array resizes over the course of a week [8], and thus ensure a single running instance of your server during business hours. Leveraging boot-time scripts and the EBS volume management feature also enables you to automate the setup and configuration of new instances in the array, while persisting changes to your application data and configuration. Using these features, it is possible to build an automated solution for a server that operates during business hours and can be shut down safely when not in use.

Conclusion

This article describes the major approaches to persisting state in Amazon EC2. Persisting state is crucial to building robust and highly available architectures with the capacity to scale. Not only does it promote operational efficiency by consuming resources only when a need exists; it also protects your application state, so that if your instance fails or is accidentally terminated you can automatically launch a new one and continue where you left off. In fact, these same ideas can also enable your application to scale seamlessly by automatically provisioning new EC2 instances in response to a growth in demand.

References

[1] New Amazon EC2 Feature: Boot from Elastic Block Store. Original announcement from Amazon explaining the new EC2 boot from EBS feature.

[2] Amazon Elastic Compute Cloud User Guide: AMI Basics. Covers basic AMI concepts for S3 and EBS AMI types.

[3] The EC2 Instance Life Cycle: excellent blog post describing major lifecycle differences between S3 and EBS-backed EC2 instances.

[4] Amazon Elastic Compute Cloud User Guide: AMIs Backed by Amazon EBS. Learn about EBS-backed AMIs and how they work.

[5] AWS Feature Guide: Amazon EC2 Elastic IP Addresses. An introduction to Elastic IP Addresses for Amazon EC2.

[6] Amazon Elastic Compute Cloud User Guide: Changing the Root Volume to Persist. Learn how to configure your EBS-backed EC2 instance so that the associated EBS volume is not deleted upon termination.

[7] RightScale User Guide: RightScripts. Learn how to write your own RightScripts.

[8] RightScale User Guide: Server Array Schedule. Learn how to create an alert-based array to resize over the course of the week.

The Server Labs at CeBIT 2010

Come and see us at CeBIT 2010, where we will have a booth on Amazon’s stand in the Main Exhibition Hall: Hall 2, Stand B26. We will be there on March 4th and March 5th.

We will be happy to chat to you about our added value in bringing IT Architecture solutions to the Cloud.

We will also be presenting the work on HPC we have been doing for the European Space Agency, at 13:30 on March the 4th, and at 11:00 on March the 5th, both in the Theatre at the back of the Amazon stand.


Complex low-cost HPC Data Processing in the Cloud

With the maturing of cloud computing, it is now feasible to run even the most complex HPC applications in the cloud. Data storage and high performance computing resources, fundamental for these applications, can be outsourced, leveraging scalability, flexibility and high availability at a fraction of the cost of traditional in-house data processing. This presentation evaluates Amazon’s EC2/S3 suitability for such a scenario, by running a distributed astrometric process The Server Labs developed for the European Space Agency’s Gaia mission in Amazon EC2.

Eating our own Dog Food! – The Server Labs moves its Lab to the Cloud!

cloudlab

After all these years dealing with servers, switches, routers and virtualisation technologies, we think it’s time to move our lab into the next phase: the Cloud, specifically the Amazon EC2 Cloud.

We are already actively working in the Cloud on different projects, as you’ve seen in previous blog posts. We believe this step is not only a natural one but also takes us in the right direction, towards more effective management of resources and higher business agility. This fits the needs of a company like ours, and we believe it will also fit many others of different sizes and requirements.
Cloud computing is not only a niche for special projects with very specific needs. It can be used by ordinary companies to run a more cost-effective IT infrastructure, at least in certain areas.

In our lab we had a mixture of server configurations, comprising Sun and Dell servers running all kinds of OSs, using VMware and Sun virtualisation technology. The purpose of our Lab is to provide an infrastructure for our staff, partners and customers to perform specific tests, prototypes, PoCs, etc. The Lab is also our R&D resource for creating new architecture solutions.

Moving our Lab to the cloud will provide an infrastructure that is more flexible, manageable, powerful and simple, and definitely more elastic to set up, use and maintain, without removing any of the features we currently have. It will also allow us to concentrate more on this new paradigm, creating advanced cloud architectures and increasing our overall know-how, which can be fed back to customers and the community.

To commence this small project, the first thing to do was a small feasibility study to identify the technologies to use inside the cloud, primarily to maintain confidentiality and secure access, but also to properly manage and monitor that infrastructure. Additionally, one of the main drivers of this activity was to reduce our monthly hosting cost, so we needed to calculate, based on current usage, the savings of moving to the cloud.

Cost Study

Looking at the cost of moving to the cloud, we performed an inventory of the required CPU power, server instances, storage (for both Amazon S3 and EBS) and the estimated data I/O. Additionally, we estimated the volume of data transferred between nodes, and between Amazon and the external world.

We initially planned to automatically shut down and bring up those servers that are only needed during working hours, to save more money. In the end, we will be using Amazon reserved instances, whose lower per-hour price gives savings similar to those we would get by scheduling on-demand servers.

Based on this inventory and these estimations, and with the help of the Amazon cost calculator, we reached a final monthly cost that was approximately one third of our hosting bill!

This cost considers only the physical infrastructure. On top of this we need to add the savings on hardware renewal, pure system administration and system installation. Even though we use virtualisation technologies, we have sometimes had to rearrange things because our physical infrastructure was limited. All these extra costs mean savings in the cloud.

Feasibility Study

Moving to the cloud gives most IT managers the feeling that they lose control, and most importantly that they lose control of their data. While the use of hybrid clouds can keep data under control, in our case we wanted to move everything to the cloud. We are certainly no different, and we are quite paranoid about our data and how it would be stored in Amazon EC2. We also still require secure network communication between our nodes in the Amazon network, and the ability to give secure external access to our staff and customers.

There are a set of open-source technologies that have helped us to materialize these requirements into a solution that we feel comfortable with:

  • Filesystem encryption for securing data storage in Amazon EBS.
  • A private network and IP range for all nodes.
  • Cloud-wide encrypted communication between nodes within a private IP network range, via an OpenVPN solution.
  • An IPSec VPN solution for permanent external access to the Cloud Lab, for instance to connect a private cloud/network to the public EC2 Cloud.
  • Use of RightScale to manage and automate the infrastructure.
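As an illustration of the first point, an encrypted filesystem on an EBS volume can be built with standard Linux tools. The sketch below shows one common approach using LUKS; the device and mapping names are placeholders, and it is not necessarily the exact setup we use. The destructive commands are shown as comments:

```shell
# Sketch: LUKS-encrypt an attached EBS device (names are placeholders; note
# that luksFormat destroys existing data, so only do this on a fresh volume).
DEVICE=/dev/sdf
MAPPING=ebs_secure
MOUNT_POINT=/ebs

# 1. Initialise the LUKS header (prompts for a passphrase):
#      cryptsetup luksFormat ${DEVICE}
# 2. Open the encrypted device under /dev/mapper/${MAPPING}:
#      cryptsetup luksOpen ${DEVICE} ${MAPPING}
# 3. Create a filesystem on the mapped device and mount it as usual:
#      mkfs.ext3 /dev/mapper/${MAPPING}
#      mkdir -p ${MOUNT_POINT} && mount /dev/mapper/${MAPPING} ${MOUNT_POINT}
echo "Encrypted volume would be available at /dev/mapper/${MAPPING}"
```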
Overview of TSL Secure Cloud deployment

Implementation and Migration

The implementation of our Cloud Lab solution has gone very smoothly and it is working perfectly.
One of the beneficial side effects of migrating different systems into the cloud is that it forces you to be much more organised, as the infrastructure is very focused on reuse and automatic recreation of the different servers.

We have standardised all our Lab images, taking prebuilt images available in Amazon and customising them to include the security hardening, standard services and conventions we have defined. In a matter of seconds we can deploy new images and include them in our secure VPN-based Cloud Lab network, ready to be used.

Our new Cloud Lab is giving us a very stable, cost-effective, elastic and secure infrastructure, which can be rebuilt in minutes using EBS snapshots.

Java HelloWorld @ the Cloud with Amazon EC2

This post is about how to run a simple Java application installed in one of your Amazon AMIs.

Thanks to the GridGain guys, who inspired me when I read this page about running GridGain in Amazon EC2.

I am going to assume that you know how to create an Amazon AMI. For information on how to create one, please look here.

In my scenario I took one of the Ubuntu 8.04 images and created a user helloworld, as I do not want to run the process as root.

Once you have logged in to your running instance, follow these steps.

Step 1

I assume you have created your helloworld user.

su - helloworld

Step 2
Create your HelloWorld.java file

vi HelloWorld.java

Then type your favorite Java source code ever:


public class HelloWorld {

    public static void main(String[] args) {

        System.out.println("[Data passed to the AMI instance:] "+System.getProperty("userData"));
    }

}

You can see in the source code that I am reading the userData system property. This is because we are going to send that data to the running EC2 instance at boot time.

Then compile the code

javac HelloWorld.java

And test it. You should get null as output, because we have not yet defined the userData property with -DuserData:

java HelloWorld

Step 3
Create your personalized AMI

If you want to run a Java process at boot time, then before creating the AMI, open /etc/rc.local and add, at the end, the command that executes your Java process:

echo "Running java process " > /tmp/rc.local.log
su - helloworld /home/helloworld/runHelloWorld.sh 

The runHelloWorld.sh script might look like this:


#!/bin/bash
clear
export JAVA_HOME=${HOME}/software/jdk1.6.0_14
export PATH=${JAVA_HOME}/bin:${PATH}
# log file for the process output (adjust the location as needed)
export LOG_FILE=${HOME}/helloworld.log
export USER_DATA=`GET http://169.254.169.254/2007-03-01/user-data`
echo ">>> [USER_DATA] >>> "${USER_DATA}
echo
java ${USER_DATA} HelloWorld >> ${LOG_FILE} 2>&1

As you can see, I am calling an Amazon web service to get the data I passed to the AMI. That data is a string containing the JVM arguments, so what I am going to pass is the string “-DuserData=Amazon_says_Hello_World”, handing the user data to the JVM as an argument.

Step 4
Once you have your AMI ready and registered, you only have to run it.

ec2-run-instances ${MY_AMI} -n ${NODES} -K ${EC2_PRIVATE_KEY} -C ${EC2_CERT} -g ${MY_SECURITY_GROUP} -z ${ZONE}  -t ${INSTANCE_TYPE} -d "-DuserData=\"Amazon_says_Hello_World\""

This line runs the instance in the Amazon Cloud. If you log in to the running instance and change user from root to helloworld, you can check the log file to see that HelloWorld was executed. You can also run it manually, just by executing the runHelloWorld.sh script you created before.

You can even query the Amazon web service yourself. You only have to type:

GET http://169.254.169.254/2007-03-01/user-data

You should see “-DuserData=Amazon_says_Hello_World” at the prompt.
If you run the HelloWorld script, you will see “[Data passed to the AMI instance:] Amazon_says_Hello_World”.

Amazon also allows you to pass files to ec2-run-instances, so you could implement many ways of passing data or configuration arguments to your processes.

If you have subversion installed, you could even check out the latest configuration file for your process; or you could fetch it from an FTP server, etc. The way you do it is up to you.

With this approach of passing a string, you can face some problems. For instance, if you pass several JVM arguments such as
“-Xms4g\${IFS}-Xmx7g\${IFS}-XX:MaxPermSize=512M\${IFS}-Xnoclassgc”,
you cannot put a white space between the different arguments, because the ec2-run-instances command thinks those are parameters for itself. The workaround, if you want to pass the data as a single string, is to use the ${IFS} variable, which is white space by default. Then, in the script that runs the HelloWorld class, after calling the web service for the data, you have to add an extra line:

USER_DATA=`eval "echo ${USER_DATA}"`

This causes the $IFS references to be evaluated, although the approach is admittedly a bit tricky and weird 😉
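The effect of the ${IFS} trick can be seen locally. The snippet below mimics what the boot script does with the user data (the JVM flags are just examples):

```shell
# Simulate the raw user data exactly as it arrives from the metadata service;
# single quotes keep the ${IFS} references literal, as in the real payload.
USER_DATA='-Xms4g${IFS}-Xmx7g${IFS}-XX:MaxPermSize=512M'
# The eval round-trip expands each literal ${IFS} into a real separator:
USER_DATA=`eval "echo ${USER_DATA}"`
echo "${USER_DATA}"
```

Running this prints the arguments separated by spaces, ready to be handed to the JVM.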

The Server Labs @ the Cloud Computing Expo Europe 09

We are glad to announce that we are going to publish a paper and give a talk at the Cloud Computing Expo Europe.

Paper Title: Cloud Science: Astrometric Processing in Amazon EC2/S3

Paper Abstract:

With the maturing of cloud computing, it is now feasible to run scientific applications in the cloud. Data storage and high performance computing resources are fundamental for scientific applications. Outsourcing these services leverages scalability, flexibility, high availability at lower prices compared with traditional in-house data processing. This article evaluates Amazon’s EC2/S3 suitability for this scenario, by running a distributed astrometric process developed for the European Space Agency’s Gaia mission in Amazon EC2. The aim is to demonstrate how cloud computing systems can be a cost-effective solution for HPC applications.

We hope to see you there. Do not hesitate to contact us!!