
ALSB/OSB customization using WLST

One of the primary tasks in release management is environment promotion. Whether from development to test or from test to production, environment promotion is a step that should be automated as much as possible.

We can use the service bus MBeans in WLST scripts to automate the promotion of AquaLogic/Oracle Service Bus configurations from development environments through testing and staging and, finally, to production environments.

Each environment has particularities that may require changes to the software configuration. These settings are usually centralized in property files, database tables, environment variables or some other location, in order to facilitate environment promotion.

In AquaLogic/Oracle Service Bus there is the concept of environment values:

Environment values are certain predefined fields in the configuration data whose values are very likely to change when you move your configuration from one domain to another (for example, from test to production). Environment values represent entities such as URLs, URIs, file and directory names, server names, e-mails, and such. Also, environment values can be found in alert destinations, proxy services, business services, SMTP Server and JNDI Provider resources, and UDDI Registry entries.

For these environment values, the following standard operations are available:

  • Finding and Replacing Environment Values
  • Creating Customization Files
  • Executing Customization Files
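
For example, executing a previously created customization file boils down to two calls on ALSBConfigurationMBean. In Java (the same calls work from WLST) it looks roughly like this – a sketch, where the file name is hypothetical and alsbSession is a session-scoped ALSBConfigurationMBean obtained as shown later in this post:

import java.io.FileInputStream;
import java.util.List;
import com.bea.wli.config.customization.Customization;

// Load the customizations from an exported customization file and apply
// them within the currently open session (alsbSession is a session-scoped
// ALSBConfigurationMBean).
List customizations = Customization.fromXML(new FileInputStream("osb-customizations.xml"));
alsbSession.customize(customizations);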

However, these operations are limited to the ‘predefined fields whose values are very likely to change’… so what happens if we need to modify one of the fields considered ‘not very likely’ to change? Whether SAP client connection parameters, say, are really ‘not very likely’ to change in an environment promotion from test to production is a different story…

One option for automating these changes is to modify the exported configuration directly, prior to importing it into the destination environment. In our case, however, we want to keep the exported package untouched and apply the customization after the import. Since a customization file does not satisfy our needs, we will use a WLST script instead.

The first thing we have to do to use WLST is add several service bus jar files to the WLST classpath. For example, on a Windows platform we add the following at the beginning of the wlst.cmd file (*nix users will know how to proceed on their platform):

For Aqualogic Service Bus 3.0:

SET ALSB_HOME=c:\bea\alsb_3.0
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-api.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-common.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-resources.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-impl.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\..\modules\com.bea.common.configfwk_1.1.0.0.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\..\modules\com.bea.alsb.statistics_1.0.0.0.jar

For Oracle Service Bus 10gR3:

SET ALSB_HOME=c:\bea\osb_10.3
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-api.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-common.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-resources.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\lib\sb-kernel-impl.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\..\modules\com.bea.common.configfwk_1.2.1.0.jar
SET CLASSPATH=%CLASSPATH%;%ALSB_HOME%\..\modules\com.bea.alsb.statistics_1.0.1.0.jar

In our example, we will change the HTTP timeout of the normalLoanProcessor business service present in the ALSB/OSB examples server.

normalLoanProcessor configuration (sbconsole screenshot)

To do that, we first connect to the bus from WLST and open a session using the SessionManagementMBean:

from com.bea.wli.sb.management.configuration import SessionManagementMBean
connect("weblogic", "weblogic", "t3://localhost:7021")
domainRuntime()
sessionMBean = findService(SessionManagementMBean.NAME, SessionManagementMBean.TYPE)
sessionName = "mysession"
sessionMBean.createSession(sessionName)
mysession shown in sbconsole

Nothing new so far. The next thing we need is a reference to the component we want to modify. We chose to use a BusinessServiceQuery:

from com.bea.wli.sb.management.query import BusinessServiceQuery
from com.bea.wli.sb.management.configuration import ALSBConfigurationMBean
bsQuery = BusinessServiceQuery()
bsQuery.setLocalName("normalLoanProcessor") 
bsQuery.setPath("MortgageBroker/BusinessServices")
alsbSession = findService(ALSBConfigurationMBean.NAME + "." + sessionName, ALSBConfigurationMBean.TYPE)
refs = alsbSession.getRefs(bsQuery)
bsRef = refs.iterator().next()

After this we have a reference to the business service we want to modify. Now is when the fun begins.

There is an undocumented service bus ServiceConfigurationMBean (not to be confused with the old com.bea.p13n.management.ServiceConfigurationMBean) whose description is ‘MBean for configuring Services’.

ServiceConfiguration.mysession as shown in jconsole

Among the different methods, we find one with an interesting name: getServiceDefinition

getServiceDefinition as shown in jconsole

It looks like we can use the getServiceDefinition method with our previous reference to the business service to obtain exactly what its name states.

from com.bea.wli.sb.management.configuration import ServiceConfigurationMBean
servConfMBean = findService(ServiceConfigurationMBean.NAME + "." + sessionName, ServiceConfigurationMBean.TYPE)
serviceDefinition = servConfMBean.getServiceDefinition(bsRef)

Printing the serviceDefinition variable yields the service definition XML, roughly along these lines:

<xml-fragment xmlns:ser="http://www.bea.com/wli/sb/services"
              xmlns:tran="http://www.bea.com/wli/sb/transports"
              xmlns:http="http://www.bea.com/wli/sb/transports/http"
              xmlns:env="http://www.bea.com/wli/config/env"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ser:coreEntry isProxy="false" isEnabled="true">
    <ser:binding type="SOAP" isSoap12="false" xsi:type="con:SoapBindingType"
                 xmlns:con="http://www.bea.com/wli/sb/services/bindings/config">
      <con:wsdl ref="MortgageBroker/WSDL/normalLoanApprovalService"/>
      <con:binding>
        <con:name>NormalLoanApprovalServiceSoapBinding</con:name>
        <con:namespace>http://example.org</con:namespace>
      </con:binding>
    </ser:binding>
    <ser:monitoring isEnabled="false">
      <ser:aggregationInterval>5</ser:aggregationInterval>
    </ser:monitoring>
    <ser:sla-alerting isEnabled="true">
      <ser:alertLevel>normal</ser:alertLevel>
    </ser:sla-alerting>
    <ser:ws-policy>
      <ser:binding-mode>wsdl-policy-attachments</ser:binding-mode>
    </ser:ws-policy>
  </ser:coreEntry>
  <ser:endpointConfig>
    <tran:provider-id>http</tran:provider-id>
    <tran:inbound>false</tran:inbound>
    <tran:URI>
      <env:value>http://localhost:7021/njws_basic_ejb/NormalSimpleBean</env:value>
    </tran:URI>
    <tran:outbound-properties>
      <tran:load-balancing-algorithm>none</tran:load-balancing-algorithm>
      <tran:retry-count>0</tran:retry-count>
      <tran:retry-interval>30</tran:retry-interval>
      <tran:retry-application-errors>true</tran:retry-application-errors>
    </tran:outbound-properties>
    <tran:provider-specific xsi:type="http:HttpEndPointConfiguration">
      <http:outbound-properties>
        <http:request-method>POST</http:request-method>
        <http:timeout>0</http:timeout>
      </http:outbound-properties>
    </tran:provider-specific>
  </ser:endpointConfig>
</xml-fragment>

Surprised? It’s exactly the same definition written in .BusinessService XML files. In fact, the service definition implements XMLObject.

Now it’s time to update the business service definition with our new timeout value (let’s say 5000 milliseconds) using XPath and XMLBeans. We must also take care to declare the namespaces in the XPath expression the same way they are declared in .BusinessService XML files:

nsEnv = "declare namespace env='http://www.bea.com/wli/config/env' "
nsSer = "declare namespace ser='http://www.bea.com/wli/sb/services' "
nsTran = "declare namespace tran='http://www.bea.com/wli/sb/transports' "
nsHttp = "declare namespace http='http://www.bea.com/wli/sb/transports/http' "
nsIWay = "declare namespace iway='http://www.iwaysoftware.com/alsb/transports' "
confPath = "ser:endpointConfig/tran:provider-specific/http:outbound-properties/http:timeout"
confValue = "5000"
confElem = serviceDefinition.selectPath(nsSer + nsTran + nsHttp + confPath)[0]
confElem.setStringValue(confValue)

We are almost there. First, we update the service:

servConfMBean.updateService(bsRef, serviceDefinition)

Modified mysession shown in sbconsole

And finally, we activate the session (see the NOTE below), just as we would do in the bus console:

sessionMBean.activateSession(sessionName, "Comments")

mysession changes shown in sbconsole

Task details of mysession

Updated normalLoanProcessor configuration

With this approach, it should be possible to build a framework that allows customizing ALL fields as needed.
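
Incidentally, none of this is WLST-specific: the same MBeans can be driven from plain Java over JMX, which is convenient if you want to embed this kind of customization in a deployment tool. A minimal sketch of the session handling, following the same API used above (connection details assumed, error handling omitted):

import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerInvocationHandler;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;
import weblogic.management.mbeanservers.domainruntime.DomainRuntimeServiceMBean;
import com.bea.wli.sb.management.configuration.SessionManagementMBean;

public class OsbSessionDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the Domain Runtime MBean server of the admin server
        JMXServiceURL url = new JMXServiceURL("t3", "localhost", 7021,
                "/jndi/" + DomainRuntimeServiceMBean.MBEANSERVER_JNDI_NAME);
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.SECURITY_PRINCIPAL, "weblogic");
        env.put(Context.SECURITY_CREDENTIALS, "weblogic");
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        MBeanServerConnection mbconn = connector.getMBeanServerConnection();

        // Entry point for findService(), playing the role of the WLST helper
        DomainRuntimeServiceMBean domainService = (DomainRuntimeServiceMBean)
                MBeanServerInvocationHandler.newProxyInstance(mbconn,
                        new ObjectName(DomainRuntimeServiceMBean.OBJECT_NAME));

        SessionManagementMBean sessionMBean = (SessionManagementMBean) domainService
                .findService(SessionManagementMBean.NAME, SessionManagementMBean.TYPE, null);
        sessionMBean.createSession("mysession");
        // ... query and modify resources within the session here ...
        sessionMBean.discardSession("mysession"); // or activateSession("mysession", "Comments")
        connector.close();
    }
}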

NOTE:
If you get the exception below when activating the changes, update your WebLogic Server configuration as described in “Deploy to Oracle Service Bus does not work”:

Traceback (innermost last):
  File "", line 1, in ?
com.bea.wli.config.deployment.server.ServerLockException: Failed to obtain WLS Edit lock; it is currently held by user weblogic. This indicates that you have either started a WLS change and forgotten to activate it, or another user is performing WLS changes which have yet to be activated. The WLS Edit lock can be released by logging into WLS console and either releasing the lock or activating the pending WLS changes.
        at com.bea.wli.config.deployment.server.ServerDeploymentInitiator.__serverCommit(Unknown Source)
        at com.bea.wli.config.deployment.server.ServerDeploymentInitiator.access$200(Unknown Source)
        at com.bea.wli.config.deployment.server.ServerDeploymentInitiator$1.run(Unknown Source)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
        at weblogic.security.service.SecurityManager.runAs(Unknown Source)
        at com.bea.wli.config.deployment.server.ServerDeploymentInitiator.serverCommit(Unknown Source)
        at com.bea.wli.config.deployment.server.ServerDeploymentInitiator.execute(Unknown Source)
        at com.bea.wli.config.session.SessionManager.commitSessionUnlocked(SessionManager.java:420)
        at com.bea.wli.config.session.SessionManager.commitSession(SessionManager.java:339)
        at com.bea.wli.config.session.SessionManager.commitSession(SessionManager.java:297)
        at com.bea.wli.config.session.SessionManager.commitSession(SessionManager.java:306)
        at com.bea.wli.sb.management.configuration.SessionManagementMBeanImpl.activateSession(SessionManagementMBeanImpl.java:47)
[...]

Smooks processing recipes

Introduction

In one of our customer projects we had a requirement to import CSV, fixed-length and Excel files in different formats and store the records in a database. We chose Smooks to accomplish this task.

Smooks is a Java framework to read, process and transform data from various sources (CSV, fixed length, XML, EDI, …) to various destinations (XML, Java objects, a database). It convinced me because:

  • it brings out-of-the-box components to read CSV and fixed-length files
  • it integrates smoothly with ORM libraries (Hibernate, JPA)
  • processing is configured using an XML configuration file – you need only a few lines of code to run the transformations
  • extensibility – implementing a custom Excel reader was relatively easy
  • low added filtering overhead – reading 100,000 CSV lines and storing them in the database using Hibernate took us less than 30 seconds

During development we had to overcome some hurdles imposed by the Smooks processing model. In this post I would like to share the practical experience we gained working with Smooks. First, I’m going to present a sample transformation use case with requirements similar to a real-world assignment. Then I will present solutions to these requirements in ‘how-to’ style.

Use case

We are developing a ticketing application. The heart of our application is the Issue class:
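
A minimal sketch of Issue, consistent with the test code shown further below (the full class is in the GitHub repository):

    import java.util.Date;

    public class Issue {

        private String description;
        private Project project;
        private Priority priority;
        private String involvedPersons;
        private Date created;
        private Date updated;

        public Issue(String description, Project project, Priority priority,
                String involvedPersons, Date created, Date updated) {
            this.description = description;
            this.project = project;
            this.priority = priority;
            this.involvedPersons = involvedPersons;
            this.created = created;
            this.updated = updated;
        }

        // A no-arg constructor and setters are needed by the Smooks JavaBean
        // cartridge, and equals()/hashCode() by the test assertions; they are
        // omitted here for brevity.
    }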

We have to write an import and conversion module for an external ticketing system. Data comes in CSV format (for the sake of simplicity). The domain model of the external system is slightly different from ours; however, issues coming from the external issue tracker can be mapped to our Issues.

The external system’s exchange format defines the following fields: description, priority, reporter, assignee, createdDate, createdTime, updatedDate, updatedTime. They should be mapped to our Issue as follows:

  • description property – description field
    This is a simple Smooks mapping. No big issue.

  • project property – there is no project field; the project should be assigned manually
    A constant object (from our domain model) must be passed to the Smooks engine to be used in the Java binding. See Assign a constant object to a property.

  • priority property – priority field; P1 and P2 priorities should be mapped to Priority.LOW, P3 to Priority.MEDIUM, P4 and P5 to Priority.HIGH
    This mapping could be done using an MVEL expression. However, we want to encapsulate this logic in a separate class that can be easily unit-tested. See Use an external object to calculate a property value.

  • involvedPersons property – reporter field plus the assignee field if not empty (appending assignee using a ‘;’ separator)
    Set compound properties in Java binding will show how to achieve this.

  • created property – merge the createdDate and createdTime fields
  • updated property – merge the updatedDate and updatedTime fields
    In Set datetime from date and time fields, two strategies will be presented.

Before diving into details, I’m going to present the final Smooks configuration (reproduced approximately below; the exact sources are in the GitHub repository) and the invocation code of the transformation (as a JUnit 4 test). Later on, in each recipe, I will explain the XML configuration and Java code fragments relevant to that recipe.

The full source of the remaining classes (Issue, Priority, Project, IssuePrioritizer) is not included in the text. You can browse the source code online on GitHub. To get a local copy, clone the Git repository:

    git clone git://github.com/mgryszko/blog-smooks-recipies.git


smooks-config.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                          xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.2.xsd"
                          xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.2.xsd">

        <params>
            <param name="stream.filter.type">SAX</param>
        </params>

        <csv:reader fields="description,priority,reporter,assignee,createdDate,createdTime,updatedDate,updatedTime"
                    skipLines="1"/>

        <jb:bean beanId="transformedProps" class="java.util.HashMap" createOnElement="csv-record">
            <jb:value property="reporter" data="csv-record/reporter"/>
            <jb:value property="assignee" data="csv-record/assignee"/>
            <jb:value property="updatedDate" data="csv-record/updatedDate"/>
            <jb:value property="updatedTime" data="csv-record/updatedTime"/>
        </jb:bean>

        <jb:bean beanId="issues" class="java.util.ArrayList" createOnElement="csv-set">
            <jb:wiring beanIdRef="issue"/>
        </jb:bean>

        <jb:bean beanId="issue" class="com.tsl.smooks.Issue" createOnElement="csv-record">
            <jb:value property="description" data="csv-record/description"/>
            <jb:wiring property="project" beanIdRef="project"/>
            <jb:expression property="priority" execOnElement="csv-record/priority">
                prioritizer.assignPriorityFromCode(_VALUE)
            </jb:expression>
            <jb:expression property="involvedPersons" execOnElement="csv-record/assignee">
                transformedProps["reporter"]
                    + (org.apache.commons.lang.StringUtils.isNotBlank(transformedProps["assignee"]) ? ";" + transformedProps["assignee"] : "")
            </jb:expression>
            <jb:value property="createdDatePart" data="csv-record/createdDate" decoder="Date">
                <jb:decodeParam name="format">yyyy-MM-dd</jb:decodeParam>
            </jb:value>
            <jb:value property="createdTimePart" data="csv-record/createdTime" decoder="Date">
                <jb:decodeParam name="format">HH:mm</jb:decodeParam>
            </jb:value>
            <jb:expression property="updated" execOnElement="csv-record/updatedTime">
                updated = transformedProps["updatedDate"] + " " + transformedProps["updatedTime"];
                new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm").parse(updated)
            </jb:expression>
        </jb:bean>

    </smooks-resource-list>

SmooksRecipiesTest

    package com.tsl.smooks;
    
    // imports hidden
    
    public class SmooksRecipiesTest {
    
        private static final SimpleDateFormat DATETIME_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm");
    
        private Source importedIssues = new StringSource(
            "description,priority,reporter,assignee,createdDate,createdTime,updatedDate,updatedTime\n"
                + "Added phased initialization of javabean cartridge,P1,Ataulfo,Teodorico,2010-10-01,13:10,2010-10-10,20:01\n"
                + "Processing recursive tree like structures with the Javabean Cartridge,P3,Eurico,,2010-10-02,07:15,2010-11-15,09:45"
        );
    
        private Smooks smooks;
        private ExecutionContext executionContext;
    
        private List<Issue> expIssues;
        private Project expProject = new Project("Smooks");
    
        @Before
        public void initSmooks() throws Exception {
            smooks = new Smooks(getResourceFromClassLoader("smooks-config.xml"));
            executionContext = smooks.createExecutionContext();
            executionContext.getBeanContext().addBean("project", expProject);
            executionContext.getBeanContext().addBean("prioritizer", new IssuePrioritizer());
        }
    
        private InputStream getResourceFromClassLoader(String name) {
            return getClass().getClassLoader().getResourceAsStream(name);
        }
    
        @Before
        public void createExpIssues() throws Exception {
            expIssues = Arrays.asList(
                new Issue("Added phased initialization of javabean cartridge", expProject, Priority.LOW,
                    "Ataulfo;Teodorico", DATETIME_FORMAT.parse("2010-10-01 13:10"), DATETIME_FORMAT.parse("2010-10-10 20:01")
                ),
                new Issue(
                    "Processing recursive tree like structures with the Javabean Cartridge", expProject, Priority.MEDIUM,
                    "Eurico", DATETIME_FORMAT.parse("2010-10-02 07:15"), DATETIME_FORMAT.parse("2010-11-15 09:45")
                )
            );
        }
    
        @Test
        public void process() throws Exception {
            smooks.filterSource(executionContext, importedIssues);
    
            List<Issue> issues = (List<Issue>) executionContext.getBeanContext().getBean("issues");
            assertEquals(expIssues, issues);
        }
    }
    

Assign a constant object (from your domain model) to a property

According to the Smooks manual, the bean context is the place where the JavaBean cartridge puts newly created beans. We can add our own bean (Project) to it:

    executionContext = smooks.createExecutionContext();
    executionContext.getBeanContext().addBean("project", new Project("Smooks"));

… and reference it in the Java binding configuration:

    <jb:bean beanId="issue" class="com.tsl.smooks.Issue" createOnElement="csv-record">
        ....
        <jb:wiring property="project" beanIdRef="project"/>
        ...
    </jb:bean>

Use an external object to calculate a property value

Similar to the previous tip, we add an additional bean (IssuePrioritizer) to the bean context:

    executionContext = smooks.createExecutionContext();
    executionContext.getBeanContext().addBean("prioritizer", new IssuePrioritizer());

… and define an MVEL expression for the property. The MVEL expression uses the bean and references the value being processed (in this case coming from the CSV reader) through the implicit _VALUE variable:

    <jb:bean beanId="issue" class="com.tsl.smooks.Issue" createOnElement="csv-record">
        ....
        <jb:expression property="priority" execOnElement="csv-record/priority">
            prioritizer.assignPriorityFromCode(_VALUE)
        </jb:expression>
        ...
    </jb:bean>
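
For reference, a minimal IssuePrioritizer implementing the mapping described in the use case could look like this (a sketch – the real class is in the GitHub repository):

    public class IssuePrioritizer {

        // P1, P2 -> LOW; P3 -> MEDIUM; P4, P5 -> HIGH (as stated in the use case)
        public Priority assignPriorityFromCode(String code) {
            if ("P1".equals(code) || "P2".equals(code)) {
                return Priority.LOW;
            }
            if ("P3".equals(code)) {
                return Priority.MEDIUM;
            }
            if ("P4".equals(code) || "P5".equals(code)) {
                return Priority.HIGH;
            }
            throw new IllegalArgumentException("Unknown priority code: " + code);
        }
    }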
    
    

Set compound properties in Java binding

It is not possible to map two source fields directly to a single Java bean property. Java bindings with jb:value and jb:wiring are executed on a SAX visitAfter event bound to a single XML element/CSV field. We have to define a binding for a helper Map bean with the fields we want to merge:

    <jb:bean beanId="transformedProps" class="java.util.HashMap" createOnElement="csv-record">
        <jb:value property="reporter" data="csv-record/reporter"/>
        <jb:value property="assignee" data="csv-record/assignee"/>
        ...
    </jb:bean>

… and use an MVEL expression that concatenates the two fields using the helper map bean (transformedProps):

    <jb:bean beanId="issue" class="com.tsl.smooks.Issue" createOnElement="csv-record">
        ...
        <jb:expression property="involvedPersons" execOnElement="csv-record/assignee">
            transformedProps["reporter"]
                + (org.apache.commons.lang.StringUtils.isNotBlank(transformedProps["assignee"]) ? ";" + transformedProps["assignee"] : "")
        </jb:expression>
        ...
    </jb:bean>

Set datetime from date and time fields

In this transformation we have to both merge and convert the values of two fields.

In the first solution, we create separate setters for the date part and the time part in the target Issue class (Smooks uses setters in Java binding):

    public class Issue {
        ...
        public void setCreatedDatePart(Date createdDatetime) {
            createCreatedIfNotInitialized();
            copyDatePart(createdDatetime, created);
        }

        public void setCreatedTimePart(Date createdDatetime) {
            createCreatedIfNotInitialized();
            copyTimePart(createdDatetime, created);
        }
        ...
    }
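
The helper methods are not part of the listing above. A possible implementation, assuming created is a mutable java.util.Date field (a sketch – the real code is in the GitHub repository):

    private void createCreatedIfNotInitialized() {
        if (created == null) {
            created = new Date(0); // epoch; date and time parts are overwritten below
        }
    }

    private void copyDatePart(Date source, Date target) {
        Calendar src = Calendar.getInstance();
        src.setTime(source);
        Calendar tgt = Calendar.getInstance();
        tgt.setTime(target);
        tgt.set(src.get(Calendar.YEAR), src.get(Calendar.MONTH), src.get(Calendar.DAY_OF_MONTH));
        target.setTime(tgt.getTimeInMillis());
    }

    private void copyTimePart(Date source, Date target) {
        Calendar src = Calendar.getInstance();
        src.setTime(source);
        Calendar tgt = Calendar.getInstance();
        tgt.setTime(target);
        tgt.set(Calendar.HOUR_OF_DAY, src.get(Calendar.HOUR_OF_DAY));
        tgt.set(Calendar.MINUTE, src.get(Calendar.MINUTE));
        tgt.set(Calendar.SECOND, src.get(Calendar.SECOND));
        target.setTime(tgt.getTimeInMillis());
    }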
    

… and then use standard value bindings with a date decoder:

    <jb:bean beanId="issue" class="com.tsl.smooks.Issue" createOnElement="csv-record">
        ...
        <jb:value property="createdDatePart" data="csv-record/createdDate" decoder="Date">
            <jb:decodeParam name="format">yyyy-MM-dd</jb:decodeParam>
        </jb:value>
        <jb:value property="createdTimePart" data="csv-record/createdTime" decoder="Date">
            <jb:decodeParam name="format">HH:mm</jb:decodeParam>
        </jb:value>
        ...
    </jb:bean>

The advantage of this approach is that you make use of the Smooks decoder infrastructure. You can configure the transformation with your own decoders (e.g. a custom java.util.Date decoder that accepts multiple date formats). If you are using the built-in DateDecoder, you can catch and handle a standard DataDecodeException.
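
Such a custom decoder could look like this (a sketch against the Smooks 1.x DataDecoder SPI; the format list is hypothetical and not part of the sample project):

    import org.milyn.javabean.DataDecodeException;
    import org.milyn.javabean.DataDecoder;

    public class MultiFormatDateDecoder implements DataDecoder {

        private static final String[] FORMATS = {"yyyy-MM-dd HH:mm", "dd/MM/yyyy HH:mm"};

        public Object decode(String data) throws DataDecodeException {
            for (String format : FORMATS) {
                try {
                    return new java.text.SimpleDateFormat(format).parse(data);
                } catch (java.text.ParseException ignored) {
                    // fall through and try the next format
                }
            }
            throw new DataDecodeException("Unparseable date: " + data);
        }
    }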

The disadvantage is that you have to change your domain model code. The new methods add complexity and must be unit-tested, especially for the case when only one of the partial setters is called.

In the second solution, you define a binding for a helper Map bean with the date and time fields. In the Issue binding you then use an MVEL expression that concatenates the date and time strings and converts them to a Date (e.g. using a java.text.SimpleDateFormat instance):

    
    <jb:bean beanId="issue" class="com.tsl.smooks.Issue" createOnElement="csv-record">
        ...
        <jb:expression property="updated" execOnElement="csv-record/updatedTime">
            updated = transformedProps["updatedDate"] + " " + transformedProps["updatedTime"];
            new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm").parse(updated)
        </jb:expression>
        ...
    </jb:bean>
    

The advantages and disadvantages of this solution are the mirror image of the first one. You don’t touch your Java classes, and it is simple – you only have to write Smooks configuration. On the other hand, if many date/time formats and combinations have to be handled, the MVEL expression defining the conversion can become complicated. And in case of an error you won’t get a DataDecodeException, but an ugly, generic ExpressionEvaluationException.

Conclusions

Smooks is a great library that will save you writing a lot of code when you have to process many different formats. With a few lines of code and an XML configuration you can read a file and persist its contents in the database using your favourite ORM framework.

However, the Smooks processing model and the way the built-in cartridges use it sometimes make it difficult to configure a transformation for a real-world requirement. The information provided in the user guide is sometimes scarce and unclear. I hope these practical cases will help you use Smooks Java bean mappings and MVEL expressions more effectively.

Genomic Processing in the Cloud

Introduction

Over the last decade, a new trend has manifested itself in the field of genomic processing. With the advent of the new generation of DNA sequencers has come an explosion in the throughput of DNA sequencing, and the cost per base of generated sequences has fallen dramatically. Consequently, the bottleneck in sequencing projects around the world has shifted from obtaining DNA reads to the alignment and post-processing of the huge amount of read data now available. To minimize both processing time and memory requirements, specialized algorithms have been developed that trade off speed and memory consumption against the sensitivity of their alignments. Examples include the ultrafast, memory-efficient short-read aligners BWA and Bowtie, both based on the Burrows-Wheeler transform.

The need to analyse increasingly large amounts of genomics and proteomics data has meant that research labs allocate an increasing part of their time and budget to provisioning, managing and maintaining their computational infrastructure. A possible solution to their need for on-demand computational power is to take advantage of the public cloud. With its on-demand operational model, labs can benefit from considerable cost savings by paying for hardware only when their experiments need it. Procurement of new hardware is also simplified and more easily justified, without the need to expand in-house resources.

Not only does the cloud reduce the time to provision new hardware; it also provides significant time savings by automating the installation and customization of the software that runs on top of the hardware. A controlled computational environment for the post-processing of experiments allows results to be reproduced more easily, a key objective for researchers across all disciplines. Results can also be easily shared among researchers, as cloud-based services facilitate the publishing of data over the internet while giving researchers control over access to it. Finally, data storage in the cloud was designed from the ground up with high availability and durability as key objectives. By storing their experiment data in the cloud, researchers can ensure their data is safely replicated among data centres. These advantages free researchers from time-consuming operational concerns, such as in-house backups and the provisioning and management of servers from which to share their experiment results.

Given the vast potential benefits of the cloud, The Server Labs is working with the Bioinformatics Department at the Spanish National Cancer Research Institute (CNIO) to develop a cloud-based solution that would meet their genomic processing needs.

An Environment for Genomic Processing in the Cloud

The first step towards carrying out genomic processing in the cloud is identifying a suitable computational environment, including hardware architecture, operating system and genomic processing tools. CNIO helped us identify the following software packages employed in their typical genomic processing workflows:

  • Burrows-Wheeler Alignment Tool (BWA): BWA aligns short DNA sequences (reads) to a sequence database such as the human reference genome.
  • Novoalign: Novoalign is a DNA short-read mapper implemented by Novocraft Technologies. The tool uses spaced-seed indexing to align either single-end or paired-end reads by means of the Needleman–Wunsch algorithm. The source code is not available for download; however, anybody may download and use the programs free of charge for research and other non-profit activities, as long as the results are published in open journals.
  • SAMtools: after read alignment, one might want to call variants or view the alignments against the reference genome. SAMtools is an open-source package of software applications which includes an alignment viewer and a consensus base caller tool that provides lists of variants (somatic mutations, SNPs and indels).
  • BEDTools: this software facilitates common genomics tasks for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (.BED) format. BEDTools supports the comparison of sequence alignments, allowing the user to compare next-generation sequencing data with both public and custom genome annotation tracks. The BEDTools source code is freely available.

Note that, except for Novoalign, all software packages listed above are open source and freely available.

One of the requirements of these tools is that the underlying hardware architecture is 64-bit. For our initial proof of concept, we decided to run a base image with Ubuntu 9.10 for 64-bit on an Amazon EC2 large instance with 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) and 850 GB of local instance storage. Once we had selected the base image and instance type to run on the cloud, we proceeded to automate the installation and configuration of the software packages. Automating this step ensures that no additional setup tasks are required when launching new instances in the cloud, and provides a controlled and reproducible environment for genomic processing.

By using the RightScale cloud-management platform, we were able to separate out the selection of instance type and base image from the installation and configuration of software specific to genomic processing. First, we created a Server definition for the instance type and operating system specified above. We then scripted the installation and configuration of genomic processing tools, as well as any OS customizations, so that these steps can be executed automatically after new instances are first booted.

Once our new instances were up and running and the software environment was finalized, we executed some typical genomic workflows suggested to us by CNIO. We found that for a typical workflow with a raw data input between 3 and 20 GB, the total processing time on the cloud ranges between 1 and 4 hours, depending on the size of the raw data and on whether the alignment is single-end or paired-end. With large EC2 instances priced at 38 cents per hour, and ignoring the additional time required for customization of the workflow, the cost of the pure processing tasks totalled less than $2 for a single experiment.

We also found the processing times to be comparable to running the same workflow in-house on similar hardware. However, when processing on the cloud, we found that transferring the raw input data from the lab to the cloud data centre could become a bottleneck, depending on the bandwidth available. We were able to work around this limitation by processing our data on Amazon’s European data centre and avoiding peak-hours for our data uploads.

Below, we include an example workflow for paired-end alignment that we successfully carried out in the Amazon cloud:

Example single-end alignment workflow executed in the cloud.

Maximizing the Advantages of the Cloud

In the first phase, we demonstrated that genomic processing in the cloud is feasible and cost-effective, while providing performance on par with in-house hardware. To truly realize the benefits of the cloud, however, what we need is an architecture that allows tens or hundreds of experiment jobs to be processed in parallel. This would allow researchers, for instance, to run algorithms with slightly different parameters to analyse the impact on their experiment results. At the same time, we would like a framework which incorporates all of the strengths of the cloud, in particular data durability, publishing mechanisms and audit trails, to make experiment results reproducible.

To meet these goals, The Server Labs is developing a genomic processing platform which builds on top of RightScale’s RightGrid batch processing system. Our platform facilitates the processing of a large number of jobs by leveraging Amazon’s EC2, SQS, and S3 web services in a scalable and cost efficient manner to match demand. The framework also takes care of scheduling, load management, and data transport, so that the genomic workflow can be executed locally on experiment data available to the EC2 worker instance. By using S3 to store the data, we ensure that any input and result data is highly available and persisted between data centres, freeing our users from the need to backup their data. It also ensures that data can be more easily shared with the appropriate level of access control among institutions and researchers. In addition, the monitoring and logging of job submissions provides a convenient mechanism for the production of audit trails for all processing tasks.

The following diagram illustrates the main components of The Server Labs Genomic Processing Cloud Framework:

The Server Labs Genomic Processing Cloud Framework.

The Worker Daemon is based on The Server Labs Genomic Processing Server Template, which provides the software stack for genomic processing. It automatically pulls experiment tasks from the SQS input queue along with input data from S3 to launch the genomic processing workflow with the appropriate configuration parameters. Any intermediate and final results are stored in S3 and completed jobs are stored in SQS for auditing.

Cost Analysis

Given a RightGrid-based solution for genomic processing, we would like to analyse how much it would cost to run CNIO’s workflows on the Amazon cloud. Let us assume, for the sake of our analysis, that CNIO runs 10 experiments in an average month, each of which generates an average of 10 GB of raw input data and produces an additional 20 GB of result data. For each of these experiments, CNIO wishes to run 4 different workflows, with an average running time of 2 hours on a large EC2 instance. In addition, we assume that the experiment results are downloaded once to the CNIO in-house data centre. We also assume that the customer already has a RightScale account, the cost of which is not included in the analysis.

The costs break down as follows (Amazon service – cost):

  • SQS: negligible
  • S3:
    • Data transfer in: $0.10 per GB * 10 GB per workflow = $1 per workflow
    • Data transfer out: 1 download per workflow * 20 GB per download * $0.15 per GB = $3 per workflow
    • Storage: 30 GB per workflow * $0.14 per GB = $4.20 per workflow
    • Total S3 cost: $8.20 per workflow
  • EC2: $0.38 per hour * 2 hours per workflow = $0.76 per workflow
  • All services: $8.20 + $0.76 = $8.96 per workflow

Total cost:
10 experiments per month * 4 workflows per experiment * $8.96 per workflow =
$358.40 per month, or roughly $4,300 per year
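
For readers who want to play with these assumptions, the arithmetic above can be captured in a small throwaway calculator (a sketch using the 2010 prices quoted in the table; not part of any project code):

    // Reproduces the cost analysis above; all prices are per the table.
    public class WorkflowCostEstimator {

        static final double TRANSFER_IN_PER_GB = 0.10;   // S3 data transfer in
        static final double TRANSFER_OUT_PER_GB = 0.15;  // S3 data transfer out
        static final double STORAGE_PER_GB = 0.14;       // S3 storage
        static final double EC2_LARGE_PER_HOUR = 0.38;   // EC2 large instance

        static double costPerWorkflow(double inGb, double outGb, double hours) {
            double s3 = inGb * TRANSFER_IN_PER_GB
                    + outGb * TRANSFER_OUT_PER_GB
                    + (inGb + outGb) * STORAGE_PER_GB;
            return s3 + hours * EC2_LARGE_PER_HOUR;
        }

        public static void main(String[] args) {
            double perWorkflow = costPerWorkflow(10, 20, 2); // = $8.96
            System.out.printf("Per workflow: $%.2f%n", perWorkflow);
            System.out.printf("Per month (10 experiments x 4 workflows): $%.2f%n",
                    10 * 4 * perWorkflow);
        }
    }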

Towards an On-demand Genomic Processing Service

By building on the RightGrid framework, The Server Labs is able to offer a robust cloud-based platform on which to perform on-demand genomic processing tasks, at the same time enabling experiment results to be more easily reproduced, stored and published. To make genomic processing on the cloud even simpler, the on-demand model can be taken one step further by providing pay-as-you-go software as a service. In such a model, researchers are agnostic to the fact that the processing of their data is done in the cloud. Instead, they interact with the platform via a web interface, where they can upload their experiment’s raw input data, select their workflow of choice, and choose whether or not to share their result data. They are then notified asynchronously via email once the processing of their experiment data has completed.

References

Low Cost, Scalable Proteomics Data Analysis Using Amazon’s Cloud Computing Services and Open Source Search Algorithms

How to map billions of short reads onto genomes

The RightGrid batch-processing framework

SCA Async/Conversational services Part 2: Non-SCA Web Service client

Following my previous post on the internals of asynchronous and conversational services in Tuscany SCA, which options are available for consuming these services when you cannot use a Tuscany SCA runtime on the client?

Depending on the transport binding used, you would expect to find a certain level of standardisation in the way conversational/asynchronous services are implemented, allowing for interoperable solutions. However, it is difficult, if not impossible, to find interoperable solutions for these types of services, even for highly standardised bindings such as SOAP Web Services. We saw in the previous post how Tuscany SCA handles these services for the WS and JMS bindings. At least for Web Services, some WS-* standards are used (i.e. WS-Addressing), but there are still solution-specific details that prevent interoperability. This reflects how difficult it is for the standardisation community to define detailed standards and mechanisms that enable interoperable solutions for these kinds of services.

In this post I present an option for this situation: using a standard JAX-WS client and attaching the appropriate WS-Addressing headers using JAXB. To that end, I extend the pure Tuscany SCA client-server sample of the previous post with a non-SCA client.

You can find the extended source code of the maven projects here.

Overview

The two main pieces to set up on a JAX-WS client are:

  1. Proper WS-Addressing headers containing the Tuscany-specific information (i.e. callbackLocation, conversationID and, optionally, callbackID).
  2. A web service implementing the callback interface, listening on the callbackLocation URI.

JAX-WS WS-Addressing headers setup

As for any WSDL-first client development, the first step is to generate the JAX-WS client classes from the Counter Service WSDL. This is done by running the Maven jaxws-maven-plugin on the WSDL file generated for the service (i.e. http://localhost:8086/CounterServiceComponent?wsdl).

To set up the SOAP WS-Addressing headers, I follow the recommendations from the JAX-WS guide and use the WSBindingProvider interface, which offers much better control over how headers are added.

Therefore, in the constructor of CounterServiceClient.java, the WS-Addressing “TuscanyHeader” is added:

     public CounterServiceClient() {
        super();
        
        // Initialise JAX-WS Web Service Client.
        service = new CounterServiceService();
        servicePort = service.getCounterServicePort();
  
        // Set the WS-Addressing headers to use by the client.
        WSBindingProvider bp = (WSBindingProvider) servicePort;
        // Generate UUID for Tuscany headers
        conversationId = UUID.randomUUID().toString();
        callbackId = UUID.randomUUID().toString();
        ReferenceParameters rp = new ReferenceParameters(conversationId);
        rp.setCallbackId(callbackId);
        TuscanyHeader header = new TuscanyHeader(CALLBACKURI, rp);        
        bp.setOutboundHeaders(Headers.create((JAXBRIContext) jaxbContext,
                header));
        
        ...
    }        

The ReferenceParameters and TuscanyHeader classes are JAXB classes containing the information required to map the TuscanyHeader Java object to the WS-Addressing XML header. For instance, the ReferenceParameters class, which holds the Tuscany SCA parameters, has the following JAXB definitions:

public class ReferenceParameters {
    private static final String TUSCANYSCA_NS = "http://tuscany.apache.org/xmlns/sca/1.0";
    /** The conversation id. */
    private String conversationId = "";
    /** The callback id. */
    private String callbackId = null;

    ...
    public void setCallbackId(String callbackId) {
        this.callbackId = callbackId;
    }

    public void setConversationId(String conversationId) {
        this.conversationId = conversationId;
    }

    @XmlElement(name = "CallbackID", namespace = TUSCANYSCA_NS, required = false)
    public String getCallbackId() {
        return callbackId;
    }

    @XmlElement(name = "ConversationID", namespace = TUSCANYSCA_NS, required = true)
    public String getConversationId() {
        return conversationId;
    }
}
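
The TuscanyHeader class itself is not reproduced in the post. A sketch of what it could look like, assuming the header is marshalled as a WS-Addressing From element (the actual class is in the sample sources):

import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical sketch: marshalled by JAXB into the WS-Addressing "From"
// header carrying the callback address and the Tuscany reference parameters.
@XmlRootElement(name = "From", namespace = TuscanyHeader.WSA_NS)
public class TuscanyHeader {

    public static final String WSA_NS = "http://www.w3.org/2005/08/addressing";

    private String address;
    private ReferenceParameters referenceParameters;

    public TuscanyHeader() {
    }

    public TuscanyHeader(String address, ReferenceParameters referenceParameters) {
        this.address = address;
        this.referenceParameters = referenceParameters;
    }

    @XmlElement(name = "Address", namespace = WSA_NS)
    public String getAddress() {
        return address;
    }

    @XmlElement(name = "ReferenceParameters", namespace = WSA_NS)
    public ReferenceParameters getReferenceParameters() {
        return referenceParameters;
    }

    // Setters required by JAXB omitted for brevity.
}

The jaxbContext passed to Headers.create in the constructor shown earlier would then be a JAXB context created over TuscanyHeader and ReferenceParameters.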

JAX-WS callback service setup

The callback service needs to be created on the client side. For that, we define a simple web service (CounterServiceCallbackImpl.java) with CounterServiceCallback as its contract.

/**
 * The CounterServiceCallback implementation.
 * The Web service namespace must be the same as the one defined in the Server Side.
 */
@WebService(targetNamespace = "http://server.demo.tuscany.sca.tsl.com/")
public class CounterServiceCallbackImpl implements CounterServiceCallback {
	private int count;

	@Override
	public void receiveCount(int count, int end) {
		System.out.println("CounterServiceCallback --- Received Count: " + count + " out of " + end);
		this.count = count;
	}
        ...
}

In this sample project, the callback web service is also started during the instantiation of the JAX-WS client. We use the embedded Sun HTTP server to publish the web service instead of relying on an application server such as Jetty or Tomcat:

     public CounterServiceClient() {
        super();
        ...
        
        // Setup Receiver Callback Web Service
        System.out.println("Starting Callback service in " + CALLBACKURI);
        callback = new CounterServiceCallbackImpl();
        callbackServiceEp = Endpoint.create(callback);
        callbackServiceEp.publish(CALLBACKURI);
    }        

Testing the client

As in the previous post, in order to run the client tests you need to:

  • Run the server side by executing run-server.sh under the counter-callback-ws-service directory. The server component should start and wait for client requests.
  • Go to the client project, counter-callback-ws-client-SCA and execute the tests with:
    mvn clean test
    

    This runs both the SCA and non-SCA client tests.

  • In case you want to run two clients at the same time, you need to define two separate callback URIs. For that purpose the sca.callbackuri property has been defined to configure the URI to use. Therefore, to run two clients in parallel, execute:
    mvn test -Dsca.callbackuri="http://localhost:1999/callback" -Dtest=com.tsl.sca.tuscany.demo.client.WebServiceNonSCAClientTest
    and, for the second client:
    mvn test -Dsca.callbackuri="http://localhost:2999/callback" -Dtest=com.tsl.sca.tuscany.demo.client.WebServiceNonSCAClientTest
    

    Once running, you should see the different service calls and conversation IDs interleaved in the server-side console:

    setCounterServiceCallback on thread Thread[Thread-2,5,main] 
    ConversationID = 81e257cd-af3b-4ea6-ae69-4b8e0d9db8a9. Starting count 
    Sleeping ...
    setCounterServiceCallback on thread Thread[Thread-5,5,main]                 
    ConversationID = 70e86dcf-436c-4df1-ab6b-165b3f9070a4. Starting count   
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    ConversationID = 81e257cd-af3b-4ea6-ae69-4b8e0d9db8a9. Stopping count
    Sleeping ...
    ConversationID = 70e86dcf-436c-4df1-ab6b-165b3f9070a4. Stopping count
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    ConversationID = 81e257cd-af3b-4ea6-ae69-4b8e0d9db8a9. Restarting count
    Sleeping ...
    ConversationID = 70e86dcf-436c-4df1-ab6b-165b3f9070a4. Restarting count
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Thread interrupted.
    Thread interrupted.
    

Conclusions

I hope that these posts have shed some light on how SCA conversational/asynchronous services are implemented, taking Tuscany as the SCA reference runtime. I also believe it is important to know the options available for consuming an SCA service without an SCA runtime, and how to do so in a simple manner.

SCA Async/Conversational services Part 1: Internals of Tuscany SCA

Some time ago I wrote about developing applications with SCA on a JBI-based infrastructure, using simple SCA services.

I’m coming back with two separate SCA blog posts discussing the usage of more complex services: asynchronous and conversational ones.

  • In this post, I provide an example client-server project that implements a conversational and asynchronous service in Tuscany SCA, digging into the internals of how these services are handled over the Web Services and JMS bindings. I found this information quite difficult to come by.
  • The second post looks into the situation where using a Tuscany SCA runtime is not possible on the client side, but we still need a way to consume this type of service from a more standard web service client. For the specific case of the web service binding, and knowing the internals of Tuscany, I will use a standard JAX-WS client to consume a Tuscany SCA conversational and asynchronous service.

Both posts use a common example project: a Counter service that notifies the client each time the count is updated (every second) via an asynchronous callback. The service is conversational and allows clients to stop the count and restart it through separate, independent service calls. Multiple clients can also run in parallel, each one with its own counter service instance. Version 1.6 of Tuscany SCA is used in this sample.

The source code of the sample counter service for this post can be found here.

SCA asynchronous and conversational service definition

Let’s review how to define a service in Tuscany SCA that is conversational and also defines a callback interface for asynchronous calls to the client. Good references for this are the SCA specifications, the Tuscany SCA documentation and the sample projects included in the distribution. However, while I could find several callback projects among the Tuscany SCA samples, there are none that exercise conversations.

Below you can find the Java interface definition of the CounterService SCA service (counter-callback-ws-service Maven project).

I use the @Callback and @Conversational annotations to define our service. For the @Callback annotation we also need to define the callback interface class. Therefore, for our counter service we have two interfaces: one for the service, whose implementation lives on the server side, and one for the callback, whose implementation lives on the client side.

/**
 * The remote service that will be invoked by the client
 */
@Remotable
@Callback(CounterServiceCallback.class)
@Conversational
public interface CounterService {

    @OneWay
    void startCounting(int maxcount);
    
    @OneWay
    void stopCounting();

    @OneWay
    void restartCounting();    
    
    @OneWay
    void shutdownCounting();     
}

The Callback interface is as simple as:

/**
 * The callback interface for {@link CounterService}.
 */
@Remotable
public interface CounterServiceCallback {

    void receiveCount(int count, int end);
}

The server-side implementation of the service looks like this:

/**
 * This class implements CounterService and uses a callback.
 */
@Service(CounterService.class)
@Scope("CONVERSATION")
public class CounterServiceImpl implements CounterService {
	@ConversationID
	protected String conversationID;

	private CounterServiceCallback counterServiceCallback;

	/**
	 * The setter used by the runtime to set the callback reference
	 *
	 * @param counterServiceCallback
	 */
	@Callback
	public void setMyServiceCallback(CounterServiceCallback counterServiceCallback) {
		System.out.println("setCounterServiceCallback on thread "
				+ Thread.currentThread());
		this.counterServiceCallback = counterServiceCallback;
	}
	...
}

In the implementation, the @Service SCA annotation identifies the SCA service being implemented, and @Scope specifies the visibility and lifecycle contract the implementation has with the SCA runtime. Possible values of @Scope are STATELESS, REQUEST, CONVERSATION and COMPOSITE. For this specific service we need CONVERSATION, which instructs Tuscany to keep conversation state so that the interactions between a client and the service can be correlated. For a description of the remaining scopes, refer to the SCA Java Annotations and API reference specification.

Last, we use the @Callback annotation to tell Tuscany SCA where to inject the callback reference to be used by the service.

The server-side SCA composite is very similar to the official Tuscany SCA callback-ws-client composite; you can find it under “src/main/resources/counterws.composite”. The binding used in the sample is a Web Service binding.

Tuscany SCA Counter Service Client

On the client side (counter-callback-ws-client-SCA Maven project) we need to define a client interface that makes use of the counter service. To keep it simple, the interface has a method that returns the reference to the counter service and an additional helper method to get the current count recorded in the client from the service callbacks. As on the server side, we have to make sure to define the interface as @Conversational and the implementation with @Scope("CONVERSATION") so that Tuscany populates service calls with conversation information.

Also, to simplify the sample code, the interfaces shared between client and server (i.e. CounterService and CounterServiceCallback) are included in both projects separately. Ideally, these interfaces would live in a separate library used by both projects.

/**
 * The client component interface.
 */
@Conversational
public interface CounterServiceClient {
    
    public CounterService getCounterService();
    
    public int getCount();
    
}
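
The implementation of this interface is not reproduced in the post. In essence, it holds the injected service reference and the last count received through the callback; a sketch (class and package names assumed from the project layout) could be:

import org.osoa.sca.annotations.Reference;
import org.osoa.sca.annotations.Scope;
import org.osoa.sca.annotations.Service;

@Service(CounterServiceClient.class)
@Scope("CONVERSATION")
public class CounterServiceClientImpl implements CounterServiceClient, CounterServiceCallback {

    private CounterService counterService;
    private int count;

    // Tuscany injects the reference declared in the client composite
    @Reference
    public void setCounterService(CounterService counterService) {
        this.counterService = counterService;
    }

    public CounterService getCounterService() {
        return counterService;
    }

    // Callback invoked by the server every second
    public void receiveCount(int count, int end) {
        this.count = count;
    }

    public int getCount() {
        return count;
    }
}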

The client composite is listed below. It defines the reference to the counter service and the associated callback; the URL defined in the callback element is used by Tuscany SCA on the client side to start the listening callback web service. Roughly, it looks like this (see the sample sources for the exact file):

<composite xmlns="http://www.osoa.org/xmlns/sca/1.0"
           targetNamespace="http://counter-client"
           name="counterclientws">

    <component name="CounterServiceClientComponent">
        <implementation.java class="com.tsl.sca.tuscany.demo.client.CounterServiceClientImpl"/>
        <reference name="counterService">
            <binding.ws uri="http://localhost:8086/CounterServiceComponent"/>
            <callback>
                <binding.ws uri="http://localhost:1999/callback"/>
            </callback>
        </reference>
    </component>

</composite>

For testing, a unit test (WebServiceSCAClientTest.java) is set up, exercising the counter service, callbacks and conversations.

Running the counter service sample project

To run the counter service sample, extract the source code and:

  • Run the server side, executing the run-server.sh script in the counter-callback-ws-service directory. The server component should start and wait for client requests.
    ...
    Nov 8, 2010 4:11:23 PM org.apache.coyote.http11.Http11Protocol start
    INFO: Starting Coyote HTTP/1.1 on http-8086
    Callback server started (press enter to shutdown)
    Nov 8, 2010 4:11:23 PM org.apache.tuscany.sca.http.tomcat.TomcatServer addServletMapping
    INFO: Added Servlet mapping: http://localhost:8086/CounterServiceComponent
    
  • Go to the client project, counter-callback-ws-client-SCA and execute the tests with:
    mvn clean test
    
  • On the client side you should see how the client starts the counting service and receives the callbacks, and how the counting is stopped and later restarted, continuing the count where it left off.
    ...
    Nov 8, 2010 4:11:28 PM org.apache.tuscany.sca.http.tomcat.TomcatServer addServletMapping
    INFO: Added Servlet mapping: http://localhost:1999/callback
    Starting Count and waiting 5 seconds for counts...
    CounterServiceCallback --- Received Count: 0 out of 30
    CounterServiceCallback --- Received Count: 1 out of 30
    CounterServiceCallback --- Received Count: 2 out of 30
    CounterServiceCallback --- Received Count: 3 out of 30
    CounterServiceCallback --- Received Count: 4 out of 30
    Stopping Count and waiting 5 seconds for no counts...
    Restarting Count and waiting 5 seconds for counts...
    CounterServiceCallback --- Received Count: 5 out of 30
    CounterServiceCallback --- Received Count: 6 out of 30
    CounterServiceCallback --- Received Count: 7 out of 30
    CounterServiceCallback --- Received Count: 8 out of 30
    Stopping the Client Node
    Nov 8, 2010 4:11:44 PM org.apache.tuscany.sca.node.impl.NodeImpl stop
    ...
    
  • On the server side, you should see the client call and the associated conversationID.
    ...
    INFO: Added Servlet mapping: http://jmatute-laptop:8086/CounterServiceComponent
    setMyServiceCallback on thread Thread[Thread-2,5,main]
    ConversationID = d8a05f47-064b-4832-9cca-7ad035bd36ee. Starting count
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    ConversationID = d8a05f47-064b-4832-9cca-7ad035bd36ee. Stopping count
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    ConversationID = d8a05f47-064b-4832-9cca-7ad035bd36ee. Restarting count
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Sleeping ...
    Thread interrupted.
    

Having reached this point, let’s look into how Tuscany SCA handles the conversations and callbacks and how the required information is included in the underlying transport bindings (WS or JMS).

Internals of conversational and asynchronous services in Tuscany SCA

Tuscany SCA conversational services (i.e. those marked with @Conversational) make use of conversation IDs in order to keep track of, and correctly map, the multiple invocations associated with a given conversation.

SCA and Tuscany asynchronous services are similar to those defined, for instance, in BPEL, where the forward and callback service calls are two completely separate service invocations. Therefore, Tuscany also requires additional information to know where the callback needs to be sent and, optionally, an application correlation ID for the callback call.

In summary, Tuscany SCA uses the following information for conversational and callback services:

  • conversationID – UUID identifying the current conversation. Tuscany SCA uses it to map invocations to the service instance associated with that conversation.
  • callbackLocation – the location of the callback service. Depending on the binding used, this may be a URI (Web Service binding) or a queue name (JMS binding).
  • callbackID – application-specified callback ID, used by the application to correlate callbacks with the related application state.

This information needs to be mapped onto the specific binding, and this is where no standardisation exists, which makes interoperable solutions difficult:

  • WebService Binding: for the Web Service binding, WS-Addressing headers are used to store the conversational/async information Tuscany SCA requires. Below is what a conversational and asynchronous/bidirectional Tuscany Web Service invocation looks like, roughly (header reconstructed from the sample):
    • The WS-Addressing Address contains the callback URI to be used for the callback invocation. This is normally set by the client to tell the server where to send the callbacks.
    • The WS-Addressing ReferenceParameters contain the other two fields under a Tuscany-specific namespace: the CallbackID and the ConversationID.

    <soapenv:Header xmlns:wsa="http://www.w3.org/2005/08/addressing"
                    xmlns:tsca="http://tuscany.apache.org/xmlns/sca/1.0">
        <wsa:From>
            <wsa:Address>http://localhost:1999/callback</wsa:Address>
            <wsa:ReferenceParameters>
                <tsca:CallbackID>45a963da-2074-4bb2-b9ee-f721e2ec753b</tsca:CallbackID>
                <tsca:ConversationID>309b8322-1dc2-4c51-a4db-73d65edae391</tsca:ConversationID>
            </wsa:ReferenceParameters>
        </wsa:From>
    </soapenv:Header>
    <soapenv:Body>
        <startCounting>
            <maxcount>30</maxcount>
        </startCounting>
    </soapenv:Body>

  • JMS Binding: for JMS, Tuscany SCA uses JMS message properties to store the required information. The ActiveMQ console screenshot below shows a Tuscany SCA JMS message for a conversational and asynchronous/bidirectional service; the CallbackID, scaConversationID and scaCallbackQueue properties are Tuscany SCA proprietary JMS message properties holding the information.

    Tuscany SCA JMS Message

Conclusion

In this first post I have presented a Tuscany SCA example that covers both conversational and asynchronous scenarios, which are not currently available in the official Tuscany SCA samples, and looked into the internals Tuscany SCA uses to handle these services. This provides the basis for the next post, where I will use this information to extend the project code with a non-SCA, JAX-WS-based web service client as an approach to consuming these services without a Tuscany SCA runtime.