Smooks processing recipes

Introduction

In one of our customer projects we had a requirement to import CSV, fixed length and Excel files in different formats and store records in the database. We chose Smooks to accomplish this task.

Smooks is a Java framework to read, process and transform data from various sources (CSV, fixed length, XML, EDI, …) to various destinations (XML, Java objects, database). It convinced me because:

  • it brings out-of-the-box components to read CSV and fixed length files
  • it integrates smoothly with an ORM library (Hibernate, JPA)
  • processing is configured using an XML configuration file – you need only a few lines of code to do the transformations
  • extensibility – implementing a custom Excel reader was relatively easy
  • low added filtering overhead – reading 100,000 CSV lines and storing them in the database using Hibernate took us less than 30 seconds

During development we had to overcome some hurdles imposed by the Smooks processing model. In this post I would like to share the practical experience we gained working with Smooks. First, I’m going to present a sample transformation use case with requirements similar to a real-world assignment. Then I will present solutions to these requirements in a ‘how-to’ style.

Use case

We are developing a ticketing application. The heart of our application is the Issue class.

We have to write an import and conversion module for an external ticketing system. Data comes in the CSV format (for the sake of simplicity). The domain model of the external system is slightly different than ours; however, issues coming from the external issue tracker can be mapped to our Issues.

The exchange format of the external system defines the following fields: description, priority, reporter, assignee, createdDate, createdTime, updatedDate, updatedTime. They should be mapped to our Issue in the following manner:

  • description property – description field
    This is a simple Smooks mapping. No big issue.

  • project property – there is no project field. The project should be assigned manually.
    A constant object (from our domain model) must be passed to the Smooks engine to be used in the Java binding.
    See Assign a constant object (from your domain model) to a property.

  • priority property – priority field; P1 and P2 priorities should be mapped to Priority.LOW, P3 to Priority.MEDIUM, P4 and P5 to Priority.HIGH
    This mapping could be done using an MVEL expression. However, we want to encapsulate this logic in a separate class that can be easily unit-tested. See Use an external object to calculate property value.

  • involvedPersons property – reporter field plus assignee field if not empty (append assignee using ‘;’ separator)
    Set compound properties in Java binding will show how to achieve it.

  • created property – merge createdDate and createdTime fields
  • updated property – merge updatedDate and updatedTime fields
    In Set datetime from date and time fields, two strategies will be presented.

    Before diving into details, I’m going to present the final Smooks configuration and the code that invokes the transformation (as a JUnit 4 test). Later on, in each recipe, I will explain the XML configuration and Java code fragments relevant to that recipe.

    The remaining classes (Issue, Priority, Project, IssuePrioritizer) are not included in the text. You can browse the source code online on GitHub. To get your local copy, clone the Git repository:


    git clone git://github.com/mgryszko/blog-smooks-recipies.git


    smooks-config.xml

    <?xml version="1.0"?>
    <smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
        xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.3.xsd"
        xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.3.xsd">
     
        <params>
            <param name="stream.filter.type">SAX</param>
        </params>
     
        <csv:reader fields="description,priorityCode,reporter,assignee,createdDate,createdTime,updatedDate,updatedTime" skipLines="1"/>
     
        <jb:bean beanId="transformedProps" class="java.util.HashMap" createOnElement="csv-record">
            <jb:value property="@reporter" data="csv-record/reporter" />
            <jb:value property="@assignee" data="csv-record/assignee" />
            <jb:value property="@updatedDate" data="csv-record/updatedDate"/>
            <jb:value property="@updatedTime" data="csv-record/updatedTime"/>
        </jb:bean>
     
        <jb:bean beanId="issues" class="java.util.ArrayList" createOnElement="csv-set">
            <jb:wiring beanIdRef="issue" />
        </jb:bean>
     
        <jb:bean beanId="issue" class="com.tsl.smooks.model.Issue" createOnElement="csv-record">
            <jb:value property="description" data="csv-record/description" />
            <jb:wiring property="project" beanIdRef="project" />
            <jb:expression property="priority" execOnElement="csv-record/priorityCode">
                prioritizer.assignPriorityFromCode(_VALUE)
            </jb:expression>
            <jb:expression property="involvedPersons" execOnElement="csv-record">
                transformedProps["reporter"]
                    + (org.apache.commons.lang.StringUtils.isNotBlank(transformedProps["assignee"]) ? ";" + transformedProps["assignee"] : "")
            </jb:expression>
            <jb:value property="createdDatePart" decoder="Date" data="csv-record/createdDate">
                <jb:decodeParam name="format">yyyy-MM-dd</jb:decodeParam>
            </jb:value>
            <jb:value property="createdTimePart" decoder="Date" data="csv-record/createdTime">
                <jb:decodeParam name="format">HH:mm</jb:decodeParam>
            </jb:value>
            <jb:expression property="updated" execOnElement="csv-record">
                updated = transformedProps["updatedDate"] + " " + transformedProps["updatedTime"];
                new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm").parse(updated)
            </jb:expression>
        </jb:bean>
    </smooks-resource-list>


    SmooksRecipiesTest

    package com.tsl.smooks;
     
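    import static org.junit.Assert.assertEquals;

    import java.io.InputStream;
    import java.text.SimpleDateFormat;
    import java.util.Arrays;
    import java.util.List;

    import javax.xml.transform.Source;

    import org.junit.Before;
    import org.junit.Test;
    import org.milyn.Smooks;
    import org.milyn.container.ExecutionContext;
    import org.milyn.payload.StringSource;

    // imports of the domain classes (Issue, Priority, Project, IssuePrioritizer) hidden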
     
    public class SmooksRecipiesTest {
     
        private static final SimpleDateFormat DATETIME_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm");
     
        private Source importedIssues = new StringSource(
            "description,priority,reporter,assignee,createdDate,createdTime,updatedDate,updatedTime\n"
                + "Added phased initialization of javabean cartridge,P1,Ataulfo,Teodorico,2010-10-01,13:10,2010-10-10,20:01\n"
                + "Processing recursive tree like structures with the Javabean Cartridge,P3,Eurico,,2010-10-02,07:15,2010-11-15,09:45"
        );
     
        private Smooks smooks;
        private ExecutionContext executionContext;
     
        private List<Issue> expIssues;
        private Project expProject = new Project("Smooks");
     
        @Before
        public void initSmooks() throws Exception {
            smooks = new Smooks(getResourceFromClassLoader("smooks-config.xml"));
            executionContext = smooks.createExecutionContext();
            executionContext.getBeanContext().addBean("project", expProject);
            executionContext.getBeanContext().addBean("prioritizer", new IssuePrioritizer());
        }
     
        private InputStream getResourceFromClassLoader(String name) {
            return getClass().getClassLoader().getResourceAsStream(name);
        }
     
        @Before
        public void createExpIssues() throws Exception {
            expIssues = Arrays.asList(
                new Issue("Added phased initialization of javabean cartridge", expProject, Priority.LOW,
                    "Ataulfo;Teodorico", DATETIME_FORMAT.parse("2010-10-01 13:10"), DATETIME_FORMAT.parse("2010-10-10 20:01")
                ),
                new Issue(
                    "Processing recursive tree like structures with the Javabean Cartridge", expProject, Priority.MEDIUM,
                    "Eurico", DATETIME_FORMAT.parse("2010-10-02 07:15"), DATETIME_FORMAT.parse("2010-11-15 09:45")
                )
            );
        }
     
        @Test
        public void process() throws Exception {
            smooks.filterSource(executionContext, importedIssues);
     
            List<Issue> issues = (List<Issue>) executionContext.getBeanContext().getBean("issues");
            assertEquals(expIssues, issues);
        }
    }


    Assign a constant object (from your domain model) to a property

    According to the Smooks manual, the bean context is the place where the JavaBean cartridge puts newly created beans. We can add our own bean (Project) to it:

    executionContext = smooks.createExecutionContext();
    executionContext.getBeanContext().addBean("project", new Project("Smooks"));

    … and reference it in the Java binding configuration:

    <jb:bean beanId="issue" class="com.tsl.smooks.model.Issue" createOnElement="csv-record">
        ....
        <jb:wiring property="project" beanIdRef="project" />
        ...
    </jb:bean>


    Use an external object to calculate property value

    As in the previous recipe, we add an additional bean (IssuePrioritizer) to the bean context:

    executionContext = smooks.createExecutionContext();
    executionContext.getBeanContext().addBean("prioritizer", new IssuePrioritizer());

    … and define an MVEL expression for the property. The MVEL expression uses the bean and references the value being processed (in this case coming from the CSV reader) by the implicit _VALUE variable:

    <jb:bean beanId="issue" class="com.tsl.smooks.model.Issue" createOnElement="csv-record">
        ....
        <jb:expression property="priority" execOnElement="csv-record/priorityCode">
            prioritizer.assignPriorityFromCode(_VALUE)
        </jb:expression>
        ...
    </jb:bean>
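The IssuePrioritizer class itself is not shown in this post. As a rough idea of what it might look like, here is a minimal sketch that assumes only what the use case states: the method name assignPriorityFromCode, the Priority constants, and the P1..P5 mapping. The handling of null and unknown codes is an assumption, and the class in the repository may well differ.

```java
import java.util.Locale;

// Hypothetical sketch; the real class lives in the author's repository.
// For use from an MVEL expression it would normally be a public class in its own file.
enum Priority { LOW, MEDIUM, HIGH }

class IssuePrioritizer {

    /** Maps an external priority code (P1..P5) to the internal Priority. */
    public Priority assignPriorityFromCode(String code) {
        if (code == null) {
            // Assumption: a missing code is treated as an error, not as a default priority.
            throw new IllegalArgumentException("Priority code must not be null");
        }
        switch (code.trim().toUpperCase(Locale.ROOT)) {
            case "P1":
            case "P2":
                return Priority.LOW;
            case "P3":
                return Priority.MEDIUM;
            case "P4":
            case "P5":
                return Priority.HIGH;
            default:
                throw new IllegalArgumentException("Unknown priority code: " + code);
        }
    }

    public static void main(String[] args) {
        IssuePrioritizer prioritizer = new IssuePrioritizer();
        for (String code : new String[] {"P1", "P3", "P5"}) {
            System.out.println(code + " -> " + prioritizer.assignPriorityFromCode(code));
        }
    }
}
```

Because the class has no dependency on Smooks, the mapping can be unit-tested without loading any configuration, which is the reason for keeping it out of the MVEL expression.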


    Set compound properties in Java binding

    It is not possible to map two source fields directly to a single Java bean property. Java bindings with <jb:value> and <jb:expression> are executed on a SAX visitAfter event bound to a single XML element/CSV field. We have to define a binding for a helper Map bean with the fields we want to merge:

    <jb:bean beanId="transformedProps" class="java.util.HashMap" createOnElement="csv-record">
        <jb:value property="@reporter" data="csv-record/reporter" />
        <jb:value property="@assignee" data="csv-record/assignee" />
        ...
    </jb:bean>

    … and use an MVEL expression that concatenates two fields using the helper map bean (transformedProps):

    <jb:bean beanId="issue" class="com.tsl.smooks.model.Issue" createOnElement="csv-record">
        ...
        <jb:expression property="involvedPersons" execOnElement="csv-record">
            transformedProps["reporter"]
                + (org.apache.commons.lang.StringUtils.isNotBlank(transformedProps["assignee"]) ? ";" + transformedProps["assignee"] : "")
        </jb:expression>
        ...
    </jb:bean>
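To make the logic of the MVEL expression easier to check, here is the same concatenation written as plain Java. This is only an illustrative sketch: the class name InvolvedPersons and the method join are hypothetical, and the blank check is a hand-written stand-in for StringUtils.isNotBlank so that the example needs no extra library.

```java
// Hypothetical helper mirroring the MVEL expression above; not part of the original project.
final class InvolvedPersons {

    private InvolvedPersons() {
    }

    /** Returns the reporter, followed by ";" and the assignee when the assignee is not blank. */
    static String join(String reporter, String assignee) {
        return reporter + (isBlank(assignee) ? "" : ";" + assignee);
    }

    // Stand-in for the negation of org.apache.commons.lang.StringUtils.isNotBlank.
    private static boolean isBlank(String text) {
        return text == null || text.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(join("Ataulfo", "Teodorico"));
        System.out.println(join("Eurico", ""));
    }
}
```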


    Set datetime from date and time fields

    In this transformation we have to both merge and convert values of two fields.

    In the first solution, we create separate setters for the date part and the time part in the target Issue class (Smooks uses setters in Java binding):

    public class Issue {
        ...
        public void setCreatedDatePart(Date createdDatetime) {
            createCreatedIfNotInitialized();
            copyDatePart(createdDatetime, created);
        }
     
        public void setCreatedTimePart(Date createdDatetime) {
            createCreatedIfNotInitialized();
            copyTimePart(createdDatetime, created);
        }
        ...
    }

    … and then use a standard value binding with date decoder:

    <jb:bean beanId="issue" class="com.tsl.smooks.model.Issue" createOnElement="csv-record">
        ...
        <jb:value property="createdDatePart" decoder="Date" data="csv-record/createdDate">
            <jb:decodeParam name="format">yyyy-MM-dd</jb:decodeParam>
        </jb:value>
        <jb:value property="createdTimePart" decoder="Date" data="csv-record/createdTime">
            <jb:decodeParam name="format">HH:mm</jb:decodeParam>
        </jb:value>
        ...
    </jb:bean>

    The advantage of this approach is that you make use of the Smooks decoder infrastructure. You can configure the transformation with your own decoders (e.g. a custom java.util.Date decoder that accepts multiple date formats). If you are using the built-in DateDecoder, you can catch and handle a standard DataDecodeException.
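The core of such a multi-format decoder could be sketched as follows. This is plain Java with no Smooks dependency; the class name MultiFormatDateParser and the two patterns used in main are assumptions for illustration. Wiring it into Smooks as a custom decoder is left out here, and a real decoder would presumably report failures as a DataDecodeException instead of the IllegalArgumentException used in this sketch.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical helper: tries each configured pattern in order and returns the first full match.
final class MultiFormatDateParser {

    private final List<String> patterns;

    MultiFormatDateParser(String... patterns) {
        this.patterns = Arrays.asList(patterns.clone());
    }

    Date parse(String text) {
        if (text == null) {
            throw new IllegalArgumentException("Date text must not be null");
        }
        for (String pattern : patterns) {
            // SimpleDateFormat is not thread-safe, so a fresh instance is created per attempt.
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false);
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(text, position);
            // Accept the result only if the whole input was consumed.
            if (parsed != null && position.getIndex() == text.length()) {
                return parsed;
            }
        }
        throw new IllegalArgumentException("Unparseable date '" + text + "', tried " + patterns);
    }

    public static void main(String[] args) {
        MultiFormatDateParser parser = new MultiFormatDateParser("yyyy-MM-dd", "dd.MM.yyyy");
        System.out.println(parser.parse("2010-10-01").equals(parser.parse("01.10.2010")));
    }
}
```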

    The disadvantage is that you have to change your domain model code. The new methods add complexity and must be unit-tested, especially for the case when only one of the partial setters is called.
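The helper methods called by the partial setters are not shown in the post. The sketch below is one possible way to implement them with java.util.Calendar; only the method names come from the Issue fragment above, while the class name IssueCreatedSketch, the getter and all method bodies are assumptions. An uninitialized created value starts at 1970-01-01 00:00 in the default time zone, so calling only one setter leaves the other part at that baseline.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical stand-in for the date-related part of Issue.
class IssueCreatedSketch {

    private Date created;

    public Date getCreated() {
        return created;
    }

    public void setCreatedDatePart(Date createdDatetime) {
        createCreatedIfNotInitialized();
        copyDatePart(createdDatetime, created);
    }

    public void setCreatedTimePart(Date createdDatetime) {
        createCreatedIfNotInitialized();
        copyTimePart(createdDatetime, created);
    }

    private void createCreatedIfNotInitialized() {
        if (created == null) {
            Calendar baseline = Calendar.getInstance();
            baseline.clear(); // 1970-01-01 00:00:00.000 in the default time zone
            created = baseline.getTime();
        }
    }

    private static void copyDatePart(Date source, Date target) {
        copyFields(source, target, Calendar.YEAR, Calendar.MONTH, Calendar.DAY_OF_MONTH);
    }

    private static void copyTimePart(Date source, Date target) {
        copyFields(source, target,
                Calendar.HOUR_OF_DAY, Calendar.MINUTE, Calendar.SECOND, Calendar.MILLISECOND);
    }

    // Copies the given calendar fields from source into target, leaving the other fields untouched.
    private static void copyFields(Date source, Date target, int... fields) {
        Calendar from = Calendar.getInstance();
        from.setTime(source);
        Calendar to = Calendar.getInstance();
        to.setTime(target);
        for (int field : fields) {
            to.set(field, from.get(field));
        }
        target.setTime(to.getTimeInMillis());
    }
}
```

Since the two setters may be called in either order, a unit test for this class should cover both orders as well as the case where only one of them is called.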

    In the second solution, you define a binding for a helper Map bean with the date and time fields. In the binding of the target bean you use an MVEL expression that concatenates the date and time strings and converts them to a Date (e.g. using a java.text.SimpleDateFormat instance):

    <jb:bean beanId="issue" class="com.tsl.smooks.model.Issue" createOnElement="csv-record">
        ...
        <jb:expression property="updated" execOnElement="csv-record">
            updated = transformedProps["updatedDate"] + " " + transformedProps["updatedTime"];
            new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm").parse(updated)
        </jb:expression>
        ...
    </jb:bean>

    The trade-offs mirror those of the first solution. On the plus side, you don’t touch your Java classes, and the approach is simple: you only have to write the Smooks configuration. On the minus side, when many date/time formats and their combinations have to be handled, the MVEL expression defining the conversion can become complicated. And in case of a parsing error, you won’t get a DataDecodeException, but an ugly, generic ExpressionEvaluationException.
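If the expression grows, one option (borrowing the idea from the external-object recipe) might be to move the merging into a small helper class and keep only a method call in the configuration. The sketch below is an assumption, not part of the original project: the class name DateTimeMerger, the method merge and the choice of IllegalArgumentException are all hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper that does in Java what the MVEL expression above does inline.
class DateTimeMerger {

    private static final String PATTERN = "yyyy-MM-dd HH:mm";

    /** Concatenates the date and time strings and parses them as a single timestamp. */
    public Date merge(String date, String time) {
        String text = date + " " + time;
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false);
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Cannot parse date/time '" + text + "' with pattern " + PATTERN, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new DateTimeMerger().merge("2010-10-10", "20:01"));
    }
}
```

Registered in the bean context under a name such as dateTimeMerger (again a hypothetical name), it could be called from the expression as dateTimeMerger.merge(transformedProps["updatedDate"], transformedProps["updatedTime"]). A failure would presumably still reach your code wrapped in an expression evaluation exception, but its cause would then carry a readable message, and the parsing logic could be unit-tested on its own.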

    Conclusions

    Smooks is a great library that will save you from writing a lot of code when you have to process many different formats. With a few lines of code and the XML configuration you will be able to read a file and persist its contents in the database using your favourite ORM framework.

    However, the Smooks processing model and the way the built-in cartridges use it sometimes make it difficult to configure the transformation for a real-world requirement. The information provided in the user guide is sometimes scarce and unclear. I hope these practical cases will help you use Smooks Java bean mappings and MVEL expressions more effectively.

5 Comments on “Smooks processing recipes”

  1. #1 Vikas Kumar
    on May 3rd, 2011 at 11:27 pm

    Excellent Blog. Helped me a lot.
    Thanks,
    Vikas

  2. #2 Lavoisier Farias
    on Jun 19th, 2011 at 4:03 pm

    Great work!! This blog has helped very much.

    Congratulations for this excelente post!!

    Thanks,
    Lavoisier.

  3. #3 guptha
    on Dec 2nd, 2011 at 6:00 am

    i help me a lot can you include source code for this

    thank you

  4. #4 guptha
    on Dec 8th, 2011 at 11:59 am

    i am new to smooks i wish to save data in mysql which is coming from csv data using smooks.
    i have written smooks file as follow but i am getting error like

    No fault address defined for fault message! To: JMSEpr

    please help me.

    thank you

    
        SAX
        input.csv
        Workspace://helloworlda/esbcontent/person.csv
    
      INSERT INTO ctable VALUES(${Customer.firstName?string},${Customer.lastName?string},${Customer.city?string},${Customer.country?string}) 
    
  5. #5 yogalaxmi
    on Nov 27th, 2012 at 10:03 am

    i need to convert csv to edi. can anyone help me out.
    please send me the details on yogalaxmi.singh1@gmail.com
