Jan 27, 2014

ETL Application Comparison, Part 5

The application that we are evaluating in this post is Jaspersoft ETL 5.2. This application provides a workflow-based designer similar to PDI's and offers a large variety of components. The interesting aspect of Jaspersoft ETL is that it dynamically generates procedural Java code from the visual design. One can view the Java code, but not edit it directly. The generated code is verbose and ugly, but being able to view it provides transparency and helps in understanding what is happening when problems arise or when something does not work as expected.

The values of most of the properties that can be configured on the components are Java expressions. This is similar to the way that Linx works with C# expressions, and the consistency of this approach is appealing. These expressions are substituted directly into the generated code. There is no expression editor, which is fine for users familiar with Java but unforgiving for users with little or no Java experience.
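To make the idea of expression-valued properties more concrete, here is a minimal sketch in plain Java. It is not actual Jaspersoft ETL output: the class, the `inputFolder` variable and the file-name pattern are all made up for illustration. It only shows the principle that whatever is typed into a property ends up as an ordinary Java expression in the code that runs.

```java
// Hypothetical illustration only; none of these names come from Jaspersoft ETL.
class ExpressionPropertySketch {

    // Stand-in for a job-level variable (assumed name and value).
    static String inputFolder = "/data/input";

    // The expression after "return" is the kind of text a user might type
    // into a component's file-name property. In a code-generating tool it
    // would be pasted into the generated source as-is, so a typo in the
    // property becomes a compile error in the generated code.
    static String fileNameFor(String day) {
        return inputFolder + "/accounts_" + day + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("2014-01-27"));
    }
}
```

This also hints at why the lack of an expression editor hurts newcomers: the property field accepts only valid Java, with string quoting and concatenation written out by hand.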

It seems like Jaspersoft ETL does not feature any specialized data-manipulation components like PDI does. Data manipulation tasks are typically done through the Java-Row component which allows one to write any block of Java code.
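As a rough idea of the kind of code one ends up writing for such a row-level manipulation, here is a self-contained sketch. The `AccountRow` class, its fields and the `transform` method are assumptions for illustration; in the actual component the incoming and outgoing rows are provided by the tool, and only the body of the transformation is written by hand.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;

// Hypothetical sketch of a per-row transformation; not Jaspersoft ETL code.
class JavaRowSketch {

    // Assumed row layout for an account record.
    static class AccountRow {
        String accountNumber;
        String holderName;
        BigDecimal balance;
    }

    // Called once per incoming row; returns the cleaned-up outgoing row.
    static AccountRow transform(AccountRow in) {
        AccountRow out = new AccountRow();
        out.accountNumber = in.accountNumber.trim();
        out.holderName = in.holderName.trim().toUpperCase(Locale.ROOT);
        // Normalize the balance to two decimal places.
        out.balance = in.balance.setScale(2, RoundingMode.HALF_UP);
        return out;
    }

    public static void main(String[] args) {
        AccountRow in = new AccountRow();
        in.accountNumber = " 1001 ";
        in.holderName = " jane doe";
        in.balance = new BigDecimal("1234.5");

        AccountRow out = transform(in);
        System.out.println(out.accountNumber + ";" + out.holderName + ";" + out.balance);
    }
}
```

Tasks that would be a dedicated step in PDI (trimming, changing case, rounding) all collapse into a few lines of hand-written Java here, which is flexible but assumes the user can write them.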

This is what the data loading job that I’ve built in Jaspersoft ETL looks like:
The two light-blue rectangles above represent two sub-jobs. The topmost sub-job reads all the account files in the input folder and stores the data in the database. Once this sub-job has finished running, the second sub-job starts and moves all the files in the input folder to the processed folder.
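The control flow of the two sub-jobs can be sketched in plain Java as follows. This is not the code that Jaspersoft ETL generates; the class and method names are invented, and the database write is replaced by collecting the lines in a list so that the example stays self-contained.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-stage job; names are assumptions.
class TwoStageJobSketch {

    // Sub-job 1: read every regular file in the input folder.
    // A real job would insert the rows into a database instead.
    static List<String> loadAll(Path inputDir) throws IOException {
        List<String> rows = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    rows.addAll(Files.readAllLines(file));
                }
            }
        }
        return rows;
    }

    // Sub-job 2: move every regular file to the processed folder.
    static int moveAll(Path inputDir, Path processedDir) throws IOException {
        Files.createDirectories(processedDir);
        // Collect first, so the folder is not modified while it is listed.
        List<Path> toMove = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    toMove.add(file);
                }
            }
        }
        for (Path file : toMove) {
            Files.move(file, processedDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return toMove.size();
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);
        Path processed = Paths.get(args[1]);
        // The second stage only runs after the first completed without error.
        List<String> rows = loadAll(input);
        int moved = moveAll(input, processed);
        System.out.println(rows.size() + " rows loaded, " + moved + " files moved");
    }
}
```

Running the move only after the load has finished means that a failure while loading leaves the files in the input folder, where they can be picked up again on the next run.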

While the resulting job above looks fairly simple, I experienced a number of problems while building it. Some components do not indicate what values they output, which means that the user has to look into the generated Java source code to find the names of the variables to use in subsequent components. Furthermore, many components absorb incoming rows instead of passing them on to the components that follow. Compared to PDI, Jaspersoft ETL seems to be far less flexible in the way that rows are allowed to flow between components (this actually makes sense considering the application's code-generation approach). As a result, Jaspersoft ETL is frustrating to use for new users who are not yet familiar with all of its quirks.

Jaspersoft ETL supports scheduling a job through its Scheduler. After the user selects the days and times when the job should run, the Scheduler generates a crontab file that can be used with Cron on Linux and Unix, or with a program like cronw on Windows. So, although Jaspersoft ETL offers some help with scheduling, it is not as seamless as Linx's built-in scheduling support.
