Feb 10, 2014

ETL Application Comparison, Part 7


With this post we are concluding the ETL Application Comparison series. In order to provide an overview of the different applications we have looked at, I have composed a table listing the most significant strengths and weaknesses of each tool. I hope that you have found the series insightful, and that it has helped to create a better idea of the software that is available for building automated ETL processes.
Strengths Weaknesses

PDI 5.0.1

  • Provides steps for quickly doing simple, common things, as well as steps for handling more advanced tasks
  • Agile flow of data rows
  • Step Metrics provide insight into the performance of the constituent steps in transformations
  • Cross-platform
  • Most text fields do not support expressions, forcing the user to introduce additional steps for making calculations
  • Offers no background-running process for facilitating scheduling or monitoring

Linx 5.0 Beta

  • Features a minimal set of versatile, but simple components
  • Widespread use of C# expressions
  • Visual editor provides a concise view of the process
  • Simple and clean way of scheduling and monitoring processes
  • The current beta version offers no component for writing custom C# code

SSIS

  • Offers some powerful components like the Script Component
  • Very developer-oriented; not so user-friendly

Jaspersoft ETL 5.2

  • Java code view offers transparency
  • Widespread use of Java expressions
  • Cross-platform
  • No expression editor
  • Flow of data rows between components is rigid
  • Frustrating experience for new users not familiar with all the quirks

AutoMate 9

  • Widespread use of BASIC expressions
  • Good expression editor
  • Easy to create and manage schedules
  • Program’s representation in the visual editor is overly verbose and long-winded
  • Runs slow for ETL tasks

Feb 6, 2014

Linx 4.0.709.3509 released

Changelog


  • Search includes assignments
  • Search includes DataIn and DataOut
  • Show path in search results
  • Get output tables from StoredProc component with temp tables in stored procedure
  • Fixed hanging when debugging by limiting rows in lists to 100
  • Fixed mapped fields getting lost when the name contains "/"
  • Add ReadOnly role to LinxServer users
  • Fixed Oracle timeout not applied
  • Changed layout of log in LinxServer
  • Add logging of changes to solutions and services to LinxServer
  • Add validation of references in SQL
  • Changed exceptions to include component and process names
  • Fixed StoredProc refresh of Table property asks for param values
  • Fixed cascading docking windows loses frame when only a single one docked
  • Fixed string add operation in LoopTable

Oracle

Please be aware that we are using the latest Oracle drivers. Older clients might have to be updated. Here is the latest one - http://www.oracle.com/technetwork/database/windows/downloads/index-090165.html.

Feb 3, 2014

ETL Application Comparison, Part 6


We are having a look at an application called AutoMate 9. AutoMate is another application that, like Linx, allows the user to build a program in a procedural style. AutoMate in fact works very similar to Linx, so we will highlight the differences between these two applications.


AutoMate features a large variety of Actions that can be used. This means that the user can do a lot more with the application but this comes with the trade-off that it becomes harder to find the more common and practical Actions. A plugin-based approach might have turned out better: include only the most common components by default, and offer an easy way of browsing a large directory of more specialized components to add to the library as they are needed.

Like Linx, AutoMate also allows the use of expressions in most of the Actions, and AutoMate features a good expression editor that aids users who are not too familiar with the language. AutoMate also features a Script Action which allows the user to write a block of code to execute. The language used for expressions and statements is BASIC, whereas Linx uses C#.
AutoMate features an appealing visual editor, which highlights the code block of a control structure when that control structure is selected. The editor is very verbose and displays very descriptive texts for the actions such as
Create a variable named "inputdirectory" with an initial value of "C:\Projects\ETL Framework Comparison\data\input".
Whereas this causes a program to be very descriptive, the program quickly becomes very long. When comparing an AutoMate program next to an equivalent Linx program, the AutoMate program looks very long-winded, whereas the Linx program appears much more concise. Linx allows the programmer to give each statement a custom name, which leaves it to the programmer to assign sufficiently descriptive names to each statement in order to make the program more readable.

Implementing the data loading task in AutoMate was fairly straightforward, except for reading the input text file. It looks like AutoMate offers no easy way to read a CSV file. Thankfully, there was a sample program that showed how to do it using SQL through the “Microsoft.Jet.OLEDB.4.0” OLE DB provider, but it is tricky. Here is what the program I’ve built looks like in the visual editor:

AutoMate takes a similar approach to scheduling tasks as Linx does with a Windows Service that starts with Windows. It comes with a separate application, Task Administrator, that facilitates the scheduling and monitoring of the available tasks. Automatic task runs are set up via “Triggers”, which include Schedule, File System, and Startup, among others.

AutoMate looks like a great tool, but has one critical impediment that I could not overcome with the data loading problem that I’ve implemented. The process takes a very long time to run. It takes about 3 minutes to import the 5 000 accounts, whereas Linx takes only 10 seconds. AutoMate does not advertise itself specifically as an ETL tool, so maybe AutoMate is intended for other types of automation tasks which might explain the inferior performance.