Nov 25, 2014

Linx 4.0.833.3906 released

Changelog

  • Increase execution speed of XmlParser component
  • Add RabbitMQ components (RabbitMQSend, RabbitMQReceive, RabbitMQService)
  • Add Json components (JsonParser, JsonFormatter)
  • SQLEditor textbox now allows very very large queries
  • Fix bug: ImageSplit component hangs on certain files
  • Fix bug: Change WebServiceData component to handle Int64 values as decimal
  • Fix bug: Change WebServiceData component to import embedded schemas
  • Fix bug: Change WebServiceData component to support union-structure in message response
  • Fix bug: SQLEditor textbox does not allow very very large queries
  • Fix bug: CommandTimeout not set on Database write components

Special mention goes to

Eduan for logging the XmlParser speed problem.
Brendon for testing the RabbitMQ and Json components.
Byron for pointing out that we cannot handle monster queries in the SqlEditor.
Kobus for trying to call a web service that returns everything and the kitchen sink.
Nedbank for putting our Image components through the wringer.
Bernhard for doing most of the work.

Sep 22, 2014

Read from and write to Excel with Database components

Our current Excel components use Excel Automation to work with Excel documents. This allows us to run macros and support almost any scenario, but it is not ideal for server environments. If you don't need to run any macros, a better option is to use the Database components to read from and write to Excel. Here are some tips to get you going, followed by a short code sketch after the list.


  1. Download and install the Microsoft Access Database Engine Redistributable. This will install the OleDb drivers required to work with Excel through a database component.
  2. Use the DatabaseRead and DatabaseWrite components when reading or writing to an Excel document.
  3. The connectionstring for reads looks like Provider=Microsoft.ACE.OLEDB.12.0;Data Source=d:\temp\write.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1"
  4. For writing change IMEX=1 to IMEX=3.
  5. If you do not have headers then HDR=NO.
  6. For older .xls files you can change the Extended Properties of the connectionstring (e.g. "Excel 8.0" instead of "Excel 12.0 Xml").
  7. You can get more connectionstring information here.
  8. It seems as if a blank document must be initialised with a header for every column you want to write to. Just make it a space if you don't have headers.
  9. Read from a sheet: "select * from [Sheet1$]". Remember the $.
  10. Read from a range: "select * from MyRange" or "select * from [Sheet1$A1:B10]".
  11. Insert into a sheet with headers: "insert into [Sheet1$] (id,name) values('123','John')".
  12. If you don't have headers, use F1, F2, F... to refer to fields e.g. "select F1, F3 from [Sheet1$]".
  13. Other SQL statements and clauses like UPDATE and WHERE are also supported, as illustrated here.
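
If you want to try these connection strings outside Linx, a rough ADO.NET sketch looks like the one below. It reuses the file path, sheet name and column names from the tips above purely for illustration; it is not how the Linx Database components are implemented, just a way to verify the connection strings and SQL.

using System;
using System.Data.OleDb;

class ExcelOleDbSketch
{
    static void Main()
    {
        // Read with IMEX=1 (tip 3): intermixed column data is treated as text
        var readConnectionString =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=d:\\temp\\write.xlsx;" +
            "Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\"";
        using (var readConnection = new OleDbConnection(readConnectionString))
        {
            readConnection.Open();
            using (var command = new OleDbCommand("select * from [Sheet1$]", readConnection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0} {1}", reader[0], reader[1]);
            }
        }

        // Write with IMEX=3 (tip 4) using an insert statement as in tip 11
        var writeConnectionString =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=d:\\temp\\write.xlsx;" +
            "Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=3\"";
        using (var writeConnection = new OleDbConnection(writeConnectionString))
        {
            writeConnection.Open();
            using (var command = new OleDbCommand(
                "insert into [Sheet1$] (id,name) values('123','John')", writeConnection))
            {
                command.ExecuteNonQuery();
            }
        }
    }
}

Remember that the bitness (32-bit or 64-bit) of the installed Access Database Engine must match the bitness of the process running the queries.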

Sep 19, 2014

Linx 4.0.811.3839 released

Changelog

  • Allow upload of multiple solutions to server on modern browsers
  • Add "Statement narrative" field to NedbankCPS components
  • Change HTTPXComponent to work on untrusted https url
  • Increase the displayed number of recent solutions from 5 to 20
  • Return SQL string in exception when DatabaseRead/DatabaseWrite components fail
  • Return parameter names and values in exception when StoredProcedure components fail
  • Fix bug: SendMailSmtp to address does not allow semicolon separator or separator at end
  • Fix bug: ImageSplit output files are corrupt for certain images
  • Fix bug: Compilation of recursive process call fails
  • Fix bug: Moving process to folder breaks process event references
  • Fix bug: Upload of empty configuration file causes endless loading message on server

Upload multiple solutions to LinxServer


With newer browsers (IE10 or better) you can add or update multiple solutions on LinxServer with a single upload. Drag any number of solution and config files onto the upload area. Solutions and config files are matched by name. If they already exist on the server they are treated as an update; otherwise they are added.

Aug 20, 2014

Linx 4.0.798.3788 released

Changelog

  • Add SHA256 hashing to Encryption component
  • Add web service security to WebServiceData component
  • Update ReportingServices component to support latest version of Reporting Services
  • Fix bug: DateTime parameter conversion fails when calling REST service 
  • Fix bug: CPU statistics not displaying usage per processor
  • Fix bug: Linx services not starting on service host restart


Microsoft SQL Server Reporting Services

We now support more Reporting Service types and output types.

Be aware that the output types are not necessarily available on your Reporting Server. Output type availability differs by server version (e.g. 2008, 2012) and server edition (e.g. Express, Standard).

Aug 12, 2014

Linx 4.0.767.3718 released

Changelog


Version 4.0.767

  • Increase server host process alive timeout check to 60 seconds
  • Fix bug: DirectoryWatch service may miss events
  • Fix bug: CallRESTWebService with Xml output can't populate table with one row
  • Fix bug: Stored Procedure with blank column name fails during execute


Version 4.0.763

  • LinxServer: Show ProcessId, CPU and RAM usage
  • LinxServer: Fix admin pages for IE8
  • LinxServer: Log and Solutions default directory moved
  • LinxServer: Log and Solutions directory can be set via config
  • LinxServer: Kerberos/Windows authentication option added
  • Fix bug: Database read component throws error with "Get all data to client first" on Oracle


Version 4.0.757

  • Fix bug: Treeview scrollbar not showing on server administration site
  • Fix bug: XmlParser treats list of items as single item


Log and solutions default directory

LinxServer stores its logs and solutions in a directory on the server. This directory is now configurable as explained by Julian in the previous post. Do not configure this directory to point to a network drive. Bad things will happen.

Aug 5, 2014

Specifying a custom data location on Linx server

One of the new features introduced in the latest release of Linx 4 (4.0.763.3705) is the ability to specify the data directory that the server uses to store solution and log files. In previous versions these files were stored in subdirectories of the specified installation path (usually C:\Program Files (x86)\Twenty57 Linx 4\bin\Solutions and C:\Program Files (x86)\Twenty57 Linx 4\bin\Log).

Please note that the following rules apply during installation:
  • If you upgrade an existing Linx installation (with valid solutions) to the latest version, your solutions and logs will remain in their current location.
  • For new installations, solutions and logs will be saved in the ProgramData folder (%programdata%\Twenty57\Linx 4).
You can override the default data location by making use of the "ServerDataPath" configuration setting.  Add the setting (with a valid path) to your service configuration file (Twenty57.Linx.Server.Service.exe.config).
For example:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <add key="ServerDataPath" value="D:\PutStuffHere" />
  </appSettings>
</configuration>

If you've performed an upgrade to an existing Linx installation and want to make use of a different data directory, you can move your current "Solutions" and "Log" directories to one of the locations outlined above (either the default ProgramData location, or the custom ServerDataPath location).  Please ensure that the Linx service is not running while performing this operation.

May 22, 2014

Linx 4.0.750.3615 released

Changelog


  • ImageSplit and ImageConcat now faster when working with Tiff files
  • Allow treeview resizing on server administration site
  • Fix bug: "Cannot convert System.Guid to internal type." when selecting a SP that returns Guid columns
  • Fix bug: Display processes in folder when configuring webuser on server administration site
  • Fix bug: Drag/drop of items between different Linx instances

Important

ImageSplit and ImageConcat now use a different library to work with Tiff files. It is much faster so should have less of an impact on system resources. Please test them thoroughly with your existing data if you use this functionality in production.

Apr 4, 2014

Linx 4.0.731.3576 released

Changelog


Version 4.0.731


  • Add User role to LinxServer with optional rights: Manage Solutions, Run Processes, and Start and Stop Services
  • Database Stored Procedure components now apply formats when passing strings to parameters
  • Display full exception when ReadMailMapi initialization fails.


Version 4.0.715


  • Fix DatabaseWrite SqlEditor disabled textbox
  • SqlEditor Run now uses same Oracle driver as when executing or retrieving schemas
  • Fix XmlParser exception when text element does not exist
  • DatabaseRead can now retrieve schema when temp tables are used
  • ExtractString optimized to handle very large strings
  • Timeout property now applied when retrieving StoredProc schema

Feb 10, 2014

ETL Application Comparison, Part 7


With this post we are concluding the ETL Application Comparison series. To provide an overview of the different applications we have looked at, I have composed a summary listing the most significant strengths and weaknesses of each tool. I hope that you have found the series insightful and that it has given you a better idea of the software available for building automated ETL processes.
PDI 5.0.1

Strengths
  • Provides steps for quickly doing simple, common things, as well as steps for handling more advanced tasks
  • Agile flow of data rows
  • Step Metrics provide insight into the performance of the constituent steps in transformations
  • Cross-platform

Weaknesses
  • Most text fields do not support expressions, forcing the user to introduce additional steps for making calculations
  • Offers no background-running process for facilitating scheduling or monitoring

Linx 5.0 Beta

Strengths
  • Features a minimal set of versatile, but simple components
  • Widespread use of C# expressions
  • Visual editor provides a concise view of the process
  • Simple and clean way of scheduling and monitoring processes

Weaknesses
  • The current beta version offers no component for writing custom C# code

SSIS

Strengths
  • Offers some powerful components like the Script Component

Weaknesses
  • Very developer-oriented; not so user-friendly

Jaspersoft ETL 5.2

Strengths
  • Java code view offers transparency
  • Widespread use of Java expressions
  • Cross-platform

Weaknesses
  • No expression editor
  • Flow of data rows between components is rigid
  • Frustrating experience for new users not familiar with all the quirks

AutoMate 9

Strengths
  • Widespread use of BASIC expressions
  • Good expression editor
  • Easy to create and manage schedules

Weaknesses
  • Program’s representation in the visual editor is overly verbose and long-winded
  • Runs slow for ETL tasks

Feb 6, 2014

Linx 4.0.709.3509 released

Changelog


  • Search includes assignments
  • Search includes DataIn and DataOut
  • Show path in search results
  • Get output tables from StoredProc component with temp tables in stored procedure
  • Fixed hanging when debugging by limiting rows in lists to 100
  • Fixed mapped fields getting lost when the name contains "/"
  • Add ReadOnly role to LinxServer users
  • Fixed Oracle timeout not applied
  • Changed layout of log in LinxServer
  • Add logging of changes to solutions and services to LinxServer
  • Add validation of references in SQL
  • Changed exceptions to include component and process names
  • Fixed StoredProc refresh of Table property asking for param values
  • Fixed cascading docking windows losing their frame when only a single one is docked
  • Fixed string add operation in LoopTable

Oracle

Please be aware that we are using the latest Oracle drivers. Older clients might have to be updated. Here is the latest one - http://www.oracle.com/technetwork/database/windows/downloads/index-090165.html.

Feb 3, 2014

ETL Application Comparison, Part 6


We are having a look at an application called AutoMate 9. AutoMate is another application that, like Linx, allows the user to build a program in a procedural style. AutoMate in fact works very similarly to Linx, so we will highlight the differences between these two applications.


AutoMate features a large variety of Actions that can be used. This means that the user can do a lot more with the application but this comes with the trade-off that it becomes harder to find the more common and practical Actions. A plugin-based approach might have turned out better: include only the most common components by default, and offer an easy way of browsing a large directory of more specialized components to add to the library as they are needed.

Like Linx, AutoMate also allows the use of expressions in most of the Actions, and AutoMate features a good expression editor that aids users who are not too familiar with the language. AutoMate also features a Script Action which allows the user to write a block of code to execute. The language used for expressions and statements is BASIC, whereas Linx uses C#.
AutoMate features an appealing visual editor, which highlights the code block of a control structure when that control structure is selected. The editor is very verbose and displays very descriptive text for each action, such as
Create a variable named "inputdirectory" with an initial value of "C:\Projects\ETL Framework Comparison\data\input".
While this makes a program very descriptive, it also quickly becomes very long. When comparing an AutoMate program to an equivalent Linx program, the AutoMate program looks very long-winded, whereas the Linx program appears much more concise. Linx allows the programmer to give each statement a custom name, leaving it to the programmer to choose sufficiently descriptive names to keep the program readable.

Implementing the data loading task in AutoMate was fairly straightforward, except for reading the input text file. It looks like AutoMate offers no easy way to read a CSV file. Thankfully, there was a sample program that showed how to do it using SQL through the “Microsoft.Jet.OLEDB.4.0” OLE DB provider, but it is tricky. Here is what the program I’ve built looks like in the visual editor:
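
For reference, the trick the sample program uses comes down to pointing an OLE DB connection at the folder and querying the CSV file by name. Here is a rough C# sketch of that idea; the folder and the Jet provider come from the post above, while the file name and query are made up for illustration:

using System;
using System.Data.OleDb;

class CsvViaJetSketch
{
    static void Main()
    {
        // The Jet text driver treats the folder as the "database" and each file as a table.
        // Note: Microsoft.Jet.OLEDB.4.0 is only available to 32-bit processes.
        var inputFolder = @"C:\Projects\ETL Framework Comparison\data\input";
        var connectionString =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + inputFolder + ";" +
            "Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

        using (var connection = new OleDbConnection(connectionString))
        {
            connection.Open();
            // "accounts.csv" is a made-up file name for illustration
            using (var command = new OleDbCommand("select * from [accounts.csv]", connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader[0]);
            }
        }
    }
}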

AutoMate takes a similar approach to scheduling tasks as Linx does with a Windows Service that starts with Windows. It comes with a separate application, Task Administrator, that facilitates the scheduling and monitoring of the available tasks. Automatic task runs are set up via “Triggers”, which include Schedule, File System, and Startup, among others.

AutoMate looks like a great tool, but has one critical impediment that I could not overcome with the data loading problem that I’ve implemented. The process takes a very long time to run. It takes about 3 minutes to import the 5 000 accounts, whereas Linx takes only 10 seconds. AutoMate does not advertise itself specifically as an ETL tool, so maybe AutoMate is intended for other types of automation tasks which might explain the inferior performance.

Jan 27, 2014

ETL Application Comparison, Part 5


The application that we are evaluating in this post is Jaspersoft ETL 5.2. This application provides a workflow-based designer similar to PDI, offering a large variety of components to use. The interesting aspect of Jaspersoft ETL is that the application dynamically generates procedural Java code from the visual design. One can view the Java code, but not edit it directly. The generated code is laborious and ugly, but viewing it provides transparency and can help to understand what is happening when problems arise or when something is not working as expected.


The values of most of the properties that can be configured on the components are based on Java expressions. This is similar to the way that Linx works with C# expressions, and the consistency of this approach is appealing. These expressions are substituted directly into the generated code. There is no expression editor, which is fine for users familiar with Java but unforgiving for users with little or no Java experience.

It seems like Jaspersoft ETL does not feature any specialized data-manipulation components like PDI does. Data manipulation tasks are typically done through the Java-Row component which allows one to write any block of Java code.

This is what the data loading job that I’ve built in Jaspersoft ETL looks like:
The two light-blue rectangles above represent two sub-jobs. The topmost sub-job reads all the account files in the input folder and stores the data in the database. Once this sub-job has completed, the second sub-job starts and moves all the files in the input folder to the processed folder.

While the resulting job above looks fairly simple, I have experienced a number of problems while building it. Some components do not indicate what values they output, which means that the user has to look into the generated Java source code in order to find the names of the variables to use in following components. Furthermore, many components absorb incoming rows instead of passing them on to following components. Compared to PDI, Jaspersoft ETL seems to be far less agile in the way that rows are allowed to flow between the components (this actually makes sense considering the application's code-generation approach). Jaspersoft ETL is essentially terrible to use for new users of the application not yet familiar with all of its quirks.

Jaspersoft ETL facilitates scheduling a job through its Scheduler. After selecting the days and times when the job should run, the scheduler generates a crontab file that can be used with Cron on Linux and Unix, or with a program like cronw on Windows. So, whereas Jaspersoft ETL offers some help with the scheduling, it is not as seamless as Linx’s built-in scheduling support.

Jan 20, 2014

ETL Application Comparison, Part 4


In this part of our evaluation series we are having a look at SSIS (SQL Server Integration Services). SSIS is part of the Microsoft SQL Server software package and allows for performing a broad range of data migration tasks. SSIS is, similar to PDI, a workflow-based application, but it features separate sections for your project’s Control Flow and Data Flow. The Control Flow section contains the high level steps to be executed in sequence, and the Data Flow section specifies how the data is read and processed.



Figuring out how to read the data from the CSV file and move it into the database, with the help of a tutorial, was not too hard. However, at first I simply could not find a way to dynamically select all of the files in the input folder. The application allowed me to select one file only. Eventually, Google came to the rescue and taught me that I needed to loop through the files using a Foreach Loop Container in the Control Flow section. The Foreach Loop Container looks at a fixed folder and loops through all files matching a specified wildcard expression. During each iteration a variable containing the file path is set, which can then be used in the Data Flow section. It is conceptually simple but actually a bit tricky to get right.
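
Conceptually, the Foreach Loop Container behaves like a plain loop over the files in a folder. A trivial C# sketch of the idea (the folder and wildcard are made up for illustration):

using System;
using System.IO;

class ForeachLoopSketch
{
    static void Main()
    {
        // The container points at a fixed folder and a wildcard...
        var inputFolder = @"C:\data\input";
        foreach (var filePath in Directory.GetFiles(inputFolder, "*.csv"))
        {
            // ...and on each iteration sets a package variable to the current file path,
            // which the Data Flow then uses as its source.
            Console.WriteLine("Processing {0}", filePath);
        }
    }
}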

Here is a screenshot of the Control Flow that I have built:
 
And here is what the Data Flow section looks like:
 

All the string manipulation, as well as the generation of the GUIDs, is done by the Process Data component. It is a Script Component, which allows the user to manipulate the incoming data rows using custom C# code.
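
To give an idea of what such a Script Component contains, here is a minimal sketch of the per-row method one typically overrides; the column names are invented and not taken from the actual package:

// SSIS generates the surrounding class and the Input0Buffer type from the input metadata.
public class ScriptMain : UserComponent
{
    // Called once for every data row that passes through the component
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Example string manipulation: tidy up an incoming text column (hypothetical column)
        Row.AccountName = Row.AccountName.Trim();

        // Generate a GUID for the row (hypothetical column)
        Row.AccountId = Guid.NewGuid().ToString();
    }
}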

For the development of our test scenario SSIS was found to be very developer-oriented and not particularly user friendly. One can easily get lost among all the different sections and properties that are available to configure. SSIS features a number of very powerful components, such as the Script Component, which allows one to write a block of C# code that is executed for each data row that passes through. Scheduling the SSIS package can be done fairly easily by adding a job to SQL Server Agent using SQL Server Management Studio.

Jan 13, 2014

ETL Application Comparison, Part 3


The next application we are evaluating in this series is Linx 5. This article represents my experiences with the current beta-version of Linx 5 (hereafter referred to as just “Linx”) while using it for the first time to implement the data loading problem as defined in the first post of this series. Linx distinguishes itself from the application of the previous post, PDI, in the sense that Linx programs are procedural in their structure as opposed to workflow-based. This aspect can make the application immediately very intuitive to use for anyone having some prior programming experience.


Instead of featuring a large array of different components to use like PDI, Linx offers a fairly minimal set of components. The number of components is smaller, but the components that are available tend to be more versatile. The ExecuteSQL component, for instance, can be used to read data from a database, or for lookups, or for updating or inserting into a database. This means that there is a smaller number of components one needs to know of in order to get the task done.

Many properties of the components support expressions, and all expressions are written in the same language, namely C#. This consistency makes things easier for Linx users, as there is no need to use different languages in a Linx program. In PDI, on the other hand, a complex transformation with many steps often uses different types of expressions. Furthermore, most PDI fields do not support expressions, forcing one to introduce additional steps for fetching variables or making calculations. The simplicity and consistency in Linx contribute to a more positive user experience when building a Linx program.

Unlike PDI, Linx currently lacks a component for writing a block of C# code. This did not cause any trouble with the implementation of our simple data loading scenario, but such a component would come in handy when there is a need to do some involved task that is not directly supported by any of the existing components.

Implementing the data loading problem seemed fairly natural and I was satisfied with the result. Here is a screenshot of the program that I’ve built. As I went along, I gave the components concise but descriptive names in order to keep the program readable.

 

As you can see, the program turned out nice and simple.

Linx offers a very simple and clean way to schedule and monitor processes. A Windows Service that runs in the background periodically checks the schedules and automatically runs the processes when they are due. With the help of services, a process can be scheduled to run on a periodical basis, or whenever changes in a folder are detected.

In this exercise, building the data loading process using Linx produced a program with a very concise representation. Scheduling the process to run periodically is quick and easy. Even though it did not hamper us in this example, one shortcoming of Linx that presented itself was the lack of a component that lets the user do more advanced things that cannot be done with the existing components. Maybe Linx 5 will soon add a component to address that issue.

Jan 6, 2014

ETL Application Comparison, Part 2


In this part of the ETL Application Comparison series, we are going to take a look at PDI 5.0.1 (Pentaho Data Integration), an ETL application that I have used several times in the past.

PDI is a workflow-based ETL application that is fairly simple and intuitive to use. PDI features a large variety of steps, allowing one to read and write data from a wide array of data sources. When setting up database connections, the user is presented with a large array of different databases to choose from. There are many different steps available for manipulating data and controlling the flow of rows between the steps. The implication of this, however, is that there are often several ways of doing the same things by using different steps.

There are a number of scripting steps that add a lot of power to the tool. There is a Java Expression step that allows one to evaluate any Java expression and assign its value to a field. There is a JavaScript step that allows one to define a JavaScript program that runs once for every data row that passes through the step. The most powerful step is the Java Class step, which allows one to process the rows using Java code. Rows are read and written using Java statements, which implies that incoming and outgoing rows need not be one-to-one. One can implement the step to output many rows for each incoming row or to group a number of incoming rows into one row.

Another useful feature of PDI is the Step Metrics that are displayed while a transformation is running (or after it finished). The Step Metrics report displays the number of rows passing into and out of each of the steps and also shows the speed of each step as rows/second. This can help to identify any bottlenecks if a transformation is taking too long to run.

I have built the data loading problem as defined in the previous post of this series in PDI, and PDI handled the scenario with relative ease. Here is a screenshot of the PDI “transformation” that I have built. A very brief description of the transformation follows.
 

The icons above are known as steps. Every step in a so-called “transformation” receives data rows from a previous step, one after the other, processes them, and outputs the altered rows to the following step(s). The Read Accounts step at the top-left corner reads all the CSV files, using a regular expression to find the file names in the input folder that it should read. The Account Found? step splits the stream of incoming rows into two branches depending on whether the account already exists in the database or not. The Remove values step removes the fields that were created in the account-not-found branch, so that the rows of the two branches can be merged at the Build file paths step. Finally, the Group by File Name step groups the incoming rows by the corresponding source file names, outputting only one row per file. The Move Files step then moves each file to the processed folder.

Scheduling a PDI job or transformation to run on a periodic basis is not entirely trivial. You have to use the command line tool Pan.bat (or pan.sh for Unix-based systems) for transformations, or Kitchen.bat (kitchen.sh) for jobs, to achieve that (what we have built above is a transformation; a job consists of a number of entries that execute in sequence, and may include entries that invoke transformations). The recommended approach to scheduling is to write a simple .bat file that calls Kitchen with the necessary parameters and redirects the console output to a log file, then schedule the .bat file to run using the Windows Task Scheduler.

In conclusion, we have seen that implementing the data loading problem in PDI was fairly straightforward. The biggest catch for an inexperienced user of the application is probably the fact that any new, temporary fields that were defined in a branch need to be removed in order to merge the streams together again. Scheduling the process is doable, but could have been made easier.