Dec 30, 2013

ETL Application Comparison, Part 1


We are starting a series of articles that evaluates and compares a number of ETL/automation applications to see how Linx matches up to the competition. We have set up a simple data loading scenario that we will solve with each product. In our evaluations we will focus on how easy and intuitive it is to build the program, and we will also look at how easy it is to schedule a task to run on a periodic basis.

The data loading problem that we have defined involves reading a number of CSV files containing account information, and storing and updating the information in a number of database tables. Each input CSV file looks something like this:


Ref,Balance,Title,GivenName,Surname,Birthday,EmailAddress,TelephoneNumber,StreetAddress,City
911528,970,Mr.,William,Velez,3/20/1988,WilliamLVelez@einrot.com,083 741 0515,1115 Rissik St,Nasaret
470206,392,Mr.,Matthew,Seeley,6/30/1978,MatthewBSeeley@fleckens.hu,084 926 4283,2428 Mandela Dr,...
725923,663,Ms.,Alta,Stewart,1/29/1942,AltaAStewart@jourrapide.com,084 201 3307,435 Visser St,...
067808,224,Mr.,William,Gilbert,8/11/1942,WilliamRGilbert@fleckens.hu,084 445 8760,2241 Visser St,...
179263,308,Mr.,Michael,Herrmann,3/25/1934,MichaelJHerrmann@einrot.com,082 318 0187,1473 Bath Rd,...
.
.
The contents of the CSV files have been composed with the help of the Fake Name Generator, so all the information is random and does not represent real life data.

This information must be stored in three tables of a SQL Server database: Account, Person, and PersonContact. The first value of each row in the CSV file is a reference number for the account, which is used to look up whether the account already exists. If the account does not yet exist, new entries are made in all three tables. Otherwise, only the balance in the Account table is updated. The Title, GivenName, and Surname fields must be concatenated with spaces in between and stored in the FullName column of the Person table. The spaces in the TelephoneNumber must be removed before it is stored in PersonContact. The full name and the email address must have their lengths checked and be truncated at the end if they are too long to fit into the corresponding table columns. Finally, each input file must be moved from the input folder to a processed folder as soon as it has been processed.
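The per-row rules above can be sketched in a few lines of Python. This is only an illustration of the transformation, not how any of the evaluated products implement it. The column limits (FULL_NAME_MAX, EMAIL_MAX) and the function name are assumptions, since the actual column sizes are not given here, and the database lookup and file move are left out.

```python
import csv
import io

# Assumed column sizes, for illustration only.
FULL_NAME_MAX = 50
EMAIL_MAX = 50


def transform_row(row):
    """Map one CSV row (a dict keyed by header name) to the values to store."""
    # Title, GivenName and Surname are joined with single spaces.
    full_name = " ".join([row["Title"], row["GivenName"], row["Surname"]])
    return {
        "Ref": row["Ref"],  # kept as text so leading zeros survive
        "Balance": row["Balance"],
        "FullName": full_name[:FULL_NAME_MAX],            # truncate at the end
        "EmailAddress": row["EmailAddress"][:EMAIL_MAX],  # truncate at the end
        "TelephoneNumber": row["TelephoneNumber"].replace(" ", ""),
    }


SAMPLE = (
    "Ref,Balance,Title,GivenName,Surname,Birthday,EmailAddress,"
    "TelephoneNumber,StreetAddress,City\n"
    "911528,970,Mr.,William,Velez,3/20/1988,WilliamLVelez@einrot.com,"
    "083 741 0515,1115 Rissik St,Nasaret\n"
)

rows = [transform_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

The reference number is deliberately left as a string: one of the sample accounts starts with a zero, which would be lost if the value were parsed as an integer.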

I am a new employee at Digiata and have been assigned to work with the Linx development team. I have a fair amount of experience working with PDI (Pentaho Data Integration), and in the next post of this series I will be building the data loading problem using PDI. Later in the series I will be using Linx 5 for the first time to implement the data loading scenario, and other ETL applications will follow after that.

Dec 2, 2013

Linx 4: Writing large files

We recently had an issue reported that Linx 4 takes a long time to build strings. The problem showed up when writing a large file in Linx 4 took a lot longer than it used to in Linx 2. The process to reproduce it was:


  1. Create a string variable.
  2. Read a database table with 1 million records.
  3. Loop through the records and append to the string variable, i.e. for each record string = string + record data.
  4. Write the string to file.


The problem turned out to be the creation of a very large string by appending to the same string in a loop. We have improved the performance to Linx 2 levels by using a different string concatenation mechanism when we detect the process wants to append to a string. Even though this solves the speed problem it does not mean that building large strings in memory is a good idea.

To illustrate the problem I built a small application that writes 500-character rows to build 3 different file sizes. One method uses string concatenation and the other writes directly to file. There was not much difference in speed but memory usage was drastically different.


So be careful when building very large strings in memory, especially if several of them can be built at the same time. At least now you won't have to wait very long for the crash...
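The two approaches compared above look roughly like this in Python. It is a sketch of the general technique, not the Linx implementation or the test application used for the measurements. The function names and the row generator are hypothetical.

```python
import io


def make_rows(count, width=500):
    """Generate `count` rows of `width` characters each (made-up test data)."""
    for i in range(count):
        yield str(i).ljust(width, "x")


def build_then_write(rows, out):
    """Append every row to one string, then write it in a single call."""
    text = ""
    for row in rows:
        text = text + row + "\n"  # the whole file content ends up in memory
    out.write(text)


def write_as_you_go(rows, out):
    """Write each row as it is produced; only one row is held at a time."""
    for row in rows:
        out.write(row + "\n")


# Both methods produce the same output. An in-memory buffer is used here
# to keep the example self-contained; with a real file opened via open(),
# the second method never holds more than one row in memory.
concatenated = io.StringIO()
streamed = io.StringIO()
build_then_write(make_rows(100), concatenated)
write_as_you_go(make_rows(100), streamed)
```

With the first method, peak memory grows with the size of the file being written. With the second it stays roughly constant no matter how many rows there are.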

Linx 4.0.693.3441 released

Linx 4.0.693.3441 is available for download. It includes a mix of bug fixes, performance improvements and new components. Please see the release notes for all the gory details.

There is a very good reason to upgrade:



Remember to give Murray a high five when you see him again.

The search looks for the complete string, including spaces, in the solution. Double-click on a result to open the relevant tab. Please let us know how we can improve it.

Nov 27, 2013

Linx 5.0.392.5076 released

The third Beta release is out. Linx 5 will automatically prompt you to update after opening the Linx Designer. If you don't have it you can get it from linx.twenty57.com.

Changes


This release includes bug fixes, improvements to existing components and the following changes.

Settings


Settings are now solution wide. Any project in a solution can access the same settings. Settings can be changed on the Linx Server to override the default settings in the solution.

Folders


The default Processes and Services sections have been removed from the Solution Explorer. Folders can now contain any mixture of Processes, Services, User Defined Types or other Folders. Use Folders to organize your solution the way you want to.

Linx Server


  • Styling.
  • All processes can now be run from the Linx Server UI.
  • Settings editor.
  • Lots of small improvements.


New components


  • XmlPeek - Extract a snippet of xml using XPath.
  • XmlPoke - Write a snippet of xml to an existing file or xml text.
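For readers who have not used peek/poke style XML operations before, the idea can be sketched in Python with the standard library. This is not the Linx component code. The function names, the sample document and the path are made up, and this sketch reads and writes element text only, whereas the components work with snippets of xml. Note that xml.etree.ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET


def xml_peek(xml_text, path):
    """Return the text of the first element matching `path`, or None."""
    node = ET.fromstring(xml_text).find(path)
    return None if node is None else node.text


def xml_poke(xml_text, path, value):
    """Return the document with the first matching element's text replaced."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    node.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical document and path, for illustration only.
DOC = "<config><server><port>8080</port></server></config>"

current_port = xml_peek(DOC, "./server/port")
updated_doc = xml_poke(DOC, "./server/port", "9090")
```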


Analytics


We have included analytics in Linx Designer. This means Linx Designer will send us information on how Linx is used. The information is completely anonymous and does not contain any user or solution related data, and users have the option to switch it off. The information gathered will help us build a better Linx.


Dogfood


We now use Linx to queue and forward the email from our product sites. This allows feedback from website users to always succeed even if email servers or related infrastructure are down.

Nov 5, 2013

Linx 4: Hidden features #2

Speed up by saving the solution as an lsoz file


Saving your solution as an lsoz file compresses it to about 5% of the original lso file. For larger solutions it makes tasks like updating source control and uploading to servers much faster. I know of one solution that can go from 76MB to 4MB...

To do this, save the file using File - Save As.

Nov 1, 2013

Linx 4: Hidden features #1

Today Keegan pointed out to me that there is a feature in Linx 4 that not everybody might know about - tooltips on debug values. Please send me your favorite hidden features and I'll publish them here.

Tooltips on debug values


When debugging you can view the full text (up to 1000 characters) of the value you are interested in by hovering your mouse over the value. Right-click on the value to get a Copy shortcut menu if you want to use the value somewhere else.


Oct 28, 2013

Linx 5 Release 5.0.346.4840

The second Beta release is out. Linx 5 will automatically prompt you to update after opening the Linx Designer. If you don't have it you can get it from linx.twenty57.com.

Changes in this release



Dogfood

We now use Linx to build Linx. It has replaced Nant to get source code from repositories, build the software, run the unit tests, start and stop virtual machines, copy files around, log stuff and all the bits and pieces that are required to make an automated build process work. So far it looks like it will be much easier to maintain but time will tell...

Stadium

The new Stadium has been released. Create web applications without coding with Stadium. You can get it here.

Sep 18, 2013

Linx 5 Beta Released

Download the beta from http://linx.twenty57.com. Thanks to Julian, Victor, Murray, Petra and Franz for their perspiration and inspiration.

This version is not compatible with Linx 4 and it includes a reduced set of components. It can be run side by side with Linx 4 but remember to specify a different server port when running the install on a machine with Linx 4 installed.

Some of the bigger changes are:

Auto update
Linx checks for updates at start-up. If an update is found it offers to install it.


Deploy to server
Solutions can be uploaded to servers from the Designer.


Run
A process can be run without having to go through the debugger. It launches the Linx process in a different Windows process and runs it.


Types
Types and Custom Types can be used to keep state in a Linx process. The Variable component has therefore been deprecated.


Expressions
C# expressions can be used in most places where values are assigned e.g. properties.


SQLEditor
SQLEditor has syntax highlighting.


TextFileReader
TextFileReader includes parsing of CSV and fixed length files.
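Fixed length files differ from CSV in that fields are located by position rather than by a separator. A minimal Python sketch of that idea follows. It is not the TextFileReader implementation, and the function name, field widths and sample line are assumptions for illustration.

```python
def parse_fixed(line, widths):
    """Split `line` into fields of the given widths, trimming the padding."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields


# Hypothetical layout: 6-character reference, 6-character balance,
# 10-character given name.
record = parse_fixed("911528   970William   ", [6, 6, 10])
```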


Constants
Constants are now shown in a tab and are not part of the solution tree.


Help
Help is available online and can be reached by clicking on the ? next to a component.


Input and Output
DataIn and DataOut are replaced with Input and Output properties on the Process.


Service Events
Service events are now full blown Linx processes.


Razor Templates
There is a new component called RazorTemplateTransform that can use data in Linx to populate a Razor template. Razor templates are used in ASP.NET MVC to generate html pages. It makes generating text easier.


RunProcess Component
A Linx process can now be dragged directly from the solution tree to the Process tab. The RunProcess component has therefore been deprecated.


Solution Name
The solution name is now the same as the file name. If you change one the other one also changes.

TimerService UI
The TimerService has a simplified UI.


Debug values
Input and Output values are now available in the debugger.


May 23, 2013

Linx version 4.0.520.3104

Please note that Linx version 4.0.520.3104 is available for download. See the release notes for a complete list of features.

This is the first official release of Linx 4, after our beta period that we used to gather user feedback and fix any outstanding issues. As you can see from previous blog entries (Linx version 4.0.70.1807 and Linx version 4.0.132.2025), there are numerous changes that help to improve usability, speed and stability. We hope you enjoy using the new version and as always, comments and suggestions are welcome.

May 22, 2013

Linx version 2.4.857.1427

Please note that Linx version 2.4.857.1427 is available for download. See the release notes for a complete list of features/fixes. Notable changes include:
  • Add CSVReader component
  • Refactor CallFincad component to work with 2009 & 2012 versions
  • Numerous bug fixes, including:
    • Validate ImageSplit output directory in component execute
    • Object reference error in PowerPoint scripting component
    • Clear existing table rows when populating with string data
As always, comments and suggestions are welcome.