Jan 20, 2014

ETL Application Comparison, Part 4

In this part of our evaluation series we are having a look at SSIS (SQL Server Integration Services). SSIS is part of the Microsoft SQL Server software package and allows for performing a broad range of data migration tasks. SSIS is, similar to PDI, a workflow-based application, but it features separate sections for your project’s Control Flow and Data Flow. The Control Flow section contains the high level steps to be executed in sequence, and the Data Flow section specifies how the data is read and processed.

Figuring out how to read the data from the CSV file and moving the data into the database, with the help of a tutorial, was not too hard. However, at first I simply could not find a way to dynamically select all of the files in the input folder. The application allowed me to select one file only. Eventually, Google came to the rescue and taught me that I need to loop through the files using a Foreach Loop Container in the Control Flow section. The Foreach Loop Container looks at a fixed folder and loops through all files matching a specified wild card expression. During each iteration a variable containing the file path is set, which can then be used in the data flow section. It is conceptually simple but actually a bit tricky to get right.

Here is a screenshot of the Control Flow that I have built:
And here is what the Data Flow section looks like:

All the string manipulation, as well as the generation of the GUIDs is done by the Process Data component. It is a Script Component which allows the user to manipulate the incoming data rows using custom C# code.

For the development of our test scenario SSIS was found to be very developer-oriented and not particularly user friendly. One can easily get lost among all the different sections and properties that are available to configure. SSIS features a number of very powerful components, such as the Script Component, which allows one to write a block of C# code that is executed for each data row that passes through. Scheduling the SSIS package can be done fairly easily by adding a job to SQL Server Agent using SQL Server Management Studio.

No comments:

Post a Comment