Skip to content

shannonlowder.com

Menu
  • About
  • Biml Interrogator Demo
  • Latest Posts
Menu

SQL 301 – ETL With SSIS, Part 2

Posted on January 12, 2009March 31, 2011 by slowder

etlSSIS_6In my last post on SSIS, I started you into developing an ETL solution with SSIS.  I walked you through setting up your solution, renaming the package, and setting up the variables we would use during the rest of these lessons.  Today, we’ll pick up with a Foreash Loop Container.

In most case, you’re going to deal with a vendor that will send you a file every day.  You’ll want to be able to handle the situation where the vendor misses a day, and sends you two files the next day instead.  If you want to handle that easily, then you set up your entire package inside a for each loop.

A for each loop does exactly what it says, it will do whatever you put inside the for each loop for each item in a collection it finds.  You can do something for each file in a folder, for each record in a record set, each node in an XML element, or each object in a collection of Server Management Objects (SMO).  In our case, we’re going to want to load each file it finds in a folder.

etlSSIS_7To begin, click the Foreach loop container icon from the toolbox, and drag it onto your workspace.  Double click on the object you just placed in order to get the Foreach loop editor.

I would suggest giving the object a useful name, such as “Foreach file matching SearchFilePattern in SearchPath”.

Next, click over to the collection page.

On this page, we’re going to configure the for each loop object to search our SearchPath for the SearchFilePattern.  To do that, we’re going to use Expressions.  Expressions are a combinations of variables, functions and literals used together to create a single value.  What they are will make more sense in a moment.  Click the […] icon to the right of Expressions to define your first expression.

etlSSIS_8

Once the Property Expressions Editor opens, you can click in the first row beneath property and choose Directory, since we’re going to define an expression to set where the for each loop should look.

On the line below that, choose FileSpec, we’re going to set that to SearchFilePattern.

Once you’ve got two lines, click the […] button to the right of Directory, and you’ll get the Expression Builder.

etlSSIS_9When you first work with the Expression Builder it can feel a bit wonky.  Let me walk you through how we will build our expression. 

Pivot open the Variables folder in the left half of the Builder, then scroll down to the User::SearchPath entry. Click and drag it down to the Expression block.  Double clicking an entry in the variables section does nothing.

While we’re here I’d like to point out we can do more than use variables.  Like I mention before we can add a literal to the expression.  Let’s say you didn’t set all your paths to end with a slash, but you were concatenating a path variable and a filename variable to make a file path… you’d have to add + “/” + between the two variables in order to add the slash you need between the path and the file name.

In our current example, we just need to assign the variable SearchPath to the directory expression for the for each loop.  Repeat this process for the FileSpec expression.  Set it to SearchFilePattern.  Once you’ve done that, hit OK to return to the Foreach Loop Editor.

Even with the expressions defined, I enter the folder, and files values in the Foreach Loop Editor.  That way you can test your code.  I also code my solutions to expect the for each loop to pass back just the filename and extension, so change the radio button in the “Retrieve file name” section to “Name and extension”.

etlSSIS_10Next, we’re going to click on the Variables Mapping option in order to set the file name the loop finds to our FileName variable.

Simply click on the line under Variable, and choose User::FileName, and leave the index to zero.  Since that’s the first value that will return from our for each loop, you don’t have to change that… if you want other variables set, then you’d go messing with that… but for now, leave it alone.

We’re now finished with the for each loop.  Hit ok to continue.

The next thing we’re going to cover is a Script Task.  In the toolbox, click and drag a Script task to the inside of the for each loop we created earlier.

etlSSIS_11Double click the Script Task you just created to start the Script Task Editor.

Don’t ask me Why, but the editor should start with the General tab first.  Click over to general and give your Script task a useful name.  I’m choosing: “1-Set values for archive and error file paths”.  I number my steps to make reading the log files a little easier.

Once you’ve named your script task, click back over to the Script tab.  On this screen we need to set 3 pieces of information.  First the scripting language.  Since SSIS is built with .NET you can choose C# or Visual Basic.  I’m going to leave this option to you.  I’ll use VB in this lesson, but in future lessons you’ll see some C#.

Next, we want to tell the script what variables it can have access to.  You can set variables as read only to the script, or read and write.  Basically, will the script need to change values stored in those variables or not.  In our example, User::DestinationPath, User::FileName, and User::SearchPath will not be changed in this script, but are required to compute the archive and error file paths. So we enter them as ReadOnlyVariables

That leaves us putting User::ArchivePathAndFileName and User::ErrorPathAndFileName as ReadWriteVariables.

Once we have that, click Edit Script to begin editing.

All of the editing you’re going to do in this script will be in the Main() function.  If you’re not familiar with programming concepts, please let me know, and I’ll get you up to speed!

First we’re going to set the ConnectionString for our flat file source to the file found by the for each loop.  To do that, we add the following line to the Main function.

   1: Dts.Connections("NewFile").ConnectionString = Dts.Variables.Item("User::SearchPath").Value.ToString() _

   2:             + Dts.Variables.Item("User::FileName").Value.ToString()

   3:  

All we do is take the SearchPath variable and Concatenate the FileName variable onto it.  Yes, we could have captured the filename and path by the loop itself, but for now, practice some coding.  When you learn to improve your SSIS Packages, feel free to do it either way!

If you’re debugging your code, you might want to know how to throw values to a dialog box.  That way you can see what’s going on in the script.  To throw the NewFile Connection string to a dialog box, include the following command to your script.

   1: System.Windows.Forms.MessageBox.Show(Dts.Connections("NewFile").ConnectionString.ToString())

   2:  

But remember, before releasing this to production, comment out that line by adding a single quote to the beginning of the line.

Next we’re going to set the ArchivePathAndFileName.  This one is a bit more complex.

   1: Dts.Variables.Item("User::ArchivePathAndFileName").Value = _

   2:             Dts.Variables.Item("User::DestinationPath").Value.ToString() + "archive\" + _

   3:             Dts.Variables.Item("User::FileName").Value.ToString.Substring(0, _

   4:             Dts.Variables.Item("User::FileName").Value.ToString.LastIndexOf(".")) + "_" + _

   5:             Now().ToString("yyyyMMdd") + ".txt"

   6:  

Again, we could have set this in an expression, but that’s the great thing about T-SQL and SSIS, there are always several ways to accomplish the same goal.  In this case we want to take the DestinationPath variable, add on archive\, so it will put the files in the archive subfolder in the destination folder.  In our example, the vendor reuses file names, and we don’t want our archive to overwrite older files, so we take the given file name, take of the name part, add the current date onto that name, then put the extension back on the end.

That way filename.txt gets saved as filename_0090112.txt in the archive folder.

We repeat the process for the error file path.

   1: Dts.Variables.Item("User::ErrorPathAndFileName").Value = _

   2:         Dts.Variables.Item("User::DestinationPath").Value.ToString() + "error\" + _

   3:         Dts.Variables.Item("User::FileName").Value.ToString.Substring(0, _

   4:             Dts.Variables.Item("User::FileName").Value.ToString.LastIndexOf(".")) + "_" + _

   5:         Now().ToString("yyyyMMdd") + ".txt"

   6:  

Ok, with that, if you want to debug, feel free.  That way you can see the source filepath, and the destination paths.  Your package won’t really do anything just yet.  We’ll fix that soon enough.  Next time, we’re going to cover the structure of our raw table, and set it up.  We’ll talk about Truncating that table before each attempt to load the data from the text file to the raw table.  Then we’ll talk about the Data Flow Task.

Until then, if you have any questions, please let me know!  I’m here to help you understand SSIS better.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • A New File Interrogator
  • Using Generative AI in Data Engineering
  • Getting started with Microsoft Fabric
  • Docker-based Spark
  • Network Infrastructure Updates

Recent Comments

  1. slowder on Data Engineering for Databricks
  2. Alex Ott on Data Engineering for Databricks

Archives

  • July 2023
  • June 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • October 2018
  • August 2018
  • May 2018
  • February 2018
  • January 2018
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • June 2017
  • March 2017
  • February 2014
  • January 2014
  • December 2013
  • November 2013
  • October 2013
  • August 2013
  • July 2013
  • June 2013
  • February 2013
  • January 2013
  • August 2012
  • June 2012
  • May 2012
  • April 2012
  • March 2012
  • February 2012
  • January 2012
  • December 2011
  • November 2011
  • October 2011
  • September 2011
  • August 2011
  • July 2011
  • June 2011
  • May 2011
  • April 2011
  • March 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • October 2010
  • September 2010
  • August 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • September 2008
  • August 2008
  • July 2008
  • June 2008
  • May 2008
  • April 2008
  • March 2008
  • February 2008
  • January 2008
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • June 2007
  • May 2007
  • April 2007
  • March 2007
  • February 2007
  • January 2007
  • December 2006
  • November 2006
  • October 2006
  • September 2006
  • August 2006
  • July 2006
  • June 2006
  • May 2006
  • April 2006
  • March 2006
  • February 2006
  • January 2006
  • December 2005
  • November 2005
  • October 2005
  • September 2005
  • August 2005
  • July 2005
  • June 2005
  • May 2005
  • April 2005
  • March 2005
  • February 2005
  • January 2005
  • November 2004
  • September 2004
  • August 2004
  • July 2004
  • April 2004
  • March 2004
  • June 2002

Categories

  • Career Development
  • Data Engineering
  • Data Science
  • Infrastructure
  • Microsoft SQL
  • Modern Data Estate
  • Personal
  • Random Technology
  • uncategorized
© 2025 shannonlowder.com | Powered by Minimalist Blog WordPress Theme