SQL 301 – ETL With SSIS, Part 1

Posted on January 2, 2009 (updated April 5, 2011) by slowder

I’ve discussed SSIS on this site before, but I’ve heard the feedback: you want real-world examples. So I decided to use the business case I address most often with SSIS, and that’s Extract, Transform, and Load.

Let’s dive right into this problem. You receive a file via FTP from a vendor every day, and you need to import this data into your database. The data file is a comma-delimited text file. This process has become so common that I have a design template I follow when developing the solution.

I search the source folder for files matching a certain naming scheme, then import each file into a “raw” table. A raw table is simply a table with the right number of columns, but with every column typed as VARCHAR. That way, if the vendor passes a bad data type, I can test for it and fail an individual record rather than failing the whole file. Once the data is in the raw table, I scrub it, looking for bad data types or inconsistent data. Once I’ve handled all the problems, I import the data into the “production” table, that is, a table that will be used by other processes.
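
To make the idea concrete, here’s a minimal T-SQL sketch of the raw-table approach. The table and column names are hypothetical stand-ins, not the columns of the demo file used later in this post.

    -- Raw landing table: right number of columns, everything VARCHAR,
    -- so one bad value never rejects the whole file on load.
    CREATE TABLE dbo.VendorFileRaw
    (
        CustomerID  VARCHAR(50),
        OrderDate   VARCHAR(50),
        OrderAmount VARCHAR(50)
    );

    -- Scrub step: flag individual records with bad data types
    -- instead of failing the entire import.
    SELECT *
    FROM dbo.VendorFileRaw
    WHERE ISDATE(OrderDate) = 0
       OR ISNUMERIC(OrderAmount) = 0;

Records that pass the scrub move on to the production table; the ones flagged here get failed individually.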

Simple, right?

Right. Let’s get started.

[Image: etlSSIS_1]

Open Visual Studio (in this example I’ll be using Visual Studio 2008) and hit File > New > Project. Choose the Integration Services Project template. Give your project a name; in my example I’ve named it GenericFileImport. You can create this project wherever you like. Hit OK on the New Project dialog to continue.

 

When Visual Studio loads, you should see a pretty blank screen.  This is your canvas. 

[Image: etlSSIS_2]

Before I start, I like to rename my dtsx file from Package.dtsx to something useful. Otherwise, when you publish several of these packages, they’ll all have the same name. Not useful!

Right-click Package.dtsx in the Solution Explorer (by default it’s on the right) and choose Rename. I chose to call mine GenericFileImport.dtsx.

[Image: etlSSIS_3]

When you rename your package, you will get a dialog box asking if you want to rename the package object as well. Trust me, you want to hit Yes. If you don’t, the name stored inside the package won’t match the file name, and you’ll have issues further down the road. So hit Yes on that dialog box.

Now that that’s done, let’s start with the variables. There are some parameters we’re going to want to set when we run this package.

  • We want to be able to change the path where we expect to find the files. Right now we get the file via FTP; what if we wanted to pull it from a file share instead?
  • We want to be able to change the file pattern. Imagine the vendor switches from a .txt file to a .csv file.
  • We want to be able to set our archival location. Right now we may store these files on one server, but what if we add a new file server to the network later?

[Image: etlSSIS_4]

To do all of this easily, we need variables. To add variables, hit View > Other Windows > Variables. A panel should slide out (from the right, by default). This is your Variables window. Click the first icon in its toolbar, and you’ll be able to add a variable.

First we’re going to add our SearchPath variable to store the folder we want to search for our file. Enter the name and leave the scope as GenericFileImport. (This makes the variable global to the package. In a later post I’ll cover the various scopes and when you should use each; for now, let’s leave the default.) Next, change the data type to String, and for the value, point it at a test folder.

In the project folder I’ll create a subfolder called “sourcefolder” and use the full path to that folder as my test value. I’d like to take a moment to mention a common mistake. When you’re dealing with paths, you have to make a call: will my paths always end with a \ or not? In this example, paths will always end with a \. (For example, with the trailing slash, SearchPath plus FileName gives something like C:\sourcefolder\myfile.txt; without it, you’d get C:\sourcefoldermyfile.txt.) If you have problems getting paths to work in your packages, check which convention you’ve chosen, to include the slash or not.

Ok, while we’re here, we’ll go ahead and set up a few more variables.

  • DestinationPath, a string. I suggest setting up a test folder in your project to stand in as your destination. I called mine destinationfolder.
  • ArchivePathAndFileName, a string. We’ll set this in a scripting object later, so you don’t have to fill it in yet. But you will want to put an archive subfolder in your destination folder.
  • ErrorPathAndFileName, a string. Again, we’ll set the value in a scripting object. I’d also create a folder under your destination folder called error that will hold your error files.
  • FileName, a string. This will be the name of the file we find in the search folder. You can leave this blank for now.
  • SearchFilePattern, a string. This is the pattern the package will use to find the source file. In this case we’re going to look for a .txt file. Go ahead and download this file now, and save it to the source folder you created earlier.

Once you’ve set up all those variables, you’re ready to continue.  Take a look at the following screen cap to make sure we’re on the same page.

[Image: etlSSIS_5]

Before we end for today, we need to set up data connections. Without them, none of this will work. We’re going to set up two connections: one to our source data file, and the other to our database server.

To set up connections, look at the bottom of Visual Studio.

[Image: etlSSIS_12]

Yeah, it really is that easy. Right-click in that area and choose New Flat File Connection. You’ll need to fill in a Connection Manager Name; I chose NewFile. Then browse to the demo file you’re using to test your import. Luckily for this demo, nearly all the default options on this screen can be left alone: our demo file is a delimited file, it isn’t using any text qualifiers (like quotes), and we don’t want to skip the first row. The only option we have to change on this first screen is to check the box “Column names in the first data row”.
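
For reference, a delimited file with column names in the first data row looks something like this. These are made-up values for illustration only, not the contents of the demo file you downloaded earlier.

    CustomerID,OrderDate,OrderAmount
    1001,2009-01-02,149.95
    1002,2009-01-02,89.50

Every record after the header row should have the same number of comma-separated values.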

[Image: etlSSIS_13]

Check the image above to compare how you set up your file connection.

Next, click the Columns tab on the left to preview your file connection. Verify the columns appear correctly. In other words, does it look like a spreadsheet, or is all the text run together?

If it looks like a spreadsheet, you’re finished with this connection.  If it doesn’t, please reach out to me, and we’ll work through your issue.

Hit OK to save this connection.

Next we’re going to set up a connection to our database server. In my case I’m connecting to the training server. You’ll need a connection to a database server where you have a copy of AdventureWorks installed.

[Image: etlSSIS_14]

To set up a SQL connection, right-click in the Connection Managers section of Visual Studio and choose “New OLE DB Connection”. The “Configure OLE DB Connection Manager” dialog will open. If you haven’t set up a connection before, you’ll have to define your connection now. Hit New to start a new connection manager.

You’ll need to enter the server name and your authentication method, then choose the AdventureWorks database. Be sure to click “Test Connection” so you can verify you can connect to the server.
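
Behind the scenes, the choices you make in that dialog are saved as a connection string on the connection manager. As a rough illustration only (the server name is a placeholder, and the provider assumes the SQL Server Native Client that ships with SQL Server 2008), a Windows-authenticated connection to AdventureWorks looks something like:

    Provider=SQLNCLI10.1;Data Source=YOURSERVER;Integrated Security=SSPI;Initial Catalog=AdventureWorks;

If you use SQL authentication instead, the string carries a User ID and Password rather than Integrated Security.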

Once your connection is verified, click OK, then OK again. You now have two connections in your package: one for your flat file source, and the other for your database destination. It’s just a cosmetic thing, but I usually right-click and rename my database connections to just the database name. Since the server name and credentials are almost always variable… there’s no need to save that information in the connection name.

Let’s stop here for now. Next time we’ll pick up with a Foreach Loop and a Script Task. With these we’ll be able to handle multiple files landing in our SearchPath, and for each file we’ll be able to set the FileName, ArchivePathAndFileName, and ErrorPathAndFileName, steps we have to take before we can import the file into our database.

If you have any questions so far, please, let me know!  I’m here to help.
