We’re back to our ETL with SSIS project, and today we’re going to pick up with transforming the data from our raw format to the final destination for our contact data. So far we’ve scanned a directory for files matching a certain pattern. After finding a file matching that pattern we set up some variables, truncate our raw table, then insert the data from the file into that raw table.
Once we’re at this step I usually build a stored procedure that will search for bad data within the raw table. I’ve been known to search for leading spaces in text data and remove it. I also search columns that will be saved in INT or some numeric for, and search for rows that will fail that conversion. This is the best time to search for these kind of errors because once you put this data into the destination table, updates could take much longer.
Think about it. Let’s say AdventureWorks had more than a million contacts, but you’re only importing a thousand at a time. You know it takes less time to update a table of a thousand than it does to update a table of a million.
For this transformation I’m simply going to “line up my data” so that the source columns in the raw table will be inserted into the correct columns in the destination table. If you want to explore more here, build the queries to search for leading and trailing spaces in each column and remove the spaces.
I’ll just walk you through the transformation to the destination table.
Just like before, we’re going to rename the task to something notable. I’m going with “Transform _Raw data to Destination Table”.
Open up the task, and set the Connection to your database connection. And then set your SQLStatement to any query you want, as long as you include the following query as the last query in your SQLStatement.
With this as the last statement, any transforms you want to do will be performed, then the last step will be to insert the data into the destination, Person.Contact.
With these five basic steps you can perform most every ETL workflow I’ve ever worked on. But there is far more you can do with SSIS. In my next post, I’m going to cover how you can handle errors in the Data Flow Task. There are a few different ways to do this. I’m going to show you the two most common way I’ve handled it in the past.
If you have any questions about SSIS, please send them in. I’ll do my best to answer the questions, and help you learn more about using SSIS to perform your ETL projects!