In addition to working on my database sharding presentation, I've been working on upgrading an ETL framework. In learning this new framework, I dug into the business logic behind what the framework was trying to load, looked into all the auditing that was currently captured, and looked for gaps when packages didn't perform as expected. It took a few weeks to dig through the full project, and now that I'm starting to make incremental improvements to this process, I'm finding PowerShell can be a powerful tool in your SSIS tool belt. The trick is, you have to sharpen that tool before you can really use it.
SQLPSX
SSIS packages are just XML, right? So you can just parse out the XML and interact with them that way, right?
You could. It would take a lot of time and energy, but you could. I'm fortunate to have stumbled across SQLPSX, the SQL Server PowerShell Extensions, on CodePlex. With it, I was able to really dig into my packages and manage my variables (both at the package and container level), connection strings, executables, and more! All of that without once having to build out an XPath statement, or a complex TRY..CATCH to see if the package put the variable in the root node or in one of the child nodes of the current element.
This tool hasn’t been touched since March 2011, but it’s served as the base for several tools I built over the past week. Download it, and PowerGUI, my current PowerShell IDE. (If you know a better IDE for PowerShell, hit me up in the comments!) Then get started with a couple of these demos.
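To give you a quick taste before we get to the real script, here's a minimal sketch that loads a single package and lists its connection managers and variables. The package name Customer_load.dtsx is made up; point it at one of your own.

Import-Module -Name c:\code\posh\SQLPSX-Modules\SSIS

# Load one package into an object you can poke at
$package = Get-ISPackage -Path "C:\code\localrepolocation\Customer_load.dtsx"

# Every connection manager, with its current connection string
foreach($connection in $package.Connections)
{
    $connection.Name + " : " + $connection.ConnectionString
}

# Every package-level variable and its value
foreach($variable in $package.Variables)
{
    $variable.Name + " = " + $variable.Value
}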
What packages have this variable?
In this framework we moved from building MERGE statements through expressions to building them via a stored procedure. The problem is we have nearly a hundred packages I have to go check to find out which ones have already been upgraded and which ones haven't.
How much time would it take to open each package in your project, look for the variables, and build a to-do list?
Too much.
Let’s do it the easy way:
update: I’ve added the management scripts to a GitHub repository, and you can find the reading variables script here!
Import-Module -Name c:\code\posh\SQLPSX-Modules\SSIS

$path = "C:\code\localrepolocation"
$loadPackages = Get-ChildItem -Path $path -Filter "*_load.dtsx"

foreach($loadPackage in $loadPackages)
{
    $packagePath = $path + "\" + $loadPackage
    $package = Get-ISPackage -Path $packagePath

    foreach($variable in $package.Variables)
    {
        if ($variable.Name -eq "VariableName")
        {
            $package.Name + " has VariableName, with value: " + $variable.Value
        }
    }
}
Ok, let's start at the top. The first line gives me access to the SSIS POSH modules. Since I will be deploying this to other machines, I will have to make sure I deploy both this script and the SSIS module to the same folder structure I use in dev. If I don't, I'll need to update the script with the new paths.
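One way I could dodge that, sketched below on the assumption that the SQLPSX-Modules folder ships right next to the script, is to resolve the module path relative to wherever the script is running from instead of hard-coding c:\code\posh:

# Work out where this script lives, then load the SSIS module from a sibling
# folder, so the same layout works in dev and on any machine I deploy to
$scriptFolder = Split-Path -Parent $MyInvocation.MyCommand.Path
Import-Module -Name (Join-Path $scriptFolder "SQLPSX-Modules\SSIS")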
Next, I set a variable for the path where I’ve stored my packages. This is the working clone of the ETL repository.
Then, I spin through and grab all the packages in that folder that end with _load.dtsx. The upside to having a naming convention is that I can spin through my packages and trust I'm grabbing only the load packages, and not my stage, transform, or transfer packages. The -Filter option can work wonders with the Get-ChildItem cmdlet.
Once I've loaded all my packages into the $loadPackages object (in this case it's a collection), I use a foreach to deal with each package one at a time.
I build out a fully qualified name and path for each package, which I pass on to the Get-ISPackage cmdlet from SQLPSX. This is the real engine in this tool. It grabs your package, and then loads it into an object that IS your package!
Once you have this object, you can do nearly anything you could do through BIDS/SSDT. Check out the screenshot, then try it for yourself. Explore the object, see what all you have access to. It will blow you away once you see it all laid out right in front of you!
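If you'd rather see that in your own console than in a screenshot, Get-Member will dump everything the package object exposes. This is plain PowerShell, nothing SQLPSX-specific, and the package name here is again just an example:

$package = Get-ISPackage -Path "C:\code\localrepolocation\Customer_load.dtsx"

# Every property and method the package object exposes
$package | Get-Member

# A couple of the collections worth drilling into
$package.Variables | Select-Object Name, Namespace, Value
$package.Connections | Select-Object Name, ConnectionString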
Now, back to the script. The next foreach loop spins through all our package variables and checks their names. If we find the one named “VariableName”, then we output “That package has VariableName, with value: blah”. That output becomes your to-do list. You can then go and upgrade just those packages that need it, without having to open the ones that don't have any work to do!
Think about the possibilities of looking at all of your packages and seeing all the variables you are using. You could create a CSV of all the package names and variable names. You could then drop that into a database and build a query that shows you which packages aren't using your standard variable names. You could also check for variables with the wrong default values. The key is this: anything you can do through the GUI, you can now do through PowerShell. That means it's automatable, repeatable, and reduces the chance for human error.
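Here's a rough sketch of that CSV idea, building on the same loop from the script above (the output file name is just an example):

Import-Module -Name c:\code\posh\SQLPSX-Modules\SSIS

$path = "C:\code\localrepolocation"

$inventory = foreach($loadPackage in (Get-ChildItem -Path $path -Filter "*.dtsx"))
{
    $package = Get-ISPackage -Path ($path + "\" + $loadPackage)

    foreach($variable in $package.Variables)
    {
        # One row per package/variable pair
        New-Object PSObject -Property @{
            PackageName  = $package.Name
            VariableName = $variable.Name
            DefaultValue = $variable.Value
        }
    }
}

# Drop the inventory into a CSV you can load into a table and query
$inventory | Export-Csv -Path "C:\code\package_variable_inventory.csv" -NoTypeInformation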
In a future article, I want to share with you how I built a testing framework that spins through just the files I have modified but not yet checked in to source control, executes them, and checks whether they succeed. If they succeed, it then compares the results in the database against the last production run; if the results are the same, the output is validated. I could then build on top of that to make it a requirement before checking my new or updated packages into the repository! This is test-driven development for SSIS, folks!
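Just to make that idea a little more concrete, a first rough cut of the "find and execute the modified packages" piece might look something like this. Everything in it is an assumption on my part: that the repository is git, that dtexec is on the path, and the folder locations. Treat it as a sketch rather than the finished framework:

# Packages modified but not yet committed (assumes a git working copy)
$repo = "C:\code\localrepolocation"
$modified = git -C $repo status --porcelain |
    Where-Object { $_ -match "\.dtsx$" } |
    ForEach-Object { $_ -replace "^..\s+", "" }

foreach($packageFile in $modified)
{
    # Execute the package; dtexec returns 0 on success
    & dtexec /FILE (Join-Path $repo $packageFile) | Out-Null

    if ($LASTEXITCODE -eq 0)
    {
        "$packageFile succeeded - next step is comparing its results to the last production run"
    }
    else
    {
        "$packageFile failed with exit code $LASTEXITCODE"
    }
}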
This is exciting stuff folks!
As always, if you have any questions, please send them in. I’m here to help!