We’ve been going through a simple U-SQL script to perform some ETL processing in Azure Data Lake. Last time, we started by covering some basic syntax like variables and expressions. Now, we’ll pick up with some transformations. SSIS v U-SQL In traditional data warehouse ETL, we’ve been spoiled by the ease of drag and drop…
Month: October 2017
U-SQL and ETL Processing
When you get started with Azure Data Lake Analytics and U-SQL specifically, you may get a little confused. It looks like a mash-up of T-SQL and C#. Turns out, that’s exactly what it is! You can find lots of information on MSDN, or GitHub, or StackOverflow. Let’s get started with some basics. Variables, DataTypes, and Case…
Azure Data Lake Storage ACL Automation
In my last blog entry, we covered how to lay out folders in your Data Lake Storage account based on a logical design. That’s only half the battle. You also need to set up Access Control Lists (ACLs). Setting up controls via the Azure portal is easy, but not something you can automate. Today, we’ll jump…
Azure Data Lake Storage Zone Layout Automation
After laying out the structure for our zones, my client quickly asked, “Is there a way we can automatically stand up this structure each time we bring a new Tenant into our solution?” With a smile, I replied it was possible! In the short term, we would use PowerShell to stand up these folders every…
Azure Data Lake, Step by Step
Over the next few blog posts in this series, I’m going to share with you the story of how a Data Lake project comes together. As I tell this story, I’m going to keep pointing back to traditional ETL work and to Automation techniques. Not all of these will include Biml. I hope to help…
Data Warehouse Automation is for Data Lakes too!
Last week you might have seen a tweet about a day in the life of a Data Platform Consultant. To say the least, my days are varied. This day, in particular, I was split between building out automated ETL tests using Biml and spinning up a new Azure Data Lake. Up until recently, I would have…
The File Staging Loop
As I mentioned in the overview, the largest cost in terms of time is requesting each web page from the web server and downloading that file to disk. That’s why this file staging loop is a separate step from the parsing and transforming step. This loop can be as simple as two steps: get…
Parsing and Extracting Web Data
In the last article, I laid out the architecture for dealing with this type of data source. This time, we’re going to get into the basics of parsing data. A Few New Technologies Before we dive into details, let’s cover the technologies this solution rests on. First, there’s C#. As a Microsoft data professional,…