You hear time and time again that 50 to 80 percent of a data science project is spent on data wrangling, munging, and transforming raw data into something usable. Personally, I’ve automated a lot of those steps. I’ve built tools over the last 20 years that help me do more in less time…
Tag: ADLA
NOAA Radar and Severe Weather Data Inventory
After I finished evaluating the Storm Events Database from NOAA, I was convinced we needed to look for machine-recorded events. When you start poking around the NOAA site looking for radar data, you’ll find a lot of information about how they record this data in binary block format. Within this data, you’ll find measurements…
U-SQL: Automating Schema on Read
Moving to U-SQL for your ETL can feel like a step back from the drag-and-drop functionality we have in SSIS. But there is one great thing about your ETL being defined in text rather than a UI: you can automate it! Today I’m going to show you how you can automate the part…
U-SQL and ETL Processing–Part 2
We’ve been going through a simple U-SQL script to perform some ETL processing in Azure Data Lake. Last time, we started by covering some basic syntax like variables and expressions. Now, we’ll pick up with some transformations. SSIS v U-SQL: In traditional data warehouse ETL, we’ve been spoiled by the ease of drag and drop…
U-SQL and ETL Processing
When you get started with Azure Data Lake Analytics, and U-SQL specifically, you may get a little confused. It looks like a mash-up of T-SQL and C#. Turns out, that’s exactly what it is! You can find lots of information on MSDN, GitHub, or StackOverflow. Let’s get started with some basics. Variables, DataTypes, and Case…
Azure Data Lake Storage ACL Automation
In my last blog entry, we covered how to lay out folders in your Data Lake Storage account based on a logical design. That’s only half the battle. You also need to set up Access Control Lists (ACLs). Setting up controls via the Azure portal is easy, but not something you can automate. Today, we’ll jump…
Azure Data Lake, Step by Step
Over the next few blog posts in this series, I’m going to share with you the story of how a Data Lake project comes together. As I tell this story, I’m going to keep pointing back to traditional ETL work and to automation techniques. Not all of these will include Biml. I hope to help…
Data Warehouse Automation is for Data Lakes too!
Last week you might have seen a tweet about a day in the life of a Data Platform Consultant. To say the least, my days are varied. This day, in particular, I was split between building out automated ETL tests using Biml and spinning up a new Azure Data Lake. Up until recently, I would have…