shannonlowder.com

Category: Data Engineering

SQL Server to Databricks Profiler

Posted on January 26, 2023 (modified January 19, 2023) by slowder

Recently I had a client that expressed interest in migrating their data warehouse from Azure SQL DB to Databricks. They weren’t looking to move due to any performance issues in Azure SQL DB. They were running on the Hyperscale offering. They were looking to share a common data architecture between their data warehouse and data…

Delta Sharing – Data Recipients

Posted on January 24, 2023 (modified January 13, 2023) by slowder

Recently I was asked to look into Delta Sharing to learn what it’s all about and how it could be used. After digging in a little bit, it appears that it’s a way to share data in Parquet or Delta format. You can build these shares on top of any modern cloud storage system like…

Metadata-Driven Python

Posted on January 19, 2023 by slowder

You will manually build your ingestion code when you first learn to ingest data into any new engine.  This makes sense; you’re just getting started.  You want to learn how the engine will read the data and then write it back again. You want to learn how to log what’s happening during ingestion. You want…

Monolithic vs. Unit-of-Work

Posted on January 17, 2023 (modified January 6, 2023) by slowder

When we start developing in a new language or on a new platform, it’s easy to fall into the trap of monolithic design.  The free flow from idea to code leads to a single blob of functional code. This leads to quick prototype code that meets the functional requirements. The problem with this approach creeps…

Testing Ingest

Posted on January 12, 2023 (modified January 6, 2023) by slowder

Last time, we built a simple transform function in Python, but how do we know if it works? We need to build some tests to find out. I admit data engineering has been late to the practice of test development, but it’s not too hard to adopt. Let’s work through a simple data test, a…

Developing Databricks Ingestion locally

Posted on January 10, 2023 (modified January 5, 2023) by slowder

Spark engines like Databricks are optimized for dealing with many small-ish files that have already been loaded into your Hadoop-compatible file system. If you want to process data from external sources, you’ll want to extract that data into files and store those in your Azure Data Lake Storage (ADLS) account attached to your Databricks Workspace….

Prepare VSC Local Databricks Development

Posted on January 5, 2023 (modified January 2, 2023) by slowder

Last time, we walked through how to perform analysis on Databricks using Visual Studio Code (VSC). This time, we will set up a local solution in VSC that will let us build out our data engineering solutions locally. That way, we don’t have to pay for development and testing time. We’d only pay for Databricks,…

Data Engineering for Databricks

Posted on December 13, 2022 (modified December 20, 2022) by slowder

Since Databricks is a PaaS option for Spark and Spark is optimized to work on many small files, you might find it odd that you have to get your sources into a file format before you see Databricks shine. The good news is Databricks has partnered with several different data ingestion solutions to ease loading…

Databricks for SQL Professionals

Posted on November 18, 2022 (modified November 19, 2022) by slowder

I’ve been a Microsoft Data professional for over 20 years. Most of that time I’ve spent in the SQL Server stack, the core query engine, SSIS, SSRS, and a little SSAS. But times changed, and the business problems grew more complex. As they did, I looked at other technologies to try and answer those questions….

Notebooks Explore Data

Posted on October 22, 2022 (modified November 14, 2022) by slowder

On a recent engagement, I was asked to provide best practices. I realized that many of the best practices hadn’t been collected here, so it’s time I fix that. The client was early in their journey of adopting Databricks as their data engine, and a lot of the development they were doing was free-form. They…
