If you work with CSV files, you know how important it is to have a clear and consistent schema for your data. A schema defines your data’s structure, format, and types, and it helps you validate, transform, and analyze it. The problem is that CSV files have no INFORMATION_SCHEMA or similar metadata store you can query to…
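As a quick illustration of the gap, here is a minimal PySpark sketch that infers a schema by sampling the file, since there is no metadata store to ask; the file path and reader options are assumptions, not anything from the post:

```python
# A minimal sketch: infer a CSV schema with PySpark by sampling the
# data, since CSV carries no INFORMATION_SCHEMA to query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-schema").getOrCreate()

df = (
    spark.read
    .option("header", "true")       # treat the first row as column names
    .option("inferSchema", "true")  # sample rows to guess column types
    .csv("/mnt/raw/customers.csv")  # hypothetical file path
)

df.printSchema()  # show the inferred column names and types
```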
Tag: Databricks
Using Generative AI in Data Engineering
As a professional, I place a high value on my role as a data engineer, with consulting a close second. My role requires both technical and interpersonal skills, and I’ve come to depend on two AI-based tools to enhance both. Grammarly has helped me stay cognizant of the tone and style of my writing. Copilot has…
Delta Sharing – Data Providers
Setting up Delta Sharing in Databricks is straightforward once you understand the diagram provided in the Azure Databricks documentation. Delta Sharing is implemented as part of Databricks’ Unity Catalog, the official data governance solution for Databricks. You can consider it an extension to the metastore catalog or the Databricks version of a…
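For a flavor of the provider side, here is a minimal sketch using Databricks SQL from a notebook in a Unity Catalog-enabled workspace; the share, table, and recipient names are all hypothetical:

```python
# Sketch of the provider side of Delta Sharing, assuming Unity Catalog
# is enabled. Share, table, and recipient names are made up.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("ALTER SHARE sales_share ADD TABLE main.sales.orders")
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_co")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_co")
```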
SQL Server to Databricks Profiler
Recently, a client expressed interest in migrating their data warehouse from Azure SQL DB to Databricks. They weren’t looking to move because of performance issues in Azure SQL DB; they were running on the Hyperscale offering. They were looking to share a common data architecture between their data warehouse and data…
Delta Sharing – Data Recipients
Recently I was asked to look into Delta Sharing to learn what it’s all about and how it could be used. After digging in a little bit, it appears that it’s a way to share data in Parquet or Delta format. You can build these shares on top of any modern cloud storage system like…
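On the consuming side, the open-source delta-sharing Python client can read a share directly. This sketch assumes you’ve received a credentials (profile) file from the provider; the share, schema, and table names are placeholders:

```python
# Sketch of reading a Delta Share with the open-source client
# (pip install delta-sharing). Paths and names are placeholders.
import delta_sharing

profile = "/path/to/config.share"  # credentials file from the provider
table_url = profile + "#sales_share.sales.orders"

df = delta_sharing.load_as_pandas(table_url)  # pull the shared table
print(df.head())
```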
Developing Datbricks Ingestion locally
Spark engines like Databricks are optimized for dealing with many small-ish files that have already been loaded into your Hadoop-compatible file system. If you want to process data from external sources, you’ll want to extract that data into files and store those in your Azure Data Lake Storage (ADLS) account attached to your Databricks Workspace…
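As a rough sketch of that extract-and-land step, assuming a SQL source reachable over JDBC and an ADLS container attached to the workspace; every connection detail below is a placeholder, not the post’s actual setup:

```python
# Sketch: extract from an external source over JDBC and land the rows
# as Parquet files in ADLS. Run from a Databricks notebook, where
# `spark` and `dbutils` already exist. All connection details are fake.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net;database=sales")
    .option("dbtable", "dbo.Orders")
    .option("user", dbutils.secrets.get("my-scope", "sql-user"))
    .option("password", dbutils.secrets.get("my-scope", "sql-password"))
    .load()
)

df.write.mode("overwrite").parquet(
    "abfss://raw@mystorageaccount.dfs.core.windows.net/orders/"
)
```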
Prepare VSC Local Databricks Development
Last time, we walked through how to perform analysis on Databricks using Visual Studio Code (VSC). This time, we will set up a local solution in VSC that will let us build out our data engineering solutions locally. That way, we don’t have to pay for development and testing time. We’d only pay for Databricks,…
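One way to get a no-cost local environment, sketched under the assumption that pyspark and delta-spark are pip-installed, is a local SparkSession with the Delta Lake extensions wired in:

```python
# Sketch of a local SparkSession with Delta Lake support, so pipeline
# code can be developed and tested on a laptop instead of a cluster.
# Assumes `pip install pyspark delta-spark`.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("local-dev")
    .master("local[*]")  # run Spark on all local cores
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```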
Get Started with Databricks in VSCode
You’ve just received a new dataset, and you have to analyze it to prepare for building out the data ingestion pipeline. But first, we’ll need to create a cluster to run our analysis. Let’s run through a simple data analysis exercise using Databricks and Visual Studio Code (VSC). Create a cluster from the Web…
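A first pass at that kind of analysis might look like the sketch below, run from a notebook attached to the new cluster; the dataset path and the grouping column are stand-ins for whatever you actually received:

```python
# Sketch of a quick first look at a new dataset. The path and the
# grouping column are stand-ins, not from the post itself.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/landing/new_dataset.csv")
)

print(df.count())                      # how many rows arrived
df.describe().show()                   # summary statistics per column
df.groupBy("category").count().show()  # distribution of a key column
```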
Connecting Visual Studio Code to Databricks
After you have your Databricks workspace, it’s time to set up your IDE. Head over to https://code.visualstudio.com/ to download the version for your operating system. It’s available for Windows, Mac, or Linux. During my most recent Databricks presentation, I was asked to point out that Visual Studio Code (VSC) is separate from Visual Studio. It…
Provisioning Databricks
Now that you’ve had an introduction, let’s get started exploring Databricks. Head to https://community.cloud.databricks.com and click the sign-up link at the bottom. The Community Edition is a completely free option. Fill in your contact information. It may help to use a ‘+’ email address to sign up; that way, you can later sign up…