Skip to content

shannonlowder.com

Menu
  • About
  • Biml Interrogator Demo
  • Latest Posts
Menu

Foreign Keys and Witchcraft

Posted on December 7, 2010February 9, 2011 by slowder
New fangled way to burn witches
New fangled way to burn witches

I’ve worked with hundreds databases in my career.  I could probably design a handful of them from memory if asked.  I have notes to remind me of lessons learned from most of them.  When I start looking through that list one of the things I most often reference is the importance of Foreign keys.  Why is that?

Because most of the time, all of the control over referential integrity has been moved up to the business logic or application layer.  Developers will give me a handful of reasons why they didn’t add them to the database.  Unfortunately many of the reasons can be proven false, once you start considering all the costs of not implementing them.

I’m creating this post, so I can refer to it later when developers start calling me a witch, and asking if they can burn me.

Myth #1: I can do updates quicker without foreign keys

This is probably the most common point I have to argue against.  Developers think since they have to update the “parent” table first before they can update the “child” table, it will take more time.  They usually fail to see the queries they have to write to make sure that ID exists in the “parent” table for every insert/update query they want to write.

Instead of having to write that validation query each time, why not let the database server handle that for you?  MS SQL is a database management system.  Let it manage those keys for you.  Yes, you’ll have to trap an error in your code that handles the case where that primary key doesn’t already exist, but that’s what you’re doing now, more or less.

Myth #2: I won’t be able to import my files into the database with foreign keys

This is the second most common reason I receive from programmers on why they don’t want foreign keys.  They want to read a file from another system (or client) directly into their dbms without considering the fact the data could be corrupt in some way.  Instead they’ll import the data directly into their production tables, then try to clean it up.

How many of you built an SSIS package or DTS package to read data into your database, and got it right the first time?  How many of those packages never received a bad file, a corrupt file?

What?  None of you?

Of course you’ve had to handle errors on imports.  That’s why you shouldn’t import directly to your production tables.  Read the data into a workspace ( a disposable table).  Read it in raw.  Check for errors on that table.   Once the data passes all your validation checks, then import it to your system.

I think perhaps the goal should be to never import bad data into the database.  Catch it before it becomes an error.

Myth #3:Foreign key constraints cost you in performance

Another one I hear is “We don’t use foreign keys because they’ll cause the database to be slower.”  And then I check the tables, and find an indexes covering every column (several times over).

Those indexes are costing you more than the foreign key constraint ever could.  Every insert is having to be entered multiple times into your keys…that you’re using in your application layer to test data integrity.

If you’re having slow downs, you’ll often find the solution in letting the server do more for you, not less.  If you limit your server by not letting it know there is a key relationship, the optimizer can’t take advantage of that knowledge.

Let’s take a quick look at an example of how a foreign key will actually help performance.  Let’s do some counts.

select count(*) from table1
--659687, and it took logical reads 642
select count(*) from table2
--15204678, and it took logical reads 35574

Without a foreign key constraint between these tables, if we want to check the number of entries in table2 that match table1, we have to do a join.  And we’ll have to include that join in every query we write, since the integrity cannot be confirmed without it.

select count(*)
from table2 
INNER JOIN table1 
 on table1.sharedKey =table2.sharedKey
--14974478
/*
Table 'table1'. Scan count 9, logical reads 705...
Table 'table2'. Scan count 9, logical reads 35526...
*/

Now, if I add the foreign key constraint, we only have to count the records in table 2, since the server is now maintaining the integrity between the tables.

select count(*) 
from table2 
--14974478, scan count9, logical reads 35574

Looks to me that 35574 is less than 36231 (705 + 35526).  Smaller read counts are better than larger ones.  Looks like adding a key made the query faster, rather than slower.  Not even to mention that we can now improve the developers coding speed, since he or she no longer has to do a join to table1 each time.

Fact #1: It will be difficult to maintain consistent validation checks across all insert/update queries.

Since you don’t have the server maintaining the relationship between those tables, every time you write to the those tables, you’re going to have to do the validation yourself.  That means that you will have to hope each developer does it the same way…or at least hope that they do it well enough to prevent bad data from getting into the tables.  I’ve worked with marines before (OOOH-RAAA!), and they beat it into my head: “Hope is not a course of action!”

I’m sure I’ll find more notes on why foreign keys are not witchcraft.  When I find them, I’ll update this post.  If you have any reasons why developers should implement foreign keys, let me know!  I want this article to be more complete.  If you have arguments against it, I’ll listen.  There may be some legitimate reasons to avoid them on occasion, but those I’ve seen are exceptions, not the rules.

Now, about those villagers with pitchforks… I think I’ll cast magic missile!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • A New File Interrogator
  • Using Generative AI in Data Engineering
  • Getting started with Microsoft Fabric
  • Docker-based Spark
  • Network Infrastructure Updates

Recent Comments

  1. slowder on Data Engineering for Databricks
  2. Alex Ott on Data Engineering for Databricks

Archives

  • July 2023
  • June 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • October 2018
  • August 2018
  • May 2018
  • February 2018
  • January 2018
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • June 2017
  • March 2017
  • February 2014
  • January 2014
  • December 2013
  • November 2013
  • October 2013
  • August 2013
  • July 2013
  • June 2013
  • February 2013
  • January 2013
  • August 2012
  • June 2012
  • May 2012
  • April 2012
  • March 2012
  • February 2012
  • January 2012
  • December 2011
  • November 2011
  • October 2011
  • September 2011
  • August 2011
  • July 2011
  • June 2011
  • May 2011
  • April 2011
  • March 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • October 2010
  • September 2010
  • August 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • September 2008
  • August 2008
  • July 2008
  • June 2008
  • May 2008
  • April 2008
  • March 2008
  • February 2008
  • January 2008
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • June 2007
  • May 2007
  • April 2007
  • March 2007
  • February 2007
  • January 2007
  • December 2006
  • November 2006
  • October 2006
  • September 2006
  • August 2006
  • July 2006
  • June 2006
  • May 2006
  • April 2006
  • March 2006
  • February 2006
  • January 2006
  • December 2005
  • November 2005
  • October 2005
  • September 2005
  • August 2005
  • July 2005
  • June 2005
  • May 2005
  • April 2005
  • March 2005
  • February 2005
  • January 2005
  • November 2004
  • September 2004
  • August 2004
  • July 2004
  • April 2004
  • March 2004
  • June 2002

Categories

  • Career Development
  • Data Engineering
  • Data Science
  • Infrastructure
  • Microsoft SQL
  • Modern Data Estate
  • Personal
  • Random Technology
  • uncategorized
© 2025 shannonlowder.com | Powered by Minimalist Blog WordPress Theme