Referential Integrity – shannonlowder.com

OK, I’ve talked about Primary Keys and Foreign Keys. Now let’s talk about Referential Integrity. Basically this just means that any Foreign Key value in one table has to have that same value defined in the table where that Foreign Key is the Primary Key. Let’s look at two tables, and try to explain this more clearly.

CREATE TABLE person (
    personID INT IDENTITY(1,1)
  , firstName VARCHAR(50)
  , lastName VACHAR(50)
  , CONSTRAINT PK_person_personID PRIMARY KEY CLUSTERED (
      personID 
    )
) 

CREATE TABLE address (
    addressID INT IDENTITY(1,1)
  , personID INT
  , streetAddress VARCHAR(100)
  , city VARCHAR(50)
  , state CHAR(2)
  , zip VARCHAR(10)
  , CONSTRAINT PK_address_addressID PRIMARY KEY CLUSTERED (
      addressID
    )
)
ALTER TABLE address  WITH CHECK ADD  CONSTRAINT FK_address_person_personID FOREIGN KEY(personID)
REFERENCES Person (personID)
GO

In our example we have a person table, with the Primary Key personID. We also have an address table with a Foreign Key of personID. Notice the ALTER statement at the end. This is what is setting up our referential integrity. This CONSTRAINT requires that any value I choose for personID in address, must already be defined in person. Otherwise we get a CONSTRAINT violation.

The term Cascade Referential Integrity comes into play when you have several tables each referring to another. The check must cascade down, like a waterfall. Each table affecting the next that depends on it. When you start setting up references like this, you’re going to make it more difficult on your programmers. You’re going to require that they do things in a certain way… unless you provide them a view that uses an INSTEAD OF trigger… check out my lessons on that for more details.

This integrity check for your data can make it more difficult on your programmers, but it also make sure they do their job properly. Dropping these constraints is never the answer. Which would you prefer, lots of dirty data that has duplicates and causes your computations to be off by double, triple, or even cause you to multiply or divide by zero?

Or would you rather your programmers take a little more time to understand the relationships in the data for the database they’re developing?

It’s a balancing act, developing relationships in the data versus the speed to build on top of that data… but I think you’ll find some relationships will benefit you far more than no relationships.

Talk to me. Let me know what you’re going through. I can help you build stronger database solutions, while working with your developers. I’m here to help!

Leave a Reply Cancel reply