When you begin designing a database, you start with the tables. You can’t very well start with views or stored procedures, right? What tables would they reference?
When you begin designing tables, one of the first calls you’ll have to make is are you going to build a set of normalized tables or denormalized tables? In order to participate in this part of the conversation you need to know what those terms mean, and how they relate.
Normalization
One of the first things I do when I want to learn a new term is hit Wikipedia. So check out the Database Normalization article. The main points I want you to get are
Normalization is the process of organizing data to minimize redundancy. You don’t want to have to type in the same values multiple times. Let’s say you wanted to add a table of addresses, do you want to have to type out North Carolina over and over, once for each address? Of course not. That’d just be busy work. Besides, some people would type out North Carolina, some would type in north carolina, others would just enter NC.
But if you built your database using the concept of normalization, you’d create one table that listed all the states, and your address table would reference that table, rather than relying on someone to enter the states in a standard way.
In normalization you generally divide large tables into smaller (and less redundant) tables and defining relationships between them. You may hear this described as building a bunch of tall tables. Basically in normalization you’re building tables with more rows and fewer columns. You then define relationships between the tables in order to construct datasets with many attributes.
The end goal of normalization is to optimize for INSERT, UPDATE, and DELETE statements. In my opinion this is what SQL Server excels at… transactional database design. If you have to design a solution that requires more writes than reads, you’re looking for a normalized database.
Denormalization
If you’re looking for more reads than writes, you may want to look at denormalized database solutions. If you consider the supermarket example listed in the denormalization section of the wiki page, then you can see when you want to build a database to report on the historical performance of a supermarket checkout counter, you’d want to build a denormalized set of tables, so you could optimize for reads and analysis of the events that happened.
There is much more to designing denormalized structures, so for now, we’re going to leave the discussion at that. We’ll cover more on denormalized structures and building analysis solutions in a later class (perhaps in SQL 203 or 303).
Case Study
Throughout the 103 series we’re going to build out a database solution based on a Customer Relationship Management (CRM) business. The primary objective for this business is to reach keep in touch with customers. The business can answer questions about products the customer’s have purchased in the past. The business can update contact information on the customers as they move. The business can even sell new products to the customer if the customer chooses to add products. You should also know that this CRM company has over 500 operators, and handles several thousand customer contacts per day.
Based on that much description, we know that we’re going to have to keep up with customers, addresses, phone numbers, products, sales, etc. Since we know that the CRM data will be modified several thousand times per day, we know we need to build our solution with that in mind.
So based on the information I’ve shared with you so far, which type of database are we going to begin designing?
That’s right: Normalized
In my next article, we’re going to jump right in to the design process by modeling some “objects”. If you have any questions so far, please send them in. I’m here to help!