Yeah, I used a pipe in my title. That’s because indexes can help some things, but hurt others. The brief of it is indexing helps reads, but comes at a cost: your inserts, updates and deletes are going to take longer. That’s because as you change your data, you’re going to have to update the index too. Let’s dive into a scenario I was working on a few weeks ago.
Our “Bad” Table
So we had this table in our database that supports a high profile product from a company we all know. We’re currently rolling out the largest instance of this product in the world. So we’re pushing a lot more data into the system than any other users of this product. So we expected to find places where the use of the table was greater than the table designers anticipated.
During several of our test deploys we discovered deadlocking on one process. So we began to dig in. We found that this one table had a lock that was being held whenever a user or process was accessing the system during a deploy. This lock was caused by the fact our IO, although on a high end SAN, still couldn’t keep up with our usage (or misuse as we would soon discover).
After a little digging we narrowed the problem down to this one table. whenever we deploy we have to read this table, find the object ids that are being updated during the deploy, and write some changes to this table.
Straightforward, right?
Well, not so much. It turns out that this table has over 100 columns, and 42 non clustered indexes.
I was shocked. How could a product from this company have a table like this? So I started looking at the index statistics.
SELECT o.name Object_Name, i.name Index_name,i.Type_Desc, s.* FROM sys.objects AS o JOIN sys.indexes AS i ON o.object_id = i.object_id LEFT OUTER JOIN sys.dm_db_index_usage_stats AS s ON i.object_id = s.object_id AND i.index_id = s.index_id WHERE o.name ='badTable'
Yeah, we weren’t using most of these indexes. We used the clustered for our bookmark lookups, and a couple of the non-clustered indexes for seeks, but most of them had no reads, seeks, or scans. But they all had remarkable update costs.
So during one of our meetings with the vendor about our deploy problem we disclose our findings. They listen to us patiently and take notes on all we’ve found. They take it back to their engineers and do some research of their own. After a week, they present us with a fix. Add 14 more indexes.
Really!?
We’re holding off on implementing these indexes until we can have another discussion about this issue, but it does bring up something you really must know if you’re going to architect database solutions. You really have to understand that indexes will only speed up your reads. They can change table scans into index seeks. If you add a covering index you can even eliminate lookups. But they come at a cost.
Your writes are going to take longer because you’re also going to have to write to the indexes. So in our case, rather than updating just the table, you’re updating up to 42 non clustered indexes. It depends on what columns you’re changing in the update. You have to update each index that covers the value you changed. This cost can become so high, you end up with deadlocks.
When you start designing indexes, look at the workload. Don’t just throw indexes at a problem. And for all that is good and just in the world, please, take anything the Database Engine Tuning Advisor says and test it before you send it out as a good suggestion. It only looks at the one query you send it, not at your entire workload. The DMV’s will help you look at that entire workload.
—
If you’re struggling with indexing, or anything else in SQL Server, let me know. I’m here to help!