Could my Stored Procedure cause Performance Issues?

Problem originally posted at experts-exchange.

I have 3 major stored procedures, meaning they are the backbone of my application. I built them to be as dynamic as possible, but every 1st of the month the database crashes, and our database people blame these 3 stored procedures. The only reason I think they blame them is that they are run so often that they happen to be running at the time the database crashes. These stored procedures run about 10,000 times a day.

Here is an example of one of the stored procedures:
--temp table
Insert into #TempA        -- column list elided in the original post
Select ...                -- column list elided in the original post
From ( Select ...         -- column list elided in the original post
       From Work tt
       Left Join Error tf
         on tf.UserID = tt.UserID
        and tf.SubmitDate = tt.TimeDate
       Union All
       Select ...         -- column list elided in the original post
       From Error tf
       Left Join Work tt
         on tf.UserID = tt.UserID
        and tf.SubmitDate = tt.TimeDate
       Where tt.UserID is null ) BF
Inner Join EmployeeManagement_History A
   on A.UserID = BF.UserID
  and BF.TimeDate between A.StartDate
  and isnull(A.EndDate, GetDate())
Where ... = CASE ...      -- expression elided in the original post
              WHEN 1 THEN [Employee]
              WHEN 2 THEN [Supervisor]
            END
  AND BF.TimeDate between @StartDate and @EndDate

-- Moved all the calculations down to here, to speed up the process
Select Distinct
       ...                -- column list elided in the original post
     , B.WorkDescription
     , B.UnitWorkedCummulative
     , B.ErrorCummulative
From #TempA A
Inner Join ( Select B.TimeDate
                  , B.WorkDescription
                  , UnitWorkedCummulative = (
                        SELECT sum(UnitWorked)
                        FROM #TempA A
                        WHERE A.TimeDate <= B.TimeDate
                          AND A.WorkDescription = B.WorkDescription)
                  , ErrorCummulative = (
                        SELECT sum(TotalErrors)
                        FROM #TempA C
                        WHERE C.TimeDate <= B.TimeDate
                          AND C.WorkDescription = B.WorkDescription)
             FROM #TempA B ) B
   ON (A.TimeDate = B.TimeDate
  AND A.WorkDescription = B.WorkDescription)
Where A.Work is not null
Order by ...              -- first sort column elided in the original post
       , A.TimeDate
Drop Table #TempA;

The stored procedures work flawlessly the whole month, running 10,000 times a day, but each 1st of the month the database crashes. The fix for the crash is to run sp_updateStats. Is my stored procedure to blame? If so, why doesn't it crash on other days instead of the 1st of the month? Like clockwork, the database crashes on the 1st of the month. Or is something else going on?

Some people have questioned the use of MAXDOP, indexes on tables, indexes on #temp tables, and the use of #temp tables versus ##temp tables versus @table variables, but there are two topics I'd like to discuss as possible solutions to this problem: table hints and data changes. The data changes I have seen most often cause this are batch inserts or updates that fragment the indexes and cause queries to slow dramatically.

I'm not going to discuss the obfuscation in the code, except to say that sometimes code like this is doing things differently than you think it is. I try to stick with straight join clauses rather than joining to subqueries, since nesting can make code review difficult. If you'd like to see a post on joins versus subqueries, please let me know!

Back to the problem. First, let's talk about table hints. I'd like to refer you to the official reference material on table hints in the SQL Server documentation. There are many more hints than I'm going to cover here.

Whenever you build queries, keep in mind that your tables live in a multi-user environment; you will rarely have exclusive access to a table. Every time a query runs against a table, a lock is established. Many of these locks are insignificant and won't cause you any problems. But sometimes, possibly in this case, one query could be accessing the table in such a way that this procedure runs extremely slowly or fails altogether.

There are ways around that problem. Let's assume you have control over only the code posted above. The first question I would ask: if a query running at the same time were inserting records into the Work and Error tables, would you want this query to return those records, or would it be better to return only the records that have been committed? What I'm trying to discover is whether you want to read data that could be removed if the insert failed.

Most of the time you would not want to see that data, only the data whose transaction had completed and was considered "complete." If so, add WITH (READCOMMITTED) after "Work tt" and "Error tf". This tells the query optimizer that you want to deal only with records that have been committed.
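Applied to the posted procedure, the hint placement would look something like this sketch (table and alias names come from the original code; the column list stays elided, as it was in the post):

```sql
-- Sketch: READCOMMITTED table hints on the two base tables from the post.
-- Column list elided here, as in the original procedure.
Select ...
From Work tt WITH (READCOMMITTED)
Left Join Error tf WITH (READCOMMITTED)
  on tf.UserID = tt.UserID
 and tf.SubmitDate = tt.TimeDate
```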

You may also want to consider adding READPAST to the hint; it tells the optimizer to skip past any locked rows it encounters rather than waiting for the locks to release.
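As a sketch, the two hints can be combined in a single WITH clause on the same table reference from the posted code:

```sql
-- Sketch: READPAST combined with READCOMMITTED in one WITH clause.
-- Locked rows are skipped rather than waited on, so they simply
-- will not appear in the results.
Select ...
From Work tt WITH (READCOMMITTED, READPAST)
```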

If you do want to see data even if it has not yet been committed, then the hint you want is READUNCOMMITTED, or its synonym NOLOCK. This allows you to receive results that are still being processed and are marked with some lock. The good news is that you can get results more quickly, and your query shouldn't block any other query running at the time. Unfortunately, you may make a decision based on the results of this query; if an insert were rolled back after that decision was made, you may find the case you were looking for no longer exists. Proceed with caution when using NOLOCK.
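Placed on the same joins from the posted procedure, a NOLOCK sketch would look like this (column list elided, as in the post):

```sql
-- Sketch: NOLOCK (a synonym for READUNCOMMITTED) on both tables.
-- The query can read rows other transactions have modified but not yet
-- committed, so results may include data that is later rolled back.
Select ...
From Work tt WITH (NOLOCK)
Left Join Error tf WITH (NOLOCK)
  on tf.UserID = tt.UserID
 and tf.SubmitDate = tt.TimeDate
```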

There are other hints; I'll try to cover those in later posts. For now, let's move on to problems you can have with large batches of records being changed.

If you have a table that records transactional information, there will be statistics that help the optimizer find records in that table. The problem is that if you load, say, 10,000 records per hour into that table, those statistics go out of date very quickly. The problem is compounded if you are also deleting, say, 1,000 records per hour.
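One quick way to see how stale things are is the built-in STATS_DATE function. A sketch against the Work table from the posted procedure:

```sql
-- Sketch: list each statistics object on the Work table and when it
-- was last updated. STATS_DATE, sys.stats, and OBJECT_ID are built-in;
-- 'Work' is the table name from the original post.
Select s.name                                 as StatName
     , STATS_DATE(s.object_id, s.stats_id)   as LastUpdated
From sys.stats s
Where s.object_id = OBJECT_ID('Work');
```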

If you notice a table is accessed a lot, you may want to consider making the last step of a large batch insert a re-index and re-stat step. This is quite often overkill, but in some situations it is called for. You may also want to read up on data sharding or partitioning, but those are topics I'll leave for another time.
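As a sketch of that last step, using the Work table from the post (these are standard T-SQL maintenance commands, but REBUILD is heavy, so run it in a maintenance window):

```sql
-- Sketch: re-index and re-stat the Work table after a large batch load.
ALTER INDEX ALL ON Work REBUILD;        -- rebuild every index on the table
UPDATE STATISTICS Work WITH FULLSCAN;   -- refresh statistics from all rows
-- Database-wide alternative (the same command that fixes the monthly crash):
EXEC sp_updatestats;
```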

Hopefully this will give you two additional tools you can use when you come across this problem in the future. As always, if you have SQL questions, feel free to
email me, or add your questions to the comments!
