Sooner or later in SQL, you’re going to be asked to change how some text is formatted. Making changes to text is pretty straightforward. First, you need to know that when you’re manipulating text (in SQL it’s stored as a VARCHAR or NVARCHAR), you’re manipulating a string. If you think of the string to the right, then you’ve got a good idea of how the text is stored.
Let’s take the string “Hello world!” (If you’ve ever taken a programming course you’ve heard this explanation.) If you spelled that string on a physical piece of string, one letter at a time… that’s very similar to how it’s stored in SQL. When you want to go changing things around, you’re doing string manipulation.
When you want to take parts of the string, you’re taking a SUBSTRING.
Clear? Awesome! Let’s get to the meat of this lesson.
Before I go into how to go altering strings in SQL, I would like to point out that you have to be careful when and how often you use the techniques below. A good rule to keep in mind is text manipulation is slow and painful to a server. If you can leave the text manipulation to your middle-ware or front end, that would be better. But we all have been stuck in a situation where we needed to alter a string before those two points, and so I bring you the following lesson.
The Problem
phone |
605-555-2862 (561)555-2700 904-555-5680 N/A 580-555-5371 2815558368 (254)555-8430 336-555-2797 3365557233 592-555-3181/4951 96615551222 Ext. 249 +44 7930 555271 |
Let’s say you have a column in a database that collects phone numbers from a web form and that form doesn’t have any JavaScript parsing the code into the three standard parts of a phone number (area code, prefix, and telroot). This data is being dumped into a leads file and the manager from the sales department comes down and asks you to help split up the leads by area code.
The first thing you need to do in a situation like this is to become aware of what kind of data you’re looking at. Select out the first 100 or so rows and get an idea of what shape the data is in. In this case, you can see there are several types of phone numbers entered.
I’ve limited the example above to just a few of the types of numbers you might find if you allowed users free-form access to storing their phone numbers.
- 3,3,4
- Parenthesis, dashes, and 3,3,4
- Blank, null, or N/A
- Parenthesis, dashes, spaces, and 3,3,4
- 10 numbers
- Extensions
- International numbers
The Solution
So how can you deal with all these situations? Use SUBSTRING.
SUBSTRING (string, startPosition, length)
Using this function you can check out the first three characters of each phone number.
SELECT TOP 100 phone, SUBSTRING(phone, 1, 3) AS [area code] FROM sourceTable (READCOMMITED)
When looking at the results, you will see that this only shows you the first three characters of the phone number. While the results appear to be accurate in some cases, you should immediately see the exception cases that this doesn’t handle. This solution cannot handle cases with parenthesis, international numbers, NULL, blank, or invalid numbers.
To proceed on this problem you’ll need a few more tools in your SQL tool belt. Check out other articles I’ve published on string manipulation. There you will discover how REPLACE, LTRIM, RTRIM, and more techniques for manipulating text strings can help solve this problem.
References
http://msdn.microsoft.com/en-us/library/ms187748.aspx