String Manipulation — REPLACE, PATINDEX and Regular Expressions

Finding the exact patter can be like trying to a certain snowflake.

Finding the exact patter can be like trying to a certain snowflake.

In the previous SQL posts, I showed you SUBSTRING, then I showed you REPLACE. Now, with REPLACE, you had to know what characters you wanted to replace.  What if you only knew what characters you wanted to keep?  Well, let me introduce you to PATINDEX.  This is an advanced command that is easy to learn, difficult to master.  I’ll bring this tool out of my tool belt in future posts to provide you more ways to use it.

PATINDEX let’s you define a string pattern.  Many strings you’ll store in your database will have a pattern.  A phone number (in america) has a very distinct pattern 3 digits of area code, 3 digits of prefix, and 4 digits telroot.  You may store this in several different ways, but you’ll always store the same pattern every time, otherwise, how will your users interpret the data your storing.

The example I’m going to use to introduce you to PATINDEX is based on the idea of a setting up a fixed format or pattern for your phone numbers.

But first, I need to include my standard disclaimer:

I would like to point out that you have to be careful when and how often you use the techniques below.  A good rule to keep in mind is text manipulation is slow and painful to a server.  If you can leave the text manipulation to your middle-ware or front end, that would be better.  But we all have been stuck in a situation where we needed to alter a string before those two points, and so I bring you the following lesson.

The Problem

phone
605-555-2862
(561)555-2700
904-555-5680
N/A
580-555-5371
2815558368
(254)555-8430
336-555-2797
3365557233
592-555-3181/4951
96615551222 Ext. 249
+44 7930 555271

Looking at the examples in the column we know there characters in there other than 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0.  How can we easily remove the non numeric values from this column?  We can use regular expressions.  We can say, if it isn’t 0-9, remove it.

Now, the solution to this problem will use a WHILE loop.  This is something I’m not really going to go into too much detail on, but using a while loop can really eat up processor time and memory.  This is one of those times where this solution should only be used as a last resort, and only during non-peak server usage.  Before implementing this solution, I’d be sure to spend time putting together a fix on the front end, to scrub out these bad characters before they got into my database.  But, let’s come off the soapbox and discuss the solution, shall we?

The Solution

Let’s first look at the syntax for the PATINDEX function.

PATINDEX ('%pattern%', sourceString)

The PATINDEX function will search for a pattern in the SourceString, and return a bigint of where that pattern is found in the sourceString.  If it’s not found, then you get 0.  With this you can also provide a pattern ‘%[^0-9]%’ and it will search for characters that are not numbers.

But that’s only part of the solution.  You then need to couple that with REPLACE, so you can remove anything that is not a number.

--variable to hold the current phone number being updated
DECLARE @phone VARCHAR(20)
--variable to hold the original value (so we can update it)
DECLARE @originalPhone VARCHAR(20)
--if there is a phone number with something other than a number in it
WHILE EXISTS (
        SELECT TOP 1
               phone
        FROM sourceTable (READCOMMITTED)
        WHERE
               PATINDEX('%[^0-9]%', phone) ) > 0)
BEGIN
        --store the original number
        SELECT TOP 1
               @originalPhone = phone
        FROM sourceTable (READCOMMITTED)
        WHERE
               PATINDEX('%[^0-9]%', phone) > 0
        --then copy it to a temporary variable
        SET @phone = @originalPhone
        --spin through the phone number one character at a time
        -- if there is a character other than a number
        WHILE(PATINDEX('%[^0-9]%', @phone) ) > 0
        BEGIN
               --then remove that one character, then continue
         SET @phone = REPLACE(@phone
               , SUBSTRING(@phone, PATINDEX('%[^0-9]%', @phone), 1)
              , '')
        END
      --once all the non numeric characters are gone, update
        UPDATE sourceTable SET
               Phone = @phone
        WHERE
               Phone = @originalPhone
END

If you’re having difficulties with the WHILE loop, search my blog for the entry on WHILE loops.  They’re pretty easy once you get to know them.

If you wanted to get fancy you could make the replace portion of this code a scalar function, and call it in an update step.  It would get rid of the explicit outer loop.  In future versions of SQL this may improve, but for SQL 2000, this is the best I’ve come up with.

There are many more string manipulation techniques to cover, so stay tuned to this same bat-channel!

References

http://msdn.microsoft.com/en-us/library/ms188395.aspx

http://msdn.microsoft.com/en-us/library/ms178642.aspx

Tags: ,

9 Responses to “String Manipulation — REPLACE, PATINDEX and Regular Expressions”

  1. Kulanthaivelu 20050507 at 20:38 #

    I saw the some sql functions. i have very usefull the site. thank u.

  2. nishant kumar 20050507 at 21:38 #

    Hi| Above example is best for beginners.

  3. ramesh 20050507 at 22:38 #

    very useful site

  4. Scott 20130402 at 10:34 #

    I am using SQL Server 2008 R2.

    I found a typo in the above code.

    –store the original number
    SELECT TOP 1
    @originalPhone = phone
    FROM sourceTable (READCOMMITTED)
    WHERE
    PATINDEX(‘%[^0-9]%’, phone)

    Should be:

    –store the original number
    SELECT TOP 1
    @originalPhone = phone
    FROM sourceTable (READCOMMITTED)
    WHERE
    PATINDEX(‘%[^0-9]%’, phone) > 0

    The missing code is the > 0 on the PATINDEX line. Just FYI.

    • Shannon Lowder 20130627 at 09:43 #

      Good catch! I really need to get back on here more often and share. Anything you want to see covered in particular?

  5. Vishal 20140321 at 00:18 #

    it does not work if there is sql character like ‘%’ in input string so what is the solution.
    i used replace but its not feasible solution as how many character replaced is a question.
    Collapse | Copy Code
    declare @phone varchar(50)
    set @phone=’212%@@f’

    select REPLACE(@phone
    , SUBSTRING(@phone, PATINDEX(‘%[^0-9a-zA-Z .()/-]%’, @phone), 1)
    , ”)

    • Shannon Lowder 20140326 at 08:41 #

      Am I right in understanding that you’re looking to remove any characters that are not an alpha, number, space, period, left or right parenthesis, forward slash, or dash?

      If so, you’ll need a loop to fire your replace multiple times, because it’s only going to find the first instance of your PATINDEX. Try this:

      DECLARE @phone VARCHAR(50)
      SET @phone='212%@@f'

      WHILE PATINDEX('%[^0-9a-zA-Z .()/-]%', @phone) > 0
      BEGIN
      SET @phone = REPLACE(
      @phone
      , SUBSTRING(@phone, PATINDEX('%[^0-9a-zA-Z .()/-]%', @phone), 1)
      , '')
      END
      SELECT @phone

Trackbacks/Pingbacks

  1. SQL201-The Syllabus | Shannon Lowder - 20110314

    [...] patindex [...]

  2. Remplazar caracteres | Diario de Ingeniería de @moicartagena - 20130627

    [...] Remplazar caractes [...]

Leave a Reply

Notify me of followup comments via e-mail. You can also subscribe without commenting.