Click here to Skip to main content
Click here to Skip to main content

xp_regex: Regular Expressions in SQL Server 2000

By , 6 Oct 2003
Rate this:
Please Sign up or sign in to vote.

Introduction

Regular Expressions are probably the best tool for text parsing. Using Regular Expressions GREATLY simplifies anything for which you would normally use CHARINDEX, PATINDEX, SUBSTRING, STUFF, REPLACE, etc.

xp_regex is an Extended Stored Procedure for SQL Server 2000 written in Managed C++ that lets you use Regular Expressions from T-SQL. In case you’re skeptical about the performance when mixing non-.NET code (in this case SQL Server) and .NET code in the same process, don't be. I've written a non-.NET version and the performance is about the same (although I really haven't done any exhaustive testing). If you have other problems that are preventing you from using the .NET Framework on your SQL Server, you can get the non-.NET version, xp_pcre, which is also posted on Code Project.

Note to users of .NET 1.0: This DLL will not work on SQL Servers that are running with the /3GB switch in the boot.ini file. You'll probably get an error like:

Cannot load the DLL xp_regex.dll, or one of the DLLs it references.
Reason: 1114(A dynamic link library (DLL) initialization routine failed.).

This limitation has been removed in .NET 1.1.

Also note: There are some issues associated with running managed code in-process with SQL Server. Check out Clemens Vasters' blog entry here. Also, check out this MSKB article which discusses extended stored procedures written using .NET.

I can tell you that I have been running this on several servers since I wrote it and have not had any problems. The servers I run it on are used more for reporting than OLTP-style transactions. I'm not pushing either the CPU or RAM to the limit and am not using fibers. This may account for the fact that I have not encountered any issues.

Overview

There are four Extended Stored Procedures in the DLL:

  • xp_regex_match
  • xp_regex_format
  • xp_regex_split
  • xp_regex_replace

XP_REGEX_MATCH

Syntax:

xp_regex_match @input, @regex, @result OUTPUT

@input is the text to match against.
@regex is the regular expression to match with.
@result is an output parameter that will hold either 'Matched', 'Failed' or NULL in the case of an invalid regex.

All parameters are either (N)VARCHAR or (N)CHAR of any ength. ((N)TEXT might work too, haven't tried it.)

xp_regex_match is used to check an input string against a regular expression. It will return either 'Matched' or 'Failed'. If the regular expression was not valid, it will return NULL.

XP_REGEX_FORMAT

Syntax:

xp_regex_format @input, @regex, @format_string, @result OUTPUT

@input is the text to parse.
@regex is the regular expression to match.
@format_string is used to format the results.
@result is an output parameter that will hold the formatted results.

xp_regex_format is used to parse an input string and format the results. Probably the best example to demonstrate is by doing some telephone number parsing. I’ll assume you know the Regular Expression syntax for this documentation.

The regex [^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4}) will parse just about any phone-number-like string you throw at it. For instance, this code:

DECLARE @out VARCHAR(50)

EXEC xp_regex_format '(310)555-1212', '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
    '($1) $2-$3', @out OUTPUT
PRINT @out

EXEC xp_regex_format '310.555.1212', '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
    '($1) $2-$3', @out OUTPUT
PRINT @out

EXEC xp_regex_format ' 310!555 hey! 1212',
    '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
    '($1) $2-$3', @out OUTPUT
PRINT @out

EXEC xp_regex_format ' hello, ( 310 ) 555.1212 is my phone number. Thank you.',
    '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})', '($1) $2-$3', @out OUTPUT
PRINT @out

prints out:

(310) 555-1212
(310) 555-1212
(310) 555-1212
(310) 555-1212

XP_REGEX_SPLIT

Syntax:

xp_regex_split @input, @regex, @column_number, @result OUTPUT

@input is the text to parse.
@regex is a regular expression that matches the delimiter.
@column_number: We’re basically doing a "text-to-columns" here, so @column_number lets you specify which of the resulting columns should be passed back in the @result parameter.
@result is an output parameter that will hold the formatted results

@input, @regex and @result are either (N)VARCHAR or (N)CHAR of any length.

@column_number is an INT. Columns are numbered starting at 1.

This function splits text data on some sort of delimiter (comma, pipe, whatever). The cool thing about a split using regular expressions is that the delimiter does not have to be as consistent as you would normally expect.

For example, take this line as your source data:

one ,two|three : four

In this case, our delimiter is either a comma, pipe or colon with any number of spaces either before or after (or both). In regex form, that is written: \s*[,|:]\s*.

For example:

DECLARE @out VARCHAR(8000)

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 1, @out OUTPUT
PRINT @out

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 2, @out OUTPUT
PRINT @out

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 3, @out OUTPUT
PRINT @out

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 4, @out OUTPUT
PRINT @out

prints out:

one  
two
three  
four

Please note that in the case above, there is no performance penalty for running the same split more than once. xp_regex_split caches both the input string and the regular expression, so calling it multiple times while changing only the @column_number parameter is perfectly fine. The actual split is only done the first time and the rest is pulled from cache.

XP_REGEX_REPLACE

Syntax:

xp_regex_replace @input, @regex, @replacement, @result OUTPUT

@input is the text to parse.
@regex is a regular expression.
@replacement will replace every piece of text that matches the regex
@result is an output parameter that will hold the results.

For example, this is how you would remove all whitespace from an input string:

DECLARE @out VARCHAR(8000)
EXEC xp_regex_replace ' one  two    three four ', '\s*', '', @out OUTPUT
PRINT '[' + @out + ']'

prints out:

[onetwothreefour]

To replace all numbers (regardless of length) with "###":

DECLARE @out VARCHAR(8000)
EXEC xp_regex_replace '12345 is less than 99999, but not 1, 12, or 123', 
                      '\d+', '###', @out OUTPUT
PRINT @out

prints out:

### is less than ###, but not ###, ###, or ###

FN_XP_REGEX_MATCH, FN_XP_REGEX_FORMAT, FN_XP_REGEX_SPLIT and FN_XP_REGEX_REPLACE

These are user-defined functions that wrap the stored procedures. This way you can use the function as part of a SELECT list or a WHERE clause:

USE pubs
GO

SELECT dbo.fn_regex_format(
  phone_number,
  '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
  '$1 $2 $3'
  )

FROM
  authors

This would format every phone number in the "authors" table.

Please note that FN_XP_REGEX_MATCH, returns a 1 or 0 instead of 'Matched' or 'Failed' (like XP_REGEX_MATCH does). You can certainly change this back in the INSTALL.SQL code. Also note, you'll need to create the UDFs in each database that you call them from.

Installation

  1. Copy xp_regex.dll to your Program Files\Microsoft SQL Server\MSSQL\bin folder.
  2. Run the SQL script INSTALL.SQL. This will register the procedures and create the user-defined functions.

Performance Counters

I've created a set of Performance Counters (the Performance Object is called "xp_regex"). You can use the Windows 2000 System Monitor to see various statistics about xp_regex while it is running.

If you want to avoid the small overhead of the performance counters, just comment out this line in PerfCounters.h and rebuild:

#define XP_REGEX_USING_PERF_COUNTERS

Misc

Comments/corrections/additions are welcome. Please let me know if you find this useful! Thanks!

History

  • 06 Oct 03 - Included a version of xp_regex.dll that was compiled without the performance counters. Added the warning about running managed code in-process with SQL Server.
  • 19 Jul 03 - Complete refactoring to a more object-oriented design. Also added support for Unicode parameters. Changed the OUTPUT parameter to Unicode. Greatly improved parameter checking and error handling.
  • 07 May 03 - Rebuilt using Visual Studio .NET 2003 and .NET Framework 1.1
  • 29 Mar 03 - Updated download

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Dan Farino
Web Developer
United States United States
No Biography provided

Comments and Discussions

 
Questionmssql 2008 express x64 PinmemberMember 267500616-Feb-12 21:28 
Questionxp_regex in SQL2008 x64 PinmemberMember 115893521-Mar-11 14:00 
GeneralCould not load the DLL xp_pcre.dll, or one of the DLLs it references. Reason: 126(The specified module could not be found.). PinmemberVenkat1.V18-Aug-09 21:40 
QuestionUsing Split PinmemberHiMik200314-Aug-09 2:03 
GeneralBroken Link PinmemberTim_Higgison6-Aug-09 21:29 
Generalxp_regex_replace PinmemberCalvin Wrightsman28-Jan-09 8:27 
GeneralAwesome, I thought this was built in to SQL 2005 Pinmemberphilmee9524-Sep-08 14:53 
QuestionCould not load the DLL....SQL 2005! PinmemberMember 177309016-Sep-08 9:14 
AnswerRe: Could not load the DLL....SQL 2005! Pinmemberphilmee9524-Sep-08 14:50 
QuestionRegEx to insert data every so many characters? PinmemberMember 56973910-Jun-08 3:15 
QuestionIs there a 64bit version of the xp_pcre.dll (regular expressions in SqlServer2005)? PinmemberJaspers Yvo14-May-08 6:07 
QuestionSQL Message Field Type Pinmemberm_r4-Jun-07 8:47 
QuestionWhat REGX format dose this code follow? Pinmember81u3ph14m318-May-06 21:04 
AnswerRe: What REGX format dose this code follow? PinmemberGarth J Lancaster18-May-06 21:16 
QuestionShould xp_regex work with .SQL Server 2005 Pinmembernetedk23-Mar-06 8:39 
GeneralRe: Should xp_regex work with .SQL Server 2005 Pinmemberajwilson229-Oct-06 21:10 
GeneralRe: Should xp_regex work with .SQL Server 2005 Pinmemberdombrowsky10-Oct-06 6:54 
QuestionRoman numerals PinmemberVeroVige17-Mar-06 5:14 
AnswerRe: Roman numerals PinmemberDan Farino17-Mar-06 5:25 
Generalerror message.... PinmemberMathew Arakal10-May-05 15:03 
GeneralRe: error message.... PinmemberDan Farino10-May-05 15:08 
GeneralFunctions returns 0 PinsussMaxime Ducharme14-Feb-05 9:14 
GeneralRe: Functions returns 0 PinmemberDan Farino14-Feb-05 9:24 
GeneralRe: Functions returns 0 PinsussMaxime Ducharme14-Feb-05 9:40 
GeneralRe: Functions returns 0 PinsussMaxime Ducharme15-Feb-05 9:14 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140415.2 | Last Updated 7 Oct 2003
Article Copyright 2002 by Dan Farino
Everything else Copyright © CodeProject, 1999-2014
Terms of Use
Layout: fixed | fluid