|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionRegular Expressions are probably the best tool for text parsing. Using Regular Expressions GREATLY simplifies anything for which you would normally use
Note to users of .NET 1.0: This DLL will not work on SQL Servers that are running with the /3GB switch in the boot.ini file. You'll probably get an error like: Cannot load the DLL xp_regex.dll, or one of the DLLs it references.
Reason: 1114(A dynamic link library (DLL) initialization routine failed.).
This limitation has been removed in .NET 1.1. Also note: There are some issues associated with running managed code in-process with SQL Server. Check out Clemens Vasters' blog entry here. Also, check out this MSKB article which discusses extended stored procedures written using .NET. I can tell you that I have been running this on several servers since I wrote it and have not had any problems. The servers I run it on are used more for reporting than OLTP-style transactions. I'm not pushing either the CPU or RAM to the limit and am not using fibers. This may account for the fact that I have not encountered any issues. OverviewThere are four Extended Stored Procedures in the DLL:
XP_REGEX_MATCHSyntax:
All parameters are either
XP_REGEX_FORMATSyntax:
The regex DECLARE @out VARCHAR(50)
EXEC xp_regex_format '(310)555-1212', '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
'($1) $2-$3', @out OUTPUT
PRINT @out
EXEC xp_regex_format '310.555.1212', '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
'($1) $2-$3', @out OUTPUT
PRINT @out
EXEC xp_regex_format ' 310!555 hey! 1212',
'[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
'($1) $2-$3', @out OUTPUT
PRINT @out
EXEC xp_regex_format ' hello, ( 310 ) 555.1212 is my phone number. Thank you.',
'[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})', '($1) $2-$3', @out OUTPUT
PRINT @out
prints out: (310) 555-1212
(310) 555-1212
(310) 555-1212
(310) 555-1212
XP_REGEX_SPLITSyntax:
@input is the text to parse.@regex is a regular expression that matches the delimiter.@column_number: We’re basically doing a "text-to-columns" here, so @column_number lets you specify which of the resulting columns should be passed back in the @result parameter.@result is an output parameter that will hold the formatted results
This function splits text data on some sort of delimiter (comma, pipe, whatever). The cool thing about a split using regular expressions is that the delimiter does not have to be as consistent as you would normally expect. For example, take this line as your source data: one ,two|three : four
In this case, our delimiter is either a comma, pipe or colon with any number of spaces either before or after (or both). In regex form, that is written: For example: DECLARE @out VARCHAR(8000)
EXEC xp_regex_split 'one ,two|three : four', '\s*[,|:]\s*', 1, @out OUTPUT
PRINT @out
EXEC xp_regex_split 'one ,two|three : four', '\s*[,|:]\s*', 2, @out OUTPUT
PRINT @out
EXEC xp_regex_split 'one ,two|three : four', '\s*[,|:]\s*', 3, @out OUTPUT
PRINT @out
EXEC xp_regex_split 'one ,two|three : four', '\s*[,|:]\s*', 4, @out OUTPUT
PRINT @out
prints out: one
two
three
four
Please note that in the case above, there is no performance penalty for running the same split more than once. XP_REGEX_REPLACESyntax:
For example, this is how you would remove all whitespace from an input string: DECLARE @out VARCHAR(8000)
EXEC xp_regex_replace ' one two three four ', '\s*', '', @out OUTPUT
PRINT '[' + @out + ']'
prints out: [onetwothreefour] To replace all numbers (regardless of length) with "###": DECLARE @out VARCHAR(8000)
EXEC xp_regex_replace '12345 is less than 99999, but not 1, 12, or 123',
'\d+', '###', @out OUTPUT
PRINT @out
prints out: ### is less than ###, but not ###, ###, or ### FN_XP_REGEX_MATCH, FN_XP_REGEX_FORMAT, FN_XP_REGEX_SPLIT and FN_XP_REGEX_REPLACEThese are user-defined functions that wrap the stored procedures. This way you can use the function as part of a USE pubs
GO
SELECT dbo.fn_regex_format(
phone_number,
'[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
'$1 $2 $3'
)
FROM
authors
This would format every phone number in the "authors" table. Please note that Installation
Performance CountersI've created a set of Performance Counters (the Performance Object is called " If you want to avoid the small overhead of the performance counters, just comment out this line in PerfCounters.h and rebuild: #define XP_REGEX_USING_PERF_COUNTERS
MiscComments/corrections/additions are welcome. Please let me know if you find this useful! Thanks! History
|
||||||||||||||||||||||