|
|||||||||||||||||||||
|
|||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
Introductionxp_pcre is a follow-up to my extended stored procedure xp_regex. Both allow you to use regular expressions in T-SQL on Microsoft SQL Server 2000. This version was written because xp_regex uses the .NET Framework, which many people were reluctant to install on their SQL Servers. (It turns out they were the smart ones: although installing the .NET Framework on a SQL Server is no cause for concern, I've been informed by several people that hosting the CLR inside the SQL Server process is a Bad Idea™. Please see the warnings on the xp_regex page for more information.) xp_pcre is so named because it uses the "Perl Compatible Regular Expressions" library. This library is available at www.pcre.org. (You don't need to download the PCRE library in order to use xp_pcre. The library is statically linked.) OverviewThere are six extended stored procedures in the DLL:
The parameters of all of these procedures can be If any required parameters are 1. xp_pcre_matchSyntax:EXEC master.dbo.xp_pcre_match @input, @regex, @result OUTPUT
For example, this will determine whether the input string contains at least two consecutive digits: DECLARE @out CHAR(1)
EXEC master.dbo.xp_pcre_match 'abc123xyz', '\d{2,}', @out OUTPUT
PRINT @out
prints out: 1
This one will determine whether the input string is entirely comprised of at least two consecutive digits: DECLARE @out CHAR(1)
EXEC master.dbo.xp_pcre_match 'abc123xyz', '^\d{2,}$', @out OUTPUT
PRINT @out
prints out: 0
2. xp_pcre_match_countSyntax:EXEC master.dbo.xp_pcre_match_count @input, @regex, @result OUTPUT
For example, this will determine how many times a separate series of numbers (of any length) appears in the input: DECLARE @out VARCHAR(20)
EXEC master.dbo.xp_pcre_match_count '123abc4567xyz', '\d+', @out OUTPUT
PRINT @out
prints out: 2
3. xp_pcre_replaceSyntax:EXEC master.dbo.xp_pcre_replace @input, @regex, @replacement, @result OUTPUT
For example, this is how you would remove all white space from an input string: DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace 'one two three four ', '\s+', '', @out OUTPUT
PRINT '[' + @out + ']'
prints out: [onetwothreefour]
To replace all numbers (regardless of length) with "###": DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace
'12345 is less than 99999, but not 1, 12, or 123',
'\d+',
'###',
@out OUTPUT
PRINT @out
prints out: ### is less than ###, but not ###, ###, or ###
Capturing parentheses is also supported. You can then use the captured text in your replacement string by using the variables $1, $2, $3, etc. For example: DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace
'one two three four five six seven',
'(\w+) (\w+)',
'$2 $1,',
@out OUTPUT
PRINT @out
prints out: two one, four three, six five, seven
If you need to include a literal $ in your replacement string, escape it with a \. Also, if your replacement variable needs to be followed immediately by a digit, you'll need to put the variable number in braces. DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace
'75, 85, 95',
'(\d+)',
'\$${1}00',
@out OUTPUT
PRINT @out
prints out: $7500, $8500, $9500
4. xp_pcre_formatSyntax:EXEC master.dbo.xp_pcre_format @input, @regex, @format, @result OUTPUT
For example, the regex DECLARE @out VARCHAR(100)
DECLARE @regex VARCHAR(50)
SET @regex = '(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})'
DECLARE @format VARCHAR(50)
SET @format = '($1) $2-$3'
EXEC master.dbo.xp_pcre_format
'(310)555-1212',
@regex,
@format,
@out OUTPUT
PRINT @out
EXEC master.dbo.xp_pcre_format
'310.555.1212',
@regex,
@format,
@out OUTPUT
PRINT @out
EXEC master.dbo.xp_pcre_format
' 310!555 hey! 1212 hey!',
@regex,
@format,
@out OUTPUT
PRINT @out
EXEC master.dbo.xp_pcre_format
' hello, ( 310 ) 555.1212 is my phone number. Thank you.',
@regex,
@format,
@out OUTPUT
PRINT @out
prints out: (310) 555-1212
(310) 555-1212
(310) 555-1212
(310) 555-1212
The capturing and escaping conventions are the same as with 5. xp_pcre_splitSyntax:EXEC master.dbo.xp_pcre_split @input, @regex, @column_number, @result OUTPUT
Column numbers start at 1. An error will be raised if This function splits text data on some sort of delimiter (comma, pipe, whatever). The cool thing about a split using regular expressions is that the delimiter does not have to be as consistent as you would normally expect. For example, take this line as your source data: one ,two|three : four
In this case, our delimiter is either a comma, pipe or colon with any number of spaces either before or after (or both). In regex form, that is written: For example: DECLARE @out VARCHAR(8000)
DECLARE @input VARCHAR(50)
SET @input = 'one ,two|three : four'
DECLARE @regex VARCHAR(50)
SET @regex = '\s*[,|:]\s*'
EXEC master.dbo.xp_pcre_split @input, @regex, 1, @out OUTPUT
PRINT @out
EXEC master.dbo.xp_pcre_split @input, @regex, 2, @out OUTPUT
PRINT @out
EXEC master.dbo.xp_pcre_split @input, @regex, 3, @out OUTPUT
PRINT @out
EXEC master.dbo.xp_pcre_split @input, @regex, 4, @out OUTPUT
PRINT @out
prints out: one
two
three
four
6. xp_pcre_show_cacheSyntax:EXEC master.dbo.xp_pcre_show_cache
In order to prevent repeated regex recompilation, xp_pcre keeps a cache of the last 50 regular expressions it has processed. (Look at the bottom of RegexCache.h to change this hard-coded value.) 7. fn_pcre_match, fn_pcre_match_count, fn_pcre_replace, fn_pcre_format and fn_pcre_splitThese are user-defined functions that wrap the stored procedures. This way you can use the function as part of a USE pubs
GO
SELECT master.dbo.fn_pcre_format(
phone,
'(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
'($1) $2-$3'
) as formatted_phone
FROM
authors
This would format every phone number in the "authors" table. Please note, you'll either need to create the UDFs in every database that you use them in or remember to always refer to them using their fully-qualified names (i.e., Also note that user-defined functions in SQL Server are not very robust when it comes to error handling. If xp_pcre returns an error, the UDF will suppress it and will return 8. Installation
9. Unicode supportUnfortunately, this version does not support Unicode arguments. Potential solutions include:
10. MiscTo build the code, you'll need to have the Boost libraries installed. You can download them from www.boost.org. Just change the "Additional Include Directories" entry under the project properties in VS.NET. It's under Configuration Properties | C/C++ | General. Comments/corrections/additions are welcome. Feel free to email me...you can find my email address in the header of any of the source files. Thanks! 11. History
| ||||||||||||||||||||