This document describes the theory behind regular expressions (REs) as well as their practical usage.

What are Regular Expressions?

Regular expressions are a way to search for substrings ("matches") in strings. This is done by searching through the string with "patterns".

Example

You probably know the [...] When using a pattern like [...]

But it will not find files like [...]

This is exactly the way REs work. While the [...]

Why would you use Regular Expressions

Example usages could be:

Standard Regular Expression Operations

Basically, you can do the following operations on a string with REs:

Where to use Regular Expressions

REs are one of the foundations of the Perl programming language and are therefore built into the language itself. Many other languages can use REs through third-party libraries or add-ons. The following are some other languages for which RE libraries exist:

Although they are slightly different to use (because of the design of each language), all are quite similar to Perl's implementation of REs. Therefore I use Perl code snippets for the examples in this document.

The RE syntax is not completely standardized. As far as I know, there is a POSIX version of REs that defines a complete syntax. Perl's RE implementation is much more flexible than POSIX's, so a library that is as Perl-compatible as possible is normally what you want. The syntax itself sometimes differs between languages: for example, one library implements only a subset of the POSIX RE syntax, while another implements nearly all of the Perl RE syntax.

How to use Regular Expressions from Perl

As stated, all examples are in Perl. Therefore, here is a quick overview of the most common ways to execute a regular expression in Perl.

Search a string for a pattern

expression =~ m/pattern/[switches]

Searches the string "expression" for the pattern "pattern".

For example

$test = "this is just one test";
$test =~ m/(o.e)/;

would return "one".

Replace a substring

expression =~ s/pattern/new text/[switches]

Searches the string "expression" for the pattern "pattern" and replaces it with "new text".

For example

$test = "this is just one test";
$test =~ s/one/my/;

would replace "one" with "my".

Regular Expression Syntax

Basics

This chapter does not try to be a reference of all characters that can be used inside an RE pattern. There are other documents that do this quite well. Instead, the basic meta characters are shown and explained.

Meta characters that you want to use literally must be escaped with the backslash, just as in C++ strings. E.g. to use the square bracket "[" literally, write "\[".

Important Meta Characters

The most important meta characters are listed in the chapter "Regular Expression Syntax" on MSDN.
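
As a minimal sketch of the basics described so far, the following snippet runs the search and replace examples from above and adds one escaped meta character. The variable names and the third sample string are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Search: the dot matches any single character, so (o.e) captures "one".
my $test  = "this is just one test";
my $found = $test =~ m/(o.e)/ ? $1 : undef;

# Replace: s/// changes the string in place and returns the number
# of substitutions it made.
my $replaced = $test;
my $count    = ($replaced =~ s/one/my/);

# Escaping: an unescaped "[" would open a character class, so the
# literal bracket is written "\[" (sample string is hypothetical).
my $bracketed   = "see [note] below";
my $has_bracket = $bracketed =~ m/\[note\]/ ? 1 : 0;

print "found: $found\n";              # found: one
print "replaced: $replaced\n";        # replaced: this is just my test
print "substitutions: $count\n";      # substitutions: 1
print "has bracket: $has_bracket\n";  # has bracket: 1
```

Copying the string into $replaced before substituting keeps the original $test intact, which is convenient when the unmodified text is still needed afterwards.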

Character Classes

A character class is a group of one or more characters, written in square brackets "[" and "]". In other words, a character class means "match any single character of that class".

There is also the opposite of character classes, the negated character classes, which mean "match any single character that is not in the class".

See more examples at "Character Matching" on MSDN.

Quantifiers

If you don't know exactly how many characters are coming, you can use quantifiers to specify the number of times a character can occur.

More quantifiers are listed in the chapter "Quantifiers" on MSDN.
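
To make character classes and quantifiers more concrete, here is a small sketch. The sample strings and variable names are assumptions chosen for illustration, and the "g" switch (find all matches, not just the first) is used although it is not covered in this document.

```perl
use strict;
use warnings;

my $text = "Room 42, floor 7";

# [0-9] is a character class: it matches any single digit.
# + is a quantifier: one or more repetitions of the preceding item.
my @numbers = $text =~ m/([0-9]+)/g;          # ("42", "7")

# [^0-9] is a negated class: any single character that is not a digit.
(my $digits_only = $text) =~ s/[^0-9]//g;     # "427"

# ? is a quantifier: zero or one repetition, so the "u" is optional.
my @spellings = grep { m/colou?r/ } ("color", "colour", "colr");

# {4} is a counted quantifier: exactly four repetitions.
my $has_year = "built in 1998" =~ m/[0-9]{4}/ ? 1 : 0;

print "numbers: @numbers\n";          # numbers: 42 7
print "digits only: $digits_only\n";  # digits only: 427
print "spellings: @spellings\n";      # spellings: color colour
print "has year: $has_year\n";        # has year: 1
```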

Greedy

An important fact about quantifiers is that quantifiers such as "*" are "greedy": they match as many characters as possible. E.g.

$test = "hello out there, how are you";
$test =~ m/h.*o/;

means "find an "h", followed by as many arbitrary characters as possible, followed by an "o"". It therefore finds "hello out there, how are yo" rather than just "hello".

You can explicitly say that a quantifier should be "ungreedy" by appending a "?" to it:

$test = "hello out there, how are you";
$test =~ m/h.*?o/;

would actually find "hello".

Anchors

Line Beginnings and Line Ends

To check for the beginning or the end of a line (or string), you use the meta characters "^" and "$".

Word Boundaries

The meta character "\b" matches at a word boundary. E.g.

$test =~ m/out/;

would match not only the word "out" but also any other word that contains "out".

$test =~ m/\bout/;

Now it only finds "out" at the beginning of a word.

Alternation and Grouping

Alternation allows use of the "|" character to choose between two or more alternatives. Parentheses themselves are used for "capturing" substrings for later usage and store them in the Perl built-in variables $1, $2, and so on.

E.g.

$test = "I like apples a lot";
$test =~ m/like (apples|pines|bananas)/;

will match, since "apples" is one of the alternatives.

Backreferences, Lookahead- and Lookbehind-Conditions

Backreferences

One of the most important features of REs is the ability to store ("capture") a part of the matched substring for later reuse. This is done by placing the substring in parentheses.

If you don't want to capture a substring but need parentheses to group it, use "(?:" instead of "(" as the opening parenthesis.

E.g.

$test = "Today is monday the 18th.";
$test =~ m/([0-9]+)th/;

will store "18" in $1.

$test = "Today is monday the 18th.";
$test =~ m/[0-9]+th/;

will store nothing in $1.

$test = "Today is monday the 18th.";
$test =~ m/(?:[0-9]+)th/;

will store nothing in $1.

$test = "Today is monday the 18th.";
$test =~ s/ the ([0-9]+)th/, and the day is $1/;

will result in "Today is monday, and the day is 18."

You can also refer back, inside the pattern itself, to previously captured substrings by using "\1", "\2", and so on:

$test = "the house is is big";
$test =~ s/\b(\S+)\b(\s+\1\b)+/$1/;

will result in "the house is big".

Lookahead- and Lookbehind-Conditions

Sometimes it is necessary to say "match this, but only if it is not preceded by that" or "match this, but only if it is not followed by that". When just single characters are concerned, you can use the negated character class.

But when it comes to more than just a single character, you need to use the so-called lookahead condition or lookbehind condition. There are four possible types: positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind.
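
The four lookahead and lookbehind forms can be sketched as follows, using Perl's syntax (?=...), (?!...), (?<=...) and (?<!...). The helper match_offset and the " and" lookahead patterns are assumptions introduced for this illustration; the test string is the one used in the examples of this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper: start offset of the first match of $re in $str,
# or -1 if the pattern does not match at all.
sub match_offset {
    my ($str, $re) = @_;
    return $str =~ $re ? $-[0] : -1;
}

my $test = "HTML is a document description-language and not a programming-language";

my $first = index($test, "language");   # offset of the first "language"
my $last  = rindex($test, "language");  # offset of the second "language"

# Positive lookahead: "language" only if " and" follows it.
my $pos_ahead  = match_offset($test, qr/language(?= and)/);
# Negative lookahead: "language" only if " and" does not follow it.
my $neg_ahead  = match_offset($test, qr/language(?! and)/);
# Positive lookbehind: "language" only if "description-" precedes it.
my $pos_behind = match_offset($test, qr/(?<=description-)language/);
# Negative lookbehind: "language" only if "description-" does not precede it.
my $neg_behind = match_offset($test, qr/(?<!description-)language/);

print "positive lookahead finds the first word\n"   if $pos_ahead  == $first;
print "negative lookahead finds the second word\n"  if $neg_ahead  == $last;
print "positive lookbehind finds the first word\n"  if $pos_behind == $first;
print "negative lookbehind finds the second word\n" if $neg_behind == $last;
```

In all four forms the condition itself is not part of the match: only "language" is consumed, which is why the offsets can be compared directly with index and rindex.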

Examples:

$test = "HTML is a document description-language and not a programming-language";
$test =~ m/(?<=description-)language/;

will match the first "language", because it is preceded by "description-".

$test = "HTML is a document description-language and not a programming-language";
$test =~ m/(?<!description-)language/;

will match the second "language", because it is not preceded by "description-".

More Examples

Here are some more real-world examples from the last chapter of the RE section of [3]. These more advanced REs can be used as a starting point for your own REs, or just as examples you can study in more detail.

Swap the first two words:

s/(\S+)(\s+)(\S+)/$3$2$1/

Find name=value pairs:

m/(\w+)\s*=\s*(.*?)\s*$/

Now name is in $1 and value is in $2.

Read a date in the form YYYY-MM-DD:

m/(\d{4})-(\d\d)-(\d\d)/

Now YYYY is in $1, MM is in $2 and DD is in $3.

Remove the leading path from a filename:

s/^.*\///
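
The four patterns above can be tried out with a short script like the following sketch. The input strings and variable names are assumptions made up for illustration; only the patterns themselves come from the examples.

```perl
use strict;
use warnings;

# Swap the first two words (the whitespace between them is kept).
(my $swapped = "hello world again") =~ s/(\S+)(\s+)(\S+)/$3$2$1/;

# Find a name=value pair; surrounding whitespace is not captured.
my ($name, $value) = "colour = dark red  " =~ m/(\w+)\s*=\s*(.*?)\s*$/;

# Read a date in the form YYYY-MM-DD.
my ($year, $month, $day) = "released on 2004-03-17" =~ m/(\d{4})-(\d\d)-(\d\d)/;

# Remove the leading path from a filename: the greedy .* runs up to
# the last slash, so only the file name remains.
(my $file = "/usr/local/bin/perl") =~ s/^.*\///;

print "$swapped\n";            # world hello again
print "$name -> $value\n";     # colour -> dark red
print "$year/$month/$day\n";   # 2004/03/17
print "$file\n";               # perl
```

In list context a match returns its captured substrings, so they can be assigned directly to named variables instead of reading $1, $2 and $3 afterwards.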

Summary

This document tried to give you a brief introductory overview of what REs are and where and how to use them. Although it is straightforward to get started with REs, there are quite a lot of traps and errors you will probably meet in "real life". It is highly recommended to refer to additional literature and examples to understand and use the full power of REs. Especially [4] is a very valuable (but somewhat fastidious) resource you should read.

Topics that were not covered in this document include:

For these and many others, please take a look at the resources below.

Literature Resources