Well-Defined Repetition
March 9, 2001
If you want to be more precise about how many times a character
or group of characters may be repeated, you can specify the
minimum and maximum number of repeats in curly brackets. '2 or 3
spaces' can be written as follows:
> perl matchtest.plx
Enter some text to find: \s{2,3}
'\s{2,3}' was not found.
>
So we have no doubled or trebled spaces in our string. Notice how
we construct that - the minimum, a comma, and the maximum, all
inside braces. Omitting the maximum signifies 'or more', and
omitting the minimum signifies 'or fewer'. For example,
{2,} denotes '2 or more', while {,3} is
'3 or fewer'. Be aware that {,3} is only treated as a quantifier
from Perl 5.34 onwards; earlier versions match it as the literal
text '{,3}', so write {0,3} there instead. In these cases, the
same warnings apply as for the star operator.
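The three forms of range can be tried side by side. The following is only a minimal sketch: the helper name space_runs and the sample strings are made up for illustration and are not part of matchtest.plx. It writes 'or fewer' as {0,3}, which behaves the same on every version of Perl.

```perl
use strict;
use warnings;

# Report which repetition counts match in $text (hypothetical helper).
sub space_runs {
    my ($text) = @_;
    my @found;
    push @found, '{2,3}' if $text =~ /\s{2,3}/;          # two or three in a row, anywhere
    push @found, '{4,}'  if $text =~ /\s{4,}/;           # four or more in a row, anywhere
    push @found, '{0,3}' if $text =~ /^\S+\s{0,3}\S+$/;  # whole string: at most three between
    return join ' ', @found;
}

for my $text ("a b", "a  b", "a    b") {
    printf "'%s' matches: %s\n", $text, space_runs($text);
}
```

Note that the string with four spaces still matches \s{2,3}: the pattern is not anchored, so it is satisfied by the first two or three spaces of the longer run. A maximum only limits what is matched if something else in the pattern, such as the ^ and $ anchors in the third test, pins down what comes before and after.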
Finally, you can specify exactly how many things are to be in a
row by simply putting that number inside the curly brackets.
Here's the five-letter-word example tidied up a little:
> perl matchtest.plx
Enter some text to find: \b\w{5}\b
'\b\w{5}\b' was not found.
>
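To see an exact count succeed as well as fail, the same pattern can be run over a string that does contain five-letter words. This is a sketch under assumptions: the helper five_letter_words and the sample sentence are invented here, and the /g modifier is used to collect every match rather than just the first.

```perl
use strict;
use warnings;

# Return every run of exactly five word characters that stands alone as a word.
sub five_letter_words {
    my ($text) = @_;
    my @words = $text =~ /\b\w{5}\b/g;   # /g in list context returns all matches
    return @words;
}

my @found = five_letter_words("Those seven words about regex matching");
print "Found: @found\n";
```

The \b on either side matters: without it, \w{5} would also be satisfied by the first five characters of a longer word such as 'matching'. Remember too that \w covers digits and the underscore, so '12345' counts as a five-letter 'word' here.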
Summary Table
To refresh your memory, here are the various metacharacters we've
seen so far:
| Metacharacter | Meaning |
| --- | --- |
| [abc] | any one of the characters a, b, or c. |
| [^abc] | any one character other than a, b, or c. |
| [a-z] | any one ASCII character between a and z. |
| \d \D | a digit; a non-digit. |
| \w \W | a 'word' character; a non-'word' character. |
| \s \S | a whitespace character; a non-whitespace character. |
| \b | the boundary between a \w character and a \W character. |
| . | any character (apart from a new line). |
| (abc) | the phrase 'abc' as a group. |
| ? | preceding character or group may be present 0 or 1 times. |
| + | preceding character or group is present 1 or more times. |
| * | preceding character or group may be present 0 or more times. |
| {x,y} | preceding character or group is present between x and y times. |
| {,y} | preceding character or group is present at most y times (Perl 5.34 and later; write {0,y} in earlier versions). |
| {x,} | preceding character or group is present at least x times. |
| {x} | preceding character or group is present x times. |
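A few of these entries can be checked quickly with a short script. This is an illustrative sketch only: the helper matches and every pattern and string in the list are made-up examples, chosen so that some succeed and some fail.

```perl
use strict;
use warnings;

# Return 1 if $pattern matches anywhere in $text, 0 otherwise (hypothetical helper).
sub matches {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

my @checks = (
    [ 'gr[ae]y',  'grey'        ],  # [abc]: e is one of the listed characters
    [ '[^0-9]',   '2001'        ],  # [^abc]: fails, every character is a digit
    [ 'colou?r',  'color'       ],  # ?: the u may be absent
    [ '(ab)+c',   'ababc'       ],  # + applied to a group
    [ '\bcat\b',  'concatenate' ],  # \b: fails, 'cat' is inside a longer word
    [ 'a.c',      "a\nc"        ],  # .: fails, the middle character is a new line
);

for my $check (@checks) {
    my ($pattern, $text) = @$check;
    printf "%-10s %s\n", $pattern, matches($pattern, $text) ? "matches" : "does not match";
}
```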
Backreferences
What if we want to know what a certain regular expression
matched? It was easy when we were matching literal strings: we
knew that 'Case' was going to match those four letters and
nothing else. But now, what matches? If we have
/\w{3}/, which three word characters are getting
matched?
Perl has a series of special variables in which it stores
anything that's matched with a group in parentheses. Each time it
sees a set of parentheses, it copies the matched text inside into
a numbered variable - the first matched group goes in
$1, the second group in $2, and so on.
By looking at these variables, which we call the
backreference variables, we can see what triggered various
parts of our match, and we can also extract portions of the data
for later use.
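As a minimal sketch of how this looks in practice, the script below puts parentheses around the parts of a pattern we care about and reads them back from $1, $2 and $3. The helper names split_date and first_three and the sample strings are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Capture month, day and year; returns an empty list when nothing matches.
sub split_date {
    my ($text) = @_;
    if ($text =~ /(\w+) (\d{1,2}), (\d{4})/) {
        return ($1, $2, $3);   # first, second and third parenthesized groups
    }
    return;
}

# Wrap \w{3} in parentheses to see which three characters it matched.
sub first_three {
    my ($text) = @_;
    return $text =~ /(\w{3})/ ? $1 : undef;
}

my ($month, $day, $year) = split_date("Published March 9, 2001");
print "month=$month day=$day year=$year\n";
print "first three: ", first_three("Beginning Perl"), "\n";
```

Both helpers look at the numbered variables only after checking that the match succeeded. When a match fails, $1 and friends are not reset, so reading them unconditionally can hand back text left over from an earlier successful match.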
Beginning Perl