Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



Reoccurring Character Classes - Page 7

February 23, 2001

Some character classes are going to come up again and again: the digits, the letters, and the various types of whitespace. Perl provides us with some neat shortcuts for these. Here are the most common ones, and what they represent:

Shortcut   Expansion        Description
\d         [0-9]            Digits 0 to 9.
\w         [0-9A-Za-z_]     A 'word' character allowable in a Perl variable name.
\s         [ \t\n\r\f]      A whitespace character: that is, a space, a tab, a newline, a return or a form feed.

There are also the negative forms of the above:

Shortcut   Expansion        Description
\D         [^0-9]           Any non-digit.
\W         [^0-9A-Za-z_]    A non-'word' character.
\S         [^ \t\n\r\f]     A non-blank character.
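
To see how the shortcuts and their negative forms divide characters up, here is a minimal sketch. The helper name classify and the sample characters are our own choices for illustration and are not part of matchtest.plx.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# classify() is a hypothetical helper written for this illustration.
# It lists every shortcut class that a single character belongs to.
sub classify {
    my ($char) = @_;
    my @classes;
    push @classes, '\d' if $char =~ /\d/;
    push @classes, '\D' if $char =~ /\D/;
    push @classes, '\w' if $char =~ /\w/;
    push @classes, '\W' if $char =~ /\W/;
    push @classes, '\s' if $char =~ /\s/;
    push @classes, '\S' if $char =~ /\S/;
    return join ' ', @classes;
}

# Each character falls on exactly one side of each positive/negative pair.
for my $char ('7', 'a', '_', ' ', '!') {
    printf "'%s' matches: %s\n", $char, classify($char);
}
```

Note that a digit is also a 'word' character, so '7' belongs to both \d and \w, while the underscore counts as a 'word' character but not as a digit.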

So, if we wanted to see whether there was a five-letter word in the sentence, we might think we could do this:

> perl matchtest.plx
Enter some text to find: \w\w\w\w\w
The text matches the pattern '\w\w\w\w\w'.
>

But that's not right - there are no five-letter words in the sentence! The problem is, we've only asked for five letters in a row, and any word with at least five letters contains five in a row, so it will match that pattern. We actually matched 'wonde', which was the first possible series of five letters in a row. To actually get a five-letter word, we might consider deciding that the word must appear in the middle of the sentence, that is, between two spaces:

> perl matchtest.plx
Enter some text to find: \s\w\w\w\w\w\s
'\s\w\w\w\w\w\s' was not found.
>
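
To check what a pattern actually matched, rather than just whether it matched, we can capture the match. The following is only a sketch: the sentence in $text is an assumption (it is not reproduced on this page, and was chosen to be consistent with the results above), and first_match is a hypothetical helper of our own.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumed sample sentence; the real one is defined in matchtest.plx.
my $text = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# first_match() is a hypothetical helper: it returns the text matched by
# the pattern, or undef if the pattern does not match at all.
sub first_match {
    my ($string, $pattern) = @_;
    return $string =~ /($pattern)/ ? $1 : undef;
}

for my $pattern ('\w\w\w\w\w', '\s\w\w\w\w\w\s') {
    my $found = first_match($text, $pattern);
    if (defined $found) {
        printf "%s matched '%s'\n", $pattern, $found;
    } else {
        printf "%s was not found\n", $pattern;
    }
}
```

With this sentence, the first pattern picks up the first five characters of 'wonder', and the second finds nothing because no run of exactly five word characters sits between two whitespace characters.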

Word Boundaries

The problem with that is, when we're looking at text, words aren't always between two spaces. They can be followed by or preceded by punctuation, or appear at the beginning or end of a string, or otherwise next to non-word characters. To help us properly search for words in these cases, Perl provides the special \b metacharacter. The interesting thing about \b is that it doesn't actually match any character in particular. Rather, it matches the point between something that isn't a word character (either \W or one of the ends of the string) and something that is (a word character), hence \b for boundary. So, for example, to look for one-letter words:

> perl matchtest.plx
Enter some text to find: \s\w\s
'\s\w\s' was not found.

> perl matchtest.plx
Enter some text to find: \b\w\b
The text matches the pattern '\b\w\b'.

As the 'I' was preceded by a quotation mark, a space wouldn't match it - but a word boundary does the job. Later, we'll learn how to tell Perl how many repetitions of a character or group of characters we want to match without spelling it out directly.
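
The same idea finishes off the earlier search for five-letter words. In this sketch, the sentence in $text is again an assumption chosen to be consistent with the results on this page. It also uses the /g modifier in list context to collect every match, which goes a little beyond what we have covered so far.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumed sample sentence; the real one is defined in matchtest.plx.
my $text = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# In list context with /g, a match returns every non-overlapping match.
# \b on both sides means the word characters must form a whole word.
my @one_letter   = $text =~ /\b\w\b/g;
my @three_letter = $text =~ /\b\w\w\w\b/g;
my @five_letter  = $text =~ /\b\w\w\w\w\w\b/g;

print "One-letter words:   @one_letter\n";
print "Three-letter words: @three_letter\n";
print "Five-letter words:  ", scalar(@five_letter), " found\n";
```

Here 'yes' is picked up even though it sits between quotation marks rather than spaces, and no five-letter words are reported, since 'wonde' is no longer accepted as a word on its own.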

What, then, if we wanted to match anything at all? You might consider something like [\w\W] or [\s\S], for instance. Actually, this is quite a common operation, so Perl provides an easy way of specifying it - a full stop. (By default, the full stop matches any character except a newline.) What about an 'r' followed by two characters - any two characters - and then an 'h'?

> perl matchtest.plx
Enter some text to find: r..h
The text matches the pattern 'r..h'.
>

Is there anything after the full stop?

> perl matchtest.plx
Enter some text to find: \..
'\..' was not found.
>

What's that pattern? One backslashed full stop to mean a literal full stop, then a plain one to mean 'anything at all'. Since it wasn't found, nothing follows the full stop in our sentence.
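
The difference between the plain and the backslashed full stop can be sketched as follows. As before, the sentence in $text is an assumption chosen to be consistent with the results on this page, and the variable names are our own.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumed sample sentence; the real one is defined in matchtest.plx.
my $text = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# An 'r', then any two characters, then an 'h'.
my ($r_to_h) = $text =~ /(r..h)/;
printf "r..h matched '%s'\n", $r_to_h;

# A literal full stop followed by any one character.
my $after_stop = ($text =~ /\../) ? 'yes' : 'no';
print "Anything after the full stop? $after_stop\n";

# By default, the plain full stop does not match a newline.
my $across_newline = ("a\nb" =~ /a.b/) ? 'yes' : 'no';
print "Does a.b match across a newline? $across_newline\n";
```

The two 'any' characters in the first match are a space and a 'w', taken from the end of 'wonder' and the start of 'what'.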

Beginning Perl