Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



Replacing Strings in Multiple Files - Page 4

July 16, 2001

Two things that I love to use in programming as much as possible are abstraction and recursion, mainly because they save me time. On a more practical note, our previous script isn't going to help us much if we have to do a search and replace on multiple files. There may be hundreds of files in many different subdirectories, and it would still take a lot of time to run the script against each one.

File globs enable us to expand an expression to include multiple files. You may have used them before. For example, to change all HTML files in a directory, we may want to say something like *.html. That's reasonable and Perl supports it.

Another feature that comes in handy is the ability to run the script against all subdirectories in addition to the local directory. If you have a large site, changing every file by hand is impractical. So what we're going to do next is update our script to use recursion and file globs. To do that, we'll need to add command-line switches using the Getopt::Std module.

The first thing we'll do besides adding command-line switches is to add file globbing. To make the script more flexible, we'll be adding options for specifying the directory we want to search, the filename(s) we want to search, the string that we're searching for, and the string that we want to replace the search string with. Because we'll be adding recursion shortly, we'll be putting the processing routine into a subroutine. Let's take a look at the main body of the modified script below:

1 use strict;
2 use Getopt::Std;
3 use Cwd;
4 use vars qw($opt_f $opt_s $opt_r $opt_d $opt_R);
5
6 getopt('dfsr');
7 &Usage unless $opt_f && $opt_s && $opt_r;
8 my $dir = ($opt_d) ? $opt_d : getcwd;
9 &Usage unless -d $dir;
10
11 my @files=split(/ /,$opt_f);
12 &search_n_replace($dir,$opt_s,$opt_r,\@files);

In lines 1 through 4, we're importing the modules and variables we'll be using in the script. On line 6, we're making a call to the function from the Getopt::Std module that reads the command line and processes the switches. The switches are d for the directory, f for the filename(s), s for the search string, and r for the replace string. At a minimum, we need to have the filenames that we will process and the search and replace strings. Line 7 prints an error message with the correct syntax for the script if we don't have at least those three pieces of information. Line 8 checks for the d switch, which specifies the directory to search, and falls back to the current working directory when the switch is absent. It's shorthand for saying:

my $dir;
if ($opt_d) {
    $dir = $opt_d;
} else {
    $dir = getcwd;
}

The getcwd function is imported from the Cwd module and returns the full path of the current working directory that the script was executed from. This is used as the default directory if one was not specified on the command line. Line 9 double checks the value of the $dir variable to make sure it is actually a directory. Line 11 is where we pick up the list of files that need to be parsed. This can be one file, or multiple files or file globs separated by a space, '*.html index.wml' for example. Line 12 passes the command-line parameters and filenames to the search_n_replace subroutine, which we'll cover next.
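For comparison, the same switches can be read into a hash with getopts, which the Getopt::Std documentation recommends over getopt. This is only a sketch under a few assumptions: parse_switches and the %opt hash are names made up for illustration, the R switch (declared on line 4 and used later on this page) is included as a plain flag, and the required values are tested with defined rather than for truth so that a value of 0 is accepted.

```perl
use strict;
use warnings;
use Getopt::Std;
use Cwd;

# Hypothetical helper: parse a list of arguments much like the main
# body of the script does, but into a hash instead of $opt_* globals.
sub parse_switches {
    local @ARGV = @_;              # getopts always reads @ARGV
    my %opt;
    # A colon means "this switch takes a value"; R is an on/off flag.
    getopts('d:f:s:r:R', \%opt) or return;
    # Require filenames, a search string and a replace string.
    return unless defined $opt{f} && defined $opt{s} && defined $opt{r};
    # Default to the current working directory, as line 8 does.
    $opt{d} = getcwd() unless defined $opt{d};
    return unless -d $opt{d};
    return \%opt;
}

my $opt = parse_switches('-f', '*.html index.wml', '-s', 'Jon', '-r', 'Bob');
my @files = split / /, $opt->{f};
print "dir=$opt->{d} files=@files\n";
```

Returning nothing on bad input leaves it to the caller to decide whether to print the usage message.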

One of the features of the script is the ability to include a regular expression in the search string. So if we wanted to replace Jonathan with Bob in all HTML files in the current directory and the name of the script was this2that.pl, the command line would look like the following:

this2that.pl -f '*.html' -s 'Jon(ath[oa]n)?' -r 'Bob'

If you do use a regular expression, be sure to leave off the forward slashes and modifiers. Those already exist inside the script. Now it's time to take a look at the search_n_replace function, the real workhorse of the script.
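Because the search string is dropped straight into the substitution, characters such as "." keep their regular expression meaning. The sketch below shows that effect on plain strings and shows how quotemeta can force a literal match. The swap helper and its $literal argument are hypothetical and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the script's substitution to one string.
# When $literal is true, quotemeta escapes regex metacharacters first.
sub swap {
    my ($text, $search, $replace, $literal) = @_;
    $search = quotemeta $search if $literal;
    $text =~ s/$search/$replace/ig;    # same modifiers as the script
    return $text;
}

# Search string treated as a regular expression.
print swap('Jonathan, Jonathon and jon', 'Jon(ath[oa]n)?', 'Bob'), "\n";
# Search string treated as literal text: the "." matches only a dot.
print swap('index.html indexXhtml', 'index.html', 'home.html', 1), "\n";
```

Without the literal flag, the second call would also rewrite indexXhtml, since an unescaped "." matches any character.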

1  sub search_n_replace {
2    my ($dir,$search,$replace,$files) = @_;
3    my @thesefiles = @$files;
4
5    for (my $i=0; $i < @thesefiles; $i++) {
6        $thesefiles[$i] = "$dir/$thesefiles[$i]";
7    }
8    while (my $file=<@thesefiles>) {
9        open(IN,$file) or die "Cannot read $file: $!\n";
10       open(OUT,">$file.$$") or die "Cannot write $file.$$: $!\n";
11       my $changed = 0;
12       while (<IN>) {
13           s/$search/$replace/ig && ($changed = 1)
14                && print "$file - Changed $search to $replace\n";
15           print OUT;
16       }
17       close(IN);
18       close(OUT);
19       ($changed==1) ? rename("$file.$$",$file) :
           unlink "$file.$$";
20   }
21 }

[Line 19 is one long line. It has been split for formatting purposes.]

Line 2 pulls in the parameters that were passed from the main body of the script. Line 3 copies the list of files to process out of the array reference. Lines 5-7 prepend the directory name to each file in the list. Remember that the user can specify what directory to search. Lines 8-20 are the main body of the subroutine. Line 8 loops through each filename in the list that was specified on the command line. This is also where the file globbing magic comes in. If one of the items is really a glob, like *.html, then Perl expands it into the list of files that match.
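To see the expansion on its own, here is a small sketch that swaps the angle-bracket form for bsd_glob from the core File::Glob module, which takes one pattern at a time and does not split on spaces. The expand_patterns helper is hypothetical. One detail worth knowing is that Perl's glob hands back a pattern without wildcard characters unchanged even when no such file exists, so the sketch filters the results with -f.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: expand each pattern inside $dir and keep only
# the names that exist as plain files.
sub expand_patterns {
    my ($dir, @patterns) = @_;
    my @found;
    for my $pattern (@patterns) {
        # "$dir/*.html" becomes the list of matching paths in $dir.
        push @found, grep { -f $_ } bsd_glob("$dir/$pattern");
    }
    return @found;
}

print "$_\n" for expand_patterns('.', '*.html', 'index.wml');
```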

Lines 9 and 10 open the source file for reading and the temporary file that the changes will be written to. Line 12 loops over each line of the source file that we're searching. Line 13 contains our search and replace operator and uses the values that were passed from the command line. If a change was made, the variable $changed is set to 1 to signify that a match was made on the source file. Lines 17 and 18 close the input and temporary filehandles. In Line 19, if a change was made, the temporary file overwrites the original source file, otherwise it's deleted.
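The per-file steps can also be pulled out into a routine of their own. The following is a sketch rather than a drop-in replacement: replace_in_file is a hypothetical name, it uses lexical filehandles and three-argument open instead of the bareword IN and OUT handles, and it stops with an error message if a file cannot be opened.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $file through a temporary copy and
# return the number of lines that were changed.
sub replace_in_file {
    my ($file, $search, $replace) = @_;
    my $tmp = "$file.$$";                  # $$ is the process ID
    open(my $in,  '<', $file) or die "Cannot read $file: $!\n";
    open(my $out, '>', $tmp)  or die "Cannot write $tmp: $!\n";
    my $changed = 0;
    while (my $line = <$in>) {
        $changed++ if $line =~ s/$search/$replace/ig;
        print {$out} $line;                # copy every line, changed or not
    }
    close($in);
    close($out) or die "Cannot finish $tmp: $!\n";
    if ($changed) {
        # Put the rewritten copy in place of the original.
        rename($tmp, $file) or die "Cannot replace $file: $!\n";
    } else {
        unlink($tmp);                      # nothing matched; discard the copy
    }
    return $changed;
}

# Only runs when a filename is given on the command line.
print replace_in_file($ARGV[0], 'Jon(ath[oa]n)?', 'Bob'), " line(s) changed\n"
    if @ARGV;
```

Returning a count instead of printing inside the loop lets the caller decide how much to report.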

Pretty neat so far. Now we need to add recursion to enable the script to search sub-directories. Working with recursion can be magical and frustrating at the same time. The basic concept for our purposes is that when we are processing files inside the search_n_replace subroutine, it will call itself when it finds a subdirectory so that it can process those files and so on. It will recurse as deep as there are subdirectories. We will want to add a switch for recursion since it won't be a feature that's used all the time.

To make recursion work, we need to add some extra code between Lines 20 and 21 in the subroutine listing above:

1   if ($opt_R) {
2       opendir(DIR,$dir) or die "Cannot open $dir: $!\n";
3       my @dirs = grep -d, map "$dir/$_", grep
         !/^\./, readdir DIR;
4       closedir DIR;
5       foreach my $dir (@dirs) {
6           print "checking $dir\n";
7           &search_n_replace($dir,$search,$replace,$files);
8       }
9   }

[Line 3 is one long line. It has been split for formatting purposes.]

This little tidbit of code is powerful and potentially confusing, so let's step through it. Line 1 checks whether the user turned on the R switch on the command line. If they didn't, the whole block is skipped. If they did, Line 2 opens a directory handle on the directory we're currently processing so that we can list what is inside it.

Line 3 wraps several operations into one statement, so we'll read it slowly from right to left. The goal is a list of subdirectories. First, readdir returns every entry in the directory. Next, the regular expression throws away any entry whose name starts with a "." character. Imagine recursing through the ".." directory: you would loop forever. This also means hidden directories are never searched. Then map prepends the current directory to each remaining name, otherwise we wouldn't know the true path to the subdirectory when it's time to process the files. Lastly, grep -d keeps only the entries that are in fact directories. The resulting list is assigned to the @dirs array.
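Written out one step per statement, the right-to-left pipeline looks like the sketch below. The subdirs helper and its intermediate array names are hypothetical, and a lexical directory handle stands in for the bareword DIR.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full paths of the visible
# subdirectories of $dir, one pipeline stage per statement.
sub subdirs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    my @names = readdir $dh;                   # every entry, including . and ..
    closedir $dh;
    my @visible = grep { !/^\./ } @names;      # drop names starting with "."
    my @paths   = map  { "$dir/$_" } @visible; # prepend the parent directory
    return grep { -d $_ } @paths;              # keep directories only
}

print "$_\n" for subdirs('.');
```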

Now, we need to loop over each of those subdirectories on Line 5 and call the search_n_replace subroutine, passing all of the necessary parameters. And that's it. Now we can recursively process all of our files:

this2that.pl -R -d '/var/www' -f '*.html index.wml'
 -s 'Jon(ath[oa]n)?' -r 'Bob'

[The lines above are one line. They have been split for formatting purposes.]
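As an aside, Perl's core File::Find module can do the directory walk in place of the hand-written recursion. The sketch below is an alternative technique, not the approach this article builds. The files_under helper is hypothetical, and it selects names with a regular expression rather than a file glob.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every plain file under $dir whose
# name matches the regular expression $name_re.
sub files_under {
    my ($dir, $name_re) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the entry name without its path.
        # Do not descend into hidden directories such as ".cache".
        if (-d $_ && /^\../) { $File::Find::prune = 1; return; }
        push @found, $File::Find::name if -f $_ && /$name_re/;
    }, $dir);
    return @found;
}

print "$_\n" for files_under('.', qr/\.html$/);
```

Each path returned could then be handed to the per-file search and replace step.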


