Replacing Strings in Multiple Files - Page 4
July 16, 2001
Two things that I love to use in programming as much as possible
are abstraction and recursion, mainly because they
save me time. On a more practical note, our previous script isn't
going to help us much if we have to do a search and replace on
multiple files, and there may be hundreds of files in many
different subdirectories. It would still take a lot of time to run
the script against each file.
File globs enable us to expand an expression to include multiple
files. You may have used them before. For example, to change all
HTML files in a directory, we may want to say something like
*.html. That's reasonable and Perl supports it.
Another feature that comes in handy is the ability to run the
script against all subdirectories in addition to the local
directory. If you have a large site, it would be impractical to
change all your files by hand. So what we're
going to do next is to update our script to use recursion and
file globs. To do that, we'll need to add the ability to use
command-line switches using the Getopt::Std module.
The first thing we'll do besides adding command-line switches is
to add file globbing. To make the script more flexible, we'll be
adding options for specifying the directory we want to search,
the filename(s) we want to search, the string that we're
searching for, and the string that we want to replace the search
string with. Because we'll be adding recursion shortly, we'll be
putting the processing routine into a subroutine. Let's take a
look at the main body of the modified script below:
1 use strict;
2 use Getopt::Std;
3 use Cwd;
4 use vars qw($opt_f $opt_s $opt_r $opt_d $opt_R);
5
6 getopt('dfsr');
7 &Usage unless $opt_f && $opt_s && $opt_r;
8 my $dir = ($opt_d) ? $opt_d : getcwd;
9 &Usage unless -d $dir;
10
11 my @files=split(/ /,$opt_f);
12 &search_n_replace($dir,$opt_s,$opt_r,\@files);
In lines 1 through 4, we're importing the modules and variables
we'll be using in the script. On line 6, we're making a call to
the getopt function from the Getopt::Std module that reads
the command-line and processes the switches. The switches are
d for the directory, f for the filename(s),
s for the search string, and r for the replace
string. At a minimum, we need to have the filenames that we will
process and the search and replace strings. Line 7 prints an
error message with the correct syntax for the script if we don't
have at least those three pieces of information. Line 8 checks
for the d switch which specifies the directory to search.
It's also shorthand for saying:
if ($opt_d) {
  $dir = $opt_d;
} else {
  $dir = getcwd;
}
The getcwd function was imported from the Cwd
module and returns the full path of the current working
directory that the script was executed from. This is used as the
default directory if one was not specified on the command line.
Line 9 double checks the value of the $dir variable
to make sure it is actually a directory. Line 11 is where we pick
up the list of files that need to be parsed. This can be one
file, or multiple files or file globs separated by a space,
'*.html index.wml' for example. Line 12 passes the command line
parameters and filenames to the search_n_replace
subroutine, which we'll cover next.
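The Usage routine that lines 7 and 9 call isn't shown in the listing above. Here is a minimal sketch of how the switch handling fits together. The wording of the usage message and the simulated command line are assumptions made for illustration, not the script's actual code.

```perl
use strict;
use warnings;
use Getopt::Std;
use vars qw($opt_f $opt_s $opt_r $opt_d $opt_R);

# Hypothetical Usage routine; the message text is an assumption.
sub Usage {
    die "Usage: $0 [-R] [-d directory] -f 'file(s)' -s 'search' -r 'replace'\n";
}

# Simulate a command line so the sketch can run on its own.
@ARGV = ('-R', '-f', '*.html index.wml', '-s', 'Jonathan', '-r', 'Bob');

# d, f, s and r each take a value; any other switch, such as R,
# is treated as a simple on/off flag.
getopt('dfsr');
Usage() unless $opt_f && $opt_s && $opt_r;

# Split the file specification on spaces, as line 11 does.
my @files = split(/ /, $opt_f);
print "recursive: ", ($opt_R ? "yes" : "no"), "\n";
print "file spec: $_\n" for @files;
```

Because getopt treats unlisted switches as flags, a switch like R can be added without changing the call on line 6.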
One of the features of the script is the ability to include a
regular expression in the search string. So if we wanted to
replace Jonathan with Bob in all HTML files in the current
directory and the name of the script was this2that.pl, the
command line argument would look like the following:
this2that.pl -f '*.html' -s 'Jon(ath[oa]n)?' -r 'Bob'
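To see what that pattern would match, here is a small standalone sketch that applies it to a few made-up lines. The sample text is an assumption. The substitution uses the same i and g modifiers as the script.

```perl
use strict;
use warnings;

my $search  = 'Jon(ath[oa]n)?';   # the pattern from the command line above
my $replace = 'Bob';

# Made-up sample lines for illustration.
my @lines = ('Jonathan wrote this.', 'Ask JON or jonathon.', 'No match here.');

my @out;
for my $line (@lines) {
    # Work on a copy so the original sample line is left untouched.
    (my $copy = $line) =~ s/$search/$replace/ig;
    push @out, $copy;
    print "$copy\n";
}
```

The i modifier makes the match case-insensitive, so JON and jonathon are both replaced, and the g modifier replaces every occurrence on a line rather than just the first.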
If you do use a regular expression, be sure to leave off the
forward slashes and modifiers. Those already exist inside the
script. Now it's time to take a look at the search_n_replace
function, the real workhorse of the script.
1  sub search_n_replace {
2    my ($dir,$search,$replace,$files) = @_;
3    my @thesefiles = @$files;
4
5    for (my $i=0; $i < @thesefiles; $i++) {
6      $thesefiles[$i] = "$dir/$thesefiles[$i]";
7    }
8    while (my $file=<@thesefiles>) {
9      open(IN,$file);
10     open(OUT,">$file.$$");
11     my $changed = 0;
12     while (<IN>) {
13       s/$search/$replace/ig && ($changed = 1)
14         && print "$file - Changed $search to $replace\n";
15       print OUT;
16     }
17     close(IN);
18     close(OUT);
19     ($changed==1) ? rename("$file.$$",$file) :
         unlink "$file.$$";
20   }
21 }
[Line 19 is one long line. It has been split for formatting
purposes.]
Line 2 pulls in the parameters that were passed from the main
body of the script. Line 3 grabs the list of files to process
from the array reference. Lines 5-7 prepend the directory name to
each file in the list. Remember that the user can specify what
directory to search. Lines 8-20 are the main body of the
subroutine. Line 8 loops through each filename in the list that was
specified on the command line. This is also where the file
globbing magic comes in. If one of the items was really a glob,
like *.html, then Perl interpolates that into a list of files
based on the glob.
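The angle brackets on line 8 can look mysterious, so here is a rough standalone sketch of what they do. When an array appears inside them, Perl joins its elements with spaces and passes the result to glob. The scratch directory and file names below are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with made-up sample files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(index.html about.html index.wml notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!\n";
    close($fh);
}

# Prepend the directory to each file spec, as lines 5-7 do.
my @thesefiles = map { "$dir/$_" } ('*.html', 'index.wml');

# <@thesefiles> interpolates the array into one space-separated
# string and globs it; calling glob directly gives the same list.
my @matched = sort glob("@thesefiles");
print "$_\n" for @matched;
```

One thing to keep in mind is that glob splits its argument on whitespace, so a directory name containing a space would be treated as two separate patterns.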
Lines 9 and 10 open the source file for reading and the temporary
file that the changes will be written to. Line 12 loops over each
line of the source file that we're searching. Line 13 contains
our search and replace operator and uses the values that were
passed from the command line. If a change was made, the variable
$changed is set to 1 to signify that a match was
made on the source file. Lines 17 and 18 close the input and
temporary filehandles. In Line 19, if a change was made, the
temporary file overwrites the original source file, otherwise
it's deleted.
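The write-to-a-temporary-file-then-rename step can also be sketched on its own. The version below is not the script's code: the replace_in_file name, the lexical filehandles, and the error checks are assumptions added for illustration. Unlike the listing above, it stops with an error if a file can't be opened.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite one file through a temporary copy.
# Returns the number of lines that were changed.
sub replace_in_file {
    my ($file, $search, $replace) = @_;
    my $tmp = "$file.$$";    # $$ is the process ID, as in the script

    open(my $in,  '<', $file) or die "Cannot read $file: $!\n";
    open(my $out, '>', $tmp)  or die "Cannot write $tmp: $!\n";

    my $changed = 0;
    while (my $line = <$in>) {
        $changed++ if $line =~ s/$search/$replace/ig;
        print {$out} $line;
    }
    close($in);
    close($out) or die "Cannot close $tmp: $!\n";

    if ($changed) {
        # Replace the original with the edited copy.
        rename($tmp, $file) or die "Cannot rename $tmp to $file: $!\n";
    } else {
        # Nothing matched, so throw the copy away.
        unlink($tmp) or warn "Cannot remove $tmp: $!\n";
    }
    return $changed;
}

# Try it on a throwaway file with made-up contents.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "Jonathan was here.\nNothing to see.\n";
close($fh);

my $count = replace_in_file($name, 'Jon(ath[oa]n)?', 'Bob');
print "$count line(s) changed\n";
```

Writing to a separate file first means the original is only replaced once the whole file has been processed.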
Pretty neat so far. Now we need to add recursion to enable the
script to search sub-directories. Working with recursion can be
magical and frustrating at the same time. The basic concept for
our purposes is that when we are processing files inside the
search_n_replace subroutine, it will call itself
when it finds a subdirectory so that it can process those files
and so on. It will recurse as deep as there are subdirectories.
We will want to add a switch for recursion since it won't be a
feature that's used all the time.
To make recursion work, we need to add some extra code between
Lines 20 and 21 in the subroutine listing above:
1  if ($opt_R) {
2    opendir(DIR,$dir) || die "Cannot open $dir: $!\n";
3    my @dirs = grep -d, map "$dir/$_", grep
       !/^\./, readdir DIR;
4    closedir DIR;
5    foreach my $dir (@dirs) {
6      print "checking $dir\n";
7      &search_n_replace($dir,$search,$replace,$files);
8    }
9  }
[Line 3 is one long line. It has been split for formatting
purposes.]
This little tidbit of code is very powerful and potentially
confusing so let's step through it. The first thing we're doing
is checking to see if the user turned on the command-line R
switch. If they didn't, we just ignore the whole block of code. If
they did, Line 2 opens a directory handle so that we can get a list of
the entries inside the directory that we're currently processing.
Line 3 has multiple operations wrapped into one statement so we'll
read it slowly from right to left. First, the goal here is to get
a list of the subdirectories. So the first thing we do is get
that list with the readdir function. The next thing
we need to do is to get rid of any directories that start with a
"." character. Imagine recursing through the ".." directory. You
would loop forever. This elimination is done with a regular
expression that eliminates those unwanted directories. The next
thing we need to do is prepend the current directory onto each
subdirectory name, otherwise we won't know the true path to the
subdirectory when it's time to process the files. Lastly, we
double check each directory listing returned by
readdir to make sure it's in fact a real directory.
The remaining list is returned to the @dirs array.
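If the one-line version is hard to follow, the same chain can be unrolled into separate steps. This is only a sketch: the list_subdirs name, the lexical directory handle, and the scratch directory layout are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the visible subdirectories of $dir
# as full paths, one step at a time.
sub list_subdirs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    my @entries = readdir($dh);                  # every name, including . and ..
    closedir($dh);

    my @visible = grep { !/^\./ } @entries;      # drop names starting with a dot
    my @paths   = map  { "$dir/$_" } @visible;   # prepend the parent directory
    my @dirs    = grep { -d $_ } @paths;         # keep only real directories
    return sort @dirs;
}

# Made-up layout: two visible directories, one hidden one, one file.
my $top = tempdir(CLEANUP => 1);
for my $name (qw(docs images .hidden)) {
    mkdir("$top/$name") or die "Cannot create $name: $!\n";
}
open(my $fh, '>', "$top/index.html") or die "Cannot create file: $!\n";
close($fh);

my @found = list_subdirs($top);
print "$_\n" for @found;
```

Note that the dot filter skips every name that begins with a period, so hidden directories are left alone along with "." and "..".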
Now, we need to loop over each of those subdirectories on Line 5
and call the search_n_replace subroutine, passing
all of the necessary parameters. And that's it. Now we can
recursively process all of our files:
this2that.pl -R -d '/var/www' -f '*.html index.wml'
-s 'Jon(ath[oa]n)?' -r 'Bob'
[The lines above are one line. They have been split for
formatting purposes.]