Enter the Perl - Page 4
August 28, 2000
We haven't seen much Perl yet in this alleged Perl You Need to Know
installment. Fair enough ... we've merely been setting the stage, priming
the pump, sowing the seeds, raking the muck ...
To demonstrate the integration of HTMLDOC into a Perl CGI script, we've
concocted a simple but useful example. Our script, when executed via a web
server, accepts a CGI parameter (such as a form field) named innerHTML. It
then converts innerHTML to a PDF file on the fly, without reading from or
writing to any temporary files, and sends the PDF directly to the user's
browser.
On a machine with Adobe Acrobat Reader installed, the browser should
recognize the data stream's MIME type, application/pdf, and launch Acrobat
Reader. A browser without this capability will probably ask the user what
to do with the file and/or where to save it. The two screenshots below show
a front-end page we built for the CGI script and the results the script
produced.
[Screenshot: rough front-end to the CGI script]
[Screenshot: results in PDF as rendered by Adobe Acrobat Reader]
Where's the magic? The concise script below brings together the wonderful
worlds of Perl and HTMLDOC.
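#!/usr/bin/perl
# Generate on-the-fly PDF with HTMLDOC
use strict;
use warnings;
use CGI;

# Grab input data from CGI (empty string if the parameter is missing)
my $cgiobj = CGI->new;
my $innerHTML = $cgiobj->param("innerHTML") // "";

# Escape characters that Perl would interpret when the command string is
# eval'd, or that would break out of the shell's single quotes
my %specialChars = (
    "'"  => "´",      # a single quote would end the shell's quoted string
    "\$" => "\\\$",   # keep Perl from interpolating scalars
    "\@" => "\\\@",   # keep Perl from interpolating arrays
    "\!" => "\\\!",
    "`"  => "\\`",    # a backtick would end the eval'd backtick expression
    "\n" => "",       # strip line breaks
    "\r" => "");
my $specialCharList = join("", map { quotemeta } keys %specialChars);
# Backslashes go first; each becomes four because the text is unescaped
# twice: by Perl's backtick interpolation, then by echo -e
$innerHTML =~ s/\\/\\\\\\\\/g;
$innerHTML =~ s/([$specialCharList])/$specialChars{$1}/g;

# Output PDF header to browser; binmode keeps the binary PDF bytes intact
binmode STDOUT;
print "Content-Type: application/pdf\r\n\r\n";

# Create and output PDF binary stream to browser
if (length $innerHTML) {
    my $command = "`echo -e '" . $innerHTML . "' | htmldoc --webpage -t pdf - `";
    print eval($command);
}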
Our script begins by accepting input from the CGI environment: the
parameter innerHTML, which contains the HTML code that we want to convert
to PDF (CGI.pm has already decoded it from the URL-encoded form data). You
can skip this step if you incorporate this script into a larger program
that generates the HTML internally rather than receiving it from an outside
source.
However you get the HTML source text, we need to escape special characters
that would otherwise cause problems, because the text is about to be
embedded in a command string that Perl evaluates and then hands to the
shell. An efficient way to perform a set of search-and-replace operations
is to combine a hash with a single substitution regular expression. Here we
create a hash whose key-value pairs are the substitutions we want to
perform on the original source text. We identified these problem
characters and their replacements mostly through trial and error. Once the
hash is created, we join its keys into the scalar string $specialCharList,
which we'll use momentarily.
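One character that posed serious problems was the backslash (\). The only
successful means of escaping it required a substitution of its own, the
odd-looking line full of backslashes. It must run before the other
substitutions so that the backslashes they insert are not multiplied in
turn. Each backslash becomes four because the text is unescaped twice: once
when Perl interpolates the backtick string, and again when echo -e
interprets backslash escapes.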
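To see why one backslash must become four, the short illustration below follows a single backslash through both unescaping layers. It is a sketch under stated assumptions: the Perl layer is simulated with an eval'd qq{} string (backticks interpolate the same way a double-quoted string does), the echo -e layer is simulated with a substitution rather than by actually running echo, and the sample path is made up.

```perl
use strict;
use warnings;

# Illustration only: follow one backslash through the script's escaping.
my $html = 'C:\temp';                      # contains exactly one backslash
(my $escaped = $html) =~ s/\\/\\\\\\\\/g;  # the script's substitution: 1 -> 4

# Layer 1 (simulated): Perl interpolation of the eval'd string turns \\ into \
my $after_perl = eval "qq{$escaped}";

# Layer 2 (simulated): echo -e also turns \\ into \
(my $after_echo = $after_perl) =~ s/\\\\/\\/g;

# tr/\\// counts backslashes without modifying the string
printf "%d, %d, %d backslash(es)\n",
    ($escaped =~ tr/\\//), ($after_perl =~ tr/\\//), ($after_echo =~ tr/\\//);
```

With only two backslashes, the Perl layer would leave C:\temp, and echo -e would then turn \t into a tab character.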
Having escaped those, we then escape the rest of the substitution set using
the intriguing line of code:
$innerHTML=~s/([$specialCharList])/$specialChars{$1}/g;
The square brackets form a character class, so the pattern matches any
single character in $specialCharList, that is, any key of our hash (this
works because every key is exactly one character). The parentheses capture
the matched character so that the right-hand side can refer to it as $1
and substitute the corresponding value from %specialChars.
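The hash-plus-character-class technique works for any table of single-character substitutions. Here is a minimal standalone sketch; the table is hypothetical (three HTML-special characters mapped to entities, not the article's table), and quotemeta is an addition that keeps any key from changing the meaning of the character class.

```perl
use strict;
use warnings;

# Hypothetical substitution table, for illustration only.
my %subs = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

# Build a character class from the keys; quotemeta keeps keys that are
# special inside [...] (such as ] or ^) from altering the pattern.
my $class = join '', map { quotemeta } keys %subs;

my $text = 'a < b & c > d';
# Capture each matching character in $1 and swap in its value from the hash.
$text =~ s/([$class])/$subs{$1}/g;

print "$text\n";   # a &lt; b &amp; c &gt; d
```

Because s///g resumes scanning after each replacement, the & inserted by an entity is never re-matched.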
Finally, the HTML source text has been adequately escaped. The next, and
very important, step is to prepare the browser to receive data of the MIME
type application/pdf by printing a Content-Type header followed by a blank
line, which marks the end of the headers. The browser takes the appropriate
action when it receives this header: hopefully, it launches Adobe Acrobat
Reader if the user has installed it.
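The header step can also be packaged as a small subroutine. In this sketch the name send_pdf is hypothetical, and it assumes the PDF bytes are already in memory (as they are once the backticks return), which lets it add a Content-Length header that the article's script does not send.

```perl
use strict;
use warnings;

# Hypothetical helper: write a complete CGI response for a PDF held in memory.
sub send_pdf {
    my ($fh, $pdf_bytes) = @_;
    binmode $fh;   # no newline translation or encoding layers on binary data
    # Headers come first; an empty line ("\r\n" on its own) ends them.
    print {$fh} "Content-Type: application/pdf\r\n";
    print {$fh} "Content-Length: ", length($pdf_bytes), "\r\n";
    print {$fh} "\r\n";
    # Everything after the blank line is the body: the PDF itself.
    print {$fh} $pdf_bytes;
    return;
}

# Usage in a CGI script: send_pdf(\*STDOUT, $pdf);
```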
Because HTMLDOC is an external program, we need to run it as a separate
command. First, we construct the command, complete with the echo statement
that pipes the HTML source text into HTMLDOC. The --webpage option treats
the input as an unstructured web page rather than a book, -t pdf selects
PDF output, and the trailing - tells HTMLDOC to read from standard input.
Notice the backticks (`) surrounding the command: this is how Perl runs a
command when we want to receive whatever that command writes to standard
output. In this case, after the HTML is piped into HTMLDOC, we want
HTMLDOC's output (the binary PDF data) so that we can send it to the
browser.
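Here is a tiny, self-contained demonstration of backticks capturing a pipeline's output. The command tr stands in for HTMLDOC as the program on the receiving end of the pipe; the example assumes a Unix-like shell where echo and tr are available.

```perl
use strict;
use warnings;

# Backticks (or the equivalent qx//) run a command line through the shell
# and return whatever it wrote to standard output.
my $output = `echo hello | tr a-z A-Z`;
my $status = $? >> 8;   # exit status of the pipeline (its last command)

print "captured: $output";        # captured: HELLO
print "exit status: $status\n";   # exit status: 0
```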
Perl's eval() function interprets a string as Perl code. Because $command
holds a backtick expression, evaluating it runs the command and returns its
output, which we print to the browser. In effect, the binary PDF data goes
directly to the user's browser.
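For comparison, here is a sketch of a different technique, not the article's method: a list-form pipe open hands the HTML to the command's standard input without going through a shell, so no escaping is needed at all. The child process inherits the script's STDOUT, so in a CGI script its output goes straight to the browser. The subroutine name pipe_to_command is hypothetical.

```perl
use strict;
use warnings;
use IO::Handle;   # provides the flush method on STDOUT

# Hypothetical helper: feed $html to a command's stdin, bypassing the shell.
sub pipe_to_command {
    my ($html, @cmd) = @_;
    STDOUT->flush;   # push out anything already printed (e.g. the header) first
    # With more than one element in @cmd, Perl runs the program directly.
    open(my $to_cmd, '|-', @cmd) or die "cannot start $cmd[0]: $!";
    binmode $to_cmd;
    print {$to_cmd} $html;
    # close waits for the child; it returns false if the command failed.
    close($to_cmd) or die "$cmd[0] failed (status $?)";
    return;
}
```

In the CGI script, assuming htmldoc is on the web server's PATH, you would print the Content-Type header and then call pipe_to_command($innerHTML, 'htmldoc', '--webpage', '-t', 'pdf', '-') in place of the escaping and eval steps.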
It's important to remember that a response's HTTP headers are sent only
once, before any other output; therefore, the application/pdf header must
be the first thing the script prints.