Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions


Enter the Perl - Page 4

August 28, 2000

We haven't seen much Perl yet in this alleged Perl You Need to Know installment. Fair enough... we've merely been setting the stage, priming the pump, sowing the seeds, raking the muck...

To demonstrate the integration of HTMLDOC into a Perl CGI script, we've concocted a simple but useful example. Our script, when executed via a web server, accepts a CGI parameter (such as a form field) named innerHTML. It then converts innerHTML to PDF on the fly, without reading from or writing to any temporary files, and sends the result directly to the user's browser.

On a machine with Adobe Acrobat Reader installed, the browser should recognize the data stream as MIME type application/pdf and launch Acrobat Reader. A browser without this capability will probably ask the user what he or she would like to do with the file and/or where to save it. These two screenshots illustrate a front-end page we built for the CGI script and the results produced by the script.

Rough front-end to the CGI script:


Results in PDF as rendered by Adobe Acrobat Reader:

Where's the magic? The concise script below brings together the wonderful worlds of Perl and HTMLDOC.

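#!/usr/bin/perl
#Generate on-the-fly PDF with HTMLDOC
use strict;
use warnings;
use CGI;

#grab input data from CGI (empty string if the parameter is missing)
my $cgiobj=CGI->new;
my $innerHTML=$cgiobj->param("innerHTML") // "";

#process input data to escape special characters
#(the text passes through Perl interpolation inside eval, then the shell, then echo -e)
my %specialChars=(
  "'","´",      # a single quote would end the shell's '...' string
  "`","\\`",    # a backtick would end the eval'd `...` command string
  "\$","\\\$",  # stop Perl from interpolating scalars
  "\@","\\\@",  # ... and arrays
  "\!","\\\!",
  "\n","",
  "\r","");
my $specialCharList=join("",keys %specialChars);
#double every backslash first; eval's interpolation and echo -e each halve them again
$innerHTML=~s/\\/\\\\\\\\/g;
$innerHTML=~s/([$specialCharList])/$specialChars{$1}/g;

#Output PDF header to browser
binmode STDOUT;   # PDF data is binary
print "Content-Type: application/pdf\r\n\r\n";

#Create and output PDF binary stream to browser
if ($innerHTML) {
  my $command="`echo -e '".$innerHTML."' | htmldoc --webpage -t pdf - `";
  print eval($command);
}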

Our script begins by reading the CGI parameter innerHTML, which contains the HTML code that we want to convert to PDF (CGI.pm has already decoded the form encoding for us). You might skip this code if you incorporate this script into a larger program that internally generates the HTML you wish to convert, rather than receiving it from an outside source.

However you get the HTML source text, we need to escape special characters that would otherwise cause problems on the text's way through Perl's eval, the shell, and echo. An efficient way to perform a whole set of search-and-replace operations is to combine a hash with a substitution regular expression. Here we create a hash whose keys are the characters we want to replace and whose values are their replacements. We figured out these problem characters and their solutions mostly through trial and error. Once the hash is created, the scalar string $specialCharList holds all of the substitution keys joined together, which we'll use momentarily.
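As a sketch of the hash-plus-regex technique on its own, the hypothetical subroutine below builds the character class from the hash keys, running each key through quotemeta so that characters such as ], ^, - or \ cannot change the meaning of the class. Like the script, it assumes every key is a single character; the names substitute_chars and %escapes are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character of $text that is a key in
# %$replacements with its value, in a single regex pass.
# Assumes each key is exactly one character and the hash is not empty.
sub substitute_chars {
    my ($text, $replacements) = @_;
    # quotemeta escapes each key, so characters like ] ^ - \ stay literal inside [...]
    my $class = join '', map { quotemeta($_) } keys %$replacements;
    # the capture group "remembers" the matched key; $1 looks up its replacement
    $text =~ s/([$class])/$replacements->{$1}/g;
    return $text;
}

my %escapes = ('$' => '\\$', '@' => '\\@', "\n" => '');
print substitute_chars("cost: \$5 \@ noon\n", \%escapes), "\n";   # cost: \$5 \@ noon
```

Unlike the script's version, this one does not depend on the keys happening to be safe inside a character class.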

One character that posed serious problems was the backslash (\). The only successful way to escape it was a substitution of its own, the odd-looking line made up of a series of backslashes. That substitution has to run first, because the hash substitutions insert backslashes of their own, and those must not be doubled again. Having escaped the backslashes, we then escape the rest of the substitution set using the intriguing line of code:

$innerHTML=~s/([$specialCharList])/$specialChars{$1}/g;

The square brackets in the regexp's match clause form a character class, which matches any single character listed inside them; since every key in our hash is a single character, the expression matches any of the substitution keys. Each matched key is replaced by its replacement string from the hash, relying on the parentheses to "remember" (capture) the matched key, which the right-hand expression uses as the variable $1.
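Why must the backslash pass come first? The sketch below runs two simplified passes in both orders. The subroutine names are hypothetical, and the passes are reduced to plain backslash doubling (rather than the script's quadrupling) and a single rule that escapes $. Escaping $ first inserts a new backslash, which the doubling pass then doubles as well, so the escape no longer protects the $.

```perl
use strict;
use warnings;

# Simplified stand-ins for the script's two passes (hypothetical names):
# plain backslash doubling, and escaping $ with a backslash.
sub double_backslashes { my ($s) = @_; $s =~ s/\\/\\\\/g; return $s; }
sub escape_dollars     { my ($s) = @_; $s =~ s/\$/\\\$/g; return $s; }

my $input = 'C:\dir $HOME';

# Correct order: double the existing backslashes, then add the new escapes.
my $good = escape_dollars(double_backslashes($input));
# Wrong order: the backslash just added in front of $ gets doubled as well.
my $bad  = double_backslashes(escape_dollars($input));

print "$good\n";   # C:\\dir \$HOME
print "$bad\n";    # C:\\dir \\$HOME

# Interpolating the correct result as a double-quoted Perl string (roughly what
# the script's eval does) restores the original text.
print eval(qq{"$good"}), "\n";   # C:\dir $HOME
```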

At last, the HTML source text has been adequately escaped. The next and very important step is to prepare the browser to receive data of the MIME type application/pdf, which we do by printing a Content-Type header followed by a blank line (the \r\n\r\n) that marks the end of the headers. The browser will take the appropriate action when it receives this header, hopefully launching Adobe Acrobat Reader if the user has installed it.
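A CGI response header block is one or more "Name: value" lines followed by an empty line. The hypothetical helper below builds that block for a PDF. The Content-Length line is an optional addition that the script itself does not send, and the sketch assumes $pdf holds raw bytes (so that length counts bytes, not characters). binmode STDOUT keeps Perl from applying any text-mode translation to the binary data.

```perl
use strict;
use warnings;

# Hypothetical helper: build CGI response headers for a PDF of known size.
# Each header line ends in CRLF; the extra CRLF (blank line) ends the header block.
sub pdf_headers {
    my ($pdf_bytes) = @_;
    return "Content-Type: application/pdf\r\n"
         . "Content-Length: " . length($pdf_bytes) . "\r\n"
         . "\r\n";
}

binmode STDOUT;                  # write the PDF bytes unaltered
my $pdf = '%PDF-1.4 ...';        # placeholder text, not a real PDF
print pdf_headers($pdf), $pdf;
```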

Because HTMLDOC is an external program, we need to execute a system call to run it. First, we construct the command, complete with the echo -e statement that pipes the HTML source text into HTMLDOC; the trailing - tells HTMLDOC to read its input from that pipe. Notice the backticks (`) surrounding the command: this is how Perl runs an external command when we want to capture the output it produces. In this case, after the HTML is piped into HTMLDOC, we want to receive HTMLDOC's output (the binary PDF data) so that we can send it on to the browser.
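Backticks hand the whole command line to the shell, which is why the script has to escape quotes and other special characters in the first place. As an alternative sketch (not the approach used in this article), the core module IPC::Open2 can start a program with an explicit argument list, bypassing the shell, and connect pipes to both its standard input and standard output. The helper name run_filter is hypothetical. The sketch assumes the program reads all of its input before writing much output; otherwise both pipes could fill up and deadlock. We assume HTMLDOC behaves this way, since it needs the whole document before it can lay out the PDF.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: run @$cmd without a shell, feed it $input on stdin,
# and return everything it writes to stdout. Dies if the command fails.
sub run_filter {
    my ($cmd, $input) = @_;
    my $pid = open2(my $from_child, my $to_child, @$cmd);
    binmode $to_child;
    binmode $from_child;
    print {$to_child} $input;
    close $to_child;                                  # signal end of input
    my $output = do { local $/; <$from_child> };      # slurp all output
    close $from_child;
    waitpid($pid, 0);                                 # sets $? to the exit status
    die "@$cmd exited with status " . ($? >> 8) . "\n" if $?;
    return $output;
}

# With HTMLDOC installed, the call would look something like:
#   my $pdf = run_filter(['htmldoc', '--webpage', '-t', 'pdf', '-'], $html);
print run_filter(['tr', 'a-z', 'A-Z'], "hello, world\n");   # HELLO, WORLD
```

Because each argument is passed to the program separately, no quoting or escaping of the input is needed at all.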

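Perl's eval() function interprets a string as Perl code. Because the string in $command is itself a backtick expression, eval first interpolates it like a double-quoted string (this is why $, @ and backslashes had to be escaped), then runs the command and returns its output. We print that result, in effect sending the binary PDF data directly to the user's browser.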
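The reason the script escapes $ and @ is that eval compiles the command string as Perl source, so the text between the backticks is interpolated just like a double-quoted string before the shell ever sees it. The short sketch below uses a double-quoted string instead of backticks, so nothing is executed; the variable $name is purely illustrative.

```perl
use strict;
use warnings;

our $name = 'World';   # illustrative package variable

my $raw     = 'Hello $name';     # text containing a literal "$name"
my $escaped = 'Hello \\$name';   # the same text with $ escaped, as the script does

# eval compiles each string as Perl source, so "..." interpolates variables.
print eval(qq{"$raw"}), "\n";      # Hello World  (the $name was interpolated)
print eval(qq{"$escaped"}), "\n";  # Hello $name  (the escape kept it literal)
```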

It's important to remember that HTTP headers can be sent only once per response, before any of the body; therefore, the application/pdf header must be the first output the script sends to the browser.
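One consequence of the headers-first rule is that once the script has printed the application/pdf header, it can no longer switch to an error page if HTMLDOC fails. A sketch of one way around this, assuming the PDF data (or undef on failure) is already in hand: convert first, then choose which headers to print. The subroutine name build_response and the error text are hypothetical; the Status header is the standard CGI way for a script to set the HTTP status code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the complete CGI response (headers + body),
# choosing the Content-Type only after the conversion result is known.
sub build_response {
    my ($pdf) = @_;   # PDF bytes, or undef if the conversion failed
    if (defined $pdf && length $pdf) {
        return "Content-Type: application/pdf\r\n\r\n" . $pdf;
    }
    return "Status: 500 Internal Server Error\r\n"
         . "Content-Type: text/plain\r\n\r\n"
         . "Sorry, the PDF could not be generated.\n";
}

binmode STDOUT;
print build_response(undef);   # prints the text/plain error response
```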
