Tips for Building a Cache-Aware Site
June 21, 1999
Besides using freshness information and validation, there are a number
of other things you can do to make your site more cache-friendly.
- Refer to objects consistently - this is the
golden rule of caching. If you serve the same content on different
pages, to different users, or from different sites, it should use
the same URL. This is the easiest and most effective way to make
your site cache-friendly. For example, if you use /index.html in
your HTML as a reference once, always use it that way.
- Use a common library of images and other
elements and refer back to them from different places.
- Make caches store images and pages that don't change
often by specifying a far-away Expires header.
- Make caches recognize regularly updated pages
by specifying an appropriate expiration time.
- If a resource (especially a downloadable file) changes,
change its name. That way, you can make it expire far in
the future, and still guarantee that the correct version is served;
the page that links to it is the only one that will need a short
expiry time.
- Don't change files unnecessarily. If you do, everything
will have a falsely young Last-Modified date. For instance, when updating
your site, don't copy over the entire site; just move the files that you've
changed.
- Use cookies only where necessary - cookies are
difficult to cache, and aren't needed in most situations. If you
must use a cookie, limit its use to dynamic pages.
- Minimize use of SSL - because encrypted pages
are not stored by shared caches, use them only when you have to,
and use images on SSL pages sparingly.
- Use the Cacheability
Engine - it can help you apply many of the concepts in this
tutorial.
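The renaming and far-away expiry tips above can be sketched in a few lines of Python. This is only an illustration, not part of any particular server: the helper names versioned_name and far_future_headers are hypothetical, and the choice of an 8-character SHA-256 prefix and a one-year lifetime are assumptions made for the example.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def versioned_name(filename: str, content: bytes) -> str:
    """Embed a short content hash in the name (hypothetical helper).

    When the content changes, the name (and so the URL) changes with it.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension to preserve
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def far_future_headers(now: datetime, days: int = 365) -> dict:
    """Freshness headers for a resource that never changes under this name.

    `now` must be a timezone-aware UTC datetime; HTTP dates are always GMT.
    """
    expires = now + timedelta(days=days)
    return {
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={days * 86400}",
    }


if __name__ == "__main__":
    print(versioned_name("logo.gif", b"first version of the image"))
    print(far_future_headers(datetime.now(timezone.utc)))
```

Only the page that links to the versioned file then needs a short expiry time, since it is the one place where the new name has to be picked up.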
By default, most scripts won't return a validator (e.g., a Last-Modified
or ETag HTTP header) or freshness information (Expires or Cache-Control).
While some scripts really are dynamic (meaning that they return a different
response for every request), many (like search engines and database-driven
sites) can benefit from being cache-friendly.
Generally speaking, if a script produces output that is reproducible with
the same request at a later time (whether it be minutes or days later), it
should be cacheable. If the output of the script depends only on
what's in the URL, it is cacheable; if the output depends
on a cookie, authentication information or other external criteria, it
probably isn't.
- The best way to make a script cache-friendly (as well as perform
better) is to dump its content to a plain file whenever it changes. The Web
server can then treat it like any other Web page, generating and using
validators, which makes your life easier. Remember to only write files that
have changed, so the Last-Modified times are preserved.
- Another way to make a script cacheable in a limited fashion is
to set an age-related header for as far in the future as practical.
Although this can be done with Expires, it's probably easiest to do
so with Cache-Control: max-age, which keeps the response fresh for a set
amount of time after it is requested.
- If you can't do that, you'll need to make the script generate a
validator, and then respond to If-Modified-Since and/or
If-None-Match requests. This can be done by parsing the HTTP
headers, and then responding with 304 Not Modified when
appropriate. Unfortunately, this is not a trivial task.
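To make the last two options more concrete, here is a minimal sketch of a script that sets Cache-Control: max-age, generates an ETag validator, and answers If-None-Match with 304 Not Modified. It is not tied to any real framework: the respond function, its (status, headers, body) return shape, and the assumption that request headers arrive as a plain dict with canonical names are all made up for illustration. If-Modified-Since handling is left out to keep it short.

```python
import hashlib


def respond(body: bytes, request_headers: dict, max_age: int = 3600):
    """Return (status, headers, body) for a GET request (hypothetical helper)."""
    # Validator: a quoted hash of the output, so identical output
    # always produces an identical ETag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Cache-Control": f"max-age={max_age}",
    }

    # If-None-Match may carry several comma-separated tags.
    client_tags = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):  # weak comparison ignores the W/ prefix
            tag = tag[2:]
        client_tags.append(tag)

    if etag in client_tags or "*" in client_tags:
        # The cache's copy is still good: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b"<p>search results</p>", {})
    print(status, headers)
    print(respond(b"<p>search results</p>", {"If-None-Match": headers["ETag"]})[0])
```

Hashing the output means the script still does its work on every request; the saving is in bandwidth. A script that can tell cheaply whether its data has changed could skip generating the body as well.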
Some other tips:
- If you have to use scripting, don't POST unless
it's appropriate. The POST method is (practically) impossible to cache; if
you send information in the path or query (via GET), caches can store that
information for the future. POST, on the other hand, is good for sending
large amounts of information to the server (which is why it won't be cached;
it's very unlikely that the same exact POST will be made twice).
- Don't embed user-specific information in the
URL unless the content generated is completely unique to
that user.
- Don't count on all requests from a user coming from the
same host, because caches often work together.
- Generate Content-Length response headers. It's
easy to do, and it will allow the response of your script to be
used in a persistent connection. This allows a client (whether
a proxy or a browser) to request multiple objects on one TCP/IP connection,
instead of setting up a connection for every request. It makes your site
seem much faster.
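As a rough sketch of the Content-Length tip, a script can build its whole body first, measure it, and only then write the headers. The build_response function below is a hypothetical helper written in the style of CGI output; the default content type is an assumption for the example.

```python
def build_response(text: str, content_type: str = "text/html; charset=utf-8") -> bytes:
    """Return header block plus body, with an accurate Content-Length."""
    # Encode first: Content-Length counts bytes, not characters.
    body = text.encode("utf-8")
    head = (
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


if __name__ == "__main__":
    print(build_response("<p>Hello, cache!</p>").decode("utf-8"))
```

Measuring the encoded bytes matters because any non-ASCII character takes more than one byte in UTF-8, and a wrong length will confuse clients that are reusing the connection.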
See the
Implementation Notes for more
specific information.