Frequently Asked Questions
June 21, 1999
What are the most important things to make cacheable?
A good strategy is to identify the most popular, largest objects
(especially images) and work with them first.
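One way to find those objects is to add up the bytes served per URL from your server's access log. The sketch below assumes logs in the Common Log Format and counts only full 200 responses. The function name rank_objects is illustrative and not part of any tool.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [date] "METHOD URL PROTOCOL" status bytes
CLF_LINE = re.compile(r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) (\d+|-)')


def rank_objects(lines, top=10):
    """Return (url, hits, total_bytes) tuples, largest total_bytes first."""
    hits = defaultdict(int)
    total_bytes = defaultdict(int)
    for line in lines:
        match = CLF_LINE.match(line)
        if match is None:
            continue  # skip malformed or non-CLF lines
        url, status, size = match.groups()
        if status != "200" or size == "-":
            continue  # only full responses carry the object's body
        hits[url] += 1
        total_bytes[url] += int(size)
    ranked = sorted(total_bytes, key=total_bytes.get, reverse=True)
    return [(url, hits[url], total_bytes[url]) for url in ranked[:top]]
```

Objects near the top of this list are the best candidates for long freshness times.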
How can I make my pages as fast as possible with caches?
The most cacheable object is one with a long freshness time set.
Validation does help reduce the time that it takes to see an object, but the
cache still has to contact the origin server to see if it's fresh. If the
cache already knows it's fresh, it will be served directly.
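To make the difference concrete: a cache can answer from its copy without any network traffic only while the copy's age is below its freshness lifetime. The sketch below follows the HTTP/1.1 rule that Cache-Control: max-age takes precedence over Expires. It deliberately ignores the Age header and heuristic freshness, assumes headers arrive as a plain dict with canonical capitalization, and its function names are illustrative.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict:
    """Split 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers: dict) -> int:
    """Seconds the response may be served without contacting the origin."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if cc.get("max-age") is not None:  # max-age overrides Expires
        try:
            return max(0, int(cc["max-age"]))
        except ValueError:
            return 0
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            return 0  # an unparseable Expires is treated as already expired
    return 0  # no explicit freshness: the cache must validate


def is_fresh(headers: dict, age_seconds: int) -> bool:
    """True if a copy this old can be served directly from the cache."""
    return age_seconds < freshness_lifetime(headers)
```

When is_fresh returns False, the cache falls back to validation, which still costs a round trip to the origin server.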
I understand that caching is good, but I need to keep
statistics on how many people visit my page!
If you must know every time a page is accessed, select ONE small object
on a page (or the page itself), and make it uncacheable by giving it
suitable headers. For example, you could refer to a 1x1 transparent
uncacheable image from each page. The Referer header will contain
information about what page called it.
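A minimal sketch of such an image, written as a Python WSGI application so it can run under any WSGI server. The names pixel_app and hit_log are hypothetical, and a real counter would write to persistent storage rather than to a list in memory.

```python
# 43-byte transparent 1x1 GIF.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

hit_log = []  # hypothetical in-memory store of referring pages


def pixel_app(environ, start_response):
    # The Referer header names the page that embedded the image.
    hit_log.append(environ.get("HTTP_REFERER", "-"))
    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
        # HTTP/1.1 caches: no-store forbids keeping a copy,
        # no-cache forbids reuse without revalidation.
        ("Cache-Control", "no-cache, no-store"),
        # HTTP/1.0 caches: an Expires date in the past means already stale.
        ("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"),
    ]
    start_response("200 OK", headers)
    return [PIXEL_GIF]
```

For a quick local test, wsgiref.simple_server.make_server("", 8000, pixel_app).serve_forever() serves it on port 8000.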
Be aware that even this will not give truly accurate statistics
about your users, and is unfriendly to the Internet and your users;
it generates unnecessary traffic, and forces people to wait for
that uncached item to be downloaded. For more information about
this, see On Interpreting Access Statistics in the
references.
I've got a page that is updated often. How do I keep caches
from giving my users a stale copy?
The Expires header is the best way to do this. By setting the
server to expire the document based on its modification time, you
can automatically have caches mark it as stale a set amount of time
after it is changed.
For example, if your site's home page changes every day at 8am,
set the Expires header for 23 hours after the last modification
time. This way, your users will always get a fresh copy of the
page.
See also the Cache-Control: max-age header.
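As a worked example of the 8am case, the sketch below computes both header values in Python. It assumes the last-modification time is available as a timezone-aware datetime; the function names and the fixed 23-hour lifetime are illustrative.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumed policy from the example: the page changes daily, so expire it
# 23 hours after its last modification.
LIFETIME = timedelta(hours=23)


def expires_header(last_modified: datetime, lifetime: timedelta = LIFETIME) -> str:
    """HTTP-date for the Expires header; last_modified must be timezone-aware."""
    expires = (last_modified + lifetime).astimezone(timezone.utc)
    return format_datetime(expires, usegmt=True)


def max_age_header(last_modified: datetime, now: datetime,
                   lifetime: timedelta = LIFETIME) -> str:
    """Equivalent Cache-Control value, counted from the moment of the response."""
    remaining = (last_modified + lifetime - now).total_seconds()
    return f"max-age={max(0, int(remaining))}"
```

For a page last modified on Monday, June 21, 1999 at 08:00 GMT, expires_header gives "Tue, 22 Jun 1999 07:00:00 GMT", and a response sent an hour after the change would carry max-age=79200. Because max-age counts from when the response is sent, it has to be computed per response, whereas the Expires value stays the same until the page changes again. On Apache, mod_expires can base Expires on a file's modification time from configuration.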
How can I see which HTTP headers are set for an object?
To see what the Expires and Last-Modified headers are, open the
page with Netscape and select 'page info' from the View menu. This
will give you a list of the page and any objects (like images)
associated with it, along with their details.
To see the full headers of an object, you'll need to manually
connect to the Web server using a Telnet client. Depending on what
program you use, you may need to type the port into a separate
field, or you may need to connect to www.myhost.com:80 or
www.myhost.com 80 (note the space). Consult your Telnet client's
documentation.
Once you've opened a connection to the site, type a request for
the object. For instance, if you want to see the headers for
http://www.myhost.com/foo.html, connect to www.myhost.com, port 80,
and type:
GET /foo.html HTTP/1.1 [return]
Host: www.myhost.com [return][return]
Press the Return key every time you see [return]; make sure to
press it twice at the end. This will print the headers, and then
the full object. To see the headers only, substitute HEAD for
GET.
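If you'd rather script this than type it, the same request can be sent over a plain socket. This sketch mirrors the Telnet session: it sends the request lines with the CRLF line endings HTTP expects, then splits the status line and headers from the rest of the reply. It adds a Connection: close header (not in the Telnet example) so the server ends the connection after responding; the function names are illustrative.

```python
import socket


def build_request(method: str, host: str, path: str) -> bytes:
    """Request lines joined by CRLF; the trailing blank line ends the headers."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_head(raw: bytes):
    """Return (status_line, headers_dict) from a raw HTTP response."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return status, headers


def fetch_headers(host: str, path: str, port: int = 80, method: str = "HEAD"):
    """Connect, send the request, read until the server closes, parse the head."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(method, host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_head(b"".join(chunks))
```

fetch_headers("www.myhost.com", "/foo.html") would return the status line and a header dictionary; repeated header names overwrite one another in this simplified version. Python's http.client module offers the same thing at a higher level.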
My pages are password-protected; how do proxy caches deal with
them?
By default, pages protected with HTTP authentication are treated as
private; they will not be cached by shared caches. However, you can
mark authenticated pages public with a Cache-Control header;
HTTP/1.1-compliant caches will then allow them to be cached.
If you'd like the pages to be cacheable, but still authenticated
for every user, combine the Cache-Control public and
no-cache directives. This tells the cache that it must submit
the new client's authentication information to the origin server
before releasing the object from the cache.
Whether or not this is done, it's best to minimize use of
authentication; for instance, if your images are not sensitive, put
them in a separate directory and configure your server not to force
authentication for it. That way, those images will be naturally
cacheable.
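The rules for authenticated pages can be summarized as a small decision function. This sketch follows RFC 2616 section 14.8, which lets a shared cache store a response to an authenticated request only when the response carries public, s-maxage, or must-revalidate. The function names are hypothetical and the header parsing is simplified.

```python
def parse_cache_control(value: str) -> dict:
    """Split 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store_authenticated(response_headers: dict) -> bool:
    """May a shared cache store a response to a request that carried Authorization?"""
    cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    # Simplification: any 'private' (even with field names) blocks storage here.
    if "no-store" in cc or "private" in cc:
        return False
    # RFC 2616 section 14.8: only if one of these directives is present.
    return any(d in cc for d in ("public", "s-maxage", "must-revalidate"))


def must_revalidate_before_reuse(response_headers: dict) -> bool:
    """'no-cache': check with the origin, using the new client's credentials, every time."""
    cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    return "no-cache" in cc
```

With Cache-Control: public, no-cache both functions return True: the page may be stored, but every reuse goes back to the origin server for an authentication check.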
Should I worry about security if my users access my site
through a cache?
SSL pages are not cached (or decrypted) by proxy caches, so
you don't have to worry about that. However, because caches store
non-SSL requests and URLs fetched through them, you should be
conscious of security on unsecured sites; an unscrupulous
administrator could conceivably gather information about their
users.
In fact, any administrator on the network between your server
and your clients could gather this type of information. One
particular problem is when CGI scripts put usernames and passwords
in the URL itself; this makes it trivial for others to find and
use their login.
If you're aware of the issues surrounding Web security in
general, you shouldn't have any surprises from proxy caches.
I'm looking for an integrated Web publishing solution. Which
ones are cache-aware?
It varies. Generally speaking, the more complex a solution is,
the more difficult it is to cache. The worst are ones which
dynamically generate all content and don't provide validators; they
may not be cacheable at all. Speak with your vendor's technical
staff for more information, and see the Implementation notes
below.
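A validator is what lets a cache check a dynamically generated page cheaply. If a product provides none, a strong ETag can be derived from the generated bytes. The sketch below assumes the whole body is built in memory first, and its function names are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """A strong entity tag derived from the exact bytes of the response."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_response(body: bytes, if_none_match=None):
    """Return (status, headers, body), answering 304 when the client's copy matches."""
    etag = make_etag(body)
    headers = [("ETag", etag)]
    if if_none_match is not None:
        tags = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so ignore a W/ prefix.
            if tag.startswith("W/"):
                tag = tag[2:]
            tags.append(tag)
        if etag in tags or "*" in tags:
            return "304 Not Modified", headers, b""
    headers.append(("Content-Length", str(len(body))))
    return "200 OK", headers, body
```

The page still has to be generated to compute the tag, so this saves bandwidth and download time rather than server CPU.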
My images expire a month from now, but I need to change them in
the caches now!
The Expires header can't be circumvented from the server side;
unless the cache (either browser or proxy) runs out of room and has
to delete the objects, the cached copy will be used until it expires.
The most effective solution is to rename the files; that way,
they will be completely new objects, and loaded fresh from the
origin server. Remember that the page that refers to an object will
be cached as well. Because of this, it's best to make static images
and similar objects very cacheable, while keeping the HTML pages
that refer to them on a tight leash.
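Renaming by hand is error-prone. One common approach (not something this tutorial prescribes) is to put a hash of the file's contents into its name, so every change produces a new URL. A minimal sketch, with hypothetical helper names:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digits: int = 8) -> str:
    """Insert a short content hash before the extension: logo.gif -> logo.<hash>.gif."""
    digest = hashlib.sha256(content).hexdigest()[:digits]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def rewrite_references(html: str, renames: dict) -> str:
    """Naively replace old object names with new ones in a page."""
    for old, new in renames.items():
        html = html.replace(old, new)
    return html
```

The plain string replacement in rewrite_references is only for illustration. The pages that carry the new names still need short freshness times so caches pick up the new references.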
If you want to reload an object from a specific cache, you can
either force a reload while using that cache (in Netscape, holding
down Shift while pressing Reload does this by issuing a Pragma:
no-cache request header), or have the cache administrator delete
the object through their interface.
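The same forced reload can be sent from a script through a particular proxy. In this sketch the proxy address is hypothetical; the request carries Pragma: no-cache for HTTP/1.0 caches and Cache-Control: no-cache for HTTP/1.1 caches, and it only helps with caches that honor those request headers.

```python
import urllib.request

# Hypothetical proxy address; replace with the cache you want to refresh.
PROXY = "http://proxy.example.com:3128"


def reload_request(url: str) -> urllib.request.Request:
    """Build a GET that asks caches along the way not to answer from storage."""
    return urllib.request.Request(
        url,
        headers={
            "Pragma": "no-cache",         # understood by HTTP/1.0 caches
            "Cache-Control": "no-cache",  # the HTTP/1.1 equivalent
        },
    )


def fetch_through_proxy(url: str, proxy: str = PROXY):
    """Send the reload request via the given proxy; return status and headers."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    with opener.open(reload_request(url), timeout=10) as resp:
        return resp.status, dict(resp.getheaders())
```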
I run a Web Hosting service. How can I let my users publish
cache-friendly pages?
If you're using Apache, consider allowing them to use .htaccess
files, and provide appropriate documentation.
Otherwise, you can establish predetermined areas for various
caching attributes in each virtual server. For instance, you could
specify a directory /cache-1m that will be cached for one month
after access, and a /no-cache area that will be served with headers
instructing caches not to store objects from it.
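A sketch of the directory approach as WSGI middleware, using the /cache-1m and /no-cache areas from the example. The prefix table, the 30-day approximation of a month, and the function names are assumptions for illustration. Because max-age counts from when each response is sent, it gives the "one month after access" behavior.

```python
# Hypothetical per-area policies for one virtual server.
AREA_POLICIES = [
    ("/cache-1m/", "max-age=2592000"),  # 30 days, counted from each response
    ("/no-cache/", "no-store"),         # caches must not keep a copy
]


def cache_control_for(path: str):
    """Cache-Control value for a request path, or None to leave headers alone."""
    for prefix, value in AREA_POLICIES:
        if path.startswith(prefix):
            return value
    return None


def caching_middleware(app):
    """Wrap a WSGI app so responses in managed areas get the area's policy."""
    def wrapped(environ, start_response):
        policy = cache_control_for(environ.get("PATH_INFO", ""))

        def start(status, headers, exc_info=None):
            if policy is not None:
                # Replace any Cache-Control the application set itself.
                headers = [h for h in headers if h[0].lower() != "cache-control"]
                headers.append(("Cache-Control", policy))
            return start_response(status, headers, exc_info)

        return app(environ, start)
    return wrapped
```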
Whatever you are able to do, it is best to work with your
largest customers first on caching. Most of the savings (in
bandwidth and in load on your servers) will be realized from
high-volume sites.