How (and how not) to Control Caches
June 21, 1999
Although the Expires header is useful, it is still somewhat
limited; there are many situations where content is cacheable, but
the HTTP 1.0 protocol lacks a way to tell caches that it is,
or how to handle it.
HTTP 1.1 introduces a new class of headers, the
Cache-Control response headers, which allow Web publishers to
define how pages should be handled by caches. They include
directives to declare what should be cacheable, what may be stored
by caches, modifications of the expiration mechanism, and
revalidation and reload controls.
Interesting Cache-Control response headers include:
- max-age=[seconds] - specifies the maximum amount of time
that an object will be considered fresh. It is similar to Expires, but
allows more flexibility. [seconds] is the number of seconds, counted from the
time of the request, for which you wish the object to be fresh.
- s-maxage=[seconds] - similar to max-age, except that it
only applies to proxy (shared) caches.
- public - marks the response as cacheable, even
if it would normally be uncacheable. For instance, if your pages
are authenticated, the public directive makes them cacheable.
- no-cache - forces caches (both proxy
and browser) to submit the request to the origin server for
validation before releasing a cached copy, every time. This is
useful to ensure that authentication is respected (in
combination with public), or to maintain rigid object freshness,
without sacrificing all of the benefits of caching.
- must-revalidate - tells caches that they must obey
any freshness information you give them about an object. HTTP allows
caches to take liberties with the freshness of objects; by specifying this
header, you're telling the cache that you want it to strictly follow your
rules.
- proxy-revalidate - similar to must-revalidate,
except that it only applies to proxy caches.
For example:
Cache-Control: max-age=3600, must-revalidate
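To make the directives above concrete, here is a minimal Python sketch of how a cache might read a Cache-Control value and decide whether a stored copy is still fresh. The function names parse_cache_control and is_fresh are hypothetical, and the logic is deliberately simplified: it splits on commas (so quoted values containing commas are not handled) and it does not fall back to Expires or to heuristics when no lifetime is given.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        name = name.strip().lower()
        # Directives without a value (e.g. must-revalidate) map to True.
        directives[name] = value.strip().strip('"') if sep else True
    return directives


def is_fresh(directives, age_seconds, shared_cache=False):
    """Return True if a stored response of the given age is still fresh."""
    if "no-cache" in directives:
        # no-cache means the copy must be validated before every use.
        return False
    max_age = directives.get("max-age")
    if shared_cache and "s-maxage" in directives:
        # s-maxage takes the place of max-age for proxy (shared) caches.
        max_age = directives["s-maxage"]
    if not isinstance(max_age, str):
        # No explicit lifetime given; this sketch does not guess one.
        return False
    return age_seconds < int(max_age)


example = parse_cache_control("max-age=3600, must-revalidate")
print(example)
print(is_fresh(example, age_seconds=120))
```

With the example header, a copy that is two minutes old is treated as fresh; once its age reaches 3600 seconds it is not, and must-revalidate tells the cache not to serve it without checking with the origin server first.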
If you plan to use the Cache-Control headers, you should have a
look at the excellent documentation in the HTTP 1.1 draft; see
References and Further Information.
In How Web Caches Work, we said that
validation is used by servers and caches to communicate when an
object has changed. By using it, caches avoid having to download
the entire object when they already have a copy locally, but
they're not sure if it's still fresh.
Validators are very important; if one isn't
present, and there isn't any freshness information (Expires or Cache-Control)
available, most caches will not store an object at all.
The most common validator is the time that the document last
changed, the Last-Modified time. When a cache has an
object stored that includes a Last-Modified header, it can use it
to ask the server if the object has changed since the last time it
was seen, with an If-Modified-Since request.
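The exchange described above can be sketched from the server's side in Python. This is an illustration only, not how any particular server implements it; the function name respond_to_conditional_get is hypothetical, and it assumes the caller supplies the object's modification time as a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_to_conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET that may carry If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the condition
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            # Unchanged since the cache last saw it: no body is needed.
            return 304, headers
    # No usable validator, or the object has changed: send it in full.
    return 200, headers


modified = datetime(1999, 6, 21, 12, 0, 0, tzinfo=timezone.utc)
print(respond_to_conditional_get(modified, "Mon, 21 Jun 1999 12:00:00 GMT"))
```

A 304 response carries no body, which is where the saving comes from: the cache keeps using its local copy and only the headers cross the network.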
HTTP 1.1 introduced a new kind of validator called the ETag.
ETags are unique identifiers that are generated by the server and
change every time the object does. Because the server controls how
the ETag is generated, caches can be surer that if the ETag matches
when they make an If-None-Match request, the object really is the
same.
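As a rough Python illustration of ETag validation, the sketch below derives an ETag by hashing the object's content and checks it against an If-None-Match value. Hashing the body is only one assumed scheme; servers are free to build ETags however they like. The names make_etag and etag_matches are hypothetical.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the response body (bytes)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag):
    # If-None-Match uses weak comparison, so a leading W/ is ignored.
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, current_etag):
    """Return True if the If-None-Match value matches current_etag."""
    if if_none_match.strip() == "*":
        return True  # "*" matches any current version of the object
    candidates = {_strip_weak(t.strip()) for t in if_none_match.split(",")}
    return _strip_weak(current_etag) in candidates


old = make_etag(b"<html>version 1</html>")
new = make_etag(b"<html>version 2</html>")
print(etag_matches(old, old))  # the cached copy is still the same object
print(etag_matches(old, new))  # the object changed, so send it in full
```

When etag_matches returns True the server can answer 304 Not Modified; otherwise it returns the full object together with its new ETag.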
Almost all caches use Last-Modified times in determining if an
object is fresh; as more HTTP/1.1 caches come online, ETag headers
will also be used.
Most modern Web servers will generate both ETag and
Last-Modified validators for static content automatically; you
won't have to do anything. However, they don't know enough about
dynamic content (like CGI, ASP or database sites) to generate them;
see Writing Cache-Aware Scripts.