Overview of proxy server caching

Caching is a feature in which the proxy server saves local copies of the files that clients request so that it can serve them quickly from the cache when they are requested again by the same or other clients.

Caching Proxy is HTTP/1.1 compliant and generally follows the HTTP/1.1 protocol for caching and for determining the freshness of documents.

This section contains information about some features of the proxy server cache. For those features that can be configured, details about how to set the appropriate values are included in subsequent chapters.

Cache storage

The proxy server can store the cache on a physical storage device or in system memory. The type of cache storage that is better for your system depends on the capabilities of your hardware and on whether it is more important to have fast cache response or to have a larger number of items that are stored in the cache. Response time for a memory cache typically is faster than for a disk cache, but the size of a memory cache is limited by the amount of RAM in the proxy server machine. The size of a disk cache is limited by the size of the storage device, which typically is much larger than the amount of RAM.

For disk caches, Caching Proxy uses raw disk caching, which means that the proxy server writes directly to the cache device without using the operating system's read and write protocols. The storage device for a disk cache must be prepared by using the htcformat command.

The cache index

In either a memory or disk cache, Caching Proxy also uses system memory space to hold an index of the cache, which reduces the processing time to find cached files.

Caching Proxy's cache directory structure and its lookup methods are different from those of other proxy servers. Caching Proxy maintains an index in memory with information about the files in the cache. Using RAM for lookup instead of a disk or other medium results in faster file lookup and retrieval.

The index includes URLs, cache locations, and expiry information for cached objects. For this reason, the amount of memory necessary to hold the index is proportional to the number of objects in the cache.
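Because the index grows with the number of cached objects, a rough memory estimate can be sketched as a simple product. The function name and the per-entry size are illustrative assumptions, not values published for Caching Proxy; measure your own system for real figures.

```python
def estimate_index_bytes(object_count: int, bytes_per_entry: int) -> int:
    """Estimate index memory, assuming a fixed average size per index entry.

    bytes_per_entry is a hypothetical figure covering the URL, the cache
    location, and the expiry information for one cached object.
    """
    if object_count < 0 or bytes_per_entry < 0:
        raise ValueError("object_count and bytes_per_entry must be non-negative")
    return object_count * bytes_per_entry


# Doubling the number of cached objects doubles the estimated index size.
small = estimate_index_bytes(100_000, 200)
large = estimate_index_bytes(200_000, 200)
```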

When a request is received from a client, the proxy checks the cache index in memory for that URL.

- If the file is not in the index, the request is made to the destination server.
  - The URL is then checked to determine whether the retrieved file can be cached. If allowed, the proxy server caches the retrieved file.
  - The cache index is then updated with URL, location, and expiry information for the newly cached object.
- If the file is in the index, the expiry information is checked to determine whether the cached file is fresh.
  - If the object has expired, the destination server is contacted, and the expired object is replaced by the newly retrieved document. Expiry information is updated in the cache index.
  - If the object has not expired, the document is served from the proxy cache.
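
The lookup steps above can be sketched as follows. This is a minimal illustration, not Caching Proxy's implementation: the class, the `fetch` and `is_cacheable` callbacks, and the in-memory `store` dictionary are all hypothetical stand-ins for the real index, origin requests, and cache device.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class IndexEntry:
    location: str      # where the cached body is stored (hypothetical key)
    expires_at: float  # absolute expiry time, in seconds since the epoch


class CacheSketch:
    def __init__(
        self,
        fetch: Callable[[str], Tuple[bytes, float]],  # returns (body, ttl_seconds)
        is_cacheable: Callable[[str], bool],
        now: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.is_cacheable = is_cacheable
        self.now = now
        self.index: Dict[str, IndexEntry] = {}  # in-memory index keyed by URL
        self.store: Dict[str, bytes] = {}       # stand-in for the cache device

    def _save(self, url: str, body: bytes, ttl: float) -> None:
        location = f"slot:{url}"
        self.store[location] = body
        self.index[url] = IndexEntry(location, self.now() + ttl)

    def handle(self, url: str) -> Tuple[bytes, str]:
        entry = self.index.get(url)
        if entry is None:
            # Not in the index: contact the destination server.
            body, ttl = self.fetch(url)
            if self.is_cacheable(url):
                self._save(url, body, ttl)
            return body, "miss"
        if self.now() >= entry.expires_at:
            # Expired: replace the object and update its expiry information.
            body, ttl = self.fetch(url)
            self._save(url, body, ttl)
            return body, "refreshed"
        # Fresh: serve from the cache.
        return self.store[entry.location], "hit"
```

The clock and the fetch function are injected so that the three paths (miss, refresh, hit) can be exercised without a network.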

FTP caching

When the proxy is configured to cache requests, it can cache FTP file requests as well as HTTP file requests. However, because FTP files do not contain the same type of header information as HTTP files, expiration dates for cached FTP files are calculated differently than for other cached files.

When a request is made to the FTP server to retrieve a file, the proxy first sends a LIST request for the file to the FTP server to obtain FTP directory information about the file. If the FTP server responds to the LIST request with a positive completion reply and the directory information for the file, the proxy creates an HTTP Last-Modified header with the date parsed from the FTP directory information. The caching function of the proxy then uses this Last-Modified header, together with the value set in the CacheLastModifiedFactor directive in the configuration file, to determine the length of time that the FTP file remains in the cache before expiring.
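
As an illustration of how a Last-Modified date and a factor can yield an expiry time, the sketch below uses the common heuristic of multiplying the time since last modification by a fraction. That the CacheLastModifiedFactor value is applied in exactly this way is an assumption here, and the function name and the factor value in the usage lines are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def estimate_expiry(last_modified: datetime, now: datetime, factor: float) -> datetime:
    """Return an estimated expiry time for a file with no explicit expiry.

    Assumes the heuristic: lifetime = factor * (now - last_modified).
    """
    age = now - last_modified
    if age < timedelta(0):
        # A Last-Modified date in the future gives no usable age.
        age = timedelta(0)
    return now + age * factor


# A file last modified 10 days ago, with an illustrative factor of 0.5,
# is treated as fresh for 5 more days.
now = datetime(2024, 1, 11, tzinfo=timezone.utc)
modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = estimate_expiry(modified, now, 0.5)
```

The effect is that files that have not changed for a long time stay cached longer than files that were modified recently.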

FTP files that are retrieved for a specific user ID rather than by anonymous login are considered to be private files and are not cached.

DNS caching

In addition to caching web content, the proxy server performs Domain Name System (DNS) caching. For example, when a client requests a URL from www.myWebsite.com, the proxy asks its DNS server to resolve the host name www.myWebsite.com to an IP address. The IP address is then cached to improve response time for subsequent requests to that host name. DNS caching is automatic and cannot be reconfigured.
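
The idea can be sketched as a small lookup table in front of a resolver. This is a conceptual illustration only: the class name is hypothetical, and the sketch omits any expiry of cached addresses because the text above does not describe how the proxy ages DNS entries.

```python
import socket
from typing import Callable, Dict


class DnsCacheSketch:
    def __init__(self, resolve: Callable[[str], str] = socket.gethostbyname) -> None:
        self._resolve = resolve
        self._cache: Dict[str, str] = {}

    def lookup(self, host: str) -> str:
        key = host.lower()  # host names are case-insensitive
        if key not in self._cache:
            # First request for this host name: ask the resolver and remember the answer.
            self._cache[key] = self._resolve(key)
        return self._cache[key]
```

Passing the resolver as a parameter makes it possible to try the sketch with a stub instead of a live DNS server.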

Cache exclusions

Some files and documents are never cached. These include the following:

- Files that are returned from requests by using HTTP methods other than GET, such as POST and PUT.
- Any documents that require authentication, unless caching such documents is allowed by the origin server.
- The dynamic output of any CGI script (because this is unique each time it is requested). Dynamically generated results from servlets and JavaServer Pages (JSP) run by IBM® WebSphere® Application Server can be cached if dynamic caching is enabled. Refer to Caching dynamically generated content for details.
- Any information that is passed on an SSL tunneling connection (because the proxy cannot decrypt the data passing through it).
- Any file that is returned from a URL containing a question mark (?), unless query caching is allowed. (Refer to Controlling what is cached for information about configuring query results caching.)
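
The exclusions above can be expressed as a single predicate. The sketch below is illustrative only: the `RequestInfo` fields and the `query_caching_enabled` flag are hypothetical names, and the WebSphere dynamic caching exception is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class RequestInfo:
    method: str
    url: str
    requires_auth: bool = False
    origin_allows_auth_caching: bool = False
    is_cgi_output: bool = False
    is_ssl_tunnel: bool = False


def is_excluded(req: RequestInfo, query_caching_enabled: bool = False) -> bool:
    """Return True if the response must not be cached under the rules listed above."""
    if req.method.upper() != "GET":
        return True  # only GET responses are candidates for caching
    if req.requires_auth and not req.origin_allows_auth_caching:
        return True  # authenticated content, unless the origin server allows caching
    if req.is_cgi_output:
        return True  # dynamic CGI output
    if req.is_ssl_tunnel:
        return True  # tunneled data that the proxy cannot decrypt
    if "?" in req.url and not query_caching_enabled:
        return True  # query URLs, unless query caching is allowed
    return False
```

A caching filter, as described next, would add further conditions on top of these built-in rules.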

It is possible to further restrict the items that are cached by setting caching filters. For example, you might not want the proxy server to cache files that are served locally from the proxy.

Cache management

Managing a cache involves many factors. As a server administrator, you can specify the following:

- What documents are cached (refer to Controlling what is cached for details).
- How many documents can be cached (refer to Configuring basic caching for details).
- How long cached documents are considered current (refer to Maintaining cache content for details).
- How frequently the cache is purged (garbage collection) and what type of files tend to be kept (refer to Maintaining cache content for details).
- How cached documents are indexed (refer to Configuring basic caching for details).
- When the cache is refreshed (refer to Configuring the cache agent for automatic refreshing and preloading for details).
- Remote cache access (refer to Using a shared cache for details).
- How logs are kept and archived (refer to Configuring basic caching for details).

In addition, adjustments can be made to cache configuration to improve the overall performance of Caching Proxy.