Encyclopedia > Cache

Article Content

Cache

In it's original meaning, a cache (pronounced "cash") is a store or supply, usually hidden, that can be used later.

In computer science it is a short-term memory in a computer with quick access. A cache is intended to speed up access to a set of data. The cache will be a piece of memory that is faster (hence more expensive, hence smaller) than the principal data storage area for the data in question. The cache operates by storing a part of the data, allowing that part to be accessed more quickly. A speed-up is achieved if many accesses to the data can access the data in the cache. The reason caches work at all is that many access patterns in typical computer applications have locality of reference. There are several sorts of locality, but we mainly mean that often the same data is accessed frequently or with accesses that are close together in time, or that data near to each other are accessed close together in time.

One example is a RAM that caches the accesses to the hard drive, so that when two subsequent reads of the same file from the hard disk occur, only one physical read from the disk is performed. Modern disk drives have such a cache built in to the electronics of the drive. Most modern operating systems also operate some sort of disk cache using the computer's main memory.

Caches are also used between the CPU and the (slower) main RAM in modern PC. There are often several levels of cache, with different sizes and access speed. Generally the caches get smaller, faster, and more expensive, the closer to the CPU they are.

Caches do not have to be thought about in purely low level/OS terms, software can cache return values that are likely to be re-used in the near future. A good example of this is the BIND DNS daemon which caches domain name replies for a period before going back to the authorative source of the name to IP mapping. Resolver libraries also perform almost exactly the same caching.

Web browsers also use caches. Some browses implement their own cache of recently visited web pages; this saves having to download them again when you revisit. Most browsers can be configured to use an external proxy web cache[?], a server program through which all web requests are routed so that it can cache frequently accessed pages for everyone in an organization.

The search engine Google takes a snapshot of each page examined as it crawls the web and caches these as a back-up in case the original page is unavailable. If you click on the "Cached" link in a Google search result, you will see the web page as it looked when Google indexed it.

When designing a cache there are a number of points to consider:

What data should be cached?
What are the typical access patterns to this data?
How big should I make the cache?
How should data be brought into the cache?
When the cache is full, what data should be removed (evicted) from the cache?

A number of different algorithms are used for optimizing the contents of the cache, including to remove the "least recently used" - LRU[?] - contents when the cache is full.

CPU memory cache

This section discusses in more detail the typical caches used by a CPU to cache data from main memory.

Typically the fastest piece of memory on a CPU is internal flip-flops and buffers. These are not generally accessible to the programmer. Next fastest is the architecture registers. These are the registers that are accessible to the programmer (if she is programming in assembler). Next there is usually at least one cache between those registers and main memory (RAM); often there are two caches, one on-chip (on the same piece of silicon as the rest of the CPU) and one off-chip. When there is more than one cache they are usually called level-1 (for the one nearest the CPU), level-2, level-3 and so on.

The level-1 cache might be between 32 kibibits and 512 kibibits, say. In a 32 bit architecture that would be between 1024 and 16384 words (each word being 32 bits).

For example, an Intel 80486 has an 8-KB on-chip cache, and most Pentiums have a 16-KB on-chip level one cache that consists of an 8-KB instruction cache and an 8-KB data cache.

A cache will usually arrange data in "lines" (sometimes also called blocks), a line is a contiguous sequence of words starting at an address that is a multiple of the line size. A line might be 8 words say. So we might have 1024 lines in our cache. A "line" in main memory when accessed will end up as a line in the cache. Usually a cache will transfer data to and from memory in chunks of a whole line, this is more efficient than transferring each word individually. This is both a source of efficiency and inefficiency: if we end up accessing all of the data in a line then bring the line into cache all at once will have been more efficient than bring each word into the cache; on the other hand, if we only access one word from the line then we paid the cost of having to bring the whole line in which will be slower.

When a line from main memory is stored in the cache the cache has to associate the address of the data (where it is in main memory) with the data, so that it can correctly process accesses to the data later on. Most caches are arranged so that a particular line in main memory can end up in only a few places in the cache. If any line in main memory can go in any line in the cache then the cache is said to be "fully associative". If any line in main memory can go in only one particular line in the cache then the cache is said to be "direct mapped". Usually the situation is somewhere in between, in which case we have an N-way set associative cache. For example, a cache is "8-way set associative" if a particular line in main memory can go in one of up to eight lines in the cache.

A direct mapped cache is easier to implement: let's say we have 1024 lines of 32 bytes each on a 32-bit byte-addressed architecture. Each address in main memory must go into a corresponding line of the cache, as determined by some direct mapping scheme. If the cache line location corresponds to bits 5-14 of the main memory address (that is the address modulo 2¹⁵, divided by 2⁵), the correspondence between main memory addresses and cache lines looks like this:

 main memory address           cache line
 0-31                          0
 32-63                         1
 64-95                         2
 ...                           ...
 32736-32767                   1023
 32768-32799                   0
 32780-32811                   1

and so the pattern repeats through memory.

In a direct mapped cache when we store a line in the cache all we need to associate with it are bits 15-31 of its address. The low order bits are determined by where it is in the cache. The real saving comes when an access is performed. When a location in main memory is accessed we need to check the contents of the cache to see whether we have that data cached; in a direct mapped cache we only have one line to check, because there is only one place in the cache that it can be; we simply check the high order address bits that we have associated with that line in the cache to see if they match the address of the data being accessed.

The more ways associative a cache is the more address bits we need to associate with each line, but more importantly, the more lines we need to check when an access is performed. In a eight-way set associative cache we need to check eight different lines in the cache to see if the data is stored. Because of the speeds at which caches operate the cost in power to drive the electronics of each line to check its contents becomes significant. See also cache coherency.

All Wikipedia text is available under the terms of the GNU Free Documentation License

Search Encyclopedia

Search over one million articles, find something about almost anything!