





















## 8. Merging Write Buffer to Reduce Miss Penalty

- A write buffer lets the processor continue executing while a write to memory is still pending
- When a new entry is placed in the buffer, its address is checked against the addresses of the blocks already in the buffer
  - If there is a match, the new data is combined with the existing entry instead of occupying a new one

Adapted from Patterson and Hennessy (Morgan Kaufmann Pubs)
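
The merge check described above can be sketched as a small software model. This is only an illustration, not a hardware description: the class name `MergingWriteBuffer`, the entry size (four 8-byte words), the buffer capacity, and the FIFO drain order are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

WORDS_PER_ENTRY = 4                      # assumed: each entry covers 4 words
WORD_BYTES = 8                           # assumed: 8-byte words
ENTRY_BYTES = WORDS_PER_ENTRY * WORD_BYTES


@dataclass
class Entry:
    block_addr: int                              # aligned address of the block
    words: dict = field(default_factory=dict)    # word index -> data (present = valid)


class MergingWriteBuffer:
    """Hypothetical model of a merging write buffer."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = []                # oldest entry first

    def write(self, addr, data):
        """Buffer a write. Return False if the buffer is full (processor stalls)."""
        block_addr = addr - (addr % ENTRY_BYTES)
        word_index = (addr % ENTRY_BYTES) // WORD_BYTES

        # Check the new address against every entry already in the buffer.
        for entry in self.entries:
            if entry.block_addr == block_addr:
                entry.words[word_index] = data   # match: merge into that entry
                return True

        # No match: a free entry is needed.
        if len(self.entries) == self.capacity:
            return False
        self.entries.append(Entry(block_addr, {word_index: data}))
        return True

    def drain_one(self):
        """Remove and return the oldest entry (modeling its write to memory)."""
        return self.entries.pop(0) if self.entries else None


# Four sequential word writes that fall in the same block use one entry.
buf = MergingWriteBuffer(capacity=4)
for i, addr in enumerate([96, 104, 112, 120]):
    buf.write(addr, i)
print(len(buf.entries))
```

Without merging, the same four writes would each take their own entry and fill a four-entry buffer, so the next write would stall the processor.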





















| Technique                                        | Hit<br>time | Bandwidth      | Miss<br>penalty             | Miss<br>rate | HW cost/<br>complexity           | Comment                                                            |
|--------------------------------------------------|-------------|----------------|-----------------------------|--------------|----------------------------------|--------------------------------------------------------------------|
| Small and simple caches                          | +           |                |                             | -            | 0                                | Trivial; widely used                                               |
| Way-predicting caches                            | +           |                |                             |              | 1                                | Used in Pentium 4                                                  |
| Trace caches                                     | +           |                |                             |              | 3                                | Used in Pentium 4                                                  |
| Pipelined cache access                           | -           | +              |                             |              | 1                                | Widely used                                                        |
| Nonblocking caches                               |             | +              | +                           |              | 3                                | Widely used                                                        |
| Banked caches                                    |             | +              |                             |              | 1                                | Used in L2 of Opteron and<br>Niagara                               |
| Critical word first and early<br>restart         |             |                | +                           |              | 2                                | Widely used                                                        |
| Merging write buffer                             |             |                | +                           |              | 1                                | Widely used with write through                                     |
| Compiler techniques to reduce cache misses       |             |                |                             | +            | 0                                | Software is a challenge;<br>some computers have<br>compiler option |
| Hardware prefetching of<br>instructions and data |             |                | +                           | +            | 2 instr., 3<br>data              | Many prefetch instructions;<br>AMD Opteron prefetches<br>data      |
| Compiler-controlled<br>prefetching               |             |                | +                           | +            | 3                                | Needs nonblocking cache; in<br>many CPUs                           |