dcbt (Data Cache Block Touch) instruction
Purpose
Allows a program to request a cache block fetch before it is actually needed by the program.
Syntax
Bits | Value |
---|---|
0-5 | 31 |
6-10 | TH |
11-15 | RA |
16-20 | RB |
21-30 | 278 |
31 | / |
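The field layout above can be turned into a 32-bit instruction word. The following C sketch is illustrative only: `dcbt_encode` is a hypothetical helper, not part of any assembler or toolchain, and it assumes the reserved bit 31 is encoded as 0. IBM bit numbering counts from the most significant bit, so a field that ends at bit e is shifted left by 31 - e.

```c
#include <stdint.h>

/* Hypothetical helper: build the dcbt instruction word from its fields.
 * Bit 0 is the most significant bit of the 32-bit word. */
uint32_t dcbt_encode(unsigned th, unsigned ra, unsigned rb)
{
    return (31u << 26)              /* bits 0-5:   primary opcode 31      */
         | ((th & 0x1Fu) << 21)     /* bits 6-10:  TH                     */
         | ((ra & 0x1Fu) << 16)     /* bits 11-15: RA                     */
         | ((rb & 0x1Fu) << 11)     /* bits 16-20: RB                     */
         | (278u << 1);             /* bits 21-30: extended opcode 278;
                                       bit 31 assumed to be 0             */
}
```

For example, `dcbt_encode(0, 0, 4)` corresponds to the two-operand form `dcbt 0,4`, in which TH is 0.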
Description
The dcbt instruction may improve performance by anticipating a load from the addressed byte. The block containing the byte addressed by the effective address (EA) is fetched into the data cache before the block is needed by the program. The program can later perform loads from the block and may not experience the added delay caused by fetching the block into the cache. Executing the dcbt instruction does not invoke the system error handler.
If general-purpose register (GPR) RA is not 0, the effective address (EA) is the sum of the content of GPR RA and the content of GPR RB. Otherwise, the EA is the content of GPR RB.
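The EA rule can be modeled in a few lines of C. This is a sketch under stated assumptions: the register file is represented as a plain array of 64-bit values, and `dcbt_effective_address` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the EA computation for dcbt.
 * gpr[] stands in for the 32 general-purpose registers. */
uint64_t dcbt_effective_address(const uint64_t gpr[32],
                                unsigned ra, unsigned rb)
{
    /* When the RA field is 0, the base is the value 0,
     * not the content of GPR 0. */
    uint64_t base = (ra != 0) ? gpr[ra] : 0;
    return base + gpr[rb];
}
```

Note that with RA = 0 the content of GPR 0 is ignored, which is why `dcbt 0,4` addresses exactly the byte that GPR 4 points to.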
Consider the following when using the dcbt instruction:
- If the EA specifies a direct store segment address, the instruction is treated as a no-op.
- The access is treated as a load from the addressed cache block with respect to protection. If protection does not permit access to the addressed byte, the dcbt instruction performs no operations.
Note: If a program needs to store to the data cache block, use the dcbtst (Data Cache Block Touch for Store) instruction.
TH Values | Description |
---|---|
0000 | The program will probably soon load from the byte addressed by EA. |
0001 | The program will probably soon load from the data stream consisting of the block containing the byte addressed by EA and an unlimited number of sequentially following blocks. The sequentially following blocks are the bytes addressed by EA + n * block_size, where n = 0, 1, 2, and so on. |
0011 | The program will probably soon load from the data stream consisting of the block containing the byte addressed by EA and an unlimited number of sequentially preceding blocks. The sequentially preceding blocks are the bytes addressed by EA - n * block_size where n = 0, 1, 2, and so on. |
1000 | The dcbt instruction provides a hint that describes certain attributes of a data stream, and optionally indicates that the program will probably soon load from the stream. The EA is interpreted as described in Table 1. |
1010 | The dcbt instruction provides a hint that describes certain attributes of a data stream, or indicates that the program will probably soon load from data streams that have been described using dcbt instructions in which TH[0] = 1 or probably no longer load from such data streams. The EA is interpreted as described in Table 2. |
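The stream addressing for TH = 0001 and TH = 0011 amounts to stepping forward or backward from the EA in block-sized increments. A minimal C sketch follows; `dcbt_stream_block` is a hypothetical helper, and the block size is passed in because it is implementation dependent.

```c
#include <stdint.h>

/* Hypothetical helper: address of the n-th block of the stream hinted
 * by dcbt, for TH = 0b0001 (following) or TH = 0b0011 (preceding).
 * For any other TH value this sketch treats the stream as forward. */
uint64_t dcbt_stream_block(uint64_t ea, uint64_t n,
                           uint64_t block_size, unsigned th)
{
    uint64_t offset = n * block_size;
    return (th == 0x3) ? ea - offset   /* EA - n * block_size */
                       : ea + offset;  /* EA + n * block_size */
}
```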
The dcbt instruction serves as both a basic and extended mnemonic. The dcbt mnemonic with three operands is the basic form, and the dcbt with two operands is the extended form. In the extended form, the TH field is omitted and assumed to be 0b0000.
Table 1. EA interpretation when TH = 1000
Bit(s) | Name | Description |
---|---|---|
0-56 | EA_TRUNC | High-order 57 bits of the effective address of the first unit of the data stream. |
57 | D | Direction |
58 | UG | |
59 | Reserved | Reserved |
60-63 | ID | Stream ID to use for this stream. |
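The TH = 1000 layout above can be packed into a 64-bit value to place in GPR RB (with RA = 0). The C sketch below is illustrative: `dcbt_th8_ea` is a hypothetical helper, the reserved bit 59 is assumed to be 0, and the meaning of the D and UG values is not modeled here, only their positions. Bit 0 is the most significant bit, so bit b maps to a left shift of 63 - b.

```c
#include <stdint.h>

/* Hypothetical helper: pack the EA operand for dcbt with TH = 0b1000. */
uint64_t dcbt_th8_ea(uint64_t first_unit, unsigned d,
                     unsigned ug, unsigned id)
{
    return (first_unit & ~UINT64_C(0x7F))   /* bits 0-56:  EA_TRUNC        */
         | ((uint64_t)(d  & 1u) << 6)       /* bit 57:     D               */
         | ((uint64_t)(ug & 1u) << 5)       /* bit 58:     UG              */
         | (uint64_t)(id & 0xFu);           /* bits 60-63: ID; bit 59 is
                                               assumed to stay 0           */
}
```

Because EA_TRUNC keeps only the high-order 57 bits, the low 7 bits of the stream's starting address are discarded, so the description is accurate to a 128-byte boundary.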
Table 2. EA interpretation when TH = 1010
Bit(s) | Name | Description |
---|---|---|
0-31 | Reserved | Reserved |
32 | GO | |
33-34 | S | Stop |
35-46 | Reserved | Reserved |
47-56 | UNIT_CNT | Number of units in the data stream. |
57 | T | |
58 | U | |
59 | Reserved | Reserved |
60-63 | ID | Stream ID to use for this stream. |
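The TH = 1010 layout above can be packed the same way. In this C sketch, `dcbt_th10_ea` is a hypothetical helper, all reserved bits are assumed to be 0, and only the field positions are modeled, not the semantics of the GO, S, T, and U values. Bit b of the 64-bit EA maps to a left shift of 63 - b.

```c
#include <stdint.h>

/* Hypothetical helper: pack the EA operand for dcbt with TH = 0b1010. */
uint64_t dcbt_th10_ea(unsigned go, unsigned s, unsigned unit_cnt,
                      unsigned t, unsigned u, unsigned id)
{
    return ((uint64_t)(go & 1u) << 31)            /* bit 32:     GO       */
         | ((uint64_t)(s & 3u) << 29)             /* bits 33-34: S        */
         | ((uint64_t)(unit_cnt & 0x3FFu) << 7)   /* bits 47-56: UNIT_CNT */
         | ((uint64_t)(t & 1u) << 6)              /* bit 57:     T        */
         | ((uint64_t)(u & 1u) << 5)              /* bit 58:     U        */
         | (uint64_t)(id & 0xFu);                 /* bits 60-63: ID       */
}
```

UNIT_CNT is a 10-bit field, so this sketch silently truncates counts above 1023.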
The dcbt instruction has one syntax form and does not affect the Condition Register field 0 or the Fixed-Point Exception register.
Parameters
Item | Description |
---|---|
RA | Specifies source general-purpose register for EA computation. |
RB | Specifies source general-purpose register for EA computation. |
TH | Indicates when a sequence of data cache blocks might be needed. |
Examples
The following code sums the content of a one-dimensional vector:
# Assume that GPR 4 contains the address of the first element
# of the sum.
# Assume 49 elements are to be summed.
# Assume the data cache block size is 32 bytes.
# Assume the elements are word aligned and the addresses
# are multiples of 4.
dcbt 0,4 # Issue hint to fetch first
# cache block.
addi 5,4,32 # Compute address of second
# cache block.
addi 8,0,6 # Set outer loop count.
addi 7,0,8 # Set inner loop counter.
dcbt 0,5 # Issue hint to fetch second
# cache block.
lwz 3,0(4) # Set sum = element number 1.
bigloop:
addic. 8,8,-1 # Decrement outer loop count
# and set CR field 0.
mtspr CTR,7 # Set counter (CTR) for
# inner loop.
addi 5,5,32 # Compute address for next
# touch.
lttlloop:
lwzu 6,4(4) # Fetch element.
add 3,3,6 # Add to sum.
bc 16,0,lttlloop # Decrement CTR and branch
# if result is not equal to 0.
dcbt 0,5 # Issue hint to fetch next
# cache block.
bc 4,2,bigloop # Branch if outer loop count is
# not equal to 0.
end: # Summation complete.
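For comparison, the same touch-ahead pattern can be sketched in C. This is not the code above translated line by line: it assumes a GCC- or Clang-compatible compiler, which provides `__builtin_prefetch`, and the same 32-byte block size. On PowerPC targets such compilers commonly lower the builtin to a dcbt instruction, but that depends on the compiler and its options. `sum_with_touch` and `BLOCK_BYTES` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 32   /* assumed cache block size, as in the example */

/* Sum n words, hinting the next cache block each time a new block
 * is entered. Unsigned arithmetic gives well-defined wraparound. */
uint32_t sum_with_touch(const uint32_t *v, size_t n)
{
    const size_t per_block = BLOCK_BYTES / sizeof v[0];  /* 8 words */
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* At the start of each block, touch the following block
         * if it still contains elements of the vector. */
        if (i % per_block == 0 && i + per_block < n) {
            __builtin_prefetch(&v[i + per_block], 0);  /* 0 = read */
        }
        sum += v[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same whether or not the hardware acts on it; only the timing can differ.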