Test and set instruction

On occasion within a z/Architecture® configuration, programs executing simultaneously in two or more I-stream engines attempt to execute or change the same area in shared main storage. A common technique for controlling access to a critical region of code is to set a bit, called a lock indicator, indicating the critical region of code is in use and that modifications of shared storage are being done. For example, a bit can be used to indicate that a certain data area is to be exclusively modified by only one I-stream engine at a time. Setting the bit is done prior to entering the critical region of code. Other I-streams ready to modify the same data area check the bit prior to entering the critical region. If they find the bit set, they wait until the bit is free. Unfortunately, testing and setting of the bit can take more than one machine cycle. An I-stream engine that has tested a bit and found the area it controls available can be interrupted before it can set the bit for its own use. When the interrupted I-stream engine returns from the interruption, it (perhaps erroneously) can regard the critical region as available, set the bit, and continue into the critical region. If this happens, more than one I-stream engine could use the critical region, causing severe damage.

The solution to this problem is quite important for operating systems in general. Normally, in the z/TPF system, the controlled data area is a system table shared among all the I-stream engines. Proper serialization of modifications to shared data is critical to the correct operation of the system.

An example showing the need for exclusive control of a shared system table is given in Processor lock. For the moment, there is the additional problem of just synchronizing the bit setting among several I-stream engines.

In a multiple I-stream environment, a problem can result if more than one instruction is used to:
  1. Fetch the bit
  2. Check the bit setting:
    • If off (bit = 0), turn the bit on
    • If on (bit = 1), wait or do something else until the bit is turned off
Let's look at a situation that illustrates the problem: It is necessary for a program that is executing on two I-stream engines at the same time to modify a shared system table without corrupting it. Here is the program, but first note that some liberty has been taken with the instruction formats to avoid the need for introducing unnecessary coding detail.
   BUSY: TM(0)      (Test main storage lock bit for 0)
         BZ OFF     (Branch if off)
         B  BUSY    (Wait for lock bit to become 0)
    OFF: OI(1)      (Set lock bit in main storage to 1)
         'critical' (Critical Region to modify the shared table)
         NI(0)      (Reset the lock bit to 0)
         'exit'

Table 1 shows the relationship of instruction execution within each I-stream engine to time and the setting of the lock indicator at each step. Remember that each instruction requires I-stream engine cycles to gain access to shared main storage under the arbitration previously described. For this timing sequence, assume that the program is executed on two I-streams separated by one cycle (one tick of the I-stream engine clock). The granularity of the test under mask/branch/and-immediate instruction sequence in Table 1 allows the lock indicator to be defeated.

Table 1. Timing sequence
Time I-stream A Lock Indicator (after execution) I-stream B
t(1) TM(0) 0 delayed by arbitration
t(2) BZ OFF 0 TM(0)
t(3) NI(0) 1 BZ OFF
t(4) enter critical 1 NI(1)
t(5) ––– 1 enter critical
This problem can be solved by using one of the following instructions:
  • Test and set (TS)
  • Compare and swap (CS)
  • Compare double and swap (CDS)
  • Compare swap and purge (CSP).

In essence, all of these instructions permit a field to be interrogated reliably and modified in a multiple I-stream engine environment. Test and set operates on a bit, compare and swap operates on a 32-bit field, and both compare double and swap and compare swap and purge operate on a 64-bit field. Their commonality is that they serialize as part of their operation.

Serialization is the process of prioritizing requests that are made at exactly the same instant, causing the requests to occur one after the other. This ensures that main storage is not going to be changed by two different I-stream engines at the same time.

For example, if programs running in multiple I-stream engines simultaneously issue a test and set instruction to check an indicator that is 0, only one I-stream engine is informed that the bit is 0 when the instruction started. All other I-stream engines are shown a 1. Furthermore, when all the I-stream engines finish the execution of the Test and Set instruction, the bit is set to 1 in main storage. Unless you intend to modify some critical system code, learning the details of these instructions is unnecessary. The general idea presented here is necessary to understand some of the system locking procedures, control blocks, tables, and macros.

These instructions, test and set, compare and swap, and compare double and swap, are sometimes called interlocking instructions because they can result in a coordinated delay of several I-stream engines.

The z/TPF system favors the use of the test and set (TS) instruction because frequently, lock indicators are bits that are set to control access to critical regions of system code. The test and set (TS) instruction requires fewer registers than the compare and swap (CS) instruction and requires less execution time because only a single byte needs to be set.

Statistically speaking, in most cases, a shared table is needed by only one I-stream engine at any point in time. If an attempt to access a shared table that is locked occurs, then within the z/TPF system, the program in the other I-stream engine generally enters a loop. The loop consists of testing the indicator for the right to access the shared table. Such a loop is called a spin lock. This loop, a software granule of time for synchronization, takes longer than the time the hardware takes to synchronize the bit setting. However, critical regions of z/TPF system code that update a shared table are usually only a few instructions. Very little time is wasted with spinning, because the spinning is seldom invoked, and if spinning is invoked, it lasts for only a few instructions.