News
Abstract
The March 2019 PTF for Enterprise COBOL V6.2 reduces the performance impact of the NUMCHECK option, which is used to detect invalid COBOL data or invalid COBOL programs at run time.
Content
Original Behavior
NUMCHECK is an option added to Enterprise COBOL V5.2, V6.1, and V6.2 to help detect invalid PACKED-DECIMAL, USAGE DISPLAY, and BINARY/COMP/COMP-4 data. NUMCHECK works by inserting checks that behave like IS NUMERIC (for packed and zoned) and ON SIZE ERROR (for binary) into the code. If a check fails, NUMCHECK either produces a runtime message, when NUMCHECK(*,MSG) is specified, or an abend, when NUMCHECK(*,ABD) is specified. NUMCHECK adds a check in every COBOL statement where a data item of the selected type(s) is used as a sender. NUMCHECK is thorough, but many of the checks are redundant (they check values that have already been checked) or they check values that are known at compile time to be valid or invalid.
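To make the three kinds of check concrete, here is a minimal Python sketch of what "valid" could mean for each data type. It is a simplified model, not the compiler's generated code: the function names are hypothetical, the zoned check covers only unsigned EBCDIC items, and the sign codes that the real checks accept depend on compiler options such as NUMPROC, which this sketch ignores.

```python
def is_valid_packed(data: bytes) -> bool:
    """Packed decimal: every nibble except the last must be a digit 0-9,
    and the last nibble must be a sign code (A-F in this simplified model)."""
    nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
    *digits, sign = nibbles
    return all(d <= 9 for d in digits) and sign >= 0xA


def is_valid_zoned_unsigned(data: bytes) -> bool:
    """Unsigned zoned decimal in EBCDIC: every byte must be X'F0'..X'F9'."""
    return all(0xF0 <= b <= 0xF9 for b in data)


def is_valid_binary(value: int, digits: int, signed: bool) -> bool:
    """Binary: the value must fit in the number of digits in the PICTURE."""
    limit = 10 ** digits - 1
    low = -limit if signed else 0
    return low <= value <= limit


# P redefines X'FFFFFF': the digit nibbles are F, so the packed check fails.
p_ok = is_valid_packed(bytes.fromhex("FFFFFF"))
# Z2 redefines five EBCDIC spaces (X'40'): not zoned digits.
z2_ok = is_valid_zoned_unsigned(bytes([0x40] * 5))
# B is PIC 9(3) BINARY with the value 60: it fits in three digits.
b_ok = is_valid_binary(60, digits=3, signed=False)
```

In this model, the initial values of P and Z2 in the sample program below would fail their checks, while B would pass.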
       01 B  PIC 9(3) BINARY VALUE 60.
       01 P1 PIC X(3) VALUE x'FFFFFF'.
       01 P  REDEFINES P1 PIC S9(5) PACKED-DECIMAL.
       01 Z  PIC 9(5) USAGE DISPLAY.
       01 X1 PIC X(5) VALUE SPACES.
       01 Z2 REDEFINES X1 PIC 9(5) USAGE DISPLAY.
       01 W  PIC 9(5).
       01 Q  PIC 9(5) BINARY.
       01 R  PIC S9(5) BINARY.
       PROCEDURE DIVISION.
01         DISPLAY Z.
02         DISPLAY P
03         DISPLAY B.
04         MOVE 3 TO Q.
05         DISPLAY Q.
06         IF Z > 5 THEN
07            MOVE W TO Z
08            MOVE 0 TO R
09         ELSE
10            MOVE W TO R
11         END-IF.
12         DISPLAY W.
13         DISPLAY R.
14         MOVE Z TO P.
15         COMPUTE R = B + P.
16         DISPLAY Z2.
17         STOP RUN.
In this sample program, there will be checks on all lines but 4 (MOVE 3 TO Q), 8 (MOVE 0 TO R), 9 (ELSE), 11 (END-IF), and 17 (STOP RUN); those are the statements that don't have a numeric data item used as a sender.
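The rule "a check is added wherever a numeric data item is a sender" can be sketched in Python as follows. The table is a hand-written model of the sample program (the compiler does not expose such a structure), and the names `PROGRAM`, `checked_lines`, and `unchecked_lines` are hypothetical.

```python
# (line number, statement text, data items used as senders)
PROGRAM = [
    (1, "DISPLAY Z", ["Z"]),
    (2, "DISPLAY P", ["P"]),
    (3, "DISPLAY B", ["B"]),
    (4, "MOVE 3 TO Q", []),          # the sender is a literal
    (5, "DISPLAY Q", ["Q"]),
    (6, "IF Z > 5 THEN", ["Z"]),
    (7, "MOVE W TO Z", ["W"]),
    (8, "MOVE 0 TO R", []),          # the sender is a literal
    (9, "ELSE", []),
    (10, "MOVE W TO R", ["W"]),
    (11, "END-IF", []),
    (12, "DISPLAY W", ["W"]),
    (13, "DISPLAY R", ["R"]),
    (14, "MOVE Z TO P", ["Z"]),
    (15, "COMPUTE R = B + P", ["B", "P"]),
    (16, "DISPLAY Z2", ["Z2"]),
    (17, "STOP RUN", []),
]

# A statement gets a check only if at least one data item is a sender.
checked_lines = [line for line, _, senders in PROGRAM if senders]
unchecked_lines = [line for line, _, senders in PROGRAM if not senders]

print("checked:  ", checked_lines)
print("unchecked:", unchecked_lines)
```

The unchecked lines in this model are 4, 8, 9, 11, and 17, matching the list above.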
Performance Concerns
Some of our clients want to use NUMCHECK in production, just as some use SSRANGE in production already. Both options add checks to the code, and those checks are executed each time the code for a statement is executed at runtime. In the worst case, a check within a loop is made many times. For some programs, the performance impact of NUMCHECK is quite noticeable. Uses of BINARY, PACKED-DECIMAL, and USAGE DISPLAY data are also likely to be much more common than table accesses and reference modifications, so the impact of NUMCHECK is generally worse than the impact of SSRANGE, especially when NUMCHECK is used to check all three types at once.
With the March 2019 PTF for Enterprise COBOL V6.2, we have improved NUMCHECK so that fewer checks are made, while still ensuring that invalid data is always detected.
Redundant Checks
Some of the checks for this program are obviously redundant. For example, Z is checked on line 1, and checked again on line 6, despite not having changed in between. Q is checked on line 5, despite having a known good value moved into it on line 4. The improved version of NUMCHECK identifies and removes redundant checks and known-good checks, identifies known-bad checks, and, if NUMCHECK(ABD) is set, removes known-bad checks.
In the program above, the following checks are redundant:
5: DISPLAY Q: Q was given the value 3 on line 4; we know it's good
6: IF Z > 5: Z was checked on line 1 and hasn't changed in between
12: DISPLAY W: W has been checked on line 7 or line 10, regardless of whether the IF or ELSE branch is taken
14: MOVE Z TO P: Z has been checked on line 1 and line 6. It may have changed in between on line 7, but W is checked on line 7, so the value moved into Z was already checked. Thus, Z has been checked, one way or another, by line 14.
15: COMPUTE R = B + P: B hasn't changed in the program and was checked on line 3. P was changed on line 14, but Z was checked on line 14, so the value in P doesn't also need to be checked.
Note the comments about lines 14 and 15. If a data item is checked and its value is moved or used in subsequent statements, subsequent receivers could also have invalid values. However, those values come from an invalid source that has already been checked; displaying another message about them doesn't help in fixing the problem. By removing these checks, the amount of investigation required to validate and correct the program after getting NUMCHECK errors is reduced.
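The reasoning above can be approximated by a small forward data-flow analysis. The Python sketch below is an illustrative model only, not the compiler's algorithm: it tracks the set of items that are "already checked or known good", treats every receiver as known good once its senders have been checked, and intersects the sets where the IF and ELSE paths join. The program encoding and all names are hypothetical, and the sample program contains no statements (such as CALL or file input) that would force the set to shrink.

```python
# Plain statement: (line, senders, receivers).
# IF statement:    ("if", line, senders, then_block, else_block).
PROGRAM = [
    (1, ["Z"], []),                       # DISPLAY Z
    (2, ["P"], []),                       # DISPLAY P
    (3, ["B"], []),                       # DISPLAY B
    (4, [], ["Q"]),                       # MOVE 3 TO Q
    (5, ["Q"], []),                       # DISPLAY Q
    ("if", 6, ["Z"],                      # IF Z > 5
        [(7, ["W"], ["Z"]),               #    MOVE W TO Z
         (8, [], ["R"])],                 #    MOVE 0 TO R
        [(10, ["W"], ["R"])]),            # ELSE MOVE W TO R
    (12, ["W"], []),                      # DISPLAY W
    (13, ["R"], []),                      # DISPLAY R
    (14, ["Z"], ["P"]),                   # MOVE Z TO P
    (15, ["B", "P"], ["R"]),              # COMPUTE R = B + P
    (16, ["Z2"], []),                     # DISPLAY Z2
]


def check_senders(line, senders, known):
    """Return redundant (line, item) checks; mark newly checked items as known."""
    redundant = []
    for item in senders:
        if item in known:
            redundant.append((line, item))
        else:
            known.add(item)
    return redundant


def analyze(block, known):
    """Walk a block; return (known set at exit, redundant checks found)."""
    redundant = []
    for stmt in block:
        if stmt[0] == "if":
            _, line, senders, then_blk, else_blk = stmt
            redundant += check_senders(line, senders, known)
            then_known, r_then = analyze(then_blk, set(known))
            else_known, r_else = analyze(else_blk, set(known))
            redundant += r_then + r_else
            known = then_known & else_known   # must hold on both paths
        else:
            line, senders, receivers = stmt
            redundant += check_senders(line, senders, known)
            known |= set(receivers)           # value came from checked data
    return known, redundant


_, redundant = analyze(PROGRAM, set())
print(redundant)
```

Under these assumptions the sketch flags the checks on lines 5, 6, 12, 14, and 15 as redundant, in line with the list above. It also flags the check for R on line 13, because R receives either the literal 0 or the already-checked W; the article does not list that line, so read it as a consequence of the sketch's simplified rule rather than as a statement about the compiler.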
The checks on lines 2 and 3 aren't considered redundant, despite B and P having VALUE clauses. This is because without the program being defined as IS INITIAL, there's no guarantee that those data items will have the same value as specified in their VALUE clause at the start of the program, so they must still be checked.
What does the improved NUMCHECK do?
The compiler frontend (the part that analyzes the COBOL syntax) is the part that generates checks. It behaves like it used to, generating checks each time a numeric item is used as a sender. The optimizer then has a new optimization that removes redundant checks.
At OPT(0), this optimization does simple analysis, only removing checks that are redundant in straight-line code (such as the check for Z on line 6). It cannot remove checks like DISPLAY W on line 12 because the simpler analysis does not detect that W is checked on both paths to line 12 (lines 7 and 10). Instead, the optimization sees the check on line 12 as a check made at the start of a new section of straight-line code. OPT(0) does less analysis in order to have minimal impact on the time required to compile programs.
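A straight-line analysis of this kind can be sketched in Python by forgetting everything at the start of each section of straight-line code. This is an illustrative model under stated assumptions, not the compiler's implementation: the split of the sample program into four sections, the rule that a receiver becomes known good once its senders are checked, and all names are hypothetical.

```python
# Each section is a run of straight-line code: (line, senders, receivers).
SECTIONS = [
    [(1, ["Z"], []), (2, ["P"], []), (3, ["B"], []),
     (4, [], ["Q"]), (5, ["Q"], []), (6, ["Z"], [])],      # lines 1-6
    [(7, ["W"], ["Z"]), (8, [], ["R"])],                   # IF branch
    [(10, ["W"], ["R"])],                                  # ELSE branch
    [(12, ["W"], []), (13, ["R"], []), (14, ["Z"], ["P"]),
     (15, ["B", "P"], ["R"]), (16, ["Z2"], [])],           # after END-IF
]


def local_redundant_checks(sections):
    """Find redundant checks using only facts from the current section."""
    redundant = []
    for section in sections:
        known = set()                 # nothing is remembered across sections
        for line, senders, receivers in section:
            for item in senders:
                if item in known:
                    redundant.append((line, item))
                else:
                    known.add(item)   # this check is kept; item is now known
            known |= set(receivers)   # receiver was filled from checked data
    return redundant


removed = local_redundant_checks(SECTIONS)
print(removed)
```

In this model the checks for Q on line 5 and Z on line 6 are removed, but the check for W on line 12 is kept because line 12 starts a new section. The sketch also removes the check for P on line 15, since P was filled on line 14 within the same section; whether the compiler does the same at OPT(0) is not stated in this article.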
At OPT(1) and OPT(2), this optimization does more advanced analysis, looking at the flow of data through the whole program. Checks like the DISPLAY W on line 12 would be removed. Also, at OPT(1|2) only, we run a different optimization, before the NUMCHECK optimization, that identifies data items with a VALUE clause that are never written to, directly or through parent items. In this case, we can guarantee that those items will have the value specified in their VALUE clause not only on the first run of a program, but on subsequent runs as well. The prior optimization replaces all uses of those data items with the values specified in the VALUE clauses. This also enables NUMCHECK to remove more known-good and known-bad tests, such as the checks for B on lines 3 and 15; they are both known good, so they are removed.
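The "VALUE clause and never written to" test can be sketched in Python as follows. This is a rough model, not the compiler's optimization: the tables describe the sample program by hand, all names are hypothetical, and treating a write to a REDEFINES item as a write to the item it redefines is an assumption of this sketch.

```python
# Items that have a VALUE clause, with the value as written in the source.
VALUE_ITEMS = {"B": "60", "P1": "x'FFFFFF'", "X1": "SPACES"}

# Items that share storage with another item (assumption: a write to the
# redefining item counts as a write to the redefined item).
REDEFINES = {"P": "P1", "Z2": "X1"}

# Receivers of every statement in the sample program (lines 4, 7, 8, 10, 14, 15).
RECEIVERS = ["Q", "Z", "R", "R", "P", "R"]


def storage_owner(item: str) -> str:
    """Map an item to the item whose storage it occupies."""
    return REDEFINES.get(item, item)


written = {storage_owner(r) for r in RECEIVERS}
constants = {name: value for name, value in VALUE_ITEMS.items()
             if name not in written}

# B is PIC 9(3) BINARY: its constant value is known good if it fits 3 digits.
b_known_good = "B" in constants and 0 <= int(constants["B"]) <= 999

print(sorted(constants), b_known_good)
```

In this model B is never written, so every use of B can be replaced by 60 and its checks on lines 3 and 15 are known good. P1 does not qualify because line 14 writes to P, which occupies the same storage.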
Known-Bad Checks
The behavior of the compiler when a known-bad check is found by the NUMCHECK optimization at compile time depends on whether NUMCHECK(MSG) or NUMCHECK(ABD) is specified. With NUMCHECK(ABD), runtime checks that fail will cause an abend, and the problem must be fixed and the program recompiled, if the invalid data came from within the program, in order for the check to pass in the future. Keeping with this behavior, compile-time known failures cause the compiler to produce an error message, preventing the program from being compiled.
With NUMCHECK(MSG), runtime checks don't cause an abend, so with compile-time known failures, the compiler produces a warning message, which doesn't prevent the program from being compiled. In addition, the check is preserved, which will cause a message to be generated at runtime as well. It is ideal to fix known-bad checks without having to do any testing first, but NUMCHECK(MSG) allows clients to be aware of invalid data issues and fix them at a time of their choosing. So, for known-bad checks with NUMCHECK(MSG), the compiler preserves the check (so someone viewing the runtime logs will be aware of the issues) and gives a compile-time message (so COBOL developers can make a change before testing if they want), without forcing compilation to fail.
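The two behaviors described above can be summarized as a small decision function. The Python sketch below only restates the rules from this section; the type and function names are hypothetical and do not correspond to anything in the compiler.

```python
from typing import NamedTuple


class KnownBadOutcome(NamedTuple):
    compile_diagnostic: str      # "error" or "warning"
    compilation_succeeds: bool   # does the compile still produce a program?
    runtime_check_kept: bool     # is the check left in the generated code?


def handle_known_bad(mode: str) -> KnownBadOutcome:
    """Describe what happens to a check that is known to fail at compile time."""
    if mode == "ABD":
        # The failure would abend at runtime, so compilation fails instead
        # and the check itself is removed.
        return KnownBadOutcome("error", False, False)
    if mode == "MSG":
        # Compilation continues with a warning, and the check is preserved
        # so that the runtime message is still produced.
        return KnownBadOutcome("warning", True, True)
    raise ValueError(f"unknown NUMCHECK mode: {mode!r}")


abd = handle_known_bad("ABD")
msg = handle_known_bad("MSG")
```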
Summary: Differences In Behavior
- There will be fewer checks made after the March 2019 PTF for V6.2 than before.
- There may be more checks made at OPT(0) than at OPT(1), and at OPT(1) than at OPT(2).
- There may be fewer compile-time messages reported at OPT(0) than at OPT(1), and at OPT(1) than at OPT(2).
- Checks for data items with known bad data will be removed and a compile-time error will be generated when compiling with NUMCHECK(ABD).
- Checks for data items with known bad data will be preserved unless redundant, and a compile-time message and runtime message will be generated when compiling with NUMCHECK(MSG).
- If invalid data is moved between data items, only the first use is checked; this can also cause fewer messages after the March 2019 PTF for V6.2 than before.
Document Information
Modified date:
25 June 2021
UID
ibm10879025