SGEMM, DGEMM, CGEMM, and ZGEMM (Combined Matrix Multiplication and Addition for General Matrices, Their Transposes, or Conjugate Transposes)

Purpose

SGEMM and DGEMM can perform any one of the following combined matrix computations, using scalars α and β, matrices A and B or their transposes, and matrix C:

SGEMM and DGEMM Combined Matrix Computations
C ← αAB+βC	C ← αAB^T+βC
C ← αA^TB+βC	C ← αA^TB^T+βC

CGEMM and ZGEMM can perform any one of the following combined matrix computations, using scalars α and β, matrices A and B, their transposes or their conjugate transposes, and matrix C:

CGEMM and ZGEMM Combined Matrix Computations
C ← αAB+βC	C ← αAB^T+βC	C ← αAB^H+βC
C ← αA^TB+βC	C ← αA^TB^T+βC	C ← αA^TB^H+βC
C ← αA^HB+βC	C ← αA^HB^T+βC	C ← αA^HB^H+βC
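
The nine forms above differ only in how the elements of A and B are read. The following is a minimal reference sketch of that idea in C, not the library's implementation: `ref_cgemm` and `op_elem` are hypothetical names, the arrays are assumed to be column-major with 0-based indexing, and treating β = 0 as "do not read C" is an assumption of this sketch.

```c
#include <complex.h>
#include <ctype.h>
#include <stddef.h>

/* Element (i, k) of op(X), read in place from a column-major array x with
 * leading dimension ldx. trans is 'N', 'T', or 'C' (either case). */
static float complex op_elem(char trans, const float complex *x, int ldx,
                             int i, int k)
{
    switch (toupper((unsigned char)trans)) {
    case 'T': return x[k + (size_t)i * ldx];        /* X^T(i,k) = X(k,i)       */
    case 'C': return conjf(x[k + (size_t)i * ldx]); /* X^H(i,k) = conj(X(k,i)) */
    default:  return x[i + (size_t)k * ldx];        /* X(i,k)                  */
    }
}

/* Hypothetical reference routine: C <- alpha*op(A)*op(B) + beta*C,
 * where op(A) is l by m, op(B) is m by n, and C is l by n. */
void ref_cgemm(char transa, char transb, int l, int n, int m,
               float complex alpha, const float complex *a, int lda,
               const float complex *b, int ldb,
               float complex beta, float complex *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < l; ++i) {
            float complex sum = 0.0f;
            for (int k = 0; k < m; ++k)
                sum += op_elem(transa, a, lda, i, k) *
                       op_elem(transb, b, ldb, k, j);
            float complex *cij = &c[i + (size_t)j * ldc];
            /* Assumption of this sketch: when beta is zero, C is not read. */
            *cij = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *cij;
        }
    }
}
```

Because `op_elem` only changes the index order (and conjugates for 'C'), no transposed copy of A or B is ever formed.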

Table 1. Data Types
A, B, C, α, β	Subroutine
Short-precision real	SGEMM
Long-precision real	DGEMM
Short-precision complex	CGEMM
Long-precision complex	ZGEMM

Note: On certain processors, SIMD algorithms may be used if alignment requirements are met. For further details, see Use of SIMD Algorithms by Some Subroutines in the Libraries Provided by ESSL.

Syntax

Language	Syntax
Fortran	CALL SGEMM | DGEMM | CGEMM | ZGEMM (`transa`, `transb`, `l`, `n`, `m`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`)
C and C++	sgemm | dgemm | cgemm | zgemm (`transa`, `transb`, `l`, `n`, `m`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);
CBLAS	cblas_sgemm | cblas_dgemm | cblas_cgemm | cblas_zgemm (`cblas_layout`, `cblas_transa`, `cblas_transb`, `l`, `n`, `m`, `alpha`, `a`, `lda`, `b`, `ldb`, `beta`, `c`, `ldc`);

On Entry

cblas_layout

indicates whether the input and output matrices are stored in row major order or column major order, where:

If cblas_layout = CblasRowMajor, the matrices are stored in row major order.
If cblas_layout = CblasColMajor, the matrices are stored in column major order.

Specified as: an object of enumerated type CBLAS_LAYOUT. It must be CblasRowMajor or CblasColMajor.
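
As a rough illustration of what the two layouts mean for addressing, the sketch below computes the offset of element (i, j) under the usual CBLAS convention, where the leading dimension is the stride between consecutive columns (column major) or consecutive rows (row major). The function names are hypothetical and indices are 0-based.

```c
#include <stddef.h>

/* Offset of element (i, j) in an array with leading dimension ld. */
size_t offset_col_major(int i, int j, int ld)
{
    return (size_t)i + (size_t)j * (size_t)ld;  /* columns are contiguous */
}

size_t offset_row_major(int i, int j, int ld)
{
    return (size_t)i * (size_t)ld + (size_t)j;  /* rows are contiguous */
}
```

For example, with a leading dimension of 4, element (1, 2) sits at offset 9 in column major order and at offset 6 in row major order.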

transa

indicates the form of matrix A to use in the computation, where:

If transa = 'N', A is used in the computation.

If transa = 'T', A^T is used in the computation.

If transa = 'C', A^H is used in the computation.

Specified as: a single character; transa = 'N', 'T', or 'C'.

cblas_transa

indicates the form of matrix A to use in the computation, where:

If cblas_transa = CblasNoTrans, A is used in the computation.

If cblas_transa = CblasTrans, A^T is used in the computation.

If cblas_transa = CblasConjTrans, A^H is used in the computation.

Specified as: an object of enumerated type CBLAS_TRANSPOSE. It must be CblasNoTrans, CblasTrans, or CblasConjTrans.

transb

indicates the form of matrix B to use in the computation, where:

If transb = 'N', B is used in the computation.

If transb = 'T', B^T is used in the computation.

If transb = 'C', B^H is used in the computation.

Specified as: a single character; transb = 'N', 'T', or 'C'.

cblas_transb

indicates the form of matrix B to use in the computation, where:

If cblas_transb = CblasNoTrans, B is used in the computation.

If cblas_transb = CblasTrans, B^T is used in the computation.

If cblas_transb = CblasConjTrans, B^H is used in the computation.

Specified as: an object of enumerated type CBLAS_TRANSPOSE. It must be CblasNoTrans, CblasTrans, or CblasConjTrans.

l

is the number of rows in matrix C.

Specified as: an integer; 0 ≤ l ≤ ldc.

n

is the number of columns in matrix C.

Specified as: an integer; n ≥ 0.

m

has the following meaning, where:

If transa = 'N', it is the number of columns in matrix A.

If transa = 'T' or 'C', it is the number of rows in matrix A.

In addition:

If transb = 'N', it is the number of rows in matrix B.

If transb = 'T' or 'C', it is the number of columns in matrix B.

Specified as: an integer; m ≥ 0.

alpha

is the scalar α.

Specified as: a number of the data type indicated in Table 1.

a

is the matrix A, where:

If transa = 'N', A is used in the computation, and A has l rows and m columns.

If transa = 'T', A^T is used in the computation, and A has m rows and l columns.

If transa = 'C', A^H is used in the computation, and A has m rows and l columns.

Note: No data should be moved to form A^T or A^H; that is, the matrix A should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 1, where:

If transa = 'N', its size must be lda by (at least) m.

If transa = 'T' or 'C', its size must be lda by (at least) l.

lda

is the leading dimension of the array specified for a.

Specified as: an integer; lda > 0 and:

If transa = 'N', lda ≥ l.

If transa = 'T' or 'C', lda ≥ m.
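
The size rule for the array a can be sketched as a small helper. This is only an illustration of the "lda by (at least) m" and "lda by (at least) l" wording above; `min_elems_a` is a hypothetical name, not part of the library.

```c
#include <ctype.h>
#include <stddef.h>

/* Smallest element count for the array a under the size rule above:
 * lda rows by m columns for 'N', lda rows by l columns for 'T' or 'C'. */
size_t min_elems_a(char transa, int lda, int l, int m)
{
    int cols = (toupper((unsigned char)transa) == 'N') ? m : l;
    return (size_t)lda * (size_t)cols;
}
```

With the values of Example 1 (transa = 'N', lda = 8, l = 6, m = 5), the array must hold at least 8 × 5 = 40 elements, of which only the first 6 rows of each column belong to A.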

b

is the matrix B, where:

If transb = 'N', B is used in the computation, and B has m rows and n columns.

If transb = 'T', B^T is used in the computation, and B has n rows and m columns.

If transb = 'C', B^H is used in the computation, and B has n rows and m columns.

Note: No data should be moved to form B^T or B^H; that is, the matrix B should always be stored in its untransposed form.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 1, where:

If transb = 'N', its size must be ldb by (at least) n.

If transb = 'T' or 'C', its size must be ldb by (at least) m.

ldb

is the leading dimension of the array specified for b.

Specified as: an integer; ldb > 0 and:

If transb = 'N', ldb ≥ m.

If transb = 'T' or 'C', ldb ≥ n.

beta

is the scalar β.

Specified as: a number of the data type indicated in Table 1.

c

is the l by n matrix C.

Specified as: a two-dimensional array, containing numbers of the data type indicated in Table 1.

ldc

is the leading dimension of the array specified for c.

Specified as: an integer; ldc > 0 and ldc ≥ l.

On Return

c

is the l by n matrix C, containing the results of the computation.

Returned as: an ldc by (at least) n array, containing numbers of the data type indicated in Table 1.

Notes

All subroutines accept lowercase letters for the transa and transb arguments.
For SGEMM and DGEMM, if you specify 'C' for the transa or transb argument, it is interpreted as though you specified 'T'.
Matrix C must have no common elements with matrices A or B; otherwise, results are unpredictable. See Vector concepts.

Function

The combined matrix addition and multiplication is expressed as follows, where a_ik, b_kj, and c_ij are elements of matrices A, B, and C, respectively:

c_ij = α Σ_{k=1}^{m} a_ik b_kj + β c_ij    for i = 1, …, l and j = 1, …, n

See references [42] and [48]. In the following three cases, no computation is performed:

l is 0.
n is 0.
β is 1 and α is 0.

Assuming the above conditions do not exist, if β ≠ 1 and m is 0, then βC is returned.
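
The special cases above can be traced in a minimal real-valued reference sketch. It is an illustration under stated assumptions, not the library's algorithm: `ref_dgemm` and `op_elem_real` are hypothetical names, arrays are column-major with 0-based indexing, and 'C' is treated as 'T' as described in Notes.

```c
#include <ctype.h>
#include <stddef.h>

/* op(X)(i,k) for real data: 'N' reads X(i,k); 'T' and 'C' both read X(k,i). */
static double op_elem_real(char trans, const double *x, int ldx, int i, int k)
{
    return (toupper((unsigned char)trans) == 'N') ? x[i + (size_t)k * ldx]
                                                  : x[k + (size_t)i * ldx];
}

/* Hypothetical reference routine: C <- alpha*op(A)*op(B) + beta*C. */
void ref_dgemm(char transa, char transb, int l, int n, int m,
               double alpha, const double *a, int lda,
               const double *b, int ldb,
               double beta, double *c, int ldc)
{
    /* The three cases in which no computation is performed. */
    if (l == 0 || n == 0 || (beta == 1.0 && alpha == 0.0))
        return;

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < l; ++i) {
            double sum = 0.0;
            /* When m is 0 this loop is empty, so only beta*C remains. */
            for (int k = 0; k < m; ++k)
                sum += op_elem_real(transa, a, lda, i, k) *
                       op_elem_real(transb, b, ldb, k, j);
            double *cij = &c[i + (size_t)j * ldc];
            *cij = alpha * sum + beta * *cij;
        }
    }
}
```

Called with the arguments of Example 2 ('N', 'T', l = 3, n = 3, m = 2, α = 1.0, β = 2.0), this sketch yields the output matrix shown there, and rows of the array C beyond row l are left untouched.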

Special Usage

Equivalence Rules: The equivalence rules, defined for matrix multiplication of A and B in Special Usage, also apply to the matrix multiplication part of the computation performed by this subroutine. Use the equivalence rules when you want to transpose or conjugate transpose the multiplication part of the computation. When coding the calling sequences for these cases, code your matrix arguments and dimension arguments in the order indicated by the rule, and make sure that your input and output array C has dimensions large enough to hold the resulting matrix. See Example 4.

Error conditions

Resource Errors

Unable to allocate internal work area.

Computational Errors

None

Input-Argument Errors

cblas_layout ≠ CblasRowMajor or CblasColMajor
lda, ldb, ldc ≤ 0
l, m, n < 0
l > ldc
transa, transb ≠ 'N', 'T', or 'C'
transa = 'N' and l > lda
transa = 'T' or 'C' and m > lda
cblas_transa ≠ CblasNoTrans, CblasTrans, or CblasConjTrans
cblas_transa = CblasNoTrans and l > lda
cblas_transa = CblasTrans or CblasConjTrans and m > lda
transb = 'N' and m > ldb
transb = 'T' or 'C' and n > ldb
cblas_transb ≠ CblasNoTrans, CblasTrans, or CblasConjTrans
cblas_transb = CblasNoTrans and m > ldb
cblas_transb = CblasTrans or CblasConjTrans and n > ldb
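
The checks listed above for the Fortran and C interfaces can be collected into one predicate, which may be handy for validating arguments before a call. This is a sketch only: `gemm_args_ok` and `is_trans` are hypothetical helpers, and the library's own error handling is not modeled here.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if t is 'N', 'T', or 'C' in either case. */
static bool is_trans(char t)
{
    int u = toupper((unsigned char)t);
    return u == 'N' || u == 'T' || u == 'C';
}

/* Returns true when none of the input-argument errors listed above applies. */
bool gemm_args_ok(char transa, char transb, int l, int n, int m,
                  int lda, int ldb, int ldc)
{
    if (!is_trans(transa) || !is_trans(transb)) return false;
    if (l < 0 || n < 0 || m < 0) return false;
    if (lda <= 0 || ldb <= 0 || ldc <= 0) return false;
    if (l > ldc) return false;

    bool a_untransposed = toupper((unsigned char)transa) == 'N';
    bool b_untransposed = toupper((unsigned char)transb) == 'N';

    /* Rows actually stored in a: l for 'N', m for 'T' or 'C'. */
    if ((a_untransposed ? l : m) > lda) return false;
    /* Rows actually stored in b: m for 'N', n for 'T' or 'C'. */
    if ((b_untransposed ? m : n) > ldb) return false;
    return true;
}
```

For instance, the arguments of Example 1 ('N', 'N', 6, 4, 5 with lda = 8, ldb = 6, ldc = 7) pass every check, while reducing lda to 5 would fail the l > lda test.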

Examples

Example 1

This example shows the computation C←αAB+βC, where A, B, and C are contained in larger arrays A, B, and C, respectively.

Call Statement and Input:

           TRANSA TRANSB  L   N   M  ALPHA  A  LDA  B  LDB  BETA  C  LDC
             |      |     |   |   |    |    |   |   |   |    |    |   |
CALL SGEMM( 'N'  , 'N'  , 6 , 4 , 5 , 1.0 , A , 8 , B , 6 , 2.0 , C , 7 )

                                       
        |  1.0   2.0  -1.0  -1.0   4.0 |
        |  2.0   0.0   1.0   1.0  -1.0 |
        |  1.0  -1.0  -1.0   1.0   2.0 |
A    =  | -3.0   2.0   2.0   2.0   0.0 |
        |  4.0   0.0  -2.0   1.0  -1.0 |
        | -1.0  -1.0   1.0  -3.0   2.0 |
        |   .     .     .     .     .  |
        |   .     .     .     .     .  |

                                 
        |  1.0  -1.0   0.0   2.0 |
        |  2.0   2.0  -1.0  -2.0 |
B    =  |  1.0   0.0  -1.0   1.0 |
        | -3.0  -1.0   1.0  -1.0 |
        |  4.0   2.0  -1.0   1.0 |
        |   .     .     .     .  |

                             
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
C    =  | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        | 0.5  0.5  0.5  0.5 |
        |  .    .    .    .  |

Output:

                                 
        | 24.0  13.0  -5.0   3.0 |
        | -3.0  -4.0   2.0   4.0 |
        |  4.0   1.0   2.0   5.0 |
C    =  | -2.0   6.0  -1.0  -9.0 |
        | -4.0  -6.0   5.0   5.0 |
        | 16.0   7.0  -4.0   7.0 |
        |   .     .     .     .  |

Example 2

This example shows the computation C←αAB^T+βC, where A and C are contained in larger arrays A and C, respectively, and B is the same size as array B in which it is contained.

Call Statement and Input:

           TRANSA TRANSB  L   N   M  ALPHA  A  LDA  B  LDB  BETA  C  LDC
             |      |     |   |   |    |    |   |   |   |    |    |   |
CALL SGEMM( 'N'  , 'T'  , 3 , 3 , 2 , 1.0 , A , 4 , B , 3 , 2.0 , C , 5 )

                    
        | 1.0  -3.0 |
A    =  | 2.0   4.0 |
        | 1.0  -1.0 |
        |  .     .  |

                    
        | 1.0  -3.0 |
B    =  | 2.0   4.0 |
        | 1.0  -1.0 |

                        
        | 0.5  0.5  0.5 |
        | 0.5  0.5  0.5 |
C    =  | 0.5  0.5  0.5 |
        |  .    .    .  |
        |  .    .    .  |

Output:

                           
        | 11.0  -9.0   5.0 |
        | -9.0  21.0  -1.0 |
C    =  |  5.0  -1.0   3.0 |
        |   .     .     .  |
        |   .     .     .  |

Example 3

This example shows the computation C←αAB+βC using complex data, where A, B, and C are contained in larger arrays, A, B, and C, respectively.

Call Statement and Input:

           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'N'  , 'N'  , 6 , 2 , 3 , ALPHA , A , 8 , B , 4 , BETA , C , 8 )

ALPHA    =  (1.0, 0.0)
BETA     =  (2.0, 0.0)
 
                                             
        | (1.0, 5.0)  (9.0, 2.0)  (1.0, 9.0) |
        | (2.0, 4.0)  (8.0, 3.0)  (1.0, 8.0) |
        | (3.0, 3.0)  (7.0, 5.0)  (1.0, 7.0) |
A    =  | (4.0, 2.0)  (4.0, 7.0)  (1.0, 5.0) |
        | (5.0, 1.0)  (5.0, 1.0)  (1.0, 6.0) |
        | (6.0, 6.0)  (3.0, 6.0)  (1.0, 4.0) |
        |     .           .           .      |
        |     .           .           .      |

                                 
        | (1.0, 8.0)  (2.0, 7.0) |
B    =  | (4.0, 4.0)  (6.0, 8.0) |
        | (6.0, 2.0)  (4.0, 5.0) |
        |     .           .      |

                                 
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
C    =  | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        | (0.5, 0.0)  (0.5, 0.0) |
        |     .           .      |
        |     .           .      |

Output:

                                         
        | (-22.0, 113.0)  (-35.0, 142.0) |
        | (-19.0, 114.0)  (-35.0, 141.0) |
        | (-20.0, 119.0)  (-43.0, 146.0) |
C    =  | (-27.0, 110.0)  (-58.0, 131.0) |
        |   (8.0, 103.0)    (0.0, 112.0) |
        | (-55.0, 116.0)  (-75.0, 135.0) |
        |       .               .        |
        |       .               .        |

Example 4

This example shows how to obtain the conjugate transpose of AB^H.

This shows the conjugate transpose of the computation performed in Example 8 for CGEMUL, which uses the following calling sequence:

CALL CGEMUL( A , 4 , 'N' , B , 3 , 'C' , C , 4 , 3 , 2 , 3 )

You instead code the calling sequence for C←βC+αBA^H, where β = 0, α = 1, and the array C has the correct dimensions to receive the transposed matrix. Because β is zero, βC = 0. For a description of all the matrix identities, see Special Usage.

Call Statement and Input:

           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'N'  , 'C'  , 3 , 3 , 2 , ALPHA , B , 3 , A , 3 , BETA , C , 4 )

ALPHA    =  (1.0, 0.0)
BETA     =  (0.0, 0.0)
 
                                  
        | (1.0, 3.0)  (-3.0, 2.0) |
B    =  | (2.0, 5.0)   (4.0, 6.0) |
        | (1.0, 1.0)  (-1.0, 9.0) |

                                  
        | (1.0, 2.0)  (-3.0, 2.0) |
A    =  | (2.0, 6.0)   (4.0, 5.0) |
        | (1.0, 2.0)  (-1.0, 8.0) |
        |     .            .      |

C    =  (not relevant)

Output:

                                                     
        | (20.0,   1.0)  (18.0, 23.0)  (26.0,  23.0) |
C    =  | (12.0, -25.0)  (80.0,  2.0)  (56.0, -37.0) |
        | (24.0, -26.0)  (49.0, 37.0)  (76.0,  -2.0) |
        |      .              .             .        |

Example 5

This example shows the computation C←αA^TB^H+βC using complex data, where A, B, and C are the same size as the arrays A, B, and C, in which they are contained. Because β is zero, βC = 0. (Based on the dimensions of the matrices, A is actually a column vector, and C is actually a row vector.)

Call Statement and Input:

           TRANSA TRANSB  L   N   M   ALPHA   A  LDA  B  LDB  BETA   C  LDC
             |      |     |   |   |     |     |   |   |   |    |     |   |
CALL CGEMM( 'T'  , 'C'  , 1 , 3 , 3 , ALPHA , A , 3 , B , 3 , BETA , C , 1 )

ALPHA    =  (1.0, 1.0)
BETA     =  (0.0, 0.0)
 
                      
        | (1.0,  2.0) |
A    =  | (2.0,  5.0) |
        | (1.0,  6.0) |

                                               
        | (1.0, 6.0)  (-3.0, 4.0)   (2.0, 6.0) |
B    =  | (2.0, 3.0)   (4.0, 6.0)   (0.0, 3.0) |
        | (1.0, 3.0)  (-1.0, 6.0)  (-1.0, 9.0) |

C    =  (not relevant)

Output:

                                                  
C    =  | (86.0, 44.0) (58.0, 70.0) (121.0, 55.0) |