IBM Support

Generating MVN data with a specific covariance matrix

Troubleshooting


Problem

How can I generate data which are multivariate normal and have a covariance or correlation matrix that I specify?

Resolving The Problem

This can be done in SPSS (version 4.0 and later) with the MATRIX command language, which is part of the SPSS command syntax. The general algorithm is described in:

Rubinstein, R. (1981), "Simulation and the Monte Carlo Method", New York: Wiley.

The basic strategy is:

1. Before entering MATRIX, generate the required number of normally distributed variables with COMPUTE commands. Because each variable is drawn independently, the set is multivariate normal (MVN) with sample correlations close to, but not exactly, 0.

2. To obtain a set of variables that are multivariate normal and exactly uncorrelated in the sample, run the FACTOR procedure on the variables created in Step 1. Use the default extraction method of principal components, and extract as many components as there are variables in the analysis. Save these components as new variables with the FACTOR /SAVE subcommand. The new variables will be MVN and uncorrelated, with means of 0 and variances of 1.

3. Enter the MATRIX procedure and read the set of standard normal variables from Step 2 as a matrix, Z for example. Then define and enter the target covariance matrix, S.

4. Calculate the Cholesky factor of the target covariance matrix. The Cholesky factor is an upper triangular matrix that serves as a "square root" of the covariance matrix: if V is the Cholesky factor of matrix S, and V' is the transpose of V, then V'*V = S, where * is the matrix multiplication operator. The factor exists only if S is symmetric and positive definite.

5. Post-multiply the uncorrelated standard normals, Z, by the Cholesky factor, V, to give a new data matrix, X = Z*V. If the target matrix is a correlation matrix and you want the new variables to have a standard deviation other than 1, multiply X by the desired SD. If you want the new variables to have a nonzero mean, add the desired mean to X after, not before, the multiplication by the SD.

If the target covariance matrix is not a correlation matrix, then the new variables will have variances equal to their respective diagonal elements in the target matrix. Their means will be 0. The desired mean can be added to the variables with a compute statement, either before or after leaving MATRIX.

6. Save the X matrix to variables in the SPSS active file and leave MATRIX.
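
The reason Step 5 reproduces the target matrix can be seen from a short derivation. It uses Z, V, S, and X as defined in the steps above; the lowercase z and x (one row of Z and of X), the scaled row y, and the symbols σ and μ for the desired SD and mean are labels introduced here for illustration only.

```latex
\begin{align*}
x &= zV, \qquad \operatorname{E}[z] = 0, \qquad \operatorname{Cov}(z) = \operatorname{E}[z'z] = I \\
\operatorname{Cov}(x) &= \operatorname{E}[x'x] = V'\,\operatorname{E}[z'z]\,V = V'IV = V'V = S \\
y &= \sigma x + \mu \;\Longrightarrow\; \operatorname{E}[y] = \mu, \qquad \operatorname{Cov}(y) = \sigma^{2} S
\end{align*}
```

When S is a correlation matrix, the last line says that every variable in y has standard deviation σ and that the correlations are unchanged. Because the components saved in Step 2 are uncorrelated with unit variance in the sample itself, the same algebra holds for the sample covariance matrix, not only in expectation.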

An example of the implementation of this algorithm is shown below. The target matrix is a correlation matrix for the five variables to be generated. I added some statements to print the determinant and the condition number of the input covariance matrix to ensure that it was not singular or ill-conditioned. I also printed the product of V'*V to ensure that the target matrix was reproduced. All the new variables were scaled to have means of 100 and standard deviations of 15 before leaving the MATRIX language.

input program.
+ loop #i = 1 to 10000.
+ do repeat response=r1 to r5 .
+ compute response = normal(1) .
+ end repeat.
+ end case.
+ end loop.
+ end file.
end input program.

correlations r1 to r5 / statistics=descriptives.

* Factor procedure computes pr1 to pr5, which are standard MVN .
factor variables = r1 to r5 / print = default det
/criteria = factors(5) /save=reg (all,pr).

* use matrix to set corr matrix.
* x is a 10,000 by 5 matrix of uncorrelated standard normals .
* cor is the target correlation matrix.
* cho is the Cholesky factor of cor .
* newx is the 10,000 by 5 data matrix which has target covariance matrix .

matrix.
get x / variables=pr1 to pr5.
compute cor={1, 0.4, 0.3, 0.2, 0.1 ;
0.4, 1, 0.4, 0.3, 0.2 ;
0.3, 0.4, 1, 0.4, 0.3 ;
0.2, 0.3, 0.4, 1, 0.4 ;
0.1, 0.2, 0.3, 0.4, 1 }.
compute deter=det(cor).
print deter / title "determinant of corr matrix" / format=f10.7 .
print sval(cor) / title "singular values of corr".
print eval(cor) / title "eigenvalues of input corr".

* For a symmetric positive definite matrix the singular values equal the eigenvalues - choose 1 .

compute condnum=mmax(sval(cor))/mmin(sval(cor)).
print condnum / title "condition number of corr matrix" / format=f10.2 .
compute cho=chol(cor).
print cho / title "cholesky factor of corr matrix" .
compute chochek=t(cho)*cho.
print chochek / title "chol factor premult by its transpose " /format=f10.2 .
compute newx=x*cho.
compute newx=newx*15 + 100.
save newx /outfile=* /variables= nr1 to nr5.
end matrix.

correlations nr1 to nr5 / statistics=descriptives.
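
For readers who want to cross-check the arithmetic outside SPSS, the Cholesky and transformation steps can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a translation of the SPSS job: the function names (cholesky_upper, matmul, simulate, sample_corr), the seed, and the sample size of 20000 are all hypothetical choices made here, and the sketch skips the FACTOR orthogonalization of Step 2, so the sample correlations only approximate the target instead of matching it exactly.

```python
import math
import random

# Target correlation matrix, copied from the SPSS example above.
COR = [
    [1.0, 0.4, 0.3, 0.2, 0.1],
    [0.4, 1.0, 0.4, 0.3, 0.2],
    [0.3, 0.4, 1.0, 0.4, 0.3],
    [0.2, 0.3, 0.4, 1.0, 0.4],
    [0.1, 0.2, 0.3, 0.4, 1.0],
]


def cholesky_upper(s):
    """Return upper triangular V with V'V = s (s symmetric positive definite)."""
    n = len(s)
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Remove the contribution of the rows computed so far.
            acc = s[i][j] - sum(v[k][i] * v[k][j] for k in range(i))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                v[i][i] = math.sqrt(acc)
            else:
                v[i][j] = acc / v[i][i]
    return v


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def simulate(n, sd=15.0, mean=100.0, seed=12345):
    """Draw n rows of independent standard normals, then apply X = Z*V, scale, shift."""
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0) for _ in COR] for _ in range(n)]
    cho = cholesky_upper(COR)
    return [[sd * value + mean for value in row] for row in matmul(z, cho)]


def sample_corr(data, i, j):
    """Pearson correlation between columns i and j of data."""
    n = len(data)
    mi = sum(row[i] for row in data) / n
    mj = sum(row[j] for row in data) / n
    sij = sum((row[i] - mi) * (row[j] - mj) for row in data)
    sii = sum((row[i] - mi) ** 2 for row in data)
    sjj = sum((row[j] - mj) ** 2 for row in data)
    return sij / math.sqrt(sii * sjj)


cho = cholesky_upper(COR)
# Counterpart of chochek in the SPSS job: V'V should reproduce COR.
chochek = matmul([list(col) for col in zip(*cho)], cho)
# The determinant of COR is the squared product of the diagonal of V.
deter = math.prod(cho[i][i] for i in range(len(COR))) ** 2
newx = simulate(20000)
print(round(sample_corr(newx, 0, 1), 2))
```

The factor is built as an upper triangular matrix so that it matches the V'*V = S convention used in the steps above; a routine that returns a lower triangular factor L with L*L' = S would need to be transposed before the post-multiplication.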


Historical Number

19178

Document Information

Modified date:
16 June 2018

UID

swg21476678