Linear regression

Linear regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable. For example, you can try to predict a salesperson's total yearly sales (the dependent variable) from independent variables such as age, education, and years of experience.
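The estimation the paragraph describes can be sketched with ordinary least squares in numpy. The data below are hypothetical (invented for illustration, not from this procedure): yearly sales predicted from age, education, and years of experience.

```python
import numpy as np

# Hypothetical data: three predictors (age, years of education,
# years of experience) for six salespeople.
X = np.array([
    [34, 16, 8],
    [45, 12, 20],
    [29, 18, 4],
    [52, 14, 25],
    [38, 16, 12],
    [41, 12, 15],
], dtype=float)
y = np.array([120.0, 150.0, 95.0, 180.0, 130.0, 140.0])  # sales, $1000s

# Add an intercept column, then solve the least-squares problem
# min ||X1 @ b - y||^2 for the coefficient vector b.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Predicted values for the observed cases.
predicted = X1 @ coef
```

The first element of `coef` is the intercept; the rest are one coefficient per independent variable.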

Example
Is the number of games won by a basketball team in a season related to the average number of points the team scores per game? A scatterplot indicates that these variables are linearly related. The number of games won and the average number of points scored by the opponent are also linearly related. These variables have a negative relationship. As the number of games won increases, the average number of points scored by the opponent decreases. With linear regression, you can model the relationship of these variables. A good model can be used to predict how many games teams will win.
Statistics
For each variable: number of valid cases, mean, and standard deviation. For each model: regression coefficients, correlation matrix, part and partial correlations, multiple R, R², adjusted R², change in R², standard error of the estimate, analysis-of-variance table, predicted values, and residuals. Also, 95% confidence intervals for each regression coefficient, variance-covariance matrix, variance inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook, and leverage values), DfBeta, DfFit, prediction intervals, and casewise diagnostic information. Plots: scatterplots, partial plots, histograms, and normal probability plots.
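Several of the model statistics listed above follow directly from the residuals. The helper below is an illustrative sketch (not the procedure's implementation) of how R², adjusted R², and the standard error of the estimate are computed for a model with an intercept:

```python
import numpy as np

def fit_stats(X, y):
    """Fit OLS with an intercept and return R-squared, adjusted
    R-squared, and the standard error of the estimate."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = resid @ resid                       # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()         # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    see = np.sqrt(ss_res / (n - k - 1))          # std. error of the estimate
    return r2, adj_r2, see
```

Adjusted R² penalizes R² for the number of predictors k, which is why it can decrease when an uninformative variable is added.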

Data considerations

Data
The dependent and independent variables should be quantitative. Categorical variables, such as religion, major field of study, or region of residence, need to be recoded to binary (dummy) variables or other types of contrast variables.
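The dummy recoding described above can be sketched in plain Python. The variable `regions` and its category labels are hypothetical; one category is dropped as the reference level so the dummies are not collinear with the intercept.

```python
# Hypothetical categorical variable with three levels.
regions = ["north", "south", "west", "south", "north"]

levels = sorted(set(regions))      # ["north", "south", "west"]
reference, *coded = levels         # "north" becomes the reference level

# One binary column per non-reference category.
dummies = [[1 if r == lvl else 0 for lvl in coded] for r in regions]
```

A case from the reference category is coded as all zeros; its effect is absorbed into the intercept.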
Assumptions
For each value of the independent variable, the distribution of the dependent variable must be normal. The variance of the distribution of the dependent variable should be constant for all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear, and all observations should be independent.
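The independence assumption is commonly screened with the Durbin-Watson statistic, which also appears in the Statistics list above. A minimal sketch of its computation from a model's residuals:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive residual
    differences divided by the residual sum of squares. Values near 2
    suggest no first-order autocorrelation; values near 0 or 4 suggest
    positive or negative autocorrelation, respectively."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```
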

Obtaining a linear regression analysis

This feature requires Statistics Base Edition.

  1. From the menus choose:

    Analyze > Association and prediction > Linear regression

  2. Click Select variable under the Dependent variable section and select a single, numeric dependent variable whose variation will be studied. Click OK after selecting the variable.
  3. Click Select variables under the Independent variables section and select one or more numeric independent variables. Independent variables predict the value of the dependent variable. Click OK after selecting the variables.
  4. Optionally, you can group independent variables into blocks and specify different entry methods for different variable subsets.
    1. Click Add under the Independent variables section to produce additional variable group blocks.
    2. Click Select variables and select one or more numeric independent variables to add to the new block. Click OK after selecting the variables.
    3. For each block, select an entry Method. The method determines how independent variables are entered into the analysis. For more information, see Linear regression: Variable selection methods.
  5. Optionally, you can click Case selection variable to choose a selection variable that limits the analysis to the subset of cases that have a particular value for the selected variable. You can set the value of the selection variable by clicking the rule link (for example, Define selection rule*) that displays next to the case selection variable. For more information, see Linear regression: Define selection rule. Click OK after selecting the variable.
  6. Optionally, you can click Case identification variable to select a case identification variable for identifying points on plots. Click OK after selecting the variable.
  7. Optionally, you can click WLS weight variable to select a weighting variable for a weighted least-squares analysis. Data points are weighted by the reciprocal of their variances, so observations with large variances have less impact on the analysis than observations with small variances. If the value of the weighting variable is zero, negative, or missing, the case is excluded from the analysis. Click OK after selecting the variable.
  8. Optionally, you can select the following options from the Additional settings menu:
    • Click Statistics to select which statistics are included in the procedure.
    • Click Plots to specify chart settings.
    • Click Options to specify constants, stepping methods, and missing value settings.
    • Click Save to dataset to add values predicted by the model, residuals, and related measures to the dataset as new variables.
    • Click Model export to export parameter estimates and their covariances to the specified file in XML format.
    • Click Bootstrap to derive robust estimates of standard errors and confidence intervals for estimates such as the mean, median, proportion, odds ratio, correlation coefficient, or regression coefficient.
  9. Click Run analysis.

This procedure pastes REGRESSION command syntax.
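The weighted least-squares option described in step 7 weights each case by the reciprocal of its variance. The numpy sketch below illustrates that computation (with invented data; it is not the REGRESSION implementation): scaling each row by the square root of its weight reduces WLS to ordinary least squares.

```python
import numpy as np

# Hypothetical data: predictor x, observations y, and a per-case
# variance estimate. The weight for each case is 1 / variance.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
variances = np.array([0.1, 0.1, 0.5, 1.0, 2.0])
w = 1.0 / variances

# Weighted least squares: minimize sum_i w_i * (y_i - b0 - b1*x_i)^2,
# equivalent to OLS on rows scaled by sqrt(w_i).
X1 = np.column_stack([np.ones_like(x), x])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
b0, b1 = coef  # intercept and slope
```

High-variance cases (the last rows) pull on the fit far less than the precisely measured early cases, which is the behavior the step describes.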