Confounding and suppression effects in linear regression: a deeper look
In this post, I want to discuss some aspects of confounding and suppression effects that are often neglected when these concepts come up in the context of linear regression. I was prompted to investigate this topic by the following question:
> When we run a regression of \(y\in \mathbb{R}^n\) on both \(x_1, x_2\in\mathbb{R}^n\), the \(t\)-tests for both predictors are significant. However, when we run a regression on each predictor individually, the \(t\)-test is insignificant. What could be the reason for this?
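To make the phenomenon concrete before explaining it, here is a minimal simulation sketch in Python. The setup, the contamination strength, and the variable names are my own illustrative choices: \(x_2\) captures only nuisance variation, \(x_1\) is a signal contaminated by that same nuisance, and \(y\) depends on the signal alone.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
s = rng.normal(size=n)        # signal that actually drives y
z = rng.normal(size=n)        # nuisance variation shared by both predictors
x1 = s + 8 * z                # x1: signal buried in nuisance
x2 = 8 * z                    # x2: nuisance only (the suppressor)
y = s + rng.normal(size=n)    # y depends on the signal alone

designs = [("y ~ x1", x1), ("y ~ x2", x2),
           ("y ~ x1 + x2", np.column_stack([x1, x2]))]
for label, X in designs:
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(label, "p-values:", np.round(fit.pvalues[1:], 4))
```

With this setup, each simple regression typically yields an insignificant slope, while the joint fit recovers \(y \approx x_1 - x_2\) and both \(t\)-tests are highly significant, precisely the pattern described in the question.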
While the answer to this question is a suppression effect, which is explained visually in the StackExchange answers by Jake Westfall and ttnphns, and in Friedman and Wall (2005), I realized that some fundamental issues deserve emphasis beyond these technical considerations.
This question raises fundamental issues in understanding linear regression. A typical textbook, when introducing linear regression and the related results, considers only one model at a time with a fixed set of predictor variables. In that setting, the interpretation of the regression coefficients and their \(t\)-tests is crystal clear, and there is no ambiguity. However, when we consider several linear models with different predictors simultaneously, the interpretation of the regression coefficients and their \(t\)-tests becomes ambiguous and subtle.
In this post, I aim to illustrate three points:
- The interpretation of a linear regression coefficient can change when the coefficient appears in different models (the identity after this list makes this precise).
- Changes in the significance of \(t\)-tests are not purely a matter of how much of the variability in \(y\) is explained after incorporating a confounder or suppressor; they are also closely tied to changes in the target of inference.
- The significance of a \(t\)-test should be interpreted with care and not hastily taken as an indicator of a good predictor.
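As a preview of the first point, the following standard identity (stated here for the two-predictor case, with an intercept included in every regression) shows exactly how the coefficient of \(x_1\) changes between the simple and the multiple regression:

\[
\tilde{\beta}_1 \;=\; \hat{\beta}_1 + \hat{\beta}_2\,\hat{\gamma},
\]

where \(\tilde{\beta}_1\) is the slope from regressing \(y\) on \(x_1\) alone, \(\hat{\beta}_1\) and \(\hat{\beta}_2\) are the least-squares coefficients from regressing \(y\) on both \(x_1\) and \(x_2\), and \(\hat{\gamma}\) is the slope from regressing \(x_2\) on \(x_1\). Whenever \(\hat{\beta}_2 \neq 0\) and the predictors are correlated, the two coefficients of \(x_1\) estimate genuinely different quantities.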