Welcome to my personal website! Here, I share my thoughts, ideas, and projects related to statistics, probability, machine learning, and more. Check out the Blog to view my full list of articles.

Exploring the intricacies of confounding and suppression effects in linear regression: A deeper look

In this post, I want to write about some aspects of confounding effects and suppression effects that are often neglected in discussions surrounding these concepts in the context of linear regression. I was prompted to investigate this topic by the following question:

When we run a regression of \(y\in \mathbb{R}^n\) on both \(x_1, x_2\in\mathbb{R}^n\), the \(t\)-tests for both predictors are significant. However, when we regress \(y\) on each predictor individually, the \(t\)-test is insignificant. What could be the reason for this?

While the answer is suppression effects, which are explained visually in the StackExchange answers by Jake Westfall and ttnphns, and in Friedman and Wall (2005), I believe some fundamental issues deserve emphasis beyond these technical considerations.

This question raises fundamental issues in understanding linear regression. A typical textbook introduces linear regression and its associated results for a single model with a fixed set of predictor variables, so the interpretation of the regression coefficients and their \(t\)-tests is crystal clear and unambiguous. However, when several linear models with different predictors are considered simultaneously, the interpretation of the coefficients and their \(t\)-tests becomes subtle and ambiguous.

In this post, I aim to illustrate the following three points:

  • The interpretation of linear regression coefficients can change when considering different models.
  • Changes in the significance of \(t\)-tests are not purely due to the change in how much of the variability in \(Y\) is explained after incorporating a confounder or suppressor; they are also closely tied to changes in the target of inference.
  • The significance of the \(t\)-test should be carefully considered and not hastily regarded as an indicator of a good predictor.
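To make the opening question concrete, here is a small simulation of a suppression effect (my own illustrative construction, not taken from the referenced answers): two highly correlated predictors whose difference drives \(y\). Each predictor is insignificant on its own, yet both are significant when entered together. The coefficients and noise levels are arbitrary choices.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# Suppression setup: x1 and x2 are nearly collinear, and y depends on
# their small *difference*, which each predictor alone explains poorly.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)       # x2 is almost identical to x1
y = x1 - x2 + 0.1 * rng.normal(size=n)   # y ~ the difference plus noise

for X in (x1, x2, np.column_stack([x1, x2])):
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(fit.pvalues[1:])   # slope p-value(s); intercept omitted
```

Running this prints two large p-values for the individual regressions and two tiny p-values for the joint regression, exactly the pattern described in the question.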

Continue reading →

Mastering two pointers technique in LeetCode: 'and' vs 'or' conditions

In this post, we will explore the two pointers technique, focusing on the differences and advantages of using ‘and’ and ‘or’ conditions within while loops. Understanding the subtle differences between these two approaches can be the key to solving LeetCode problems more efficiently.

To illustrate these concepts, we will examine LeetCode problem 21, Merge Two Sorted Lists, a classic application of the two pointers technique.

Problem description: Merge two sorted lists

The problem of merging two sorted lists is as follows: given two sorted linked lists list1 and list2, merge them into a single sorted linked list and return the head of the new list.

Here’s an example of what the input and output look like:

Input: list1 = [1,2,4], list2 = [1,3,4]
Output: [1,1,2,3,4,4]
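As a preview, here is one way the merge can be written with two pointers, assuming the usual singly linked ListNode class that LeetCode provides; the `while list1 and list2` loop is the 'and' condition discussed in the post.

```python
class ListNode:
    """Minimal singly linked list node, matching LeetCode's definition."""
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def merge_two_lists(list1, list2):
    dummy = tail = ListNode()        # dummy head simplifies edge cases
    while list1 and list2:           # 'and': stop as soon as either list is exhausted
        if list1.val <= list2.val:
            tail.next, list1 = list1, list1.next
        else:
            tail.next, list2 = list2, list2.next
        tail = tail.next
    tail.next = list1 or list2       # attach whatever remains of the longer list
    return dummy.next
```

With the 'and' condition, the loop handles only the stretch where both lists still have nodes, and the leftover tail is attached in a single step afterwards.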

Continue reading →

Exploring the effects of data replication in linear regression

In this post, we explore the effects of data replication in linear regression, specifically examining how the least squares estimates, noise variance estimates, and \(t\)-statistics change after duplicating data. We consider two variations of this problem and provide solutions to both cases:

  • Case 1: We duplicate both \(x\) and \(y\).
  • Case 2: We duplicate only \(x\) and append zeros to \(y\).

I want to clarify that the solutions presented here follow a principled statistical analysis and may differ from the standard interview solutions, which I also provide for comparison. The difference arises because the interview setting often rests on incomplete or unclear assumptions, such as whether the i.i.d. assumption still holds after duplicating the data. It is important to keep this in mind when reading the solutions.
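As a rough empirical check of Case 1 in the "interview" spirit (feeding the duplicated rows to ordinary least squares as if they were independent observations), here is a short simulation; the sample size, slope, and noise level are arbitrary choices of mine.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def summarize(x, y):
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    # slope estimate, residual standard deviation, slope t-statistic
    return fit.params[1], np.sqrt(fit.scale), fit.tvalues[1]

# Case 1: duplicate both x and y
x2, y2 = np.concatenate([x, x]), np.concatenate([y, y])

print("original  :", summarize(x, y))
print("duplicated:", summarize(x2, y2))
# The slope estimate is unchanged and the residual variance estimate barely
# moves, but the t-statistic inflates by roughly sqrt(2), because the software
# treats the 2n duplicated rows as independent observations.
```

The post works out why this happens and contrasts it with an analysis that accounts for the dependence introduced by duplication.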

Continue reading →