Use Stata to do propensity score matching (PSM)

Most propensity score matching (PSM) examples use cross-sectional data rather than panel data. However, accounting research often uses panel data (i.e., observations with two subscripts, i and t, such as firm-years) in a difference-in-differences (DID) research design, which includes two dummy variables, TREATMENT and POST, in the following regression:

Outcome = TREATMENT + POST + TREATMENT * POST

where TREATMENT indicates whether a firm is subject to an event and POST indicates whether an observation falls before or after that event. One-to-one matching is common, and it arguably makes more sense to do such matching on selected pre-event, firm-level variables (Xs). The pre-event variables can be measured either at the most recent date before the event (e.g., total assets at the most recent quarter end before the event) or as an average over the pre-event period (e.g., average total assets over the four quarters preceding the event).
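Both pre-event measures can be built by collapsing the panel to one row per firm. The following is only a minimal sketch: it assumes a hypothetical quarterly panel with variables firm_id, qdate (a Stata quarterly date), event_q (the event quarter, constant within firm and non-missing, so control firms need a pseudo-event quarter), TREATMENT and total_assets. None of these names come from the text above.

```stata
* Hypothetical variable names: firm_id, qdate, event_q, TREATMENT, total_assets
preserve

* (a) value at the most recent quarter before the event
*     (assumes every firm has an observation at event_q - 1)
gen double a_last = total_assets if qdate == event_q - 1

* (b) values in the four quarters preceding the event, to be averaged
gen double a_pre = total_assets if inrange(qdate, event_q - 4, event_q - 1)

* One row per firm: first non-missing a_last, mean of a_pre
collapse (firstnm) TREATMENT assets_last = a_last ///
         (mean) assets_avg = a_pre, by(firm_id)

save firm_level, replace   // firm-level file to be used for matching
restore
```

The matching step would then be run on firm_level.dta rather than on the firm-quarter panel.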

PSM requires estimating a probit or logit regression of the treatment dummy on the Xs:

TREATMENT = X1 + X2 + …

The single nearest neighbour in terms of propensity score will be selected as the matched control, and then DID regressions can be done subsequently.

psmatch2 is a user-written module that finds matched controls using PSM. First, we need to install the module in Stata by typing:
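Assuming the module is installed from the SSC archive, which is where psmatch2 is distributed, the command is:

```stata
* Install psmatch2 from SSC (requires an internet connection)
ssc install psmatch2
```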

Then the following command should work in most cases:
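A command of the following form is consistent with the options discussed here. TREATMENT is the 0/1 treatment dummy and X1, X2 and X3 are placeholders for the pre-event, firm-level covariates, so substitute your own variable names:

```stata
* Run on a firm-level file with one row per firm.
* X1 X2 X3 are placeholder covariate names.
psmatch2 TREATMENT X1 X2 X3, noreplacement logit descending
```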

There are three options in the above command:

  • noreplacement – perform one-to-one matching without replacement. I would add this option to find more unique matched controls.
  • logit – use logit instead of the default probit to estimate the propensity score. I am indifferent about this option.
  • descending – more details about this option can be found in Lunt (2014). The author concludes that “in the absence of a caliper (another option I would omit to maximize matched controls), the descending method provides the best matches, particularly when there is a large separation between exposed (treated) and unexposed (untreated) subjects.” So, I would add this option.

psmatch2 creates a number of variables, of which the following two are the most useful for subsequent DID regressions:

  • _id – In the case of one-to-one and nearest-neighbors matching, a new identifier created for all observations.
  • _n1 – In the case of one-to-one and nearest-neighbors matching, for every treatment observation, it stores the new identifier (_id) of the matched control observation.
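One way to use _id and _n1 to build the matched sample and then run the DID regression is sketched below. It assumes psmatch2 has just been run with one-to-one matching without replacement on a firm-level file (one row per firm), and that firm_id, panel.dta, OUTCOME and POST are hypothetical names for the firm identifier, the firm-year panel, the outcome variable and the post-event dummy.

```stata
* Assumes _treated, _id and _n1 were just created by psmatch2.
* firm_id, panel.dta, OUTCOME and POST are hypothetical names.

* Drop observations that psmatch2 did not use, so that _id is a unique key
drop if missing(_id)

* 1. Collect the _id of every control picked as a match
preserve
keep if _treated == 1 & !missing(_n1)
keep _n1
rename _n1 _id
tempfile picked
save `picked'
restore

* 2. Keep matched treated firms and their matched controls
merge 1:1 _id using `picked', keep(master match) generate(ctrl_merge)
keep if (_treated == 1 & !missing(_n1)) | ctrl_merge == 3
keep firm_id
tempfile matched
save `matched'

* 3. Restrict the firm-year panel to matched firms and run the DID regression
use panel, clear
merge m:1 firm_id using `matched', keep(match) nogenerate
regress OUTCOME i.TREATMENT##i.POST, vce(cluster firm_id)
```

The coefficient on the TREATMENT and POST interaction is the DID estimate. Clustering by firm is a design choice made for this sketch, not something psmatch2 requires.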

There is a limitation with psmatch2. Sometimes we may want the treatment and its matched control to have the same value of a variable X. For example, we may want both to be drawn from the same industry, or both to be of the same gender. psmatch2 does not seem to support this directly. Some imperfect solutions are discussed in this post (i.e., adding i.INDUSTRY or i.GENDER to the Xs). In contrast, the PSMATCH procedure in SAS seems to offer a clean solution through its EXACT= option (although I don’t know whether SAS implements this as a stratification method; if so, psmatch2 can do the same with some tweaking of its options). More details about the SAS procedure can be found in this manual.

Another conclusion is that psmatch2 is preferable to Stata’s built-in command teffects for this purpose, because we need the variables generated by psmatch2 (e.g., _id and _n1) for subsequent DID regressions, while teffects does not return such variables by default.

This article aims to provide a quick how-to and thus skips some necessary steps for PSM, such as assessing covariate balance. A more rigorous discussion of PSM in accounting research can be found in Shipman, Swanquist, and Whited (2017).
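For readers who do want a quick balance check, the psmatch2 package also ships a companion command, pstest. A minimal sketch, assuming it is run right after psmatch2 and using the placeholder covariates X1, X2 and X3:

```stata
* Compare covariate means of treated and control firms,
* both before and after matching (X1 X2 X3 are placeholder names)
pstest X1 X2 X3, both
```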

I benefited from the following articles, and thanks to both authors:


4 Responses to Use Stata to do propensity score matching (PSM)

  1. C says:

    Many thanks for sharing this!
    I just have one question: how do I perform DiD after psmatch2? Thanks!
