Stata command to perform propensity score matching (PSM)

Most propensity score matching (PSM) examples use cross-sectional data rather than panel data. However, in accounting research, panel data (observations with two subscripts i and t, e.g., firm-years) are often used in a difference-in-differences (DID) research design. This involves two dummy variables, TREATMENT and POST, in the following regression:

Outcome = TREATMENT + POST + TREATMENT * POST

where TREATMENT indicates a treatment event and POST indicates before or after that event. In this context, it is common to perform one-to-one matching using selected pre-event and firm-level variables (Xs). These pre-event variables can be measured either at the most recent date before the event (e.g., total assets at the most recent quarter end before the event) or as an average over the pre-event period (e.g., average total assets in the four quarters preceding the event).
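As a rough illustration, the two ways of measuring a pre-event variable could be coded as follows. This is only a sketch: it assumes a firm-quarter panel in memory, and the names firm_id, qdate, event_q, and at are hypothetical. It also assumes that every firm, including potential controls, carries the event quarter in event_q.

```stata
* Sketch only. Hypothetical variables: firm_id, qdate (a %tq quarterly date),
* event_q (event quarter, filled in for every firm), at (total assets).

* Keep the four quarters immediately preceding the event
keep if inrange(qdate, event_q - 4, event_q - 1)

* Average total assets over the pre-event window
bysort firm_id: egen at_avg = mean(at)

* Total assets in the most recent available quarter before the event
bysort firm_id (qdate): gen at_last = at[_N]

* Reduce to one row per firm for the matching step
bysort firm_id (qdate): keep if _n == _N
keep firm_id TREATMENT at_avg at_last
```

The result is a firm-level cross-section, which is the shape of data the matching step expects.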

To conduct PSM, a probit or logit regression is needed:

TREATMENT = X1 + X2 + …

The single nearest neighbour based on propensity score is selected as the matched control observation. The treatment observations and their respective matched control observations then form the sample for subsequent DID regressions.
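Once the matched sample has been assembled, the DID regression shown earlier could be estimated along these lines. This is a minimal sketch: outcome and firm_id are hypothetical names, and clustering by firm is my assumption rather than a requirement of the design.

```stata
* Sketch only. Assumes the matched firm-period panel is in memory.
* Hypothetical variables: outcome, firm_id. TREATMENT and POST are 0/1 dummies.
regress outcome i.TREATMENT##i.POST, vce(cluster firm_id)

* The coefficient reported for 1.TREATMENT#1.POST is the DID estimate.
```

The ## operator includes both main effects and their interaction, so the three right-hand-side terms do not have to be generated by hand.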

In Stata, the third-party module psmatch2 is commonly used to find matched control observations using PSM. To install the module, the following command can be used:

ssc install psmatch2

Once installed, the following command is typically used:

psmatch2 TREATMENT X1 X2 ... [, noreplacement logit descending]

There are three options in the above command:

noreplacement – Performs one-to-one matching without replacement, so each control observation is used at most once. I would add this option if I want every treatment observation to have its own distinct control.
logit – Uses logit instead of the default probit regression to estimate the propensity score. I am indifferent between logit and probit.
descending – Performs one-to-one matching without replacement in descending order of the propensity score. More details about this option can be found in Lunt (2014). The author concludes that “in the absence of a caliper (another option that I would omit to maximize the number of matches), the descending method provides the best matches, particularly when there is a large separation between exposed (treated) and unexposed (untreated) subjects.” Therefore, I would add this option.
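A concrete call might look like the sketch below. The covariates at_avg, roa, and leverage are hypothetical, and the data are assumed to hold one row per firm. The random sort beforehand follows the advice in the psmatch2 help file, because the sort order can affect results when observations have identical propensity scores.

```stata
* Sketch only. Hypothetical covariates: at_avg, roa, leverage.
set seed 12345                 // any fixed seed makes the random sort reproducible
gen double u = runiform()
sort u                         // randomize the order so ties are broken at random

psmatch2 TREATMENT at_avg roa leverage, noreplacement logit descending
```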

psmatch2 creates several variables, with _id and _n1 being the most useful for subsequent DID regressions:

_id is a new identifier created for all observations in the case of one-to-one and nearest-neighbors matching.
_n1 stores the new identifier (_id) of the matched control observation for every treatment observation.
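One possible way to turn _id and _n1 into a pair identifier and carry the matched firms back to the panel is sketched below. It assumes psmatch2 was just run with noreplacement on a one-row-per-firm dataset, so each control is matched at most once. The names firm_id, pair_id, and panel.dta are hypothetical.

```stata
* Sketch only. Hypothetical names: firm_id, pair_id, panel.dta.

* Step 1: map each matched control's _id to the _id of its treated firm
preserve
    keep if TREATMENT == 1 & !missing(_n1)
    keep _id _n1
    rename _id pair_id         // label the pair by the treated firm's _id
    rename _n1 _id             // key the lookup on the control's _id
    tempfile links
    save `links'
restore
merge 1:1 _id using `links', keep(master match) nogenerate

* Step 2: matched treated firms use their own _id as the pair label
replace pair_id = _id if TREATMENT == 1 & !missing(_n1)

* Step 3: keep matched firms only and attach them to the full panel
keep if !missing(pair_id)
keep firm_id pair_id
tempfile matched
save `matched'

use panel, clear
merge m:1 firm_id using `matched', keep(match) nogenerate
```

After the last merge, the panel contains only treatment firms and their matched controls, in both the pre-event and post-event periods. pair_id takes one value per matched pair, so it can serve as a pair fixed effect or a clustering variable if the design calls for it.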

There is one limitation with psmatch2. Sometimes, we may want the treatment and its matched control to have the same value on a variable X. For example, we may want the treatment and its matched control to be drawn from the same industry, or both to be male or female. psmatch2 lacks a direct solution for this requirement. Some imperfect workarounds, such as adding i.industry or i.gender in Xs, are discussed in this post. In contrast, the PSMATCH procedure in SAS provides a perfect solution by offering the EXACT= statement. I am not sure if SAS achieves this by implementing a stratification method, but if it does, it is possible that psmatch2 in Stata could achieve similar results by tweaking its options. More details on the PSMATCH procedure in SAS can be found in this manual.
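Another workaround is to run psmatch2 separately within each industry, so that every treatment observation can only be matched to a control from the same industry. The sketch below is one way this might be done, not a tested recipe. It assumes one row per firm, a numeric industry variable, and hypothetical names firm_id, X1, X2, and match_firm_id. Note that the propensity score model is re-estimated within each industry, which may be unreliable for small industries.

```stata
* Sketch only. Hypothetical names: firm_id, industry, X1, X2, match_firm_id.
tempfile base pairs
save `base'
local first = 1

levelsof industry, local(inds)
foreach j of local inds {
    use if industry == `j' using `base', clear

    * capture skips industries where estimation or matching fails,
    * for example when there are no treated or no control firms
    capture psmatch2 TREATMENT X1 X2, noreplacement logit descending
    if _rc continue

    * Translate each _n1 into the firm_id of the matched control
    preserve
        keep _id firm_id
        rename (_id firm_id) (_n1 match_firm_id)
        tempfile lookup
        save `lookup'
    restore
    keep if TREATMENT == 1 & !missing(_n1)
    if _N == 0 continue
    merge 1:1 _n1 using `lookup', keep(match) keepusing(match_firm_id) nogenerate
    keep firm_id match_firm_id

    * Accumulate the pairs found so far
    if `first' == 0 append using `pairs'
    save `pairs', replace
    local first = 0
}

use `pairs', clear             // one row per treated firm with its same-industry control
```

The final dataset lists firm identifiers rather than _id values, because _id is re-created on every psmatch2 run and is therefore not comparable across industries.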

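It is worth noting that I prefer psmatch2 to Stata’s built-in command teffects psmatch because the variables generated by psmatch2 (particularly _id and _n1) feed directly into subsequent DID regressions. teffects psmatch can store the observation numbers of nearest neighbours through its generate() option, but it always matches with replacement and requires an outcome variable.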

This article aims to provide a quick how-to and may omit some necessary steps for PSM, such as assessing covariate balance. A more rigorous discussion on PSM in accounting research can be found in Shipman, Swanquist, and Whited (2017).

I would like to express my gratitude to the authors of the articles that have been beneficial in preparing this post.

10 Responses to Stata command to perform propensity score matching (PSM)

C says:

November 1, 2021 at 1:48 pm

Many thanks for sharing this!
I just have one question: how do I perform DiD after psmatch2? Thanks!

- Khan says:
  
  January 11, 2022 at 12:21 pm
  
  Did you ever get an answer?
  
  - Don says:
    
    April 11, 2022 at 4:05 pm
    
    Khan, I am still confused about how to use the variables (_id, _n1) created by psmatch2 in the post-matching DiD analysis. I was wondering if you could give us pseudo Stata code with an explanation.
    
    - tintun says:
      
      December 6, 2022 at 3:39 am
      
      waiting for the answer too
      
      - Kai Chen says:
        
        May 12, 2023 at 11:07 am
        
        Basically, there are two steps involved. In Step 1, we have treatment firms and all other firms in the pre-treatment period. These non-treatment firms are potential candidates for control firms. Our goal is to find the best-matched control firm(s) for each treatment firm using only pre-treatment data. There are multiple matching methods available, and one of them is PSM, which is the focus of this post.
        
        In Step 2, we include the observations from treatment firms and the selected control firms in DID regressions using all the data from both the pre-treatment and post-treatment periods.
- hannes says:
  
  May 15, 2023 at 4:09 pm
  
  https://www.statalist.org/forums/forum/general-stata-discussion/general/1669473-how-to-use-results-of-psmatch2-in-regression
  
  reghdfe dv treated controls [w=_weight], absorb(…) vce(…)
  
  - Kai Chen says:
    
    May 15, 2023 at 5:02 pm
    
    Thank you for your input. However, reghdfe is a third-party module for regressions with many levels of fixed effects, which I don’t think is necessary here.
    
Simon says:

October 2, 2023 at 9:42 am

Hi Chen,

Your information is very helpful. I am still uncertain, however, how the output of this can be put into a diff-in-diff regression.

If I understand the theory right, I need a variable indicating that two observations are in a match. Is that correct?

If that is correct, then I need to use _id and _n1 to produce such a variable, with a unique value for every matched pair. Is that correct? If so, how would I go about creating such a variable? Or if I’m wrong, where have I gone wrong? Thanks in advance.

Best regards

Victor says:

October 6, 2023 at 1:33 am

There is a newer command, kmatch, which allows you to match exactly on certain variables (e.g., in the same industry or geographic area).

For more, please visit:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1392744-kmatch-new-command-for-multivariate-distance-and-propensity-score-matching

Pontus Hällgren says:

April 15, 2024 at 10:17 am

Did you get an answer on how to run a DiD regression based on the new variables?
