Most propensity score matching (PSM) examples use cross-sectional data rather than panel data. However, accounting research often uses panel data (i.e., observations with two subscripts i and t, e.g., firm-years) in a difference-in-differences (DID) research design with two dummy variables, TREATMENT and POST, in the following regression:
Outcome = TREATMENT + POST + TREATMENT * POST
where TREATMENT indicates whether the firm is affected by an event and POST indicates whether the observation falls before or after that event. One-to-one matching is common, and it arguably makes more sense to do such matching on selected pre-event, firm-level variables (Xs). The pre-event variables can be measured either at the most recent date before the event (e.g., total assets at the most recent quarter end before the event) or as an average over the pre-event period (e.g., average total assets over the four quarters preceding the event).
We need to estimate a probit or logit regression for PSM:
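The two ways of measuring pre-event variables can be sketched in Stata as follows. This is only an illustration under stated assumptions: the file name panel_data and the variables firm_id, qtr, event_qtr, treatment, assets, and roa are hypothetical, qtr and event_qtr are assumed to be Stata quarterly dates, and control firms are assumed to carry the same event_qtr as treated firms (e.g., a single event date for the whole sample).

```stata
* Hypothetical firm-quarter panel: firm_id qtr event_qtr treatment assets roa
use panel_data, clear

* Quarters relative to the event (-1 = last quarter before the event)
gen rel = qtr - event_qtr

* Option A: covariates at the most recent quarter end before the event
preserve
keep if rel == -1
keep firm_id treatment assets roa
save pre_event_last, replace
restore

* Option B: covariates averaged over the four quarters before the event
keep if inrange(rel, -4, -1)
collapse (mean) assets roa (first) treatment, by(firm_id)
save pre_event_avg, replace
```

Either file has one row per firm, which is the shape needed for matching at the firm level rather than the firm-quarter level.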
TREATMENT = X1 + X2 + …
The single nearest neighbour in terms of propensity score will be selected as the matched control, and then DID regressions can be done subsequently.
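To make the first stage concrete, the propensity score can also be estimated by hand. This is a minimal sketch, assuming a firm-level file with one row per firm; the file name pre_event_avg and the variables treatment, assets, and roa are hypothetical. psmatch2 performs an equivalent estimation internally, so this step is for inspection only.

```stata
* Hypothetical firm-level data: one row per firm
use pre_event_avg, clear

* First stage: probability of being treated given pre-event covariates
logit treatment assets roa

* Predicted probability = propensity score
predict double pscore, pr

* Compare the score distributions of treated and control firms
summarize pscore if treatment == 1
summarize pscore if treatment == 0
```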
psmatch2 is a user-written module that finds matched controls using PSM. First, we need to install the module in Stata by typing:
ssc install psmatch2
Then the following command should work in most cases (the square brackets only mark the options as optional and should not be typed):
psmatch2 TREATMENT X1 X2 ..., [noreplacement logit descending]
There are three options in the above command:
- noreplacement – perform one-to-one matching without replacement. I would add this option to find more unique matched controls.
- logit – use logit instead of the default probit to estimate the propensity score. I am indifferent to this option.
- descending – more details about this option can be found in Lunt (2014). The author concludes that “in the absence of a caliper (another option I would omit to maximize matched controls), the descending method provides the best matches, particularly when there is a large separation between exposed (treated) and unexposed (untreated) subjects.” So, I would add this option.
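Putting the three options together, a run might look like the sketch below. The file name pre_event_avg and the variables treatment, assets, and roa are hypothetical placeholders for your own firm-level data; the variables starting with an underscore are the ones psmatch2 creates.

```stata
* Hypothetical firm-level data: one row per firm, pre-event covariates only
use pre_event_avg, clear

* One-to-one nearest-neighbour matching without replacement,
* logit first stage, treated firms processed in descending order
psmatch2 treatment assets roa, noreplacement logit descending

* Quick checks on the result
tab _treated
count if _treated == 1 & !missing(_n1)   // treated firms that received a match
summarize _pscore if _treated == 1
summarize _pscore if _treated == 0
```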
psmatch2 creates a number of variables, of which the following two are the most useful for subsequent DID regressions:
- _id – In the case of one-to-one and nearest-neighbors matching, a new identifier created for all observations.
- _n1 – In the case of one-to-one and nearest-neighbors matching, for every treatment observation, it stores the new identifier (_id) of the matched control observation.
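One way to turn _id and _n1 into a matched sample for the DID regression is sketched below. It assumes psmatch2 has just been run with noreplacement on a firm-level file, and that the firm identifier firm_id, the panel file panel_data, and the panel variables outcome, treatment, and post are hypothetical names for your own data. Clustering by firm is also an assumption, not a recommendation from the original text.

```stata
* Run immediately after psmatch2 on the firm-level data.
* A control's pair number is its own _id; a treated firm's pair number
* is the _id of its matched control, which psmatch2 stores in _n1.
gen long pair = _id if _treated == 0
replace pair = _n1 if _treated == 1

* Keep only complete pairs (one treated firm and one control)
bysort pair: egen byte pair_n = count(pair)
keep if pair_n == 2

keep firm_id pair
tempfile matched
save `matched'

* Bring the matched firms back to the firm-quarter panel and run the DID
use panel_data, clear
merge m:1 firm_id using `matched', keep(match) nogenerate
regress outcome i.treatment##i.post, vce(cluster firm_id)
```

Because `matched' is a temporary file, these lines need to be run together in one do-file.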
There is a limitation with psmatch2. Sometimes we may want the treatment and its matched control to have the same value on a variable X. For example, we may want the treatment and its matched control to be drawn from the same industry, or to be of the same gender. psmatch2 does not seem capable of this. Some imperfect solutions are discussed in this post (e.g., adding i.INDUSTRY or i.GENDER to the Xs). In contrast, the PSMATCH procedure in SAS seems to have a perfect solution in its EXACT= option (although I don’t know whether SAS implements this through a stratification method; if so, psmatch2 could do the same with some tweaking of its options). More details about the SAS procedure can be found in this manual.
Another conclusion is that psmatch2 is preferable to Stata’s built-in command teffects, because we need the variables generated by psmatch2 (e.g., _id and _n1) for subsequent DID regressions, while teffects does not create such variables by default.
This article aims to provide a quick how-to and thus skips some necessary steps for PSM, such as assessing covariate balance. A more rigorous discussion of PSM in accounting research can be found in Shipman, Swanquist, and Whited (2017).
I benefited from the following articles, and thanks to both authors: