*Outcome = TREATMENT + POST + TREATMENT * POST*

where *TREATMENT* often indicates an event and *POST* indicates whether an observation falls before or after that event. One-to-one matching is common, and it arguably makes more sense to match on selected **pre-event**, **firm-level** variables (*X*s). The pre-event variables can be measured either at the most recent date before the event (e.g., total assets at the most recent quarter end before the event) or as an average over the pre-event period (e.g., average total assets over the four quarters preceding the event).
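As a rough illustration of the DID specification above, the following sketch simulates a two-period panel and estimates the model with an interaction term. The variable names (`firm_id`, `treatment`, `post`, `outcome`) and the simulated effect sizes are assumptions made for this example only.

```stata
* Simulated data: 200 hypothetical firms, the first 100 treated
clear
set seed 2024
set obs 200
gen firm_id = _n
gen byte treatment = firm_id <= 100

* Two observations per firm: one before and one after the event
expand 2
bysort firm_id: gen byte post = _n == 2

* Outcome with an assumed true DID effect of 0.8
gen outcome = 1 + 0.5*treatment + 0.3*post + 0.8*treatment*post + rnormal()

* The coefficient on 1.treatment#1.post is the DID estimate
regress outcome i.treatment##i.post, vce(cluster firm_id)
```

In a real application the sample would be restricted to treated firms and their matched controls before running this regression.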

We need to do a probit or logit regression for PSM:

*TREATMENT = X1 + X2 + …*

The single nearest neighbour in terms of propensity score will be selected as the matched control, and then DID regressions can be done subsequently.

**psmatch2** is a user-written module that finds matched controls using PSM. First, install the module in Stata by typing:

ssc install psmatch2

Then the following command should work in most cases:

psmatch2 TREATMENT X1 X2 ..., noreplacement logit descending

There are three options in the above command:

- noreplacement – perform one-to-one matching without replacement. I would add this option to find more unique matched controls.
- logit – use logit instead of the default probit to estimate the propensity score. I am indifferent to this option.
- descending – more details about this option can be found in Lunt (2014). The author concludes that “in the absence of a caliper (another option I would omit to maximize matched controls), the descending method provides the best matches, particularly when there is a large separation between exposed (treated) and unexposed (untreated) subjects.” So, I would add this option.
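A concrete call might look like the sketch below. It uses Stata's bundled `auto` dataset purely for illustration, with `foreign` standing in for *TREATMENT* and `mpg` and `weight` standing in for the *X*s (these stand-ins are my assumption, not a meaningful research design). The data are sorted randomly first because psmatch2 warns that the sort order can affect results when propensity scores are tied.

```stata
* ssc install psmatch2        // run once if the module is not installed
sysuse auto, clear

* Randomize the sort order so ties are not broken by the original order
set seed 12345
gen double u = runiform()
sort u

* One-to-one nearest-neighbour matching on a logit propensity score
psmatch2 foreign mpg weight, noreplacement logit descending

* Inspect a few of the variables psmatch2 creates
list _id _treated _pscore _n1 in 1/5
```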

**psmatch2** creates a number of variables, of which the following two are the most useful for subsequent DID regressions:

- _id – In the case of one-to-one and nearest-neighbors matching, a new identifier created for all observations.
- _n1 – In the case of one-to-one and nearest-neighbors matching, for every treatment observation, it stores the new identifier (_id) of the matched control observation.
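One possible way to turn `_id` and `_n1` into a matched sample is sketched below. It again uses the `auto` dataset with `foreign` as a stand-in treatment (an assumption for illustration), and the names `matched_controls` and `is_matched_control` are hypothetical.

```stata
sysuse auto, clear
set seed 12345
gen double u = runiform()
sort u
psmatch2 foreign mpg weight, noreplacement logit descending

* Collect the _id of every control that was picked as a match
preserve
keep if _treated == 1 & !missing(_n1)
keep _n1
rename _n1 _id
duplicates drop
tempfile matched_controls
save `matched_controls'
restore

* Flag the matched controls in the full sample
merge m:1 _id using `matched_controls', keep(master match)
gen byte is_matched_control = (_merge == 3)
drop _merge

* Keep matched treated observations and their controls only
keep if (_treated == 1 & !missing(_n1)) | is_matched_control
```

With panel data, the matching would typically be run on a firm-level cross-section of pre-event values, and the resulting list of treated and control firms merged back into the panel before the DID regression.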

There is a limitation with **psmatch2**. Sometimes we may want the treatment and its matched control to have the same value on a variable

Another conclusion is that *psmatch2* is preferable to Stata’s built-in command

This article aims to provide a quick how-to and thus skips some necessary steps for PSM, such as assessing covariate balance. A more rigorous discussion of PSM in accounting research can be found in Shipman, Swanquist, and Whited (2017).

I benefited from the following articles; thanks to both authors:

The macro I use is borrowed from Adrian’s work. Thanks, Adrian.

A related post can be found here: http://kaichen.work/?p=1365.

    libname local "path_to_folder";
    options mprint;

    %macro lowcase(dsn);
        %let dsid=%sysfunc(open(&dsn));
        %let num=%sysfunc(attrn(&dsid,nvars));
        %put &num;
        data &dsn;
            set &dsn(rename=(
            %do i = 1 %to &num;
                /*function of varname returns the name of a SAS data set variable*/
                %let var&i=%sysfunc(varname(&dsid,&i));
                &&var&i=%sysfunc(lowcase(&&var&i)) /*rename all variables*/
            %end;));
            %let close=%sysfunc(close(&dsid));
        run;
    %mend lowcase;

    data temp;
        set local.filename;
    run;

    %lowcase(temp)

    proc export data= temp outfile= "path_to_folder/filename" dbms= dta replace;
    run;


Both WRDS and Dick-Nielsen’s codes remove cancellations, corrections, reversals, and double counting of agency trades. Dick-Nielsen’s code provides a few more options, e.g., remove commissioned trades.

I found two useful articles on Stata’s official website:

Can you explain Chow tests?

How can I compute the Chow test statistic?

Suppose we run the following regressions separately for two groups: `regress y x1 x2 if group==1` and `regress y x1 x2 if group==2`.

Then the following commands test the equality of the coefficients on `x1` and `x2` across the two groups:

`ge g2=(group==2)`

`regress y c.x1##i.g2 c.x2##i.g2`

`contrast g2 g2#c.x1 g2#c.x2, overall`

Stata’s official website gives an example of the output:

In this example, to test the equality of coefficients on `x1` and `x2`, 6.06 and 2.80 are the F-stats that we are looking for.
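To try the same commands on data that ships with Stata, here is a minimal sketch using the `auto` dataset. Treating `foreign` as the group indicator and `mpg` and `weight` as the two regressors is my own choice for illustration, so the F-statistics it prints will differ from the numbers quoted above.

```stata
sysuse auto, clear

* foreign (0/1) plays the role of g2; mpg and weight stand in for x1 and x2
regress price c.mpg##i.foreign c.weight##i.foreign

* Test equality of the intercept and of each slope across groups,
* plus an overall (Chow-type) test
contrast foreign foreign#c.mpg foreign#c.weight, overall
```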

First, it is only meaningful to count the number at a specified date.

Second, how should we define “an analyst is actually following a firm”? I use the following definition: if an analyst issued any forecast (EPS, stock price, sales, or anything else) within a certain window (e.g., 180 days) before the specified date, then the analyst is counted. This definition ensures that the analyst is “actively” following the firm.

That is why my macro requires two arguments: DATE and WINDOW. The macro answers the following question: at a specified date, how many analysts are actively following Firm A, B, …?

    %MACRO ANALYST_COUNT(INFILE=, TICKER=, DATE=, WINDOW=, OUTFILE=);
    /* This macro is used to count the number of analysts who followed a   */
    /* specific firm at a specified date (DATE). Any analyst who issued any */
    /* forecast during the window (WINDOW) before the specified date (DATE) */
    /* will be counted in.                                                  */
    /* This macro uses both Detailed History Unadjusted (EPS for US Region) */
    /* and Unadjusted (Non-EPS for US Region). INFILE should contain IBES   */
    /* Ticker (TICKER) and DATE.                                            */

    options mprint;

    /* Stack Detailed History Unadjusted (EPS for US Region) and */
    /* Unadjusted (Non-EPS for US Region).                       */
    data detu;
        set ibes.detu_epsus ibes.detu_xepsus;
    run;

    /* Merge analysts who issued a forecast during the window. */
    proc sql;
        create table ibes1 as
        select a.*, b.estimator, b.analys, b.anndats
        from (select distinct &TICKER, &DATE from &INFILE) a, detu b
        where not missing(a.&TICKER) and a.&TICKER=b.ticker
            and not missing(a.&DATE)
            and a.&DATE-&WINDOW+1<=b.anndats<=a.&DATE
            and not missing(b.value);
    quit;

    /* Retain the most recent forecast from a specific analyst. */
    proc sort data=ibes1;
        by &TICKER &DATE estimator analys descending anndats;
    run;

    proc sort data=ibes1 out=ibes2 nodupkey;
        by &TICKER &DATE estimator analys;
    run;

    /* Count the number of analysts who issued a forecast during the window. */
    proc sql;
        create table ibes3 as
        select distinct &TICKER, &DATE, count(anndats) as analyst_count
        from ibes2
        group by &TICKER, &DATE;
    quit;

    /* Merge INFILE with number of analysts */
    proc sql;
        create table &OUTFILE as
        select a.*, b.analyst_count
        from &INFILE a left join ibes3 b
        on a.&TICKER=b.&TICKER and a.&DATE=b.&DATE;
    quit;

    proc sql;
        drop table detu, ibes1, ibes2;
    quit;

    %MEND;


Writing regular expressions is an art! You can find the building blocks of regular expressions here. I created this post to gather examples of regular expressions that solve specific text-search problems, and I will keep adding to it.

**Change variable names to all lowercase**

We need to use the command `rename`. Instead of renaming variables one at a time, we can rename all variables in a single command (thanks Steve):

rename _all, lower

A related post can be found here: http://kaichen.work/?p=1483.

**Change values of string variables to all lowercase**

`ustrlower(string_variable)` or `strlower(string_variable)` will do the trick. Instead of applying the `ustrlower` or `strlower` function to string variables one by one, we can lowercase the values of all string variables in a short loop. The following loop first checks the type of each variable. If it is a string variable, the loop changes its values to all lowercase.

    foreach var of varlist _all {
        local vartype: type `var'
        if substr("`vartype'",1,3)=="str" {
            replace `var'=ustrlower(`var')
        }
    }
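As an alternative sketch, `ds` can select the string variables directly, which avoids checking each storage type by hand. The example below uses the `auto` dataset, where `make` is the only string variable, and first uppercases it so the effect is visible; the local name `strvars` is arbitrary.

```stata
sysuse auto, clear

* Make the change visible: uppercase the only string variable first
replace make = strupper(make)

* ds, has(type string) leaves the list of string variables in r(varlist)
ds, has(type string)
local strvars `r(varlist)'

* Lowercase every string variable found
foreach var of local strvars {
    replace `var' = ustrlower(`var')
}

list make in 1/3
```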


`egen compdatadate=eom(fiscalmonth fiscalyear)`

`format compdatadate %td`
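The `eom()` function used above is not part of official Stata's `egen`; as far as I know it comes from the user-written `egenmore` package (`ssc install egenmore`). If installing it is not an option, the same end-of-month date can be built with built-in date functions, as in this sketch. The three input rows are made-up values for illustration.

```stata
clear
input fiscalyear fiscalmonth
2019 12
2020 2
2021 6
end

* First day of the following month, minus one day = last day of this month
gen compdatadate = dofm(ym(fiscalyear, fiscalmonth) + 1) - 1
format compdatadate %td

* Shows 31dec2019, 29feb2020 and 30jun2021
list
```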

To be continued …

The `tabulate varname` command is handy in Stata, but it sometimes returns an overly long table when `varname` contains too many unique values. The third-party command `groups` solves the problem by showing the top values only. Use `ssc install groups` to install `groups`. The usage of `groups` is very similar to `tabulate`. Here are some examples:

    . sysuse auto
    (1978 Automobile Data)

    . groups mpg, order(h) select(5)

      +-------------------------------+
      | mpg   Freq.   Percent    Cum. |
      |-------------------------------|
      |  18       9     12.16   12.16 |
      |  19       8     10.81   22.97 |
      |  14       6      8.11   31.08 |
      |  21       5      6.76   37.84 |
      |  22       5      6.76   44.59 |
      +-------------------------------+

    . groups mpg, order(h) select(f >= 3)

      +-------------------------------+
      | mpg   Freq.   Percent    Cum. |
      |-------------------------------|
      |  18       9     12.16   12.16 |
      |  19       8     10.81   22.97 |
      |  14       6      8.11   31.08 |
      |  21       5      6.76   37.84 |
      |  22       5      6.76   44.59 |
      |-------------------------------|
      |  25       5      6.76   51.35 |
      |  16       4      5.41   56.76 |
      |  17       4      5.41   62.16 |
      |  24       4      5.41   67.57 |
      |  20       3      4.05   71.62 |
      |-------------------------------|
      |  23       3      4.05   75.68 |
      |  26       3      4.05   79.73 |
      |  28       3      4.05   83.78 |
      +-------------------------------+


`and` and `or` in `if` statement, compared to SAS. For example, in SAS we can write `if 2001 <= fyear <= 2010`. But in Stata, we usually write `if fyear >= 2001 & fyear <= 2010`.

In fact, Stata provides a handy `inrange` function. The above `if` statement can be written as `if inrange(fyear, 2001, 2010)`.

Similarly, Stata provides an `inlist` function. The syntax is `inlist(z, a, b, ...)`, which returns 1 if z = a or z = b, and so on. In an `if` statement, it is equivalent to `if z == a | z == b | ...`.
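A quick way to confirm these equivalences is to count observations under each form. The sketch below uses the `auto` dataset, with `mpg` and `rep78` standing in for `fyear` and `z`. It also shows why the SAS-style chained comparison should not be carried over: Stata evaluates it left to right, so `(20 <= mpg)` becomes 0 or 1, which is always `<= 25`.

```stata
sysuse auto, clear

* These two counts are identical
count if mpg >= 20 & mpg <= 25
count if inrange(mpg, 20, 25)

* These two counts are identical as well
count if rep78 == 3 | rep78 == 4
count if inlist(rep78, 3, 4)

* Caution: the SAS-style chained comparison runs without error in Stata,
* but it is true for every observation, so this counts the whole dataset
count if 20 <= mpg <= 25
```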