Look into CRSP/Compustat link table

Posted on March 14, 2015 by Kai Chen

The link history table (CCMXPF_LNKHIST) is the primary table used for WRDS CCM web queries. In this post, I explain this table in detail.

Background

We know that a company may issue multiple securities, one of which is considered primary for the company. On CRSP, this suggests that one PERMCO (the company-lever identifier) may have multiple PERMNOs (the security-level identifier). Because CRSP collects security-level data such as price and trading volume, we should consider CRSP as breaking down to the security level.

It is well known that Compustat provides financial statement data of a company. The micro unit on Compustat is each and every company. However, it is less known that Compustat also provides security data. In addition, because the coverage of Compustat is more extensive than that of CRSP, Compustat contains addtional security data that are unavailable on CRSP.

Historically, Compustat included only one (primary) security per company. Since mid-April in 2007, all securities issued by a company are available on Compustat with a new identifier, IID, which is used along with GVKEY to identify all securities tracked by Compustat. A marker item, PRIMISS, indicates whether a security is primary or secondary. Therefore, like PERMCO on CRSP, one GVKEY may have multiple IIDs currently.

Let me summarize the identifiers used by Compustat and CRSP:

Database	Identifier	Description
Compustat	GVKEY	Compustat’s permanent company identifier.
Compustat	IID	Compustat’s permanent security identifier. An identifying relationship exists between IID and GVKEY. Both must be accessed as a pair to properly identify a Compustat security. One GVKEY may have multiple IIDs.
Compustat	PRIMISS	This item indicates whether a security is primary (P) or secondary (J). P identifies a primary security with the highest average trading volume over a period of time. J identifies a joiner (secondary) security.
CRSP	PERMCO	CRSP’s permanent company identifier.
CRSP	PERMNO	CRSP’s permanent security identifier. One PERMNO belongs to only one PERMCO. One PERMCO may have one or more PERMNOs.

The last piece of background information is that the link between CRSP and Compustat (at both company level and security level) may change over time.

The linking process

Prior to the introduction of IID, Compustat included only one (primary) security per company. The link between CRSP and Compustat was between CRSP PERMNO and Compustat GVKEY. Because PERMNO is a security identifier and GVKEY is a company identifier, this link may be a many-to-one relationship, i.e., multiple PERMNOs may be linked to a single GVKEY.

Because Compustat security-level information is now available, CRSP started to build security-level links in April 2007.

The linking history table

The main product of CRSP’s laborious linking efforts is the link history table. This table is Compustat-centric, that is, this table is organized and identified by Compustat identifiers which are then linked to CRSP identifiers. All Compustat records are retained, regardless of whether or not the securities (defined by GVKEY-IID) are in the CRSP universe.The following is a slice of the table (Please note that IID, PERMCO, and PERMNO have the prefix “L” in the link history table.):

GVKEY	LINKPRIM	LIID	LINKTYPE	LPERMNO	LPERMCO	LINKDT	LINKENDDT
COMPUSTAT global company key	Primary link marker	Security-level identifier	Link type code	Historical CRSP PERMNO link to COMPUSTAT record	Historical CRSP PERMCO link to COMPUSTAT record	First effective date of link	Last effective date of link
10411	P	1	LC	63773	5230	19811215	.E
10411	P	1	NR			19741129	19811214
10411	J	7	LC	90655	5230	20050516	20120131
10411	J	6	NR			20050429	20060131
10411	J	2	NR			19940331	.E
10411	J	4	NR			20020131	20060131

LIID is based on Compustat’s IID. Because Compustat’s company data range extends earlier than its security data range, there are time periods during which no IID is assigned by Compustat for a GVKEY. In these cases, CRSP assigns a dummy IID ending in “X” as a placeholder in the link table. This GVKEY-dummy IID may or may not be linked to a CRSP PERMNO.

LINKPRIM is a marker item that indicates whether a GVKEY-LIID is a primary security. This marker is based on Compustat’s Primary/Joiner flag (PRIMISS). However, due to missing primary issue markers from Compustat for early history, calendar ranges of overlapping, and different treatment for US and Canadian security issues, CRSP overides Compustat’s primary issue marker in many cases. The purpose is to produce one primary security throughout the company history. “P” represents the primary security issue identified by Compustat, while “C” represents the primary security issue identified or overridden by CRSP. In most applications, we only need the primary security.

Another important item is LINKTYPE. In short, LC and LU are considered as the most accurate links. They are also the default link types used for WRDS CCM web queries. LX and LD are considered as “soft” links of low accuracy. Old merging sample codes also include LS in addition to LC and LU. But by definition below, I do not think that LS should be included.

LINKDT and LINKENDDT are straightforward. They mark the period during which the link is valid.

Please see the detailed description of each item:

ITEM NAME	TYPE	DESCRIPTION
GVKEY	integer, primary key (1)	Compustat GVKEY
LIID	char(3), primary key (2)	Compustat IID. A dummy IID with an “X” suffix is assigned by CRSP as a placeholder if no IID is assigned by Compustat for a GVKEY in early history.
LINKDT	integer (date), primary key (3)	First effective calendar date of link record range
LINKENDDT	integer (date)	Last effective calendar date of link record range
LPERMNO	integer	Linked CRSP PERMNO, 0 if no CRSP security link exists
LPERMCO	integer	Linked CRSP PERMCO, 0 if no CRSP company link exists
LINKPRIM	char(3)	Primary issue marker for the link. This marker is based on Compustat Primary/Joiner flag (PRIMISS), but may be overridden by CRSP in some cases. Values are: P – Primary, identified by Compustat in monthly security data. J – Joiner secondary issue of a company, identified by Compustat in monthly security data. C – Primary, assigned by CRSP to resolve ranges of overlapping or missing primary markers from Compustat in order to produce one primary security throughout the company history. N – Secondary, assigned by CRSP to override Compustat. Compustat allows a US security and a Canadian security issued by the same company to both be marked as Primary at the same time. For purposes of the link, CRSP allows only one primary at a time and marks the others as N.
LINKTYPE	char(3)	Link type code. Each link is given a code describing the connection between the CRSP and Compustat data. Values are: LC – Link research complete. Standard connection between databases. LU – Unresearched link to issue by CUSIP LX – Link to a security that trades on another exchange system not included in CRSP data. LD – Duplicate Link to a security. Another GVKEY/IID is a better link to that CRSP record. LN – Primary link exists but Compustat does not have prices. LS – Link valid for this security only. Other CRSP PERMNOs with the same PERMCO will link to other GVKEYs. NR – No link available, confirmed by research NU – No link available, not yet confirmed

I download the link history table as of January 30, 2015. I delete records without a link found (about 67% of all records; remember the link history table is Compustat-centric and the Compustat universe is bigger than the CRSP universe). For remaining records with a link found, I present the following statistics to give you a sense of the values of LINKPRIM, LINKTYPE, and LIID. As you can see, the vast majority of the primary issue marker is identified by Compustat, and “LC” and “LU” types of links constitute about 90% of all identified links.

. tab linkprim

    Primary |
Link Marker |      Freq.     Percent        Cum.
------------+-----------------------------------
          C |      7,345       23.41       23.41
          J |        359        1.14       24.55
          N |        907        2.89       27.45
          P |     22,764       72.55      100.00
------------+-----------------------------------
      Total |     31,375      100.00

. tab linktype

  Link Type |
       Codd |      Freq.     Percent        Cum.
------------+-----------------------------------
         LC |     11,524       36.73       36.73
         LD |        141        0.45       37.18
         LN |        274        0.87       38.05
         LS |      1,845        5.88       43.93
         LU |     16,622       52.98       96.91
         LX |        969        3.09      100.00
------------+-----------------------------------
      Total |     31,375      100.00

. tab liid

Security-le |
        vel |
 Identifier |      Freq.     Percent        Cum.
------------+-----------------------------------
        00X |      2,369        7.55        7.55
         01 |     25,645       81.74       89.29
        01C |        897        2.86       92.15
         02 |        961        3.06       95.21
        02C |         36        0.11       95.32
         03 |        209        0.67       95.99
        03C |         21        0.07       96.06
         04 |         62        0.20       96.25
        04C |          3        0.01       96.26
         05 |         23        0.07       96.34
        05C |          1        0.00       96.34
         06 |          7        0.02       96.36
        06C |          2        0.01       96.37
         07 |          4        0.01       96.38
        07C |          1        0.00       96.39
         08 |          2        0.01       96.39
         09 |          1        0.00       96.40
         10 |          2        0.01       96.40
         19 |          1        0.00       96.40
        19C |          1        0.00       96.41
         90 |        929        2.96       99.37
        90C |          7        0.02       99.39
         91 |         72        0.23       99.62
         92 |         14        0.04       99.67
         93 |          9        0.03       99.69
         95 |          2        0.01       99.70
         96 |          1        0.00       99.70
         99 |          2        0.01       99.71
        99X |         91        0.29      100.00
------------+-----------------------------------
      Total |     31,375      100.00

. tab linkprim

Primary |

Link Marker | Freq. Percent Cum.

------------+-----------------------------------

C | 7,345 23.41 23.41

J | 359 1.14 24.55

N | 907 2.89 27.45

P | 22,764 72.55 100.00

------------+-----------------------------------

Total | 31,375 100.00

. tab linktype

Link Type |

Codd | Freq. Percent Cum.

------------+-----------------------------------

LC | 11,524 36.73 36.73

LD | 141 0.45 37.18

LN | 274 0.87 38.05

LS | 1,845 5.88 43.93

LU | 16,622 52.98 96.91

LX | 969 3.09 100.00

------------+-----------------------------------

Total | 31,375 100.00

. tab liid

Security-le |

vel |

Identifier | Freq. Percent Cum.

------------+-----------------------------------

00X | 2,369 7.55 7.55

01 | 25,645 81.74 89.29

01C | 897 2.86 92.15

02 | 961 3.06 95.21

02C | 36 0.11 95.32

03 | 209 0.67 95.99

03C | 21 0.07 96.06

04 | 62 0.20 96.25

04C | 3 0.01 96.26

05 | 23 0.07 96.34

05C | 1 0.00 96.34

06 | 7 0.02 96.36

06C | 2 0.01 96.37

07 | 4 0.01 96.38

07C | 1 0.00 96.39

08 | 2 0.01 96.39

09 | 1 0.00 96.40

10 | 2 0.01 96.40

19 | 1 0.00 96.40

19C | 1 0.00 96.41

90 | 929 2.96 99.37

90C | 7 0.02 99.39

91 | 72 0.23 99.62

92 | 14 0.04 99.67

93 | 9 0.03 99.69

95 | 2 0.01 99.70

96 | 1 0.00 99.70

99 | 2 0.01 99.71

99X | 91 0.29 100.00

------------+-----------------------------------

Total | 31,375 100.00

The merging code

You may notice the following announcement on the CCM product:

As of the February 2014 release, USEDFLAG is no longer used in the WRDS CCM web queries. Please select LINKTYPES LC, LU, and LS for the same results. These represent the vast majority of the links between CRSP securities and Compustat companies, without introducing duplicate data.

The WRDS-created linking dataset (ccmxpf_linktable) has been deprecated. It will continue to be created for a transition period of 1 year. SAS programmers should use the Link History dataset (ccmxpf_lnkhist) from CRSP.

This suggests that many old merging codes should be updated accordingly. Based on the explanation above, the most important query filters are LINKPRIM, LINKTYPE, LINKDT and LINKENDDT. LINKPRIM is used to select only primary security. LINKTYPE is used to ensure accuracy. LINKDT and LINKENDDT are used to ensure validity of a link at a give time. I believe that the following code is better than most sample codes I have seen:

/************************************************************************************
* STEP ONE: Extract Compustat data;
************************************************************************************/

data comp1;
  set comp.funda;
  where fyear>=1986
    and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C';
  keep gvkey datadate fyear;
run;
 
/*******************************************************************************************
* STEP TWO: Link GVKEYS to CRSP identifiers;                                               *
* Use CCMXPF_LNKHIST table to obtain CRSP identifiers                                      *
********************************************************************************************/

proc sql;
  create table comp2 as
  select a.*, b.lpermno as permno
  from comp1 as a, crsp.ccmxpf_lnkhist as b
  where a.gvkey=b.gvkey and b.linkprim in ('P', 'C') and
  b.LINKTYPE in ('LU', 'LC') and
  a.datadate >= b.LINKDT and (a.datadate <= b.LINKENDDT or missing(b.LINKENDDT))
  order by gvkey, datadate;
quit;

/************************************************************************************

* STEP ONE: Extract Compustat data;

************************************************************************************/

data comp1;

set comp.funda;

where fyear>=1986

and indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C';

keep gvkey datadate fyear;

run;

/*******************************************************************************************

* STEP TWO: Link GVKEYS to CRSP identifiers; *

* Use CCMXPF_LNKHIST table to obtain CRSP identifiers *

********************************************************************************************/

proc sql;

create table comp2 as

select a.*, b.lpermno as permno

from comp1 as a, crsp.ccmxpf_lnkhist as b

where a.gvkey=b.gvkey and b.linkprim in ('P', 'C') and

b.LINKTYPE in ('LU', 'LC') and

a.datadate >= b.LINKDT and (a.datadate <= b.LINKENDDT or missing(b.LINKENDDT))

order by gvkey, datadate;

quit;

Finally, I acknowledge that the information mainly comes from the official CRSP/Compustat Merged Database Guide.

This entry was posted in Learning Resources, SAS. Bookmark the permalink.

45 Responses to Look into CRSP/Compustat link table

Carolina Magda says:

November 20, 2015 at 7:11 am

Very good!!!

Reply
Jing Xu says:

September 21, 2016 at 8:50 pm

Thanks. This is helpful.

Reply
Soheila says:

February 25, 2017 at 6:07 am

Thanks so much. These codes are very helpful.

Reply
Soheila says:

February 25, 2017 at 6:08 am

Thanks. These codes are very helpful.

Reply
Victor says:

May 19, 2017 at 10:17 am

Good stuff! Very helpful!

I have a question for you. I saw this on the web:

usually linkdt and linkenddt is a date, but linkdt can be ‘B’ (beginning) and linkenddt
can be ‘E’ (end).

How do you handle this when linkdt and linkenddt have values of ‘B’ or ‘E’?

Thank you!

Reply
- Kai Chen says:
  
  May 19, 2017 at 11:33 am
  
  .B and .E mean “missing” in SAS.
  
  Reply
- Xiaorui Zhu says:
  
  December 16, 2017 at 11:54 am
  
  I would suggest you to also match “.B” and “.E” with datadate as following:
  
  ”
  (b.LINKDT <= a.datadateor b.LINKDT = .B) and
  (a.datadate<= b.LINKENDDT or b.LINKENDDT = .E)
  "
  
  Reply
Chandler says:

October 11, 2017 at 2:56 am

Thanks for sharing, it was really helpful.
Just was wondering if you have the merging code in STATA , as well?

Cheers,

Reply
- Kai Chen says:
  
  October 24, 2017 at 12:57 pm
  
  It may be done in Stata, but I highly recommend not doing that in Stata. Data merge will be best done via SQL, but Stata only has “baby” SQL functionality.
  
  Reply
LIZ says:

October 11, 2017 at 11:11 am

Hi Professor,

Thank you for sharing! It’s very helpful. I’m currently working on Stata though. Do you know the code to merge using Stata? Thank you very much!

Reply
- Kai Chen says:
  
  October 24, 2017 at 12:58 pm
  
  It may be done in Stata, but I highly recommend not doing that in Stata. Data merge will be best done via SQL, but Stata only has “baby” SQL functionality.
  
  Reply
Simone says:

April 20, 2018 at 3:03 pm

Hi Kai,

I would like to ask a question. I have a CCM link data base that contains many information at gvkey level. Can I just use that database to merge with CRSP via permno for convenience? Would it be much different from what you created here? There is no thorough description about ccm link data base. I am so confused about it. Thanks!

Reply
Allen Wang says:

May 26, 2018 at 12:32 am

Why not also include ‘LS’ in linktype?

Reply
Tahs says:

December 11, 2018 at 9:33 pm

Hi professor, I cannot seem to be able to find CCMXPF_LNKHIST file / table anywhere. I scoured through WRDS and all they tell me is to use it but no mention of how to actually obtain it. Any help would be appreciated.

Reply
S says:

January 17, 2019 at 9:04 pm

I found this website to be very intriguing and useful! Thank you very much for sharing. I know you must be aware that Wharton is offering a Compustat/CRSP merged dataset. Can we use it directly? I know this dataset has multiple duplicates to be cleaned. But do you think this merged dataset is better or more accurate than the one we merged manually?

Reply
Shuo says:

January 17, 2019 at 11:11 pm

Thank you very much for sharing. But my question is that WRSD is offer a merged compustat and CRSP dataset. Why don’t we use it directly?

Reply
- Kai Chen says:
  
  March 19, 2019 at 12:40 am
  
  WRDS allows users to download a “merged” dataset through web query. But under the hood, WRDS actually runs the code and returns the SQL results to users. If you look into all datasets on WRDS, there are three separate datasets (compustat, crsp and linktable) only. You won’t find a standalone “merged” compustat-crsp dataset. Having said that, it is perfectly fine to download “merged” dataset through web query and just use it. The downside is the reduced portability of code, which sometimes may be an issue for co-author projects or review/sharing processes.
  
  Reply
Alberto says:

March 18, 2019 at 8:14 pm

Dear Professor,

Thank you a lot for your help!One question. I need to merge crsp monthly stock return data to the CCMXPF_LNKHIST file. Will it be correct if I do it as follows (using stata):
use “D:\Research\ CCMXPF_LNKHIST.dta”
rangejoin date linkdt linkenddt using “D:\Research\crsp.dta”, by(permno)

Reply
- Kai Chen says:
  
  March 18, 2019 at 8:21 pm
  
  I just read the HELP file of this user-written command. “rangejoin” can only take in one SQL condition at a time. But the original SAS code contains several SQL conditions. It’s possible to achieve the same result by using multiple Stata commands. But generally speaking, Stata only has “baby” SQL capability and will be very inefficient to do SQL like this.
  
  Reply
  - Alberto says:
    
    March 18, 2019 at 8:32 pm
    
    Thank you! Then if I use SAS to obtain “gvkey” for each CRSP observation do i do as follows (msf – mnt-ly crsp):
    
    proc sql;
    create table comp2 as
    select a.*, b.lpermno as permno
    from msf as a, crsp.ccmxpf_lnkhist as b
    where a.permno=b.permno and b.linkprim in (‘P’, ‘C’) and
    b.LINKTYPE in (‘LU’, ‘LC’) and
    a.date>= b.LINKDT and (a.date <= b.LINKENDDT or missing(b.LINKENDDT))
    order by gvkey, date;
    quit;
    
    Reply
    - Kai Chen says:
      
      March 18, 2019 at 8:35 pm
      
      Looks good to me.
      
      Reply
      - Alberto says:
        
        March 19, 2019 at 8:58 am
        
        Thank you for your help! Just last clarifying question.
        
        After I merged crsp to hist_link I find that for a given date and gvkey I have some duplicates. Like there might be two permno’s for a given date within unique gvkey. How do you deal with such situations? Basically, my idea is to calculate stock returns for a given firm and to merge this data to compustat.
      - Kai Chen says:
        
        March 19, 2019 at 9:21 am
        
        It is possible that a gvkey (represent a firm) have multiple permnos (represent a stock), because a firm may issue multiple stocks. However, this code only picks up the primary issue, so it should be rare that you get a duplicate (though it’s still possible. How many percentage you get a duplicate?). Since it sounds like you are just adding financial data to a CRSP stock (i.e., CRSP-centric), the duplicates won’t bother me. But if you want to keep a unique pair only, just apply a reasonable rule, for example, pick up the biggest market cap permno for a given gvkey. Remember, no data is perfect. If duplicates are rare, how you deal with duplicates will not change your empirical results anyway.
Alberto says:

March 19, 2019 at 10:02 am

Thank you for the comment. You are right, basically I have 0.88% of duplicates. So, I just drop the duplicated items.

Reply
Alberto says:

March 19, 2019 at 11:50 am

Dear Professor,

Sorry for bothering you. There is a following issue that I encounter. When I merge monthly crsp file to historical linking table I obtain the following duplicates within:

1. permno gvkey date
2. permno date

For me this looks a bit awkward since every permno is linked to a single gvkey. It is not the case that permno’s switch from one firm (gvkey) to another through year. There are might be some firms (gvkey) with multiple permnos, but not wise versa. And I use the code below:

proc sql;
create table comp2 as
select a.*, b.lpermno as permno, b.gvkey
from msf as a, crsp.ccmxpf_lnkhist as b
where a.permno=b.permno and b.linkprim in (‘P’, ‘C’) and
b.LINKTYPE in (‘LU’, ‘LC’) and
a.date>= b.LINKDT and (a.date <= b.LINKENDDT or missing(b.LINKENDDT))
order by gvkey, date;
quit;

Can you, please help me to identify the problem.

Reply
- Kai Chen says:
  
  March 19, 2019 at 12:36 pm
  
  I noticed minor errors in your code and corrected yours as follows:
  
  proc sql; create table comp2 as select a.*, b.gvkey from crsp.msf as a, crsp.ccmxpf_lnkhist as b where a.permno=b.lpermno and b.linkprim in ('P', 'C') and b.LINKTYPE in ('LU', 'LC') and a.date>= b.LINKDT and (a.date <= b.LINKENDDT or missing(b.LINKENDDT)) order by gvkey, date; quit;
  
  I didn't find any duplicates within "permno gvkey date" or "permno date". Where is your "msf" dataset coming from? Do duplicate check in your "msf".
  
  Reply
  - Alberto says:
    
    March 19, 2019 at 1:19 pm
    
    Thank you very much for your help!
    
    The msf comes from WRDS, crsp monthly. Before merging I check if there duplicates within “permno date” and there are no such cases. The linking table also does not contain any duplicates within “lpermno gvkey linkdt linkenddt”. However, once I run the code above what I get is there duplicates within “permno gvkey date” and “permno date”. I msf data from (1990-2016) with 2444284 observations. After merge I got 2415933.
    
    Reply
    - Alberto says:
      
      March 19, 2019 at 1:57 pm
      
      Just one addition, if I merge crsp + hist_file through “rangejoin” in stata it does not create duplicates..
      
      use “D:\Research\Hist_link.dta”
      
      rangejoin date linkdt linkenddt using “D:\Research\msf.dta”, by(permno)
      
      Reply
      - Kai Chen says:
        
        March 19, 2019 at 2:18 pm
        
        I tried the code on my end and couldn’t find any duplicate in an even larger sample (about 3.6 million obs). So I am not sure why you got duplicates. If you want to figure this out, pick up those duplicate pairs, reverse-engineer into both source datasets and usually you can see why. You need to read more technical details about “rangejoin” in its HELP file to see how it handles duplicates. For me, if it doesn’t generate duplicates in your case, I would cast doubt on its completeness to return matches—no duplicates by “rangejoin” in your case is just not right.
Alberto says:

March 19, 2019 at 2:52 pm

Thank you for you helpfull comments!

I think I figured out the problem. For example, for permno == 10057, the hist_link is as follows:

gvkey linkdt linkenddt
001098 01-Jan-68 02-Jul-96
007701 31-Jan-67 29-Dec-67
007701 01-Jan-68 03-Jan-67

when I run the code above for the whole sample, this firm results in duplicates. So, I tried to check what is the problem and here is what I find (Why does it merge to gvkey 007701 ?):

permno date year fyrc gvkey
10057 31jan1996 1996 9 1098
10057 29feb1996 1996 9 1098
10057 29mar1996 1996 9 1098
10057 30apr1996 1996 9 1098
10057 31may1996 1996 9 1098
10057 28jun1996 1996 9 1098
10057 31jul1996 1996 9 1098
10057 31jan1996 1996 12 7701
10057 31jan1996 1996 12 7701
10057 29feb1996 1996 12 7701
10057 29feb1996 1996 12 7701
10057 29mar1996 1996 12 7701
10057 29mar1996 1996 12 7701
10057 30apr1996 1996 12 7701
10057 30apr1996 1996 12 7701
10057 31may1996 1996 12 7701
10057 31may1996 1996 12 7701
10057 28jun1996 1996 12 7701
10057 28jun1996 1996 12 7701
10057 31jul1996 1996 12 7701
10057 31jul1996 1996 12 7701

Reply
- Kai Chen says:
  
  March 19, 2019 at 5:56 pm
  
  Show me all records in your ccmxpf_lnkhist dataset with lpermno=10057
  
  Reply
  - Alberto says:
    
    March 20, 2019 at 7:12 am
    
    Thank you for the reply. These are the only records that I have:
    
    gvkey linkdt linkenddt
    001098 01-Jan-68 02-Jul-96
    007701 31-Jan-67 29-Dec-67
    007701 01-Jan-68 03-Jan-67
    
    Is it possible if I send you my hist_link file just to see that we have the same files? (I think the problem might be in the files itself)
    
    Reply
- AN says:
  
  December 24, 2019 at 6:22 pm
  
  Hi Alberto,
  This is AN,I am a first year PhD student in Finance, recently I am also trying to translate SAS code into Stata(1992-2019)0, but I met several problems with detail. Two most important steps are linking compustat to crsp and adding crsp monthly return. Can you tell me your stata codes of these two steps? Thank you very much!
  
  Reply
Alberto says:

March 21, 2019 at 7:05 am

Dear Professor,

Thank you for the blog! I figured out the mistake that I had. Now I dont have any duplicates when I use your code. For those who are interested, exactly the same results can be achieved in stata using “rangejoin” command.

Reply
Arijit says:

April 26, 2019 at 8:16 am

Professor – Very informative post!
A simple question – If Compustat has wider coverage and includes both fundamentals and security prices, then why not use Compustat for all analysis and avoid merging CRSP/Compustat? What advantage dies CRSP have over Compustat for security prices, apart from the very early history in CRSP?

Reply
Dung says:

April 7, 2020 at 5:08 am

Dear Professor,

If I cannot get access to this linking database, how can I merge CRSP and Compustat.
I am looking forward to hearing from you soon.

Thank you

Reply
Kinga says:

July 17, 2020 at 6:25 pm

Dear Professor,

I would like to merge CRSP daily stock data with CRSP Compustat merged securities daily (CCM). I have downloaded the CCM daily table that contains such variables as for instance GVKEY, LPERMNO and LPERMCO.
Since I need to merge the database for the certain sample of companies (around 5000), would it be possible to rename in CRSP database (daily stock data) PERMNO for LPERMNO and PERMCO for LPERMCO and merge it (many to many) based on the LPERMNO LPERMCO and datadate with CRSP/COMPUSTAT merged daily securities in STATA?

Thank you in advance for your response

Reply
- Kai Chen says:
  
  July 17, 2020 at 9:27 pm
  
  Stata may be able to do this by using a user-written module “rangejoin” (https://ideas.repec.org/c/boc/bocode/s458162.html).
  
  Reply
joey says:

August 31, 2020 at 4:53 am

Dear Dr Chen

I find your website to be extremely useful for my PhD studies.

I am sure you are aware that there is a CRSP/Compustat Merged Database=Security Monthly on WRDS. If I recall correctly, this product is relatively new. From the output, there is a unique match between Compustat identifier and each CRSP monthly observation. This looks extremely useful because all I need to do is to use Stata and one-to-one match my CRSP with the file to get the Gvkey.
Do you see any issue with my approach?

Reply
- Kai Chen says:
  
  August 31, 2020 at 9:56 am
  
  Your approach is perfectly fine. In fact, when you send a web query in CRSP/Compustat Merged, similar codes are running on server and then results are returned.
  
  Reply
  - joey says:
    
    September 4, 2020 at 4:59 am
    
    Hi Dr Chen
    For your information, I found some days are missing at random in CRSP/Compustat Merged Database=Security Daily
    This will not be a problem for someone that is doing a quarterly-level study, but this is annoying for someone who is dealing with daily-level data.
    
    Reply
Alivia says:

September 22, 2020 at 5:25 am

This file is really useful!
Here is my question: when we work on some event studies, for example, the earnings announcement quarterly events, do we usually focus on security level or company level? For most of the literature I think the authors seldom mentioned this problem. I checked several companies with multiple securities, the prices and shares outstanding are different. Do we suppose to treat these securities of a common firm at the same quarter as distinct events?

Reply
- Kai Chen says:
  
  September 22, 2020 at 10:13 am
  
  Thanks for your comment. I believe most studies use the primary issue. There is a flag for the primary issue in CRSP.
  
  Reply
Kathryn says:

December 10, 2020 at 5:57 pm

Dear prof. Chen,

Based on my understanding, your code in step two connects gvkey in compustat and Permno using link table. So is it right that next we need to connect the result with crsp?(since the result is a combination of compustat data and permno)

Reply
Sam says:

January 8, 2021 at 10:42 pm

Hi Professor,

I was just wondering about your thoughts on doing this in R. I am trying to merge them now with dplyr (using filter, distinct, inner join) but am having some trouble really because I think I am over simplifying it.

Reply

Look into CRSP/Compustat link table

45 Responses to Look into CRSP/Compustat link table

Leave a Reply Cancel reply

Categories

Archives

Site Admin