Both FactSet and CRSP have identifier files that contain primary identifiers at the entity level and the security level (note: an entity may issue multiple securities) and provide a map between their primary identifiers and all other historical identifiers (e.g., CUSIP, Ticker). We can use these identifiers to construct a link table.
Relevant CRSP Variables
CRSP STOCKNAMES file provides a map between CRSP permanent identifiers and all historical CUSIPs, Company Names, and Exchange Tickers, along with their effective date ranges.
- PERMCO is the unique identifier for CRSP entities. PERMCO does not change over time even if the company changes its name, CUSIP and exchange ticker.
- PERMNO is the unique identifier for CRSP securities. PERMNO does not change over time too.
- COMNAM is the company name in CRSP at a specific point in time.
- NCUSIP is the 8-digit historical CUSIP for CRSP securities. A change in CUSIP, for example, could be triggered by any change in the security, including non-fundamental events such as splits and company name changes. CRSP also has a header variable named “CUSIP” that only reflects the most recent CUSIP for the entire time series.
- NAMEDT and NAMEENDDT are the first and last effective dates of each set of name structure, which consists of CUSIP, Company Name, Exchange Code, Exchange Ticker, Share Class, and SIC Code. Each PERMNO has at least one name structure in STOCKNAMES file.
Relevant FactSet Variables
FactSet SYM_V1_SYM_CUSIP_HIST file provides a map between FactSet permanent identifiers and all historical CUSIPs, along with their effective date ranges.
- FSYM_ID is the unique identifier for FactSet securities (similar to PERMNO in CRSP). FSYM_ID does not change over time. In another identifier file, ENT_V1_ENT_SCR_SEC_ENTITY_HIST, FactSet provides a map between FactSet securities and their issuing entities (uniquely identified by FACTSET_ENTITY_ID, which does not change over time like PERMCO in CRSP).
- CUSIP is the 9-digit historical CUSIP for FactSet securities. FactSet also has a header variable named “MOST_RECENT” that only reflects the most recent CUSIP for the entire time series.
- START_DATE and END_DATE are the first and last effective dates of each set of name structure.
CUSIP Method to Link FactSet FSYM_ID and CRSP PERMNO
Both FactSet FSYM_ID and CRSP PERMNO are used to uniquely identify securities. We can link the two using the associated historical CUSIPs, the common identifier in both FactSet and CRSP.
The following two-part code generates a link table that provides a map between FactSet FSYM_ID and CRSP PERMNO, along with their effective date ranges (START_DATE AND END_DATE). The first part uses SAS, and the second uses Stata. An SAS expert should be able to complete the task in SAS entirely. I use Stata for the second part simply because a handy command (tsspell) is available in Stata.
First-part SAS code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
proc sort data=sym_v1_sym_cusip_hist out=fs (keep=fsym_id cusip start_date end_date); where not(missing(cusip)); by fsym_id cusip start_date; run; proc sort data=stocknames out=crsp1 (keep=permno ncusip comnam namedt nameenddt); where not missing(ncusip); by permno ncusip namedt; run; data crsp2; set crsp1; do i=0 to (nameenddt-namedt); date=namedt+i; output; end; format namedt nameenddt date date9.; drop i; run; proc sql; create table linktable_temp as select distinct a.permno, a.date, b.fsym_id from crsp2 a, fs b where a.ncusip=substr(b.cusip,1,8) and b.start_date<=a.date<=coalesce(b.end_date,today()); quit; proc export data= linktable_temp outfile= "C:/linktable_temp" dbms= dta replace; run; |
Second-part Stata code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
use linktable_temp, clear egen pid=group(permno fsym_id) tsset pid date tsspell pid bysort pid _spell: egen start_date=min(date) bysort pid _spell: egen end_date=max(date) format start_date end_date %td duplicates drop pid _spell, force keep permno fsym_id start_date end_date saveold linktable, replace v(12) |
Please set up LIBNAME and directories in SAS and Stata to reference to relevant datasets properly. Please use ssc install tsspell, replace
in Stata to install the third-party command.
Please note END_DATE in the link table is the most recent update date of CRSP STOCKNAMES file (rather than a missing value) if the link is still active. Therefore, please use the most recent CRSP STOCKNAMES file to avoid any unexpected error.
Perhaps because CRSP only keeps historical CUSIP data when the securities are traded in major exchanges, whereas FactSet extracts data from a much large security universe, I find that the link table only covers a small portion of the FactSet universe – only 36,560 FSYM_IDs can be linked to a PERMNO whereas there are 310,234 FSYM_IDs in FactSet with FACTSET_ENTITY_IDs.
Thanks to two WRDS articles: Merging CRSP and Compustat Data and Linking IBES and CRSP Data.