Both FactSet and CRSP offer identifier files that contain primary identifiers at the entity level and security level (note: an entity may issue multiple securities). These files provide a mapping between their primary identifiers and all other historical identifiers such as CUSIP and ticker. By utilizing these identifiers, we can construct a link table.
Relevant CRSP Variables
The CRSP STOCKNAMES file provides a mapping between CRSP permanent identifiers and all historical CUSIPs, company names, and exchange tickers, along with their effective date ranges.
PERMCO
is the unique identifier for CRSP entities.PERMCO
remains constant over time, even if the company changes its name, CUSIP, or exchange ticker.PERMNO
is the unique identifier for CRSP securities.PERMNO
also does not change over time too.COMNAM
represents the company name in CRSP at a specific point in time.NCUSIP
is the 8-digit historical CUSIP for CRSP securities. A change in CUSIP can occur due to various reasons, including non-fundamental events like splits and company name changes. CRSP also has a header variable namedCUSIP
that reflects only the most recent CUSIP for the entire time series.NAMEDT
andNAMEENDDT
represent the first and last effective dates of each name structure, which includes CUSIP, company name, exchange code, exchange ticker, share class, and SIC code. EachPERMNO
has at least one name structure in the STOCKNAMES file.
Relevant FactSet Variables
The FactSet SYM_V1_SYM_CUSIP_HIST file provides a mapping between FactSet permanent identifiers and all historical CUSIPs, along with their effective date ranges.
FSYM_ID
is the unique identifier for FactSet securities (similar toPERMNO
in CRSP).FSYM_ID
remains unchanged over time. In another identifier file, ENT_V1_ENT_SCR_SEC_ENTITY_HIST, FactSet provides a mapping between FactSet securities and their issuing entities, uniquely identified byFACTSET_ENTITY_ID
, which, likePERMCO
in CRSP, does not change over time.CUSIP
is the 9-digit historical CUSIP for FactSet securities. FactSet also has a header variable namedMOST_RECENT
that reflects only the most recent CUSIP for the entire time series.START_DATE
andEND_DATE
represent the first and last effective dates of each name structure.
Linking FactSet FSYM_ID and CRSP PERMNO using CUSIP
Both FactSet FSYM_ID
and CRSP PERMNO
are used to uniquely identify securities. We can link the two using the associated historical CUSIPs, which serve as a common identifier in both FactSet and CRSP.
The following two-part code generates a link table that provides a mapping between FactSet FSYM_ID
and CRSP PERMNO
, along with their effective date ranges (START_DATE
and END_DATE
). The first part uses SAS, and the second part uses Stata. An SAS expert should be able to complete the task entirely in SAS. I use Stata for the second part simply because a handy command (tsspell
) is available in Stata.
First-part SAS code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
proc sort data=sym_v1_sym_cusip_hist out=fs (keep=fsym_id cusip start_date end_date); where not(missing(cusip)); by fsym_id cusip start_date; run; proc sort data=stocknames out=crsp1 (keep=permno ncusip comnam namedt nameenddt); where not missing(ncusip); by permno ncusip namedt; run; data crsp2; set crsp1; do i=0 to (nameenddt-namedt); date=namedt+i; output; end; format namedt nameenddt date date9.; drop i; run; proc sql; create table linktable_temp as select distinct a.permno, a.date, b.fsym_id from crsp2 a, fs b where a.ncusip=substr(b.cusip,1,8) and b.start_date<=a.date<=coalesce(b.end_date,today()); quit; proc export data= linktable_temp outfile= "C:/linktable_temp" dbms= dta replace; run; |
Second-part Stata code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
use linktable_temp, clear egen pid=group(permno fsym_id) tsset pid date tsspell pid bysort pid _spell: egen start_date=min(date) bysort pid _spell: egen end_date=max(date) format start_date end_date %td duplicates drop pid _spell, force keep permno fsym_id start_date end_date saveold linktable, replace v(12) |
Please set up the LIBNAME
and directories in SAS and Stata to properly reference the relevant datasets. In Stata, use the command ssc install tsspell, replace
to install the third-party command.
Please note that in the link table, the END_DATE
column represents the most recent update date of the CRSP STOCKNAMES file (rather than a missing value) if the link is still active. Therefore, it is important to use the most recent CRSP STOCKNAMES file to avoid any unexpected error.
I find that the link table only covers a small portion of the FactSet universe, as CRSP only keeps historical CUSIP data for securities traded in major exchanges, while FactSet extracts data from a much large security universe. Specifically, only 36,560 FSYM_ID
s can be linked to a PERMNO
, whereas there are 310,234 FSYM_ID
s in FactSet with FACTSET_ENTITY_ID
s.
I would like to express my gratitude to two WRDS articles: Merging CRSP and COMPUSTAT Data and Linking IBES and CRSP Data for their valuable insights.
Super useful code! Thanks a lot. Do you know how to get which Factset entity id is which institutional investor?
Bree,
I believe the Factset “entity id” identifies a company or issuer, not an investor.
“Factset Entity ID – FactSet has entity IDs for companies. The Entity ID is eight characters. Factset also provides and ID for securities, the Perm SEC ID.”
source: https://asklib.library.hbs.edu/faq/358168
I suggest confirming this with Factset.
—–
I’d also ask Factset and other data providers if there are identifiers that identify investors.
I’m guessing that identifiers that identify investors might be used in data sources that have data related to:
– the “holders” or “owners” of a security
ex. https://go.factset.com/marketplace/catalog/product/factset-ownership
– the holdings of a fund or firm (unsure of this data is distinct and different from “the holders or owners of a security”
– and maybe also in data sources that identify “insider” trades and trades of “large” holders or owners of a security.
ex. https://go.factset.com/marketplace/catalog/product/2iq-global-insider-transaction-data
Professor,
Thank you for generously sharing your knowledge, expertise, and insights.
Another approach to explore is using Compustat gvkey (an issuer or company level identifier) and Compustat gvkey combined with Compustat iid, “gvkey_iid” (an issue or security level identifier) as a “common connector”.
I know there is a CRSP-Compustat linking table that can be used to link a Compustat permco to a gvkey, and a Compustat permno to a Compustat gvkey_iid . https://www.crsp.org/products/documentation/link-actions
I will explore if there is a way to link Facset identifiers (ex. FACTSET_ENTITY_ID, FSYM_ID) to Compustat identifiers (gvkey and gvkey_iid).
I will report what I find here.
Professor,
Thank you for generously sharing your knowledge, expertise, and insights.
Another approach to explore for linking Factset and CRSP is using Compustat gvkey (an issuer or company level identifier) and Compustat gvkey combined with Compustat iid, “gvkey_iid” (an issue or security level identifier) as a “common connector”.
I know there is a CRSP-Compustat linking table that can be used to link a Compustat permco to a gvkey, and a Compustat permno to a Compustat gvkey_iid. https://www.crsp.org/products/documentation/link-actions
I will explore if there is a way to link Facset identifiers (FACTSET_ENTITY_ID, FSYM_ID) to Compustat identifiers (gvkey and gvkey_iid combined with iid).
I will report what I find here.
Hi,
Have you found a way to link Facset identifiers (FACTSET_ENTITY_ID, FSYM_ID) to Compustat identifiers (gvkey and gvkey_iid combined with iid)? Thank you.
Amy
After you follow this post to get PERMNO, you can use Compustat-CRSP link table to get GVKEY.
Thank you so much Kai.
I’m trying to keep SIC code, but failed.
“proc sort data=prac.stocknames out=prac.crsp1 (keep=permno ncusip comnam namedt nameenddt SIC);”
Could you help out? Thank you so much.
Amy