The SEC makes all EDGAR filings publicly available. We can download all 10-Ks, 10-Qs, and 8-Ks filed since 1993. However, the SEC does not make this a matter of a few mouse clicks (I guess to reduce server load and prevent possible abuse). To download EDGAR filings, we first have to download the EDGAR index files, which give the full path of each 10-K, 10-Q, 8-K, etc. We cannot download any filing without its full path. See technical details here.
I downloaded all EDGAR index files and converted them into Stata datasets. You can download them here: Stata format (1993–2000); Stata format (2001–2005); Stata format (2006–2010); Stata format (2011–2015); Stata format (2016–2019/03/16).
If you want to know how I did this, please read my other blog post here.
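The files are large, and Stata reads a whole dataset into memory, so Python users with limited RAM may prefer to stream a file in chunks and keep only the forms they need. This is a minimal sketch with pandas, assuming the downloads are standard `.dta` files readable by `pandas.read_stata`. The column name `form`, the function name `load_forms`, and the default chunk size are hypothetical choices; check the actual variable names in the dataset (e.g. with `describe` in Stata) and adjust.

```python
import pandas as pd


def load_forms(path: str, forms=("10-K",), form_col: str = "form",
               chunksize: int = 500_000) -> pd.DataFrame:
    """Read a large .dta file chunk by chunk, keeping only the wanted form types.

    form_col is a hypothetical column name; replace it with the real one.
    """
    kept = []
    # With chunksize set, read_stata returns a StataReader that yields DataFrames.
    with pd.read_stata(path, chunksize=chunksize) as reader:
        for chunk in reader:
            # Keep only matching rows, so at most one full chunk is in memory.
            kept.append(chunk[chunk[form_col].isin(forms)])
    if not kept:
        return pd.DataFrame()  # empty file: nothing to concatenate
    return pd.concat(kept, ignore_index=True)
```

For example, `load_forms("index_2011_2015.dta", forms=("10-K", "10-Q"))` (a hypothetical file name) returns only the 10-K and 10-Q rows for that date range. A smaller `chunksize` lowers peak memory at the cost of more read calls.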
Hello Kai!
I have one question: is the dataset (“SAS dataset and now the file size is about 400M (unzipped file size is 19G)”) still available somewhere?
By the way, I love your site, it is perfect. Finally, someone who also cares about Stata.
Best Regards from Germany
Christian
Ph.D. Student
Broken link repaired. Let me know if it works. Kai
Perfect!!!!!!
Thank you!!!!!!!
Dataset not recognized by Stata. Can you please confirm that it still works? Thanks
It still works. The file size is 3.74G. Stata needs to read all data into memory. So, if your computer memory is too small, there may be a problem. If that’s the case, please visit http://www.kaikaichen.com/?p=59 where I provide the data piece by piece for several date ranges.
Thanks Kai. This is really helpful. I truly appreciate your sharing.
Hi Kai,
The link is broken. Could you please repair it?
Thanks a lot
Fixed.
Cheers!
Hi, Kai, I have used the Python code to download 10-Q files from the SEC website. However, I find that some of the files are out of order, and the reason is that the SEC website uses HTML instead of TXT format for the 10-K files. Have you noticed this? If you have time, maybe you could e-mail me so we can discuss it.
Thank you for providing so many useful documents!
Hi Kai,
I have visited your website a few times and been really amazed by your work and sharing spirit!!
As I am cleaning up the SEC Edgar server log data, which captures website visitors’ IP address, timestamp of the request, etc., I wonder if there is any program to condense the data. Loughran & McDonald have cleaned the data but didn’t share the code. https://sraf.nd.edu/data/edgar-server-log/
Thanks a ton in advance!
–Amy
Thank you so much for this post, incredible stuff!
The link to the blog post explaining how you obtain the URLs is broken, any chance you could fix it?
Thank you again for this amazing site.
-Jon.
Updated and thanks for the reminder.
How do I turn these files into the actual datasets?
I tried to download piece by piece, but the link is broken!
They are in Stata format.