I helped a friend download data from the DTCC’s Swap Data Repository. I am not familiar with the data myself and treated this mainly as a programming exercise.
This article gives an introduction to the origin of the data: http://www.dtcc.com/news/2013/january/03/swap-data-repository-real-time
The Python script will:
- download the daily Credit zip files; and
- extract the CSV from each zip file and combine the contents into a single large CSV (about 220 MB), which can then be imported into Stata or another statistical package.
As of April 22, 2016, there were around one million historical records. The data appears to be available from April 6, 2013 onward, with sporadic gaps after that. The script prints the bad dates for which the daily file is not available.
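If you only want to check whether a particular day's file exists before running the full download, a quick request against the same URL pattern used by the script below is enough. This is just a minimal sketch; the date 2016_04_22 is only an example.

import io
import zipfile

import requests

# Availability check for a single day's file (example date only).
datestr = '2016_04_22'
url = ('https://kgc0418-tdw-data2-0.s3.amazonaws.com/slices/CUMULATIVE_CREDITS_'
       + datestr + '.zip')

response = requests.get(url)
# Missing days may return a small error payload rather than a 404,
# so testing whether the body is a valid zip archive is the safer check.
if zipfile.is_zipfile(io.BytesIO(response.content)):
    print(datestr, 'is available')
else:
    print(datestr, 'is missing')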
import io
import zipfile
from datetime import date

import pandas as pd
import requests

start = date(2013, 1, 1)
end = date.today()

# Build (download URL, local file name) pairs, one per calendar day.
urls = []
for i in range(start.toordinal(), end.toordinal()):
    datestr = date.fromordinal(i).isoformat().replace('-', '_')
    url = ('https://kgc0418-tdw-data2-0.s3.amazonaws.com/slices/CUMULATIVE_CREDITS_'
           + datestr + '.zip',
           'CUMULATIVE_CREDITS_' + datestr + '.zip')
    urls.append(url)

badurls = []
frames = []
for url in urls:
    request = requests.get(url[0])
    # Missing days return an error payload rather than a zip archive,
    # so validate the response before trying to extract it.
    if not zipfile.is_zipfile(io.BytesIO(request.content)):
        print(url[1], 'is non-existent!')
        badurls.append(url)
    else:
        # Keep a local copy of the zip file.
        with open(url[1], 'wb') as f:
            f.write(request.content)
        print(url[1], 'downloaded!')
        # Read the single CSV inside the archive and tag each row with its
        # date, taken from the 'YYYY_MM_DD' part of the file name.
        z = zipfile.ZipFile(io.BytesIO(request.content))
        df_ = pd.read_csv(z.open(z.namelist()[0]))
        df_['DATE'] = url[1][19:29]
        frames.append(df_)

# Combine all daily tables into one CSV and report the missing days.
df = pd.concat(frames, ignore_index=True)
df.to_csv('dtcc.csv')
print(badurls)
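The combined dtcc.csv is large enough that reading it in chunks may be more comfortable than loading it all at once. The sketch below is only illustrative: the column name DATE comes from the script above, while the chunk size and the per-day row count are arbitrary choices.

import pandas as pd

# Read the combined file in chunks to keep memory use modest.
counts = {}
for chunk in pd.read_csv('dtcc.csv', chunksize=100000, low_memory=False):
    # Example summary: number of records per trade date.
    for day, n in chunk['DATE'].value_counts().items():
        counts[day] = counts.get(day, 0) + n

for day in sorted(counts):
    print(day, counts[day])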
Thank you for posting this. I’m currently struggling with downloading the daily CSV files, and the size of some of them makes it cumbersome to actually sift through the data. I’m hopeful this, and some of your other posts, may help lead to a solution on my end. Appreciate it.