I helped a friend download data from the DTCC's Swap Data Repository. I am not familiar with the data myself, so I treated this as a programming exercise.
This DTCC article introduces the origin of the data: http://www.dtcc.com/news/2013/january/03/swap-data-repository-real-time
The Python script will:
- download the daily Credit zip files; and
- extract the CSV from each zip file and combine the contents into a single large CSV (~220 MB), which can then be imported into Stata or another statistical package.
As of April 22, 2016, there were around one million historical records. The data appears to be available from April 6, 2013, with sporadic gaps thereafter. The script prints the bad dates for which no daily file is available.
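The daily slice files embed the date in their names with underscores rather than ISO hyphens, which is why the script rewrites the ISO date string. A minimal illustration, using the first available date mentioned above:

```python
from datetime import date

# DTCC slice names use YYYY_MM_DD rather than ISO's YYYY-MM-DD.
datestr = date(2013, 4, 6).isoformat().replace('-', '_')
print(datestr)                                    # 2013_04_06
print('CUMULATIVE_CREDITS_' + datestr + '.zip')   # CUMULATIVE_CREDITS_2013_04_06.zip
```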
```python
import io
import zipfile
from datetime import date

import pandas as pd
import requests

start = date(2013, 1, 1)
end = date.today()

# Build (url, filename) pairs for every day in the range.
urls = []
for i in range(start.toordinal(), end.toordinal()):
    datestr = date.fromordinal(i).isoformat().replace('-', '_')
    url = ('https://kgc0418-tdw-data2-0.s3.amazonaws.com/slices/'
           'CUMULATIVE_CREDITS_' + datestr + '.zip',
           'CUMULATIVE_CREDITS_' + datestr + '.zip')
    urls.append(url)

badurls = []
frames = []
for url in urls:
    request = requests.get(url[0])
    # Days with no data return a non-zip payload; record them and move on.
    if not zipfile.is_zipfile(io.BytesIO(request.content)):
        print(url[1], 'is non-existent!')
        badurls.append(url)
    else:
        with open(url[1], 'wb') as f:
            f.write(request.content)
        print(url[1], 'downloaded!')
        z = zipfile.ZipFile(io.BytesIO(request.content))
        df_ = pd.read_csv(z.open(z.namelist()[0]))
        df_['DATE'] = url[1][19:29]  # the YYYY_MM_DD part of the file name
        frames.append(df_)

# Concatenate once at the end (DataFrame.append was removed in pandas 2.0,
# and repeated appends inside the loop are quadratic anyway).
df = pd.concat(frames, ignore_index=True)
df.to_csv('dtcc.csv')
print(badurls)
```
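A 220 MB CSV may be too large to load comfortably in one go. One option (a sketch, not part of the original script) is pandas' `chunksize` argument, which returns an iterator of smaller DataFrames; the `demo.csv` file below is a hypothetical stand-in for `dtcc.csv`:

```python
import pandas as pd

# Stand-in for the large dtcc.csv: a small throwaway file.
pd.DataFrame({'PRICE': range(10)}).to_csv('demo.csv', index=False)

# Read it back 4 rows at a time instead of all at once.
chunks = pd.read_csv('demo.csv', chunksize=4)
total = sum(len(chunk) for chunk in chunks)
print(total)  # 10
```

The same pattern lets you filter or aggregate each chunk before combining, so the full file never has to sit in memory.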