Use Python to download lawsuit data from Stanford Law School’s Securities Class Action Clearinghouse

[Update on 2019-07-07] I am grateful to Shiyu Chen, my research assistant, who did a very good job on not only web scraping the top-level table, but also extracting from the case summary page additional information (link to case summary page, case status, update date, case summary, case period start date, and case period end date). I post her Python program below with her permission.

[Original Post] Several papers borrow the litigation risk model supplied in Equation (3) of Kim and Skinner (2012, JAE, Measuring securities litigation risk). The logit model uses total asset, sales growth, stock return, stock return skewness, stock return standard deviation, and turnover to estimate a predicted value of litigation risk.  The measure of litigation risk is used by Billings and Cedergen (2015, JAE), Kerr and Ozel (2015, TAR), Bourveau, Lou, and Wang (2018, JAR), and Baginski, Campbell, Hinson, and Koo (2018, TAR), and so on (thanks to Chunmei Zhu for the literature review).

The model uses lawsuit data obtained from Stanford Law School’s Securities Class Action Clearinghouse. However, the website does not deliver the data in a downloadable format. I write the Python program for extracting the data from the website (a technique called webscraping).

I use Python 3.x and please install all required modules. I provide the data (as of 2019-07-07) in a CSV file for easy download (sca.csv).


This entry was posted in Python. Bookmark the permalink.

5 Responses to Use Python to download lawsuit data from Stanford Law School’s Securities Class Action Clearinghouse

  1. Griffin Geng says:

    Awesome! Thanks for sharing!

  2. Tigran says:

    I was about to go through building a scraper for this from scratch… you saved me so much time! This is great!

  3. Tianhua says:

    Hi Dr. Chen,
    Thanks so much for this coding. I just got stuck in using this codes as the Securities Class Action Clearinghouse requires login to get the full data. I tried “mechanize” pckage to login but it doesn’t work. Do you have any ideas about how to get the access to the website?

  4. Pengyuan li says:

    added error handling in get_class_period method to avoid the issue if the case’s status is currently Active.

    def get_class_period(soup):
    section = soup.find(“section”, id=”fic”)
    text = section.find_all(“div”, class_=”span4″)
    start_date = text[4].get_text()
    end_date = text[5].get_text()
    start_date = ‘null’
    end_date = ‘null’
    return start_date, end_date

  5. Yuchen says:

    how do you parse settlement value?

Leave a Reply

Your email address will not be published. Required fields are marked *