Use Python to download lawsuit data from Stanford Law School’s Securities Class Action Clearinghouse

[Update on 2022-01-08] The website now requires login, so I added a function that logs in and retrieves the protected content. The code may terminate partway through (e.g., every 80 pages or so) because of a timeout or connection error, so you may need to run it several times (and change Line 108 accordingly). Please replace the login email and password with your own (Lines 104 and 105). I provide the data (6,122 cases as of 2022-01-08) in three CSV files for easy download (Securities Class Action Filings 2022-01-08 p1 to p84, Securities Class Action Filings 2022-01-08 p85 to p161, and Securities Class Action Filings 2022-01-08 p162 to p205).
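
The run-several-times workflow described above can be sketched as a loop that retries a failing page a few times and, if it still fails, reports the page to restart from. This is only a minimal sketch, not the program from this post: `scrape_pages`, `fake_fetch`, and all parameters are hypothetical names, and the fetcher is a stand-in that simulates dropped connections instead of contacting the website.

```python
import time


def scrape_pages(fetch_page, first_page, last_page, max_retries=3, wait_seconds=0.0):
    """Fetch pages first_page..last_page, stopping early if one page keeps failing.

    Returns (rows, next_page). next_page equals last_page + 1 when every page
    succeeded; otherwise it is the page to restart from on the next run.
    """
    rows = []
    for page in range(first_page, last_page + 1):
        for attempt in range(1, max_retries + 1):
            try:
                rows.extend(fetch_page(page))
                break  # this page succeeded; move on to the next one
            except (ConnectionError, TimeoutError):
                # Assumption: the fetcher raises these built-in exceptions.
                # Adjust to whatever your HTTP or browser library raises.
                if attempt == max_retries:
                    return rows, page  # give up and report where to resume
                time.sleep(wait_seconds * attempt)  # simple linear backoff
    return rows, last_page + 1


# Stand-in fetcher: page 3 fails once, page 5 fails every time.
failures = {3: 1, 5: 99}


def fake_fetch(page):
    if failures.get(page, 0) > 0:
        failures[page] -= 1
        raise ConnectionError(f"simulated drop on page {page}")
    return [f"case-{page}-{i}" for i in range(2)]


rows, resume_from = scrape_pages(fake_fetch, first_page=1, last_page=6)
print(len(rows), "rows collected; restart from page", resume_from)
```

Saving the collected rows to a separate CSV file after each run, and passing the returned page number as `first_page` on the next run, would produce page-range files like the three listed above.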

[Update on 2019-07-07] I am grateful to Shiyu Chen, my research assistant, who did a very good job of not only scraping the top-level table but also extracting additional information from each case summary page (link to case summary page, case status, update date, case summary, class period start date, and class period end date). I post her Python program below with her permission.

[Original Post] Several papers borrow the litigation risk model given in Equation (3) of Kim and Skinner (2012, JAE, Measuring securities litigation risk). The logit model uses total assets, sales growth, stock return, stock return skewness, stock return standard deviation, and turnover to estimate a predicted value of litigation risk. This measure of litigation risk is used by Billings and Cedergren (2015, JAE), Kerr and Ozel (2015, TAR), Bourveau, Lou, and Wang (2018, JAR), and Baginski, Campbell, Hinson, and Koo (2018, TAR), among others (thanks to Chunmei Zhu for the literature review).
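
For readers new to logit models: the predicted value is a probability obtained by passing a linear combination of the inputs through the logistic function. A minimal sketch follows, using the variables listed above. The function name, the dictionary keys, and every coefficient value are illustrative placeholders, not the estimates reported in Kim and Skinner (2012), and the exact variable definitions should be taken from the paper.

```python
import math


def litigation_risk(features, coefs):
    """Logit prediction: 1 / (1 + exp(-(b0 + sum of b_k * x_k)))."""
    z = coefs["intercept"] + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# PLACEHOLDER coefficients for illustration only; substitute the published
# estimates (or your own re-estimated ones) before using this for research.
coefs = {
    "intercept": -1.0,
    "total_assets": 0.0,
    "sales_growth": 0.5,
    "stock_return": 0.0,
    "return_skewness": 0.0,
    "return_std": 0.0,
    "turnover": 0.0,
}

# Made-up inputs for one hypothetical firm-year.
firm_year = {
    "total_assets": 8.2,
    "sales_growth": 2.0,
    "stock_return": -0.3,
    "return_skewness": 0.1,
    "return_std": 0.04,
    "turnover": 1.5,
}

risk = litigation_risk(firm_year, coefs)
print(f"predicted litigation risk: {risk:.3f}")
```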

The model uses lawsuit data obtained from Stanford Law School’s Securities Class Action Clearinghouse. However, the website does not offer the data in a downloadable format, so I wrote a Python program to extract the data from the website (a technique called web scraping).
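
To illustrate the idea of web scraping, here is a minimal sketch that turns an HTML table into rows of text. It is not the program from this post: it uses only the standard library's `html.parser` rather than a scraping package, the class name `TableRows` is hypothetical, and the sample HTML, column names, and case names are invented for the demonstration and do not reflect the actual layout of the Clearinghouse pages.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample page; the real site's markup will differ.
SAMPLE_HTML = """
<table>
  <tr><th>Filing Name</th><th>Filing Date</th><th>Ticker</th></tr>
  <tr><td><a href="/case/1">Example Corp.</a></td><td>01/05/2022</td><td>EXMP</td></tr>
  <tr><td><a href="/case/2">Sample Inc.</a></td><td>01/07/2022</td><td>SMPL</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *cases = parser.rows
print(header)
for case in cases:
    print(case)
```

In a real scraper the HTML string would come from a page request (after login), and the rows would be written to a CSV file instead of printed.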

I use Python 3.x; please install all required modules before running the program. I provide the data (as of 2019-07-07) in a CSV file for easy download (sca.csv).

 


10 Responses to Use Python to download lawsuit data from Stanford Law School’s Securities Class Action Clearinghouse

  1. Griffin Geng says:

    Awesome! Thanks for sharing!

  2. Tigran says:

    I was about to go through building a scraper for this from scratch… you saved me so much time! This is great!

  3. Tianhua says:

    Hi Dr. Chen,
    Thanks so much for this code. I got stuck using it because the Securities Class Action Clearinghouse now requires login to get the full data. I tried the “mechanize” package to log in, but it doesn’t work. Do you have any ideas about how to get access to the website?

  4. Pengyuan li says:

    Added error handling in the get_class_period method to avoid the error raised when the case’s status is currently Active.

    def get_class_period(soup):
        section = soup.find("section", id="fic")
        try:
            text = section.find_all("div", class_="span4")
            start_date = text[4].get_text()
            end_date = text[5].get_text()
        except (AttributeError, IndexError):
            # The section is missing, or it has fewer than six span4 divs
            start_date = 'null'
            end_date = 'null'
        return start_date, end_date

    • Md Enayet Hossain says:

      Thanks for the correction. But this only solves the error issue. It does not return the class period for any lawsuits. Any idea how I can get the class period and access the lawsuit files? The HTML does not even show the contents beyond the case summary.

  5. Yuchen says:

    How do you parse the settlement value?

  6. Mengxi Chen says:

    Hi Dr. Chen and Shiyu! Thank you so much for sharing this! I appreciate it!

  7. Elisha Yu says:

    Thank you so much, Dr. Chen!

    Just a small note: you need to set the Chrome default to maximize the window, or add this before line 18:
    driver.maximize_window()
