NYT Lawsuit Takes Aim at Microsoft's Custom Supercomputer Built for OpenAI Training

The New York Times has updated its copyright lawsuit against OpenAI and Microsoft, this time training its sights on a custom supercomputer that Microsoft allegedly built specifically to scrape web content for AI training.

The case, first filed in December 2023, made the Times the first major publisher to sue a generative AI company. The original complaint centered on OpenAI, accusing the company of using copyrighted Times articles to train ChatGPT without permission and reproducing protected content in the model’s output — undermining the value of a paid subscription.

The amended complaint shifts the focus squarely to Microsoft. According to the filing, Microsoft designed an “unusually complex” machine that does more than supply raw computing power. The system was built to crawl the web — including the Times — and feed the harvested data into OpenAI’s training pipeline. The Times alleges that Microsoft’s supercomputer didn’t just run the numbers; it played an active role in selecting which copyrighted works to ingest.

The lawsuit claims the system trained on “nearly the entire internet” while giving Times content extra weight in the selection process.

The updated filing also includes evidence that some ChatGPT users asked the model to bypass the Times’ paywall by repeatedly requesting “the next paragraph” of articles. In some cases, the model returned multiple paragraphs of near-verbatim text from behind the paywall.

The case sits at the intersection of two big questions that courts are still wrestling with: whether training AI on copyrighted material constitutes fair use, and how much responsibility a company like Microsoft bears when it builds infrastructure specifically designed to enable that training. Neither question has a clear answer yet, but the Times is pushing hard to establish one.

The lawsuit is ongoing.