B.C. author leads 'David against Goliath' lawsuits alleging big tech used writers' works to train AI

Best-selling author J.B. MacKinnon is the representative plaintiff in four proposed class-action lawsuits that allege copyrighted books written by Canadian authors were illegally used to train large language models.

Claims allege artificial intelligence companies are illegally mining copyrighted books of Canadian authors

J.B. MacKinnon is an author, journalist and adjunct professor of journalism at the University of British Columbia. (Submitted by J.B. MacKinnon)

A best-selling Vancouver author has launched a proposed class-action lawsuit against Nvidia, claiming the multitrillion-dollar tech company illegally used his and other Canadian writers' works to train artificial intelligence large language models (LLMs).

J.B. MacKinnon is named as the representative plaintiff in the claim, which says his books, The 100-Mile Diet and The Once and Future World, were part of a 196,640-book dataset that Nvidia used without paying a licensing fee or securing consent to acquire or use the works. 

"This isn't a situation where some copyrighted material appears as a small fraction of the larger process. The models were entirely built on the mining of copyrighted work," MacKinnon told CBC.

"The most disturbing aspect of it is that those large language models and AIs can then compete with human writers, and are likely to displace human writers."

Besides Nvidia, MacKinnon is the representative plaintiff in three similar class actions filed in B.C. Supreme Court that individually name Meta/Facebook, Anthropic and Databricks Inc. as defendants. All four class actions will require court certification to move forward.

The class, or group, of plaintiffs described in each case consists of all holders of Canadian copyright in works the companies used to build their LLMs.

Large language models are AI software designed to comprehend and generate language that mimics a knowledgeable human.

World's largest company

According to the claim, Nvidia trained its LLMs on books it obtained in a copied dataset "...because it believed doing so would improve its model and give it an advantage over its competition."

"NVIDIA monetized the NVIDIA LLMs by using them to assist in the growth and development [of the] company's position in the AI industry, which has in turn led to NVIDIA's growth into the world's largest company by market capitalization," it says.

An Nvidia Corporation sign is shown in Santa Clara, Calif., on May 31, 2023. The tech giant is accused of mining the books of Canadian authors to train its AI. (Jeff Chiu/AP)

Nvidia has a market capitalization of $4.28 trillion.

"Collectively as Canadian writers, we're certainly the David against the Goliath in this case," said MacKinnon. "These are the most powerful and richest corporations in the world that we're up against. I don't think we have any reason to think that the fight will be an easy one."

CBC contacted Nvidia, but a company spokesperson declined to comment.

The claims also allege the four companies took steps to conceal copyright infringement by training the LLMs to respond in a "misleading way" when asked if copyrighted material was used in the LLMs' creation.

Additionally, the claims say the companies removed copyright management information before the books were fed into the LLMs "...so that the LLM did not itself learn that it was built off copyrighted material."

"If that proves to be true in court, I hope that the courts will consider that cause not only for writers to be compensated, but for the companies to be punished for bad behaviour," said MacKinnon.

The Once and Future World by J.B. MacKinnon and The 100-Mile Diet, co-authored by Alisa Smith and MacKinnon, are mentioned in court documents as examples of copyrighted books used by Nvidia to train its artificial intelligence large language models. (CBC)

A judge in San Francisco hearing a similar case brought by authors against Anthropic sided last month with the AI company, ruling that training LLMs on purchased copyrighted books qualifies as "extremely transformative" under the legal definition of fair use.

However, the judge did say that Anthropic's use of millions of books it allegedly pirated was a separate issue to be considered. 

A lawyer representing MacKinnon said AI companies' use of authors' original works to build highly profitable products is an issue that's gaining attention worldwide.

"The goal of the companies is not to transform the world, it's to make money," said Reidar Mogerman. 

"I think you can respect both the values of the copyright system and the ability of these companies to create these models.... It's just that you can't throw out one to create the other, especially when the thing you create is going to be a competitive threat to the work the authors did."

ABOUT THE AUTHOR

Karin Larsen

@CBCLarsen

Karin Larsen is a former Olympian and award-winning sports broadcaster who covers news and sports for CBC Vancouver.