Canadian company compiling database of hate speech that may help others spot and stop it

Toronto firm Hatebase has built a 3,600-word hate speech lexicon by scouring the web for potentially hateful messages. NGOs and social media companies can use the database to augment their own moderation mechanisms.

Toronto firm Hatebase analyzes data to help spot hateful messages that could lead to violence

Timothy Quinn is the co-founder of Hatebase, which built an online repository of hate speech terminology. Its software scours the internet searching for terms that sometimes might not immediately be recognizable as hate speech. (Evan Mitsui/CBC)

This story is part of Exposing Hate, an ongoing series examining the nature of hate in Canada: how it manifests, spreads and thrives and how Canadian institutions, law enforcement and individuals are dealing with it. 

To curb hate speech — and ultimately, the violence it can spur — Timothy Quinn and his team have spent years compiling the most vile words found on the internet.

His Toronto firm, Hatebase, relies on software that digs through the web several times an hour and spots potentially hateful words, which are then flagged to NGOs and social media companies interested in monitoring or curbing hate on their platforms.

Hatebase's multilingual hate speech lexicon has more than 3,600 terms so far and continues to grow.

The company defines hate speech as "any term which broadly categorizes a specific group of people based on malignant, qualitative and/or subjective attributes — particularly if those attributes pertain to ethnicity, nationality, religion, sexuality, disability or class."

But having software sift through complicated streams of text to decide what falls under that category raises questions about whether computers are equipped to spot some of the more nuanced aspects of online communication.

"It's a horrible job for a human being to do," Quinn said. "You need some degree of automation to handle the worst of the worst."

Launched in 2013 as a partner of the Sentinel Project — a genocide-prevention group — Hatebase was initially meant as a way to track early signs of mass atrocities. It would analyze potentially dangerous online chatter in conflict zones in hopes of flagging it early enough to prevent it from escalating into violence.

A memorial for the victims of the Toronto van attack in April 2018, which left 10 people dead and 16 wounded. The suspected perpetrator is believed to have posted hateful messages online in advance of the attack. (Patrick Morrell/CBC)

Early signs of violence

Online messages may have served as precursors to more recent, high-profile killings. Suspects in the Toronto van attack, the El Paso Walmart shooting and the massacre at a New Zealand mosque, among others, are said to have spread spiteful content online in the lead-up to their rampage.

Hatebase's automated social media monitoring engine, known as Hatebrain, is not designed to single out users, but it can spot a noticeable spike in online hate speech that can sometimes precede targeted violence, Quinn said.

"We're not looking for the one active shooter," he said. "We're looking for raw trends around language being used to discriminate against groups of people online."
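As a rough illustration of what watching for "raw trends" rather than individuals could involve, the sketch below flags days on which the number of matched terms jumps well above the recent average. It is not Hatebase's method, which the company has not described in this detail. The function name `find_spikes`, the seven-day window, the threshold and the sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count sits more than `threshold`
    standard deviations above the mean of the preceding `window` days.

    Hypothetical helper: the window and threshold are illustrative values.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any increase at all counts as a spike.
            if daily_counts[i] > mu:
                spikes.append(i)
            continue
        if (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Made-up daily totals of flagged terms for one region.
counts = [10, 12, 11, 9, 10, 13, 11, 12, 60, 11]
print(find_spikes(counts))
```

The input here is an aggregate count per day, with no user identifiers, which mirrors the distinction Quinn draws between tracking language trends and tracking people.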

The firm's database includes words or phrases in 97 languages spotted across 184 countries. In Canada, gay people and women represent the most-targeted groups, according to a country-specific page not yet made public but made available to CBC News.

In total, the 3,600 terms were spotted more than a million times across the web, with some used just once or twice and others several thousand times.

How it's used

Hatebase licenses its software to tech companies, including the Chinese-owned video sharing app TikTok and other social media firms. Quinn said his company works with well-known Silicon Valley firms but declined to name them, citing non-disclosure agreements.

Hatebase only provides the data. It's up to its clients to decide how to use it — for instance by blocking users who use hateful words that appear in the lexicon, deleting their messages or flagging content to human moderators.

The Canadian Civil Liberties Association (CCLA) said that while it's not familiar with the specific software, it would have cause for concern if the data were used as the basis for excluding some points of view from online discussion.

CCLA's Cara Zwibel said Hatebase's definition of hate speech might be too restrictive.

"[Words] that most people in ordinary conversation would think is hate speech, is not hate speech under the law," she said.

Tony McAleer, a former skinhead recruiter, says it's a bad idea to remove hateful messages without offering an alternative message. (Craig Chivers/CBC)

More than words

The context around questionable content — not just the words themselves — must be analyzed before determining whether it should be taken down, Zwibel said. "I am worried about using machines to do this kind of work."

Humans rate the entries in Hatebase's lexicon — from "mildly offensive" (such as "bimbo") to "extremely offensive" (such as the N-word).

The software also uses several factors to analyze how words are used in a sentence. One is a search for so-called pilot fish: words or symbols often attached to targeted slurs (named for the small aquatic creatures that live alongside sharks). Quinn said pilot fish could include words such as "asshole" or the cartoon-turned-hate symbol Pepe the Frog.
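The combination of human ratings and pilot fish can be pictured with a small sketch. This is an assumption-laden illustration, not Hatebase's code: the lexicon entries (`term_a`, `term_b`), the pilot-fish markers (`marker_x`, `marker_y`), the 1-to-3 rating scale and the function `score_message` are all hypothetical placeholders.

```python
import re

# Hypothetical lexicon: placeholder terms mapped to a human-assigned
# rating (1 = mildly offensive, 3 = extremely offensive).
LEXICON = {"term_a": 1, "term_b": 3}

# Hypothetical "pilot fish": words that often accompany targeted slurs.
PILOT_FISH = {"marker_x", "marker_y"}


def score_message(text, window=2):
    """Return (term, rating, has_pilot_fish) for each lexicon term found.

    `has_pilot_fish` is True when a pilot-fish word appears within
    `window` tokens on either side of the matched term.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        has_pilot = any(t in PILOT_FISH for t in nearby)
        hits.append((tok, LEXICON[tok], has_pilot))
    return hits


print(score_message("you marker_x term_b go away"))
print(score_message("that term_a is a word"))
```

A client could treat a high-rated term with a pilot fish nearby as a stronger signal than a low-rated term on its own, though how to act on the result is, as the article notes, left to the client.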

Hatebase provides its services free to non-profit groups. Its website lists the UN's human rights agency and the U.S.-based Anti-Defamation League as partners. The company also says more than 275 universities and colleges, including Harvard and Oxford, use Hatebase data for research.

In Ottawa, the United for All coalition — a local group recently formed to counter hate and violence — is considering working with Hatebase to identify neighbourhoods where residents may be vulnerable to radicalization.

"It's not about targeting or fingering people who are engaging in hate or dangerous speech; it's about knowing where it's happening," said Julie McKercher, an Ottawa Police co-ordinator for the MERIT program, which is part of the coalition. 

She said geolocation data obtained by Hatebase could point authorities and community groups in the right direction. 

'You're always playing catch-up'

Another challenge emerges when trying to track hate speech: subtle changes to words made to circumvent digital filters. Tony McAleer, a former skinhead recruiter living in B.C., compares it to the arcade game Whac-A-Mole.

"The groups themselves will change the language they're using, so you're always playing catch-up," he said.

Hatebase, for instance, lists the word "ghey" as "an intentional misspelling of 'gay' meant to avoid censorship and mock homosexual behaviour." A recent search of public tweets found the spelling used frequently.
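One common way filters try to keep up with altered spellings is to reduce each word to a canonical form before looking it up. The sketch below shows that general idea only; it is not drawn from Hatebase, and the substitution table, the rule for collapsing repeated letters and the function `canonical` are assumptions. The single entry in the variant table is the example cited above.

```python
import re

# Assumed character substitutions often used to dodge keyword filters.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "@": "a", "$": "s"})

# Known intentional misspellings mapped to the word they stand in for.
# The one entry is the example from the article; a real table would
# need constant updating as language shifts.
VARIANTS = {"ghey": "gay"}


def canonical(token):
    """Normalize a token so altered spellings match a lexicon entry."""
    t = token.lower().translate(LEET)
    # Collapse runs of three or more identical characters into one.
    t = re.sub(r"(.)\1{2,}", r"\1", t)
    return VARIANTS.get(t, t)


print(canonical("Gheeeey"))
print(canonical("h3llo"))
```

Any fixed table like this only covers spellings someone has already noticed, which is the catch-up problem McAleer describes.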

McAleer, who recently published his memoir, The Cure for Hate, said hateful words shouldn't just be suppressed without proposing an alternative message.

"When you censor something, it becomes more popular than it ever was."

Quinn at Hatebase said the company's mandate "is in no way to limit free speech." He agrees that counter-messaging and understanding the root of hate are a better strategy.

"We're really in the business of making data available, so organizations can understand the scale of the problem."

ABOUT THE AUTHOR

Thomas Daigle

Senior Reporter

Thomas is a CBC News reporter based in Toronto. In recent years, he has covered some of the biggest stories in the world, from the 2015 Paris attacks to the Tokyo Olympics and the funeral of Queen Elizabeth II. He's reported from the Lac-Mégantic rail disaster, the Freedom Convoy protest in Ottawa and the Pope's visit to Canada aimed at reconciliation with Indigenous people. Thomas can be reached at thomas.daigle@cbc.ca.

With files from CBC's Melanie Glanz