The statistics are alarming: the pandemic has had disastrous repercussions on children’s safety online. Young people are more connected and accessible than ever, making them easy targets for online sexual predators. Detecting the “grooming” of children online has thus become essential for protecting them.

From 2014 to 2020, some 4.3 million cases of sexual exploitation were reported to Cybertip.ca, Canada's national tipline for reporting online sexual exploitation of children. Technological advances and the democratization of the Internet have transformed society, for better and for worse. Almost every young person in Canada has a mobile phone and uses social platforms, and has done so since childhood. Pedophiles now have access to hundreds of potential victims in just a few clicks.

Furthermore, the significant jump in screen time and the isolation resulting from pandemic stay-at-home orders have had catastrophic consequences. Cybertip.ca noted an 88% increase in reports of online exploitation in Canada since the start of the pandemic. These numbers are partly explained by the fact that parents have very little control over their children's online social circles: although parental control mechanisms exist, they are often intrusive and do not respect children's privacy. As a result, young people tend to favour unmoderated communications platforms. Finding a balance between safety and privacy is therefore key, and this is where my research project comes in. The objective is to warn parents when their child is a victim of grooming, without giving them access to the child's online conversations.

My solution involves two steps. The first is to develop an artificial neural network able to detect grooming messages as early as possible. The second is to deploy that model using federated learning, a decentralized, collaborative approach in which the model learns locally on social network users' phones and sends only model updates, never the messages themselves, to a central server.

It is important to note that only 2.2% of sexual offences reported to police are committed by women, in person rather than online, so very little data on the subject exists. The literature, data and models my study builds on therefore concern male predators only, as there is not enough information to infer that female predators exhibit the same online behaviours. The rest of this article will therefore refer to male predators. This is not, however, to minimize the actions of female predators, which are also harmful and should be the subject of separate studies.

I am training a model to detect messages from child sexual predators because they rely on an atypical means of communication. In "Entrapping the Innocent: Toward a Theory of Child Sexual Predators' Luring Communication," researcher Loreen N. Olson proposes a model of predators' communication and describes grooming as a cycle of multiple phases, each with specific characteristics. She describes the "desensitization" of potential victims to sexual contact, in which predators use sexual terms out of context (e.g., cum instead of come), and "reframing," which presents sexual contact in a playful manner (messing around, playing, learning).

The neural network I plan to train will use linguistic cues to distinguish "normal" messages from grooming messages, after learning from a tagged (labelled) database. During the learning phase, the model will look for connections and patterns that define the various categories. What it learns during this phase will then be used to categorize new messages, never seen by the model, in order to validate it.
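To make this concrete, here is a minimal sketch of that kind of supervised text classification, assuming a small feed-forward network over TF-IDF features. The tiny inline dataset (toy phrases echoing the reframing and secrecy cues described above) is purely illustrative; a real study would use a large labelled corpus of conversations:

```python
# Minimal sketch: a small neural network learns from tagged messages,
# then is validated on messages it has never seen. The inline dataset
# is a placeholder, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

messages = [
    "it's just messing around, it can be our little secret",
    "this is how you learn, everyone plays this game",
    "don't tell your parents, it's just for fun",
    "you're so mature for your age, this is normal",
    "did you finish the math homework for tomorrow",
    "we're meeting at soccer practice after school",
    "happy birthday! hope you have a great day",
    "can you send me the notes from class today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = grooming-style, 0 = normal

# Turn raw text into numerical features the network can learn from.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(messages)

# Hold out messages the model never sees during training, for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# A small feed-forward neural network classifier.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```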

For my solution to work, the model will require access to conversations in real time. This type of data, however, is extremely sensitive. Moreover, one of the main limitations of my study is the lack of tagged data available: the existing data date from 2012, which is less than ideal because online language evolves rapidly. To address these issues, I have decided to implement the model using federated learning. The principle is simple: the model is transmitted directly to users' phones, where it learns locally, instead of the data being transmitted to a central server. Users will be able to tag their data themselves, and the algorithm, having access to their conversations on the device, will detect messages from sexual predators. Implementing such a mechanism on social platforms would thus make young people's online environment somewhat safer without compromising the confidentiality of their data.
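To illustrate the mechanism, here is a minimal, self-contained sketch of federated averaging (FedAvg), a standard aggregation scheme for this kind of deployment. The simple linear model and synthetic data stand in for the grooming detector and users' tagged messages, and all names here are illustrative; a real deployment would use a federated learning framework (such as Flower or TensorFlow Federated) together with secure aggregation:

```python
# Minimal sketch of federated averaging (FedAvg): each "phone" trains
# locally on its own data, and only model weights ever leave the device.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # ground truth the clients collectively learn

def make_client_data(n=50):
    """Synthetic local dataset standing in for one user's tagged messages."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

clients = [make_client_data() for _ in range(5)]  # five simulated phones

def local_update(w, X, y, lr=0.05, epochs=5):
    """Local training on one phone: plain gradient descent on local data."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w  # only the updated weights are sent back, never the data

# The central server holds the global model and never sees raw conversations.
w = np.zeros(2)
for _ in range(20):
    updates = [local_update(w, X, y) for X, y in clients]
    w = np.mean(updates, axis=0)  # average the clients' weight updates

print("learned weights:", w)  # converges toward true_w
```

Because only weights travel to the server, the model can also keep learning from current, user-tagged language on the device, which helps with the outdated-data problem mentioned above.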

Although there are several limitations to my approach, notably the availability of tagged data, the sharp rise in the incidence of child sexual exploitation in recent years demands action. The Internet has created an entirely new arena for predators, and the rules have changed. As journalists Caroline Touzin and Gabrielle Duchaine noted in their exclusive investigation in La Presse in 2020: "Sexual predators are no longer lurking in the neighbourhood park: they're behind your child's mobile phone screen."

This article was produced by Khaoula Chehbouni, Master’s student – Business Intelligence (HEC Montréal), with the guidance of Marie-Paule Primeau, science communication advisor, as part of our “My research project in 800 words” initiative.