Automobile insurance providers often use artificial intelligence to predict customers’ risk profiles. The personal information used, however, may be sensitive, and the use of AI can lead to instances of unfair discrimination. Techniques for correcting discriminatory bias in insurance data are currently being studied, with a view to delivering financial services based on more ethical and responsible AI.
More than five million Quebecers have a driver’s licence, entitling them to operate a motor vehicle on the province’s roads, which cover over 325,000 kilometres. In 2021, the Société de l’assurance automobile du Québec reported 27,888 injuries resulting from road accidents. In the eyes of insurance providers, every individual at the wheel of an automobile represents a different risk, and those providers rely on increasingly sophisticated algorithms to assess that risk. Their estimates are based on a growing mass of information, some of it considered controversial, about insured persons. Just like human intelligence, artificial intelligence can be biased, such that the calculations it makes discriminate against certain population segments. My research project aims to curb such discriminatory bias by correcting the algorithms used by automobile insurers so that their predictions are more in line with society’s values.
Many jurisdictions around the world prohibit discrimination on certain grounds in the auto insurance industry: gender in Europe, ethnic origin in Texas, religion in California and credit rating in Ontario, for example. Banning reliance on a contentious characteristic in risk estimation, however, only eliminates so-called direct discrimination based on that characteristic. An artificial intelligence may still indirectly guess the characteristic deemed problematic and continue to discriminate unfairly, unbeknownst to the analyst who created the algorithm. This insidious phenomenon is known as “indirect discrimination.”
For example, even if the AI does not have access to individuals’ gender when setting insurance premiums, it might still determine different premiums for, say, nurses, the majority of whom are women, versus automobile mechanics, the majority of whom are men. In the United States, there is also a marked correlation between ethnic origin and the place of residence of insured persons, which can cause an AI to discriminate indirectly based on ethnic origin when it “sees” where someone lives. Likewise, even if an insurance provider is prohibited from discriminating based on credit rating, the AI could infer someone’s credit rating using their age or marital status. There are many such correlations among data, and they allow an AI tool to indirectly access potentially sensitive information. The situation becomes even more worrisome when there is a large amount of data available for each individual. It then becomes more difficult for analysts who evaluate the ethical aspects of artificial intelligence to recognize instances of indirect discrimination and, more important, to find solutions to remedy the situation.
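To make the mechanism concrete, here is a minimal Python sketch on purely synthetic data (it is not the project’s data or an insurer’s model, and all numbers are invented for illustration): even when gender is withheld from the pricing model, a proxy such as occupation lets the model reproduce a gender gap baked into historical premiums.

```python
# Hypothetical illustration of indirect discrimination through a proxy.
# All data are synthetic; nothing here comes from a real insurer.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000

gender = rng.integers(0, 2, n)  # 0 = man, 1 = woman (synthetic labels)
# Occupation is correlated with gender: most "nurses" are women here.
occupation = (rng.random(n) < np.where(gender == 1, 0.8, 0.2)).astype(int)
# Historical premiums that were (unfairly) loaded on gender.
premium = 500 + 80 * gender + 30 * rng.standard_normal(n)

# The pricing model never sees gender, only the occupation proxy...
model = LinearRegression().fit(occupation.reshape(-1, 1), premium)
pred = model.predict(occupation.reshape(-1, 1))

# ...yet the gender gap resurfaces in its predictions.
print("mean predicted premium, men:  ", round(pred[gender == 0].mean(), 1))
print("mean predicted premium, women:", round(pred[gender == 1].mean(), 1))
```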
The purpose of my project is to develop statistical strategies that can detect and mitigate indirect discrimination by such algorithms. To that end, three types of strategy have been proposed. First, the data can be modified to eliminate any clues to characteristics deemed sensitive, effectively making the artificial intelligence blind to those characteristics. Second, human-defined constraints can be imposed on the artificial intelligence so that it is “penalized” for any decision considered unfair under one of those constraints. Lastly, the AI’s predictions can be adjusted after the fact to make them more equitable and thus avoid imposing an unfair burden on any segment of the population.
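As a rough illustration of the third type of strategy, the sketch below applies one very simple post-processing rule: shifting predictions so that every group’s average premium matches the overall average. This mean-alignment rule is only an assumed, illustrative choice, not the method developed in the project.

```python
# Hypothetical post-processing sketch: adjust predictions so that the
# average premium is the same across sensitive groups (synthetic numbers).
import numpy as np

def align_group_means(pred, group):
    """Shift each group's predictions so all group means equal the overall mean."""
    pred = np.asarray(pred, dtype=float)
    group = np.asarray(group)
    adjusted = pred.copy()
    overall = pred.mean()
    for g in np.unique(group):
        mask = group == g
        adjusted[mask] += overall - pred[mask].mean()
    return adjusted

# Toy example: two groups with systematically different raw premiums.
raw_premiums = np.array([620.0, 640.0, 610.0, 540.0, 530.0, 550.0])
groups = np.array([0, 0, 0, 1, 1, 1])
print(align_group_means(raw_premiums, groups))
# Both group averages now sit at the overall mean (about 581.7).
```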
My research project will emphasize the third strategy, proposing new statistical methods for adapting AI predictions. These supplementary tools will draw inspiration from causal inference, a branch of statistics that looks beyond simple correlations and focuses on cause-and-effect relationships. Artificial intelligence is an excellent detector of correlation, but correlation does not necessarily imply causation. For example, sales of ice cream on a beach may be correlated with shark attacks, but that does not mean that sales of ice cream cause shark attacks. The correlation observed is deceptive: both phenomena are merely related to the summer season and the fact that more people visit beaches at that time of year. Causal inference thus offers an indispensable method for identifying the characteristics of individuals that genuinely contribute to increased road accident risk, thereby keeping spurious correlations out of the risk estimate. For example, a causality analysis might show that age is a determining factor in road accident risk, while discounting any correlation between ethnic origin and road accidents as spurious, given the absence of a causal link.
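The ice-cream example can be reproduced in a few lines of synthetic Python. Season acts as a confounder driving both variables: the pooled correlation is strong, but it essentially disappears once the data are examined within each season. Stratifying on a confounder is only the simplest causal-style adjustment; the project’s methods go well beyond this sketch.

```python
# Spurious correlation through a confounder (season), on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
summer = rng.integers(0, 2, n)                      # confounder: 1 = summer
ice_cream_sales = 50 + 40 * summer + 5 * rng.standard_normal(n)
shark_attacks = 1 + 3 * summer + rng.standard_normal(n)

# Pooled over the year, the two series look strongly related...
print("raw correlation:",
      round(np.corrcoef(ice_cream_sales, shark_attacks)[0, 1], 2))

# ...but within each season the relationship essentially vanishes.
for s in (0, 1):
    m = summer == s
    r = np.corrcoef(ice_cream_sales[m], shark_attacks[m])[0, 1]
    print(f"correlation within season {s}:", round(r, 2))
```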
To ensure that the new statistical methods are applicable to the automobile insurance context, they will be tested on real data obtained through a research partnership with one of Canada’s leading automobile insurers. The intended outcome is an accessible, efficient, turnkey methodology usable by legal experts and specialists in the auto insurance field seeking to critique or adapt algorithms that are likely to discriminate unfairly. The auto insurance field is only a case study; the insights gained from the project could be used to improve the ethical standard of any artificial intelligence system that directly or indirectly influences members of society.
This article was produced by Claudia Picard-Deland, PhD in actuarial science (Université de Montréal), with the guidance of Marie-Paule Primeau, science communication advisor, as part of our “My research project in 800 words” initiative.