Shed light on the “dark matter” of the human genome

Recent advances in sequencing technologies have revealed the existence of thousands of RNA molecules that are not translated into proteins. Martin Smith and his team at the Sainte-Justine University Hospital Centre are working to better understand how these RNAs act. Their work could benefit people living with complex diseases by pointing the way to new therapeutic targets.

When the Human Genome Project, an international research effort that led to the first complete sequencing of the human genome’s DNA, was completed in 2003, it sparked a great deal of hope in the scientific community. U.S. President Bill Clinton had said that “genome science [. . .] will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.” But where does this leave us, some 20 years later? To this day, many diseases, including diabetes, hypertension and psychiatric disorders, are still poorly understood and difficult to predict and treat, while a host of others still have no cure.

DNA is often portrayed as the template for the transcription of messenger RNAs, which are then translated into proteins. By analogy, DNA can be thought of as a recipe book, messenger RNAs as copies of the recipes, and the synthesized protein as the end product of the recipe. Although proteins have long been assumed to be the primary agents of molecular function in cells and tissues, that assumption is now being challenged. A large part of DNA is transcribed into RNAs, but the majority of these RNAs do not carry the information for protein synthesis. These “non-coding” sequences are often called “dark matter” RNA.

Since the early 2000s, a true gold mine of molecules has been discovered in the non-coding (or dark matter) portion of the human genome. This DNA, which was thought to be unimportant or “junk” DNA, in fact generates thousands of molecules called long non-coding RNAs (lncRNAs). Researchers have found that some of these molecules perform tasks that are essential for maintaining the health of our cells. Like a Lego set, lncRNAs are made up of more than 200 blocks, called nucleotides, that are not translated into proteins. These blocks can form structures and interact with proteins to regulate what goes on in our cells.

My research project involves comparing the human genome to those of other mammals. I am looking for fragments of the genome’s dark matter that exist in both humans and other animals, and that could be transcribed into lncRNAs. The idea is that if a particular lncRNA has been preserved for millions of years in very different species, such as humans and horses, then it probably plays an important role and merits investigation.
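As a rough illustration of this reasoning (and not the project's actual pipeline), the short Python sketch below scores how well each position of a small, made-up alignment is preserved across three species, then reports the stretches that are almost identical in all of them. The sequences, the function names, the window size and the threshold are all assumptions chosen for the example. Real comparative genomics relies on whole-genome alignments and more careful statistics.

```python
# Toy alignment: hypothetical sequences, already aligned to the same length.
toy_alignment = {
    "human": "ACGTACGTTAGC",
    "horse": "ACGTACGATAGC",
    "mouse": "ACGTACGCTTGC",
}


def column_identity(alignment):
    """Return, for each column, the fraction of species sharing the most common base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*seqs):
        bases = [b for b in column if b != "-"]  # ignore alignment gaps
        if not bases:
            scores.append(0.0)
            continue
        most_common = max(bases.count(b) for b in set(bases))
        scores.append(most_common / len(column))
    return scores


def conserved_windows(alignment, window=5, threshold=0.9):
    """Return half-open (start, end) ranges whose mean identity reaches the threshold."""
    scores = column_identity(alignment)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window))
    return hits


if __name__ == "__main__":
    print(conserved_windows(toy_alignment))
```

In this toy case only the windows at the start of the alignment pass the threshold, because every window further along includes the eighth column, where the three species each carry a different base.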

Long non-coding RNAs frequently assume a non-linear structure, folding onto themselves like a hairpin. This structure is especially interesting to me because we believe it is how lncRNAs interact with other molecules, and thus what determines their roles within the cell. Using bioinformatics tools and artificial intelligence algorithms, I can visualize their various structures and classify these molecules into families based on the predicted structures.
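To give a feel for what predicting a folded structure means, here is a minimal Python sketch of a classic textbook approach, Nussinov-style dynamic programming, which finds the largest number of base pairs a short RNA can form when it folds back on itself. It is an illustration only, not the tools used in this project, which are not named in the article. The function names, the example sequence and the minimum loop size of three nucleotides are assumptions, and real structure prediction uses energy models rather than a simple pair count.

```python
# Bases allowed to pair: the Watson-Crick pairs plus the G-U "wobble" pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if the two bases can form a pair."""
    return (a, b) in PAIRS


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs the sequence can form.

    min_loop is the minimum number of unpaired bases required inside a hairpin loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # A short hypothetical sequence whose two ends can pair to form a hairpin.
    print(max_base_pairs("GGGAAAUCCC"))
```

For the example sequence, the three G bases at one end can pair with the three C bases at the other, leaving the middle bases as the loop of the hairpin.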

Some lncRNAs can bind to proteins to “recruit” and direct them to specific locations in the genome. To identify potential therapeutic targets, I will use biochemical methods combined with high-throughput sequencing to determine which proteins these are and where they bind. Molecules that mimic the functional region of specific lncRNAs are being developed, which is why it is important to identify these proteins. Therapies that would increase the amounts of certain lncRNAs, or parts of them, could be particularly useful for the treatment of diseases caused by lncRNA mutations, such as cancers and cardiovascular diseases.

I will focus particularly on families of structures that appear to play a critical role in neural cells, as the human brain expresses several thousand lncRNAs. I will use the new CRISPR-Cas9 technology, which allows genome editing, to disrupt certain lncRNA structures that might play a role in neural function and then examine the effect of that disruption.

The aim of my project is therefore to improve our understanding of the architecture of the human genome, and to produce a large set of new targets for drug design, molecular diagnostics and genome modification. My project will also shed light on non-coding parts of the genome whose significance is as yet unknown. With new therapeutic strategies targeting RNA continuing to emerge, unravelling the functional role of long non-coding RNAs could help change the way we think about them. Rather than being the product of junk DNA, as originally thought, lncRNAs offer a potential source of treatments for people with diseases.

This article was produced by Vanda Gaonac’h-Lovejoy (Université de Montréal), with the guidance of Marie-Paule Primeau, science communication advisor, as part of our “My research project in 800 words” initiative.
