Personalized medicine helps patients get customized health care based on their unique genetic makeup for an optimal survival prognosis. When it’s a matter of life or death, physicians need to make informed treatment decisions. The goal is to optimize biological data analyses with a new bioinformatics tool being developed in a lab at the Sainte-Justine University Hospital Centre (CHUSJ) that will facilitate clinical decision-making.

On June 26, 2000, President Bill Clinton declared, “We are here to celebrate the completion of the first survey of the entire human genome. Without a doubt, this is the most important, most wondrous map ever produced by humankind.” That day, President Clinton announced the completion of a rough draft of the human genome. But in fact, there were two independent drafts of the genome. The President was accompanied by two scientists: Dr. Francis Collins, an American medical geneticist, led the Human Genome Project (HGP). Its ambitious goal was to determine the content, structure, and organization of the human genome, which could help prevent, diagnose, and treat diseases. Achieving this goal took 13 years, cost US$2.7 billion, and required international collaboration. Dr. Craig Venter, an American biotechnologist, founded and directed the private company Celera Genomics that used a different technique to produce its genome draft.

Like a recipe book, a genome contains all the information for the growth and development of an organism. It represents the complete set of genetic material stored in DNA molecules that are made of a unique genetic code. The genetic code or DNA alphabet is made up of four nucleotide bases or letters, which are chemicals linked together to make long chains of DNA. The nucleotides in DNA are adenine (A), cytosine (C), guanine (G) and thymine (T). Sequencing segments of DNA helps establish the exact order of the letters in the recipe book.

In this era of personalized medicine, doctors can create highly tailored drug treatments and cancer therapies based on an individual’s sequenced genome. Essentially, what once took years, billions of dollars, and a planet-wide effort can now be done in days and for a few hundred dollars. The development of new sequencing methods, such as massive parallel sequencing, opens up incredible possibilities in genomic medicine. My research project is in line with these efforts to adapt health care to the genetics of each patient. It implements a new strategy for instant analysis of biological data that will serve as a basis for development of diagnostic tools.

My methodology is based on a recent sequencing technique called nanopore sequencing. Nanopores, whose diameter is 100,000 times smaller than that of a human hair, are holes in a membrane separating two chambers. When an electrical current is applied, it creates a current flow in the nanopores that helps read the letters that make up DNA molecules as the genetic material is threaded through the nanopores. Imagine the thread is coloured yellow, green, blue, and red to represent unique properties. As the coloured thread passes through the needle, a sensor connected to the needle captures the different colours. The electric current in the nanopores also changes because each base has a different shape that blocks the current differently. The raw electrical signal generated by the sequencer is transmitted to a computer that can use machine learning to translate those signals to bases. Overall, this innovative technique makes sequencing drastically quicker and less expensive.

Since nanopore sequencing is still being developed, the accuracy of the translated sequences is relatively low compared to other sequencing technologies. I want to bypass the translation step and work in the signal space to make sequencing faster and more sensitive. The raw nanopore signal represents the electric current measures over time. It can be compared to the stock market time series representing the stock price over a period or an electrocardiogram (ECG) recording the electrical signal in the heart. One of my specific objectives is to implement a dynamic time warping (DTW) algorithm to aggregate the data to establish common patterns in the raw signal. DTW is a similarity measure that accommodates changes in speed and temporal deformations. It does pairwise comparisons and returns the best correspondence between two signals. I will need to adjust and optimize parameters to enhance the alignments. I will then use a clustering algorithm to group similar raw signals. To return to the coloured thread analogy, these algorithms will help group similarly coloured threads. Consequently, this new strategy for data analysis will quickly distinguish the biological material in a sample by its discriminative power. From a personalized medicine perspective, my project will assist disease-relevant feature extraction for an accurate and rapid diagnosis.

Real-time molecular diagnostics could boost recognition of appropriate clinical categories. For example, it’s difficult to differentiate a patient with viral sepsis and a patient with an acute inflammatory response unrelated to a pathogen. The first should be prescribed an antiviral while the other should be prescribed an immunosuppressant. Inappropriate treatment causes death, hence the relevance of providing instant and accurate diagnostics, which I intend to do by optimizing the signal analysis algorithms.

This article was produced by Kristina Atanasova, Master’s student in Bioinformatics (Université de Montréal), with the guidance of Marie-Paule Primeau, science communication advisor, as part of our “My research project in 800 words” initiative.