UW Medicine genome scientists were among the key contributors to the publication of the first complete, uninterrupted sequence of a human genome announced this week by the National Human Genome Research Institute.
The lab of Evan Eichler, a professor of genome sciences at the University of Washington in Seattle, was a key contributor to the lead article, “The Complete Sequence of the Human Genome,” published in Science on April 1. The work came out of a large consortium, Telomere-to-Telomere, or T2T, which aimed to obtain complete sequences of the 23 human chromosomes, from start to finish.
Eichler’s team and collaborators from other institutions also produced a companion paper offering the first comprehensive view of large, highly identical repeat regions, called segmental duplications, and their variation in human genomes.
These areas of the human genome are essential for understanding human evolution and genetic diversity, as well as resistance or susceptibility to many diseases. Of the 20,000 genes in the human genome, about 950 come from segmental duplications.
However, segmental duplications were among the last regions of the human genome assembly to be fully sequenced, due to their complexity.
The desire to resolve these regions was part of the push to advance sequencing technologies, such as the ability to read long stretches of DNA. These technologies, along with many laboratory tools, computational biology approaches, and other essential research resources, were not available when the human genome was first sequenced more than two decades ago.
The Eichler lab-led team reported their findings and analysis in a companion scientific paper published this week, titled “Segmental Duplications and Their Variation in a Complete Human Genome.” The lead author of this article is Mitchell R. Vollger, a postdoctoral fellow in genome sciences at the UW School of Medicine. He has applied his skills in computing, data visualization, and mathematics to analyze novel genomic repeats and further our understanding of human variation in segmental duplications. Together with Phil Dishuck, a graduate student in the Eichler lab, he showed that completing the human genome added about 180 “new” protein-coding genes, almost all of which mapped to segmental duplications.
“As a child I saw the magazine covers for a complete human genome in 2001. I remember thinking that was the coolest project and how disappointed I was that I could never do something so cool. I thought about that a lot during this project, that I had to contribute to the human genome sequence, and it excites me a lot, that I had the opportunity to do that.”
Mitchell R. Vollger, Postdoctoral Fellow in Genome Sciences, UW School of Medicine
Several intriguing discoveries have emerged from recent achievements in sequencing these regions.
In addition to its implications for medical research, the completed assembly also helps answer the question: what is contained in our genomes that makes us distinctly human? Some of the genes that fell within gaps in the original genome are now believed to be critically important in giving humans bigger brains than other apes.
Eichler’s lab also generated long-read assemblies of non-human primate genomes and compared them to the new uninterrupted assembly of the human genome. They systematically reconstructed the evolution of some biomedically relevant genes, as well as some human-specific duplicate genes.
These human-specific segmental duplications are reservoirs of new genes that cause more neurons to form in developing brains and increase synapse connectivity in the frontal cortex – the anatomical part of the brain where some of the thinking, reasoning, logic, and language functions that seem typically human occur.
In TBC1D3, a gene family linked to the expansion of the human prefrontal cortex, analysis by graduate student Xavi Guitart in the Eichler lab revealed that recurrent, independent expansions occurred at different times in primate evolution. The most recent dates from about 2 to 2.6 million years ago, around the time the genus Homo emerged. Surprisingly, the human TBC1D3 gene family showed remarkable large-scale structural variation in a subset of samples.
“Different humans carry radically different complements and arrangements of the TBC1D3 family of genes,” the researchers explained in their paper. This was unexpected for a gene thought to be so important for brain function. The scientists also discovered diversity in the complex structure of the LPA gene, in which variability in part of this lipoprotein gene underlies the most important genetic risk factor for cardiovascular disease from abnormal blood lipid levels.
The researchers also examined SMN (motor neuron gene), whose mutations are linked to certain neuromuscular diseases. Having better sequence resolution of the spinal muscular atrophy region, one of the most difficult regions to complete on chromosome 5, could be a practical advantage both in determining disease risk and in treatment, as the duplicate gene SMN2 is a target for one of the most effective gene therapies.
Based on these and other findings, the scientists noted that the new reference genome “reveals unprecedented levels of human genetic variation in genes important to neurodevelopment and human disease.”
Besides being a source of new knowledge about human biology, the recently completed human genome is also likely to answer some fundamental questions about cell biology. For example, the assembly will allow scientists to better understand the differences among the centromeres present in each of the human chromosomes. Problems in the centromeres can lead to difficulties during cell division.
Studying centromere sequences could help uncover the origin of medical conditions in which cell division and the distribution of genetic material between cells go wrong. These include cancer as well as abnormalities that affect prenatal development, such as Down syndrome or Robertsonian translocations.
Glennis A. Logsdon, a postdoctoral fellow in genome sciences at the UW School of Medicine, has made several discoveries related to centromere sequencing.
“We had to develop new ways to target these regions,” she explained. “We took advantage of new technology that was on the horizon, such as ultra-long-read sequencing, to traverse these regions. We also worked hard to tweak the genome sequence to make sure it was very accurate.”
Eichler commented on the training and experience that early career human genome researchers have received during T2T projects.
“I consider it a privilege to help train the next generation of scientists,” he said. “It’s so fun to watch them start as students, contribute to a big project, and then take it to the next level.”
Eichler was part of the original Human Genome Project in 2001. He was fascinated by regions of the genome that were complex in terms of being highly repetitive, but also encoded genes.
When the conclusion of the human genome sequencing project was declared, many of these regions had not been finalized.
Eichler added that since then he has had an intense desire to finish them.
“I’ve always come back to this point that to understand genetic variation comprehensively, we need a comprehensive reference. Otherwise, we’re missing pieces of the puzzle. Having 95% of the puzzle solved is enough for some. But I guess for me getting that last 5% was so important because I believe a lot of what we don’t understand about disease, or what we don’t understand about evolution, is represented disproportionately in that 5% of the genome that we didn’t sequence to begin with.”
This is not the end, he said. “Even if people were like, ‘Well, we’re done with finishing the genome.’ We finished a genome. There will be hundreds, probably thousands of genomes over the next few years. I think our view of how humans differ from each other is going to be transformed, and how important more complex genetic variation is to not only make us human, but to make us different.”