Researchers in Carnegie Mellon University’s Computational Biology Department (CBD) have developed techniques to identify segments of the genome that help explain the evolutionary origins of certain species traits. Their findings, published in the journal Science, contribute to the Zoonomia Project, a comprehensive effort to sequence the complete genomes of 240 mammals, with the aim of shedding light on fundamental aspects of genes and traits relevant to protecting human health and conserving biodiversity. To make sense of the vast amount of data the project generated, the researchers used artificial intelligence (AI) and machine learning (ML).
Only about one percent of the human genome consists of coding DNA, which provides instructions for producing proteins, yet these sections have a significant impact on how cells function and, consequently, on evolution. Over time, slight variations arise in the instructions provided by coding DNA, and these variations serve as driving forces behind evolutionary change. The remaining noncoding DNA includes regions known as enhancers, which help determine when and where specific genes are active.
The Carnegie Mellon University team devised a machine learning approach called the Tissue-Aware Conservation Inference Toolkit (TACIT) to gain insights into how these enhancer regions function. Traditional evolutionary models might attribute a change in brain size to mutations in a group of genes, but changes in enhancers can achieve the same outcome simply by switching genes on or off.
Most research on mammalian evolution focuses on the regions of the genome that have remained relatively unchanged over millions of years. These conserved regions, particularly genes, provide valuable insights into the fundamental elements of mammalian DNA, but they reveal less about the characteristics that make individual species unique.
The challenge for Assistant Professor Andreas Pfenning and his team lies in identifying enhancer regions that may change in sequence over time but retain their functionality. For instance, a well-studied Islet enhancer exhibits similar gene regulatory patterns across humans, mice, zebrafish, and sponges, despite over 700 million years of evolution. Traditional methods of examining individual nucleotides struggle to track and identify these enhancer regions accurately.
TACIT addresses this challenge by accurately predicting the activity of enhancers in specific cell types or tissues. It allows scientists to identify these critical enhancer regions in newly sequenced genomes without the need for additional laboratory experiments. This capability holds potential applications in conservation biology, as TACIT can predict enhancer function in endangered or threatened species where controlled laboratory experiments are unfeasible.
Irene Kaplow, a lead author on the paper and a postdoctoral associate in the CBD, explains, “TACIT provides an unprecedented opportunity to predict the function of parts of the genome outside of genes in species for which we cannot get primary tissue samples, such as the bottlenose dolphin and the critically endangered black rhinoceros. As ML methods and methods for identifying enhancers from specific cell types improve, I anticipate that we will be able to broaden the functions of TACIT to provide new kinds of insights into mammalian evolution.”
Using TACIT, the research team made predictions about the function of genomic sequences across the 240 mammals and applied the tool to identify genome regions that have evolved in mammals with larger brains. They discovered that these regions tended to be near genes implicated in human brain-size disorders. They also identified an enhancer associated with social behavior across mammals. This enhancer is specific to a particular subtype of neuron called the parvalbumin-positive inhibitory interneuron.
Pfenning sees these findings as a starting point. “We think this is just the tip of the iceberg,” he said. “We found interesting relationships by applying TACIT to a small number of tissues and small number of traits, but there is still a lot more to discover.”
Source: Carnegie Mellon University