A global team of scientists has discovered how to make more accurate predictions of genetic mutations that cause disease in humans after applying AI techniques to an extensive primate DNA database.
The project combined the genetic information of about 800 primates belonging to 233 species of apes, monkeys and lemurs. An AI algorithm trained on the genomic database was then used to analyze the DNA of 454,000 human participants in the UK Biobank project, with the results showing “greatly improved genetic risk prediction,” the researchers said.
“We have shown that the more we learn about genetic variation in non-human primates, the better we can make predictions about which mutations are likely to cause disease in humans,” said Baylor College of Medicine’s Jeffrey Rogers, one of the consortium leaders.
The consortium’s work will advance understanding of human genetics and support health research, particularly for groups not well covered by previous medical studies, while improving guidance for conservationists seeking to protect dwindling primate populations. The results were published Thursday in the journal Science.
The academic researchers teamed up with Illumina, the US company that makes DNA sequencing equipment, to identify 4.3 million common genetic variants found in the genomes of 233 primate species. To predict their health effects, they trained an AI algorithm called PrimateAI-3D with data on these mutations and the three-dimensional structures of the proteins they produce.
“You can train a generative language model like ChatGPT on existing text from Wikipedia and elsewhere,” said Kyle Farh, Illumina vice president for AI. “We used an analogous deep learning architecture, but our data comes from millions of years of natural selection.”
The scientists then applied PrimateAI-3D to identify potentially harmful human mutations, using DNA and medical records from 454,000 volunteers who donated samples to UK Biobank.
The results were particularly successful in finding rare genetic variants that carry a high risk of common diseases. Farh said PrimateAI-3D was overall 12 percent more accurate than any previous method for assessing genetic risks of developing health problems such as cardiovascular disease and type 2 diabetes.
An advantage of the new technique, he added, was that it was equally applicable to all of humanity — overcoming biases toward populations of white European ancestry inherent in existing genetic risk assessments, which are primarily based on data from these groups.
“It is a step towards the implementation of genetics-based medicine for diverse non-European populations,” Farh said.
The genome research also has important implications for the primates themselves.
For Rogers, “the biggest surprise was that the level of genetic variation in primate species is typically two, three or even four times higher than in humans. This gives us perspective on human genetic variation, which is very low, even among humans in Africa, by the standards of other primates.”
Ancestral humans are believed to have lost genetic diversity as populations declined to very low numbers tens or hundreds of thousands of years ago.
Primate genetic diversity, found even in very rare and endangered species, could also boost animal conservation, Rogers added: “If we can save the habitats, there’s enough genetic variation in the surviving populations.”
Jean Boubli, professor of tropical ecology and conservation at the University of Salford and a leading member of the consortium, called its work a “groundbreaking change in the study of many aspects of primate evolution.” “Many of these species are endangered and the results here could help conservation efforts,” he said.