August 11, 2022

Researchers found that recombination of the COVID-19 virus, while uncommon, occurs most often in the virus’s spike protein region. Credit: Centers for Disease Control and Prevention



An analysis of millions of SARS-CoV-2 genomes shows that recombination of the virus is uncommon, but when it does occur, it is usually in the spike protein region, the region where the virus can attach to and infect host cells.

The study, led by scientists at UC Santa Cruz, was published Aug. 11 in the journal Nature. It describes new software created by the researchers to search the COVID-19 phylogenetic tree, a diagram of the virus’s evolutionary history, for instances of recombination. The software is open source, so public health officials can use it to track cases of recombination within their communities.

Recombination occurs when two genetically different forms of the virus hybridize. This study focused on detectable recombination, when the hybridization results in a sequence that is genetically novel, and not cases where two sequences come together to form a sequence identical to a pre-existing one.

“It’s very important for reconstructing the evolutionary history of the virus,” said Russell Corbett-Detig, senior author of the study and associate professor of biomolecular engineering at the Baskin School of Engineering. “If there’s recombination, it’s not one tree, it’s many trees, and being able to trace that accurately is really crucial to understanding the evolution of the virus.”

Findings on recombination

The researchers analyzed 1.6 million samples of COVID-19 and found 589 recombination events, indicating that only about 2.7% of sequenced genomes result from recombination. These sequences come from the UC Santa Cruz SARS-CoV-2 Browser, a repository for COVID-19 genomic data, which is now the largest single-species collection of genomic sequences ever assembled, currently containing nearly 12 million sequences.
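The two figures above measure different things: 589 counts recombination events, while 2.7% counts sequenced genomes. A short back-of-the-envelope calculation shows how they fit together. It is only a sketch, and it assumes the 2.7% figure covers every sequenced genome descended from one of the 589 events within the 1.6 million samples; the variable names are illustrative.

```python
# Figures quoted in the article.
total_genomes = 1_600_000      # samples analyzed
recombination_events = 589     # detected recombination events
recombinant_share = 0.027      # share of genomes resulting from recombination

# Assumption: the 2.7% applies to the same 1.6 million samples and counts
# all sequenced descendants of the detected events.
recombinant_genomes = total_genomes * recombinant_share
genomes_per_event = recombinant_genomes / recombination_events

# For contrast: the events themselves as a fraction of all samples.
events_share = recombination_events / total_genomes

print(f"Genomes with recombinant ancestry: about {recombinant_genomes:,.0f}")
print(f"Average sequenced genomes per event: about {genomes_per_event:.0f}")
print(f"Events as a share of all samples: {events_share:.4%}")
```

Under that assumption, each detected event accounts on average for several dozen sequenced genomes, which is why a few hundred events can correspond to a few percent of the data set.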

Although the results show that recombination is more common in the spike protein region, it is not yet known why. It may be due to a mechanistic bias, meaning that all coronaviruses naturally tend to recombine toward the three-prime end of the viral genome, which contains the spike protein. Alternatively, positive natural selection in COVID-19 may be favoring recombinants that occur in this region.

Although recombination does occur, there is no evidence that the resulting strains are more likely to be epidemiologically important. In fact, most of the recombinant variants are dying out, as are most of the thousands of mutated variants of COVID-19.

New software, primarily written by UC San Diego assistant professor Yatish Turakhia during his postdoctoral training in Corbett-Detig’s lab, made it computationally feasible to analyze millions of genomes. The software, called Recombination Inference using Phylogenetic PLacEmentS (RIPPLES), can efficiently search a huge phylogenetic tree of COVID-19 genomes to find cases where a new sequence appears to be a combination of two different parts of the tree. The COVID-19 phylogenetic tree, called UShER, was created by UCSC researchers and is the primary tool used by health officials around the world to track the spread of variants in their communities.
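The idea of a sequence that looks like "a combination of two different parts of the tree" can be illustrated with a toy example. The sketch below is not the RIPPLES algorithm, which works on placements within a phylogenetic tree. It is a simplified stand-in that compares one query sequence against a handful of aligned candidate sequences and asks whether splicing two of them at a single breakpoint explains the query much better than any one of them alone. All names, sequences, and the `min_improvement` threshold are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class TwoParentFit(NamedTuple):
    left_parent: str    # candidate supplying positions before the breakpoint
    right_parent: str   # candidate supplying positions from the breakpoint on
    breakpoint: int     # index where the right-hand segment starts
    mismatches: int     # differences remaining after splicing


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_two_parent_fit(query: str, candidates: dict[str, str]) -> Optional[TwoParentFit]:
    """Brute-force search over ordered candidate pairs and single breakpoints."""
    best = None
    for left_name, left in candidates.items():
        for right_name, right in candidates.items():
            if left_name == right_name:
                continue
            for bp in range(1, len(query)):
                cost = hamming(query[:bp], left[:bp]) + hamming(query[bp:], right[bp:])
                if best is None or cost < best.mismatches:
                    best = TwoParentFit(left_name, right_name, bp, cost)
    return best


def looks_recombinant(query: str, candidates: dict[str, str], min_improvement: int = 2) -> bool:
    """Flag the query if two parents explain it clearly better than one."""
    single = min(hamming(query, seq) for seq in candidates.values())
    fit = best_two_parent_fit(query, candidates)
    return fit is not None and single - fit.mismatches >= min_improvement


# Hypothetical aligned toy sequences (real genomes are about 30,000 bases).
candidates = {
    "A": "ACGTACGTACGTACGT",
    "B": "TCGAACCTAGGTTCGA",
}
# Built by hand: first half copied from A, second half copied from B.
query = candidates["A"][:8] + candidates["B"][8:]

fit = best_two_parent_fit(query, candidates)
print(fit, looks_recombinant(query, candidates))
```

The brute-force search here scales poorly and would not work on millions of genomes; it is meant only to show why a recombinant sits several mutations away from either parent alone while matching the pair almost perfectly.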

The researchers found that recombination most often shows up on the COVID-19 phylogenetic tree in the form of “long branches,” which make it look as though several mutations happened in succession, something that is quite rare.

“In a tree of millions of sequences, you find these long branches, which reduce the possible cases of detectable recombination to just about tens of thousands of branches,” Turakhia said. “These long branches make recombination much easier to spot on the tree, enabling the efficient performance of the new software.”

Turakhia and his team strive to keep improving the speed and performance of RIPPLES and create visual aids to make it more accessible to a wider audience.

Use for public health

Knowing when recombination occurs is crucial to understanding the evolutionary lineage of a viral sequence. Recombination can complicate the process of tracing a particular sequence through the phylogenetic tree because its genetic material is the result of joining two regions of the overall COVID-19 family tree.

Detecting recombination can help officials determine whether a COVID-19 lineage that appears new is really a newly introduced lineage that mutated independently, or rather a combination of two lineages that already existed in the community. Understanding when recombination occurs is also important from a public health perspective, as it could potentially make the virus more adept at evading immunity.

In addition, the availability and ease of use of the RIPPLES software have positive implications for genomics experts and public health officials alike, who can efficiently search a range of COVID-19 genomic samples for recombination in just minutes.

This reflects a larger theme of work on the scalable translation of pathogen genomics data in Corbett-Detig’s lab and at the UCSC Genomics Institute. The researchers focus on creating tools that let public health officials automate the questions they want to ask and get answers that are reliable and easy to work with.

“A big part of the success of our work is that the software is extremely accessible and computationally cheap in the grand scheme of things,” Corbett-Detig said. “Anyone could take their hundred new SARS-CoV-2 genome sequences and find out in minutes on a simple laptop whether there were possibly recombinant samples. Global public health needs to be democratized, to the point that anyone can do it, even if they are not a super-rich lab with giant servers.”

More information:

Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape, Nature (2022). DOI: 10.1038/s41586-022-05189-9

Provided by University of California – Santa Cruz

