The publication of the complete ‘telomere-to-telomere’ (T2T) human genome last year confirmed that genome sequences previously thought to be “complete” were not, in fact, complete at all.
Moreover, many modern genomes are sequenced with short-read techniques, which break DNA into fragments, usually 150–300 base pairs in length, that are then aligned to a reference sequence. While short-read methods are fast, accurate, and relatively economical, they routinely miss large portions of the genome, about 10% overall. Missing segments include regions with high G/C content and repetitive sequences, including segmental duplications, simple repeats, and transposable elements (TEs).
TEs are repetitive sequences that have moved to other locations in the genome, and movement of these sequences contributes significantly to genetic variation. Repetitive sequences often underlie the formation of structural variants (SVs)—genetic variations caused by duplications, insertions, deletions, and inversions. SVs are often missed when using short read sequencing (particularly those mediated by repeats) but they can play important roles in genome dysregulation and disease.
Researchers have turned to long-read sequencing for more comprehensive genome analysis, as these techniques allow sequencing of much longer DNA segments and can accurately capture a more complete picture of the genome. Recent advances have improved long read accuracy and utility, allowing researchers to investigate previously undiscovered genomic features, not just in humans.
The Jackson Laboratory (JAX) and University of Connecticut Health Center Assistant Professor Christine Beck, Ph.D., led a team that explored the genomes of another prominent species, the mouse, revealing details across 20 diverse inbred strains that will be critical to informing mouse-based genetics and genomics moving forward.
Structural differences between mouse strains
Mice have their own reference genome, known as GRCm39, based on the sequence of C57BL/6J, a strain derived largely from the subspecies Mus musculus domesticus. But many commonly used laboratory mouse strains were derived from two other subspecies as well, Mus musculus castaneus and Mus musculus musculus, and there are many genetic differences between the different inbred strains.
For work presented in “Resolving structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements,” published in Cell Genomics, Beck selected a wide range of commonly used strains, including the seven parental founders of the Collaborative Cross (CC) and Diversity Outbred (DO) mouse panels, six resulting CC strains with abnormalities of unknown genetic origin, and seven other commonly used strains with different genetic backgrounds.
Ardian Ferraj, a graduate student and lead author on the study, assembled the genomes of these 20 strains and used the assemblies to identify SVs that distinguish each genome from the C57BL/6J reference. Using PAV, a program developed by Beck lab member Dr Peter Audano, Ferraj showed that SVs are widespread throughout mouse genomes and contribute significantly to genetic variation. Indeed, SVs affect approximately five times as many bases as previously published single-nucleotide variants from diverse mouse genomes.
They also found significantly greater SV diversity among mouse genomes than among human genomes, indicating that a single mouse reference genome is insufficient for mapping genomic data across mouse strains. Importantly, long-read sequencing is vital for capturing this variation. Across 18 strains of mice, the research team detected 213,688 additional insertions, 64,277 deletions and 97 inversions with long reads compared to the short-read data.
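As a quick arithmetic check on these figures, the short Python sketch below totals the additional variants recovered with long reads and computes each type's share. Only the three counts come from the text; the variable names are illustrative and are not taken from the study or its software.

```python
# Counts of additional SVs detected with long reads, as quoted in the text.
# The dictionary and variable names are illustrative, not from the study.
additional_svs = {
    "insertions": 213_688,
    "deletions": 64_277,
    "inversions": 97,
}

# Total number of additional SV calls across the three types.
total_additional = sum(additional_svs.values())

# Percentage share of each SV type among the additional calls.
sv_shares = {kind: 100 * n / total_additional for kind, n in additional_svs.items()}

print(f"total additional SVs: {total_additional:,}")
for kind, pct in sv_shares.items():
    print(f"{kind}: {pct:.2f}%")
```

On these numbers, insertions account for roughly three quarters of the additional calls, which is consistent with insertions being the class that short reads recover least well.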
Transposable elements and the consequences of structural variation
While only a few TEs are still able to mobilize in the human genome, they are more active in mice. For this reason, Beck and her team focused on transposable element variants (TEVs), which they found comprise approximately 40% of all SVs, most of which (60%) are insertions. There are multiple types of TEVs, including short and long interspersed nuclear elements (SINEs and LINEs), which are distinguished by their size. LINEs were almost twice as common as SINEs in mouse genomes, 47% to 24%.
Due to their size, LINEs also contribute approximately half of the variable sequence content in murine genomes, compared to only 24% contributed by non-TEV SVs and 2.1% by SINEs. Endogenous retroviral sequences of various kinds accounted for the remaining 28% of TEVs. Retroviruses are RNA viruses whose genomes are reverse transcribed into DNA, which is then inserted into the host genome. While many current retroviruses are associated with diseases such as AIDS and cancer, the genomes of normal mammals contain large amounts of DNA derived from retroviruses over thousands of years, known as endogenous retroviruses, which help cause genetic variation in mice.
So what are the potential consequences of all this genetic variation and activity? The researchers considered SVs in the context of known genomic features and predicted the severity of their effects. Of the newly discovered SVs within genes, the vast majority (94,863) were within introns, which are sequences that are spliced out of mRNAs and so do not directly alter protein structure; 1,469 were in untranslated regions (UTRs) at either end of a gene; and 510 were within the actual protein-coding sequence.
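To put those three counts in proportion, the following minimal Python sketch computes the fraction of newly discovered genic SVs falling in each feature. The counts are taken from the text; the feature labels and variable names are assumptions made for illustration only.

```python
# Newly discovered SVs within genes, by feature, as quoted in the text.
# Labels and variable names are illustrative, not from the study.
genic_svs = {
    "intron": 94_863,
    "UTR": 1_469,
    "coding": 510,
}

# Total genic SVs across the three feature classes.
genic_total = sum(genic_svs.values())

# Fraction of genic SVs in each feature class.
genic_fractions = {feature: n / genic_total for feature, n in genic_svs.items()}

print(f"total genic SVs: {genic_total:,}")
for feature, frac in genic_fractions.items():
    print(f"{feature}: {frac:.2%}")
```

By this tally, well under 1% of the genic SVs fall in protein-coding sequence, though even a few hundred such events can matter for individual genes.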
They also identified the insertion of a previously undetected retroviral element within a specific gene, Mutyh, which is a DNA repair gene associated with a known mutational signature in certain strains of mice. The underlying variant had been unknown, but the team found that the insertion was associated with a significant decrease in Mutyh gene expression. The results show that unidentified SVs can alter important genomic regions and reside in genes associated with traits relevant to health and function, including disease.
Finally, in collaboration with JAX investigator Dr Laura Reinholdt, the team investigated the effect of TEs on embryonic stem cell differentiation. TEs promote genome diversity and their variation may alter important aspects of gene expression between strains. Indeed, the study found that more than 22,000 TEVs were associated with significant changes in chromatin accessibility, a key regulator of gene expression, across embryonic stem cells from 10 genetically diverse mouse strains.
Focusing again on a specific example, they investigated an intronic insertion in the Slc47a2 gene that is specific to the CAST/EiJ strain and was accompanied by a chromatin accessibility signal unique to that strain. They found elevated levels of Slc47a2 expression compared to strains lacking the insertion, along with a strain-specific transcript and a potential pluripotency factor-binding region, suggesting important roles for TEVs in early development.
A more complete understanding
Given the importance of the mouse as a model for mammalian genetics and human disease, it is essential that we fully understand the functional consequences of genetic diversity. The comprehensive detection and characterization of SVs across mouse strain genomes is an important part of this understanding, and the results and data generated by Dr Beck and her collaborators provide an important step forward in this field.
The authors have produced a sequence-resolved SV resource, a mouse embryonic stem cell expression resource, and chromatin accessibility data for the research community that may aid further investigations into mouse development and genomic features of interest.
Christine R. Beck, Resolving structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements, Cell Genomics (2023). DOI: 10.1016/j.xgen.2023.100291. www.cell.com/cell-genomics/ful… 2666-979X (23) 00057-5
Citation: New Study Reveals Details Across 20 Diverse Inbred Mouse Strains (2023, April 5) Retrieved April 5, 2023 from https://phys.org/news/2023-04-04-reveals-diverse-inbred-mouse-strains.html