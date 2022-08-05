

A double-stranded DNA fragment. Credit: Vcpmartin/Wikimedia/CC BY-SA 4.0



DNA-based information is a new interdisciplinary field connecting information technology and biotechnology. The field hopes to meet the enormous need for long-term data storage by using DNA as an information carrier. Despite DNA’s promise of strong stability, high storage density and low maintenance costs, researchers face difficulties in accurately rewriting digital information encoded in DNA sequences.

In general, DNA data storage technology has two modes: the “in vitro hard disk mode” and the “in vivo CD mode”. The main advantage of the in vivo mode is that chromosomal DNA is replicated cheaply and reliably whenever the cell replicates, which makes it well suited to fast, inexpensive distribution of data copies. However, because the DNA sequences that encode some information contain many repeats and homopolymer runs, such information can be “written” and “read” but cannot be “rewritten” accurately.
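To make the repeat and homopolymer problem concrete, here is a minimal Python sketch. It assumes a naive mapping of two bits per base, which is a common textbook scheme and not the coding algorithm used by the research team (the article does not describe that algorithm). The names `BITS_TO_BASE`, `encode_naive` and `longest_homopolymer` are hypothetical and chosen only for illustration.

```python
import re

# Illustrative mapping only: two bits per base. This is an assumption,
# not the coding algorithm described in the paper.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_naive(data: bytes) -> str:
    """Map each byte to four bases, reading two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    runs = (len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    return max(runs, default=0)


text_dna = encode_naive(b"Hi")          # ordinary text
zero_dna = encode_naive(b"\x00\x00")    # padding-like data

print(text_dna, longest_homopolymer(text_dna))
print(zero_dna, longest_homopolymer(zero_dna))
```

Under this naive mapping, uniform data such as zero bytes collapses into a single long run of one base, and repeated data yields repeated sequence. Both make it hard to address one specific location for editing.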

To solve the rewrite problem, a research team led by Prof. Liu Kai of the Department of Chemistry, Tsinghua University, Prof. Li Jingjing of the Changchun Institute of Applied Chemistry (CIAC) of the Chinese Academy of Sciences, and Prof. Chen Dong of Zhejiang University recently developed a dual-plasmid editing system for accurately processing digital information in a microbial vector. Their findings were published in Science Advances.

The researchers set up a dual-plasmid system in vivo using a rationally designed coding algorithm and an information editing tool. This dual-plasmid system is capable of storing, reading, and rewriting various types of information, including text, codebooks, and images. It fully exploits the coding capacity of DNA sequences without the need for addressing indices or backup sequences. It is also compatible with various coding algorithms, enabling high coding efficiency. For example, the coding efficiency of the current system reaches 4.0 bits per nucleotide.

To achieve high efficiency and reliability in rewriting complex information stored in exogenous DNA sequences in vivo, the team used a variety of CRISPR-associated (Cas) proteins together with a recombinase. Each Cas protein was directed by its corresponding CRISPR RNA (crRNA) to cleave a target locus in the DNA sequence, so that specific information could be addressed and rewritten. Thanks to the high specificity of complementary base pairing between nucleic acid molecules, the recombinase accurately reconstructed the information-encoding DNA sequences so that they encoded the new information. Optimizing the crRNA sequence made the rewriting tool highly adaptable to complex information, resulting in a rewriting fidelity of up to 94%, which is comparable to existing gene editing systems.
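The address-then-rewrite idea can be pictured with a string-level analogy. The sketch below is not a model of Cas cleavage or recombination and is not taken from the paper. It only assumes that a target site works like a unique address and that the payload following it is swapped for new content. The function name `rewrite_at_target` and the sequences are hypothetical.

```python
def rewrite_at_target(plasmid: str, target: str,
                      old_payload: str, new_payload: str) -> str:
    """Swap the payload that directly follows a unique target site.

    Toy analogy only: the target plays the role of the site a crRNA
    would recognise, and the string replacement stands in for repair.
    """
    first = plasmid.find(target)
    if first == -1:
        raise ValueError("target site not found")
    # A second hit (overlapping or not) makes the address ambiguous.
    if plasmid.find(target, first + 1) != -1:
        raise ValueError("target site is not unique")
    start = first + len(target)
    if not plasmid.startswith(old_payload, start):
        raise ValueError("expected payload does not follow the target site")
    return plasmid[:start] + new_payload + plasmid[start + len(old_payload):]


original = "TTGACCGTAAACGCATGGA"
edited = rewrite_at_target(original, "GACCGT", "AAACGC", "CTGTCA")
print(edited)
```

The uniqueness check mirrors the point made earlier in the article: when a sequence is full of repeats, a target site no longer identifies one location, so an accurate rewrite cannot be guaranteed.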

The dual-plasmid system can serve as a universal platform for in vivo rewriting of DNA-based information, providing a novel strategy for information processing and target-specific rewriting of large and complicated data at the molecular level.

“We believe that this strategy could also be applied in a living host with a larger genome, such as yeast, which would pave the way for practical applications related to big data storage,” said Prof. Liu.

More information:

Yangyi Liu et al, In vivo processing of digital information molecularly with targeted specificity and robust reliability, Science Advances (2022). DOI: 10.1126/sciadv.abo7415, www.science.org/doi/10.1126/sciadv.abo7415

Provided by the Chinese Academy of Sciences





