Adaptive Arithmetic Coding-Based Encoding Method Toward High-Density DNA Storage

Bibliographic details
Published in: Journal of computational biology, vol. 32, no. 3, p. 298
Main authors: Hu, Yingxin; Liu, Yanjun; Yang, Yuefei
Format: Journal Article
Language: English
Published: United States, 1 March 2025
ISSN: 1557-8666
Description
Summary: With the rapid advancement of big data and artificial intelligence technologies, the limitations of traditional storage media in accommodating vast amounts of data have become increasingly evident. DNA storage is an innovative approach that uses DNA and other biomolecules as storage media, offering expansive capacity, remarkable density, minimal energy requirements, and unparalleled longevity. Central to efficient DNA storage is the process of DNA coding, whereby digital information is converted into sequences of DNA bases. A novel encoding method based on adaptive arithmetic coding (AAC) has been introduced, dividing the encoding process into three distinct phases: compression, error correction, and mapping. In the compression phase, Prediction by Partial Matching (PPM)-based AAC compresses the data and enhances storage density. The error correction phase then relies on an octal Hamming code to rectify errors and safeguard data integrity. The mapping phase employs a "3-2 code" mapping relationship to ensure adherence to biochemical constraints. The proposed method was verified by encoding files of different formats, such as text, pictures, and audio. The results indicated that the average coding density of bases can be up to 3.25 per nucleotide, the GC content (the proportion of guanine [G] and cytosine [C]) can be stabilized at 50%, and the homopolymer length is restricted to no more than 2. Simulation results corroborate the method's efficacy in preserving data integrity during both reading and writing operations, increasing storage density, and providing robust error correction.
DOI: 10.1089/cmb.2024.0697
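
Illustrative note: the summary states two biochemical constraints on the encoded sequences, a GC content of 50% and a homopolymer length of at most 2. The sketch below only shows how those two properties could be checked on a given base sequence. It is not the authors' encoder and does not reproduce the "3-2 code" mapping, which the record does not specify. The function names and the `gc_tolerance` parameter are hypothetical and chosen here for illustration.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    runs = (sum(1 for _ in run) for _, run in groupby(seq.upper()))
    return max(runs, default=0)


def satisfies_constraints(seq: str,
                          gc_target: float = 0.5,
                          gc_tolerance: float = 0.05,  # assumed, not from the record
                          max_run: int = 2) -> bool:
    """Check GC content near gc_target and homopolymer runs <= max_run."""
    gc_ok = abs(gc_content(seq) - gc_target) <= gc_tolerance
    return gc_ok and max_homopolymer(seq) <= max_run


if __name__ == "__main__":
    for candidate in ("AACCGGTT", "AAACGG"):
        print(candidate, gc_content(candidate),
              max_homopolymer(candidate), satisfies_constraints(candidate))
```

With these defaults, a sequence passes only if its GC fraction lies within 0.05 of 0.5 and no base repeats more than twice in a row. The tolerance is an assumption; the summary itself reports the GC content as stabilized at 50%.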