Adaptive Arithmetic Coding-Based Encoding Method Toward High-Density DNA Storage

Bibliographic details
Published in:	Journal of Computational Biology, Volume 32, Issue 3, p. 298
Main authors:	Hu, Yingxin; Liu, Yanjun; Yang, Yuefei
Format:	Journal Article
Language:	English
Publication details:	United States, 01.03.2025
Subjects:	Algorithms; Computational Biology - methods; Data Compression - methods; DNA - chemistry; DNA - genetics; coding constraint; Hamming error-correcting codes; DNA storage; adaptive arithmetic coding; mapping methods
ISSN:	1557-8666

Description
Summary:	With the rapid advancement of big data and artificial intelligence technologies, the limitations of traditional storage media in accommodating vast amounts of data have become increasingly evident. DNA storage is an innovative approach that uses DNA and other biomolecules as storage media, offering large capacity, high density, low energy requirements, and exceptional longevity. Central to efficient DNA storage is DNA coding, the process by which digital information is converted into sequences of DNA bases. A novel encoding method based on adaptive arithmetic coding (AAC) is introduced that divides the encoding process into three phases: compression, error correction, and mapping. In the compression phase, Prediction by Partial Matching (PPM)-based AAC compresses the data and increases storage density. The error correction phase then relies on an octal Hamming code to correct errors and safeguard data integrity. The mapping phase employs a "3-2 code" mapping relationship to ensure adherence to biochemical constraints. The proposed method was verified by encoding files of different formats, including text, images, and audio. The results indicate that the average coding density can reach 3.25 bits per nucleotide, the GC content (the proportion of guanine [G] and cytosine [C] bases) can be stabilized at 50%, and the homopolymer length is restricted to no more than 2. Simulation results corroborate the method's efficacy in preserving data integrity during both reading and writing, increasing storage density, and providing robust error correction.
DOI:	10.1089/cmb.2024.0697
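
The two biochemical constraints named in the summary, GC content and homopolymer length, are simple to check on any candidate base sequence. The sketch below is only an illustration of those two metrics; it is not the authors' "3-2 code" mapping, whose details are not given in this record. The function names, the tolerance parameter, and the example sequence are assumptions made for the example.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (assumes uppercase A/C/G/T)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def satisfies_constraints(seq: str, gc_target: float = 0.5,
                          gc_tolerance: float = 0.0, max_run: int = 2) -> bool:
    """Check the two constraints quoted in the summary.

    gc_tolerance is a hypothetical knob for this sketch; the summary
    only states that GC content "can be stabilized at 50%".
    """
    gc_ok = abs(gc_content(seq) - gc_target) <= gc_tolerance
    return gc_ok and max_homopolymer(seq) <= max_run


if __name__ == "__main__":
    example = "ACGTTGCA"  # made-up sequence, not taken from the paper
    print(gc_content(example), max_homopolymer(example),
          satisfies_constraints(example))
```

A sequence such as "AAAT" fails both checks: it contains no G or C bases, and it has a run of three identical bases.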