Gene mutation estimations via mutual information and Ewens sampling based CNN & machine learning algorithms

We conduct gene mutation rate estimations via developing mutual information and Ewens sampling based convolutional neural network (CNN) and machine learning algorithms. More precisely, we develop a systematic methodology through constructing a CNN. Meanwhile, we develop two machine learning algorith...

Full description

Saved in:
Bibliographic Details
Published in:Journal of applied statistics Vol. 52; no. 12; pp. 2321 - 2353
Main Author: Dai, Wanyang
Format: Journal Article
Language:English
Published: England Taylor & Francis 10.09.2025
Taylor & Francis Ltd
Subjects:
ISSN:0266-4763, 1360-0532
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We conduct gene mutation rate estimations via developing mutual information and Ewens sampling based convolutional neural network (CNN) and machine learning algorithms. More precisely, we develop a systematic methodology through constructing a CNN. Meanwhile, we develop two machine learning algorithms to study protein production with target gene sequences and protein structures. The core of the CNN and machine learning approach is to address a two-stage optimization problem to balance gene mutation rates during protein production. To wit, we try to optimally coordinate the consistency between the given input DNA sequences and the given (or optimally computed) target ones through controlling their intermediate gene mutation rates. The purposes in doing so are aimed to conduct gene editing and protein structure prediction. For example, after the gene mutation rates are estimated, the computing complexity of protein structure prediction will be reduced to a reasonable degree. Our developed CNN numerical optimization scheme consists of two newly designed machine learning algorithms. The stochastic gradients for the two algorithms are designed according to the Kuhn-Tucker conditions with boundary constraints and with the support of Ewens sampling, multi-input multi-output (MIMO) mutual information, and codon optimization techniques. The associated learning rate bounds are explicitly derived from the method and the two algorithms are numerically implemented. The convergence and optimality of the algorithms are mathematically proved. To illustrate the usage of our study, we also conduct a real-world data implementation.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:0266-4763
1360-0532
DOI:10.1080/02664763.2025.2460076