Gene mutation estimations via mutual information and Ewens sampling based CNN & machine learning algorithms

Bibliographic Details
Published in:	Journal of applied statistics Vol. 52; no. 12; pp. 2321 - 2353
Main Author:	Dai, Wanyang
Format:	Journal Article
Language:	English
Published:	England: Taylor & Francis Ltd, 10.09.2025
Subjects:	Algorithms; Artificial neural networks; convolutional neural network (CNN); Ewens sampling; Gene mutation rate; Gene sequencing; Genetic modification; Kuhn-Tucker method; Machine learning; multi-input multi-output (MIMO) mutual information; Mutation; Optimization; Optimization techniques; Proteins; Sampling; stochastic gradient
ISSN:	0266-4763, 1360-0532

Description
Summary:	We estimate gene mutation rates by developing convolutional neural network (CNN) and machine learning algorithms based on mutual information and Ewens sampling. More precisely, we develop a systematic methodology by constructing a CNN, together with two machine learning algorithms for studying protein production with target gene sequences and protein structures. The core of the CNN and machine learning approach is a two-stage optimization problem that balances gene mutation rates during protein production. That is, we seek to optimally coordinate the consistency between the given input DNA sequences and the given (or optimally computed) target ones by controlling their intermediate gene mutation rates. The aim is to support gene editing and protein structure prediction. For example, once the gene mutation rates are estimated, the computational complexity of protein structure prediction is reduced to a reasonable degree. The CNN numerical optimization scheme consists of two newly designed machine learning algorithms. Their stochastic gradients are designed according to the Kuhn-Tucker conditions with boundary constraints, supported by Ewens sampling, multi-input multi-output (MIMO) mutual information, and codon optimization techniques. The associated learning rate bounds are derived explicitly, and the two algorithms are implemented numerically. The convergence and optimality of the algorithms are proved mathematically. To illustrate the use of the study, we also present an implementation on real-world data.
DOI:	10.1080/02664763.2025.2460076