Combinatorial Optimization Machine Learning Algorithms and Statistical Modeling in Genomics

Bibliographic Details
Main Author:	Le, Thong
Format:	Dissertation
Language:	English
Published:	ProQuest Dissertations & Theses 01.01.2019
Subjects:	Computer science
ISBN:	1085587215, 9781085587211

Description
Summary:	The dissertation studies a broad set of algorithmic questions that arise in machine learning and combinatorics. We exploit the special combinatorial structure of these problems to improve running time. We also use optimization techniques from statistical modeling and machine learning to solve problems in genomics and to improve the robustness of deep neural network models. The dissertation has three main results.
1) The matrix-chain multiplication problem is a classic problem that is widely taught to illustrate dynamic programming. The textbook solution runs in Θ(n³) time. Based on triangulating convex polygons, we give a complete, correct proof and implementation details of an O(n²) algorithm. We also extend the solution to a more general class of problems and give an approximation algorithm that runs in linear time.
2) Several algorithms have been developed that use high-throughput sequencing (HTS) technology to characterize structural variations (SVs). Most existing approaches focus on detecting relatively simple types of SVs, such as insertions, deletions, and short inversions. However, complex SVs are of crucial importance, and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms that accurately discover and genotype such variants. We give a novel statistical modeling method to characterize complex SVs in the genome.
3) We study how to attack machine learning models in order to improve the robustness of deep neural networks. We propose a novel way to formulate the hard-label black-box attack as a real-valued optimization problem, which is usually continuous and can be solved by any zeroth-order optimization algorithm. We demonstrate that our proposed method outperforms the previous random-walk approach when attacking convolutional neural networks on the MNIST, CIFAR, and ImageNet datasets.
More interestingly, we show that the proposed algorithm can also be used to attack other discrete and non-continuous machine learning models, such as Gradient Boosting Trees.
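For reference, the Θ(n³) textbook baseline that result 1 improves on can be sketched as below. This is the standard interval dynamic program, not the dissertation's O(n²) polygon-triangulation algorithm or its linear-time approximation; the function name `matrix_chain_order` and the convention that matrix A_i has shape p[i-1] × p[i] are illustrative assumptions.

```python
from typing import List, Tuple


def matrix_chain_order(p: List[int]) -> Tuple[int, str]:
    """Textbook Theta(n^3) dynamic program for matrix-chain multiplication.

    Hypothetical helper (illustration only). Matrix A_i (1-indexed) has shape
    p[i-1] x p[i], so a chain of n matrices is given by n + 1 dimensions.
    Returns the minimum number of scalar multiplications and one optimal
    parenthesization.
    """
    n = len(p) - 1
    if n < 1:
        raise ValueError("need at least one matrix (two dimensions)")

    # cost[i][j]: minimum scalar multiplications to compute A_i..A_j.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    # split[i][j]: k such that the last product is (A_i..A_k)(A_{k+1}..A_j).
    split = [[0] * (n + 1) for _ in range(n + 1)]

    for length in range(2, n + 1):          # chain length
        for i in range(1, n - length + 2):  # chain start
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # O(n) split points per subchain
                q = cost[i][k] + cost[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < cost[i][j]:
                    cost[i][j] = q
                    split[i][j] = k

    def paren(i: int, j: int) -> str:
        # Rebuild the optimal parenthesization from the recorded splits.
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({paren(i, k)}{paren(k + 1, j)})"

    return cost[1][n], paren(1, n)


if __name__ == "__main__":
    print(matrix_chain_order([30, 35, 15, 5, 10, 20, 25]))
```

Each of the O(n²) subchains tries O(n) split points, which gives the cubic bound. For the classic six-matrix instance p = [30, 35, 15, 5, 10, 20, 25], the sketch returns a cost of 15125 with the parenthesization ((A1(A2A3))((A4A5)A6)).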