Application of t-SNE to human genetic data

Bibliographic Details
Published in: Journal of Bioinformatics and Computational Biology Vol. 15; no. 4; p. 1750017
Main Authors: Li, Wentian; Cerise, Jane E; Yang, Yaning; Han, Henry
Format: Journal Article
Language: English
Published: Singapore 01.08.2017
ISSN: 1757-6334
Description
Summary: The t-distributed stochastic neighbor embedding (t-SNE) is a new dimension reduction and visualization technique for high-dimensional data. t-SNE is rarely applied to human genetic data, even though it is commonly used in other data-intensive biological fields, such as single-cell genomics. We explore the applicability of t-SNE to human genetic data and make these observations: (i) like previously used dimension reduction techniques such as principal component analysis (PCA), t-SNE is able to separate samples from different continents; (ii) t-SNE is more robust than PCA to the presence of outliers; (iii) t-SNE is able to display both continental and sub-continental patterns in a single plot. We conclude that the ability of t-SNE to reveal population stratification at different scales could be useful for human genetic association studies.
DOI: 10.1142/S0219720017500172
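
The comparison described in the summary, PCA and t-SNE applied to the same genotype matrix, can be sketched roughly as follows. This is an illustration only, not the authors' code or data: the genotypes are simulated, the helper names `simulate_genotypes` and `embed` are hypothetical, and the use of scikit-learn with a perplexity of 30 and PCA initialization is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def simulate_genotypes(n_per_pop=60, n_snps=500, n_pops=3, spread=0.15, seed=0):
    """Hypothetical toy data: 0/1/2 allele counts for several populations."""
    rng = np.random.default_rng(seed)
    # Shared allele frequencies that every population starts from.
    base = rng.uniform(0.2, 0.8, size=n_snps)
    blocks, labels = [], []
    for pop in range(n_pops):
        # Each population gets its own perturbed allele frequencies.
        freq = np.clip(base + rng.normal(0.0, spread, size=n_snps), 0.05, 0.95)
        # Draw two alleles per SNP for every sample in this population.
        blocks.append(rng.binomial(2, freq, size=(n_per_pop, n_snps)))
        labels += [pop] * n_per_pop
    return np.vstack(blocks).astype(float), np.array(labels)


def embed(genotypes, perplexity=30.0, seed=0):
    """Return 2-D PCA and t-SNE coordinates for the same samples."""
    centered = genotypes - genotypes.mean(axis=0)
    pca_xy = PCA(n_components=2).fit_transform(centered)
    tsne_xy = TSNE(
        n_components=2,
        perplexity=perplexity,  # assumed value; must be below the sample count
        init="pca",
        random_state=seed,
    ).fit_transform(centered)
    return pca_xy, tsne_xy


genotypes, labels = simulate_genotypes()
pca_xy, tsne_xy = embed(genotypes)
print(pca_xy.shape, tsne_xy.shape)
```

Plotting `pca_xy` and `tsne_xy` colored by `labels` gives the two views side by side. Whether the simulated populations separate as they do for real continental samples depends on the chosen `spread` and perplexity, so the sketch should not be read as reproducing the paper's results.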