Sparse data embedding and prediction by tropical matrix factorization

Bibliographic Details
Published in:BMC Bioinformatics Vol. 22; no. 1; article 89 (18 pp.)
Main Authors: Omanović, Amra, Kazan, Hilal, Oblak, Polona, Curk, Tomaž
Format: Journal Article
Language:English
Published: London: BioMed Central, 25.02.2021 (BioMed Central Ltd; Springer Nature B.V.; BMC)
ISSN:1471-2105
Description
Summary:
Background: Matrix factorization methods are linear models with limited capability to model complex relations. In our work, we use the tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data.

Results: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions.

Conclusion: STMF is the first work that uses the tropical semiring on sparse data. We show that in certain cases semirings are useful because they capture a structure that is different from, and simpler to understand than, the one captured by standard linear algebra.
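The summary does not spell out the tropical operations. As a minimal sketch, assuming the max-plus convention (addition replaced by max, multiplication replaced by +), the following shows how the tropical product of two low-rank factors reconstructs a matrix and how that reconstruction supplies values for missing entries. The factors `U` and `V`, the sparse matrix `X`, the helper names `maxplus_matmul` and `observed_rmse`, and the choice of RMSE over observed entries are hypothetical illustrations; this is not the STMF algorithm itself, which learns the factors from data.

```python
import math

NEG_INF = -math.inf  # additive identity ("tropical zero") of the max-plus semiring

def maxplus_matmul(A, B):
    """Tropical (max-plus) product: (A (x) B)[i][j] = max_k (A[i][k] + B[k][j])."""
    k = len(B)
    assert all(len(row) == k for row in A), "inner dimensions must agree"
    return [[max(A[i][t] + B[t][j] for t in range(k)) for j in range(len(B[0]))]
            for i in range(len(A))]

def observed_rmse(X, approx):
    """Root-mean-square error over observed entries only; None marks a missing value."""
    errs = [(x - a) ** 2
            for row_x, row_a in zip(X, approx)
            for x, a in zip(row_x, row_a) if x is not None]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical rank-2 factors, chosen by hand for illustration (not learned).
U = [[0.0, 1.0],
     [2.0, 0.0],
     [1.0, 3.0]]
V = [[1.0, 0.0, 2.0],
     [0.0, 2.0, 1.0]]

# Hypothetical sparse data matrix: None marks an unknown value.
X = [[1.0, None, 2.0],
     [3.0, 2.5, None],
     [None, 5.0, 4.0]]

R = maxplus_matmul(U, V)  # tropical low-rank approximation of X
error = observed_rmse(X, R)  # fit measured only where X is observed
# The approximation provides estimates for the missing cells of X.
predictions = {(i, j): R[i][j]
               for i, row in enumerate(X)
               for j, x in enumerate(row) if x is None}
```

Each entry of the product is a maximum of sums, so the reconstruction is piecewise linear in the factors rather than linear; this is the kind of non-linearity the summary refers to. In this toy case `R` is `[[1, 3, 2], [3, 2, 4], [3, 5, 4]]`, and the missing cells receive the estimates 3, 4 and 3.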
DOI:10.1186/s12859-021-04023-9