Transfer learning for high‐dimensional linear regression: Prediction, estimation and minimax optimality

Bibliographic details
Published in: Journal of the Royal Statistical Society. Series B, Statistical Methodology, Vol. 84, No. 1, pp. 149-173
Main authors: Li, Sai; Cai, T. Tony; Li, Hongzhe
Format: Journal Article
Language: English
Published: Oxford University Press, England, 1 February 2022
ISSN: 1369-7412, 1467-9868
Description
Summary: This paper considers estimation and prediction of a high‐dimensional linear regression in the setting of transfer learning where, in addition to observations from the target model, auxiliary samples from different but possibly related regression models are available. When the set of informative auxiliary studies is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. When the set of informative auxiliary samples is unknown, we propose a data‐driven procedure for transfer learning, called Trans‐Lasso, and show its robustness to non‐informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans‐Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating data from multiple different tissues as auxiliary samples.
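The summary describes the known-informative-set case only at a high level. As a hedged illustration, the Python sketch below shows one generic way such transfer can work: fit a Lasso on the target sample pooled with auxiliary samples assumed to be informative, then fit a second Lasso on the target data alone to correct the pooled estimate. This pool-then-correct structure is an assumption about the general idea, not code taken from the paper. The names `lasso_cd` and `two_step_transfer`, the coordinate-descent solver, and the tuning parameters `lam_w` and `lam_delta` are all hypothetical. The data-driven Trans-Lasso step for an unknown informative set is not shown.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Hypothetical helper: Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-coordinate curvature X_j^T X_j / n
    r = y.astype(float).copy()         # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            # Correlation of column j with the partial residual (coordinate j added back).
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)  # keep the residual in sync with b
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def two_step_transfer(X_target, y_target, X_aux_list, y_aux_list, lam_w, lam_delta):
    """Hypothetical pool-then-correct sketch; assumes every auxiliary set is informative."""
    # Step 1: Lasso on the target sample stacked with the auxiliary samples.
    X_pool = np.vstack([X_target] + list(X_aux_list))
    y_pool = np.concatenate([y_target] + list(y_aux_list))
    w_hat = lasso_cd(X_pool, y_pool, lam_w)
    # Step 2: Lasso on target data only, fitting what the pooled estimate misses.
    resid = y_target - X_target @ w_hat
    delta_hat = lasso_cd(X_target, resid, lam_delta)
    return w_hat + delta_hat
```

In a simulation where the auxiliary coefficients differ only slightly from the target coefficients, `two_step_transfer` would be expected to estimate the target vector more accurately than `lasso_cd` run on the small target sample alone, because step 1 draws on many more observations and step 2 only needs to recover a small correction.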
DOI: 10.1111/rssb.12479