A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes.

Bibliographic Details
Title: A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes.
Authors: Xiang, Weixi; Li, Zhaoxin; Sun, Qixin; Chai, Xiujuan; Sun, Tan
Source: Animals (2076-2615); Sep 2025, Vol. 15 Issue 17, p2485, 17p
Subject Terms: DEEP learning, TRANSFORMER models, GENETIC profile, ANIMAL breeding, HERITABILITY, SUPERVISED learning, SWINE breeds
Simple Summary: Predicting complex genetic traits is essential for improving swine-breeding programs, but traditional methods face limitations. This study introduces a novel deep learning framework, using a Transformer model, to more accurately predict swine phenotypes. The model first learns the fundamental patterns of the pig genome from genetic data and is then fine-tuned to predict key economic traits. Our results show this method outperforms existing approaches, like GBLUP. This enhanced accuracy provides breeders with a powerful tool for selecting superior animals, potentially accelerating genetic gain and delivering substantial economic benefits to the swine industry.
Abstract: Accurate genomic prediction of complex phenotypes is crucial for accelerating genetic progress in swine breeding. However, conventional methods like Genomic Best Linear Unbiased Prediction (GBLUP) face limitations in capturing complex non-additive effects that contribute significantly to phenotypic variation, restricting the potential accuracy of phenotype prediction. To address this challenge, we introduce a novel framework based on a self-supervised, pre-trained encoder-only Transformer model. Its core novelty lies in tokenizing SNP sequences into non-overlapping 6-mers (sequences of 6 SNPs), enabling the model to directly learn local haplotype patterns instead of treating SNPs as independent markers. The model first undergoes self-supervised pre-training on the unlabeled version of the same SNP dataset used for subsequent fine-tuning, learning intrinsic genomic representations through a masked 6-mer prediction task. Subsequently, the pre-trained model is fine-tuned on labeled data to predict phenotypic values for specific economic traits. Experimental validation demonstrates that our proposed model consistently outperforms baseline methods, including GBLUP and a Transformer of the same architecture trained from scratch (without pre-training), in prediction accuracy across key economic traits. This outperformance suggests the model's capacity to capture non-linear genetic signals missed by linear models. This research contributes not only a new, more accurate methodology for genomic phenotype prediction but also validates the potential of self-supervised learning to decipher complex genomic patterns for direct application in breeding programs. Ultimately, this approach offers a powerful new tool to enhance the rate of genetic gain in swine production by enabling more precise selection based on predicted phenotypes. [ABSTRACT FROM AUTHOR]
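Illustrative note: the 6-mer tokenization and masked 6-mer prediction task described in the abstract can be sketched roughly as below. This is a minimal sketch and not the authors' implementation. The 0/1/2 genotype coding, the `[MASK]` token name, the 15% masking rate, the dropping of trailing SNPs, and the function names `tokenize_snps` and `mask_tokens` are all assumptions introduced for illustration; the abstract specifies none of them.

```python
import random

K = 6            # SNPs per token, as stated in the abstract
MASK = "[MASK]"  # hypothetical mask token name (assumption)


def tokenize_snps(genotypes, k=K):
    """Split a sequence of SNP genotype codes into non-overlapping k-mers.

    Assumes genotypes are coded 0/1/2. Trailing SNPs that do not fill a
    whole k-mer are dropped, which is also an assumption.
    """
    usable = len(genotypes) - len(genotypes) % k
    return [
        "".join(str(g) for g in genotypes[i:i + k])
        for i in range(0, usable, k)
    ]


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns the masked sequence and a dict mapping each masked position
    to its original token, i.e. the targets a model would be trained to
    predict. The 0.15 rate is an assumed default, not from the source.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok
            masked[i] = MASK
    return masked, targets


# 14 SNPs yield two complete 6-mers; the last two SNPs are dropped.
snps = [0, 1, 2, 0, 0, 1, 2, 2, 1, 0, 1, 0, 1, 1]
tokens = tokenize_snps(snps)  # ['012001', '221010']
masked, targets = mask_tokens(tokens)
```

In a pre-training setup of this kind, an encoder would receive `masked` and be trained to recover the entries of `targets`; because each token spans six adjacent SNPs, the prediction target is a local multi-SNP pattern and not a single marker.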
Database: Biomedical Index
ISSN: 2076-2615
DOI: 10.3390/ani15172485