AbSet: A Standardized Data Set of Antibody Structures for Machine Learning Applications
| Title: | AbSet: A Standardized Data Set of Antibody Structures for Machine Learning Applications |
|---|---|
| Authors: | Diego S. Almeida, Matheus V. Almeida, Jean V. Sampaio, Eduardo M. Gaieta, Andrielly H. S. Costa, Francisco F. A. Rabelo, César L. Cavalcante, Geraldo R. Sartori, João H. M. Silva |
| Publication Year: | 2025 |
| Subject Terms: | Biophysics, Biochemistry, Biotechnology, Immunology, Cancer, Hematology, Infectious Diseases, Biological Sciences not elsewhere classified, Chemical Sciences not elsewhere classified, Information Systems not elsewhere classified, reference experimental complexes, corresponding molecular descriptors, machine learning applications, protein data bank, standardized data set, data sets, structural similarity, decoy set, available structures, zenodo repository, therapeutic antibodies |
| Description: | Machine learning algorithms have played a fundamental role in the development of therapeutic antibodies by being trained on data sets of sequences and/or structures. However, structural data sets remain limited, especially those that include antibody–antigen complexes. Additionally, many of the available structures are not standardized, and antibody-specific databases often do not provide molecular descriptors that could enhance machine learning models. To address this gap, we introduce AbSet, a curated data set comprising over 800,000 antibody structures and corresponding molecular descriptors, including both experimentally determined and in silico-generated antibody–antigen complexes. We systematically retrieved antibody structures from the Protein Data Bank (PDB), applied rigorous standardization protocols, and expanded the data set through large-scale protein–protein docking to generate structural variants of antibody–antigen interactions. Each model was classified as high, medium, acceptable, or incorrect quality based on structural similarity to reference experimental complexes. This classification enables both the construction of a decoy set of confirmed non-binders and the generation of high-confidence augmented structural data for machine learning applications. AbSet is publicly available via the Zenodo repository, with accompanying scripts hosted on GitHub (https://github.com/SFBBGroup/AbSet.git). |
| Document Type: | article in journal/newspaper |
| Language: | unknown |
| Relation: | https://figshare.com/articles/journal_contribution/AbSet_A_Standardized_Data_Set_of_Antibody_Structures_for_Machine_Learning_Applications/29031922 |
| DOI: | 10.1021/acs.jcim.5c00410.s002 |
| Availability: | https://doi.org/10.1021/acs.jcim.5c00410.s002 https://figshare.com/articles/journal_contribution/AbSet_A_Standardized_Data_Set_of_Antibody_Structures_for_Machine_Learning_Applications/29031922 |
| Rights: | CC BY-NC 4.0 |
| Accession Number: | edsbas.7AB13308 |
| Database: | BASE |
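
The four-tier quality classification mentioned in the description can be illustrated with a minimal sketch. The record does not state which similarity metric or cutoffs the authors used, so this example assumes a DockQ-style score in [0, 1] with the commonly used DockQ cutoffs (0.23, 0.49, 0.80). The function names `classify_model` and `split_models`, and the choice of which tiers count as augmented data, are hypothetical and are not taken from the AbSet scripts.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: DockQ-style cutoffs; the record does not name the metric
# or the thresholds actually used to build AbSet.
CUTOFFS: List[Tuple[float, str]] = [
    (0.80, "high"),
    (0.49, "medium"),
    (0.23, "acceptable"),
]


def classify_model(score: float) -> str:
    """Map a similarity score in [0, 1] to one of the four quality tiers."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for cutoff, label in CUTOFFS:
        if score >= cutoff:
            return label
    return "incorrect"


def split_models(
    scores: Dict[str, float],
    augment_tiers: Tuple[str, ...] = ("high", "medium"),  # hypothetical choice
) -> Dict[str, List[str]]:
    """Group docked models into a decoy set and an augmented set.

    Models classified as "incorrect" become decoys; models whose tier is in
    `augment_tiers` become augmented data; all others are left out.
    """
    groups: Dict[str, List[str]] = {"augmented": [], "decoy": []}
    for model_id, score in scores.items():
        tier = classify_model(score)
        if tier == "incorrect":
            groups["decoy"].append(model_id)
        elif tier in augment_tiers:
            groups["augmented"].append(model_id)
    return groups


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {"m1": 0.85, "m2": 0.50, "m3": 0.30, "m4": 0.10}
    print(split_models(example))
```

Keeping the cutoffs in a single table makes it straightforward to substitute whichever metric and thresholds the data set actually documents.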