LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning
| Title: | LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning |
|---|---|
| Authors: | SUN, Tiezhu, PIAN, Weiguo, DAOUDI, Nadia, ALLIX, Kevin, F. Bissyandé, Tegawendé, KLEIN, Jacques |
| Source: | urn:isbn:978-3-03-170238-9 ; Natural Language Processing and Information Systems - 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Proceedings (2024-09-20); The 29th International Conference on Natural Language & Information Systems, Turin, Italy [IT], 25-06-2024 to 27-06-2024 |
| Publisher information: | Springer Science and Business Media Deutschland GmbH |
| Publication year: | 2024 |
| Collection: | University of Luxembourg: ORBilu - Open Repository and Bibliography |
| Subject terms: | Large file classification, Multiple instance learning, Classification tasks, Computational costs, Input constraints, Language processing, Large files, Multiple-instance learning, Natural languages, Text classification, Engineering, computing & technology, Computer science, Ingénierie, informatique & technologie, Sciences informatiques |
| Description: | Peer reviewed. Transformer-based models have significantly advanced natural language processing, in particular the performance in text classification tasks. Nevertheless, these models face challenges in processing large files, primarily due to their input constraints, which are generally restricted to hundreds or thousands of tokens. Attempts to address this issue in existing models usually consist of extracting only a fraction of the essential information from lengthy inputs, while often incurring high computational costs due to their complex architectures. In this work, we address the challenge of classifying large files from the perspective of correlated multiple instance learning. We introduce LaFiCMIL, a method specifically designed for large file classification. It is optimized for efficient training on a single GPU, making it a versatile solution for binary, multi-class, and multi-label classification tasks. We conducted extensive experiments using seven diverse and comprehensive benchmark datasets to assess LaFiCMIL’s effectiveness. By integrating BERT for feature extraction, LaFiCMIL demonstrates exceptional performance, setting new benchmarks across all datasets. A notable achievement of our approach is its ability to scale BERT to handle nearly 20,000 tokens while training on a single GPU with 32 GB of memory. This efficiency, coupled with its state-of-the-art performance, highlights LaFiCMIL’s potential as a groundbreaking approach in the field of large file classification. |
| Document type: | conference object; report |
| Language: | English |
| ISBN: | 978-3-031-70238-9; 3-031-70238-7 |
| Relation: | https://link.springer.com/content/pdf/10.1007/978-3-031-70239-6_5; https://orbilu.uni.lu/handle/10993/62891; info:hdl:10993/62891; https://orbilu.uni.lu/bitstream/10993/62891/1/LaFiCMIL.pdf |
| DOI: | 10.1007/978-3-031-70239-6_5 |
| Availability: | https://orbilu.uni.lu/handle/10993/62891; https://orbilu.uni.lu/bitstream/10993/62891/1/LaFiCMIL.pdf; https://doi.org/10.1007/978-3-031-70239-6_5 |
| Rights: | open access ; http://purl.org/coar/access_right/c_abf2 ; info:eu-repo/semantics/openAccess |
| Accession number: | edsbas.DCDC8119 |
| Database: | BASE |