Malicious Text Identification: Deep Learning from Public Comments and Emails
| Title: | Malicious Text Identification: Deep Learning from Public Comments and Emails |
|---|---|
| Authors: | Asma Baccouche, Sadaf Ahmed, Daniel Sierra-Sosa, Adel Elmaghraby |
| Source: | Information, Vol 11, Iss 6, p 312 (2020) |
| Publisher information: | MDPI AG |
| Publication year: | 2020 |
| Collection: | Directory of Open Access Journals: DOAJ Articles |
| Subjects: | spam text filter, text mining, content-based classification, natural language processing, multi-label classification, LSTM, Information technology, T58.5-58.64 |
| Description: | Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate strategy for filtering messages is difficult to achieve, as these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying preprocessed text. In particular, Long Short-Term Memory (LSTM) networks are among the models that perform well on binary and multi-label text classification problems. This paper presents an approach that merges two different data sources, one intended for spam classification in social media posts and the other for fraud classification in emails. We designed a multi-label LSTM model and trained it on the joint dataset, which includes text with common bigrams extracted from each independent dataset. The experimental results show that the proposed model is capable of identifying malicious text regardless of the source. The LSTM model trained on the merged dataset outperforms the models trained independently on each dataset. |
| Document type: | article in journal/newspaper |
| Language: | English |
| Relation: | https://www.mdpi.com/2078-2489/11/6/312; https://doaj.org/toc/2078-2489; https://doaj.org/article/3bb5b7176d574560be4aface49bc8aa2 |
| DOI: | 10.3390/info11060312 |
| Availability: | https://doi.org/10.3390/info11060312; https://doaj.org/article/3bb5b7176d574560be4aface49bc8aa2 |
| Accession number: | edsbas.A4F9FAF9 |
| Database: | BASE |
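
The description mentions building the joint dataset from text that contains bigrams common to both sources, but this record does not say how those bigrams were extracted. The following is a minimal sketch of one plausible reading, not the authors' procedure. It assumes whitespace tokenization, lowercasing, and a top-`k` frequency cutoff per corpus; the function names, the `k` parameter, the sample texts, and the `"comment"`/`"email"` source tags are all hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return the adjacent lowercase word pairs of a text (whitespace tokens)."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def top_bigrams(corpus, k):
    """Return the set of the k most frequent bigrams across a list of texts."""
    counts = Counter(bg for text in corpus for bg in bigrams(text))
    return {bg for bg, _ in counts.most_common(k)}


def select_joint(comments, emails, k=100):
    """Keep texts from either source that contain a bigram frequent in both.

    Assumption: "common bigrams" means the intersection of each corpus's
    top-k bigrams. The source record does not confirm this definition.
    """
    common = top_bigrams(comments, k) & top_bigrams(emails, k)

    def has_common(text):
        return any(bg in common for bg in bigrams(text))

    joint = [(t, "comment") for t in comments if has_common(t)]
    joint += [(t, "email") for t in emails if has_common(t)]
    return common, joint


# Toy data, invented for illustration only.
comments = ["click here to win a prize", "nice video thanks"]
emails = ["please click here to claim funds", "meeting moved to friday"]

common, joint = select_joint(comments, emails)
print(sorted(common))
print(joint)
```

On the toy data, only the two texts sharing the phrase "click here to" survive the filter. A real pipeline would likely also strip punctuation and stop words before counting, and would then feed the tagged texts to the classifier.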