Malicious Text Identification: Deep Learning from Public Comments and Emails

Detailed bibliography
Title: Malicious Text Identification: Deep Learning from Public Comments and Emails
Authors: Asma Baccouche, Sadaf Ahmed, Daniel Sierra-Sosa, Adel Elmaghraby
Source: Information, Vol 11, Iss 6, p 312 (2020)
Publisher Information: MDPI AG
Publication Year: 2020
Collection: Directory of Open Access Journals: DOAJ Articles
Subject Terms: spam text filter, text mining, content-based classification, natural language processing, multi-label classification, LSTM, Information technology, T58.5-58.64
Description: Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate strategy for filtering messages is difficult to achieve, as these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying text after it has been preprocessed. In particular, Long Short-Term Memory (LSTM) networks are among the models that perform well on binary and multi-label text classification problems. In this paper, an approach merging two different data sources, one intended for Spam in social media posts and the other for Fraud classification in emails, is presented. We designed a multi-label LSTM model and trained it on the joint datasets including text with common bigrams, extracted from each independent dataset. The experimental results show that our proposed model is capable of identifying malicious text regardless of the source. The LSTM model trained with the merged dataset outperforms the models trained independently on each dataset.
Document Type: article in journal/newspaper
Language: English
Relation: https://www.mdpi.com/2078-2489/11/6/312; https://doaj.org/toc/2078-2489; https://doaj.org/article/3bb5b7176d574560be4aface49bc8aa2
DOI: 10.3390/info11060312
Availability: https://doi.org/10.3390/info11060312
https://doaj.org/article/3bb5b7176d574560be4aface49bc8aa2
Accession Number: edsbas.A4F9FAF9
Database: BASE