The Discriminativeness of Internal Syntactic Representations in Automatic Genre Classification
| Published in: | Journal of Quantitative Linguistics, Vol. 28, No. 2, pp. 138–171 |
|---|---|
| Main authors: | Wan, Mingyu; Fang, Alex Chengyu; Huang, Chu-Ren |
| Medium: | Journal Article |
| Language: | English |
| Publication details: | Lisse: Routledge, Taylor & Francis Ltd, 2021-04-03 |
| ISSN: | 0929-6174, 1744-5035 |
| Abstract | Genre characterizes a document differently from subject, which has been the focus of most document retrieval and classification applications. This work hypothesizes a close interaction between syntactic variation and genre differentiation by examining stylistic cues in functional and structural aspects beyond the word level. It engineers 14 syntactic feature sets of internal representations for genre classification with machine learning methods. Experimental results show that fusing structural and lexical features yields significantly better genre classification (F∆max = 9.2%, sig. = 0.001), suggesting that incorporating syntactic cues is effective for genre discrimination. In addition, principal component analysis (PCA) identifies noun phrases (NP) as the principal component (66%) of genre variation, followed by prepositional phrases (PP). In particular, noun phrases with dominant structures of prepositional complements and pronouns functioning as a subject are the most effective for identifying printed texts of high formality, while prepositional phrases are useful for identifying speech of low formality. Error analysis suggests that the phrasal features are particularly useful for classifying four groups of genre classes, i.e. unscripted speech, fiction, news reports, and academic writing, each with distinct structural characteristics; together they show an increasing degree of formality along the continuum of language complexity. |
|---|---|
| Author | Fang, Alex Chengyu; Huang, Chu-Ren; Wan, Mingyu |
| Author_xml | 1. Wan, Mingyu (ORCID 0000-0003-0083-5895; email pku.clara@gmail.com; Peking University); 2. Fang, Alex Chengyu (City University of Hong Kong); 3. Huang, Chu-Ren (ORCID 0000-0002-8526-5520; Peking University) |
| CitedBy_id | crossref_primary_10_1080_23311983_2025_2451513 crossref_primary_10_1109_ACCESS_2021_3056927 |
| ContentType | Journal Article |
| Copyright | © 2019 Informa UK Limited, trading as Taylor & Francis Group |
| DBID | AAYXX CITATION 7T9 8BM |
| DOI | 10.1080/09296174.2019.1663655 |
| DatabaseName | CrossRef; Linguistics and Language Behavior Abstracts (LLBA); ComDisDome |
| DatabaseTitleList | Linguistics and Language Behavior Abstracts (LLBA) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Languages & Literatures |
| EISSN | 1744-5035 |
| EndPage | 171 |
| ExternalDocumentID | 10_1080_09296174_2019_1663655 1663655 |
| Genre | Research Article |
| IEDL.DBID | TFW |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000487866900001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0929-6174 |
| IngestDate | Sat Nov 08 19:10:17 EST 2025 Sat Nov 29 03:58:20 EST 2025 Tue Nov 18 22:08:53 EST 2025 Mon Oct 20 23:48:13 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-8526-5520 0000-0003-0083-5895 |
| PQID | 2509921917 |
| PQPubID | 2038271 |
| PageCount | 34 |
| ParticipantIDs | crossref_primary_10_1080_09296174_2019_1663655 crossref_citationtrail_10_1080_09296174_2019_1663655 informaworld_taylorfrancis_310_1080_09296174_2019_1663655 proquest_journals_2509921917 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-04-03 |
| PublicationDecade | 2020 |
| PublicationPlace | Lisse |
| PublicationTitle | Journal of quantitative linguistics |
| PublicationYear | 2021 |
| Publisher | Routledge; Taylor & Francis Ltd |
| SSID | ssj0017536 |
| Score | 2.199057 |
| SourceID | proquest crossref informaworld |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 138 |
| SubjectTerms | Academic writing; Classification; Computer generated language analysis; Cues; Differentiation; Discrimination; Error analysis; Fiction; Genre; Linguistic complexity; Machine learning; News media; Noun phrases; Phrases; Prepositional phrases; Retrieval; Speech; Speeches; Structural aspects; Syntactic features; Syntactic structures; Writing |
| Title | The Discriminativeness of Internal Syntactic Representations in Automatic Genre Classification |
| URI | https://www.tandfonline.com/doi/abs/10.1080/09296174.2019.1663655, https://www.proquest.com/docview/2509921917 |
| Volume | 28 |
| WOSCitedRecordID | wos000487866900001 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVAWR databaseName: Taylor and Francis Online Journals customDbUrl: eissn: 1744-5035 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0017536 issn: 0929-6174 databaseCode: TFW dateStart: 19940101 isFulltext: true titleUrlDefault: https://www.tandfonline.com providerName: Taylor & Francis |
| linkProvider | Taylor & Francis |
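
The abstract describes fusing lexical features with phrase-level (structural) features for genre classification. Below is a minimal sketch of that general idea, early feature fusion, under assumptions. It presumes a parser has already assigned one label per phrase. The label set `PHRASE_TYPES`, the helper names, the toy sentences and the nearest-centroid cosine classifier are hypothetical illustrations. They are not the authors' 14 feature sets or their machine-learning setup.

```python
import math
from collections import Counter, defaultdict

# Hypothetical phrase-category inventory; the paper's actual feature sets are not reproduced here.
PHRASE_TYPES = ("NP", "PP", "VP", "AJP", "AVP")


def relative_freqs(items, prefix):
    """Map each distinct item to its relative frequency, keyed with a namespacing prefix."""
    counts = Counter(items)
    total = sum(counts.values()) or 1
    return {f"{prefix}{item}": c / total for item, c in counts.items()}


def fused_features(tokens, phrase_labels):
    """Early fusion: one sparse vector holding lexical and phrasal proportions."""
    # Lexical view: bag of lowercased word forms.
    features = relative_freqs((t.lower() for t in tokens), "w=")
    # Structural view: share of each phrase category; labels outside PHRASE_TYPES are not kept.
    phrasal = relative_freqs(phrase_labels, "phr=")
    for p in PHRASE_TYPES:
        features[f"phr={p}"] = phrasal.get(f"phr={p}", 0.0)
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples):
    """examples: list of (feature_dict, genre). Returns the mean feature vector per genre."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for feats, genre in examples:
        counts[genre] += 1
        for k, v in feats.items():
            sums[genre][k] += v
    return {g: {k: v / counts[g] for k, v in s.items()} for g, s in sums.items()}


def classify(feats, centroids):
    """Assign the genre whose centroid is most similar to the document."""
    return max(centroids, key=lambda g: cosine(feats, centroids[g]))


# Hypothetical toy data: one "speech" and one "academic" training document.
train = [
    (fused_features("well I think you know it was fine".split(),
                    ["NP", "VP", "NP", "VP", "AVP"]), "speech"),
    (fused_features("the analysis of the variation in the corpus".split(),
                    ["NP", "PP", "NP", "PP", "NP"]), "academic"),
]
centroids = train_centroids(train)
test_doc = fused_features("the study of the structure of the text".split(),
                          ["NP", "PP", "NP", "PP", "NP"])
print(classify(test_doc, centroids))  # → academic
```

Concatenating the two views into one sparse vector is the simplest form of fusion. In this toy run, the unseen sentence shares only function words with the academic example. Its high NP and PP proportions do most of the work in pulling it toward the `academic` centroid. This loosely echoes, but does not reproduce, the abstract's finding that NP and PP features carry genre signal.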