On the classification of text documents taking into account their structural features

Bibliographic Details
Published in: Journal of Computer & Systems Sciences International, Vol. 55, no. 3, pp. 394-403
Main Authors: Gulin, V. V., Frolov, A. B.
Format: Journal Article
Language: English
Published: Moscow: Pleiades Publishing, 01.05.2016
Springer Nature B.V.
ISSN: 1064-2307, 1555-6530
Abstract: A modification of the conventional bag of words model that can take into account the structural features of text documents in their classification (categorization) using machine learning techniques is studied. It is proposed to describe these features by relations on the set of certain lexemes and use the relation names, along with the lexeme names, as features. This is a distinction from the conventional model in which only unary relations are used. The effectiveness of the proposed machine learning techniques is analyzed using computer experiments on the class of the Reuters-21578 collection with eight known classifiers. It is shown that it is reasonable to apply the proposed models to classify documents using simple classifiers.
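
The idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which relations are used, so the binary relation below (named `near`, meaning "lexeme b follows lexeme a within `window` tokens") and the helper names `tokenize`, `unary_features`, `relation_features`, and `extended_features` are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def unary_features(tokens):
    """Conventional bag of words: count each lexeme name on its own."""
    return Counter(tokens)


def relation_features(tokens, lexemes, window=2):
    """Count occurrences of a hypothetical binary relation 'near'.

    near(a, b) holds when lexeme b follows lexeme a within `window` tokens.
    Only lexemes from the chosen set `lexemes` take part in the relation.
    """
    feats = Counter()
    for i, a in enumerate(tokens):
        if a not in lexemes:
            continue
        for b in tokens[i + 1 : i + 1 + window]:
            if b in lexemes:
                feats[f"near({a},{b})"] += 1
    return feats


def extended_features(text, lexemes, window=2):
    """Combine lexeme-name features with relation-name features."""
    tokens = tokenize(text)
    feats = unary_features(tokens)
    feats.update(relation_features(tokens, lexemes, window))  # adds counts
    return feats


# Two documents with the same words in a different order.
doc_a = "oil prices rise"
doc_b = "rise prices oil"
lexemes = {"oil", "prices", "rise"}

# The conventional model cannot tell them apart; the extended one can.
print(unary_features(tokenize(doc_a)) == unary_features(tokenize(doc_b)))
print(extended_features(doc_a, lexemes) == extended_features(doc_b, lexemes))
```

Under these assumptions, the two documents share the same unary features but differ in their relation-name features, which is the kind of structural information the conventional model discards. The resulting feature dictionaries could then be vectorized and passed to any standard classifier.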
Author affiliations:
Gulin, V. V. (gulin.vladimir@gmail.com), Moscow Power Engineering Institute (National Research University)
Frolov, A. B., Moscow Power Engineering Institute (National Research University)
Copyright: Pleiades Publishing, Ltd. 2016
DOI: 10.1134/S1064230716030102
Discipline: Engineering; Sciences (General); Computer Science
Peer reviewed: yes
Page count: 10
Publication place: Moscow; Silver Spring
Abbreviated journal title: J. Comput. Syst. Sci. Int.
Subject terms:
Artificial intelligence
Classification
Classifiers
Collection
Computer science
Computer simulation
Control
Dictionaries
Documents
Engineering
Machine learning
Mathematical analysis
Mathematical models
Mechatronics
Names
Pattern Recognition and Image Processing
Random variables
Robotics
Studies
Text categorization
Texts