Keyword weight optimization using gradient strategies in event focused web crawling

Bibliographic details
Published in: Pattern Recognition Letters, Volume 142, pp. 3-10
Main authors: Rajiv, S; Navaneethan, C
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V, 1 February 2021
Elsevier Science Ltd
ISSN: 0167-8655, 1872-7344
Abstract
• A web crawling system that gathers web data about key events is essential.
• This work proposes a new and efficient method for keyword set enhancement.
• The web crawler is a primary unit of a search engine, and optimizing it improves search efficiency.
• Gradient descent is a popular optimization algorithm and is well suited to large-scale data optimization.
• The proposed algorithm focuses on building the optimal keyword set for retrieving relevant documents.

There is currently a need for an integrated event-focused crawling system that gathers web data about key events. During a disaster or any other important event, many users try to find up-to-date information about it. The volume of information is growing rapidly, which makes it challenging for any search engine to retrieve the necessary information properly. A web crawler is a primary unit of such search engines, so optimizing it is a major factor in improving search efficiency. A web-based retrieval system has to cope with the large size and dynamic nature of web information and with continuous document and data updates. Focused crawling relies on automatic webpage classification to decide how a web page should be treated. Although various classifiers are used for this purpose, keyword identification plays an important role in improving event-focused web crawling. This work proposes a novel and efficient method for enhancing the keyword set. Metaheuristic-based optimized keyword weights have been found to be efficient. Here, Term Frequency (TF) based feature extraction and keyword weight optimization using the Stochastic Gradient Descent (SGD) algorithm are employed in event-focused web crawling.
Gradient descent is a popular optimization algorithm, and its stochastic variant works with fitness functions that are differentiable or sub-differentiable and is well suited to large-scale data optimization. The algorithm focuses on making the keyword set optimal: the better the keyword set, the more relevant the returned documents are to users' queries. Support Vector Machine (SVM) classifiers are employed for this purpose. In the reported experiments, the proposed technique outperformed the alternatives, including a Particle Swarm Optimization (PSO) based weight-optimized solution. The proposed SGD weight optimization is better than PSO by 5.8%, showing its ability to examine high volumes of data.
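The pipeline the abstract describes (TF features over a keyword set, keyword weights tuned by SGD, an SVM-style classifier) can be illustrated with a small sketch. This is not the authors' implementation. This record does not give the paper's objective function, update rule, or hyperparameters, so the regularized hinge loss, the learning rate, the toy documents, and all names (`term_frequencies`, `sgd_keyword_weights`, `relevance_score`, `KEYWORDS`, `DOCS`, `LABELS`) are assumptions chosen for illustration only.

```python
import random
from collections import Counter

# Hypothetical toy data: +1 = relevant to the event, -1 = not relevant.
KEYWORDS = ["earthquake", "rescue", "recipe", "football"]
DOCS = [
    "earthquake rescue teams search earthquake rubble",
    "rescue crews respond after earthquake",
    "football match recipe for pasta",
    "easy pasta recipe",
    "football season opens",
]
LABELS = [1, 1, -1, -1, -1]


def term_frequencies(text, keywords):
    """Return the relative frequency of each keyword among the text's tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[k] / total for k in keywords]


def sgd_keyword_weights(features, labels, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit one weight per keyword with SGD on a regularized hinge loss.

    The hinge loss is an assumed stand-in for a linear SVM objective; it is
    sub-differentiable, which is the property the abstract highlights.
    """
    rng = random.Random(seed)
    n_features = len(features[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            violated = margin < 1  # hinge term is active only inside the margin
            for j in range(n_features):
                grad = reg * w[j] - (y * x[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y
    return w, b


def relevance_score(text, keywords, w, b):
    """Score a page by the weighted sum of its keyword term frequencies."""
    x = term_frequencies(text, keywords)
    return sum(wj * xj for wj, xj in zip(w, x)) + b


if __name__ == "__main__":
    X = [term_frequencies(d, KEYWORDS) for d in DOCS]
    weights, bias = sgd_keyword_weights(X, LABELS)
    for keyword, weight in zip(KEYWORDS, weights):
        print(f"{keyword}: {weight:+.3f}")
    print(relevance_score("earthquake rescue update", KEYWORDS, weights, bias))
```

In this sketch, event-related keywords end up with positive weights and unrelated ones with negative weights, so a crawler could rank candidate pages by `relevance_score`. How the paper itself combines the optimized weights with the SVM classifier is not stated in this record.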
Author Navaneethan, C
Rajiv, S
Author_xml – sequence: 1
  givenname: S
  surname: Rajiv
  fullname: Rajiv, S
  email: realrajiv@gmail.com
  organization: Research Scholar, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
– sequence: 2
  givenname: C
  surname: Navaneethan
  fullname: Navaneethan, C
  organization: Associate Professor, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
CitedBy_id crossref_primary_10_1007_s40747_023_01121_4
crossref_primary_10_1155_2022_6705986
crossref_primary_10_3390_a15080272
crossref_primary_10_1007_s40747_022_00707_8
crossref_primary_10_1155_2022_7353151
crossref_primary_10_1007_s10489_022_03180_5
crossref_primary_10_1016_j_eswa_2023_119798
crossref_primary_10_1016_j_mtcomm_2023_105979
crossref_primary_10_3390_sym16111439
Cites_doi 10.1007/s00799-018-0258-6
10.1007/978-3-642-22185-9_6
10.1016/j.asoc.2011.01.037
10.1016/j.asoc.2016.12.028
10.1155/2016/6406901
10.1007/s11280-015-0349-x
10.1007/s00799-016-0207-1
ContentType Journal Article
Copyright 2020
Copyright Elsevier Science Ltd. Feb 2021
Copyright_xml – notice: 2020
– notice: Copyright Elsevier Science Ltd. Feb 2021
DOI 10.1016/j.patrec.2020.12.003
DatabaseName CrossRef
Computer and Information Systems Abstracts
Neurosciences Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Neurosciences Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Technology Research Database
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1872-7344
EndPage 10
ExternalDocumentID 10_1016_j_patrec_2020_12_003
S0167865520304335
ISICitedReferencesCount 10
ISSN 0167-8655
IngestDate Fri Jul 25 22:55:32 EDT 2025
Tue Nov 18 22:29:38 EST 2025
Sat Nov 29 03:58:50 EST 2025
Fri Feb 23 02:44:13 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Web crawler
Stochastic gradient descent (SGD)
Term frequency (TF) based feature extraction
Support vector machine (SVM) classifiers
Web-based retrieval system
Language English
LinkModel OpenURL
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
PQID 2491972885
PQPubID 2047552
PageCount 8
ParticipantIDs proquest_journals_2491972885
crossref_citationtrail_10_1016_j_patrec_2020_12_003
crossref_primary_10_1016_j_patrec_2020_12_003
elsevier_sciencedirect_doi_10_1016_j_patrec_2020_12_003
PublicationCentury 2000
PublicationDate February 2021
2021-02-00
20210201
PublicationDateYYYYMMDD 2021-02-01
PublicationDate_xml – month: 02
  year: 2021
  text: February 2021
PublicationDecade 2020
PublicationPlace Amsterdam
PublicationPlace_xml – name: Amsterdam
PublicationTitle Pattern recognition letters
PublicationYear 2021
Publisher Elsevier B.V
Elsevier Science Ltd
Publisher_xml – name: Elsevier B.V
– name: Elsevier Science Ltd
SourceID proquest
crossref
elsevier
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 3
SubjectTerms Algorithms
Classifiers
Feature extraction
Heuristic methods
Information retrieval
Keywords
Optimization
Particle swarm optimization
Search engines
Smoothness
Stochastic gradient descent (SGD)
Support vector machine (SVM) classifiers
Support vector machines
Term frequency (TF) based feature extraction
Web crawler
Web-based retrieval system
Websites
Weight
Title Keyword weight optimization using gradient strategies in event focused web crawling
URI https://dx.doi.org/10.1016/j.patrec.2020.12.003
https://www.proquest.com/docview/2491972885
Volume 142