Keyword weight optimization using gradient strategies in event focused web crawling
• A web crawling system for obtaining the set of web data regarding key events is essential. • This work has proposed a new and efficient method for such keyword set enhancement. • The web crawler is a primary unit of such search engines and its optimization improves the efficiency of search. • Gradient de...
| Published in: | Pattern Recognition Letters, Vol. 142, pp. 3-10 |
|---|---|
| Main authors: | Rajiv, S; Navaneethan, C |
| Format: | Journal Article |
| Language: | English |
| Published: | Amsterdam, Elsevier B.V, 01.02.2021, Elsevier Science Ltd |
| Subjects: | |
| ISSN: | 0167-8655, 1872-7344 |
| Online access: | Get full text |
| Abstract | • A web crawling system for obtaining web data about key events is essential. • This work proposes a new and efficient method for keyword set enhancement. • The web crawler is a primary unit of a search engine, and optimizing it improves search efficiency. • Gradient descent is a popular optimization algorithm and is well suited to large-scale data optimization. • The proposed algorithm focuses on building the optimal keyword set to retrieve relevant documents.
There is a need for an integrated event focused crawling system that obtains web data about key events. During a disaster or any other important event, many users try to find up-to-date information about it. Information on the web is growing rapidly, and it can be very challenging for any search engine to retrieve the necessary information properly. The web crawler is a primary unit of such search engines, so optimizing it is a major aspect of improving search efficiency. Web-based retrieval systems must cope with the large size and dynamic nature of web information, where documents and data are updated continuously. Focused crawling relies on automatic webpage classification to assess each web page. Although various classifiers are used for this purpose, the identification of keywords plays an important role in improving event focused web crawling. This work proposes a novel and efficient method for keyword set enhancement. Metaheuristic-based optimized keyword weights have been found to be efficient. Here, Term Frequency (TF) based feature extraction and keyword weight optimization using the Stochastic Gradient Descent (SGD) algorithm are employed in event focused web crawling. Gradient descent is a popular optimization algorithm; its stochastic variant handles fitness functions that are differentiable or sub-differentiable and is well suited to large-scale data optimization. The algorithm focuses on making the keyword set optimal: the better the keyword set, the more relevant the returned documents are to users' queries. For this purpose, Support Vector Machine (SVM) classifiers are employed.
Experimental results show that the proposed technique outperforms the others, including a Particle Swarm Optimization (PSO) based weight-optimized solution. The proposed SGD weight optimization is better than PSO by 5.8%, showing its ability to handle high volumes of data. |
|---|---|
| Author | Navaneethan, C; Rajiv, S |
| Author_xml | 1. Rajiv, S (realrajiv@gmail.com), Research Scholar, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India; 2. Navaneethan, C, Associate Professor, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India |
| CitedBy_id | crossref_primary_10_1007_s40747_023_01121_4 crossref_primary_10_1155_2022_6705986 crossref_primary_10_3390_a15080272 crossref_primary_10_1007_s40747_022_00707_8 crossref_primary_10_1155_2022_7353151 crossref_primary_10_1007_s10489_022_03180_5 crossref_primary_10_1016_j_eswa_2023_119798 crossref_primary_10_1016_j_mtcomm_2023_105979 crossref_primary_10_3390_sym16111439 |
| Cites_doi | 10.1007/s00799-018-0258-6 10.1007/978-3-642-22185-9_6 10.1016/j.asoc.2011.01.037 10.1016/j.asoc.2016.12.028 10.1155/2016/6406901 10.1007/s11280-015-0349-x 10.1007/s00799-016-0207-1 |
| ContentType | Journal Article |
| Copyright | 2020 Copyright Elsevier Science Ltd. Feb 2021 |
| Copyright_xml | – notice: 2020 – notice: Copyright Elsevier Science Ltd. Feb 2021 |
| DBID | AAYXX CITATION 7SC 7TK 8FD JQ2 L7M L~C L~D |
| DOI | 10.1016/j.patrec.2020.12.003 |
| DatabaseName | CrossRef Computer and Information Systems Abstracts Neurosciences Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest Computer Science Collection Computer and Information Systems Abstracts Neurosciences Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1872-7344 |
| EndPage | 10 |
| ExternalDocumentID | 10_1016_j_patrec_2020_12_003 S0167865520304335 |
| ID | FETCH-LOGICAL-c334t-e2f155d692da23b93f1f126ce33cd3cc7caae1b14847abb3bfc4fbfc8a1d67483 |
| ISICitedReferencesCount | 10 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000613175200002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0167-8655 |
| IngestDate | Fri Jul 25 22:55:32 EDT 2025 Tue Nov 18 22:29:38 EST 2025 Sat Nov 29 03:58:50 EST 2025 Fri Feb 23 02:44:13 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Web crawler; Stochastic gradient descent (SGD); Term frequency (TF) based feature extraction; Support vector machine (SVM) classifiers; Web-based retrieval system |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c334t-e2f155d692da23b93f1f126ce33cd3cc7caae1b14847abb3bfc4fbfc8a1d67483 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| PQID | 2491972885 |
| PQPubID | 2047552 |
| PageCount | 8 |
| ParticipantIDs | proquest_journals_2491972885 crossref_citationtrail_10_1016_j_patrec_2020_12_003 crossref_primary_10_1016_j_patrec_2020_12_003 elsevier_sciencedirect_doi_10_1016_j_patrec_2020_12_003 |
| PublicationCentury | 2000 |
| PublicationDate | February 2021 2021-02-00 20210201 |
| PublicationDateYYYYMMDD | 2021-02-01 |
| PublicationDate_xml | – month: 02 year: 2021 text: February 2021 |
| PublicationDecade | 2020 |
| PublicationPlace | Amsterdam |
| PublicationPlace_xml | – name: Amsterdam |
| PublicationTitle | Pattern recognition letters |
| PublicationYear | 2021 |
| Publisher | Elsevier B.V; Elsevier Science Ltd |
| Publisher_xml | – name: Elsevier B.V – name: Elsevier Science Ltd |
| SSID | ssj0006398 |
| Score | 2.384672 |
| SourceID | proquest crossref elsevier |
| SourceType | Aggregation Database; Enrichment Source; Index Database; Publisher |
| StartPage | 3 |
| SubjectTerms | Algorithms; Classifiers; Feature extraction; Heuristic methods; Information retrieval; Keywords; Optimization; Particle swarm optimization; Search engines; Smoothness; Stochastic gradient descent (SGD); Support vector machine (SVM) classifiers; Support vector machines; Term frequency (TF) based feature extraction; Web crawler; Web-based retrieval system; Websites; Weight |
| Title | Keyword weight optimization using gradient strategies in event focused web crawling |
| URI | https://dx.doi.org/10.1016/j.patrec.2020.12.003 https://www.proquest.com/docview/2491972885 |
| Volume | 142 |
| WOSCitedRecordID | wos000613175200002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1872-7344 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0006398 issn: 0167-8655 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1NTxsxELUi4EAPlI9WhQLyobcoiLU38e4RVVQtoAgJinKzbK-3SpRuULJJ4N8z47WzAdQCBy5WtIotb-Z5ZjyZeUPIN5OriOkMT1qGpNoAY2Ux7S_XOo4NN9ZVyN1ciG436fXSy0bjItTCzIaiKJK7u_T2XUUNz0DYWDr7BnEvFoUH8BmEDiOIHcZXCf7c3s-xBnDugp7NEeiEv77Ysjl1kYE_Y5fnVTYnZSCKcMwhM0fUNDLTCeakW900YzUfBtvmPdhLR8iJRTA-8wiWHbqaoDprXg36s0dh1a4Ch91aDNPXkVkfa2BRSE-uw4-gVrGU9ZH-rOixvAbkS6a0Slh9pqSreMHgCKP9Fmkk2bELyR7z2iiFP-Kf2KpFBmFIThvIahWJq8iISUf9uspEOwU1vXry67R3trDM4I0lgesd3yKUUrp8v-e7-Zer8sRoO0_kepNs-CsEPalEv0UattgmH0N7Duq19Tb5sMQ1uUOuPC5ohQu6jAvqcEEDLmiNC9ovqMMF9biA6ZoGXHwiv3-cXn__2fIdNVqG87hsWZaD_5h1UpYpxnXK8yiPWMdYzk3GjRFGKRtpuCLHQmnNdW7iHIZERRl2peGfyUoxKuwXQjOkgcpi20nBCReR1R3DE9UWPLZKwJ1-l_Dw40nj6eax68lQ_k90u6S1mHVb0a288H0R5CK9y1i5ghLA9sLM_SBG6U_vRMK-sQ1fkrT33riRr2S9PjD7ZKUcT-0BWTOzsj8ZH3ogPgCoSp1w |
| linkProvider | Elsevier |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Keyword+weight+optimization+using+gradient+strategies+in+event+focused+web+crawling&rft.jtitle=Pattern+recognition+letters&rft.au=Rajiv%2C+S&rft.au=Navaneethan%2C+C&rft.date=2021-02-01&rft.issn=0167-8655&rft.volume=142&rft.spage=3&rft.epage=10&rft_id=info:doi/10.1016%2Fj.patrec.2020.12.003&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_patrec_2020_12_003 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0167-8655&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0167-8655&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0167-8655&client=summon |
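
The abstract in this record names three ingredients: Term Frequency (TF) features over a keyword set, keyword weights tuned by Stochastic Gradient Descent (SGD), and an SVM-style classifier. The record does not give the authors' actual formulation, so the following is only a minimal sketch of how such a pipeline might look, assuming a linear model trained by SGD on the L2-regularised hinge loss (the linear SVM objective). The function names, the toy pages, the keyword list, and the hyperparameters are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def term_frequencies(text, keywords):
    """Relative frequency of each keyword among the whitespace tokens of text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty pages
    return [counts[k] / total for k in keywords]


def sgd_keyword_weights(samples, labels, epochs=200, lr=0.5, reg=0.001, seed=0):
    """Learn one weight per keyword by SGD on the L2-regularised hinge loss.

    labels are +1 (page is about the event) or -1 (page is off-topic).
    This is an assumed objective, not necessarily the one used in the paper.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            for j in range(n_features):
                grad = reg * weights[j]      # gradient of the L2 penalty
                if margin < 1:
                    grad -= y * x[j]         # hinge sub-gradient, active inside the margin
                weights[j] -= lr * grad
            if margin < 1:
                bias += lr * y
    return weights, bias


def relevance(text, keywords, weights, bias):
    """Signed score of a page; larger means more event-relevant."""
    x = term_frequencies(text, keywords)
    return sum(w * v for w, v in zip(weights, x)) + bias


# Hypothetical toy data: two on-event pages and two off-topic pages.
keywords = ["earthquake", "rescue", "recipe"]
pages = [
    "earthquake rescue teams search rubble after earthquake",
    "rescue workers reach earthquake survivors",
    "easy pasta recipe for dinner",
    "chocolate cake recipe with butter",
]
labels = [1, 1, -1, -1]

features = [term_frequencies(p, keywords) for p in pages]
weights, bias = sgd_keyword_weights(features, labels)
print(dict(zip(keywords, weights)))
```

In a focused crawler, a score such as `relevance(...)` could be used to rank fetched pages or to prioritise the outgoing links of high-scoring pages; that use is a design assumption here, since the record does not describe the crawler's link-selection policy.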