Performance Optimization of Web Crawlers via Parallel and Asynchronous Processing.
| Title: | Performance Optimization of Web Crawlers via Parallel and Asynchronous Processing. |
|---|---|
| Authors: | Kim, Min-Sun (mskim8033@suwon.ac.kr); Jeon, Sanghoon (shjeon@suwon.ac.kr) |
| Source: | KSII Transactions on Internet & Information Systems. Dec 2025, Vol. 19, Issue 12, pp. 4415-4436 (22 pp.). |
| Subjects: | Parallel processing, Data extraction, Big data, Distributed computing, Intelligent agents, Resource allocation |
| Abstract: | With the rapid development of the Internet, a vast amount of data is being generated, making various types of information easily accessible. Consequently, big data analysis, which involves collecting, storing, processing, and predicting data, has become increasingly important. Web crawlers have gained attention as tools for extracting data from specific web pages. They are utilized in various fields, including price comparison shopping, Search Engine Optimization (SEO), and Rich Site Summary (RSS) aggregation. Different types of web crawlers rely on static or dynamic crawling methods. Notable web crawlers include Scrapy, Selenium, BeautifulSoup, and Playwright, which are designed to effectively handle either static or dynamic web pages. In this paper, we focus on improving the execution performance of these crawlers by applying two tuning techniques: parallel and asynchronous processing. To evaluate their performance, we used four key metrics: Time per Image (TPI), Images per Second (IPS), CPU utilization, and memory consumption. Through controlled experiments across various web page configurations, we demonstrate how each tuning method affects the execution efficiency and system resource usage of different crawler architectures. Our findings highlight the practical trade-offs between performance and resource efficiency, providing useful insights for applying crawler optimization strategies to real-world data collection tasks. [ABSTRACT FROM AUTHOR] |
| Database: | Supplemental Index |
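
The abstract names two tuning techniques (parallel and asynchronous processing) and two throughput metrics (TPI and IPS) without giving formulas. The following is a minimal sketch of how such a comparison might be set up, not the authors' implementation. It assumes TPI is elapsed time divided by the number of images and IPS is its reciprocal; it uses a thread pool as one possible form of parallelism; and it replaces real downloads with a fixed hypothetical delay (`DELAY`). All function names are illustrative. CPU utilization and memory consumption are not measured here.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # hypothetical per-image latency in seconds (assumption)


def fetch_blocking(url: str) -> str:
    """Stand-in for a blocking image download."""
    time.sleep(DELAY)
    return url


async def fetch_async(url: str) -> str:
    """Stand-in for a non-blocking image download."""
    await asyncio.sleep(DELAY)
    return url


def run_sequential(urls):
    # Baseline: one download at a time.
    return [fetch_blocking(u) for u in urls]


def run_parallel(urls, workers=8):
    # Parallel variant: several blocking downloads in flight via threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_blocking, urls))


def run_async(urls):
    # Asynchronous variant: one thread, many coroutines awaiting I/O.
    async def main():
        return await asyncio.gather(*(fetch_async(u) for u in urls))

    return asyncio.run(main())


def measure(runner, urls):
    """Return image count, TPI and IPS under the assumed definitions."""
    start = time.perf_counter()
    images = runner(urls)
    elapsed = time.perf_counter() - start
    n = len(images)
    return {"images": n, "tpi": elapsed / n, "ips": n / elapsed}


if __name__ == "__main__":
    urls = [f"https://example.com/img/{i}.png" for i in range(16)]
    for name, runner in [
        ("sequential", run_sequential),
        ("parallel", run_parallel),
        ("async", run_async),
    ]:
        m = measure(runner, urls)
        print(f"{name:10s} TPI={m['tpi']:.4f} s  IPS={m['ips']:.1f}")
```

Because the simulated work is pure waiting, both tuned variants should show a lower TPI than the baseline; with real crawlers the gain depends on page type, parsing cost, and the resource overhead the abstract refers to.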