Scrimmo: A Real-Time Web Scraper Monitoring the Belgian Real Estate Market

Bibliographic details
Published in: 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 335-338
Main authors: Barzin, Felix; Yernaux, Gonzague; Vanhoof, Wim
Format: Conference paper
Language: English, Japanese
Published: IEEE, 26 October 2023
Abstract Web scraping (or Web crawling), a technique for automated data extraction from websites, has emerged as a valuable tool for scientific research and data analysis. This paper presents a comprehensive exploration of Web scraping, its methodologies and challenges. The discussion revolves around a concrete application, namely the automatic extraction of data concerning the Belgian real estate market. We introduce a real-time Web scraper called SCRIMMO, tailored to collect data from websites containing real estate classified ads. The tool is developed in a continuous iterative process and based on an innovative cloud architecture. The paper also briefly addresses the ethical aspects of Web scraping. By integrating insights from previous research and ethical guidelines, this study provides researchers with a comprehensive understanding of Web scraping and its potential benefits, while promoting responsible and ethical practices in data collection and analysis.
Author Barzin, Felix
Yernaux, Gonzague
Vanhoof, Wim
Author_xml – sequence: 1
  givenname: Felix
  surname: Barzin
  fullname: Barzin, Felix
  organization: University of Namur, Belgium
– sequence: 2
  givenname: Gonzague
  surname: Yernaux
  fullname: Yernaux, Gonzague
  email: gonzague.yernaux@unamur.be
  organization: University of Namur, Belgium
– sequence: 3
  givenname: Wim
  surname: Vanhoof
  fullname: Vanhoof, Wim
  organization: University of Namur, Belgium
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/WI-IAT59888.2023.00054
EISBN 9798350309188
EndPage 338
ExternalDocumentID 10350130
Genre orig-research
ISICitedReferencesCount 0
IngestDate Wed Jan 10 09:27:55 EST 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
Japanese
OpenAccessLink https://cir.nii.ac.jp/crid/1873116917767441920
PageCount 4
ParticipantIDs ieee_primary_10350130
PublicationDate 2023-10-26
PublicationTitle 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
PublicationTitleAbbrev WI-IAT
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 335
SubjectTerms Data analysis
Data collection
Data extraction
Data gathering
Data mining
Ethics
Intelligent agents
Iterative methods
Real-time systems
Web crawling
Web scraping
Title Scrimmo: A Real-Time Web Scraper Monitoring the Belgian Real Estate Market
URI https://ieeexplore.ieee.org/document/10350130