Scrimmo: A Real-Time Web Scraper Monitoring the Belgian Real Estate Market
| Published in: | 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 335-338 |
|---|---|
| Main authors: | Felix Barzin, Gonzague Yernaux, Wim Vanhoof |
| Format: | Conference paper |
| Language: | English, Japanese |
| Published: | IEEE, 26 October 2023 |
| Abstract | Web scraping (or Web crawling), a technique for automated data extraction from websites, has emerged as a valuable tool for scientific research and data analysis. This paper presents a comprehensive exploration of Web scraping, its methodologies and challenges. The discussion revolves around a concrete application, namely the automatic extraction of data concerning the Belgian real estate market. We introduce a real-time Web scraper called SCRIMMO and tailored to collect data from websites containing real estate classified ads. The tool is developed in a continuous iterative process and based on an innovative cloud architecture. The paper also briefly addresses the ethical aspects of Web scraping. By integrating insights from previous research and ethical guidelines, this study provides researchers with a comprehensive understanding of Web scraping and its potential benefits, while promoting responsible and ethical practices in data collection and analysis. |
|---|---|
| Author | Barzin, Felix; Yernaux, Gonzague; Vanhoof, Wim |
| Author_xml | – sequence: 1; givenname: Felix; surname: Barzin; fullname: Barzin, Felix; organization: University of Namur, Belgium – sequence: 2; givenname: Gonzague; surname: Yernaux; fullname: Yernaux, Gonzague; email: gonzague.yernaux@unamur.be; organization: University of Namur, Belgium – sequence: 3; givenname: Wim; surname: Vanhoof; fullname: Vanhoof, Wim; organization: University of Namur, Belgium |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/WI-IAT59888.2023.00054 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1; dbid: RIE; name: IEL; url: https://ieeexplore.ieee.org/; sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350309188 |
| EndPage | 338 |
| ExternalDocumentID | 10350130 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL CBEJK RIE RIL |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| IngestDate | Wed Jan 10 09:27:55 EST 2024 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English, Japanese |
| LinkModel | DirectLink |
| OpenAccessLink | https://cir.nii.ac.jp/crid/1873116917767441920 |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_10350130 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-10-26 |
| PublicationDateYYYYMMDD | 2023-10-26 |
| PublicationDate_xml | – month: 10; year: 2023; text: 2023-10-26; day: 26 |
| PublicationDecade | 2020 |
| PublicationTitle | 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) |
| PublicationTitleAbbrev | WI-IAT |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.8485909 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 335 |
| SubjectTerms | Data analysis; Data collection; Data extraction; Data gathering; Data mining; Ethics; Intelligent agents; Iterative methods; Real-time systems; Web crawling; Web scraping |
| Title | Scrimmo: A Real-Time Web Scraper Monitoring the Belgian Real Estate Market |
| URI | https://ieeexplore.ieee.org/document/10350130 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |