Collecting data on textiles from the internet using web crawling and web scraping tools

Bibliographic Details
Published in: Forensic science international, Vol. 322, p. 110753
Main Authors: Muehlethaler, Cyril; Albert, René
Format: Journal Article
Language: English
Published: Ireland Elsevier B.V 01.05.2021
Elsevier Limited
ISSN: 0379-0738, 1872-6283
Description
Summary: Fibre population surveys are a necessary part of forensic fibre examination. They provide valuable information as to which fibres are the most popular and help estimate the likelihood of observing similar properties in a fibre unrelated to the event. The time needed to carry out these types of studies is, however, a major obstacle to wider use. With the advent of e-commerce and digital computation, collecting information from digital sources and structuring it in a convenient way may provide meaningful information on fibre populations. Data collection has become more affordable for researchers, who can now devote most of their time to extracting meaningful information from the structured data. In this article, we used Scrapy and a Kibana/Elasticsearch interface to crawl and scrape a major online clothes retailer. In less than 24 h we extracted 68 text-based fields describing a total of 24,701 clothes to help provide precise estimations of fibre types and color frequencies. The data show that cotton, polyester, viscose and elastane are the 4 main types of fibres used in the textile industry. Elastane, while very popular in garments, rarely accounts for more than 10% of the mass, whereas cotton accounts for up to 80% of content. The most common colors are white, black, and blue, with important dependencies on the fibre type. Through further statistics and examples we demonstrate that web scraping techniques have the potential to provide near real-time population studies that can greatly benefit forensic practitioners.
• We used Scrapy and a Kibana/Elasticsearch interface to crawl and scrape a major online clothes retailer.
• In less than 24 h, 68 text-based fields describing a total of 24,701 clothes were extracted.
• Cotton, polyester, viscose and elastane are the 4 main types of fibers used in the textile industry.
• The most common colors are white, black, and blue, with important dependencies on the fiber type.
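
The summary describes turning scraped, text-based product fields into fibre-type and color frequencies. A minimal sketch of that kind of post-processing is given below, using only the Python standard library. It is not the authors' pipeline: the record layout, the field names (`color`, `composition`), the function names, and the sample values are all assumptions invented for illustration, and real retailer data would need more robust parsing.

```python
import re
from collections import Counter

# Hypothetical records, invented for illustration only;
# they are NOT data from the study.
SAMPLE_ITEMS = [
    {"color": "white", "composition": "95% cotton, 5% elastane"},
    {"color": "black", "composition": "100% polyester"},
    {"color": "blue", "composition": "80% cotton, 20% polyester"},
    {"color": "black", "composition": "70% viscose, 25% polyester, 5% elastane"},
]

# Matches "<number>% <fibre name>", e.g. "95% cotton".
COMPOSITION_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*([a-z]+)", re.IGNORECASE)


def parse_composition(text):
    """Return {fibre: percentage} parsed from a free-text composition field."""
    return {fibre.lower(): float(pct) for pct, fibre in COMPOSITION_RE.findall(text)}


def fibre_presence(items):
    """Count how many garments contain each fibre type, at any percentage."""
    counts = Counter()
    for item in items:
        counts.update(parse_composition(item["composition"]).keys())
    return counts


def color_by_fibre(items):
    """Count garment colors per fibre type, to expose color/fibre dependencies."""
    table = {}
    for item in items:
        for fibre in parse_composition(item["composition"]):
            table.setdefault(fibre, Counter())[item["color"]] += 1
    return table


if __name__ == "__main__":
    total = len(SAMPLE_ITEMS)
    for fibre, n in fibre_presence(SAMPLE_ITEMS).most_common():
        print(f"{fibre}: {n}/{total} garments")
```

Counting presence separately from percentage matters here because, as the summary notes, a fibre such as elastane can appear in many garments while contributing only a small share of the mass of each.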
DOI:10.1016/j.forsciint.2021.110753