Regular Expressions and Essential String Functions

Bibliographic Details
Published in: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 196–218
Authors: Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format: Book chapter
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014
ISBN: 111883481X, 9781118834817
Description
Abstract: One of the central tasks in web scraping is to collect the information relevant to the research problem from heaps of textual data. Within the unstructured text we are often interested in systematic information, especially when we want to analyze the data using quantitative methods. The method usually proceeds in three steps: first, gather the unstructured text; second, determine the recurring patterns behind the information we are looking for; and third, apply these patterns to the unstructured text to extract the information. This chapter focuses on the last two steps. It introduces a powerful tool that helps retrieve data in such settings: regular expressions. The chapter presents regular expressions as implemented in R and provides an overview of how string manipulation can be used in practice, using the commands available in the stringr package. It concludes with some aspects of character encodings, an important concept in web scraping.
DOI: 10.1002/9781118834732.ch8