Regular Expressions and Essential String Functions

Bibliographic Details
Published in: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 196-218
Main Authors: Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format: Book Chapter
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 28.07.2014
ISBN: 111883481X, 9781118834817
Description
Summary: One of the central tasks in web scraping is to collect the information relevant to the research problem from heaps of textual data. Within the unstructured text we are often interested in systematic information, especially when we want to analyze the data using quantitative methods. The method usually proceeds in three steps: first, it gathers the unstructured text; second, it determines the recurring patterns behind the information we are looking for; and third, it applies these patterns to the unstructured text to extract the information. This chapter focuses on the last two steps. It introduces a powerful tool that helps retrieve data in such settings: regular expressions. The chapter also introduces regular expressions as implemented in R. It provides an overview of how string manipulation can be used in practice by presenting commands that are available in the stringr package. The chapter concludes with some aspects of character encodings, an important concept in web scraping.
DOI: 10.1002/9781118834732.ch8
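
Illustration: the three-step procedure described in the summary might look roughly like the following in R. This is a minimal sketch and is not taken from the chapter itself. It assumes the stringr package is installed, and the sample text, the variable names, and the phone-number pattern are all hypothetical, chosen only to show a pattern being defined and then applied.

```r
# Assumption: stringr is installed; the text and pattern below are illustrative only.
library(stringr)

# Step 1: the unstructured text (a made-up snippet standing in for scraped content)
raw_text <- "Contact Anna at 555-1234 or Ben at 555-9876 before 2014."

# Step 2: a pattern for the recurring structure: three digits, a hyphen, four digits
phone_pattern <- "[0-9]{3}-[0-9]{4}"

# Step 3: apply the pattern to the text; str_extract_all() returns a list with
# one element per input string, so take the first element for our single string
phones <- str_extract_all(raw_text, phone_pattern)[[1]]
print(phones)
```

The year "2014" is left out because it does not fit the pattern, which is the point of step two: the pattern, not the position in the text, decides what is extracted.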