Regular Expressions and Essential String Functions
| Published in: | Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining pp. 196 - 218 |
|---|---|
| Main Authors: | , , , |
| Format: | Book Chapter |
| Language: | English |
| Published: | Chichester, UK: John Wiley & Sons, Ltd, 28.07.2014 |
| Subjects: | |
| ISBN: | 111883481X, 9781118834817 |
| Summary: | One of the central tasks in web scraping is to collect the information relevant to the research problem from heaps of textual data. Within the unstructured text we are often interested in systematic information, especially when we want to analyze the data using quantitative methods. The method usually proceeds in three steps: first, it gathers the unstructured text; second, it determines the recurring patterns behind the information we are looking for; and third, it applies these patterns to the unstructured text to extract the information. This chapter focuses on the last two steps. It introduces a powerful tool that helps retrieve data in such settings: regular expressions. The chapter also introduces regular expressions as implemented in R and provides an overview of how string manipulation can be used in practice, presenting commands that are available in the stringr package. The chapter concludes with some aspects of character encodings, an important concept in web scraping. |
| DOI: | 10.1002/9781118834732.ch8 |

