Regular Expressions and Essential String Functions
| Published in: | Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 196-218 |
|---|---|
| Main authors: | , , , |
| Format: | Chapter |
| Language: | English |
| Publication details: | Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014 |
| ISBN: | 111883481X, 9781118834817 |
| Summary: | One of the central tasks in web scraping is to collect the information relevant to the research problem from heaps of textual data. Within the unstructured text we are often interested in systematic information, especially when we want to analyze the data using quantitative methods. The method usually proceeds in three steps: first, gather the unstructured text; second, determine the recurring patterns behind the information being sought; and third, apply these patterns to the unstructured text to extract the information. This chapter focuses on the last two steps. It introduces a powerful tool that helps retrieve data in such settings: regular expressions. The chapter also introduces regular expressions as implemented in R and provides an overview of how string manipulation can be used in practice, presenting commands available in the stringr package. The chapter concludes with some aspects of character encodings, an important concept in web scraping. |
|---|---|
| DOI: | 10.1002/9781118834732.ch8 |

