Parsing Information from Semistructured Documents
| Published in: | Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 359–370 |
|---|---|
| Main Authors: | , , , |
| Format: | Book Chapter |
| Language: | English |
| Published: | Chichester, UK: John Wiley & Sons, Ltd, 28.07.2014 |
| Subjects: | |
| ISBN: | 111883481X, 9781118834817 |
| Summary: | This chapter demonstrates how to construct a parser that transforms plain character data into R data structures. As an example, the chapter works with climate data offered by the Natural Resources Conservation Service of the United States Department of Agriculture, focusing on a set of text files that can be downloaded from a File Transfer Protocol (FTP) server. While the download procedure is simple, the files cannot be read into an R data structure directly: the data in them are laid out in a way that is human-readable but not (yet) understandable by a computer program. The main goal is to describe this structure in a way that a computer can handle. The RCurl package provides functionality for accessing data on FTP servers, and the stringr package offers consistent functions for string processing in R. |
| DOI: | 10.1002/9781118834732.ch13 |

