Parsing Information from Semistructured Documents

Bibliographic Details
Published in: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 359-370
Main Authors: Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format: Book Chapter
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 28.07.2014
ISBN: 111883481X, 9781118834817
Description
Summary: This chapter demonstrates how to construct a parser that transforms pure character data into R data structures. The example uses climate data offered by the Natural Resources Conservation Service at the United States Department of Agriculture. The chapter focuses on a set of text files that can be downloaded from a File Transfer Protocol (FTP) server. While the download procedure is simple, the files cannot be read into an R data structure directly. The data are laid out in a way that is human-readable but not (yet) understandable by a computer program. The main goal is to describe this structure in a way that a computer can handle. RCurl provides functionality to access data from FTP servers, and stringr offers consistent functions for string processing in R.
DOI: 10.1002/9781118834732.ch13