Managing Data Projects

Bibliographic details
Published in: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 322-339
Main authors: Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format: Chapter
Language: English
Publication details: Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014
ISBN: 111883481X, 9781118834817
Description
Abstract: Deploying a successful data collection project requires more than knowledge of web technologies. This chapter focuses on the R and operating system functionality required for setting up and maintaining large-scale, automated data collection projects. Additionally, it discusses good practices for organizing and writing code that add robustness and traceability in case of errors. The chapter provides an overview of R functions for interacting with the local file system. It shows methods for iterative code execution, for example to download pages or extract relevant information from multiple web documents. The chapter provides a template for organizing extraction code and making it more robust to failed specifications. It concludes with an overview of system tools that can execute R scripts automatically, a key requirement for building datasets from regularly updated Internet resources.
DOI: 10.1002/9781118834732.ch11