Managing Data Projects
| Published in: | Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 322-339 |
|---|---|
| Main authors: | , , , |
| Format: | Chapter |
| Language: | English |
| Publication details: | Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014 |
| Subject: | |
| ISBN: | 111883481X, 9781118834817 |
| Online access: | Get full text |
| Summary: | Deploying a successful data collection project requires more than knowledge of web technologies. This chapter focuses on the R and operating system functionality required for setting up and maintaining large-scale, automated data collection projects. It also discusses good practices for organizing and writing code that add robustness and traceability in case of errors. The chapter provides an overview of R functions for interacting with the local file system. It shows methods for iterative code execution for downloading pages or extracting relevant information from multiple web documents. The chapter provides a template for organizing extraction code and making it more robust to failed specification. It concludes with an overview of system tools that can execute R scripts automatically, which is a key requirement for building datasets from regularly updated Internet resources. |
|---|---|
| DOI: | 10.1002/9781118834732.ch11 |

