Scraping the Web
| Published in: | Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 219-294 |
|---|---|
| Main authors: | , , , |
| Format: | Chapter |
| Language: | English |
| Publication details: | Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014 |
| ISBN: | 111883481X, 9781118834817 |
| Summary: | This chapter addresses three main aspects of web scraping with R. The first is how to retrieve data from the Web in different scenarios, that is, the stage at which resources are fetched from servers into R. This part includes an introduction to Selenium, a browser automation tool that can be used to gather content from JavaScript-enriched pages. The second is the set of strategies for extracting information from the gathered resources; the chapter treats these techniques from a practical perspective, providing a stylized sketch of each strategy and discussing its advantages and disadvantages. The third is an important but sometimes disregarded aspect of web scraping: how to behave nicely on the Web as a web scraper. The chapter concludes with a glimpse of ongoing efforts to give R more interfaces to web data and of lighthouses of web scraping more generally. |
| DOI: | 10.1002/9781118834732.ch9 |

