Scraping the Web

Bibliographic details
Published in:	Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 219-294
Main authors:	Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format:	Book chapter
Language:	English
Published:	Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014
Subjects:	JavaScript‐enriched pages; Selenium; web scraping
ISBN:	111883481X, 9781118834817

Description
Summary:	This chapter addresses three main aspects of web scraping with R. The first is how to retrieve data from the Web in different scenarios. The chapter looks at the stage where one tries to get resources from servers into R, and it provides an introduction to Selenium, a browser automation tool that can be used to gather content from JavaScript‐enriched pages. Second, the chapter turns to strategies for extracting information from the gathered resources. It sheds light on these techniques from a more practical perspective, providing a stylized sketch of the strategies and discussing their advantages and disadvantages. Third, the chapter addresses an important but sometimes disregarded aspect of web scraping: how to behave nicely on the Web as a web scraper. The chapter concludes with a glimpse of ongoing efforts to give R more interfaces to web data and of lighthouses of web scraping more generally.
DOI:	10.1002/9781118834732.ch9