HTML

Bibliographic details
Published in: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 15-40
Main authors: Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format: Book chapter
Language: English
Publication details: Chichester, UK: John Wiley & Sons, Ltd, 28 July 2014
ISBN: 111883481X, 9781118834817
Description
Summary: This chapter introduces the fundamentals of Hypertext Markup Language (HTML) from the perspective of a web data collector. It shows how to use browsers to display the source code of webpages and inspect specific HTML elements. The chapter develops the logic of markup languages in general and the syntax of HTML as a specific instance of a markup language, and it presents the most important vocabulary in HTML. It also considers parsing, the process of reconstructing the structure and semantics of HTML documents, and how parsing helps to retrieve information from web documents. Start tags and end tags are also known as opening and closing tags. Tags are always enclosed by < and > to distinguish them from the content. Reserved characters, such as < and > themselves, are used for control purposes in a language. The chapter focuses on a subset of tags that are of special interest in the context of web data collection.
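The points in the summary about start and end tags, reserved characters, and parsing can be illustrated with a minimal sketch. It is not taken from the chapter, which works in R; it assumes Python's standard-library `html.parser` instead, and the `TagLogger` class and the `snippet` string are hypothetical names invented for illustration.

```python
from html import escape
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start tags, end tags, and text content in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for an opening tag such as <p> or <b>.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for a closing tag such as </p> or </b>.
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Called for the content between tags; skip whitespace-only runs.
        if data.strip():
            self.events.append(("text", data.strip()))


# Hypothetical snippet: "&lt;" stands in for the reserved character "<",
# which could otherwise be mistaken for the start of a tag.
snippet = "<p>Use <b>3 &lt; 5</b> as a test.</p>"

parser = TagLogger()
parser.feed(snippet)
parser.close()

for event in parser.events:
    print(event)

# Going the other way: escape reserved characters before placing text in HTML.
escaped = escape("3 < 5 & 7 > 2")
print(escaped)
```

The parser reports the nesting of `<p>` and `<b>` as paired start and end events, and it decodes the character reference back to a literal `<` in the text content, which is the kind of structure a web data collector relies on when extracting information.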
DOI: 10.1002/9781118834732.ch2