HTML

Bibliographic Details
Published in: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, pp. 15-40
Main Authors: Munzert, Simon; Rubba, Christian; Meißner, Peter; Nyhuis, Dominic
Format: Book Chapter
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 28.07.2014
ISBN: 111883481X, 9781118834817
Description
Summary: This chapter introduces the fundamentals of Hyper Text Markup Language (HTML) from the perspective of a web data collector. It shows how to use browsers to display the source code of webpages and inspect specific HTML elements. The chapter develops the logic of markup languages in general and the syntax of HTML as a specific instance of a markup language, and it presents the most important vocabulary in HTML. It also considers parsing, the process of reconstructing the structure and semantics of HTML documents, and how it helps to retrieve information from web documents. Start tags and end tags are also known as opening and closing tags. Tags are always enclosed by < and > to distinguish them from the content. Reserved characters are used for control purposes in a language. The chapter focuses on a subset of tags that are of special interest in the context of web data collection.
DOI: 10.1002/9781118834732.ch2
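
Illustration: The summary's points about tags, reserved characters, and parsing can be made concrete with a small sketch. It is not taken from the chapter, which works in R; it assumes Python's standard library instead, and the class name TagCollector, the helper parse_events, and the sample string are hypothetical names chosen for this example.

```python
from html import escape, unescape
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start tags, end tags, and text content in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag such as <p> or <b>.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for every closing tag such as </p> or </b>.
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Called for text between tags; whitespace-only runs are skipped.
        if data.strip():
            self.events.append(("text", data.strip()))


def parse_events(markup):
    """Parse a markup string and return the recorded events."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# The reserved characters < and > appear in the content as the
# character references &lt; and &gt; so that they are not read as tags.
sample = "<p>Tags use <b>&lt; and &gt;</b> as delimiters.</p>"
events = parse_events(sample)

for kind, value in events:
    print(kind, value)

# Escaping and unescaping reserved characters in plain text.
print(escape("a < b & c"))
print(unescape("&lt;b&gt;"))
```

The parser reports each opening tag, closing tag, and piece of content separately, which is the structural reconstruction that makes targeted information retrieval possible.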