A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures

Bibliographic Details
Published in: Electronics and Communications in Japan, Vol. 97, no. 10, pp. 1-10
Main Authors: Yamasaki, Takahiro; Tokiwa, Kin-Ichiroh
Format: Journal Article
Language: English
Published: Blackwell Publishing Ltd, 1 October 2014
ISSN: 1942-9533, 1942-9541
Description
Summary: This paper describes a method of readability assessment for Web documents. Readability is the ease with which text can be read and understood. We hypothesize that readability is determined by whether a reader can easily grasp the structure of a text. The impression and complexity of the text are significant factors. We extract features of impression and complexity from the plain text and from additional data such as HTML tags. To compare the effects of the extracted features, we estimate readability rank by machine learning. We conduct fivefold cross-validation for each domain and calculate the root mean squared error (RMSE) between the actual and estimated ranks. The cross-validation experiments confirm that our method achieves high performance, showing that features of impression and complexity are effective for readability assessment.
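The summary evaluates the method with fivefold cross-validation, scored by the RMSE between actual and estimated readability ranks. The Python sketch below illustrates only that evaluation loop, under stated assumptions. The learner (a k-nearest-neighbour regressor), the seeded shuffle, the pooling of all held-out estimates into one RMSE, the toy features, and every function name (`rmse`, `knn_predict`, `cross_validated_rmse`) are hypothetical. The summary does not name the paper's learner or its exact feature set.

```python
import math
import random
from typing import List, Sequence, Tuple

Features = Sequence[float]


def rmse(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean squared error between actual and estimated readability ranks."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(squared / len(actual))


def knn_predict(train_x: List[Features], train_y: List[float],
                x: Features, k: int = 3) -> float:
    """Hypothetical stand-in learner: mean rank of the k nearest training items."""
    by_distance = sorted(zip((math.dist(tx, x) for tx in train_x), train_y))
    nearest = by_distance[:k]
    return sum(rank for _, rank in nearest) / len(nearest)


def cross_validated_rmse(xs: Sequence[Features], ys: Sequence[float],
                         folds: int = 5, seed: int = 0,
                         k: int = 3) -> Tuple[float, List[float]]:
    """k-fold CV: each item is estimated once by a model trained on the other folds."""
    n = len(xs)
    if n != len(ys) or folds < 2 or n < folds:
        raise ValueError("need equal-length xs/ys, folds >= 2 and at least `folds` items")
    order = list(range(n))
    random.Random(seed).shuffle(order)      # fixed seed -> reproducible split
    estimates = [0.0] * n
    for f in range(folds):
        test_idx = order[f::folds]          # every `folds`-th shuffled item goes to fold f
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for i in test_idx:
            estimates[i] = knn_predict(train_x, train_y, xs[i], k)
    # Assumption: pool all held-out estimates into a single RMSE
    # (averaging the per-fold RMSEs would be another reasonable choice).
    return rmse(ys, estimates), estimates


if __name__ == "__main__":
    # Made-up toy data: (mean sentence length, count of list/table tags) -> rank 1..5.
    toy_x = [(12.0, 4.0), (30.0, 0.0), (18.0, 2.0), (25.0, 1.0), (10.0, 6.0),
             (28.0, 0.0), (15.0, 3.0), (22.0, 1.0), (11.0, 5.0), (27.0, 1.0)]
    toy_y = [1.0, 5.0, 3.0, 4.0, 1.0, 5.0, 2.0, 4.0, 1.0, 4.0]
    score, _ = cross_validated_rmse(toy_x, toy_y, folds=5)
    print(f"fivefold CV RMSE on toy data: {score:.3f}")
```

Following the summary, this loop would be run separately for each domain. The features are not normalized here. With real features on different scales, one would normally standardize them before using a distance-based learner.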
Bibliography: ArticleID: ECJ11565
ark:/67375/WNG-G1WDZ2R1-9
istex:5DED0EAB71BC0C2E3702E902E4B2498135D0A3FB
DOI: 10.1002/ecj.11565