A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures

Bibliographic Details
Published in: Electronics and Communications in Japan, Vol. 97, No. 10, pp. 1-10
Main authors: Yamasaki, Takahiro; Tokiwa, Kin-Ichiroh
Format: Journal Article
Language: English
Published: Blackwell Publishing Ltd, 1 October 2014
ISSN:1942-9533, 1942-9541
Description
Summary: This paper describes a method of readability assessment for Web documents. Readability is the ease with which text can be read and understood. We hypothesize that readability is determined by whether a reader can easily grasp text structures. The impression and complexity of text are significant factors. We extract features of impression and complexity from plain text and from additional data, such as HTML tags. To compare the effect of the extracted features, we estimate readability rank by machine learning. We conduct fivefold cross-validation for each domain and calculate the root mean squared error between the actual rank and the estimated rank. The cross-validation experiments confirm that the performance of our method is high, showing the effectiveness of extracting impression and complexity features for readability assessment.
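The evaluation protocol named in the summary (fivefold cross-validation scored by root mean squared error between actual and estimated rank) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not say which learner or features were used, so the one-feature least-squares line (`fit_line`) is a stand-in assumption, and all function names are hypothetical.

```python
import math
import random


def rmse(actual, estimated):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(squared / len(actual))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line y = a*x + b.

    Assumption: a placeholder for whatever learner the paper actually uses.
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


def cross_validated_rmse(features, ranks, k=5, seed=0):
    """Train on k-1 folds, estimate ranks on the held-out fold, score each fold."""
    scores = []
    for held_out in k_fold_indices(len(ranks), k, seed):
        held = set(held_out)
        train = [i for i in range(len(ranks)) if i not in held]
        a, b = fit_line([features[i] for i in train], [ranks[i] for i in train])
        estimated = [a * features[i] + b for i in held_out]
        scores.append(rmse([ranks[i] for i in held_out], estimated))
    return scores
```

Averaging the returned per-fold scores gives a single figure per domain; running the function once per domain would mirror the "for each domain" wording of the summary.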
Bibliography: ArticleID:ECJ11565
ark:/67375/WNG-G1WDZ2R1-9
istex:5DED0EAB71BC0C2E3702E902E4B2498135D0A3FB
DOI:10.1002/ecj.11565