A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures
| Published in: | Electronics and Communications in Japan, Vol. 97, No. 10, pp. 1-10 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Blackwell Publishing Ltd, 1 October 2014 |
| Subjects: | |
| ISSN: | 1942-9533, 1942-9541 |
| Summary: | This paper describes a method of readability assessment for Web documents. Readability is the ease with which text can be read and understood. We hypothesize that readability is determined by whether a reader can easily grasp text structures. The impression and complexity of text are significant factors. We extract features of impression and complexity from plain text and additional data, such as HTML tags. To compare the effect of the extracted features, we estimate readability rank by machine learning. We conduct fivefold cross validation for each domain and calculate the root mean squared error between the actual rank and the estimated rank. The cross validation experiments confirm that the performance of our method is high, showing the effectiveness of extracting features of impression and complexity for readability assessment. |
|---|---|
| Bibliography: | ArticleID:ECJ11565 ark:/67375/WNG-G1WDZ2R1-9 istex:5DED0EAB71BC0C2E3702E902E4B2498135D0A3FB |
| DOI: | 10.1002/ecj.11565 |
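
The evaluation protocol named in the summary (fivefold cross validation, scored by the root mean squared error between actual and estimated rank) can be illustrated with a minimal sketch. Everything beyond that protocol is an assumption made for illustration: the record does not say which learner or feature set the authors used, so a one-feature least-squares line stands in for the learner, and the names `rmse`, `fit_line`, `five_fold_rmse` and the toy data are hypothetical. The paper runs the procedure once per domain; the sketch shows a single run.

```python
import math
import random


def rmse(actual, estimated):
    """Root mean squared error between actual and estimated ranks."""
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept.

    Assumption: a stand-in for whatever learner the paper actually uses.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant feature: fall back to the mean rank
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def five_fold_rmse(features, ranks, k=5, seed=0):
    """Average RMSE over k folds; each document is held out exactly once."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line(
            [features[i] for i in train], [ranks[i] for i in train]
        )
        estimated = [slope * features[i] + intercept for i in held_out]
        scores.append(rmse([ranks[i] for i in held_out], estimated))
    return sum(scores) / k


# Hypothetical toy data: one feature value per document and a made-up rank.
toy_features = [float(i) for i in range(20)]
toy_ranks = [0.5 * x + 1.0 for x in toy_features]
print(five_fold_rmse(toy_features, toy_ranks))
```

Because the toy ranks are an exact linear function of the feature, the printed error is close to zero; with real documents, a lower average RMSE for one feature set than another is the kind of comparison the summary describes.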