Language-Independent Text-Line Extraction Algorithm for Handwritten Documents
| Published in: | IEEE signal processing letters Vol. 21; no. 9; pp. 1115 - 1119 |
|---|---|
| Main Authors: | Ryu, Jewoong; Koo, Hyung Il; Cho, Nam Ik |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2014 |
| ISSN: | 1070-9908, 1558-2361 |
| Abstract | Text-line extraction in handwritten documents is an important step for document image understanding, and a number of algorithms have been proposed to address this problem. However, most of them exploit features of specific languages and work only for a given language. To overcome this limitation, we develop a language-independent text-line extraction algorithm. Our method is based on connected components (CCs); however, unlike conventional methods, we analyze strokes and partition under-segmented CCs into normalized ones. Owing to this normalization, the proposed method is able to estimate the states of CCs for a range of different languages and writing styles. From the estimated states, we build a cost function whose minimization yields text-lines. Experimental results show that the proposed method yields state-of-the-art performance on Latin-based and Chinese script databases. Further, we submitted the proposed algorithm to the ICDAR 2013 handwriting segmentation competition, and our method showed the best text-line extraction performance among the 10 participating methods. |
|---|---|
| Author | Cho, Nam Ik; Koo, Hyung Il; Ryu, Jewoong |
| Author details | 1. Ryu, Jewoong (youjw@ispl.snu.ac.kr), INMC, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea; 2. Koo, Hyung Il (hikoo@ajou.ac.kr), Department of Electrical and Computer Engineering, Ajou University, Suwon, Korea; 3. Cho, Nam Ik (nicho@snu.ac.kr), INMC, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea |
| CODEN | ISPLEM |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2014 |
| DOI | 10.1109/LSP.2014.2325940 |
| Discipline | Engineering |
| EISSN | 1558-2361 |
| EndPage | 1119 |
| Genre | orig-research |
| GrantInformation | National IT Industry Promotion (grant ID: NIPA-2014-H0301-14-1019); Samsung (funder ID: 10.13039/100004358) |
| ISICitedReferencesCount | 45 |
| ISSN | 1070-9908 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| PageCount | 5 |
| PublicationDate | 2014-09-01 |
| PublicationPlace | New York |
| PublicationTitle | IEEE signal processing letters |
| PublicationTitleAbbrev | LSP |
| PublicationYear | 2014 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 1115 |
| SubjectTerms | Algorithms; Connected component based algorithm; Cost function; Data mining; Estimates; Extraction; Feature extraction; handwritten documents; language-independent algorithm; Minimization; Partitioning algorithms; Partitions; Segmentation; Signal processing algorithms; State of the art; text-line extraction; text-line segmentation |
| URI | https://ieeexplore.ieee.org/document/6819023 https://www.proquest.com/docview/1546002426 https://www.proquest.com/docview/1669889740 |
| Volume | 21 |
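
The abstract states that the method is based on connected components (CCs). As a rough illustration of that starting point only, the sketch below labels 8-connected foreground regions in a small binary image and computes their bounding boxes. It is not the authors' algorithm: the function names `connected_components` and `bounding_box`, the 0/1 list-of-lists image representation, and the choice of 8-connectivity are all assumptions made for this example, and the stroke analysis, CC partitioning, state estimation, and cost minimization described in the abstract are not shown.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected foreground (value 1) regions of a binary image.

    Hypothetical helper for illustration; returns a list of components,
    each given as a list of (row, col) pixel coordinates.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right), inclusive, for one component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Toy "page": three separate blobs of foreground pixels.
page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
ccs = connected_components(page)
boxes = [bounding_box(cc) for cc in ccs]
```

In a CC-based pipeline of the kind the abstract outlines, components like these would be the units that later stages group into text-lines; how that grouping is done is specific to the paper and is not reproduced here.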