Language-Independent Text-Line Extraction Algorithm for Handwritten Documents


Bibliographic Details
Published in: IEEE Signal Processing Letters, Vol. 21, no. 9, pp. 1115-1119
Main Authors: Ryu, Jewoong; Koo, Hyung Il; Cho, Nam Ik
Format: Journal Article
Language: English
Published: New York, IEEE, 01.09.2014
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1070-9908, 1558-2361
Abstract Text-line extraction in handwritten documents is an important step in document image understanding, and a number of algorithms have been proposed to address this problem. However, most of them exploit features of specific languages and work only for a given language. To overcome this limitation, we develop a language-independent text-line extraction algorithm. Our method is based on connected components (CCs); however, unlike conventional methods, we analyze strokes and partition under-segmented CCs into normalized ones. Owing to this normalization, the proposed method is able to estimate the states of CCs for a range of different languages and writing styles. From the estimated states, we build a cost function whose minimization yields text-lines. Experimental results show that the proposed method achieves state-of-the-art performance on Latin-based and Chinese script databases. Further, we submitted the proposed algorithm to the ICDAR 2013 handwriting segmentation competition, where our method showed the best text-line extraction performance among the 10 participating methods.
Author Cho, Nam Ik
Koo, Hyung Il
Ryu, Jewoong
Author_xml – sequence: 1
  givenname: Jewoong
  surname: Ryu
  fullname: Ryu, Jewoong
  email: youjw@ispl.snu.ac.kr
  organization: INMC, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
– sequence: 2
  givenname: Hyung Il
  surname: Koo
  fullname: Koo, Hyung Il
  email: hikoo@ajou.ac.kr
  organization: Department of Electrical and Computer Engineering, Ajou University, Suwon, Korea
– sequence: 3
  givenname: Nam Ik
  surname: Cho
  fullname: Cho, Nam Ik
  email: nicho@snu.ac.kr
  organization: INMC, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
CODEN ISPLEM
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2014
DOI 10.1109/LSP.2014.2325940
Discipline Engineering
EISSN 1558-2361
EndPage 1119
ExternalDocumentID 3377806561
10_1109_LSP_2014_2325940
6819023
Genre orig-research
GrantInformation_xml – fundername: National IT Industry Promotion
  grantid: NIPA-2014-H0301-14-1019
– fundername: Samsung
  funderid: 10.13039/100004358
ISICitedReferencesCount 45
ISSN 1070-9908
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 5
PublicationCentury 2000
PublicationDate 2014-09-01
PublicationDateYYYYMMDD 2014-09-01
PublicationDate_xml – month: 09
  year: 2014
  text: 2014-09-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE signal processing letters
PublicationTitleAbbrev LSP
PublicationYear 2014
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1115
SubjectTerms Algorithms
Connected component based algorithm
Cost function
Data mining
Estimates
Extraction
Feature extraction
handwritten documents
language-independent algorithm
Minimization
Partitioning algorithms
Partitions
Segmentation
Signal processing algorithms
State of the art
text-line extraction
text-line segmentation
Title Language-Independent Text-Line Extraction Algorithm for Handwritten Documents
URI https://ieeexplore.ieee.org/document/6819023
https://www.proquest.com/docview/1546002426
https://www.proquest.com/docview/1669889740
Volume 21