Efficient parallel implementation of three-point Viterbi decoding algorithm on CPU, GPU, and FPGA

Bibliographic details
Published in: Concurrency and computation Volume 26; issue 3; pp. 821 - 840
Main authors: Li, Rongchun; Dou, Yong; Zou, Dan
Format: Journal Article
Language: English
Publication details: Blackwell Publishing Ltd, 10 March 2014
ISSN:1532-0626, 1532-0634
Abstract In wireless communication, the Viterbi decoding algorithm (VDA) is one of the most popular channel decoding algorithms and is widely used in WLAN, WiMAX, and 3G communications. However, the throughput of a Viterbi decoder is constrained by the convolutional characteristic. Recently, the three‐point VDA (TVDA) was proposed to solve this problem. In TVDA, the whole procedure is divided into three phases: the forward, trace‐back, and decoding phases. In this paper, we analyze the parallelism of TVDA and propose parallel TVDA implementations on the multi‐core CPU, graphics processing unit (GPU), and field programmable gate array (FPGA). We demonstrate approaches that fully exploit its performance potential on CPU, GPU, and FPGA computing platforms. For CPU platforms, we apply two optimization methods, single instruction multiple data (SIMD) and multithreading, to gain over 145× speedup over the naive CPU version on a quad‐core CPU platform. For GPU platforms, we propose a combination of cached memory optimization, coalesced global memory accesses, a codeword packing scheme, and asynchronous data transfer, achieving a throughput of 404.65 Mbps, a 12× speedup over the initial GPU version on an NVIDIA GeForce GTX580 card, and a 7× speedup over an Intel quad‐core i5‐2300 CPU, with both devices from the same manufacturing year and both fully optimized. In addition, for FPGA platforms, we customize a radix‐4 pipelined architecture for the TVDA in a 45‐nm FPGA chip from Xilinx (XC6VLX760). At a clock rate of 209.15 MHz, it achieves a throughput of 418.30 Mbps. Finally, we discuss the performance evaluation and efficiency comparison of different flexible architectures for real‐time Viterbi decoding in terms of decoding throughput, power consumption, optimization schemes, programming cost, and price. Copyright © 2013 John Wiley & Sons, Ltd.
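As a rough illustration of the forward, trace‐back, and decoding phases named in the abstract, the following is a minimal sketch of a textbook hard‐decision Viterbi decoder. It is not the TVDA from the paper, and everything in it is an assumption made for illustration: the rate‐1/2, constraint‐length‐3 code with generators (7, 5), the two zero tail bits, and the helper names `step`, `encode`, and `decode`.

```python
from typing import List, Tuple

G = (0b111, 0b101)   # assumed generators (7, 5) octal; not taken from the paper
N_STATES = 4         # constraint length 3 -> 2 memory bits -> 4 states


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def step(state: int, bit: int) -> Tuple[int, List[int]]:
    """One encoder transition: returns (next_state, two output bits)."""
    reg = (bit << 2) | state          # [current bit, previous bit, bit before that]
    return reg >> 1, [parity(reg & g) for g in G]


def encode(bits: List[int]) -> List[int]:
    """Encode bits followed by two zero tail bits, so the trellis ends in state 0."""
    state, coded = 0, []
    for b in bits + [0, 0]:
        state, out = step(state, b)
        coded += out
    return coded


def decode(coded: List[int]) -> List[int]:
    """Hard-decision Viterbi decoding of a zero-terminated codeword."""
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    survivors = []
    # Forward phase: add-compare-select over every trellis stage.
    for t in range(0, len(coded), 2):
        received = coded[t:t + 2]
        new_metrics = [inf] * N_STATES
        prev = [(0, 0)] * N_STATES             # (previous state, input bit) per state
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue
            for b in (0, 1):
                ns, out = step(s, b)
                m = metrics[s] + sum(x != y for x, y in zip(out, received))
                if m < new_metrics[ns]:
                    new_metrics[ns], prev[ns] = m, (s, b)
        metrics = new_metrics
        survivors.append(prev)
    # Trace-back phase: follow survivor pointers from the known end state 0.
    state, path = 0, []
    for prev in reversed(survivors):
        state, b = prev[state]
        path.append(b)
    # Decoding phase: reverse the path and drop the two tail bits.
    return path[::-1][:-2]
```

The forward loop is sequential across trellis stages because each stage depends on the previous path metrics, which is one plausible reading of the throughput constraint the abstract mentions; the per‐state work inside a stage is independent and is the natural target for SIMD or GPU threads.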
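The throughput figures quoted in the abstract can be cross‐checked with a little arithmetic. The sketch below assumes that a radix‐4 stage retires two decoded bits per clock cycle (a common property of radix‐4 Viterbi architectures, not stated in this record), and it derives baseline throughputs from the rounded speedup factors, so those derived values are only rough estimates. All variable and function names are hypothetical.

```python
# Figures quoted in the abstract.
FPGA_CLOCK_MHZ = 209.15
GPU_MBPS = 404.65
GPU_SPEEDUP_OVER_INITIAL_GPU = 12
GPU_SPEEDUP_OVER_OPTIMIZED_CPU = 7
CPU_SPEEDUP_OVER_NAIVE_CPU = 145      # the abstract says "over 145x"

# Assumption: a radix-4 pipeline processes two trellis steps per clock.
BITS_PER_CYCLE = 2


def fpga_throughput_mbps() -> float:
    """Throughput implied by the clock rate and the assumed bits per cycle."""
    return FPGA_CLOCK_MHZ * BITS_PER_CYCLE


def implied_baselines_mbps() -> dict:
    """Baseline throughputs implied by the (rounded) speedup factors."""
    optimized_cpu = GPU_MBPS / GPU_SPEEDUP_OVER_OPTIMIZED_CPU
    return {
        "initial_gpu": GPU_MBPS / GPU_SPEEDUP_OVER_INITIAL_GPU,
        "optimized_cpu": optimized_cpu,
        # Upper bound, since the CPU speedup is "over" 145x.
        "naive_cpu_upper_bound": optimized_cpu / CPU_SPEEDUP_OVER_NAIVE_CPU,
    }


if __name__ == "__main__":
    print(f"FPGA: {fpga_throughput_mbps():.2f} Mbps")
    for name, mbps in implied_baselines_mbps().items():
        print(f"{name}: about {mbps:.2f} Mbps")
```

Under that assumption the FPGA figure comes out at 418.30 Mbps, matching the abstract, while the implied baselines are roughly 34 Mbps for the initial GPU version, roughly 58 Mbps for the optimized CPU version, and at most about 0.4 Mbps for the naive CPU version.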
Author_xml – sequence: 1
  givenname: Rongchun
  surname: Li
  fullname: Li, Rongchun
  email: Correspondence to: Rongchun Li, National Laboratory for Parallel and Distribution Processing, National University of Defense Technology, Changsha, China., rongchunli@nudt.edu.cn
  organization: National Laboratory for Parallel and Distribution Processing, National University of Defense Technology, Changsha, China
– sequence: 2
  givenname: Yong
  surname: Dou
  fullname: Dou, Yong
  organization: National Laboratory for Parallel and Distribution Processing, National University of Defense Technology, Changsha, China
– sequence: 3
  givenname: Dan
  surname: Zou
  fullname: Zou, Dan
  organization: National Laboratory for Parallel and Distribution Processing, National University of Defense Technology, Changsha, China
ContentType Journal Article
Copyright Copyright © 2013 John Wiley & Sons, Ltd.
DOI 10.1002/cpe.3093
Discipline Computer Science
EISSN 1532-0634
EndPage 840
Genre article
GrantInformation_xml – fundername: National Science Foundation of China
  funderid: 61125201
ISICitedReferencesCount 17
ISSN 1532-0626
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License http://onlinelibrary.wiley.com/termsAndConditions#vor
Notes National Science Foundation of China - No. 61125201
PublicationCentury 2000
PublicationDate 10 March 2014
PublicationDateYYYYMMDD 2014-03-10
PublicationDecade 2010
PublicationTitle Concurrency and computation
PublicationTitleAlternate Concurrency Computat.: Pract. Exper
PublicationYear 2014
Publisher Blackwell Publishing Ltd
StartPage 821
SubjectTerms Algorithms
Central processing units
CUDA
Decoding
Field programmable gate arrays
FPGA
GPU
OpenMP
Optimization
Platforms
SDR
SSE
viterbi
Viterbi decoding
Wireless communication
Title Efficient parallel implementation of three-point viterbi decoding algorithm on CPU, GPU, and FPGA
URI https://api.istex.fr/ark:/67375/WNG-7X5XF5KS-8/fulltext.pdf
https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fcpe.3093
https://www.proquest.com/docview/1520948390
Volume 26