Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

Both reuse and concurrency are performance-critical for stream processors. When applying loop unrolling and software pipelining separately to stream-level loops, either reuse or concurrency or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse a...

Bibliographic details
Published in: The Journal of supercomputing, Volume 59, Issue 3, pp. 1229-1251
Main authors: Wang, Li; Xue, Jingling; Yang, Xuejun
Format: Journal Article
Language: English
Publication details: Boston: Springer US, 1 March 2012
ISSN: 0920-8542, 1573-0484
Abstract: Both reuse and concurrency are performance-critical for stream processors. When loop unrolling and software pipelining are applied separately to stream-level loops, reuse, concurrency, or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops. The key insight is that an unrolled and software-pipelined stream-level loop can be described by a set of reuse equations. Guided by these reuse equations, a reuse-aware modulo scheduling algorithm is developed to optimize the two performance objectives, reuse and concurrency, simultaneously for a loop in a unified framework. Moreover, we describe a code generation algorithm that automatically produces the optimized loop from a given loop. Experimental results obtained on FT64 and by simulation demonstrate the effectiveness of the proposed approach.
Authors and affiliations:
1. Wang, Li (School of Computer, National University of Defense Technology; dragonylffly@163.com)
2. Xue, Jingling (School of Computer Science and Engineering, UNSW)
3. Yang, Xuejun (School of Computer, National University of Defense Technology)
Copyright: Springer Science+Business Media, LLC 2010
DOI: 10.1007/s11227-010-0522-z
Keywords: Stream register file; Stream programming model; Stream processor; Software pipelining; Loop unrolling
Subject terms: Compilers; Computer Science; Interpreters; Processor Architectures; Programming Languages
URI: https://link.springer.com/article/10.1007/s11227-010-0522-z