Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

Both reuse and concurrency are performance-critical for stream processors. When applying loop unrolling and software pipelining separately to stream-level loops, either reuse or concurrency or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse a...

Bibliographic details
Published in:	The Journal of supercomputing, Volume 59, Issue 3, pp. 1229-1251
Main authors:	Wang, Li; Xue, Jingling; Yang, Xuejun
Format:	Journal Article
Language:	English
Published:	Boston: Springer US, 2012-03-01
Subjects:	Compilers; Computer Science; Interpreters; Processor Architectures; Programming Languages; Stream register file; Stream programming model; Stream processor; Software pipelining; Loop unrolling
ISSN:	0920-8542, 1573-0484

Abstract	Both reuse and concurrency are performance-critical for stream processors. When loop unrolling and software pipelining are applied separately to stream-level loops, either reuse or concurrency or both may be inadequately exploited. In this paper, we optimize modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops. The key insight is that an unrolled and software-pipelined stream-level loop can be described by a set of reuse equations. Guided by these reuse equations, a reuse-aware modulo scheduling algorithm is developed to simultaneously optimize the two performance objectives, reuse and concurrency, for a loop in a unified framework. Moreover, we describe a code generation algorithm that automatically produces the optimized loop from a given loop. The experimental results obtained on FT64 and by simulation demonstrate the effectiveness of the proposed approach.
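The two transformations named in the abstract can be illustrated with a toy model. The sketch below is not the paper's reuse equations or its scheduling algorithm; it assumes a hypothetical stream-level loop in which iteration i reads streams a[i] and a[i+1], and all function names are made up for illustration. It counts how many stream loads unrolling saves (reuse) and shows how a two-stage software pipeline overlaps the load of one iteration with the compute of the previous one (concurrency).

```python
def loads_without_reuse(n):
    """Assumed pattern: every iteration i loads a[i] and a[i+1] afresh."""
    return 2 * n


def loads_with_unrolling(n, u):
    """Unroll by factor u. One unrolled body covers iterations i..i+u-1,
    which touch a[i]..a[i+u], so u + 1 loads serve u iterations."""
    full, rest = divmod(n, u)
    # A leftover group of `rest` iterations touches rest + 1 streams.
    return full * (u + 1) + (rest + 1 if rest else 0)


def pipelined_schedule(n):
    """Two-stage software pipeline: at step t, load the streams of
    iteration t while computing iteration t - 1."""
    steps = []
    for t in range(n + 1):
        ops = []
        if t < n:
            ops.append(("load", t))
        if t >= 1:
            ops.append(("compute", t - 1))
        steps.append(ops)
    return steps


if __name__ == "__main__":
    print(loads_without_reuse(8), loads_with_unrolling(8, 2), loads_with_unrolling(8, 4))
    for step in pipelined_schedule(3):
        print(step)
```

In this toy model, unrolling alone reduces loads but leaves load and compute serialized, while pipelining alone overlaps them but reloads every stream. That is the tension the abstract describes when the two are applied separately.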
Author	Xue, Jingling Wang, Li Yang, Xuejun
Author_xml	1. Wang, Li (dragonylffly@163.com), School of Computer, National University of Defense Technology; 2. Xue, Jingling, School of Computer Science and Engineering, UNSW; 3. Yang, Xuejun, School of Computer, National University of Defense Technology
ContentType	Journal Article
Copyright	Springer Science+Business Media, LLC 2010
Copyright_xml	– notice: Springer Science+Business Media, LLC 2010
DBID	AAYXX CITATION
DOI	10.1007/s11227-010-0522-z
DatabaseName	CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISSN	1573-0484
EndPage	1251
ExternalDocumentID	10_1007_s11227_010_0522_z
IEDL.DBID	RSV
ISICitedReferencesCount	0
ISSN	0920-8542
IngestDate	Sat Nov 29 06:13:02 EST 2025 Fri Feb 21 02:27:34 EST 2025
IsPeerReviewed	true
IsScholarly	true
Issue	3
Keywords	Stream register file Stream programming model Stream processor Software pipelining Loop unrolling
Language	English
License	http://www.springer.com/tdm
LinkModel	DirectLink
PageCount	23
ParticipantIDs	crossref_primary_10_1007_s11227_010_0522_z springer_journals_10_1007_s11227_010_0522_z
PublicationCentury	2000
PublicationDate	2012-03-01
PublicationDateYYYYMMDD	2012-03-01
PublicationDate_xml	– month: 03 year: 2012 text: 2012-03-01 day: 01
PublicationDecade	2010
PublicationPlace	Boston
PublicationPlace_xml	– name: Boston
PublicationSubtitle	An International Journal of High-Performance Computer Design, Analysis, and Use
PublicationTitle	The Journal of supercomputing
PublicationTitleAbbrev	J Supercomput
PublicationYear	2012
Publisher	Springer US
Publisher_xml	– name: Springer US
SSID	ssj0004373
Score	1.8833941
SourceID	crossref springer
SourceType	Index Database Publisher
StartPage	1229
SubjectTerms	Compilers Computer Science Interpreters Processor Architectures Programming Languages
Title	Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
URI	https://link.springer.com/article/10.1007/s11227-010-0522-z
Volume	59
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
journalDatabaseRights	– providerCode: PRVAVX databaseName: SpringerLINK Contemporary 1997-Present customDbUrl: eissn: 1573-0484 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0004373 issn: 0920-8542 databaseCode: RSV dateStart: 19970101 isFulltext: true titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22 providerName: Springer Nature
linkProvider	Springer Nature