Design space exploration of hardware task superscalar architecture

Bibliographic Details
Published in:	The Journal of supercomputing Vol. 71; no. 9; pp. 3567 - 3592
Main Authors:	Yazdanpanah, Fahimeh; Alaei, Mohammad
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 01.09.2015
Subjects:	Compilers; Computer Science; Interpreters; Processor Architectures; Programming Languages; Task scheduling; Task parallelism; Task superscalar; OmpSs
ISSN:	0920-8542, 1573-0484

Abstract	For current high-performance computing systems, exploiting concurrency is a serious and important challenge. Recently, several dynamic software task management mechanisms have been proposed. In particular, task-based dataflow programming models benefit from dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. However, these programming models rely on software-based dependency analysis, which is inherently slow; this limits their scalability, especially when task granularity is fine and the number of tasks is large. Moreover, task scheduling in software introduces overheads and so becomes increasingly inefficient as the number of cores grows. In contrast, a hardware scheduling solution such as Task SuperScalar (TSS) can achieve greater speed-up because a hardware task scheduler requires fewer cycles than its software counterpart to dispatch a task. TSS combines the effectiveness of out-of-order processors with the task abstraction. It has been implemented in software, with limited parallelism and high memory consumption due to the nature of the software implementation. Hardware Task Superscalar (HTSS) is proposed to solve these drawbacks. HTSS is designed to be integrated into a future high-performance computer with the ability to exploit fine-grained task parallelism. In this article, an in-depth latency and design space exploration of HTSS is described. For the design space exploration, we have designed a full cycle-accurate simulator of HTSS, called SimTSS. The simulator has been tuned based on a latency exploration of the HTSS components, obtained from the VHDL description of each component. As a result of this exploration, we have determined the number of components and the memory capacity of HTSS for HPC systems.
Author affiliations	Yazdanpanah, Fahimeh (yazdanpanah@uk.ac.ir), Computer Engineering Department, Faculty of Engineering, Shahid Bahonar University of Kerman; Alaei, Mohammad, Computer Engineering Department, Faculty of Engineering, Shahid Bahonar University of Kerman
Copyright	Springer Science+Business Media New York 2015
DOI	10.1007/s11227-015-1449-1
Discipline	Computer Science
IsPeerReviewed	true
IsScholarly	true
PageCount	26
PublicationSubtitle	An International Journal of High-Performance Computer Design, Analysis, and Use
PublicationTitle	The Journal of supercomputing
PublicationTitleAbbrev	J Supercomput
URI	https://link.springer.com/article/10.1007/s11227-015-1449-1