Design space exploration of hardware task superscalar architecture


Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 71, no. 9, pp. 3567-3592
Main Authors: Yazdanpanah, Fahimeh, Alaei, Mohammad
Format: Journal Article
Language: English
Published: New York: Springer US, 01.09.2015
Subjects: Compilers; Computer Science; Interpreters; Processor Architectures; Programming Languages
ISSN: 0920-8542, 1573-0484
Abstract Exploiting concurrency is a serious and important challenge for current high-performance computing (HPC) systems. Several dynamic software task management mechanisms have recently been proposed, in particular task-based dataflow programming models, which apply dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. However, these programming models rely on software-based dependency analysis, which is inherently slow; this limits their scalability, especially when tasks are fine-grained and numerous. Moreover, task scheduling in software introduces overheads and so becomes increasingly inefficient as the number of cores grows. In contrast, a hardware scheduling solution such as Task SuperScalar (TSS) can achieve greater speed-up, because a hardware task scheduler requires fewer cycles than its software counterpart to dispatch a task. TSS combines the effectiveness of out-of-order processors with the task abstraction. It has been implemented in software, with limited parallelism and high memory consumption due to the nature of the software implementation. Hardware Task Superscalar (HTSS) is proposed to overcome these drawbacks. HTSS is designed to be integrated into a future high-performance computer with the ability to exploit fine-grained task parallelism. This article describes an in-depth latency and design space exploration of HTSS. For the design space exploration, we designed a fully cycle-accurate simulator of HTSS, called SimTSS. The simulator was tuned using the latencies of the HTSS components, obtained from a VHDL description of each component. As a result of this exploration, we determined the number of components and the memory capacity of HTSS for HPC systems.
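Note: as a hedged illustration of the software dependency analysis the abstract refers to, the sketch below derives task dependencies from the data each task reads and writes. It is not the authors' implementation and does not model TSS, HTSS, or SimTSS; the function name `build_dependencies` and the `(name, inputs, outputs)` task format are assumptions made for this example only.

```python
from collections import defaultdict


def build_dependencies(tasks):
    """Derive dependencies for tasks given in program order.

    tasks: list of (name, inputs, outputs) tuples (hypothetical format),
           where inputs/outputs are lists of data identifiers.
    Returns a dict mapping each task name to the set of earlier tasks
    it must wait for.
    """
    last_writer = {}             # data id -> task that last wrote it
    readers = defaultdict(list)  # data id -> tasks that read it since that write
    deps = {}
    for name, inputs, outputs in tasks:
        waits = set()
        for data in inputs:
            if data in last_writer:          # read-after-write
                waits.add(last_writer[data])
        for data in outputs:
            if data in last_writer:          # write-after-write
                waits.add(last_writer[data])
            waits.update(readers[data])      # write-after-read
        deps[name] = waits
        # Record this task's accesses for the tasks that follow it.
        for data in inputs:
            readers[data].append(name)
        for data in outputs:
            last_writer[data] = name
            readers[data] = []
    return deps
```

Every submitted task triggers lookups and updates on these tables, and that per-task bookkeeping is the kind of work the abstract argues becomes a bottleneck in software when tasks are fine-grained and numerous.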
Author Alaei, Mohammad
Yazdanpanah, Fahimeh
Author_xml – sequence: 1
  givenname: Fahimeh
  surname: Yazdanpanah
  fullname: Yazdanpanah, Fahimeh
  email: yazdanpanah@uk.ac.ir
  organization: Computer Engineering Department, Faculty of Engineering, Shahid Bahonar University of Kerman
– sequence: 2
  givenname: Mohammad
  surname: Alaei
  fullname: Alaei, Mohammad
  organization: Computer Engineering Department, Faculty of Engineering, Shahid Bahonar University of Kerman
CitedBy_id crossref_primary_10_1016_j_parco_2024_103084
crossref_primary_10_1002_cpe_8318
ContentType Journal Article
Copyright Springer Science+Business Media New York 2015
Copyright_xml – notice: Springer Science+Business Media New York 2015
DBID AAYXX
CITATION
DOI 10.1007/s11227-015-1449-1
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1573-0484
EndPage 3592
ExternalDocumentID 10_1007_s11227_015_1449_1
ISICitedReferencesCount 3
ISSN 0920-8542
IngestDate Sat Nov 29 06:13:07 EST 2025
Tue Nov 18 21:40:01 EST 2025
Fri Feb 21 02:27:41 EST 2025
IsPeerReviewed true
IsScholarly true
Issue 9
Keywords Task scheduling
Task parallelism
Task superscalar
OmpSs
Language English
LinkModel DirectLink
PageCount 26
PublicationCentury 2000
PublicationDate 2015-09-01
PublicationDateYYYYMMDD 2015-09-01
PublicationDate_xml – month: 09
  year: 2015
  text: 2015-09-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationSubtitle An International Journal of High-Performance Computer Design, Analysis, and Use
PublicationTitle The Journal of supercomputing
PublicationTitleAbbrev J Supercomput
PublicationYear 2015
Publisher Springer US
Publisher_xml – name: Springer US
SourceID crossref
springer
SourceType Enrichment Source
Index Database
Publisher
StartPage 3567
SubjectTerms Compilers
Computer Science
Interpreters
Processor Architectures
Programming Languages
Title Design space exploration of hardware task superscalar architecture
URI https://link.springer.com/article/10.1007/s11227-015-1449-1
Volume 71