Letting future programmers experience performance-related tasks

Bibliographic details
Published in: Journal of Parallel and Distributed Computing, Volume 155, pp. 74-86
Main authors: Bednárek, David; Kruliš, Martin; Yaghob, Jakub
Format: Journal Article
Language: English
Published: Elsevier Inc., 1 September 2021
ISSN: 0743-7315
Abstract
• Presentation of fine-tuned assignments for HPC and parallel programming courses.
• All assignments tested on students in three interconnected courses.
• Covering vectorization, caches, multicore CPUs, GPUs, and multi-node problems.
• Introducing 5 widely used technologies (TBB, OpenMP, CUDA, OpenMPI, and Spark).

Programming courses usually focus on software-engineering problems like software decomposition and code maintenance. While computer-science lessons emphasize algorithm complexity, technological problems are usually neglected, although they may significantly affect performance in terms of wall time. As the technological problems are best explained by hands-on experience, we present a set of homework assignments focused on a range of technologies from instruction-level parallelism to GPU programming to cluster computing. These assignments are the product of a decade of development and testing on live subjects: the students of three performance-related software courses at the Faculty of Mathematics and Physics of Charles University in Prague.
Authors:
Bednárek, David (ORCID: 0000-0001-7740-0158; email: bednarek@ksi.mff.cuni.cz)
Kruliš, Martin (ORCID: 0000-0002-0985-8949; email: krulis@ksi.mff.cuni.cz)
Yaghob, Jakub (ORCID: 0000-0002-9602-2196; email: yaghob@ksi.mff.cuni.cz)
Copyright: 2021 Elsevier Inc.
DOI: 10.1016/j.jpdc.2021.04.014
Discipline: Education; Computer Science
Keywords: Assignments; Parallel computing; HPC; Education; GPU