Letting future programmers experience performance-related tasks
| Published in: | Journal of parallel and distributed computing, Vol. 155; pp. 74-86 |
|---|---|
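| Main authors: | Bednárek, David; Kruliš, Martin; Yaghob, Jakub |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier Inc, 01.09.2021 |
| Subjects: | Assignments; Education; GPU; HPC; Parallel computing |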
| ISSN: | 0743-7315 |
| Online access: | https://dx.doi.org/10.1016/j.jpdc.2021.04.014 |
| Abstract | • Presentation of fine-tuned assignments for HPC and parallel programming courses. • All assignments tested on students in three interconnected courses. • Covering vectorization, caches, multicore CPUs, GPUs, and multi-node problems. • Introducing 5 widely used technologies (TBB, OpenMP, CUDA, OpenMPI, and Spark).
Programming courses usually focus on software-engineering problems like software decomposition and code maintenance. While computer-science lessons emphasize algorithm complexity, technological problems are usually neglected although they may significantly affect performance in terms of wall time. As the technological problems are best explained by hands-on experience, we present a set of homework assignments focused on a range of technologies from instruction-level parallelism to GPU programming to cluster computing. These assignments are a product of a decade of development and testing on live subjects – the students of three performance-related software courses at the Faculty of Mathematics and Physics of Charles University in Prague. |
|---|---|
| Author | Bednárek, David; Kruliš, Martin; Yaghob, Jakub |
| Author_xml | – sequence: 1 givenname: David orcidid: 0000-0001-7740-0158 surname: Bednárek fullname: Bednárek, David email: bednarek@ksi.mff.cuni.cz – sequence: 2 givenname: Martin orcidid: 0000-0002-0985-8949 surname: Kruliš fullname: Kruliš, Martin email: krulis@ksi.mff.cuni.cz – sequence: 3 givenname: Jakub orcidid: 0000-0002-9602-2196 surname: Yaghob fullname: Yaghob, Jakub email: yaghob@ksi.mff.cuni.cz |
| CitedBy_id | crossref_primary_10_1016_j_parco_2024_103096 |
| Cites_doi | 10.1145/1356052.1356053 10.1016/j.patrec.2007.01.001 10.1016/j.is.2016.06.001 10.1145/143103.143132 10.1145/321796.321811 |
| ContentType | Journal Article |
| Copyright | 2021 Elsevier Inc. |
| Copyright_xml | – notice: 2021 Elsevier Inc. |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.jpdc.2021.04.014 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Education; Computer Science |
| EndPage | 86 |
| ExternalDocumentID | 10_1016_j_jpdc_2021_04_014 S0743731521000976 |
| ISICitedReferencesCount | 3 |
| ISSN | 0743-7315 |
| IngestDate | Sat Nov 29 07:15:39 EST 2025 Tue Nov 18 22:13:22 EST 2025 Sun Apr 06 06:54:09 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Assignments; Parallel computing; HPC; Education; GPU |
| Language | English |
| LinkModel | OpenURL |
| ORCID | 0000-0001-7740-0158 0000-0002-0985-8949 0000-0002-9602-2196 |
| PageCount | 13 |
| ParticipantIDs | crossref_citationtrail_10_1016_j_jpdc_2021_04_014 crossref_primary_10_1016_j_jpdc_2021_04_014 elsevier_sciencedirect_doi_10_1016_j_jpdc_2021_04_014 |
| PublicationCentury | 2000 |
| PublicationDate | September 2021 2021-09-00 |
| PublicationDateYYYYMMDD | 2021-09-01 |
| PublicationDate_xml | – month: 09 year: 2021 text: September 2021 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of parallel and distributed computing |
| PublicationYear | 2021 |
| Publisher | Elsevier Inc |
| Publisher_xml | – name: Elsevier Inc |
| SSID | ssj0011578 |
| Score | 2.3126333 |
| SourceID | crossref elsevier |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 74 |
| SubjectTerms | Assignments; Education; GPU; HPC; Parallel computing |
| Title | Letting future programmers experience performance-related tasks |
| URI | https://dx.doi.org/10.1016/j.jpdc.2021.04.014 |
| Volume | 155 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |