Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 26, no. 7, pp. 1861-1873 |
|---|---|
| Main Authors: | Junchao Zhang, Babak Behzad, Marc Snir |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2015 |
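| Subjects: | Algorithm design and analysis; Force; Instruction sets; Multicore processing; Octrees; Product introduction; Programming; Synchronization |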
| ISSN: | 1045-9219, 1558-2183 |
| Abstract | We describe in this paper an implementation of the Barnes-Hut algorithm on multicore clusters. Based on a partitioned global address space (PGAS) library, the design integrates intranode multithreading and internode one-sided communication, exemplifying a PGAS + X programming style. Within a node, the computation is decomposed into tasks (subtasks) and multitasking is used to hide network latency. We study the tradeoffs between locality in private caches and locality in shared caches and bring the insights into the design. As a result, our implementation consumes less memory per core, invokes less internode communication, and enjoys better load-balancing strategies. The final code achieves up to 41 percent performance improvement over a non-multithreaded counterpart. Through detailed comparison, we also show its advantages over other well-known Barnes-Hut implementations, both in programming complexity and in performance. |
|---|---|
| Author | Snir, Marc; Zhang, Junchao; Behzad, Babak |
| Author_xml | – sequence: 1 givenname: Junchao surname: Zhang fullname: Zhang, Junchao email: jczhang@anl.gov organization: Math. & Comput. Sci. (MCS) Div., Argonne Nat. Lab., Argonne, IL, USA – sequence: 2 givenname: Babak surname: Behzad fullname: Behzad, Babak email: bbehza2@illinois.edu organization: Comput. Sci. Dept., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA – sequence: 3 givenname: Marc surname: Snir fullname: Snir, Marc email: snir@anl.gov organization: Math. & Comput. Sci. (MCS) Div., Argonne Nat. Lab., Argonne, IL, USA |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1007_s42452_020_2386_z crossref_primary_10_1016_j_ascom_2021_100466 |
| Cites_doi | 10.1016/j.cpc.2011.12.013 10.1137/1.9780898719604 10.1145/1693453.1693482 10.1109/ICPPW.2009.14 10.1088/1742-6596/180/1/012037 10.1038/324446a0 10.1177/1094342012440585 10.1051/0004-6361/201118085 10.1145/1787275.1787323 10.1109/ISCA.1995.524546 10.1016/0021-9991(87)90140-9 10.1145/2048066.2048104 10.1145/169627.169640 10.1016/j.parco.2011.10.003 10.1177/1094342006064503 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2015 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2015 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TPDS.2014.2331243 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) Online IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1558-2183 |
| EndPage | 1873 |
| ExternalDocumentID | 3760182581 10_1109_TPDS_2014_2331243 6837521 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Advanced Scientific Computing Research grantid: DE-AC02-06CH11357 – fundername: Office of Science grantid: DE-AC02-06CH11357 funderid: 10.13039/100006132 – fundername: DOE grantid: 1205852 funderid: 10.13039/100000015 – fundername: Department of Energy grantid: DE-AC02-06CH11357 funderid: 10.13039/100000015 |
| GroupedDBID | --Z -~X .DC 0R~ 29I 4.4 5GY 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFO ACIWK AENEX AGQYO AGSQL AHBIQ AKQYR ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD HZ~ IEDLZ IFIPE IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIE RNS TN5 TWZ UHB AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D RIG |
| ID | FETCH-LOGICAL-c359t-23a3808fb7aa6cdebd967d481aaeea097de6665022f784e2204c3a33a7dabf8c3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 5 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000356138700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1045-9219 |
| IngestDate | Mon Jun 30 07:09:18 EDT 2025 Sat Nov 29 03:36:08 EST 2025 Tue Nov 18 22:24:37 EST 2025 Wed Aug 27 02:52:30 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Keywords | cluster; n-body; PGAS; multicore; Barnes-Hut |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c359t-23a3808fb7aa6cdebd967d481aaeea097de6665022f784e2204c3a33a7dabf8c3 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| PQID | 1699256001 |
| PQPubID | 85437 |
| PageCount | 13 |
| ParticipantIDs | crossref_primary_10_1109_TPDS_2014_2331243 ieee_primary_6837521 proquest_journals_1699256001 crossref_citationtrail_10_1109_TPDS_2014_2331243 |
| PublicationCentury | 2000 |
| PublicationDate | 2015-July-1 2015-7-1 20150701 |
| PublicationDateYYYYMMDD | 2015-07-01 |
| PublicationDate_xml | – month: 07 year: 2015 text: 2015-July-1 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2015 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0014504 |
| Score | 2.1639352 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 1861 |
| SubjectTerms | Algorithm design and analysis; Force; Instruction sets; Multicore processing; Octrees; Product introduction; Programming; Synchronization |
| Title | Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters |
| URI | https://ieeexplore.ieee.org/document/6837521 https://www.proquest.com/docview/1699256001 |
| Volume | 26 |
| WOSCitedRecordID | wos000356138700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1558-2183 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0014504 issn: 1045-9219 databaseCode: RIE dateStart: 19900101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Design+of+a+Multithreaded+Barnes-Hut+Algorithm+for+Multicore+Clusters&rft.jtitle=IEEE+transactions+on+parallel+and+distributed+systems&rft.au=Junchao+Zhang&rft.au=Behzad%2C+Babak&rft.au=Snir%2C+Marc&rft.date=2015-07-01&rft.pub=IEEE&rft.issn=1045-9219&rft.volume=26&rft.issue=7&rft.spage=1861&rft.epage=1873&rft_id=info:doi/10.1109%2FTPDS.2014.2331243&rft.externalDocID=6837521 |