POSSIBILITIES OF USING GPU-ACCELERATED COMPUTING IN STRUCTURAL ANALYSIS

Bibliographic details
Title: POSSIBILITIES OF USING GPU-ACCELERATED COMPUTING IN STRUCTURAL ANALYSIS (original title: ВОЗМОЖНОСТИ ПРИМЕНЕНИЯ ВЫЧИСЛЕНИЙ НА ГРАФИЧЕСКИХ УСКОРИТЕЛЯХ ПРИ РАСЧЕТАХ СООРУЖЕНИЙ)
Authors: ЯКУШЕВ ВЛАДИМИР ЛАВРЕНТЬЕВИЧ, ФИЛИМОНОВ АНТОН ВАЛЕРЬЕВИЧ, СОЛДАТОВ ПАВЕЛ ЮРЬЕВИЧ
Publisher: Federal State Budgetary Educational Institution of Higher Education "National Research Moscow State University of Civil Engineering"
Publication year: 2013
Collection: CyberLeninka (Scientific Electronic Library)
Keywords: GPU, PARALLEL COMPUTING, SYSTEMS OF LINEAR ALGEBRAIC EQUATIONS, CHOLESKY FACTORIZATION, COMPUTER AIDED ENGINEERING, GRAPHICS PROCESSORS (GPU), COMPUTER-AIDED DESIGN SYSTEMS
Description: A method is proposed for adapting a direct solver of systems of linear algebraic equations to computing systems that use graphics accelerators (GPUs). The experience of improving performance step by step is described. Problems encountered when working with graphics processors are listed, and possible solutions are considered. The influence of various factors on solver efficiency was investigated. Test results are presented for finite-element models of real building structures. ; Computer-aided design (CAD) and computer-aided engineering (CAE) systems are significant tools in the modern construction industry. More detailed models require more computation to reach the desired accuracy, so the solver for sparse systems of linear algebraic equations is an important and time-consuming part of such software. Raising the performance of conventional clusters has become increasingly difficult. Graphics processing units (GPUs) can deliver many times the throughput of a standard CPU, especially on bulk data operations. The paper suggests a simple and productive technique for speeding up an existing solver by adding GPU computing. The solver performs Cholesky factorization and is already effectively parallelized with OpenMP. Profiling indicated that matrix multiplications executed by a standard BLAS library took up to 80 percent of the solver's running time. Hence it was possible to distribute tasks between CPU and GPU dynamically with slight code modifications, using the standard BLAS interface. Suitable matrix sizes for GPU execution were identified, because data transfer between CPU and GPU takes considerable time and multiplying smaller matrices on the GPU would slow the solver down. Allocating pinned memory improved cooperation between the processing units, and enabling asynchronous transfers increased GPU utilization. A CUDA stream was associated with each OpenMP thread to keep GPU calls from queuing.
All the settings may be considerably different depending on hardware and software available, so tests were run on multiple computer ...
Publication type: text
File description: text/html
Language: unknown
Availability: http://cyberleninka.ru/article/n/vozmozhnosti-primeneniya-vychisleniy-na-graficheskih-uskoritelyah-pri-raschetah-sooruzheniy
http://cyberleninka.ru/article_covers/16461743.png
Document code: edsbas.120713B6
Database: BASE
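The abstract's point about matrix sizes (small multiplications are not worth sending to the GPU because of transfer time) can be illustrated with a rough cost model. The sketch below is not the authors' code: the `Hardware` class, the function names, and all throughput figures are hypothetical placeholders that would need to be measured on the target machine, and the model ignores call latency and the overlap gained from asynchronous transfers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    """Hypothetical sustained rates; replace with measured values."""
    cpu_gflops: float     # CPU matrix-multiply rate, GFLOP/s
    gpu_gflops: float     # GPU matrix-multiply rate, GFLOP/s
    pcie_gbytes_s: float  # host<->device transfer bandwidth, GB/s


def gemm_times(m, n, k, hw, bytes_per_elem=8):
    """Estimate seconds for C(m x n) += A(m x k) * B(k x n) on CPU and GPU."""
    flops = 2.0 * m * n * k
    # A and B are copied to the device; C is copied there and back.
    moved = bytes_per_elem * (m * k + k * n + 2 * m * n)
    cpu = flops / (hw.cpu_gflops * 1e9)
    gpu = flops / (hw.gpu_gflops * 1e9) + moved / (hw.pcie_gbytes_s * 1e9)
    return cpu, gpu


def choose_device(m, n, k, hw):
    """Return "gpu" only when the estimate says offloading pays off."""
    cpu, gpu = gemm_times(m, n, k, hw)
    return "gpu" if gpu < cpu else "cpu"


if __name__ == "__main__":
    hw = Hardware(cpu_gflops=50.0, gpu_gflops=500.0, pcie_gbytes_s=5.0)
    for size in (32, 128, 256, 2048):
        print(size, choose_device(size, size, size, hw))
```

Under this model the transfer cost grows with the square of the matrix dimension while the arithmetic grows with its cube, so a break-even size exists above which offloading wins. In practice that threshold depends on the hardware and software at hand, which is consistent with the abstract's remark that settings differ between machines.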