Search results - "communication-avoiding algorithm"

  1.

    An efficient randomized QLP algorithm for approximating the singular value decomposition by Kaloorazi, M.F., Liu, K., Chen, J., de Lamare, R.C.

    ISSN: 0020-0255, 1872-6291
    Published: Elsevier Inc 01.11.2023
    Published in Information sciences (01.11.2023)
    “… The rank-revealing pivoted QLP decomposition approximates the computationally prohibitive singular value decomposition (SVD) via two consecutive column-pivoted …”
    Full text
    Journal Article
  2.

    Parallel Tall-and-Skinny QR Factorization Based on LU-CholeskyQR Algorithm by Uchino, Yuki, Imamura, Toshiyuki

    ISSN: 2168-9253
    Published: IEEE 02.09.2025
    “… We present optimal parallel QR factorization algorithms with reduced communication overhead. QR factorization is widely applied to solve various problems in …”
    Full text
    Conference Proceeding
  3.

    A communication-avoiding implicit–explicit method for a free-surface ocean model by Newman, Christopher, Womeldorff, Geoffrey, Knoll, Dana A., Chacón, Luis

    ISSN: 0021-9991, 1090-2716
    Published: United States Elsevier Inc 15.01.2016
    Published in Journal of computational physics (15.01.2016)
    “… We examine a nonlinear elimination method for the free-surface ocean equations based on barotropic–baroclinic decomposition. The two dimensional scalar …”
    Full text
    Journal Article
  4.

    Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters by Liu, Yang, Ding, Nan, Sao, Piyush, Williams, Samuel, Li, Xiaoye Sherry

    ISSN: 2167-4337
    Published: ACM 11.11.2023
    “… This paper presents a unified communication optimization framework for sparse triangular solve (SpTRSV) algorithms on CPU and GPU clusters. The framework …”
    Full text
    Conference Proceeding
  5.

    Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree by Ma, Linjian, Solomonik, Edgar

    ISSN: 1530-2075
    Published: IEEE 01.05.2021
    “… The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor …”
    Full text
    Conference Proceeding
  6.

    Communication-Avoiding Cholesky-QR2 for Rectangular Matrices by Hutter, Edward, Solomonik, Edgar

    ISSN: 1530-2075
    Published: IEEE 01.05.2019
    “… Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We …”
    Full text
    Conference Proceeding
  7.

    A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices by Sao, Piyush, Li, Xiaoye Sherry, Vuduc, Richard

    ISSN: 1530-2075
    Published: IEEE 01.05.2018
    “… We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D sparse LU algorithm …”
    Full text
    Conference Proceeding
  8.

    A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems by Sao, Piyush, Xing Liu, Vuduc, Richard, Xiaoye Li

    ISSN: 1530-2075
    Published: IEEE 01.05.2015
    “… This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds …”
    Full text
    Conference Proceeding
  9.

    A massively parallel tensor contraction framework for coupled-cluster computations by Solomonik, Edgar, Matthews, Devin, Hammond, Jeff R., Stanton, John F., Demmel, James

    ISSN: 0743-7315, 1096-0848
    Published: Elsevier Inc 01.12.2014
    Published in Journal of parallel and distributed computing (01.12.2014)
    “… Precise calculation of molecular electronic wavefunctions by methods such as coupled-cluster requires the computation of tensor contractions, the cost of which …”
    Full text
    Journal Article
  10.

    Multiscale high-order/low-order (HOLO) algorithms and applications by Chacón, L., Chen, G., Knoll, D.A., Newman, C., Park, H., Taitano, W., Willert, J.A., Womeldorff, G.

    ISSN: 0021-9991, 1090-2716
    Published: Cambridge Elsevier Inc 01.02.2017
    Published in Journal of computational physics (01.02.2017)
    “… We review the state of the art in the formulation, implementation, and performance of so-called high-order/low-order (HOLO) algorithms for challenging …”
    Full text
    Journal Article
  11.

    Communication Avoiding Block Low-Rank Parallel Multifrontal Triangular Solve with Many Right-Hand Sides by Amestoy, Patrick, Boiteau, Olivier, Buttari, Alfredo, Gerest, Matthieu, Jézéquel, Fabienne, L’Excellent, Jean-Yves, Mary, Theo

    ISSN: 0895-4798, 1095-7162
    Published: Society for Industrial and Applied Mathematics 01.01.2024
    Published in SIAM journal on matrix analysis and applications (01.01.2024)
    “… Block low-rank (BLR) compression can significantly reduce the memory and time costs of parallel sparse direct solvers. In this paper, we investigate the …”
    Full text
    Journal Article
  12.

    Parallel Fast Multipole Method accelerated FFT on HPC clusters by Mehta, Chahak, Karthi, Amarnath, Jetly, Vishrut, Chaudhury, Bhaskar

    ISSN: 0167-8191, 1872-7336
    Published: Elsevier B.V 01.07.2021
    Published in Parallel computing (01.07.2021)
    “… With increasing sizes of distributed systems, there comes an increased risk of communication bottlenecks. In the past decade there has been a growing interest …”
    Full text
    Journal Article
  13.

    Multiscale high-order/low-order (HOLO) algorithms and applications by Chacon, Luis, Chen, Guangye, Knoll, Dana Alan, Newman, Christopher Kyle, Park, HyeongKae, Taitano, William, Willert, Jeff A., Womeldorff, Geoffrey Alan

    ISSN: 0021-9991, 1090-2716
    Published: United States Elsevier 11.11.2016
    Published in Journal of computational physics (11.11.2016)
    “… Here, we review the state of the art in the formulation, implementation, and performance of so-called high-order/low-order (HOLO) algorithms for challenging …”
    Full text
    Journal Article
  14.

    Communication-Avoiding Recursive Aggregation by Sun, Yihao, Kumar, Sidharth, Gilray, Thomas, Micinski, Kristopher

    ISSN: 2168-9253
    Published: IEEE 31.10.2023
    “… Recursive aggregation has been of considerable interest due to its unifying a wide range of deductive-analytic workloads, including social-media mining and …”
    Full text
    Conference Proceeding
  15.

    Translational process: Mathematical software perspective by Dongarra, Jack, Gates, Mark, Luszczek, Piotr, Tomov, Stanimire

    ISSN: 1877-7503, 1877-7511
    Published: Netherlands Elsevier B.V 01.05.2021
    Published in Journal of computational science (01.05.2021)
    “… Each successive generation of computer architecture has brought new challenges to achieving high performance mathematical solvers, necessitating development …”
    Full text
    Journal Article
  16.

    Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time–space decomposition by Magee, Daniel J., Niemeyer, Kyle E.

    ISSN: 0021-9991, 1090-2716
    Published: Cambridge Elsevier Inc 15.03.2018
    Published in Journal of computational physics (15.03.2018)
    “… • A GPU implementation of the swept time–space decomposition rule is presented. • Three versions of the scheme are considered. • The shared-memory implementation …”
    Full text
    Journal Article
  17.

    Reconstructing Householder vectors from Tall-Skinny QR by Ballard, G., Demmel, J., Grigori, L., Jacquelin, M., Knight, N., Nguyen, H.D.

    ISSN: 0743-7315, 1096-0848
    Published: United States Elsevier Inc 01.11.2015
    Published in Journal of parallel and distributed computing (01.11.2015)
    “… The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more …”
    Full text
    Journal Article
  18.

    Applying the swept rule for solving explicit partial differential equations on heterogeneous computing systems by Magee, Daniel J., Walker, Anthony S., Niemeyer, Kyle E.

    ISSN: 0920-8542, 1573-0484
    Published: New York Springer US 01.02.2021
    Published in The Journal of supercomputing (01.02.2021)
    “… Applications that exploit the architectural details of high-performance computing (HPC) systems have become increasingly invaluable in academia and industry …”
    Full text
    Journal Article
  19.

    Reducing Communication in Graph Neural Network Training by Tripathy, Alok, Yelick, Katherine, Buluc, Aydin

    ISSN: 2167-4329
    Published: United States IEEE 01.11.2020
    “… Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this …”
    Full text
    Conference Proceeding, Journal Article
  20.

    Communication-Avoiding Symmetric-Indefinite Factorization by Ballard, Grey, Becker, Dulceneia, Demmel, James, Dongarra, Jack, Druinsky, Alex, Peled, Inon, Schwartz, Oded, Toledo, Sivan, Yamazaki, Ichitaro

    ISSN: 0895-4798, 1095-7162
    Published: United States SIAM 01.01.2014
    Published in SIAM journal on matrix analysis and applications (01.01.2014)
    “… We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular …”
    Full text
    Journal Article