CUDA application design and development
As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development...
| Main author: | |
|---|---|
| Format: | E-Book, Book |
| Language: | English |
| Published: | Waltham, MA; Morgan Kaufmann; 2011; Elsevier Science & Technology |
| Edition: | 1 |
| Subjects: | |
| ISBN: | 0123884268, 9780123884268 |
| Online access: | Full text |
| Abstract | As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan. The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries. Using an approach refined in a series of well-received articles at Dr. Dobb's Journal, author Rob Farber takes the reader step-by-step from fundamentals to implementation, moving from language theory to practical coding. Includes multiple examples building from simple to more complex applications in four key areas: machine learning, visualization, vision recognition, and mobile computing. Addresses the foundational issues for CUDA development: multi-threaded programming and the different memory hierarchy. Includes teaching chapters designed to give a full understanding of CUDA tools, techniques, and structure. Presents CUDA techniques in the context of the hardware they are implemented on as well as other styles of programming that will help readers bridge into the new material. |
|---|---|
| Author | Farber, Rob |
| Author_xml | – sequence: 1 fullname: Farber, Rob |
| BackLink | https://cir.nii.ac.jp/crid/1130282269804664576$$DView record in CiNii |
| ContentType | eBook Book |
| DBID | RYH OHILO OODEK |
| DEWEY | 005.3 |
| DOI | 10.1016/C2010-0-69090-0 |
| DatabaseName | CiNii Complete O'Reilly Online Learning: Corporate Edition O'Reilly Online Learning: Academic/Public Library Edition |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9780123884329 0123884322 |
| Edition | 1 |
| ExternalDocumentID | 9780123884329 9780123884268 EBC806486 BB07443441 |
| GroupedDBID | -VX 089 20A 38. 5O. 92K A4J AAAAS AABBV AAGAK AALRI AAORS AAXUO AAZNM ABARN ABGWT ABIAV ABLXK ABMAC ABMRC ABQPQ ABQQC ABZHZ ACHHS ACLGV ACNAM ACTDM ACXMD ADCEY ADVEM AERYV AFOJC AGAMA AHFFV AHPGB AHWGJ AIXPE AJFER AKHYG ALMA_UNASSIGNED_HOLDINGS ALTAS AMCAZ AMYDA ARRBH ASVZH AVTBZ AVWMD AZZ BA6 BADUN BBABE BJ7 BPBUR BYTKM CETPU CZZ DUGUG EBSCA ECOWB GEOUK HGY JJU LLQQT MYL NK1 NK2 OHILO OODEK PJYGV PQQKQ RYH SDK SRW XI1 6XM AADAM DRU IVK IWL |
| ID | FETCH-LOGICAL-a7036x-68620130af75e121750de0b0bcd8a80ffbec435ea82d0434f2186b8204d9b5c73 |
| ISBN | 0123884268 9780123884268 |
| ISICitedReferencesCount | 207 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=0000264716&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Feb 19 08:41:26 EST 2025 Fri Dec 05 18:38:20 EST 2025 Wed Dec 10 08:38:18 EST 2025 Thu Jun 26 23:44:21 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| LCCN | 2011038617 |
| LCCallNum_Ident | QA76.76.A65 .F378 2 |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-a7036x-68620130af75e121750de0b0bcd8a80ffbec435ea82d0434f2186b8204d9b5c73 |
| OCLC | 763159035 |
| PQID | EBC806486 |
| PageCount | 337 |
| ParticipantIDs | askewsholts_vlebooks_9780123884329 safari_books_v2_9780123884268 proquest_ebookcentral_EBC806486 nii_cinii_1130282269804664576 |
| PublicationCentury | 2000 |
| PublicationDate | c2011 2011 2011-10-08T00:00:00 2011-10-08 |
| PublicationDateYYYYMMDD | 2011-01-01 2011-10-08 |
| PublicationDate_xml | – year: 2011 text: c2011 |
| PublicationDecade | 2010 |
| PublicationPlace | Waltham, MA |
| PublicationPlace_xml | – name: Waltham, MA – name: Chantilly |
| PublicationYear | 2011 |
| Publisher | Morgan Kaufmann Elsevier Science & Technology |
| Publisher_xml | – name: Morgan Kaufmann – name: Elsevier Science & Technology |
| SSID | ssj0000535592 |
| Score | 2.3910844 |
| Snippet | As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software... |
| SourceID | askewsholts safari proquest nii |
| SourceType | Aggregation Database Publisher |
| SubjectTerms | Application software Application software -- Development Computer architecture Development Parallel programming (Computer science) |
| TableOfContents | Front Cover -- CUDA Application Design and Development -- Copyright -- Dedication -- Table of Contents -- Foreword -- Preface -- 1 First Programs and How to Think in CUDA -- Source Code and Wiki -- Distinguishing CUDA from Conventional Programming with a Simple Example -- Choosing a CUDA API -- Some Basic CUDA Concepts -- Understanding Our First Runtime Kernel -- Three Rules of GPGPU Programming -- Rule 1: Get the Data on the GPU and Keep It There -- Rule 2: Give the GPGPU Enough Work to Do -- Rule 3: Focus on Data Reuse within the GPGPU to Avoid Memory Bandwidth Limitations -- Big-O Considerations and Data Transfers -- CUDA and Amdahl's Law -- Data and Task Parallelism -- Hybrid Execution: Using Both CPU and GPU Resources -- Regression Testing and Accuracy -- Silent Errors -- Introduction to Debugging -- UNIX Debugging -- NVIDIA's cuda-gdb Debugger -- The CUDA Memory Checker -- Use cuda-gdb with the UNIX ddd Interface -- Windows Debugging with Parallel Nsight -- Summary -- 2 CUDA for Machine Learning and Optimization -- Modeling and Simulation -- Fitting Parameterized Models -- Nelder-Mead Method -- Levenberg-Marquardt Method -- Algorithmic Speedups -- Machine Learning and Neural Networks -- XOR: An Important Nonlinear Machine-Learning Problem -- An Example Objective Function -- A Complete Functor for Multiple GPU Devices and the Host Processors -- Brief Discussion of a Complete Nelder-Mead Optimization Code -- Performance Results on XOR -- Performance Discussion -- Summary -- The C++ Nelder-Mead Template -- 3 The CUDA Tool Suite: Profiling a PCA/NLPCA Functor -- PCA and NLPCA -- Autoencoders -- An Example Functor for PCA Analysis -- An Example Functor for NLPCA Analysis -- Obtaining Basic Profile Information -- Gprof: A Common UNIX Profiler -- The NVIDIA Visual Profiler: Computeprof -- Parallel Nsight for Microsoft Visual Studio -- The Nsight Timeline Analysis -- The NVTX Tracing Library -- Scaling Behavior of the CUDA API -- Tuning and Analysis Utilities (TAU) -- Summary -- 4 The CUDA Execution Model -- GPU Architecture Overview -- Thread Scheduling: Orchestrating Performance and Parallelism via the Execution Configuration -- Relevant computeprof Values for a Warp -- Warp Divergence -- Guidelines for Warp Divergence -- Relevant computeprof Values for Warp Divergence -- Warp Scheduling and TLP -- Relevant computeprof Values for Occupancy -- ILP: Higher Performance at Lower Occupancy -- ILP Hides Arithmetic Latency -- ILP Hides Data Latency -- ILP in the Future -- Relevant computeprof Values for Instruction Rates -- Little's Law -- CUDA Tools to Identify Limiting Factors -- The nvcc Compiler -- Launch Bounds -- The Disassembler -- PTX Kernels -- GPU Emulators -- Summary -- 5 CUDA Memory -- The CUDA Memory Hierarchy -- GPU Memory -- L2 Cache -- Relevant computeprof Values for the L2 Cache -- L1 Cache -- Relevant computeprof Values for the L1 Cache -- CUDA Memory Types -- Registers -- Local memory -- Relevant computeprof Values for Local Memory Cache -- Shared Memory -- Relevant computeprof Values for Shared Memory -- Constant Memory -- Texture Memory -- Relevant computeprof Values for Texture Memory -- Global Memory -- Common Coalescing Use Cases -- Allocation of Global Memory -- Limiting Factors in the Design of Global Memory -- Relevant computeprof Values for Global Memory -- Summary -- 6 Efficiently Using GPU Memory -- Reduction -- The Reduction Template -- A Test Program for functionReduce.h -- Results -- Utilizing Irregular Data Structures -- Sparse Matrices and the CUSP Library -- Graph Algorithms -- SoA, AoS, and Other Structures -- Tiles and Stencils -- Summary -- 7 Techniques to Increase Parallelism -- CUDA Contexts Extend Parallelism -- Streams and Contexts -- Multiple GPUs -- Explicit Synchronization -- Implicit Synchronization -- The Unified Virtual Address Space -- A Simple Example -- Profiling Results -- Out-of-Order Execution with Multiple Streams -- Tip for Concurrent Kernel Execution on the Same GPU -- Atomic Operations for Implicitly Concurrent Kernels -- Tying Data to Computation -- Manually Partitioning Data -- Mapped Memory -- How Mapped Memory Works -- Summary -- 8 CUDA for All GPU and CPU Applications -- Pathways from CUDA to Multiple Hardware Backends -- The PGI CUDA x86 Compiler -- The PGI CUDA x86 Compiler -- An x86 core as an SM -- The NVIDIA NVCC Compiler -- Ocelot -- Swan -- MCUDA -- Accessing CUDA from Other Languages -- SWIG -- Copperhead -- EXCEL -- MATLAB -- Libraries -- CUBLAS -- CUFFT -- MAGMA -- phiGEMM Library -- CURAND -- Summary -- 9 Mixing CUDA and Rendering -- OpenGL -- GLUT -- Mapping GPU Memory with OpenGL -- Using Primitive Restart for 3D Performance -- Introduction to the Files in the Framework -- The Demo and Perlin Example Kernels -- The Demo Kernel -- The Demo Kernel to Generate a Colored Sinusoidal Surface -- Perlin Noise -- Using the Perlin Noise Kernel to Generate Artificial Terrain -- The simpleGLmain.cpp File -- The simpleVBO.cpp File -- The callbacksVBO.cpp File -- Summary -- 10 CUDA in a Cloud and Cluster Environments -- The Message Passing Interface (MPI) -- The MPI Programming Model -- The MPI Communicator -- MPI Rank -- Master-Slave -- Point-to-Point Basics -- How MPI Communicates -- Bandwidth -- Balance Ratios -- Considerations for Large MPI Runs -- Scalability of the Initial Data Load -- Using MPI to Perform a Calculation -- Check Scalability -- Cloud Computing -- A Code Example -- Data Generation -- Summary -- 11 CUDA for Real Problems -- Working with High-Dimensional Data -- PCA/NLPCA -- Multidimensional Scaling -- K-Means Clustering -- Expectation-Maximization -- Support Vector Machines -- Bayesian Networks -- Mutual information -- Force-Directed Graphs -- Monte Carlo Methods -- Molecular Modeling -- Quantum Chemistry -- Interactive Workflows -- A Plethora of Projects -- Summary -- 12 Application Focus on Live Streaming Video -- Topics in Machine Vision -- 3D Effects -- Segmentation of Flesh-colored Regions -- Edge Detection -- FFmpeg -- TCP Server -- Live Stream Application -- kernelWave(): An Animated Kernel -- kernelFlat(): Render the Image on a Flat Surface -- kernelSkin(): Keep Only Flesh-colored Regions -- kernelSobel(): A Simple Sobel Edge Detection Filter -- The launch_kernel() Method -- The simpleVBO.cpp File -- The callbacksVBO.cpp File -- Building and Running the Code -- The Future -- Machine Learning -- The Connectome -- Summary -- Listing for simpleVBO.cpp -- Works Cited -- Index -- A -- B -- C -- D -- E -- F -- G -- H -- I -- J -- K -- L -- M -- N -- O -- P -- Q -- R -- S -- T -- U -- V -- W -- X |
| Title | CUDA application design and development |
| URI | https://cir.nii.ac.jp/crid/1130282269804664576 https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=806486 https://learning.oreilly.com/library/view/~/9780123884268/?ar https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9780123884329&uid=none |
| WOSCitedRecordID | wos0000264716&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest Ebooks |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=CUDA+application+design+and+development&rft.au=Farber%2C+Rob&rft.date=2011-01-01&rft.pub=Morgan+Kaufmann&rft.isbn=9780123884268&rft_id=info:doi/10.1016%2FC2010-0-69090-0&rft.externalDocID=BB07443441 |
| thumbnail_m | http://cvtisr.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fwww.safaribooksonline.com%2Flibrary%2Fcover%2F9780123884268 http://cvtisr.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fvle.dmmserver.com%2Fmedia%2F640%2F97801238%2F9780123884329.jpg |

