
Scalable Parallel Programming with CUDA
John Nickolls, Ian Buck, Michael Garland and Kevin Skadron

Presentation by Christian Hansen

Article Published in ACM Queue, March 2008


Outline
● About the Authors
● Hardware Platform
● What is CUDA?
● Programming in CUDA
● GPGPU/MC Programming Approaches
● Conclusion
About the Authors
● John Nickolls
– Director of Architecture at NVIDIA
– MS and PhD degrees in electrical engineering from Stanford University
– Previously at Broadcom and Sun Microsystems
● Ian Buck
– GPU-Compute Software Manager at NVIDIA
– PhD in computer science from Stanford
– Has previously worked on Brook


About the Authors
● Michael Garland
– Research scientist at NVIDIA
– PhD in computer science from Carnegie Mellon University
● Kevin Skadron
– Associate Professor of Computer Science at the University of Virginia, currently on sabbatical at NVIDIA Research
– PhD in computer science from Princeton University


Hardware Platform
● GPU vs. CPU
– GPU: small instruction set, but very fast execution; uses very fast GDDR3 RAM
– CPU: rich instruction set, but slower execution; uses slower DDR2 or DDR3 RAM (but has direct access to more memory than GPUs)
What is CUDA?
● CUDA is a minimal extension to C and C++ (like Cilk, but not quite as easy)
● A serial program calls parallel kernels, which may be a single function or a full program
● Function type qualifiers
– __device__, __global__, __host__
● Variable type qualifiers
– __device__, __constant__, __shared__ (see the sketch below)
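A minimal sketch, not from the article, of how these qualifiers might appear in a .cu file (__host__ is the default for ordinary functions and is omitted here); the kernel and helper names are invented for illustration:

__constant__ float scale;              // constant memory, written by the host

__device__ float scaled(float v)       // callable only from device code
{
    return scale * v;
}

// Kernel, launched from the host; computes one partial sum per block.
// Assumes blocks of at most 256 threads.
__global__ void block_sum(const float *in, float *out, int n)
{
    __shared__ float tile[256];        // shared memory, one copy per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? scaled(in[i]) : 0.0f;
    __syncthreads();                   // wait until the whole block has written
    if (threadIdx.x == 0) {            // thread 0 sums its block's values
        float s = 0.0f;
        for (int j = 0; j < blockDim.x; ++j)
            s += tile[j];
        out[blockIdx.x] = s;
    }
}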
What is CUDA?
● Kernels execute over a set of parallel threads
● Threads are organized in a hierarchy of grids of thread blocks
● Blocks can have up to 3 dimensions and contain up to 512 threads
– Threads within a block can communicate
● Grids can have up to 2 dimensions and 65,535 blocks per dimension (see the launch sketch below)
– No communication between blocks
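A hedged sketch of what a multidimensional launch configuration might look like; the kernel, its body, and the image parameters are hypothetical:

// Hypothetical 2-D kernel: one thread per pixel of a width x height image.
__global__ void brighten(float *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        img[y * width + x] += 0.1f;
}

// Host-side launch (inside a host function; img is a device pointer):
dim3 block(16, 16);                           // 16 x 16 = 256 threads per block (limit: 512)
dim3 grid((width + block.x - 1) / block.x,    // round up so every pixel is covered
          (height + block.y - 1) / block.y);
brighten<<<grid, block>>>(img, width, height);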
Programming in CUDA
● Computing y <- ax + y (SAXPY) with a serial loop:

void saxpy_serial(int n, float alpha, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = alpha*x[i] + y[i];
}

// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);

● Computing y <- ax + y in parallel using CUDA:

__global__
void saxpy_parallel(int n, float alpha, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = alpha*x[i] + y[i];
}

// Invoke parallel SAXPY kernel (256 threads per block)
int nblocks = (n + 255) / 256;
saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
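The slide shows only the kernel and its launch; a hedged sketch of the host-side memory management such a launch assumes, using the standard CUDA runtime calls (error checking omitted for brevity):

float *d_x, *d_y;                                    // device copies of x and y
cudaMalloc((void**)&d_x, n * sizeof(float));
cudaMalloc((void**)&d_y, n * sizeof(float));
cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

int nblocks = (n + 255) / 256;                       // round up to cover all n elements
saxpy_parallel<<<nblocks, 256>>>(n, 2.0f, d_x, d_y);

cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);   // copy result back
cudaFree(d_x);
cudaFree(d_y);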
Other Applications
● Lots of different examples on nvidia.com
– Examples include image analysis (e.g. facial recognition), MRI mapping, ray tracing, neural networks, and molecular dynamics simulation
– Speed-ups range from 1.3x (numerical weather prediction) to 250x (graphics-card cluster for astrophysics simulations)
N-Body Simulation
GPGPU/MC Approaches
● OpenCL
● CTM
● RapidMind
Conclusion
● Extremely high (and cheap) processing power
– 8800GTS: 640 GFLOP/s
– Core2Duo 2.66GHz: 17 GFLOP/s
– Core2Quad 3GHz (3,500 kr): 43 GFLOP/s
– 2 × 8800GT (2,000 kr): 1 TFLOP/s
– 8600GTM: 30 GFLOP/s
Conclusion
● Is GPGPU taking over from multi-core CPUs?
– No (not yet, anyway)
● GPGPU programming has some problems
– Only seems worthwhile for large applications
– When is it worth moving a computation to the GPU?
– Possible problems with optimization
– Most programmers are not used to working with GPUs
● Many rumors in the press about a unified CPU and GPU in the future, but nothing confirmed yet
Presenter's Opinion
● Nice article, well written
● Gives good insight into what CUDA is, but the hardware description is lacking
● Reads like a sales pitch: it does not mention possible problems with CUDA
All Done
● Thank you
