
Allocate The Device Memory Where We Will Copy M

The document describes the steps to perform a matrix multiplication on a CPU (host) and a GPU (device). It includes:
1) Allocating device memory for the matrices and copying the host matrix to device memory
2) Performing the multiplication on the host CPU using nested for loops
3) Initializing matrices on the host and copying them to the device
4) Launching a kernel to perform the multiplication on the device using thread blocks and grids
5) Copying the result back to host memory and freeing device memory

Uploaded by

onkarbabhale69


// Allocate the device memory where we will copy M to

Matrix Md;
Md.width = WIDTH;
Md.height = WIDTH;
Md.pitch = WIDTH;
int size = WIDTH * WIDTH * sizeof(float);
cudaMalloc((void**)&Md.elements, size);
// Copy M from the host to the device
cudaMemcpy(Md.elements, M.elements, size, cudaMemcpyHostToDevice);
// Read M from the device to the host into P
cudaMemcpy(P.elements, Md.elements, size, cudaMemcpyDeviceToHost);
...
// Free device memory
cudaFree(Md.elements);
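
The listing above uses a Matrix type that is never defined in this document. The following is a minimal sketch of what it might look like, assuming row-major float storage. The field names come from the code above; the field types and the helpers MatrixBytes and MatrixIndex are hypothetical and only illustrate how the buffer size and element offsets are computed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the Matrix type. Field names are taken from the
   listing; the types are an assumption. */
typedef struct {
    int width;       /* number of columns */
    int height;      /* number of rows */
    int pitch;       /* elements per stored row; equals width in this document */
    float *elements; /* row-major element buffer */
} Matrix;

/* Hypothetical helper: bytes needed for the element buffer. For a square
   WIDTH x WIDTH matrix this equals WIDTH * WIDTH * sizeof(float). */
size_t MatrixBytes(Matrix m)
{
    assert(m.height >= 0 && m.pitch >= m.width);
    return (size_t)m.pitch * (size_t)m.height * sizeof(float);
}

/* Hypothetical helper: linear offset of element (row, col). */
size_t MatrixIndex(Matrix m, int row, int col)
{
    assert(row >= 0 && row < m.height);
    assert(col >= 0 && col < m.width);
    return (size_t)row * (size_t)m.pitch + (size_t)col;
}
```

With this layout, element (2, 3) of a 4 x 4 matrix sits at offset 2 * 4 + 3 = 11, which is the same row * pitch + column pattern the kernel in Step 4 uses.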

Step 2: Simple Host Code in C

// Matrix multiplication on the (CPU) host in double precision
// for simplicity, we will assume that all dimensions are equal
void MatrixMulOnHost(const Matrix M, const Matrix N, Matrix P)
{
    for (int i = 0; i < M.height; ++i)
        for (int j = 0; j < N.width; ++j) {
            double sum = 0;
            for (int k = 0; k < M.width; ++k) {
                double a = M.elements[i * M.width + k];
                double b = N.elements[k * N.width + j];
                sum += a * b;
            }
            P.elements[i * N.width + j] = sum;
        }
}
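
As a quick sanity check of the triple loop, here is a small worked example on 2 x 2 inputs. It is only a sketch: the Matrix layout is an assumption (the document does not define it), MatrixMulOnHost is repeated so the block compiles on its own, and Example2x2 is a hypothetical helper added for illustration.

```c
#include <assert.h>

/* Assumed Matrix layout; field names follow the listings. */
typedef struct {
    int width, height, pitch;
    float *elements;
} Matrix;

/* Same triple loop as the host routine above. */
void MatrixMulOnHost(const Matrix M, const Matrix N, Matrix P)
{
    assert(M.width == N.height); /* inner dimensions must agree */
    for (int i = 0; i < M.height; ++i)
        for (int j = 0; j < N.width; ++j) {
            double sum = 0;
            for (int k = 0; k < M.width; ++k) {
                double a = M.elements[i * M.width + k];
                double b = N.elements[k * N.width + j];
                sum += a * b;
            }
            P.elements[i * N.width + j] = (float)sum;
        }
}

/* Hypothetical helper: multiplies [1 2; 3 4] by [5 6; 7 8] and
   returns the entry of the product at (row, col). */
float Example2x2(int row, int col)
{
    float m[4] = {1, 2, 3, 4};
    float n[4] = {5, 6, 7, 8};
    float p[4] = {0, 0, 0, 0};
    Matrix M = {2, 2, 2, m};
    Matrix N = {2, 2, 2, n};
    Matrix P = {2, 2, 2, p};
    MatrixMulOnHost(M, N, P);
    return p[row * 2 + col];
}
```

The product is [19 22; 43 50]; for instance, the top-left entry is 1 * 5 + 2 * 7 = 19.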

Step 3: Host-side Main Program Code

int main(void) {
    // Allocate and initialize the matrices
    Matrix M = AllocateMatrix(WIDTH, WIDTH, 1);
    Matrix N = AllocateMatrix(WIDTH, WIDTH, 1);
    Matrix P = AllocateMatrix(WIDTH, WIDTH, 0);
    // M * N on the device
    MatrixMulOnDevice(M, N, P);
    // Free matrices
    FreeMatrix(M);
    FreeMatrix(N);
    FreeMatrix(P);
    return 0;
}
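
AllocateMatrix and FreeMatrix are called above but not shown anywhere in this document. One plausible shape for them is sketched below. Everything in it is an assumption: the argument order (height, width, init), the meaning of the third argument (0 zero-fills, any other value fills with pseudo-random values), and the Matrix layout itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed Matrix layout; field names follow the listings. */
typedef struct {
    int width, height, pitch;
    float *elements;
} Matrix;

/* Hypothetical host allocator. Assumption: init == 0 leaves the matrix
   zero-filled; any other value fills it with values in [0, 1].
   On allocation failure, elements is NULL and the caller must check. */
Matrix AllocateMatrix(int height, int width, int init)
{
    assert(height > 0 && width > 0);
    Matrix m;
    m.width = width;
    m.height = height;
    m.pitch = width;
    size_t count = (size_t)width * (size_t)height;
    m.elements = (float *)calloc(count, sizeof(float)); /* zero-filled */
    if (m.elements != NULL && init != 0) {
        for (size_t i = 0; i < count; ++i)
            m.elements[i] = (float)rand() / (float)RAND_MAX;
    }
    return m;
}

/* Hypothetical counterpart: releases the host buffer. The struct is
   passed by value, as in the calls above, so the caller's pointer is
   left dangling and must not be reused. */
void FreeMatrix(Matrix m)
{
    free(m.elements);
}
```

Under these assumptions, M and N in the main program start with random contents while P starts at zero, which matches the "Clear memory" comment on the copy of P to the device.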
Host-side code
// Matrix multiplication on the device
void MatrixMulOnDevice(const Matrix M, const Matrix N, Matrix P)
{
    // Load M and N to the device
    Matrix Md = AllocateDeviceMatrix(M);
    CopyToDeviceMatrix(Md, M);
    Matrix Nd = AllocateDeviceMatrix(N);
    CopyToDeviceMatrix(Nd, N);
    // Allocate P on the device
    Matrix Pd = AllocateDeviceMatrix(P);
    CopyToDeviceMatrix(Pd, P); // Clear memory
    // Set up the execution configuration: a single block of WIDTH x WIDTH
    // threads, one thread per element of P. This only works while
    // WIDTH * WIDTH stays within the device's threads-per-block limit
    // (1024 on current GPUs).
    dim3 dimBlock(WIDTH, WIDTH);
    dim3 dimGrid(1, 1);
    // Launch the device computation threads!
    MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd);
    // Read P from the device
    CopyFromDeviceMatrix(P, Pd);
    // Free device matrices
    FreeDeviceMatrix(Md);
    FreeDeviceMatrix(Nd);
    FreeDeviceMatrix(Pd);
}

Step 4: Device-side Kernel Function

// Matrix multiplication kernel – thread specification
__global__ void MatrixMulKernel(Matrix M, Matrix N, Matrix P)
{
    // 2D Thread ID
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // Pvalue is used to store the element of the matrix
    // that is computed by the thread
    float Pvalue = 0;
    for (int k = 0; k < M.width; ++k)
    {
        float Melement = M.elements[ty * M.pitch + k];
        float Nelement = N.elements[k * N.pitch + tx];
        Pvalue += Melement * Nelement;
    }
    // Write the matrix to device memory;
    // each thread writes one element
    P.elements[ty * P.pitch + tx] = Pvalue;
}
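
To see what each thread contributes, the kernel can be emulated on the CPU by looping over every (tx, ty) pair that the single thread block would cover. The sketch below is plain C, not CUDA: EmulateThread, EmulateKernel and EmulatedExample are hypothetical names, the Matrix layout is an assumption, and the serial loops only stand in for threads that would run in parallel on the device.

```c
#include <assert.h>

/* Assumed Matrix layout; field names follow the listings. */
typedef struct {
    int width, height, pitch;
    float *elements;
} Matrix;

/* Work of the one thread with indices (tx, ty): a single dot product
   of row ty of M with column tx of N. */
static void EmulateThread(Matrix M, Matrix N, Matrix P, int tx, int ty)
{
    float Pvalue = 0;
    for (int k = 0; k < M.width; ++k) {
        float Melement = M.elements[ty * M.pitch + k];
        float Nelement = N.elements[k * N.pitch + tx];
        Pvalue += Melement * Nelement;
    }
    P.elements[ty * P.pitch + tx] = Pvalue;
}

/* Serial stand-in for one block of blockDimX x blockDimY threads. */
void EmulateKernel(Matrix M, Matrix N, Matrix P, int blockDimX, int blockDimY)
{
    /* One thread per output element, as in the launch configuration. */
    assert(blockDimX == P.width && blockDimY == P.height);
    for (int ty = 0; ty < blockDimY; ++ty)
        for (int tx = 0; tx < blockDimX; ++tx)
            EmulateThread(M, N, P, tx, ty);
}

/* Hypothetical check: [1 2; 3 4] times [5 6; 7 8], entry (row, col). */
float EmulatedExample(int row, int col)
{
    float m[4] = {1, 2, 3, 4};
    float n[4] = {5, 6, 7, 8};
    float p[4] = {0, 0, 0, 0};
    Matrix M = {2, 2, 2, m};
    Matrix N = {2, 2, 2, n};
    Matrix P = {2, 2, 2, p};
    EmulateKernel(M, N, P, 2, 2);
    return p[row * 2 + col];
}
```

Each (tx, ty) pair writes exactly one element of P and reads nothing written by another pair, which is why the real kernel needs no synchronization between threads.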
