Introduction to Graphics Hardware and GPUs

Graphics hardware has evolved to efficiently process the large amounts of data required for real-time 3D graphics. Modern graphics processing units (GPUs) use stream programming and parallel processing across many cores to achieve high performance. The graphics pipeline includes programmable vertex and fragment shaders that can be written in languages like OpenGL Shading Language and compute shaders for general-purpose GPU programming in languages like CUDA and OpenCL. GPUs provide high memory bandwidth and throughput for graphics as well as non-graphics applications that can be expressed in a data-parallel manner.

Uploaded by

dagush

Introduction to Graphics Hardware and GPUs
Gustavo Patow
IMAE UdG

Overview

Definition
Motivation
History of Graphics Hardware
Graphics Pipeline
Vertex, Geometry and Fragment Shaders
Modern Graphics Hardware
Stream Programming
GPU Stream Programming
Languages
More Information

Definition

Logical Representation of Visual Information

Output Signal

Motivation
Real time: 15-60 fps
High resolution

Motivation
High CPU load
Physics, AI, sound, network, ...

Graphics demand:
Fast memory access
Many lookups [vertices, normals, textures, ...]

High bandwidth usage
A few GB/s needed in regular cases!

Large number of flops
Flops = floating-point operations [ADD, MUL, SUB, ...]
Illustration: matrix-vector products
(16 MUL + 12 ADD) x (#vertices + #normals) x fps =
(28 flops) x (6,000,000) x 30 ≈ 5 GFlops per second

Conclusion: real-time graphics needs supporting hardware!
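
The estimate above can be reproduced with a few lines of C++. This is only a sketch of the slide's arithmetic; the names `kFlopsPerMatVec` and `requiredFlopsPerSecond` are illustrative and do not come from the slides.

```cpp
#include <cstdint>

// Operation count of one 4x4 matrix-vector product, as counted on the slide:
// 16 multiplications plus 12 additions.
constexpr std::uint64_t kFlopsPerMatVec = 16 + 12;

// Floating-point operations per second needed to transform every vertex and
// normal once per frame (hypothetical helper, not part of any graphics API).
constexpr std::uint64_t requiredFlopsPerSecond(std::uint64_t vectorsPerFrame,
                                               std::uint64_t framesPerSecond) {
    return kFlopsPerMatVec * vectorsPerFrame * framesPerSecond;
}
```

With 6,000,000 vertices plus normals at 30 fps, `requiredFlopsPerSecond(6000000, 30)` evaluates to 5,040,000,000, which is the roughly 5 GFlops per second quoted above.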

History of Graphics Hardware
Until the mid-90s
SGI mainframes and workstations
PC: only 2D graphics hardware

Mid-90s
Consumer 3D graphics hardware (PC)
3dfx, nVidia, Matrox, ATI, ...

Triangle rasterization (only)
Cheap: pushed by game industry

1999
PC card with TnL [Transform and Lighting]
nVidia GeForce: Graphics Processing Unit (GPU)

PC card more powerful than specialized workstations

Modern graphics hardware
Graphics pipeline partly programmable
Leaders: ATI and nVidia
Game consoles similar to GPUs
Xbox, PS2, Xbox 360, PS3, Xbox One, PS4, ...

Graphics Pipeline

Application
LOD selection
Frustum culling
Portal culling

Geometry Processing
Modelview/projection transform
Lighting
Primitive assembly
Backface culling
Clipping
Division by w
Viewport transform

Rasterization
Scan conversion
Fragment shading [color and texture interpolation]
Frame buffer ops [Z-buffer, alpha blending, ...]

Output
Output to device
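
Three of the geometry-processing steps listed above (modelview/projection transform, division by w, viewport transform) are plain arithmetic and can be sketched on the CPU. The types and function names below are illustrative, and the conventions (row-major matrix, normalized device coordinates in [-1, 1], depth remapped to [0, 1]) are assumptions in the style of OpenGL, not something the slides specify.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // assumed row-major: m[row][col]

struct ScreenPoint { float x, y, depth; };

// Modelview/projection transform: one 4-component dot product per output row.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += m[row][col] * v[col];
    return out;
}

// Division by w followed by the viewport transform.
// Assumes clip[3] != 0, i.e. clipping has already discarded such points.
ScreenPoint toViewport(const Vec4& clip, float width, float height) {
    const float invW = 1.0f / clip[3];
    const float ndcX = clip[0] * invW;
    const float ndcY = clip[1] * invW;
    const float ndcZ = clip[2] * invW;
    return { (ndcX * 0.5f + 0.5f) * width,
             (ndcY * 0.5f + 0.5f) * height,
             ndcZ * 0.5f + 0.5f };
}
```

With an identity matrix, the point (0, 0, 0, 1) lands in the middle of a 640 x 480 viewport at (320, 240) with depth 0.5.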

Graphics Pipeline
Programmable stages are shown in capitals

Application
LOD selection
Frustum culling
Portal culling

Geometry Processing
VERTEX & GEOMETRY SHADERS [replace the modelview/projection transform and lighting]
Primitive assembly
Backface culling
Clipping
Division by w
Viewport transform

Rasterization
Scan conversion
FRAGMENT SHADER [replaces fragment shading]

Output
Output to device

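Vertex and Fragment Shaders

Vertex shader input (per vertex)
Position (x, y, z, w)
Normal (nx, ny, nz)
Texture coordinates (s, t, r, q)
Color (r, g, b, a)

Vertex shader output (per vertex)
Position (x, y, z, w)
Normal (nx, ny, nz)
Texture coordinates (s, t, r, q)
Color (r, g, b, a)

Fragment shader input (per fragment)
Position (x, y)
Color (r, g, b, a)
Depth (depth)

Fragment shader output (per fragment)
Position (x, y)
Color (r, g, b, a)
Depth (depth)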

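(More) Complete System Architecture

Logical pipeline (programmer's view)
Input Assembler -> Vertex Shader -> Geometry Shader -> Setup/Rasterization -> Pixel Shader -> Output Merger

Memory
Vertex buffer, index buffer: read by the Input Assembler
Texture: read by the Vertex, Geometry and Pixel Shaders
Buffer: written by the Geometry Shader through Stream Out
Color, depth: accessed by the Output Merger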

Geometry Shader
Entire primitive as input
Adjacency optional

Outputs zero or more primitives
1024 scalars out max
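
The "whole primitive in, zero or more primitives out" contract can be illustrated with a CPU-side function. This is a hypothetical emulation in C++, not shader code: `geometryStage` and its types are made up for the example, it works on 2D triangles only, and it does not model adjacency information or the 1024-scalar output limit.

```cpp
#include <vector>

struct Point { float x, y; };
struct Triangle { Point a, b, c; };

// Twice the signed area; positive for counter-clockwise winding.
float signedArea2(const Triangle& t) {
    return (t.b.x - t.a.x) * (t.c.y - t.a.y) -
           (t.c.x - t.a.x) * (t.b.y - t.a.y);
}

Point midpoint(Point p, Point q) {
    return { (p.x + q.x) * 0.5f, (p.y + q.y) * 0.5f };
}

// Receives one entire primitive and emits zero or more primitives:
// clockwise or degenerate triangles are dropped, all others are split
// into four smaller triangles by connecting the edge midpoints.
std::vector<Triangle> geometryStage(const Triangle& t) {
    if (signedArea2(t) <= 0.0f) return {};  // emit nothing
    const Point ab = midpoint(t.a, t.b);
    const Point bc = midpoint(t.b, t.c);
    const Point ca = midpoint(t.c, t.a);
    return { Triangle{t.a, ab, ca}, Triangle{ab, t.b, bc},
             Triangle{ca, bc, t.c}, Triangle{ab, bc, ca} };
}
```

A vertex shader could not express either case, because it maps exactly one vertex to one vertex.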

Full Pipeline

Possibilities

PARALLELISM

Modern Graphics Hardware
GPU = Graphics Processing Unit
Vector processor
Operates on 4-tuples
Position (x, y, z, w)
Color (red, green, blue, alpha)
Texture coordinates (s, t, r, q)

4-tuple ops, 1 clock cycle
SIMD [Single Instruction Multiple Data]
ADD, MUL, SUB, DIV, MADD, ...
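
As a rough picture of what such 4-tuple instructions compute, here is a scalar C++ emulation. The `Float4` type and the function names are illustrative only; on the hardware described above each of these would be a single instruction, whereas here the four lanes are simply written out.

```cpp
// One 4-tuple register: a position, a color or a texture coordinate.
struct Float4 { float x, y, z, w; };

// Component-wise addition (ADD).
Float4 add(Float4 a, Float4 b) {
    return { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
}

// Component-wise multiplication (MUL).
Float4 mul(Float4 a, Float4 b) {
    return { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
}

// Multiply-add (MADD): a * b + c in every lane.
Float4 madd(Float4 a, Float4 b, Float4 c) {
    return add(mul(a, b), c);
}

// 4-component dot product; four of these form a matrix-vector product.
float dot4(Float4 a, Float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```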

Modern Graphics Hardware
Pipelining
Number of stages

Parallelism
Number of parallel processes

Parallelism + pipelining
Number of parallel pipelines
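
How the three options differ can be seen in a small cycle-count model. It is deliberately idealized and is an assumption of this sketch, not a claim from the slides: every stage takes exactly one cycle, there are no stalls, and work divides evenly across units. All names are illustrative.

```cpp
#include <cstdint>

using Cycles = std::uint64_t;

// Ceiling division, used to split items across parallel units.
constexpr Cycles ceilDiv(Cycles a, Cycles b) { return (a + b - 1) / b; }

// One unit, no overlap: every item passes through all stages alone.
constexpr Cycles sequentialCycles(Cycles items, Cycles stages) {
    return items * stages;
}

// Pipelining: fill the pipeline once, then one item completes per cycle.
constexpr Cycles pipelinedCycles(Cycles items, Cycles stages) {
    return items == 0 ? 0 : stages + items - 1;
}

// Parallelism only: several non-pipelined units share the items.
constexpr Cycles parallelCycles(Cycles items, Cycles stages, Cycles units) {
    return ceilDiv(items, units) * stages;
}

// Parallelism + pipelining: several pipelines share the items.
constexpr Cycles parallelPipelinedCycles(Cycles items, Cycles stages,
                                         Cycles pipelines) {
    return pipelinedCycles(ceilDiv(items, pipelines), stages);
}
```

For 8 items and 4 stages the model gives 32 cycles sequentially, 11 with pipelining, 16 with two parallel units, and 7 with two parallel pipelines.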


Modern Graphics Hardware
Parallelism + pipelining: ATI Radeon 9700
4 vertex pipelines
8 pixel pipelines

Modern Graphics Hardware
Features of GeForce GTX 480
Core speed: 700 MHz
Shader clock: 1400 MHz
480 shader processors
60 texture units
177 GB/s memory bandwidth
1536 MB memory

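Modern Graphics Hardware
High Memory Bandwidth
[Figure: block diagram of the graphics card and the host. Recoverable labels: Graphics Card with GPU (650 MHz), parallel processes, graphics memory and output; Processor Chip with CPU (3 GHz) and cache; main memory (1 GB) and AGP memory; AGP bus (2 GB/s); links labelled "High bandwidth" at 77 GB/s and 51 GB/s, and a 3 GB/s link; GPGPU. The sizes of the graphics memory, cache and AGP memory were lost in extraction.]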

Stream Programming
Input: stream of data records
Output: stream(s) of data records
Kernel: operates sequentially on the data records, accessing one record at a time!
Read-only memory: record-independent read-only memory
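
The stream model can be sketched on the CPU as a function that applies a kernel to each record in turn. `runStream`, `ScaleBias` and `scaleAndBias` are hypothetical names for this illustration; a GPU would run many kernel invocations in parallel, which is possible precisely because each invocation sees only its own record plus the shared read-only data.

```cpp
#include <vector>

// Applies `kernel` once per input record and collects the output stream.
// The kernel receives only the current record and the read-only data.
template <typename In, typename ReadOnly, typename Kernel>
auto runStream(const std::vector<In>& input, const ReadOnly& readOnly,
               Kernel kernel)
    -> std::vector<decltype(kernel(input[0], readOnly))> {
    std::vector<decltype(kernel(input[0], readOnly))> output;
    output.reserve(input.size());
    for (const In& record : input)
        output.push_back(kernel(record, readOnly));
    return output;
}

// Example read-only memory: the same for every record in the stream.
struct ScaleBias { float scale, bias; };

// Example kernel: depends on one record and the read-only data, nothing else.
float scaleAndBias(float record, const ScaleBias& params) {
    return record * params.scale + params.bias;
}
```

Calling `runStream(std::vector<float>{1.0f, 2.0f, 3.0f}, ScaleBias{2.0f, 1.0f}, scaleAndBias)` yields the output stream 3, 5, 7.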

GPU Stream Programming
Vertex Shader
Input and output streams
Vertices, normals, colors, texture coordinates

Read-only memory
Uniform variables
Uniform = constant per stream
Textures, floats, ints, arrays, ...

Fragment Shader
Input and output streams
Pixels
Z values

Read-only memory
See above

Languages

Assembly
Cg [C for Graphics]
HLSL [High Level Shading Language]
GLSL [OpenGL Shading Language]
Sh, BrookGPU (obsolete?)
CUDA
OpenCL

Assembly

Specialized instruction set
DP4: 4-tuple dot product
RSQ: reciprocal square root
MAD: multiply and add
DPH: homogeneous dot product
SCS: sine and cosine
LRP: linear interpolate
TEX: texture map

Nowadays, not used directly anymore
Generated by high-level language compilers

!!ARBvp1.0
ATTRIB pos = vertex.position;
PARAM mat[4] = { state.matrix.mvp };
# Transform by concatenation of the
# MODELVIEW and PROJECTION matrices.
DP4 result.position.x, mat[0], pos;
DP4 result.position.y, mat[1], pos;
DP4 result.position.z, mat[2], pos;
DP4 result.position.w, mat[3], pos;
# Pass the primary color through w/o
# lighting.
MOV result.color, vertex.color;
END

Cg/HLSL/GLSL

High-level programming language
Static conditional jumps
if, while, for, ...
Data-dependent conditional jumps
SIMD fragment shader: only efficient in case of coherent program flow!
No pointers!
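
Why coherent program flow matters can be shown with a toy cost model. It rests on a simplifying assumption that is not stated in the slides: a group of fragments executes in lockstep, and when the fragments in a group disagree on a branch, the group executes both sides with the inactive fragments masked out. The function name and the cycle costs are hypothetical.

```cpp
#include <vector>

// Cycles a lockstep group spends on `if (cond) A else B`.
// takesIf[i] says whether fragment i takes the `if` side.
// Simplified model: each side that at least one fragment needs is executed
// once for the whole group.
int branchCost(const std::vector<bool>& takesIf, int costIf, int costElse) {
    bool anyIf = false;
    bool anyElse = false;
    for (bool taken : takesIf) {
        if (taken) anyIf = true;
        else anyElse = true;
    }
    return (anyIf ? costIf : 0) + (anyElse ? costElse : 0);
}
```

Under this model a coherent group pays for one side only (10 or 30 cycles in a 10/30 split), while a single disagreeing fragment makes the whole group pay 40.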

struct appdata {
    float4 position : POSITION;
    float3 normal : NORMAL;
    float3 color : DIFFUSE;
    float3 VertexColor : SPECULAR;
};
struct vfconn {
    float4 HPOS : POSITION;
    float4 COL0 : COLOR0;
};
vfconn main(appdata IN,
            uniform float4 Kd,
            uniform float4x4 mvp) {
    vfconn OUT;
    OUT.HPOS = mul(mvp, IN.position);
    OUT.COL0.xyz = Kd.xyz * IN.VertexColor.xyz;
    OUT.COL0.w = 1.0;
    return OUT;
}
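
For readers without a Cg toolchain, the per-vertex computation of the program above can be mirrored in plain C++. This is a CPU-side sketch, not how the shader is actually run: the types and the name `vertexMain` are illustrative, and the matrix is assumed to be row-major so that `mul(mvp, IN.position)` becomes four row-by-vector dot products.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // assumed row-major: m[row][col]

// Mirrors `appdata`: the per-vertex input record.
struct AppData {
    Vec4 position;
    Vec3 normal;
    Vec3 color;
    Vec3 vertexColor;
};

// Mirrors `vfconn`: the record handed on towards the fragment stage.
struct VfConn {
    Vec4 hpos;
    Vec4 col0;
};

// Kd and mvp play the role of the `uniform` parameters: constant for the
// whole stream of vertices.
VfConn vertexMain(const AppData& in, const Vec4& Kd, const Mat4& mvp) {
    VfConn out{};
    for (int row = 0; row < 4; ++row)          // OUT.HPOS = mul(mvp, IN.position)
        for (int col = 0; col < 4; ++col)
            out.hpos[row] += mvp[row][col] * in.position[col];
    for (int i = 0; i < 3; ++i)                // OUT.COL0.xyz = Kd.xyz * VertexColor.xyz
        out.col0[i] = Kd[i] * in.vertexColor[i];
    out.col0[3] = 1.0f;                        // OUT.COL0.w = 1.0
    return out;
}
```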

Sh

Shader code embedded in C++

// C++ Code
vsh = SH_BEGIN_VERTEX_PROGRAM {
    ShInputNormal3f normal;
    ShInputPosition4f p;
    ShOutputPoint4f ov;
    ShOutputNormal3f on;
    ShOutputVector3f lvv;
    ShOutputPosition4f opd;
    opd = Globals::mvp | p;
    on = normalize(Globals::mv | normal);
    ov = normalize(Globals::mv | p);
    lvv = normalize(Globals::lightPos -
                    (Globals::mv | p)(0,1,2));
} SH_END_PROGRAM;
fsh = SH_BEGIN_FRAGMENT_PROGRAM {
    ShInputVector4f v;
    ShInputNormal3f n;
    ShInputVector3f lvv;
    ShInputPosition4f p;
    ShOutputColor3f out;
    out(0,1,2) = Globals::color *
                 dot(normalize(n), normalize(lvv));
} SH_END_PROGRAM;

BrookGPU

GPGPU language: General-Purpose GPU language
Brook: streaming extension of C
BrookGPU: GPU port of Brook
No computer graphics knowledge required!

kernel void k(float s<>, float3 f, float a[10][10], out float o<>);
float a<100>;
float b<100>;
float c<10,10>;
streamRead(a, data1);
streamRead(b, data2);
streamRead(c, data3);
// Call kernel "k"
k(a, 3.2f, c, b);
streamWrite(b, result);

Programming languages: CUDA
CUDA: Compute Unified Device Architecture
Requires an Nvidia GPU and drivers

General purpose
GPUs have a parallel "many-core" architecture
Each core is capable of running thousands of threads simultaneously

Bindings for C++, Python, etc.

Programming languages: CUDA
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs
It uses the standard C language, with some extensions
Scattered writes: code can write to arbitrary addresses in memory
Shared memory: CUDA exposes a fast shared memory region (16 KB in size) that can be shared amongst threads
This can be used as a user-managed cache
Enabling higher bandwidth than is possible using texture lookups

Faster downloads and readbacks to and from the GPU

Full support for integer and bitwise operations, including integer texture lookups

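CUDA example: load a texture from an image into an array on the GPU
// Uses the legacy texture-reference API, which newer CUDA toolkits have
// deprecated and later removed.
// d_odata is assumed to be a device buffer of width * height floats,
// allocated elsewhere.
texture<float, 2> tex;

__global__ void kernel(float* odata, int width, int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    float c = tex2D(tex, x, y);
    odata[y * width + x] = c;
}

// Host code
cudaArray* cu_array;
// Allocate array
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaMallocArray(&cu_array, &desc, width, height);
// Copy image data to array (row pitch and row width are given in bytes)
cudaMemcpy2DToArray(cu_array, 0, 0, image, width * sizeof(float),
                    width * sizeof(float), height, cudaMemcpyHostToDevice);
// Bind the array to the texture
cudaBindTextureToArray(tex, cu_array);
// Run kernel (assumes width and height are multiples of 16)
dim3 blockDim(16, 16, 1);
dim3 gridDim(width / blockDim.x, height / blockDim.y, 1);
kernel<<<gridDim, blockDim, 0>>>(d_odata, width, height);
cudaUnbindTexture(tex);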
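
The launch in the example sizes the grid with the integer division `width / blockDim.x`, which rounds down, so the whole image is covered only when its dimensions are multiples of the block size. The C++ sketch below emulates the index computation `blockIdx.x * blockDim.x + threadIdx.x` on the CPU in one dimension to show the effect; `blocksFor` and `coverage` are illustrative helpers, not CUDA API functions.

```cpp
#include <vector>

// Blocks needed to cover n elements with blockSize threads each
// (ceiling division instead of the truncating n / blockSize).
unsigned blocksFor(unsigned n, unsigned blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Emulates a 1D launch and counts how often each element in [0, n) is
// visited by some thread.
std::vector<int> coverage(unsigned n, unsigned blockDim, unsigned gridDim) {
    std::vector<int> visits(n, 0);
    for (unsigned blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (unsigned threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            const unsigned x = blockIdx * blockDim + threadIdx;
            if (x < n) ++visits[x];  // bounds guard for the last, partial block
        }
    }
    return visits;
}
```

For 100 elements and blocks of 16 threads, the truncating division launches 6 blocks and leaves elements 96 to 99 untouched, while `blocksFor(100, 16)` gives 7 blocks and, together with the bounds guard, visits every element exactly once.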


OpenCL (Open Computing Language)
A framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors
Includes
A language (based on C99) for writing kernels (functions that execute on OpenCL devices)
APIs that are used to define and then control the platforms
OpenCL provides parallel computing using task-based and data-based parallelism

OpenCL is analogous to
OpenGL, for 3D graphics
OpenAL, for computer audio

OpenCL extends the power of the GPU beyond graphics (GPGPU)
OpenCL is managed by the non-profit technology consortium Khronos Group

Screenshots
nVidia Toolkit [Reflection Bump Mapping]

Screenshots
Crysis [Crytek]

Screenshots
NPR [ATI Research Group]

More Information
nVidia
http://developer.nvidia.com/

ATI
http://www.ati.com/developer/

General-Purpose GPU Programming
http://www.gpgpu.org

GPU Programming and Architecture
http://www.cis.upenn.edu/~suvenkat/700/

Hardware
http://www.beyond3d.com
http://www.tomshardware.com

Questions?
