Introduction to Graphics Hardware and GPUs
Gustavo Patow
IMAE UdG
Overview
Definition
Motivation
History of Graphics Hardware
Graphics Pipeline
Vertex, Geometry and Fragment Shaders
Modern Graphics Hardware
Stream Programming
GPU Stream Programming
Languages
More Information
Definition
Logical representation of visual information -> output signal
Motivation
Real time: 15-60 fps
High resolution
Motivation
High CPU load
Physics, AI, sound, network, ...
Graphics demand:
Fast memory access
Many lookups [vertices, normals, textures, ...]
High bandwidth usage
A few GB/s needed in regular cases!
Large number of flops
Flops = floating-point operations [ADD, MUL, SUB, ...]
Illustration: matrix-vector products
(16 MUL + 12 ADD) x (#vertices + #normals) x fps =
(28 flops) x (6,000,000) x 30 ≈ 5 GFlops
Conclusion: real-time graphics needs supporting hardware!
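The estimate above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the function name `required_flops` is made up for illustration, and the 6,000,000 figure is taken from the slide as the combined number of vertices and normals.

```python
# Cost of one 4x4 matrix times 4-vector product, as counted on the slide.
MULS_PER_PRODUCT = 16
ADDS_PER_PRODUCT = 12


def required_flops(num_vectors: int, fps: int) -> int:
    """Floating-point operations per second needed to transform
    `num_vectors` vectors (vertices plus normals) `fps` times a second."""
    flops_per_product = MULS_PER_PRODUCT + ADDS_PER_PRODUCT  # 28
    return flops_per_product * num_vectors * fps


if __name__ == "__main__":
    total = required_flops(6_000_000, 30)
    print(f"{total / 1e9:.2f} GFlops")  # prints "5.04 GFlops"
```

Note that this counts only the transforms; lighting, texturing and per-pixel work come on top of it.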
History of Graphics Hardware
Up to the mid-90s
SGI mainframes and workstations
PC: only 2D graphics hardware
From the mid-90s
Consumer 3D graphics hardware (PC)
3dfx, nVidia, Matrox, ATI, ...
Triangle rasterization (only)
Cheap: pushed by the game industry
1999
PC card with TnL [Transform and Lighting]
nVidia GeForce: Graphics Processing Unit (GPU)
PC card more powerful than specialized workstations
Modern graphics hardware
Graphics pipeline partly programmable
Leaders: ATI and nVidia
Game consoles use similar GPUs
Xbox, PS2, Xbox 360, PS3, Xbox One, PS4, ...
Graphics Pipeline
Application: LOD selection, frustum culling, portal culling
Geometry Processing: modelview/projection transform, lighting, primitive assembly, backface culling, clipping, division by w, viewport transform
Rasterization: scan conversion, fragment shading [color and texture interpolation], frame buffer ops [Z-buffer, alpha blending, ...]
Output: output to device
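As a rough illustration of three of the geometry steps listed above (modelview/projection transform, division by w, viewport transform), the following Python sketch takes one vertex from object space to window coordinates. The names `mat_vec` and `to_screen`, the row-major matrix layout and the [0, 1] depth range are assumptions made for this example; clipping and lighting are left out.

```python
from typing import List, Sequence, Tuple

Matrix4 = Sequence[Sequence[float]]  # 4 rows of 4 values (row-major, assumed)
Vec4 = Sequence[float]

IDENTITY: Matrix4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def mat_vec(m: Matrix4, v: Vec4) -> List[float]:
    """4x4 matrix times 4-vector: 16 multiplications and 12 additions."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]


def to_screen(mvp: Matrix4, vertex: Vec4,
              width: int, height: int) -> Tuple[float, float, float]:
    """Object space -> clip space -> normalized device coords -> window."""
    # Modelview/projection transform (the two matrices are pre-multiplied).
    x, y, z, w = mat_vec(mvp, vertex)
    # Division by w (perspective divide); visible points land in [-1, 1].
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform: map [-1, 1] to pixels and depth to [0, 1].
    screen_x = (ndc_x * 0.5 + 0.5) * width
    screen_y = (ndc_y * 0.5 + 0.5) * height
    depth = ndc_z * 0.5 + 0.5
    return screen_x, screen_y, depth
```

With the identity matrix, the origin `[0.0, 0.0, 0.0, 1.0]` maps to the center of a 640 x 480 viewport.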
Graphics Pipeline (programmable)
Application: LOD selection, frustum culling, portal culling
Geometry Processing: viewport transform
Rasterization: scan conversion, FRAGMENT SHADER
Output: output to device
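Vertex and Fragment Shaders
Vertex shader input: position ( x, y, z, w ), normal ( nx, ny, nz ), texture coordinates ( s, t, r, q ), color ( r, g, b, a )
VERTEX SHADER
Vertex shader output: position ( x, y, z, w ), normal ( nx, ny, nz ), texture coordinates ( s, t, r, q ), color ( r, g, b, a )
Fragment shader input: position ( x, y ), color ( r, g, b, a ), ( depth )
FRAGMENT SHADER
Fragment shader output: position ( x, y ), color ( r, g, b, a ), ( depth )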
(More) Complete System Architecture
Logical pipeline (programmer's view):
Input Assembler -> Vertex Shader -> Geometry Shader -> Stream Out, Setup/Rasterization -> Pixel Shader -> Output Merger
Memory: Vertex Buffer, Index Buffer (Input Assembler); Texture (Vertex, Geometry and Pixel Shaders); Buffer (Stream Out); Color, Depth (Output Merger)
Geometry Shader
Entire primitive as input
Adjacency optional
Outputs zero or more primitives
1024 scalars out max
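The "entire primitive in, zero or more primitives out" behaviour can be sketched on the CPU. The Python function below is a hypothetical stand-in for a geometry shader, not real shader code: it drops triangles with exactly zero area (zero outputs) and splits every other triangle into three around its centroid (several outputs).

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def is_degenerate(tri: Triangle) -> bool:
    """True if the three vertices are collinear (cross product is zero)."""
    a, b, c = tri
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return cross == (0.0, 0.0, 0.0)


def geometry_shader(tri: Triangle) -> List[Triangle]:
    """Receives one whole triangle, emits zero or three triangles."""
    if is_degenerate(tri):
        return []  # zero primitives out
    a, b, c = tri
    centroid = (
        (a[0] + b[0] + c[0]) / 3.0,
        (a[1] + b[1] + c[1]) / 3.0,
        (a[2] + b[2] + c[2]) / 3.0,
    )
    # Three primitives out, each sharing the centroid.
    return [(a, b, centroid), (b, c, centroid), (c, a, centroid)]
```

On real hardware the output size is bounded (the slide quotes 1024 scalars), so unbounded amplification of this kind is not possible.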
Full Pipeline
Possibilities
PARALLELISM
Modern Graphics Hardware
GPU = Graphics Processing Unit
Vector processor
Operates on 4-tuples
Position (x, y, z, w)
Color (red, green, blue, alpha)
Texture coordinates (s, t, r, q)
4-tuple ops, 1 clock cycle
SIMD [Single Instruction Multiple Data]
ADD, MUL, SUB, DIV, MADD, ...
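To make the 4-tuple operations concrete, here is a small Python emulation. It is only a sketch of the semantics: the function names are invented for this example, and Python evaluates the four components one after another, whereas the hardware described above handles all four in a single instruction.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]


def add(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise ADD on a 4-tuple."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3])


def mul(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise MUL on a 4-tuple."""
    return (a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3])


def madd(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """Multiply-add: a * b + c, component by component."""
    return add(mul(a, b), c)


# Example: scale a color (red, green, blue, alpha) and add an ambient term.
color: Vec4 = (1.0, 0.5, 0.25, 1.0)
scale: Vec4 = (0.5, 0.5, 0.5, 1.0)
ambient: Vec4 = (0.25, 0.25, 0.25, 0.0)
lit = madd(color, scale, ambient)
```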
Modern Graphics Hardware
Pipelining: number of stages
Parallelism: number of parallel processes
Parallelism + pipelining: number of parallel pipelines
Modern Graphics Hardware
Parallelism + pipelining: ATI Radeon 9700
4 vertex pipelines
8 pixel pipelines
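One simple way to see why both techniques matter is an idealized cycle count. The model below is an assumption made for illustration (each pipeline accepts one item per cycle and never stalls); it does not describe the Radeon 9700 itself, and the stage count used in the example is arbitrary. Only the figure of 8 parallel pipelines is borrowed from the slide.

```python
import math


def cycles(num_items: int, num_stages: int,
           num_pipelines: int = 1, pipelined: bool = True) -> int:
    """Idealized number of clock cycles to process `num_items` items."""
    # Items are split evenly over the parallel pipelines.
    per_pipeline = math.ceil(num_items / num_pipelines)
    if per_pipeline == 0:
        return 0
    if pipelined:
        # The first item needs num_stages cycles to come out; after that
        # one finished item leaves the pipeline every cycle.
        return num_stages + per_pipeline - 1
    # Without pipelining each item occupies the unit for all its stages.
    return num_stages * per_pipeline


if __name__ == "__main__":
    print(cycles(800, 10, pipelined=False))  # 8000: no pipelining
    print(cycles(800, 10))                   # 809: one pipeline
    print(cycles(800, 10, num_pipelines=8))  # 109: eight pipelines
```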
Modern Graphics Hardware
Features of GeForce GTX 480
Core speed: 700 MHz
Shader clock: 1400 MHz
480 shader processors
60 texture units
177 GB/s memory bandwidth
1536 MB memory
Modern Graphics Hardware
High Memory Bandwidth
[Figure: block diagram of a graphics card (GPU, 650 MHz; graphics memory; output) and the host (processor chip with CPU, 3 GHz, and cache; main memory, 1 GB; AGP memory), linked by the AGP bus (2 GB/s). Other labels: "Parallel Processes"; two "High bandwidth" links (77 GB/s and 51 GB/s); one 3 GB/s link.]
GPGPU
Stream Programming
Input: stream of data records
Output: stream(s) of data records
Kernel: operates sequentially on the data records, accessing one record at a time!
Read-only memory: record-independent read-only memory
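The stream model above can be sketched in plain Python. All names here (`run_kernel`, `scale_and_offset`, `CONSTANTS`) are hypothetical; the point is only that the kernel sees one record at a time plus some read-only data that is the same for every record, and never touches neighbouring records.

```python
from types import MappingProxyType
from typing import Callable, Iterable, Iterator, Mapping

Record = float
ReadOnly = Mapping[str, float]


def run_kernel(kernel: Callable[[Record, ReadOnly], Record],
               stream: Iterable[Record],
               read_only: ReadOnly) -> Iterator[Record]:
    """Apply `kernel` to each input record and yield the output stream."""
    for record in stream:
        # The kernel gets exactly one record and the shared read-only data.
        yield kernel(record, read_only)


def scale_and_offset(record: Record, constants: ReadOnly) -> Record:
    """Example kernel: an affine map driven by read-only constants."""
    return record * constants["scale"] + constants["offset"]


# MappingProxyType gives a read-only view, so a kernel cannot modify it.
CONSTANTS: ReadOnly = MappingProxyType({"scale": 2.0, "offset": 1.0})

if __name__ == "__main__":
    print(list(run_kernel(scale_and_offset, [1.0, 2.0, 3.0], CONSTANTS)))
```

Because no record depends on another, the loop body could be run for all records in parallel, which is what makes this model a good fit for GPUs.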
GPU Stream Programming
Vertex Shader
Input and output streams
Vertices, normals, colors, texture coordinates
Read-only memory
Uniform variables
Uniform = constant per stream
Textures, floats, ints, arrays, ...
Fragment Shader
Input and output streams
Pixels
Z-values
Read-only memory
See above
Languages
Assembly
Cg [C for Graphics]
HLSL [High Level Shading Language]
GLSL [OpenGL Shading Language]
Sh, BrookGPU (obsolete?)
CUDA
OpenCL
Assembly
Specialized instruction set
DP4: 4-tuple dot product
RSQ: reciprocal square root
MAD: multiply and add
DPH: homogeneous dot product
SCS: sine and cosine
LRP: linear interpolate
TEX: texture map
Nowadays, not used directly anymore
Generated by high-level language compilers
!!ARBvp1.0
ATTRIB pos = vertex.position;
PARAM mat[4] = { state.matrix.mvp };
# Transform by concatenation of the
# MODELVIEW and PROJECTION matrices.
DP4 result.position.x, mat[0], pos;
DP4 result.position.y, mat[1], pos;
DP4 result.position.z, mat[2], pos;
DP4 result.position.w, mat[3], pos;
# Pass the primary color through w/o
# lighting.
MOV result.color, vertex.color;
END
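To make the instruction names more concrete, here is a scalar Python sketch of what a few of them compute, followed by an emulation of the four DP4 instructions of the vertex program above. The Python function names are invented for this example, and the code only mirrors the arithmetic; it says nothing about how the hardware schedules the instructions.

```python
from typing import Sequence, Tuple

Vec4 = Sequence[float]


def dp4(a: Vec4, b: Vec4) -> float:
    """DP4: dot product over all four components."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def dph(a: Vec4, b: Vec4) -> float:
    """DPH: 3-component dot product plus the w of the second operand."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + b[3]


def lrp(t: float, a: float, b: float) -> float:
    """LRP: linear interpolation, t * a + (1 - t) * b."""
    return t * a + (1.0 - t) * b


def transform_position(mat: Sequence[Vec4], pos: Vec4) -> Tuple[float, ...]:
    """The four DP4 lines: one dot product per matrix row."""
    return tuple(dp4(mat[row], pos) for row in range(4))


# A translation by (2, 3, 4), written as four matrix rows.
TRANSLATE = [
    (1.0, 0.0, 0.0, 2.0),
    (0.0, 1.0, 0.0, 3.0),
    (0.0, 0.0, 1.0, 4.0),
    (0.0, 0.0, 0.0, 1.0),
]
```

Applying `transform_position(TRANSLATE, (1.0, 1.0, 1.0, 1.0))` moves the point to (3, 4, 5) and leaves w at 1.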
Cg/HLSL/GLSL
High-level programming language
Static conditional jumps
if, while, for, ...
Data-dependent conditional jumps
SIMD fragment shader: only efficient in case of coherent program flow!
No pointers!
struct appdata {
    float4 position : POSITION;
    float3 normal : NORMAL;
    float3 color : DIFFUSE;
    float3 VertexColor : SPECULAR;
};
struct vfconn {
    float4 HPOS : POSITION;
    float4 COL0 : COLOR0;
};
vfconn main(appdata IN,
            uniform float4 Kd,
            uniform float4x4 mvp)
{
    vfconn OUT;
    OUT.HPOS = mul(mvp, IN.position);
    OUT.COL0.xyz = Kd.xyz * IN.VertexColor.xyz;
    OUT.COL0.w = 1.0;
    return OUT;
}
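The remark about coherent program flow can be illustrated with a toy cost model. It assumes, purely for illustration, that fragments are processed in lockstep groups and that a group whose fragments disagree on a branch has to execute both sides of it. The group size and the per-branch costs are made-up parameters, not measurements of any GPU.

```python
from typing import Sequence


def group_cost(conditions: Sequence[bool],
               cost_then: int, cost_else: int) -> int:
    """Cost of one lockstep group evaluating `if cond: ... else: ...`."""
    cost = 0
    if any(conditions):      # at least one fragment takes the "then" side
        cost += cost_then
    if not all(conditions):  # at least one fragment takes the "else" side
        cost += cost_else
    return cost


def total_cost(conditions: Sequence[bool], group_size: int,
               cost_then: int, cost_else: int) -> int:
    """Sum the cost over consecutive groups of `group_size` fragments."""
    return sum(
        group_cost(conditions[i:i + group_size], cost_then, cost_else)
        for i in range(0, len(conditions), group_size)
    )


# Same number of True/False fragments, different spatial coherence.
coherent = [True] * 4 + [False] * 4
incoherent = [True, False] * 4
```

With groups of 4 and costs of 10 and 2, the coherent pattern costs 12 while the incoherent one costs 24, because every group pays for both branches.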
Sh
Shader code embedded in C++
// C++ code
vsh = SH_BEGIN_VERTEX_PROGRAM {
    ShInputNormal3f normal;
    ShInputPosition4f p;
    ShOutputPoint4f ov;
    ShOutputNormal3f on;
    ShOutputVector3f lvv;
    ShOutputPosition4f opd;
    opd = Globals::mvp | p;
    on = normalize(Globals::mv | normal);
    ov = normalize(Globals::mv | p);
    lvv = normalize(Globals::lightPos - (Globals::mv | p)(0,1,2));
} SH_END_PROGRAM;
fsh = SH_BEGIN_FRAGMENT_PROGRAM {
    ShInputVector4f v;
    ShInputNormal3f n;
    ShInputVector3f lvv;
    ShInputPosition4f p;
    ShOutputColor3f out;
    out(0,1,2) = Globals::color * dot(normalize(n), normalize(lvv));
} SH_END_PROGRAM;
BrookGPU
GPGPU language
GPGPU = General-Purpose GPU
Brook: streaming extension of C
BrookGPU: GPU port of Brook
No computer graphics knowledge required!
kernel void k(float s<>, float3 f, float a[10][10], out float o<>);
float a<100>;
float b<100>;
float c<10,10>;
streamRead(a, data1);
streamRead(b, data2);
streamRead(c, data3);
// Call kernel "k"
k(a, 3.2f, c, b);
streamWrite(b, result);
Programming languages: CUDA
CUDA: Compute Unified Device Architecture
Requires an nVidia GPU and drivers
General purpose
GPUs have a parallel "many-core" architecture, each core capable of running thousands of threads simultaneously
Programming languages: CUDA
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs
It uses the standard C language, with some extensions
Scattered writes: write to arbitrary addresses in memory
Shared memory: exposes a fast shared memory region (16 KB in size) that can be shared amongst threads
This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups
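The difference between reading from arbitrary addresses (gather, which texture lookups already allow) and writing to arbitrary addresses (scatter, listed above as a CUDA advantage) can be sketched in plain Python. The function names are invented for illustration, and Python lists stand in for GPU memory.

```python
from typing import List, Sequence


def gather(src: Sequence[int], indices: Sequence[int]) -> List[int]:
    """Output element i reads from an arbitrary address:
    out[i] = src[indices[i]]. The write position is fixed to i."""
    return [src[j] for j in indices]


def scatter(values: Sequence[int], indices: Sequence[int],
            size: int, fill: int = 0) -> List[int]:
    """Element i writes to an arbitrary address:
    out[indices[i]] = values[i]. Untouched cells keep `fill`."""
    out = [fill] * size
    for value, j in zip(values, indices):
        out[j] = value
    return out


if __name__ == "__main__":
    print(gather([10, 20, 30, 40], [3, 0, 0, 2]))  # [40, 10, 10, 30]
    print(scatter([1, 2, 3], [4, 0, 2], size=5))   # [2, 0, 3, 0, 1]
```

In this sequential sketch, two elements scattering to the same address simply overwrite each other in order; on parallel hardware such collisions need explicit care.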
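CUDA example: load a texture from an image into an array on the GPU
// Legacy texture-reference API (no longer available in recent CUDA releases)
texture<float, 2> tex;
// Kernel: each thread copies one texel to the output buffer
__global__ void kernel(float* odata, int width, int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    float c = tex2D(tex, x, y);
    odata[y * width + x] = c;
}
// Host code: image (host) and d_odata (device) hold width * height floats
cudaArray* cu_array;
// Allocate array
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaMallocArray(&cu_array, &desc, width, height);
// Copy image data to array
cudaMemcpyToArray(cu_array, 0, 0, image, width * height * sizeof(float), cudaMemcpyHostToDevice);
// Bind the array to the texture
cudaBindTextureToArray(tex, cu_array);
// Run kernel (assumes width and height are multiples of 16)
dim3 blockDim(16, 16, 1);
dim3 gridDim(width / blockDim.x, height / blockDim.y, 1);
kernel<<<gridDim, blockDim, 0>>>(d_odata, width, height);
cudaUnbindTexture(tex);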
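The index arithmetic in the kernel can be checked on the CPU. The Python sketch below emulates the launch grid sequentially (a real GPU runs the threads in parallel) and uses a ceiling division plus a bounds check, an addition not present in the slide, so that image sizes that are not multiples of the block size are still fully covered. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def grid_size(extent: int, block: int) -> int:
    """Number of blocks needed to cover `extent` elements (rounded up)."""
    return (extent + block - 1) // block


def launch_copy(image: Sequence[Sequence[float]], width: int, height: int,
                block: Tuple[int, int] = (16, 16)) -> List[float]:
    """Emulate one thread per pixel copying image[y][x] to odata."""
    odata = [0.0] * (width * height)
    grid_x = grid_size(width, block[0])
    grid_y = grid_size(height, block[1])
    for block_y in range(grid_y):            # blockIdx.y
        for block_x in range(grid_x):        # blockIdx.x
            for thread_y in range(block[1]):      # threadIdx.y
                for thread_x in range(block[0]):  # threadIdx.x
                    x = block_x * block[0] + thread_x
                    y = block_y * block[1] + thread_y
                    # Threads past the image border do nothing.
                    if x < width and y < height:
                        odata[y * width + x] = image[y][x]
    return odata
```

For a 5 x 3 image and 2 x 2 blocks, the truncating division `5 // 2` would launch only 2 blocks per row and miss the last column, while `grid_size(5, 2)` gives 3.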
OpenCL is analogous to
OpenGL, for 3D graphics
OpenAL, for computer audio
OpenCL extends the power of the GPU beyond graphics (GPGPU)
OpenCL is managed by the non-profit technology consortium Khronos Group
Screenshots
nVidia Toolkit [Reflection Bump Mapping]
Screenshots
Crysis [Crytek]
Screenshots
NPR [ATI Research Group]
More Information
nVidia
http://developer.nvidia.com/
ATI
http://www.ati.com/developer/
General-Purpose GPU Programming
http://www.gpgpu.org
GPU Programming and Architecture
http://www.cis.upenn.edu/~suvenkat/700/
Hardware
http://www.beyond3d.com
http://www.tomshardware.com
Questions?