The following content is extracted from the material in the references on the last page. If any citation is wrong or any reference is missing, please contact
[email protected] . I will correct the error as soon as possible. This material is for course use only; please do NOT broadcast. Thank you.
Introduction to Modern GPU Hardware
Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
Fall, 2016
Outline
• GPU Pipeline
• History of GPU Hardware
• GPU Hardware Consideration
• Modern GPU Hardware Architecture
– NVIDIA GeForce
– AMD (ATI) Radeon
– IMG PowerVR
– ARM Mali
• GPU Applications
• Summary
GPU Fundamentals: Graphics Pipeline
[Figure: simplified graphics pipeline. CPU: Application → Vertices (3D). GPU: Transform & Light → Transformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Shade → Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
• A simplified graphics pipeline
– Note that pipe widths vary
– Many caches, FIFOs, and so on not shown
GPU Fundamentals: Modern Graphics Pipeline
[Figure: the same pipeline with programmable stages. CPU: Application → Vertices (3D). GPU: Vertex Processor (Transform & Light) → Transformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor (Shade) → Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
• Programmable vertex processor!
• Programmable pixel processor!
GPU Fundamentals: Modern Graphics Pipeline
[Figure: the same pipeline with a programmable geometry stage. CPU: Application → Vertices (3D). GPU: Vertex Processor → Transformed, Lit Vertices (2D) → Geometry Processor (Assemble Primitives) → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor → Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
• Programmable primitive assembly!
• More flexible memory access!
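The stages named on the pipeline slides can be sketched as plain functions. This is only a toy CPU-side illustration under stated assumptions, not how any GPU implements them: the function names, the orthographic projection that simply drops z, the full-screen scan in the rasterizer, and the constant-color shader are all choices made here for brevity.

```python
# Toy model of the pipeline stages: vertices -> transform -> assemble
# -> rasterize -> shade. All names and simplifications are illustrative.

def transform(vertices, scale, offset):
    """Vertex stage: map 3D vertices to 2D screen space (orthographic, drops z)."""
    return [(x * scale + offset, y * scale + offset) for x, y, _z in vertices]

def assemble(points):
    """Primitive assembly: group every three vertices into one triangle."""
    return [tuple(points[i:i + 3]) for i in range(0, len(points) - 2, 3)]

def edge(a, b, p):
    """Signed-area test: tells which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Rasterizer: yield pixels whose centers fall inside the triangle."""
    a, b, c = tri
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            # Accept either winding order.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield (x, y)

def shade(fragment):
    """Fragment stage: compute a color per fragment (a constant here)."""
    return "#"

def render(vertices, width=8, height=8):
    """Run all stages and return the framebuffer as a list of text rows."""
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    for tri in assemble(transform(vertices, scale=1.0, offset=0.0)):
        for x, y in rasterize(tri, width, height):
            framebuffer[y][x] = shade((x, y))
    return ["".join(row) for row in framebuffer]

if __name__ == "__main__":
    for row in render([(0, 0, 0), (8, 0, 0), (0, 8, 0)]):
        print(row)
```

In the programmable pipelines on the later slides, `transform` and `shade` correspond to the stages that become user-written programs, while `rasterize` stays fixed-function.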
History of Graphics Hardware (1/3)
• … - mid ’90s
– SGI mainframes and workstations
– PC: only 2D graphics hardware
• mid ’90s
– Consumer 3D graphics hardware (PC): 3dfx, NVIDIA, Matrox, ATI, …
– Triangle rasterization (only)
– Cheap: pushed by game industry
• 1999
– PC-card with TnL (Transform and Lighting)
– NVIDIA GeForce: Graphics Processing Unit (GPU)
– PC-card more powerful than specialized workstations
[Figure: 3dfx Voodoo Graphics, 4 MB, 1997]
History of Graphics Hardware (2/3)
https://www.zhihu.com/question/21980949
History of Graphics Hardware (3/3)
• Modern graphics hardware
– Graphics pipeline partly programmable
– Leaders: AMD (ATI) and NVIDIA, e.g. “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”
– Game consoles use similar GPUs (e.g. Xbox)
Computational Power (1/2)
• GPUs are fast…
– 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
  • Computation: 48 GFLOPS peak
  • Memory bandwidth: 21 GB/s peak
  • Price: $874 (chip)
– NVIDIA GeForce 8800 GTX:
  • Computation: 330 GFLOPS observed
  • Memory bandwidth: 55.2 GB/s observed
  • Price: $599 (board)
• GPUs are getting faster, faster
– CPUs: 1.4× annual growth
– GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
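To see what the annual growth rates imply, the gap can be compounded over several years. The sketch below uses only the rates and figures quoted on the slide; the function name and the 5-year horizon are arbitrary choices for illustration, and the starting point is assumed to be parity.

```python
def relative_growth(gpu_rate, cpu_rate, years):
    """Ratio of GPU to CPU performance after `years`, starting from parity."""
    return (gpu_rate / cpu_rate) ** years

if __name__ == "__main__":
    # Rates come from the slide; the 5-year horizon is an arbitrary choice.
    for label, rate in (("pixels", 1.7), ("vertices", 2.3)):
        print(f"{label}: {relative_growth(rate, 1.4, 5):.1f}x ahead after 5 years")

    # Slide figures compare GPU "observed" against CPU "peak", so these
    # ratios understate the GPU's peak-to-peak advantage.
    print(f"GFLOPS ratio: {330 / 48:.1f}x, bandwidth ratio: {55.2 / 21:.1f}x")
```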
Computational Power (2/2)
[Figure: GPU vs. CPU performance curves. Courtesy Naga Govindaraju]
Flops Comparison on GPU and CPU
Memory Bandwidths Comparison of CPU and GPU
Motivation
• Why are GPUs getting faster so fast?
– Arithmetic intensity
  • The specialized nature of GPUs makes it easier to use additional transistors for computation
– Economics
  • The multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property
Flexible and Precise
• Modern GPUs are deeply programmable
– Programmable pixel, vertex, and geometry engines
– Solid high-level language support
• Modern GPUs support “real” precision
– 32-bit/64-bit floating point throughout the pipeline
  • High enough for many applications
– DX10-class GPUs add 32-bit integers
Graphics Hardware Consideration (1/2)
• GPU = Graphics Processing Unit
– Vector processor
– Operates on 4-tuples
  • Position ( x, y, z, w )
  • Color ( red, green, blue, alpha )
  • Texture Coordinates ( s, t, r, q )
– One 4-tuple operation per clock cycle
  • SIMD [ Single Instruction Multiple Data ]
– ADD, MUL, SUB, DIV, MADD, …
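The 4-tuple SIMD idea can be illustrated with MADD, one of the operations listed above: a single instruction computes a * b + c on all four components at once. The sketch below is a scalar Python emulation; the `Vec4` type and the `madd` function are hypothetical names introduced here, and the hardware would execute the four lanes in parallel rather than in a loop.

```python
from typing import NamedTuple

class Vec4(NamedTuple):
    """A 4-tuple such as a position (x, y, z, w) or a color (r, g, b, a)."""
    x: float
    y: float
    z: float
    w: float

def madd(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """MADD: component-wise a * b + c, one instruction applied to 4 lanes."""
    return Vec4(*(ai * bi + ci for ai, bi, ci in zip(a, b, c)))

position = Vec4(1.0, 2.0, 3.0, 1.0)
scale = Vec4(2.0, 2.0, 2.0, 1.0)
offset = Vec4(0.5, 0.5, 0.5, 0.0)
moved = madd(position, scale, offset)  # scale then translate x, y, z; keep w
```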
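Graphics Hardware Consideration (2/2)
• Pipelining
– Number of stages
• Parallelism
– Number of parallel processes
• Parallelism + pipelining
– Number of parallel pipelines
[Figure: three diagrams built from units labeled 1, 2, 3: three stages in sequence (pipelining), three units side by side (parallelism), and three parallel 3-stage pipelines (parallelism + pipelining)]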
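The effect of the two techniques can be compared with a simple cycle count. This is an idealized model, not a hardware measurement: it assumes every stage takes one cycle, one item can enter each pipeline per cycle, and there are no stalls. The function name and the workload of 12 items are illustrative.

```python
import math

def cycles(items, stages, pipelines=1):
    """Cycles to finish `items` on `pipelines` identical `stages`-deep pipelines.

    Idealized: one item enters each pipeline per cycle, no stalls.
    """
    if items == 0:
        return 0
    per_pipeline = math.ceil(items / pipelines)
    # The first item needs `stages` cycles; each later item adds one cycle.
    return stages + per_pipeline - 1

n = 12
serial = n * 3                           # neither technique: 3 cycles per item
piped = cycles(n, stages=3)              # pipelining only
both = cycles(n, stages=3, pipelines=3)  # parallelism + pipelining
```

Under these assumptions pipelining approaches one item per cycle, and adding parallel pipelines divides the remaining time by the number of pipelines.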
Outline
• GPU Pipeline
• History of GPU Hardware
• GPU Hardware Consideration
• Modern GPU Hardware Architecture
– NVIDIA GeForce
– AMD (ATI) Radeon
– IMG PowerVR
– ARM Mali
• Summary
Growth of NVIDIA GPU
• Performance metrics
– Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
Growth of NVIDIA GPU
NVIDIA GeForce 7900 GTX
NVIDIA Graphics Card Architecture
• GeForce-8 Series
– 12,288 concurrent threads, hardware managed
– 128 Thread Processor cores at 1.35 GHz == 518 GFLOPS peak
[Figure: GeForce 8 block diagram. Host CPU → Work Distribution → an array of processor clusters built from blocks labeled IU, SP, and Shared Memory, each cluster with TF and TEX L1 blocks → six L2 blocks → six Memory blocks]
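The 518 GFLOPS peak figure can be reproduced from the core count and clock on the slide. The slide does not say how many floating-point operations each core issues per clock; the value of 3 used below (a multiply-add counted as two operations plus one extra multiply) is an assumption that happens to match the quoted number. The function name is also illustrative.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_clock):
    """Theoretical peak: cores x clock (GHz) x flops issued per core per clock."""
    return cores * clock_ghz * flops_per_core_per_clock

# 128 cores and 1.35 GHz are from the slide; 3 flops/clock is an assumption.
geforce8 = peak_gflops(128, 1.35, 3)

# Concurrent threads per core, from the slide's 12,288 hardware-managed threads.
threads_per_core = 12288 // 128
```

Keeping many threads resident per core is what lets the hardware hide memory latency by switching to another thread whenever one is waiting.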
NVIDIA FERMI
FERMI: Streaming Multiprocessor (SM)
• Each SM contains
– 32 cores
– 16 load/store units
– 32,768 registers
• Newer FP representation
– IEEE 754-2008
• Two units
– Floating point
– Integer
FERMI: Results
FERMI: Comparison
Kepler: Core Architecture http://www.weistang.com/article-941-1.html
Titan vs Tesla Comparison
09/02/11
Maxwell: Core Architecture http://www.weistang.com/article-941-1.html
http://www.coolaler.com/showthread.php/313295%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIAMaxwell%E6%9E%B6%E6%A7%8B
Kepler vs Maxwell Comparison
http://www.coolaler.com/showthread.php/313295%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIAMaxwell%E6%9E%B6%E6%A7%8B
https://zh.wikipedia.org/wiki/CUDA
NVIDIA ULP-GeForce (Tegra 2)
• Ultra-low-power (ULP) GeForce GPU with 4 pixel shaders + 4 vertex shaders
• 32-bit single-channel memory controller with either LPDDR2-600 or DDR2-667 memory
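The memory controller description is enough to estimate a theoretical peak bandwidth: bus width in bytes times transfers per second. The sketch below assumes the numeric suffix in "LPDDR2-600" and "DDR2-667" is the transfer rate in mega-transfers per second, which the slide does not state explicitly; the function name is illustrative.

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers_per_s, channels=1):
    """Theoretical peak in GB/s: bytes per transfer x transfers per second."""
    return channels * (bus_bits / 8) * mega_transfers_per_s * 1e6 / 1e9

# 32-bit single channel is from the slide; reading "-600" and "-667" as MT/s
# is an assumption.
lpddr2 = peak_bandwidth_gbs(32, 600)
ddr2 = peak_bandwidth_gbs(32, 667)
```

Under that assumption the mobile part has roughly 2.4 to 2.7 GB/s available, far below the tens of GB/s quoted for the desktop boards elsewhere in these slides.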
NVIDIA ULP-GeForce (Tegra 3)
• The GPU in Tegra 3 is an evolution of the Tegra 2 GPU, with twice the number of pixel shader units (8 compared to 4) and a higher clock frequency.
• 32-bit single-channel memory controller with either LPDDR2 or DDR3 memory
Tegra Roadmap
Mobile Roadmap
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-keplerinto-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
ATI Radeon X1900 XTX
• Features of ATI Radeon X1900 XTX
– Core speed 650 MHz
– 48 pixel shader processors
– 8 vertex shader processors
– 51 GB/s memory bandwidth
– 512 MB memory
http://product.pcpop.com/000024721/Index.html
ATI Radeon X1900 XTX
• High Memory Bandwidth
[Figure: memory system of the graphics card and host. Graphics card: GPU (650 MHz), high-bandwidth 51 GB/s link, graphics memory (½ GB), parallel processes, output. Host: CPU (3 GHz), high-bandwidth 77 GB/s link, processor chip cache (½ MB). Other labels: AGP bus 2 GB/s, 3 GB/s, AGP memory ½ GB, main memory 1 GB]
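The bandwidth gap in the diagram explains why data should stay in graphics memory once it is uploaded. The sketch below compares how long one payload takes over the 2 GB/s AGP bus and over the 51 GB/s graphics memory link, both taken from the slide. The 64 MB payload is a hypothetical size, the function name is illustrative, and 1 GB is treated as 1000 MB.

```python
def transfer_ms(megabytes, gb_per_s):
    """Milliseconds to move `megabytes` at `gb_per_s` (1 GB = 1000 MB here)."""
    return megabytes / (gb_per_s * 1000) * 1000

payload_mb = 64  # hypothetical texture upload size
over_agp = transfer_ms(payload_mb, 2)   # across the AGP bus
in_vram = transfer_ms(payload_mb, 51)   # within graphics memory
```

At 60 frames per second a frame lasts about 16.7 ms, so under these assumptions re-sending the payload over AGP every frame would not fit in the frame budget, while reading it from graphics memory would.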
ATI Radeon 9700
• Parallelism + pipelining: ATI Radeon 9700
– 4 vertex pipelines
– 8 pixel pipelines
Radeon Comparison
http://www.pcdiy.com.tw/detail/4275
IMG PowerVR Series5XT (SGXMP)
• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
• Fully programmable GPU using unique USSE architecture
• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1
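The "tile-based" part of TBDR can be illustrated by the binning step, in which the screen is split into small tiles and each triangle is recorded in every tile it may touch, so that each tile can later be rendered from fast on-chip memory. The sketch below is a generic illustration, not PowerVR's actual implementation: the function name, the conservative bounding-box test, and the 32-pixel tile size in the usage note are assumptions, and the deferred hidden-surface removal that follows binning is not shown.

```python
def bin_triangles(triangles, tile, width, height):
    """Map each tile (tx, ty) to indices of triangles whose bounding box touches it."""
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        x0 = max(0, int(min(xs))) // tile
        x1 = min(width - 1, int(max(xs))) // tile
        y0 = max(0, int(min(ys))) // tile
        y1 = min(height - 1, int(max(ys))) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

# A 64x64 screen with 32-pixel tiles (sizes chosen only for illustration).
triangles = [
    [(1, 1), (10, 1), (1, 10)],      # fits in the top-left tile
    [(40, 40), (60, 40), (40, 60)],  # fits in the bottom-right tile
    [(20, 20), (50, 20), (20, 50)],  # bounding box spans all four tiles
]
bins = bin_triangles(triangles, tile=32, width=64, height=64)
```

A bounding-box test can list a triangle in a tile it does not actually cover; that costs some extra work but never drops geometry.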
IMG PowerVR Series6 (Rogue)
• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality
IMG PowerVR 7XT Plus
http://imgtec.eetrend.com/article/7130
Features of ARM Mali
ARM Mali-200
ARM Mali-300
ARM Mali-400MP
ARM Mali-450MP
ARM Mali-T604
• GPGPU (supports OpenCL 1.1)
• Tri-pipe architecture
• The first GPU based on the Midgard architecture
• True IEEE double-precision floating-point math in hardware for Full Profile
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
• 5x performance improvement over previous Mali graphics processors
ARM Mali-T624
9/27/2016
ARM Mali-T678
• 50% performance improvement compared to the Mali-T658.
ARM Mali-T760
ARM Mali-T880
ARM Mali Comparison
https://zh.wikipedia.org/wiki/Mali_(GPU)
Applications (1/7)
• Includes lots of applications
– Ray-tracer
– Image segmentation
– FFT/Linear Algebra
http://f.fwallpapers.com/images/3d-bunny.jpg
http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg
Applications (2/7)
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-keplerinto-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
Applications (3/7)
http://5pit.tw/tech/computer/tid_12880
Applications (4/7)
http://wechatinchina.com/thread-461154-1-1.html
Applications (5/7) https://read01.com/Pnd3D.html
Applications (6/7) AR and VR Applications
http://wechatinchina.com/thread-461154-1-1.html
Applications (7/7)
http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm
Summary
• Understand the GPU pipeline in depth
• Understand the motivation of GPU hardware
• Understand modern GPU hardware architecture and specifications
• Understand GPU/GPGPU applications
Reference
• GPU Architecture & CG, Mark Colbert, 2006
• Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
• GPU Tutorial, Yiyunjin, 2007
• Evolution of GPU and Graphics Pipelining, Weijun Xiao
• Commercial product websites (NVIDIA, ATI, IMG, ARM)
• Referencing SIGGRAPH 2005 Course Notes from David Luebke
• Adapted from: David Luebke (University of Virginia) and NVIDIA
• Jan Verschelde, MCS 572 Lecture 27, Introduction to Supercomputing, 17 March 2014
Acknowledgement: Thanks to the TAs for their help in preparing the material.