COM4521 Parallel Computing with Graphical Processing Units
04:27
CloudComputing-Part3
09:58
CloudComputing-Part2
08:54
CloudComputing-Part1
08:22
Lecture 17 - Part 04 - Multi GPU Programming
09:19
Lecture 17 - Part 03 - Synchronisation
15:32
Lecture 17 - Part 02 - CUDA Streams
10:35
Lecture 17 - Part 01 - Synchronous and Asynchronous Execution
07:11
Lecture 16 - Part 03 - Applications of Sort
14:37
Lecture 16 - Part 02 - Libraries
20:56
Lecture 16 - Part 01 - Sorting
41:30
Lecture 14 & 15 - Profiling
17:27
Lecture 13 - Part 03 - Scan
25:04
Lecture 13 - Part 02 - Reduction
05:42
Lecture 13 - Part 01 - Parallel Patterns Overview
18:00
Lecture 12 - Part 03 - Atomics and Warp Operations
12:43
Lecture 12 - Part 02 - Advanced Divergence
14:16
Lecture 12 - Part 01 - Scheduling and Divergence
14:16
Lecture 11 - Part 03 - Occupancy
07:18
Lecture 11 - Part 02 - The L1 Cache
19:30
Lecture 11 - Part 01 - Memory Coalescing
08:17
Lecture 10 - Part 03 - Boundary Conditions
19:26
Lecture 10 - Part 02 - Shared Memory Bank Conflicts
22:27
Lecture 10 - Part 01 - Introduction to Shared Memory
14:42
Lecture 09 - Part 03 - Read-Only and Texture Memory
14:45
Lecture 09 - Part 02 - Global and Constant Memory
20:16
Lecture 09 - Part 01 - Memory Overview
15:07
Lecture 08 - Part 03 - CUDA Host Code and Memory Management
15:45
Lecture 08 - Part 02 - CUDA Device Code
14:21
Lecture 08 - Part 01 - CUDA Programming Model
14:13
Lecture 01 - Part 02 - Super Computing and Software
17:53
Lecture 01 - Part 01 - Course Context
18:24
Lecture 07 - Part 03 - GPU Hardware
12:49
Lecture 07 - Part 02 - Programming GPUs
13:54
Lecture 07 - Part 01 - Introduction to GPUs
07:38
Lecture 06 - Part 03 - Nesting and Summary
18:37
Lecture 06 - Part 02 - Scheduling
05:52
Lecture 06 - Part 01 - Parallel Reductions
13:03
Lecture 05 - Part 03 - Scoping and Tasks
18:22
Lecture 05 - Part 02 - Loops and Critical Sections
08:30
Lecture 05 - Part 01 - OpenMP Overview
08:51
Lecture 04 - Part 03 - Memory Bound Code
14:11
Lecture 04 - Part 02 - Compute Bound Code
08:43
Lecture 04 - Part 01 - Optimisation Overview
14:40
Lecture 03 - Part 04 - Structures and Binary Files
10:32
Lecture 03 - Part 03 - Manual Memory Management
15:43
Lecture 03 - Part 02 - Advanced Pointers
20:19
Lecture 03 - Part 01 - Pointers
21:53
Lecture 02 - Part 03 - Arrays, Strings and IO
15:23
Lecture 02 - Part 02 - Functions and Scoping
26:53
Lecture 02 - Part 01 - Introducing C
Computing architectures are rapidly changing towards scalable parallel computing devices with many cores. Performance is gained by new designs which favour a high number of parallel compute cores at the expense of imposing significant software challenges. This module looks at parallel computing, from multi-core CPUs to GPU accelerators with many TFLOPS of theoretical performance. The module will give insight into how to write high-performance code, with specific emphasis on GPU programming with NVIDIA CUDA GPUs. A key aspect of the module is understanding the implications of program code for the underlying hardware, so that the code can be optimised. Students should be aware that there are limited places available on this course.