Course video link: https://www.youtube.com/playlist?list=PLAwxTw4SYaPnFKojVQrmyOGFCqHTxfdv2

[TOC]

1. Unit 1

1.1 typical CUDA Program

screenshot-124.390.jpg

screenshot-32.070.jpg
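As a rough sketch of the usual flow (allocate GPU memory, copy input host to device, launch a kernel, copy results device to host), assuming a simple element-wise square. The kernel name `square`, the helper `square_on_gpu`, and the block size of 128 are illustrative choices, not taken from the screenshots. Error checking of the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: each thread squares one element of the input.
__global__ void square(float *d_out, const float *d_in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) d_out[i] = d_in[i] * d_in[i];        // guard the last block
}

// Hypothetical host helper wrapping the four typical steps.
std::vector<float> square_on_gpu(const std::vector<float> &h_in) {
    int n = static_cast<int>(h_in.size());
    std::vector<float> h_out(n);
    if (n == 0) return h_out;
    size_t bytes = n * sizeof(float);

    // 1. Allocate storage on the GPU.
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    // 2. Copy the input from host (CPU) to device (GPU).
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel with enough blocks to cover n elements.
    int threads = 128;  // assumed block size
    int blocks = (n + threads - 1) / threads;
    square<<<blocks, threads>>>(d_out, d_in, n);

    // 4. Copy the result back to the host (this copy waits for the kernel).
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<float> in = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> out = square_on_gpu(in);
    assert(out.size() == in.size());
    for (size_t i = 0; i < out.size(); ++i)
        printf("%g squared is %g\n", in[i], out[i]);
    return 0;
}
```

This should build with `nvcc` on a machine with a CUDA-capable GPU; real code would check the return value of every CUDA call.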

1.2 parallel communication patterns

screenshot-43.120.jpg

1.3 GPU allocate blocks to SMs

screenshot-78.208.jpg

1.4 GPU memory hierarchy

screenshot-80.277.jpg

screenshot-63.013.jpg

1.5 high level strategies of optimizing performance

screenshot-106.397.jpg

2. Unit 2

2.1 parallel communication patterns - Map

2.2 parallel communication patterns - Gather

2.3 parallel communication patterns - Scatter

2.4 parallel communication patterns - Stencil

2.5 parallel communication patterns - Transpose

2.6 parallel communication patterns recap
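As a recap, the five patterns can be sketched as tiny 1D kernels. This is only an illustration under assumptions: the kernel names, the helper `patterns_ok`, the 3-point averaging stencil, and the use of unified memory (`cudaMallocManaged`) are choices made here for brevity, not something the course prescribes.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Map: one input element to one output element at the same index.
__global__ void map_k(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}
// Gather: each thread computes where to READ from.
__global__ void gather_k(float *out, const float *in, const int *src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[src[i]];
}
// Scatter: each thread computes where to WRITE to (dst must not collide).
__global__ void scatter_k(float *out, const float *in, const int *dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[dst[i]] = in[i];
}
// Stencil: each output reads a fixed neighborhood (here: left, self, right).
__global__ void stencil_k(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i == 0 || i == n - 1) out[i] = in[i];  // boundaries copied unchanged
    else out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
// Transpose: reorder a row-major rows x cols array into cols x rows.
__global__ void transpose_k(float *out, const float *in, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) {
        int r = i / cols, c = i % cols;
        out[c * rows + r] = in[i];
    }
}

// Hypothetical helper: runs every kernel and checks it against host code.
bool patterns_ok() {
    const int n = 6, rows = 2, cols = 3;
    float *in = nullptr, *out = nullptr;
    int *idx = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&idx, n * sizeof(int)) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) { in[i] = float(i * i); idx[i] = (i + 1) % n; }
    bool ok = true;

    map_k<<<1, 32>>>(out, in, n);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) ok = ok && out[i] == 2.0f * in[i];

    gather_k<<<1, 32>>>(out, in, idx, n);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) ok = ok && out[i] == in[idx[i]];

    scatter_k<<<1, 32>>>(out, in, idx, n);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) ok = ok && out[idx[i]] == in[i];

    stencil_k<<<1, 32>>>(out, in, n);
    cudaDeviceSynchronize();
    ok = ok && out[0] == in[0] && out[n - 1] == in[n - 1];
    for (int i = 1; i < n - 1; ++i)
        ok = ok && fabsf(out[i] - (in[i - 1] + in[i] + in[i + 1]) / 3.0f) < 1e-5f;

    transpose_k<<<1, 32>>>(out, in, rows, cols);
    cudaDeviceSynchronize();
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            ok = ok && out[c * rows + r] == in[r * cols + c];

    cudaFree(in); cudaFree(out); cudaFree(idx);
    return ok;
}

int main() {
    printf("%s\n", patterns_ok() ? "all patterns match" : "mismatch or CUDA error");
    return 0;
}
```

The only difference between gather and scatter here is which side of the assignment uses the computed index; scatter is the one that can produce write conflicts if two threads target the same location.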

2.7 SM, Kernel, Thread Blocks, Thread