Course video link: https://www.youtube.com/playlist?list=PLAwxTw4SYaPnFKojVQrmyOGFCqHTxfdv2
[TOC]
1. Unit 1
1.1 typical CUDA Program


1.2 parallel communication patterns

1.3 How the GPU allocates blocks to SMs

1.4 GPU memory hierarchy


1.5 high-level strategies for optimizing performance




2. Unit 2
2.1 parallel communication patterns - Map

2.2 parallel communication patterns - Gather

2.3 parallel communication patterns - Scatter

2.4 parallel communication patterns - Stencil

2.5 parallel communication patterns - Transpose


2.6 parallel communication patterns recap

2.7 SM, Kernel, Thread Blocks, Thread

