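As marked by the red box in the figure above, I don't understand why having one thread access two consecutive elements of the data array causes bank conflicts, while the access pattern shown in the figure below does not.
Thank you for your answers!!!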
https://developer.nvidia.com/blog/using-shared-memory-cuda-cc/
Shared memory bank conflicts
To achieve high memory bandwidth for concurrent accesses, shared memory is divided into equally sized memory modules (banks) that can be accessed simultaneously. Therefore, any memory load or store of n addresses that spans b distinct memory banks can be serviced simultaneously, yielding an effective bandwidth that is b times as high as the bandwidth of a single bank.
However, if multiple threads’ requested addresses map to the same memory bank, the accesses are serialized. The hardware splits a conflicting memory request into as many separate conflict-free requests as necessary, decreasing the effective bandwidth by a factor equal to the number of colliding memory requests. An exception is the case where all threads in a warp address the same shared memory address, resulting in a broadcast. Devices of compute capability 2.0 and higher have the additional ability to multicast shared memory accesses, meaning that multiple accesses to the same location by any number of threads within a warp are served simultaneously.
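The rule quoted above can be turned into a small calculation. The sketch below is a simplified model, not NVIDIA's implementation. It assumes 4-byte banks (word address i lives in bank i % num_banks), 32 banks by default, and compute capability 2.0+ multicast, so threads reading the same word never conflict. Under those assumptions, the number of serialized requests equals the largest number of *distinct* words that any single bank has to deliver. The function name `conflict_degree` is hypothetical.

```python
from collections import defaultdict


def conflict_degree(word_addresses, num_banks=32):
    """Return how many conflict-free requests one warp-wide access is split into.

    word_addresses: one 4-byte word index per thread (simplified model).
    Identical addresses are served together (broadcast/multicast), so only
    distinct words that fall into the same bank are serialized.
    """
    words_per_bank = defaultdict(set)
    for addr in word_addresses:
        words_per_bank[addr % num_banks].add(addr)
    # The most heavily loaded bank determines the number of rounds.
    return max((len(words) for words in words_per_bank.values()), default=0)


print(conflict_degree(range(32)))                   # stride 1: no conflict
print(conflict_degree([2 * t for t in range(32)]))  # stride 2: 2-way conflict
print(conflict_degree([7] * 32))                    # same word: broadcast
```

With stride 1 every thread hits its own bank (1 request). With stride 2 the 32 threads use only the 16 even banks, two distinct words each (2 requests). When all threads read the same word, the access is a single broadcast.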
Let's assume that there are 8 memory banks of size 4 bytes for your example of parallel reduction. Element i is served by bank i % 8.
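Then, in the first example, banks 0, 2, 4 and 6 each have to serve two requests, so the access is split into two conflict-free requests (a 2-way bank conflict).
In the second example, each bank only has to serve one request, so there is no conflict.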
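The answer's counting can be reproduced with a short sketch. The figures are not available here, so this assumes a hypothetical setup: 8 threads reducing 16 elements, with the 8 banks from the answer. In the first example (interleaved addressing with stride 1), thread t reads the adjacent elements 2t and 2t + 1. In the second example (sequential addressing with stride 8), thread t reads elements t and t + 8. Each of the two reads is a separate load, so each is counted on its own. The names `requests_per_bank`, `THREADS` and the example variables are illustrative only.

```python
from collections import Counter

NUM_BANKS = 8  # as in the answer: element i is served by bank i % 8
THREADS = 8    # hypothetical: 8 threads reducing a 16-element array


def requests_per_bank(indices):
    """Map each bank to the number of distinct elements it must serve in one load."""
    counts = Counter(i % NUM_BANKS for i in set(indices))
    return dict(sorted(counts.items()))


# First example (interleaved addressing): thread t reads 2t, then 2t + 1.
interleaved_a = requests_per_bank([2 * t for t in range(THREADS)])
interleaved_b = requests_per_bank([2 * t + 1 for t in range(THREADS)])

# Second example (sequential addressing): thread t reads t, then t + 8.
sequential_a = requests_per_bank([t for t in range(THREADS)])
sequential_b = requests_per_bank([t + THREADS for t in range(THREADS)])

print(interleaved_a)  # {0: 2, 2: 2, 4: 2, 6: 2}
print(sequential_a)   # {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
```

In the interleaved case, elements 0 and 8, 2 and 10, 4 and 12, 6 and 14 land in the same banks, while the odd banks sit idle. That is exactly the "banks 0, 2, 4, 6 serve two requests" situation. In the sequential case, consecutive threads touch consecutive elements, so every bank is used exactly once.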