Prefetching

memory hierarchy?

Prefetching is a technique in which data blocks needed in the future are brought into the cache early by using special instructions that specify the address of the block. This is prediction.

Prefetching is a technique used to improve memory access performance by bringing data into the cache before it is actually needed. In CUDA, prefetching is achieved by using the cudaMemPrefetchAsync function, which is used to asynchronously prefetch data from host or device memory to device memory.

See: Boosting Application Performance with GPU Memory Prefetching