Global memory and the CUDA profiler