What happens if an invalid address is prefetched?

Daniel Langr

Simple MWE:

int* ptr = (int*)malloc(64 * sizeof(int));
_mm_prefetch((const char*)(ptr + 64), _MM_HINT_T0);
  1. Is this defined or undefined behavior?
  2. Can this raise a signal and abort the program run?

I'm asking because I can see such prefetching in compiler-generated code, where prefetching inside a loop is done without checking the address (stored in rbx):

400e73:       49 83 c5 40             add    r13,0x40
400e77:       62 f1 f9 08 28 03       vmovapd zmm0,ZMMWORD PTR [rbx]
400e7d:       4d 3b ec                cmp    r13,r12
400e80:       62 d1 f9 08 eb 4d ff    vporq  zmm1,zmm0,ZMMWORD PTR [r13-0x40]
400e87:       90                      nop
400e88:       62 d1 78 08 29 4d ff    vmovaps ZMMWORD PTR [r13-0x40],zmm1
400e8f:       72 03                   jb     400e94 <main+0x244>
400e91:       49 89 c5                mov    r13,rax
400e94:       62 f1 78 08 18 53 1d    vprefetch1 [rbx+0x740]
400e9b:       ff c1                   inc    ecx
400e9d:       62 f1 78 08 18 4b 02    vprefetch0 [rbx+0x80]
400ea4:       48 83 c3 40             add    rbx,0x40
400ea8:       81 f9 00 00 10 00       cmp    ecx,0x100000
400eae:       72 c3                   jb     400e73 <main+0x223>

First of all, the compiler doing it and you doing it are very different things in theory. Just because the two look equivalent doesn't make them so: the compiler is allowed to use any dirty hack that works, regardless of whether it is expressible or defined in fully standard C.

Of course prefetching doesn't generate signals*; it would be nearly useless if it did. It can be very slow for some invalid pointers, though, depending on whether they trigger a TLB miss. So the compiler can safely use it, but it shouldn't use it indiscriminately.
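
To make the "no signals" point concrete, here is a minimal sketch, assuming x86-64 with GCC or Clang. The macro PREFETCH_T0 and the function prefetch_invalid_addresses are hypothetical names introduced for illustration only; on non-x86 targets the sketch falls back to __builtin_prefetch, which GCC documents as not faulting on an invalid address.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
/* x86-64: use the SSE intrinsic from the question. */
#define PREFETCH_T0(addr) _mm_prefetch((const char *)(addr), _MM_HINT_T0)
#else
/* Assumption: a GCC/Clang-compatible compiler on other targets. */
#define PREFETCH_T0(addr) __builtin_prefetch((const void *)(addr), 0, 3)
#endif

/* Issues prefetch hints for addresses that do not refer to valid objects.
 * Nothing is dereferenced; the function returns 1 if execution simply
 * continues past the hints. */
int prefetch_invalid_addresses(void)
{
    PREFETCH_T0((uintptr_t)0);    /* null address */
    PREFETCH_T0((uintptr_t)0x10); /* assumed unmapped on typical systems */
    return 1;
}
```

The addresses are built from integers so that no pointer arithmetic on a real object is involved; only the hint itself is being exercised.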

Using pointer arithmetic to create out-of-bounds pointers (other than one just past the end) is UB in theory, but it is the kind of UB that will mostly work anyway: with flat memory it is just an addition, and the only way it could fail is if the compiler went out of its way to detect it, which would require it to reason about dynamic sizes. Obviously the above case must be supported by compilers claiming to support SSE intrinsics, otherwise you couldn't reasonably use prefetching, as demonstrated by this answer (and there are a number of extra guarantees they must make on top of the Standard).
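
If you want to prefetch ahead by hand and sidestep even the theoretical issue, one option is to compute the prefetch address with integer arithmetic rather than pointer arithmetic. The following is only a sketch under stated assumptions: PREFETCH_T0, PREFETCH_AHEAD, and sum_with_prefetch are hypothetical names, the distance of 32 elements is an arbitrary untuned value, and the integer-to-pointer conversion is implementation-defined (a plain address on flat-memory platforms).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define PREFETCH_T0(addr) _mm_prefetch((const char *)(addr), _MM_HINT_T0)
#else
/* Assumption: a GCC/Clang-compatible compiler on other targets. */
#define PREFETCH_T0(addr) __builtin_prefetch((const void *)(addr), 0, 3)
#endif

/* Hypothetical tuning constant: how many elements ahead to prefetch. */
enum { PREFETCH_AHEAD = 32 };

/* Sums n ints while hinting the cache about elements further ahead.
 * Near the end of the array the hinted address lies past the buffer;
 * it is only ever passed to the prefetch, never dereferenced. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    const uintptr_t base = (uintptr_t)data;
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Integer addition, so no out-of-bounds pointer is formed
         * by arithmetic on data itself. */
        uintptr_t ahead = base + (i + PREFETCH_AHEAD) * sizeof *data;
        PREFETCH_T0(ahead);
        sum += data[i];
    }
    return sum;
}
```

Like the compiler-generated loop in the question, this issues the hint unconditionally instead of branching on the address, since a branch would likely cost more than an occasional useless prefetch.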

* from the manual:

The PREFETCHh instruction is merely a hint and does not affect program behavior.

A signal would affect program behavior, so prefetching cannot generate one.

