It is possible to broadcast 8-bit, 16-bit, 32-bit, 64-bit, 128-bit and 256-bit data into a 512-bit (zmm) register using the AVX512 vpbroadcast/vbroadcast instructions. Each form belongs to one of the following AVX512 subsets:
- vpbroadcastb zmm1, xmm2/m8 is AVX512BW.
- vpbroadcastw zmm1, xmm2/m16 is AVX512BW.
- vpbroadcastd zmm1, xmm2/m32 is AVX512F.
- vpbroadcastq zmm1, xmm2/m64 is AVX512F.
- vbroadcasti32x2 zmm1, xmm2/m64 is AVX512DQ.
- vbroadcasti32x4 zmm1, m128 is AVX512F.
- vbroadcasti64x2 zmm1, m128 is AVX512DQ.
- vbroadcasti32x8 zmm1, m256 is AVX512DQ.
- vbroadcasti64x4 zmm1, m256 is AVX512F.
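In C these instructions are usually reached through intrinsics from immintrin.h. The following is a minimal sketch, assuming GCC or Clang on x86-64 (it relies on the compiler-specific __attribute__((target(...))) and __builtin_cpu_supports); the helper names are hypothetical. It broadcasts 8-, 64-, 128- and 256-bit patterns and compares each result with a plain scalar copy. An intrinsic is not a promise of a specific opcode: the compiler may pick another encoding, for example a shuffle when the 128/256-bit source is already in a register.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the running CPU reports both AVX512F and AVX512BW. */
static bool cpu_has_avx512_f_bw(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512bw");
}

/* Scalar reference: repeat an n-byte pattern across a 64-byte buffer. */
static void repeat_pattern(uint8_t dst[64], const void *src, size_t n) {
    for (size_t off = 0; off < 64; off += n)
        memcpy(dst + off, src, n);
}

/* Broadcasts 8/64/128/256-bit patterns and checks them against the scalar
   reference. Only call this when cpu_has_avx512_f_bw() returns true. */
__attribute__((target("avx512f,avx512bw")))
static bool broadcasts_match_reference(void) {
    uint8_t got[64], want[64], src[32];
    for (int i = 0; i < 32; i++) src[i] = (uint8_t)(0x10 + i);

    /* 8-bit: vpbroadcastb (AVX512BW), low byte of the xmm source */
    _mm512_storeu_si512(got, _mm512_broadcastb_epi8(_mm_cvtsi32_si128(src[0])));
    repeat_pattern(want, src, 1);
    if (memcmp(got, want, 64) != 0) return false;

    /* 64-bit: vpbroadcastq (AVX512F) */
    _mm512_storeu_si512(got, _mm512_broadcastq_epi64(_mm_loadl_epi64((const __m128i *)src)));
    repeat_pattern(want, src, 8);
    if (memcmp(got, want, 64) != 0) return false;

    /* 128-bit: vbroadcasti32x4 (AVX512F) */
    _mm512_storeu_si512(got, _mm512_broadcast_i32x4(_mm_loadu_si128((const __m128i *)src)));
    repeat_pattern(want, src, 16);
    if (memcmp(got, want, 64) != 0) return false;

    /* 256-bit: vbroadcasti64x4 (AVX512F) */
    _mm512_storeu_si512(got, _mm512_broadcast_i64x4(_mm256_loadu_si256((const __m256i *)src)));
    repeat_pattern(want, src, 32);
    return memcmp(got, want, 64) == 0;
}
```

The runtime check comes first because executing an AVX512BW instruction on a CPU without it fails with an illegal-instruction fault; the target attribute only lets the compiler emit the instruction, it does not make it safe to run.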
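For the larger (64-bit, 128-bit and 256-bit) broadcasts, there are two instructions that produce exactly the same bit pattern; they differ only in the element size used for write masking. If the goal is simply to broadcast data, without a write mask,
- vpbroadcastq is preferable to vbroadcasti32x2
- vbroadcasti32x4 is preferable to vbroadcasti64x2
- vbroadcasti64x4 is preferable to vbroadcasti32x8
because the former is part of the baseline AVX512F subset, whereas the latter requires AVX512DQ.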
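The claim that each pair produces the same pattern can be checked directly. A minimal sketch, again assuming GCC or Clang on x86-64, with hypothetical helper names: unmasked_pairs_identical() compares the AVX512F and AVX512DQ form of each pair without a mask, and masked_granularity_differs() shows that the element size only becomes visible once a zero-mask is applied (mask bit 0 keeps a whole qword for vpbroadcastq but only one dword for vbroadcasti32x2).

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the running CPU reports both AVX512F and AVX512DQ. */
static bool cpu_has_avx512_f_dq(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512dq");
}

/* Unmasked: each AVX512F broadcast equals its AVX512DQ twin bit for bit.
   Only call this when cpu_has_avx512_f_dq() returns true. */
__attribute__((target("avx512f,avx512dq")))
static bool unmasked_pairs_identical(void) {
    const uint64_t src[4] = {0x0011223344556677ull, 0x8899AABBCCDDEEFFull,
                             0x0F1E2D3C4B5A6978ull, 0x8796A5B4C3D2E1F0ull};
    __m128i x = _mm_loadu_si128((const __m128i *)src);
    __m256i y = _mm256_loadu_si256((const __m256i *)src);

    /* 64-bit pattern: vpbroadcastq vs vbroadcasti32x2 */
    __mmask8 eq64 = _mm512_cmpeq_epi64_mask(_mm512_broadcastq_epi64(x),
                                            _mm512_broadcast_i32x2(x));
    /* 128-bit pattern: vbroadcasti32x4 vs vbroadcasti64x2 */
    __mmask8 eq128 = _mm512_cmpeq_epi64_mask(_mm512_broadcast_i32x4(x),
                                             _mm512_broadcast_i64x2(x));
    /* 256-bit pattern: vbroadcasti64x4 vs vbroadcasti32x8 */
    __mmask8 eq256 = _mm512_cmpeq_epi64_mask(_mm512_broadcast_i64x4(y),
                                             _mm512_broadcast_i32x8(y));
    return eq64 == 0xFF && eq128 == 0xFF && eq256 == 0xFF;
}

/* Masked: bit 0 of the mask covers one qword for vpbroadcastq but only one
   dword for vbroadcasti32x2, so the low 64 bits of the results differ. */
__attribute__((target("avx512f,avx512dq")))
static bool masked_granularity_differs(void) {
    __m128i ones = _mm_set1_epi64x(-1); /* all bits set */
    __m512i q = _mm512_maskz_broadcastq_epi64((__mmask8)0x01, ones);
    __m512i d = _mm512_maskz_broadcast_i32x2((__mmask16)0x0001, ones);
    uint64_t low_q = (uint64_t)_mm_cvtsi128_si64(_mm512_castsi512_si128(q));
    uint64_t low_d = (uint64_t)_mm_cvtsi128_si64(_mm512_castsi512_si128(d));
    return low_q == UINT64_MAX && low_d == 0x00000000FFFFFFFFull;
}
```

When no mask is involved, as in the first function, nothing distinguishes the two forms of a pair, so picking the AVX512F one only widens the set of CPUs the code can run on.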
References
- https://www.officedaytime.com/simd512e/simdimg/si.php?f=vbroadcastf128
- Intel® 64 and IA-32 Architectures Software Developer’s Manual