Inline assembly has been shown to give the fastest vector processing available from Intel SIMD hardware. However, the latest release of those instructions is more complex to program. The attached file extends the previously released instructions to act on the latest vector length, which is double the previous one.
Inline assembly is not complicated when using AVX instructions. Here, extended inline assembly is used to embed AVX vector processing within a C program, with data passed between the C and assembly parts as the programmer directs. The program is compiled with gcc or clang alone.
Not all AVX instructions have been promoted, so tables of those that were promoted are given. Both useful arithmetic and data-permutation instructions were found among them.
The information here is intended as a tool for exploiting the available vector hardware, both through individual instructions and through instructions used in combination.
Further amendments --- June 2024