The downloadable file provides data comparing OpenCL and OpenCilk programming, using double-precision floating-point matrix multiplication as the example.
All programs are written in C, compiled with the clang compiler, and executed on a 2019 Mac Pro running Debian Linux. The matrices are square, ranging in size from 1000 to 10000. Both standard C row storage and column storage of the second matrix are compared. The comparison is based on the execution time of matrix multiplication, which is proposed as a typical intensive computation.
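The two storage schemes for the second matrix can be illustrated with a minimal serial sketch. This is not the code from the downloadable file; the function names (matmul_row, matmul_col, transpose_square, time_matmul) and the use of timespec_get for wall-clock timing are assumptions made for illustration only.

```c
#include <stddef.h>
#include <time.h>

/* C = A * B for n x n matrices, all in standard C row storage.
   The inner loop walks down a column of b with stride n. */
void matmul_row(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product, but the second matrix is stored by columns:
   bt[j * n + k] holds B[k][j], so the inner loop has unit stride. */
void matmul_col(size_t n, const double *a, const double *bt, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
}

/* Convert a row-stored square matrix b into column storage bt. */
void transpose_square(size_t n, const double *b, double *bt)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            bt[j * n + i] = b[i * n + j];
}

/* Wall-clock seconds taken by one call of a multiply routine
   (hypothetical helper, C11 timespec_get). */
double time_matmul(void (*mm)(size_t, const double *, const double *, double *),
                   size_t n, const double *a, const double *b, double *c)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    mm(n, a, b, c);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Calling time_matmul once with matmul_row and once with matmul_col (passing the transposed second matrix) gives the two execution times to compare; both routines produce the same product.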
Comparisons are also given with serial versions of the programs, modified to perform the same operations. Using multi-core OpenCL and OpenCilk 'for' loops alone is shown to greatly reduce execution time relative to serial programming. The multi-core runs use 56 cores, compared with the single core used in the serial case.
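As a rough illustration of how a parallel 'for' loop alone can replace the serial outer loop, the sketch below uses the OpenCilk cilk_for keyword. It is an assumption about the general shape of such a program, not the code from the downloadable file. The USE_CILK macro is a hypothetical switch introduced here so that the same source also builds serially with plain clang; with an OpenCilk compiler it would be built with something like clang -fopencilk -DUSE_CILK.

```c
#include <stddef.h>

#ifdef USE_CILK                 /* hypothetical build switch, not an OpenCilk macro */
#include <cilk/cilk.h>
#else
#define cilk_for for            /* serial fallback when OpenCilk is not available */
#endif

/* Serial reference: C = A * B with the second matrix stored by columns (bt). */
void matmul_serial(size_t n, const double *a, const double *bt, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
}

/* Parallel version: only the outer loop keyword changes.
   Each iteration i writes its own row of c, so iterations are independent. */
void matmul_parallel(size_t n, const double *a, const double *bt, double *c)
{
    cilk_for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
}
```

Because the rows of the result are computed independently, the serial and parallel routines should return identical values; only the execution time differs.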
The programs used, and the changes that implement the different configurations, are listed. The determination of control parameters that minimize execution time is covered. Full data for all program configurations are tabulated, then graphed to show significant relations between those configurations. The speedups obtained are given and placed in the perspective of the large execution time reductions.
Sufficient information is given to enable similar data to be obtained from other computing environments and applications for thorough assessment.
Revised December 2023: graphs and the results drawn from them.