Pure parallel computing on MacPros

This is a work in progress, so watch this space for revisions and additions to the attached document.

Pure parallel computing is taken here to mean the study of how to do parallel computing. Once the pure side is known, applied parallel computing can be performed. The interest here is in performing parallel computing on MacPro computers running Debian Linux, using freely available software to develop applications. The applications are aimed at the science and engineering fields. Reported here are experimental evaluations of the computational environment needed to support such applied interests. The applications were chosen to explore that environment.

OpenMP is taken as the starting point for parallel computation. It is evaluated by execution time on prime number finding and matrix multiplication. Vectorization using Intel Intrinsics is applied to matrix multiplication and pattern finding. This vectorization is then extended to multiple cores with pthreads and OpenCilk, using double precision Intrinsics. A performance comparison using CUDA is included. All timing data are presented in tables of the maximum, minimum, and mean over 25 runs of each program; the programs are also listed. All programming is done in C. The C compilers gcc, clang, and nvcc are used.
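
As a rough illustration of the measurement approach described above, the following minimal sketch counts primes with an OpenMP parallel loop and collects the minimum, maximum, and mean execution time over repeated runs. It is not the program from the attached document. The names count_primes, time_count_primes, timing_stats, and RUNS, the trial-division test, and the dynamic schedule are all assumptions made for illustration.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define RUNS 25  /* number of timed repetitions, as in the text */

/* Wall-clock seconds via omp_get_wtime() when built with OpenMP;
   falls back to clock() (CPU time) in a serial build. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Trial-division primality test (illustrative, not optimized). */
static int is_prime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Count the primes in [2, limit]. The reduction clause gives each
   thread a private counter and sums them at the end of the loop.
   Without OpenMP enabled the pragma is ignored and the loop is serial. */
long count_primes(int limit)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count) schedule(dynamic, 1024)
    for (int n = 2; n <= limit; n++)
        count += is_prime(n);
    return count;
}

/* Hypothetical container for the three statistics reported per table row. */
typedef struct { double min, max, mean; } timing_stats;

/* Time count_primes(limit) over `runs` repetitions (runs must be >= 1). */
timing_stats time_count_primes(int limit, int runs)
{
    timing_stats s = {0.0, 0.0, 0.0};
    double total = 0.0;
    for (int r = 0; r < runs; r++) {
        double t0 = now_seconds();
        volatile long result = count_primes(limit); /* keep the call alive */
        (void)result;
        double dt = now_seconds() - t0;
        if (r == 0 || dt < s.min) s.min = dt;
        if (r == 0 || dt > s.max) s.max = dt;
        total += dt;
    }
    s.mean = total / runs;
    return s;
}
```

A caller would invoke time_count_primes(limit, RUNS) from main and print the three fields. With gcc or clang, the -fopenmp flag enables the parallel loop; comparing the mean time against a build without that flag gives a simple speedup estimate.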

The data suggest that double precision Intel Intrinsics combined with OpenCilk offer the most promise for parallel computing on a MacPro. Speedups of 50 to 100 times are reported, which far exceed the speedups from the other combinations explored in this research.

Document version: Typo and edit correction -- July 2021