Click here to Skip to main content
14,641,306 members
Rate this:
Please Sign up or sign in to vote.
I am optimizing native version of matrix multiplication and I want to optimize it with OpenMP, SIMD and loop reordering. The following code is my attempt. My question is how I can modify the code so that I can avoid expensive memory write which is store function in the inner-most loop.

What I have tried:

void mul_matrix(matrix *result, matrix *mat1, matrix *mat2){
    int I = mat1->rows;
    int J = mat2->cols;
    int K = mat2->rows;
    #pragma omp parallel for
    for(int i = 0; i < I; i++){
        for(int k = 0; k < K; k++){
            _m256d vA = _mm256_set1_pd(mat1->data[i * K + k]);
            for(int j = 0; j < J / 4 * 4; j += 4){
                _m256d sum = _mm256_loadu_pd(result->data + i * J + j);
                _m256d vB = _mm256_loadu_pd(mat2->data + k * J + j);
                _m256d intermediate = _mm256_mul_pd(vA, vB);
                sum = _mm256_add_pd(sum, intermediate);
                _mm256_storeu_pd(result->data + i * J + j, sum);
             }
             for(int x = J / 4 * 4; x < J; x++){
                 result->data[i * J + x] += mat1 -> data[i * K + k] * mat2 -> data[k * J + x];
             }
         }
     }
}
typedef struct matrix{
    int rows;
    int cols;
    double* data;
}matrix;
Posted
Updated 9-Aug-20 7:26am
v2

1 solution

Rate this:
Please Sign up or sign in to vote.

Solution 2

check that you only calculate what you need and extract ANY constant value out of the loops and use const where possible.

Like in
for(int x = J / 4 * 4; x < J; x++){
  result->data[i * J + x] += mat1 -> data[i * K + k] * mat2 -> data[k * J + x];
}
the constant
int J4 = J / 4 * 4;
matrix *mr = result->data[i * J];
const matrix *m1 = mat1->data[i * K + k];
const matrix *m2 = mat2 -> data[k * J};

for(int x = J4; x < J; x++){
     mr[x] += m1 * mat2->m3[x];
}
Advanced is to create the assembly code and look if it shows some strange operations like type conversion or data aligments.
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100