14,641,306 members
Rate this:
See more:
I am optimizing native version of matrix multiplication and I want to optimize it with OpenMP, SIMD and loop reordering. The following code is my attempt. My question is how I can modify the code so that I can avoid expensive memory write which is store function in the inner-most loop.

What I have tried:

```void mul_matrix(matrix *result, matrix *mat1, matrix *mat2){
int I = mat1->rows;
int J = mat2->cols;
int K = mat2->rows;
#pragma omp parallel for
for(int i = 0; i < I; i++){
for(int k = 0; k < K; k++){
_m256d vA = _mm256_set1_pd(mat1->data[i * K + k]);
for(int j = 0; j < J / 4 * 4; j += 4){
_m256d sum = _mm256_loadu_pd(result->data + i * J + j);
_m256d vB = _mm256_loadu_pd(mat2->data + k * J + j);
_m256d intermediate = _mm256_mul_pd(vA, vB);
sum = _mm256_add_pd(sum, intermediate);
_mm256_storeu_pd(result->data + i * J + j, sum);
}
for(int x = J / 4 * 4; x < J; x++){
result->data[i * J + x] += mat1 -> data[i * K + k] * mat2 -> data[k * J + x];
}
}
}
}
typedef struct matrix{
int rows;
int cols;
double* data;
}matrix;```
Posted
Updated 9-Aug-20 7:26am
v2

Rate this:

## Solution 2

check that you only calculate what you need and extract ANY constant value out of the loops and use const where possible.

Like in
```for(int x = J / 4 * 4; x < J; x++){
result->data[i * J + x] += mat1 -> data[i * K + k] * mat2 -> data[k * J + x];
}```
the constant
```int J4 = J / 4 * 4;
matrix *mr = result->data[i * J];
const matrix *m1 = mat1->data[i * K + k];
const matrix *m2 = mat2 -> data[k * J};

for(int x = J4; x < J; x++){
mr[x] += m1 * mat2->m3[x];
}```
Advanced is to create the assembly code and look if it shows some strange operations like type conversion or data aligments.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Top Experts
Last 24hrsThis month
 OriginalGriff 303 Sandeep Mewara 180 Richard MacCutchan 88 Maciej Los 80 BillWoodruff 60
 OriginalGriff 5,531 Sandeep Mewara 2,966 Richard MacCutchan 1,799 Richard Deeming 1,619 CPallini 1,607

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100