However, a possible remainder is not considered when distributing the elements.

C++

unsigned pSize = N / size;     // Number of elements per process
unsigned remainder = N % size; // Remaining elements to be distributed

// Check whether the current process should get more elements
if (rank < remainder) {
    pSize++;
}

edit:

Sending the length of the data does not seem necessary. And instead of sending the result data and receiving it in process 0 in a loop with MPI_Recv, using MPI_Reduce would make more sense.

Since all processes calculate a partial result, it would be usual not to duplicate the formula in the code. The initialization of the variables should also not be done in separate if/else branches.

Example:

C++

srand((unsigned)time(0));

if (rank == 0) {
    for (int i = 0; i < N; i++) {
        A[i] = (rand() % 20) / 2.;
        C[i] = (rand() % 20) / 2.;
    }
}

unsigned pSize = N / size;
unsigned remainder = N % size;
if (rank < remainder) {
    pSize++;
}

// Distribution of the data in parts
if (rank == 0) {
    // Send a range of data with MPI_Isend (non-blocking)
} else {
    // Receive a range of data with MPI_Recv (blocking)
}

// Calculation with all processes (without else)
for (int i = 0; i < pSize; i++) {
    double temp = A[i] + C[i];
    localMin = min(localMin, temp);
}

// Collecting the data with MPI_Reduce (blocking)

I'm sure more points could be found, but I won't comment further. Just a note that I see no need for MPI_Wait or MPI_Waitall.