I have a cluster of 3 nodes, and each node has a quad-core CPU and a GPU.
Why does performance degrade when I increase the number of MPI tasks?
Also, the CPU performs better than the GPU as the number of tasks increases.
Is this because of the CPU's scheduling capability?
My guess is that GPU execution becomes serialized as more tasks wait for GPU resources.
Is my understanding correct?
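
To make the guess concrete, here is a rough toy model of a single node (4 cores, 1 GPU shared by every MPI task on that node). It is only a sketch of my reasoning, not a measurement: the function names, the rates, and the per-task overhead are all made-up placeholders, and it assumes the total work is fixed and that kernels from different tasks run one after another on the GPU.

```python
# Toy model of one node: 4 CPU cores and 1 GPU shared by all MPI tasks.
# All numbers are hypothetical placeholders, not measurements.

CORES_PER_NODE = 4

def cpu_time(work, tasks, core_rate=1.0):
    """Fixed total work split evenly across tasks.
    At most CORES_PER_NODE tasks can run at the same time."""
    busy_cores = min(tasks, CORES_PER_NODE)
    return work / (core_rate * busy_cores)

def gpu_time(work, tasks, gpu_rate=8.0, per_task_overhead=0.5):
    """Assumption: kernels from different tasks are serialized on the
    single GPU, so total compute time does not shrink with more tasks,
    while each extra task adds a fixed overhead (context switch,
    launch, transfer setup)."""
    return work / gpu_rate + tasks * per_task_overhead

if __name__ == "__main__":
    for tasks in (1, 2, 4, 8):
        print(tasks, cpu_time(16.0, tasks), gpu_time(16.0, tasks))
```

With these placeholder numbers the GPU is faster for one task per node, the two are equal at four tasks, and the CPU is faster at eight tasks, which is the kind of crossover I am seeing.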
It would be a great help if you could provide some input.