Write a program to measure the time it takes to perform an MPI_Allreduce on
MPI_COMM_WORLD. Use a single MPI_DOUBLE for both the send and receive buffers. Use the
same techniques as in the memcpy assignment to average out variations and
overhead in MPI_Wtime.
Print the size of MPI_COMM_WORLD and time for each test.
Make sure that every process is ready when you begin the test; in an MPI_Allreduce each rank is both sender and receiver.
How does the performance of MPI_Allreduce vary with the size of MPI_COMM_WORLD?