MPI Scatter and Ibcast lead to a deadlock

I am trying to write a simple MPI program that scatters some simple data among the processes and reduces a local sum, while in the meantime sending a broadcast to the other processes in a non-blocking way.

Almost all of my attempts result in a deadlock, and I cannot understand why, since the broadcast call is non-blocking.

This is my code:

#include <iostream>
#include <algorithm> 
#include <memory> 
#include <random>
#include "mpi.h"

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);

    int num_tasks;
    MPI_Comm_size(MPI_COMM_WORLD, &num_tasks);

    const int num_elements = 1 << 10;
    const int chunk_size = num_elements / num_tasks;

    int task_id;
    int local_buffer = 0;  // initialized so non-root ranks do not print garbage below
    MPI_Request req;
    MPI_Comm_rank(MPI_COMM_WORLD, &task_id);

    std::unique_ptr<int[]> send_ptr;
    
    if(task_id == 0){
        local_buffer = 42;
        send_ptr = std::make_unique<int []>(num_elements);
    
        std::random_device rd;
        std::mt19937 mt(rd());
        std::uniform_int_distribution<int> dist(1, 1);  // always yields 1, so the expected global result is num_elements
    
        std::generate(send_ptr.get(), send_ptr.get() + num_elements,
                                    [&] {return dist(mt);});
    }
    
    std::cout << "Processor : " << task_id << " declares : " << local_buffer << std::endl;
    
    auto recv_buffer = std::make_unique<int []>(chunk_size);
    
    std::cout << "Before di scatter" << std::endl;
    MPI_Scatter(send_ptr.get(), chunk_size, MPI_INT, recv_buffer.get(), chunk_size, MPI_INT, 0, MPI_COMM_WORLD);
    std::cout << "After scatter" << std::endl;

    int local_result = 0;
    for(int i = 0; i < chunk_size; i++){
        std::cout << "Processor: " << task_id << " completition: "<<
    ((float)i/chunk_size)*100 << "%\n";
        local_result += recv_buffer[i];
    }
    
    MPI_Ibcast(&local_buffer, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);  // non-blocking broadcast of local_buffer from rank 0
    
    int global_result;
    MPI_Reduce(&local_result, &global_result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    
    if (task_id == 0){
        std::cout << "global result : " << global_result << "\n";
    }

    MPI_Wait(&req, MPI_STATUS_IGNORE);  // completes the Ibcast
    std::cout << "Processor : " << task_id << " receives the value : " << local_buffer << std::endl;
    MPI_Finalize();
    return 0;
}

I have tried changing all the calls to their non-blocking versions, but then I often get wrong results. Getting rid of the final waits removes the deadlock, but at the cost of wrong output as well.
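In case it helps, this is roughly what the fully non-blocking attempt looked like, stripped down (I replaced the random generator with a plain fill of 1s, which is all my dist(1, 1) produces anyway); this is the variant that deadlocks for me when the waits are kept:

#include <iostream>
#include <algorithm>
#include "mpi.h"

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int task_id, num_tasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &task_id);
    MPI_Comm_size(MPI_COMM_WORLD, &num_tasks);

    const int num_elements = 1 << 10;
    const int chunk_size = num_elements / num_tasks;

    int local_buffer = 0;
    int *send_buffer = nullptr;
    if (task_id == 0) {
        local_buffer = 42;
        send_buffer = new int[num_elements];
        std::fill(send_buffer, send_buffer + num_elements, 1);  // same data as dist(1, 1)
    }
    int *recv_buffer = new int[chunk_size];

    MPI_Request reqs[3];
    MPI_Iscatter(send_buffer, chunk_size, MPI_INT,
                 recv_buffer, chunk_size, MPI_INT, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);  // recv_buffer must be complete before summing

    int local_result = 0;
    for (int i = 0; i < chunk_size; i++)
        local_result += recv_buffer[i];

    int global_result = 0;
    MPI_Ibcast(&local_buffer, 1, MPI_INT, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Ireduce(&local_result, &global_result, 1, MPI_INT, MPI_SUM,
                0, MPI_COMM_WORLD, &reqs[2]);
    MPI_Waitall(2, &reqs[1], MPI_STATUSES_IGNORE);  // completes the Ibcast and the Ireduce

    if (task_id == 0)
        std::cout << "global result : " << global_result << "\n";
    std::cout << "Processor : " << task_id << " receives the value : "
              << local_buffer << std::endl;

    delete[] send_buffer;
    delete[] recv_buffer;
    MPI_Finalize();
    return 0;
}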

Removing the broadcast gives correct output, with no deadlock, every time.

What I need is simply a program that sends a broadcast at the end of the sum; afterwards I would like to modify it so that the first process to finish its sum is the one that sends the broadcast.

I would also point out that removing the Scatter and Reduce calls lets the program finish without deadlocks. I thought the problem might be caused by some kind of interaction between the Ibcast and the Scatter/Reduce operations, but that does not make much sense to me, since they work on different buffers and the broadcast is non-blocking.
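To isolate it, here is a stripped-down reproducer of the bare pattern I suspect: a non-blocking collective started on MPI_COMM_WORLD, then a blocking collective on the same communicator, then the wait. As far as I understand, every rank issues the collectives in the same order, so this should be legal:

#include <iostream>
#include "mpi.h"

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int task_id;
    MPI_Comm_rank(MPI_COMM_WORLD, &task_id);

    int value = (task_id == 0) ? 42 : 0;
    int local = 1, sum = 0;

    MPI_Request req;
    MPI_Ibcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);           // non-blocking collective
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);  // blocking collective, same communicator
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    std::cout << "rank " << task_id << ": value=" << value
              << " sum=" << sum << std::endl;
    MPI_Finalize();
    return 0;
}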

Moving the broadcast call before the Scatter also results in the Scatter never executing and the program freezing.
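Concretely, that reordered variant looks like this, trimmed down to just the two collectives (same buffers as the full code above, again filled with 1s):

#include <iostream>
#include <algorithm>
#include "mpi.h"

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int task_id, num_tasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &task_id);
    MPI_Comm_size(MPI_COMM_WORLD, &num_tasks);

    const int num_elements = 1 << 10;
    const int chunk_size = num_elements / num_tasks;

    int local_buffer = (task_id == 0) ? 42 : 0;
    int *send_buffer = nullptr;
    if (task_id == 0) {
        send_buffer = new int[num_elements];
        std::fill(send_buffer, send_buffer + num_elements, 1);
    }
    int *recv_buffer = new int[chunk_size];

    MPI_Request req;
    MPI_Ibcast(&local_buffer, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);  // now posted first
    MPI_Scatter(send_buffer, chunk_size, MPI_INT,                    // freezes here for me
                recv_buffer, chunk_size, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    std::cout << "Processor : " << task_id << " receives the value : "
              << local_buffer << std::endl;

    delete[] send_buffer;
    delete[] recv_buffer;
    MPI_Finalize();
    return 0;
}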
