
MPI_Win_allocate_shared

Definition

MPI_Win_allocate_shared is a variant of MPI_Win_allocate which exposes a window containing data accessible by other MPI processes via remote-memory access operations such as MPI_Put and MPI_Get. However, with MPI_Win_allocate_shared, the data exposed through the window is also accessible via direct load and store operations. The user must therefore ensure that all MPI processes in the communicator passed to MPI_Win_allocate_shared can share the memory segments created; MPI processes located on distinct nodes, for instance, do not satisfy this condition. MPI_Comm_split_type can be used to find the MPI processes able to create a shared memory region in a given communicator, by passing the MPI_COMM_TYPE_SHARED split type. MPI_Win_allocate_shared is a collective operation; it must be called by all MPI processes in the communicator concerned.
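As an illustration, here is a minimal sketch combining MPI_Comm_split_type with MPI_Win_allocate_shared; the communicator name shared_comm is illustrative:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    // Group the MPI processes that can physically share memory
    MPI_Comm shared_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shared_comm);

    // shared_comm only contains MPI processes able to share memory with
    // each other, which makes it a safe communicator for MPI_Win_allocate_shared
    int* window_buffer;
    MPI_Win window;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL, shared_comm, &window_buffer, &window);

    MPI_Win_free(&window);
    MPI_Comm_free(&shared_comm);
    MPI_Finalize();

    return EXIT_SUCCESS;
}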

Copy

Feedback

int MPI_Win_allocate_shared(MPI_Aint size,
                            int displacement_unit,
                            MPI_Info info,
                            MPI_Comm communicator,
                            void* base,
                            MPI_Win* window);

Parameters

size

The size of the memory area exposed through the window, in bytes.

displacement_unit

The displacement unit provides an indexing feature for RMA operations: the target displacement specified during an RMA operation is multiplied by the displacement unit of the target. The displacement unit is expressed in bytes, so that it remains identical in a heterogeneous environment.
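For instance, the sketch below assumes a window created with a displacement_unit of sizeof(int) and an RMA access epoch already open; the target displacement of 3 then resolves to the target's base address plus 3 * sizeof(int):

// Write one integer into the 4th slot of the target's window;
// the displacement 3 is counted in displacement units, not bytes.
int value = 42;
MPI_Put(&value,     // origin address
        1, MPI_INT, // origin count and datatype
        1,          // target rank
        3,          // target displacement, in displacement units
        1, MPI_INT, // target count and datatype
        window);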

info

The info argument provides optimisation hints to the runtime about the expected usage pattern of the window, via the keys below (see the sketch after this list):

  • no_locks: if set to true, then the implementation may assume that passive target synchronisation (that is, MPI_Win_lock, MPI_Win_lock_all) will not be used on the given window. This implies that this window is not used for third-party communication, and RMA can be implemented with less (or no) asynchronous agent activity at this process.
  • accumulate_ordering: controls the ordering of accumulate operations at the target. The default value is rar,raw,war,waw.
  • accumulate_ops: if set to same_op, the implementation will assume that all concurrent accumulate calls to the same target address will use the same operation. If set to same_op_no_op, then the implementation will assume that all concurrent accumulate calls to the same target address will use the same operation or MPI_NO_OP. This can eliminate the need to protect access for certain operation types where the hardware can guarantee atomicity. The default is same_op_no_op.
  • same_size: if set to true, then the implementation may assume that the argument size is identical on all processes, and that all processes have provided this info key with the same value.
  • same_disp_unit: if set to true, then the implementation may assume that the argument displacement_unit is identical on all processes, and that all processes have provided this info key with the same value.
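A minimal sketch of passing such a hint follows; the communicator shared_comm is assumed to have been obtained beforehand, for instance via MPI_Comm_split_type:

// Build an info object telling the runtime that passive target
// synchronisation will not be used on this window
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "no_locks", "true");

int* window_buffer;
MPI_Win window;
MPI_Win_allocate_shared(sizeof(int), sizeof(int), info, shared_comm, &window_buffer, &window);

// The info object can be freed once the window is created
MPI_Info_free(&info);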

communicator

The communicator containing all MPI processes involved in RMA communications. The various processes in the corresponding group may specify completely different target windows, in location, size, displacement unit and info arguments. As long as all the get, put and accumulate accesses to a particular process fit their specific target window, this poses no problem. The same area in memory may appear in multiple windows, each associated with a different window object. However, concurrent communications to distinct, overlapping windows may lead to undefined results. Also, the MPI processes in this communicator must be able to create a memory segment that can be shared with all other MPI processes in the group.

base

A pointer to the variable in which to store the base address of the locally allocated window segment.

window

A pointer to the variable in which to store the window created.

Return value

The error code returned from the shared window creation: MPI_SUCCESS on success, otherwise an MPI error code.
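Since the default error handler on communicators is MPI_ERRORS_ARE_FATAL, the return code can only be examined after switching to MPI_ERRORS_RETURN; a minimal sketch:

// Ask MPI to return error codes rather than aborting on failure
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

int* window_buffer;
MPI_Win window;
int error = MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &window_buffer, &window);
if(error != MPI_SUCCESS)
{
    // The error code can be turned into a message with MPI_Error_string
}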

Example


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/**
 * @brief Illustrate how to create a shared window.
 * @details This application consists of creating a shared window and
 * interacting with it using direct memory accesses. Two MPI processes are
 * used, each holding an integer initialised to 100. MPI process 0 increments
 * MPI process 1's variable, and MPI process 1 decrements MPI process 0's.
 * In this application, the overall shared window uses the default configuration
 * where it is made of contiguous data.
 *
 * This can be visualised as follows:
 *
 * - Start situation:
 *         Held on MPI process 0 | Held on MPI process 1
 *                         +-----+-----+
 *                         | 100 | 100 |
 *                         +-----+-----+
 *         My element = array[0] | My element = array[0]
 *       Peer element = array[1] | Peer element = array[-1]
 *
 * - End situation:
 *         Held on MPI process 0 | Held on MPI process 1
 *                         +-----+-----+
 *                         |  99 | 101 |
 *                         +-----+-----+
 *         My element = array[0] | My element = array[0]
 *       Peer element = array[1] | Peer element = array[-1]
 *
 * This code assumes that all MPI processes are able to physically share memory.
 **/
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    // Check that only 2 MPI processes are spawned
    int comm_size;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    if(comm_size != 2)
    {
        printf("This application is meant to be run with 2 MPI processes, not %d.\n", comm_size);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    // Get my rank
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // Create the window
    const int ARRAY_SIZE = 1;
    int* window_buffer;
    MPI_Win window;
    MPI_Win_allocate_shared(ARRAY_SIZE * sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &window_buffer, &window);
    printf("[MPI process %d] Window created.\n", my_rank);

    // Initialise my element
    *window_buffer = 100;
    printf("[MPI process %d] Value before direct write from MPI process %d: %d.\n", my_rank, comm_size - 1 - my_rank, *window_buffer);

    // Modify peer's element
    MPI_Barrier(MPI_COMM_WORLD);
    if(my_rank == 0)
    {
        window_buffer[1]++;
    }
    else
    {
        window_buffer[-1]--;
    }
    MPI_Barrier(MPI_COMM_WORLD);

    // Check end values
    printf("[MPI process %d] Value after direct write from MPI process %d: %d.\n", my_rank, comm_size - 1 - my_rank, *window_buffer);

    // Destroy the window
    MPI_Win_free(&window);
    printf("[MPI process %d] Window destroyed.\n", my_rank);

    MPI_Finalize();

    return EXIT_SUCCESS;
}
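
The example above relies on the default contiguous layout, in which each process can reach its neighbours' data through plain pointer arithmetic on its own base address. When the alloc_shared_noncontig info key is set, or simply for portability, the base address of another process's segment should instead be queried with MPI_Win_shared_query; a minimal sketch, assuming the window from the example above:

// Retrieve the size, displacement unit and base address of the
// segment that rank 0 exposed in the shared window
MPI_Aint peer_size;
int peer_displacement_unit;
int* peer_buffer;
MPI_Win_shared_query(window, 0, &peer_size, &peer_displacement_unit, &peer_buffer);

// peer_buffer can now be read and written with direct load and store
// operations, subject to the usual synchronisation rules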