A tour of Bulk

On this page we aim to highlight some of the main features of the library, and to give an impression of the overall syntax.

Hello `bulk::world`!

We start with the obligatory Hello World! in Bulk, and then explain the code line by line. This code uses the MPI backend, but everything written here is general and guaranteed to work on top of any conforming Bulk backend.

#include <bulk/bulk.hpp>
#include <bulk/backends/mpi/mpi.hpp>

int main() {
    bulk::mpi::environment env;
    env.spawn(env.available_processors(), [](auto& world) {
        auto s = world.rank();
        auto p = world.active_processors();

        world.log("Hello world from processor %d / %d!", s, p);
    });
}

On lines 1 and 2 we include the library, and the backend of our choosing (in our case MPI). On line 5, we initialize an environment, which sets up the parallel or distributed system.

On line 6, we spawn the SPMD section of our program within the environment. The first argument is the number of processors that we want to run the section on, while the second argument is a function-like object (here a C++ lambda function) that is executed on each of the requested processors. This function receives a bulk::world object, which it can use to communicate with other processors. For convenience we suggest aliasing the processor identifier (or rank) to s and the total number of spawned processors to p. As shown in the example, these can be obtained from world using world.rank() and world.active_processors() respectively.

Communication between processors

Next, we look at some basic forms of communication between processors. The main way to talk to other processors is by using variables. A variable is created as follows:

auto x = bulk::var<T>(world);

Here, T is the type of the variable, for example an int. Values can be assigned to the (local) variable:

x = 5;

The reason to use such a distributed variable is that a processor can write to a remote image of the variable:

bulk::put(world.next_rank(), 4, x);
// or the short-hand:
x(world.next_rank()) = 4;

This will overwrite the value of the variable x on the next logical processor (i.e. processor (s + 1) % p) with 4. We can obtain the value of a remote image using:

auto y = bulk::get(world.next_rank(), x);
// or the short-hand:
auto y = x(world.next_rank()).get();

Here, y is a bulk::future object. A future object does not immediately hold the remote value of x, but after a future call to world.sync(), we can extract the remote value out of y.

world.sync();
auto x_next = y.value();
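
To make the deferred nature of a get concrete, the following is a minimal sketch in plain C++ that does not use Bulk at all. The names ModelFuture, request_get, sync_all and read_from_next are hypothetical and exist only for this illustration. All ranks are simulated sequentially in one process, and the sketch assumes that a requested value only becomes readable once the synchronization step has run, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a future (NOT the Bulk type): it records which
// rank was asked, and only holds a usable value after the next sync.
struct ModelFuture {
    std::size_t source; // rank whose image was requested
    bool ready;         // becomes true when sync_all has run
    int result;         // only meaningful once ready is true
};

// Request the image held by rank `source`; no data is moved yet.
ModelFuture request_get(std::size_t source) {
    return ModelFuture{source, false, 0};
}

// Model of the synchronization step: resolve every outstanding future
// from the per-rank images (images[t] is the value of x on rank t).
void sync_all(const std::vector<int>& images, std::vector<ModelFuture>& futures) {
    for (auto& f : futures) {
        assert(f.source < images.size());
        f.result = images[f.source];
        f.ready = true;
    }
}

// Every rank s requests x from rank (s + 1) % p and reads it after the sync.
// Entry s of the returned vector is what rank s would see as x_next.
std::vector<int> read_from_next(const std::vector<int>& images) {
    const std::size_t p = images.size();
    std::vector<ModelFuture> futures;
    for (std::size_t s = 0; s < p; ++s) {
        futures.push_back(request_get((s + 1) % p));
    }
    sync_all(images, futures);
    std::vector<int> x_next;
    for (const auto& f : futures) {
        assert(f.ready);
        x_next.push_back(f.result);
    }
    return x_next;
}
```

With images {10, 20, 30} on ranks 0, 1 and 2, each rank ends up with the value of its cyclic successor, so the last rank reads the value held by rank 0.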

Coarrays

Coarrays are a convenient way to store and manipulate distributed data. We provide a coarray that is modeled after Coarray Fortran. Coarrays are initialized and used as follows:

auto xs = bulk::coarray<int>(world, s);
xs(3)[2] = 1;

Here, we create a coarray of varying local size (each processor holds s elements, where s is its rank). Next we write the value 1 to the element with local index 2 on the processor with index 3.
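
The layout described above can be pictured with a small plain C++ model that does not use Bulk. The helpers make_model_coarray and remote_write are hypothetical names chosen for this sketch. The model applies the write immediately, whereas in a bulk-synchronous program a remote write would presumably only become visible after a synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model of a coarray with varying local size (NOT the Bulk API).
// data[t] is the local array of processor t, which holds t elements.
std::vector<std::vector<int>> make_model_coarray(std::size_t p) {
    std::vector<std::vector<int>> data(p);
    for (std::size_t t = 0; t < p; ++t) {
        data[t].assign(t, 0); // processor t stores t zero-initialized elements
    }
    return data;
}

// Model of xs(t)[i] = value: write to local index i on processor t.
void remote_write(std::vector<std::vector<int>>& data,
                  std::size_t t, std::size_t i, int value) {
    assert(t < data.size());    // the target processor must exist
    assert(i < data[t].size()); // the index must be valid on that processor
    data[t][i] = value;
}
```

Calling remote_write(xs, 3, 2, 1) on a model with four processors mirrors xs(3)[2] = 1. Processor 3 holds exactly three elements, so local index 2 is its last valid index, and the write only makes sense when at least four processors are active.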

Algorithmic skeletons

Bulk comes equipped with a number of higher-level functions, also known as algorithmic skeletons. For example, the dot product of two coarrays can be computed as follows:

auto xs = bulk::coarray<int>(world, s);
auto ys = bulk::coarray<int>(world, s);

// fill xs and ys with data
auto result = bulk::var<int>(world);
for (int i = 0; i < s; ++i) {
    result.value() += xs[i] * ys[i];
}

// reduce to find global dot product
auto alpha = bulk::foldl(result, [](int& lhs, int rhs) { lhs += rhs; });

Here we first compute the local inner product on each processor, and then use the higher-level function bulk::foldl to combine the local results into the global result.
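
The two phases of this computation can be sketched in plain C++ without Bulk. The functions local_dot, fold_left and global_dot are hypothetical names for this illustration, and the processors are simulated as the entries of an outer vector. The fold takes the same kind of lambda as in the example above, one that accumulates into its first argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Phase 1: the inner product one processor computes over its local data.
int local_dot(const std::vector<int>& xs, const std::vector<int>& ys) {
    assert(xs.size() == ys.size());
    int result = 0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        result += xs[i] * ys[i];
    }
    return result;
}

// Phase 2: a left fold over the per-processor results (a model, NOT the
// Bulk implementation). f accumulates into its first argument.
template <typename F>
int fold_left(const std::vector<int>& locals, F f, int init = 0) {
    int acc = init;
    for (int value : locals) {
        f(acc, value);
    }
    return acc;
}

// xs[t] and ys[t] are the local arrays of simulated processor t.
int global_dot(const std::vector<std::vector<int>>& xs,
               const std::vector<std::vector<int>>& ys) {
    assert(xs.size() == ys.size());
    std::vector<int> locals;
    for (std::size_t t = 0; t < xs.size(); ++t) {
        locals.push_back(local_dot(xs[t], ys[t]));
    }
    return fold_left(locals, [](int& lhs, int rhs) { lhs += rhs; });
}
```

Because addition is associative and commutative, the order in which the local results are folded does not affect the final value in this example.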

Another example is finding the maximum element over all processors. Here, max is the maximum value found locally:

auto maxs = bulk::gather_all(world, max);
max = *std::max_element(maxs.begin(), maxs.end());
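
As a rough picture of what this pattern achieves, the sketch below models it in plain C++ without Bulk. It assumes that after the gather every processor holds a copy of all local maxima. The names gather_all_model and global_max_per_processor are hypothetical and used only here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Model of an all-gather (NOT the Bulk implementation): every simulated
// processor ends up with its own copy of all local values.
std::vector<std::vector<int>> gather_all_model(const std::vector<int>& local_values) {
    return std::vector<std::vector<int>>(local_values.size(), local_values);
}

// Each processor takes the maximum of its gathered copy, so all of them
// arrive at the same global maximum.
std::vector<int> global_max_per_processor(const std::vector<int>& local_maxima) {
    assert(!local_maxima.empty()); // max_element needs at least one value
    std::vector<int> result;
    for (const auto& maxs : gather_all_model(local_maxima)) {
        result.push_back(*std::max_element(maxs.begin(), maxs.end()));
    }
    return result;
}
```

For local maxima 3, 9 and 4 on three processors, every processor ends up with 9.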
