API Reference

On this page we give a complete overview of all the primitives we expose in the main EBSP library.

Host

bsp_init

int bsp_init(const char *e_name, int argc, char **argv)

Initializes the BSP system.

Sets up all the BSP variables and loads the Epiphany BSP program.

Return
1 on success, 0 on failure
Parameters
  • e_name: A string with the srec binary name of the Epiphany program
  • argc: The number of input arguments
  • argv: An array of strings with the input arguments

The string e_name must be of the form myprogram.srec. This function will search for the file in the same directory as the host program, and not in the current working directory.
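The lookup rule above can be illustrated with a small path helper. This is only a sketch of the stated behaviour: the name srec_path_next_to_host and the idea of deriving the directory from the host executable's path are assumptions made for illustration, not the library's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of EBSP): builds
 * "<directory of host_path>/<e_name>" in out.
 * Returns 0 on success, -1 if the result does not fit in out_size bytes. */
int srec_path_next_to_host(const char *host_path, const char *e_name,
                           char *out, size_t out_size)
{
    const char *slash = strrchr(host_path, '/');
    /* Length of the directory part including the trailing '/', or 0 when
     * host_path has no directory component. In that case a real lookup
     * would first need the resolved path of the executable. */
    size_t dir_len = slash ? (size_t)(slash - host_path) + 1 : 0;
    int n = snprintf(out, out_size, "%.*s%s", (int)dir_len, host_path, e_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a host program at /home/user/bin/host, the helper yields /home/user/bin/e_program.srec regardless of the current working directory.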

Usage example:

int main(int argc, char** argv)
{
    bsp_init("e_program.srec", argc, argv);
    ...
    return 0;
}

Remark
The argc and argv parameters are ignored in the current implementation.

ebsp_spmd

int ebsp_spmd()

Runs the Epiphany program on the Epiphany cores.

This function will block until the BSP kernel program is finished.

Return
1 on success, 0 on failure (e.g. after bsp_abort is called on a core)

bsp_begin

int bsp_begin(int nprocs)

Loads the BSP program onto the Epiphany cores.

Usage example:

int main(int argc, char** argv)
{
    bsp_init("e_program.srec", argc, argv);
    bsp_begin(bsp_nprocs());
    ...
    return 0;
}
Return
1 on success, 0 on failure
Parameters
  • nprocs: The number of processors to run on

Remark
The current implementation only allows nprocs to be a multiple of 4 on the 16-core Parallella. Other values of nprocs are rounded down.
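The rounding rule in the remark above can be written out as follows. This is a model of the documented behaviour for non-negative nprocs; the function name effective_nprocs is hypothetical and the library may compute the value differently.

```c
/* Hypothetical model (not part of EBSP): the number of cores actually used
 * on the 16-core Parallella when nprocs is rounded down to a multiple of 4. */
int effective_nprocs(int nprocs)
{
    return nprocs - (nprocs % 4);
}
```

For example, a request for 7 cores would run on 4, and a request for 3 cores would round down to 0.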

bsp_end

int bsp_end()

Finalizes and cleans up the BSP program.

Usage example:

int main(int argc, char** argv)
{
    bsp_init("e_program.srec", argc, argv);
    bsp_begin(bsp_nprocs());
    ebsp_spmd();
    bsp_end();
    return 0;
}
Return
1 on success, 0 on failure

Remark
This function is different from the bsp_end function in e_bsp.h

bsp_nprocs

int bsp_nprocs()

Returns the number of available processors (Epiphany cores).

This function may be called after bsp_init().
Return
The number of available processors

ebsp_set_tagsize

void ebsp_set_tagsize(int *tag_bytes)

Set initial tagsize for message passing.

The default tagsize is zero. This function should be called at most once, before any messages are sent. Calling this when receiving messages results in undefined behaviour.

Parameters
  • tag_bytes: A pointer to an integer containing the new tagsize, receiving the old tagsize on return.

It is not possible to send messages with different tag sizes. Doing so will result in undefined behaviour.

Remark
The tagsize set using this function is also used for inter-core messages.

ebsp_send_down

void ebsp_send_down(int pid, const void *tag, const void *payload, int nbytes)

Send a message to the Epiphany cores.

This is the preferred way to send initial data (for computation) to the Epiphany cores.

Parameters
  • pid: The pid of the target processor
  • tag: A pointer to the message tag
  • payload: A pointer to the data payload
  • nbytes: The size of the payload in bytes

The size of the buffer pointed to by tag has to be tagsize, and must be the same for every message being sent.

ebsp_get_tagsize

int ebsp_get_tagsize()

Get the tagsize as set by the Epiphany program.

Use only for gathering result messages at the end of a BSP program.

Return
The tagsize in bytes

When ebsp_spmd() returns, the Epiphany program can have set a different tagsize which can be obtained using this function.

ebsp_qsize

void ebsp_qsize(int *packets, int *accum_bytes)

Get the number of messages in the queue and their total size in bytes.

Use only for gathering result messages at the end of a BSP program.

Parameters
  • packets: A pointer to an integer receiving the number of messages
  • accum_bytes: A pointer to an integer receiving the total size of the data payloads of the messages, in bytes.

ebsp_get_tag

void ebsp_get_tag(int *status, void *tag)

Peek the next message.

Use only for gathering result messages at the end of a BSP program.

Parameters
  • status: A pointer to an integer receiving the number of bytes of the next message payload, or -1 if there are no more messages.
  • tag: A pointer to a buffer receiving the tag of the next message. This buffer should be large enough (ebsp_get_tagsize()).

ebsp_move

void ebsp_move(void *payload, int buffer_size)

Get the next message from the message queue and pop the message.

This will copy the payload and pop the message from the queue. The size of the payload can be obtained by calling ebsp_get_tag(). If buffer_size is smaller than the data payload then the data is truncated.
Parameters
  • payload: A pointer to a buffer receiving the data payload
  • buffer_size: The size of the buffer

Use only for gathering result messages at the end of a BSP program.
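The peek-then-move protocol described for ebsp_get_tag() and ebsp_move() can be sketched with a small in-memory stand-in for the queue. Everything named model_* below is hypothetical and exists only so the sketch compiles on its own; it mirrors the documented contract (status is -1 when the queue is empty, payloads are truncated to buffer_size) and assumes a tag size of sizeof(int).

```c
#include <string.h>

/* Hypothetical stand-ins for the host message queue (NOT the EBSP API). */
typedef struct {
    int tag;              /* tag of sizeof(int) bytes */
    const char *payload;
    int nbytes;
} model_msg;

typedef struct {
    const model_msg *msgs;
    int count;
    int next;             /* index of the next message to deliver */
} model_queue;

/* Mirrors the documented get_tag contract: peek without popping. */
void model_get_tag(const model_queue *q, int *status, void *tag)
{
    if (q->next >= q->count) {
        *status = -1;                     /* no more messages */
        return;
    }
    *status = q->msgs[q->next].nbytes;
    memcpy(tag, &q->msgs[q->next].tag, sizeof(int));
}

/* Mirrors the documented move contract: copy (truncating) and pop. */
void model_move(model_queue *q, void *payload, int buffer_size)
{
    const model_msg *m;
    int n;
    if (q->next >= q->count)
        return;
    m = &q->msgs[q->next++];
    n = m->nbytes < buffer_size ? m->nbytes : buffer_size;
    if (n > 0)
        memcpy(payload, m->payload, (size_t)n);
}

/* Drains the queue: peek each message, then move its payload into buf.
 * Returns the number of messages whose payload had to be truncated. */
int model_drain(model_queue *q, char *buf, int buf_size)
{
    int status, tag, truncated = 0;
    for (;;) {
        model_get_tag(q, &status, &tag);
        if (status == -1)
            break;
        if (status > buf_size)
            truncated++;
        model_move(q, buf, buf_size);
    }
    return truncated;
}
```

In a real host program the same loop shape would presumably use ebsp_get_tag() and ebsp_move() after ebsp_spmd() returns, with a tag buffer of ebsp_get_tagsize() bytes.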

ebsp_hpmove

int ebsp_hpmove(void **tag_ptr_buf, void **payload_ptr_buf)

Get the next message, with tag, from the queue and pop the message.

This is the faster alternative to ebsp_move(), as this function does not copy the data but returns pointers to it.
Return
The number of bytes of the payload data
Parameters
  • tag_ptr_buf: A pointer to a pointer receiving the location of the tag
  • payload_ptr_buf: A pointer to a pointer receiving the location of the data payload

Use only for gathering result messages at the end of a BSP program.

bsp_stream_create

void *bsp_stream_create(int stream_size, int token_size, const void *initial_data)

Creates a generic stream for streaming data to or from an Epiphany core.

The function returns NULL on failure.

Return
A pointer to a section of external memory storing the tokens.
Parameters
  • stream_size: The total number of bytes of data in the stream.
  • token_size: The size in bytes of a single token. Must be at least 16.
  • initial_data: (Optional) The data which should be streamed to an Epiphany core.

If initial_data is nonzero, it is copied to the stream (stream_size bytes). If initial_data is zero, an empty stream of size stream_size is created. In this case, stream_size should be the maximum number of bytes that will be sent up from the Epiphany cores to the host.

This function prints an error if token_size is less than 16.

The format of the data pointed to by the return value is as follows: Before every token, there are two integers that specify the size of the preceding token and the size of the token itself.

00000000, nextsize, data, prevsize, nextsize, data, … prevsize, nextsize, data, prevsize, 00000000

So a header consists of two integers (8 bytes total). The two sizes do NOT include these headers. They are only the size of the data in between.

If you want to use the returned pointer directly you have to manually take care of this data format.
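To make the layout above concrete, here is a sketch that builds and then walks a buffer in that format. It assumes the two header integers are 32-bit (matching the 8 bytes stated above) and that all tokens have the same size; build_stream and count_stream_tokens are hypothetical helpers, not library functions.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: each header is two 32-bit integers, i.e. 8 bytes. */
#define HEADER_BYTES 8

/* Writes ntokens tokens of token_size bytes each from data into out as
 *   0, nextsize, data, prevsize, nextsize, data, ..., prevsize, 0
 * Returns the number of bytes written, headers included. */
size_t build_stream(unsigned char *out, const unsigned char *data,
                    int ntokens, int token_size)
{
    size_t offset = 0;
    int i;
    for (i = 0; i <= ntokens; i++) {
        int32_t header[2];
        header[0] = (i == 0) ? 0 : token_size;        /* preceding token */
        header[1] = (i == ntokens) ? 0 : token_size;  /* following token */
        memcpy(out + offset, header, HEADER_BYTES);
        offset += HEADER_BYTES;
        if (i < ntokens) {
            memcpy(out + offset, data + (size_t)i * (size_t)token_size,
                   (size_t)token_size);
            offset += (size_t)token_size;
        }
    }
    return offset;
}

/* Walks the headers until nextsize is 0. Returns the number of tokens and
 * stores the summed token sizes (headers excluded) in *data_bytes. */
int count_stream_tokens(const unsigned char *stream, int *data_bytes)
{
    size_t offset = 0;
    int tokens = 0;
    *data_bytes = 0;
    for (;;) {
        int32_t header[2];   /* header[0] = prevsize, header[1] = nextsize */
        memcpy(header, stream + offset, HEADER_BYTES);
        if (header[1] == 0)  /* terminating header */
            break;
        tokens++;
        *data_bytes += header[1];
        offset += HEADER_BYTES + (size_t)header[1];
    }
    return tokens;
}
```

Two tokens of 16 bytes therefore occupy 2 * (8 + 16) + 8 = 56 bytes in this layout, of which 32 bytes are token data.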

Remark
If initial_data is nonzero, the data is copied so that after the call it can safely be freed or overwritten by the user.

ebsp_write

int ebsp_write(int pid, void *src, off_t dst, int size)

Write data to the Epiphany processor.

This is an alternative to the BSP Message Passing system.

Return
1 on success, 0 on failure
Parameters
  • pid: The pid of the target processor
  • src: A pointer to the source data
  • dst: The destination address (as seen by the Epiphany core)
  • size: The amount of bytes to be copied

ebsp_read

int ebsp_read(int pid, off_t src, void *dst, int size)

Read data from the Epiphany processor.

This is an alternative to the BSP Message Passing system.

Return
1 on success, 0 on failure
Parameters
  • pid: The pid of the source processor
  • src: The source address (as seen by the Epiphany core)
  • dst: A pointer to a buffer receiving the data
  • size: The amount of bytes to be copied

ebsp_set_sync_callback

void ebsp_set_sync_callback(void (*cb)())

Set the (optional) callback for synchronizing Epiphany cores with the host program.

This callback is called when all Epiphany cores have called ebsp_host_sync(). Note that this does not happen at bsp_sync().

Parameters
  • cb: A function pointer to the callback function

ebsp_set_end_callback

void ebsp_set_end_callback(void (*cb)())

Set the (optional) callback for finalizing.

This callback is called when ebsp_spmd() finishes. It is primarily used by the ebsp memory inspector and should not be needed.
Parameters
  • cb: A function pointer to the callback function

Epiphany

bsp_begin

void bsp_begin()

Denotes the start of a BSP program.

This initializes the BSP system on the core.

Must be called before calling any other BSP function. Should only be called once in a program.

bsp_end

void bsp_end()

Denotes the end of a BSP program.

Finalizes and cleans up the BSP program. No other BSP functions are allowed to be called after this function is called.

Remark
Must be followed by a return statement in your main function if you want to call ebsp_spmd() multiple times.

bsp_nprocs

int bsp_nprocs()

Obtain the number of Epiphany cores currently in use.

Return
An integer indicating the number of cores on which the program runs.

bsp_pid

int bsp_pid()

Obtain the processor identifier of the local core.

Return
An integer with the id of the core

The processor id is an integer in the range [0, bsp_nprocs() - 1].

bsp_time

float bsp_time()

Obtain the time in seconds since bsp_begin() was called.

The native Epiphany timer does not support time differences longer than UINT_MAX/600000000 seconds, which is roughly 7 seconds.
Return
A floating point value with the number of elapsed seconds since the call to bsp_begin()

If you want to measure longer time intervals, we suggest you use the (less accurate) ebsp_host_time().
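The 7 second limit follows directly from the counter width and the clock rate. The sketch below works it out, assuming a 32-bit unsigned int on the Epiphany (so that UINT_MAX equals UINT32_MAX) and the 600 MHz clock used in the formula above; both function names are hypothetical.

```c
#include <stdint.h>

#define EPIPHANY_CLOCK_HZ 600000000.0   /* 600 MHz, the figure used above */

/* Longest interval, in seconds, that a 32-bit cycle counter can represent. */
double max_timer_seconds(void)
{
    return (double)UINT32_MAX / EPIPHANY_CLOCK_HZ;
}

/* Nonzero if an interval of the given length fits in the cycle counter. */
int fits_in_cycle_timer(double seconds)
{
    return seconds >= 0.0 && seconds <= max_timer_seconds();
}
```

The result is about 7.16 seconds, so a 5 second measurement fits while a 10 second one does not.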

Remark
Using this in combination with ebsp_raw_time() leads to unspecified behaviour; you should only use one of these in your program.
Remark
This uses the internal Epiphany E_CTIMER_0 timer so the second timer can be used for other purposes.

ebsp_host_time

float ebsp_host_time()

Obtain the time in seconds since bsp_begin() was called.

This function uses the system clock of the host to obtain the elapsed time. Because of varying amounts of latency this can be very inaccurate (its precision is in the order of milliseconds), but it supports time intervals of arbitrary length.

Return
A floating point value with the number of seconds since bsp_begin()

ebsp_raw_time

unsigned int ebsp_raw_time()

Obtain the number of clockcycles that have passed since the previous call to ebsp_raw_time().

This function has less overhead than bsp_time.

Return
An unsigned integer with the number of clockcycles

Divide the number of clockcycles by 600 000 000 to get the time in seconds.
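The conversion can be wrapped in a small helper. The name cycles_to_seconds is hypothetical, and the constant assumes the 600 MHz clock stated above.

```c
#define CLOCK_HZ 600000000.0   /* clock cycles per second, as stated above */

/* Converts a cycle count, such as a value returned by ebsp_raw_time(),
 * to seconds. */
double cycles_to_seconds(unsigned int cycles)
{
    return (double)cycles / CLOCK_HZ;
}
```

A reading of 300 000 000 cycles therefore corresponds to half a second.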

Remark
Using this in combination with bsp_time() leads to unspecified behaviour; you should only use one of these in your program.
Remark
This uses the internal Epiphany E_CTIMER_0 timer so the second timer can be used for other purposes.

bsp_sync

void bsp_sync()

Denotes the end of a superstep, and performs all outstanding communications and registrations.

Serves as a blocking barrier which halts execution until all Epiphany cores are finished with the current superstep.

If only a synchronization is required, and you do not want the outstanding communications and registrations to be resolved, then we suggest you use the more efficient function ebsp_barrier().

ebsp_barrier

void ebsp_barrier()

Synchronizes cores without resolving outstanding communication.

This function is more efficient than bsp_sync().

bsp_push_reg

void bsp_push_reg(const void *variable, const int nbytes)

Register a variable as available for remote access.

The operation takes effect after the next call to bsp_sync(). Only one registration is allowed in a single superstep. When a variable is registered, every core must do so.
Parameters
  • variable: A pointer to the local variable
  • nbytes: The size in bytes of the variable

The system maintains a stack of registered variables. Any variables registered in the same superstep are identified with each other. There is a maximum number of allowed registered variables at any given time; the specific number is platform dependent. This limit will be lifted in a future version.

Registering a variable needs to be done before it can be used with the functions bsp_put(), bsp_hpput(), bsp_get(), bsp_hpget().

Usage example:

int a, b, c, p;
int x[16];

bsp_push_reg(&a, sizeof(int));
bsp_sync();
bsp_push_reg(&x, sizeof(x));
bsp_sync();

p = bsp_pid();

// Get the value of the `a` variable of core 0 and save it in `b`
bsp_get(0, &a, 0, &b, sizeof(int));

// Save the value of `c` into the array `x` on core 0, at array location p
bsp_put(0, &c, &x, p*sizeof(int), sizeof(int));

Remark
In the current implementation, the parameter nbytes is ignored. In future versions it will be used to make communication more efficient.
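One way to picture how variables registered in the same superstep are "identified with each other" is a table with one slot per registration and one address per core. The sketch below is a conceptual model only: the model_* names, the fixed sizes and the table layout are assumptions for illustration and say nothing about how the library stores registrations.

```c
#include <stddef.h>

#define MODEL_NPROCS 4     /* assumed number of cores in the model */
#define MODEL_MAX_REGS 8   /* assumed registration limit in the model */

/* Hypothetical model: slot s holds the address that each core registered
 * in the same superstep. */
typedef struct {
    void *addr[MODEL_MAX_REGS][MODEL_NPROCS];
    int count;
} model_reg_table;

/* Registers one variable per core in a new slot.
 * Returns the slot index, or -1 if the table is full. */
int model_push_reg(model_reg_table *t, void *per_core[MODEL_NPROCS])
{
    int p;
    if (t->count >= MODEL_MAX_REGS)
        return -1;
    for (p = 0; p < MODEL_NPROCS; p++)
        t->addr[t->count][p] = per_core[p];
    return t->count++;
}

/* Translates a registered local address on core my_pid, plus a byte
 * offset, into the matching address on core target_pid.
 * Returns NULL if the local address was never registered. */
void *model_remote_addr(const model_reg_table *t, int my_pid, int target_pid,
                        const void *local, int offset)
{
    int s;
    for (s = t->count - 1; s >= 0; s--)   /* most recent registration first */
        if (t->addr[s][my_pid] == local)
            return (char *)t->addr[s][target_pid] + offset;
    return NULL;
}
```

In this picture, the bsp_put call in the usage example above resolves &x on the calling core to the array x on core 0 and then adds p*sizeof(int) to reach element p.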

bsp_pop_reg

void bsp_pop_reg(const void *variable)

De-register a variable for remote memory access.

The operation takes effect after the next call to bsp_sync(). The order in which the variables are popped does not matter.
Parameters
  • variable: A pointer to the variable, which must have been previously registered with bsp_push_reg()

bsp_put

void bsp_put(int pid, const void *src, void *dst, int offset, int nbytes)

Copy data to another processor (buffered).

The data in src is copied to a buffer (currently in the inefficient external memory) at the moment bsp_put is called. Therefore the caller can replace the data in src right after bsp_put returns. When bsp_sync() is called, the data will be transferred from the buffer to the destination at the other processor.
Parameters
  • pid: The pid of the target processor (this is allowed to be the id of the sending processor)
  • src: A pointer to the source data
  • dst: A variable location that was previously registered using bsp_push_reg()
  • offset: The offset in bytes to be added to the remote location corresponding to the variable location dst
  • nbytes: The number of bytes to be copied

Remark
No warning is thrown when nbytes exceeds the size of the variable src.
Remark
The current implementation uses external memory which restrains the performance of this function greatly. We suggest you use bsp_hpput() wherever possible to ensure good performance.

bsp_get

void bsp_get(int pid, const void *src, int offset, void *dst, int nbytes)

Copy data from another processor (buffered)

No data transaction takes place until the next call to bsp_sync, at which point the data will be copied from source to destination.

Parameters
  • pid: The pid of the target processor (this is allowed to be the id of the sending processor)
  • src: A variable that has been previously registered using bsp_push_reg()
  • dst: A pointer to a local destination
  • offset: The offset in bytes to be added to the remote location corresponding to the variable location src
  • nbytes: The number of bytes to be copied

Remark
The official BSP standard dictates that first all the data of all bsp_get() transactions is copied into a buffer, after which all the data is written to the proper destinations. This would allow one to use bsp_get to swap two variables in place. Because of memory constraints we do not comply with the standard. In our implementation, the bsp_get() transactions are all executed at the same time, therefore such a swap would result in undefined behaviour.
Remark
No warning is thrown when nbytes exceeds the size of the variable src.
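The remark on swapping above can be illustrated with a two-core model in plain C. The functions are hypothetical and only contrast the two orderings. In particular, the unbuffered variant shows just one possible interleaving, whereas the actual outcome on the hardware is undefined.

```c
/* Conceptual model only: two "cores" each own one int, and each issues a
 * get for the other core's value into its own variable. */

/* Standard BSP semantics: read every source into a buffer, then write. */
void model_swap_buffered(int *core0_var, int *core1_var)
{
    int buf0 = *core1_var;   /* core 0 gets from core 1 */
    int buf1 = *core0_var;   /* core 1 gets from core 0 */
    *core0_var = buf0;
    *core1_var = buf1;
}

/* Unbuffered execution in one possible order: the first write clobbers
 * the source of the second get. */
void model_swap_unbuffered(int *core0_var, int *core1_var)
{
    *core0_var = *core1_var; /* core 0's get completes first */
    *core1_var = *core0_var; /* core 1 now reads the overwritten value */
}
```

Starting from the values 1 and 2, the buffered variant ends with 2 and 1, while this unbuffered ordering ends with 2 and 2. A swap on this platform therefore needs an explicit temporary variable.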

bsp_hpput

void bsp_hpput(int pid, const void *src, void *dst, int offset, int nbytes)

Copy data to another processor, unbuffered.

The data is immediately copied into the destination at the remote processor, as opposed to bsp_put which first copies the data to a buffer. This means the programmer must make sure that the other processor is not using the destination at this moment. The data transfer is guaranteed to be complete after the next call to bsp_sync().
Parameters
  • pid: The pid of the target processor (this is allowed to be the id of the sending processor)
  • src: A pointer to local source data
  • dst: A variable location that was previously registered using bsp_push_reg()
  • offset: The offset in bytes to be added to the remote location corresponding to the variable location dst
  • nbytes: The number of bytes to be copied

Remark
No warning is thrown when nbytes exceeds the size of the variable src.

bsp_hpget

void bsp_hpget(int pid, const void *src, int offset, void *dst, int nbytes)

Copy data from another processor.

This function is the unbuffered version of bsp_get().

As opposed to bsp_get(), the data is transferred immediately when bsp_hpget() is called. When using this function you must make sure that the source data is available and prepared upon calling. For performance reasons, communication using this function should be preferred over buffered communication.
Parameters
  • pid: The pid of the target processor (this is allowed to be the id of the sending processor)
  • src: A variable that has been previously registered using bsp_push_reg()
  • dst: A pointer to a local destination
  • offset: The offset in bytes to be added to the remote location corresponding to the variable location src
  • nbytes: The number of bytes to be copied

Remark
No warning is thrown when nbytes exceeds the size of the variable src.

bsp_set_tagsize

void bsp_set_tagsize(int *tag_bytes)

Set the tag size.

Upon return, the value pointed to by tag_bytes will contain the old tag size. The new tag size will take effect in the next superstep, so that messages sent in this superstep will have the old tag size.

Parameters
  • tag_bytes: A pointer to the tag size, in bytes
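The in/out parameter and the deferred update described above can be modelled as follows. The struct and the model_* functions are hypothetical and only mirror the documented behaviour for a single call per superstep: the old size is handed back through tag_bytes, and the new size becomes active at the next synchronization.

```c
/* Hypothetical model of the deferred tag size update (not the EBSP API). */
typedef struct {
    int current;   /* tag size used for messages sent in this superstep */
    int pending;   /* tag size that takes effect in the next superstep */
} model_tagsize;

/* On entry *tag_bytes holds the new size; on return it holds the old one. */
void model_set_tagsize(model_tagsize *s, int *tag_bytes)
{
    int old = s->current;
    s->pending = *tag_bytes;
    *tag_bytes = old;
}

/* Stands in for the end of the superstep: the pending size becomes active. */
void model_sync(model_tagsize *s)
{
    s->current = s->pending;
}
```

Setting a size of 4 when the current size is 0 leaves 0 in the caller's variable, and messages sent before the next synchronization still carry zero-sized tags.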

ebsp_get_tagsize

int ebsp_get_tagsize()

Obtain the tag size.

This function gets the tag size currently in use. This tagsize remains valid until the start of the next superstep.

Return
The tag size in bytes

bsp_send

void bsp_send(int pid, const void *tag, const void *payload, int nbytes)

Send a message to another processor.

This will send a message to the target processor, using the message passing system. The tag size can be obtained by ebsp_get_tagsize. When this function returns, the data has been copied so the user can use the buffer for other purposes.

Parameters
  • pid: The pid of the target processor (this is allowed to be the id of the sending processor)
  • tag: A pointer to the tag data
  • payload: A pointer to the data payload
  • nbytes: The size of the data payload

bsp_qsize

void bsp_qsize(int *packets, int *accum_bytes)

Obtain the number of messages in the queue and the combined size in bytes of their data.

Upon return, the integers pointed to by packets and accum_bytes will hold the number of messages in the queue, and the sum of the sizes of their data payloads respectively.

Parameters
  • packets: A pointer to an integer which will be overwritten with the number of messages
  • accum_bytes: A pointer to an integer which will be overwritten with the combined number of bytes of the message data.

bsp_get_tag

void bsp_get_tag(int *status, void *tag)

Obtain the tag and size of the next message without popping the message.

Upon return, the integer pointed to by status will receive the size of the data payload in bytes of the next message in the queue. If there is no next message it will be set to -1. The buffer pointed to by tag should be large enough to store the tag. The minimum size can be obtained by calling ebsp_get_tagsize.

Parameters
  • status: A pointer to an integer receiving the message data size in bytes.
  • tag: A pointer to a buffer receiving the message tag

bsp_move

void bsp_move(void *payload, int buffer_size)

Obtain the next message from the message queue and pop the message.

This will copy the payload and pop the message from the queue. The size of the payload can be obtained by calling bsp_get_tag(). If buffer_size is smaller than the data payload then the data is truncated.
Parameters
  • payload: A pointer to a buffer receiving the data payload
  • buffer_size: The size of the buffer

bsp_hpmove

int bsp_hpmove(void **tag_ptr_buf, void **payload_ptr_buf)

Obtain the next message, with tag, from the queue and pop the message.

This function will give the user direct pointers to the tag and data of the message. This avoids the data copy as done in bsp_move().
Return
The number of bytes of the payload data
Parameters
  • tag_ptr_buf: A pointer to a pointer receiving the location of the tag
  • payload_ptr_buf: A pointer to a pointer receiving the location of the data payload

Remark
Note that both the tag and the payload can be stored in external memory. Repeated access to them will lead to worse overall performance, such that bsp_move() can actually outperform this variant.

bsp_stream_open

int bsp_stream_open(ebsp_stream *stream, int stream_id)

Open a stream that was created using bsp_stream_create on the host.

The first stream created by the host will have stream_id 0.
Return
Nonzero if successful.
Parameters
  • stream: Pointer to an existing ebsp_stream struct to hold the stream data. This struct can be allocated on the stack by the user.
  • stream_id: The index of the stream.

Usage example:

ebsp_stream mystream;
if (bsp_stream_open(&mystream, 3)) {
    // Get some data
    void* buffer = 0;
    bsp_stream_move_down(&mystream, &buffer, 0);
    // The data is now in buffer
    // Finally, close the stream
    bsp_stream_close(&mystream);
}

Remark
This function has to be called before performing any other operation on the stream.
Remark
A call to the function should always match a single call to bsp_stream_close.

bsp_stream_close

void bsp_stream_close(ebsp_stream *stream)

Wait for pending transfers to complete and close a stream.

Behaviour is undefined if stream is not a handle opened by bsp_stream_open.
Parameters
  • stream: The handle of the stream, opened by bsp_stream_open.

Cleans up the stream, and frees any buffers that may have been used by the stream.

bsp_stream_move_up

int bsp_stream_move_up(ebsp_stream *stream, const void *data, int data_size, int wait_for_completion)

Write a local token up to a stream.

The function always waits for the previous token to have finished.
Return
Number of bytes written. Zero if an error has occurred.
Parameters
  • stream: The handle of the stream
  • data: The data to be sent up the stream
  • data_size: The size of the data to be sent, i.e. the size of the token. Behaviour is undefined if it is not a multiple of 8. If it is not a multiple of 8 bytes then transfers will be slow.
  • wait_for_completion: If nonzero this function blocks until the data is completely written to the stream.

If wait_for_completion is nonzero, this function will wait until the data is transferred. This corresponds to single buffering.

Alternatively, double buffering can be used as follows. Set wait_for_completion to zero and continue constructing the next token in a different buffer. Usage example:

int* buf1 = ebsp_malloc(100 * sizeof(int));
int* buf2 = ebsp_malloc(100 * sizeof(int));
int* curbuf = buf1;
int* otherbuf = buf2;

ebsp_stream s;
bsp_stream_open(&s, 0); // open stream 0
while (...) {
    // Fill curbuf
    for (int i = 0; i < 100; i++)
        curbuf[i] = 5;
    
    // Send up
    bsp_stream_move_up(&s, curbuf, 100 * sizeof(int), 0);
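    // Use the other buffer for the next token
    int* tmp = curbuf;
    curbuf = otherbuf;
    otherbuf = tmp;
}
// Wait for pending transfers before the buffers are freed
bsp_stream_close(&s);
ebsp_free(buf1);
ebsp_free(buf2);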

Remark
Behaviour is undefined if the stream was not opened using bsp_stream_open.
Remark
Memory is transferred using the DMA1 engine.

bsp_stream_move_down

int bsp_stream_move_down(ebsp_stream *stream, void **buffer, int preload)

Obtain the next token from a stream.

When calling this function, the token that was obtained at the previous call will be overwritten.

Return
Number of bytes of the obtained chunk. If stream has finished or an error has occurred this function will return 0.
Parameters
  • stream: The handle of the stream
  • buffer: Receives a pointer to a local copy of the next token.
  • preload: If this parameter is nonzero then the BSP system will preload the next token asynchronously (double buffering).

Remark
Behaviour is undefined if the stream was not opened using bsp_stream_open.
Remark
Memory is transferred using the DMA1 engine.
Remark
When using double buffering, the BSP system will allocate memory for the next chunk, and will start writing to it using the DMA engine while the current chunk is processed. This requires more (local) memory, but can greatly increase the overall speed.

bsp_stream_seek

void bsp_stream_seek(ebsp_stream *stream, int delta_tokens)

Move the cursor in the stream, to change the next token to be obtained.

If delta_tokens is out of bounds, then the cursor will be moved to the start or end of the stream respectively: bsp_stream_seek(i, INT_MIN) will set the cursor to the start of the stream, and bsp_stream_seek(i, INT_MAX) will set the cursor to the end of the stream.
Parameters
  • stream: The handle of the stream
  • delta_tokens: The number of tokens to skip if delta_tokens > 0, or to go back if delta_tokens < 0.

Note that if bsp_stream_move_down is used with preload enabled (meaning the last call to that function had preload enabled), then calling bsp_stream_seek will discard any token that was preloaded in memory, so the first call to bsp_stream_move_down after this will yield a token from the new position.

Remark
This function provides a mechanism through which chunks can be obtained multiple times. It gives you random access in the memory in the data stream.
Remark
This function has O(delta_tokens) complexity.
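The clamping rule for out-of-bounds values of delta_tokens can be modelled on a simple token index. The helper model_seek is hypothetical. It treats the cursor as an index in [0, ntokens] and computes the result in one step, whereas the real function presumably walks the stream token by token, given its stated complexity.

```c
#include <limits.h>

/* Hypothetical model (not the EBSP API): new cursor position after seeking
 * delta_tokens from cursor in a stream of ntokens tokens, clamped to the
 * start (0) or the end (ntokens) of the stream. */
int model_seek(int cursor, int delta_tokens, int ntokens)
{
    /* Widen before adding so INT_MIN and INT_MAX cannot overflow. */
    long long target = (long long)cursor + delta_tokens;
    if (target < 0)
        return 0;
    if (target > ntokens)
        return ntokens;
    return (int)target;
}
```

Seeking by INT_MIN from any position yields 0, and seeking by INT_MAX yields ntokens, matching the two special cases described above.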