Data streams¶
Streaming¶
When dealing with problems that involve a lot of data, such as images or large matrices, the data often does not fit in the combined local memory of the Epiphany processor. To work with the data we must then use the larger (but much slower) external memory, which slows programs down tremendously.
For these situations we provide a streaming mechanism. A program written to use streams works on smaller tokens of the problem at any given time, such that the data currently being treated is always local to the core. The EBSP library prepares the next token while the previous token is being processed, so that the Epiphany cores spend as little time as possible waiting for the slow external memory.
Making and using down streams¶
A stream contains data to be processed by an Epiphany core, and can also be used to obtain results from computations performed by the Epiphany core. Every stream has a total size and a token size. The total size is the total number of bytes of the entire set of data. This set of data then gets partitioned into tokens consisting of the number of bytes set by the token size. This size need not be constant (i.e. it may vary over a single stream), but for our discussion here we will assume that it is constant.
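As a rough illustration of this partitioning, the sketch below computes how many tokens a stream of a given size yields. The helper name `token_count` is hypothetical and not part of the EBSP API. It also assumes that a trailing remainder, if any, forms one shorter final token, which the text above does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of EBSP): number of tokens obtained when
 * stream_size bytes are split into tokens of token_size bytes.
 * Assumption: a leftover remainder forms one shorter final token. */
static size_t token_count(size_t stream_size, size_t token_size)
{
    assert(token_size > 0);
    /* Round up so that a partial trailing token is counted as well. */
    return (stream_size + token_size - 1) / token_size;
}
```

For a stream of 256 floats with 32 floats per token, `token_count(256 * sizeof(float), 32 * sizeof(float))` evaluates to 8.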
A stream is created on the host processor before the call to ebsp_spmd. The host prepares the data to be processed by the Epiphany cores, and the EBSP library then performs the work needed for each core to receive its tokens. Note that this data is copied efficiently to the external memory upon creation of the stream, so the user data should be stored in ordinary RAM, e.g. allocated by a call to malloc. A stream is created as follows:
// (on the host)
int count = 256;
int count_in_token = 32;
float* data = malloc(count * sizeof(float));
// ... fill data
bsp_stream_create(count * sizeof(float), count_in_token * sizeof(float), data);
This creates a stream containing the user data, chopped up into 256/32 = 8 tokens. To use this stream in the kernel of a core you need to open it and move tokens from the stream to local memory. Every stream created on the host is identified by the order in which it was created, starting from index 0. For example, the stream we created above obtains the id 0. A second stream (regardless of whether it is up or down) is identified by 1, and so on. These identifiers are shared between cores. A stream is opened using this identifier; for example, to open the stream with identifier 3:
ebsp_stream mystream;
if(bsp_stream_open(&mystream, 3)) {
// ...
}
After this call, the stream will start copying data to the core, but the data is not necessarily there yet (it might still be copying). A stream can only be opened by a single core at a time. To access this data we move a token:
// Get some data
void* buffer = NULL;
bsp_stream_move_down(&mystream, &buffer, 0);
// The data is now in buffer
The first argument is the stream object that was filled in by bsp_stream_open. The second argument is a pointer to a pointer that will be set to the location of the data. The final argument (double_buffer here, named preload in the interface below) gives you the option to start writing the next token to local memory (using the DMA engine) while you process the token that you just moved down. This happens concurrently with your computations, but takes up twice as much memory. Whether double buffered mode should be turned on depends on the specific situation. Subsequent tokens are obtained by repeated calls to bsp_stream_move_down.
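The usual consumption pattern is a loop that keeps moving tokens down until the stream is exhausted. The sketch below illustrates that loop against a hypothetical in-memory stand-in (`mock_stream` and `mock_move_down` are invented for this example and are not EBSP types or functions), since the real calls only run on the Epiphany. It relies on the documented contract that bsp_stream_move_down returns the size of the obtained token in bytes, or 0 once the stream has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an opened stream; NOT the real ebsp_stream. */
typedef struct {
    const char* data; /* all tokens, stored back to back */
    int total_size;   /* total number of bytes */
    int token_size;   /* bytes per token (constant here) */
    int cursor;       /* byte offset of the next token */
} mock_stream;

/* Mimics the documented contract of bsp_stream_move_down: sets *buffer to
 * the next token and returns its size, or 0 once the stream is exhausted. */
static int mock_move_down(mock_stream* s, void** buffer)
{
    int remaining = s->total_size - s->cursor;
    if (remaining <= 0)
        return 0;
    int size = remaining < s->token_size ? remaining : s->token_size;
    *buffer = (void*)(s->data + s->cursor);
    s->cursor += size;
    return size;
}

/* Sum all floats in the stream, one token at a time. */
static float sum_stream(mock_stream* s)
{
    float sum = 0.0f;
    void* buffer = NULL;
    int bytes;
    while ((bytes = mock_move_down(s, &buffer)) > 0) {
        const float* token = buffer;
        for (int i = 0; i < bytes / (int)sizeof(float); i++)
            sum += token[i];
    }
    return sum;
}
```

In a real kernel the loop body would stay the same, with `mock_move_down(s, &buffer)` replaced by `bsp_stream_move_down(&mystream, &buffer, 0)`.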
If you want to use a token multiple times at different stages of your algorithm, you need to be able to instruct EBSP to change which token you want to obtain. Internally the EBSP system keeps a cursor for each stream, which points to the next token that should be obtained. You can move this cursor forwards or backwards using bsp_stream_seek:
// move the cursor of the stream forward by 5 tokens
bsp_stream_seek(&mystream, 5);
// move the cursor of the stream back by 3 tokens
bsp_stream_seek(&mystream, -3);
When a seek would exceed the bounds of the stream, the cursor is set to the final or first token respectively. Note that this gives you random access inside your streams. Our streaming approach should therefore really be called pseudo-streaming, because formally a streaming algorithm processes each token in a stream only a constant number of times. On the Epiphany, however, we can provide random access in our streams, which opens the door to different semantics such as moving the cursor.
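The clamping behaviour described above can be modelled in a few lines. This is only a sketch of the semantics, not the EBSP implementation: `seek_cursor` and `rewind_cursor` are hypothetical names, and the model assumes that valid cursor positions run from the first token (0) to the final token (`token_count - 1`).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model of the cursor update (not EBSP source code).
 * Assumption: valid cursor positions are 0 .. token_count - 1. */
static int seek_cursor(int cursor, int delta_tokens, int token_count)
{
    assert(token_count > 0);
    /* Widen before adding so INT_MIN / INT_MAX deltas cannot overflow. */
    long long target = (long long)cursor + delta_tokens;
    if (target < 0)
        return 0;                /* clamp to the first token */
    if (target > token_count - 1)
        return token_count - 1;  /* clamp to the final token */
    return (int)target;
}

/* Rewinding corresponds to seeking backwards by as much as possible. */
static int rewind_cursor(int cursor, int token_count)
{
    return seek_cursor(cursor, INT_MIN, token_count);
}
```

With 8 tokens, seeking forward by 5 from the start gives cursor 5, and seeking back by 3 from there gives cursor 2, matching the two calls shown earlier.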
Moving results back up¶
A stream can also be used to move results back up, for example:
int* buffer1 = ebsp_malloc(100 * sizeof(int));
int* buffer2 = ebsp_malloc(100 * sizeof(int));
int* curbuffer = buffer1;
int* otherbuffer = buffer2;
ebsp_stream s;
bsp_stream_open(&s, 0); // open stream 0
while (...) {
// Fill curbuffer
for (int i = 0; i < 100; i++)
curbuffer[i] = 5;
// Send up
bsp_stream_move_up(&s, curbuffer, 100 * sizeof(int), 0);
// Use other buffer
swap(curbuffer, otherbuffer);
}
ebsp_free(buffer1);
ebsp_free(buffer2);
Here, we have two buffers containing data. While filling one of the buffers with data, we move the other buffer up. We do this using the bsp_stream_move_up function, which takes as arguments respectively: the stream handle, the data to send up, the size of the data to send up, and a flag that indicates whether we want to wait for completion. In this case we do not wait, but use two buffers so that we can perform computations and send data up to the host simultaneously.
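Note that C has no built-in swap, so the exchange of the two buffer pointers has to be written by hand. The following self-contained sketch shows one way to do this. `swap_buffers`, `produce` and `stub_move_up` are hypothetical names; the stub merely records which buffer was handed over and stands in for bsp_stream_move_up, which is only available on the Epiphany.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_INTS 4

/* Hypothetical stub standing in for bsp_stream_move_up; it only records
 * which buffer was handed over most recently and how many sends occurred. */
static const int* last_sent = NULL;
static int sends = 0;

static void stub_move_up(const int* data)
{
    last_sent = data;
    sends++;
}

/* Exchange the two buffer pointers (the buffers themselves are not copied). */
static void swap_buffers(int** a, int** b)
{
    int* tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Produce `rounds` tokens, alternating between the two buffers so that the
 * buffer just handed over is never the one being filled next. */
static void produce(int* buffer1, int* buffer2, int rounds)
{
    int* curbuffer = buffer1;
    int* otherbuffer = buffer2;
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < TOKEN_INTS; i++)
            curbuffer[i] = r;               /* fill the current buffer */
        stub_move_up(curbuffer);            /* hand it over */
        swap_buffers(&curbuffer, &otherbuffer);
    }
}
```

Two buffers suffice here because, according to the interface description, bsp_stream_move_up always waits for the previous token to finish before starting the next transfer.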
Closing streams¶
The EBSP stream system allocates buffers for you on the cores. When you are done with a stream you should tell the EBSP system by calling:
bsp_stream_close(&mystream);
which will free the buffers for other use, and allow other cores to use the stream.
Interface¶
Host¶
- void *bsp_stream_create(int stream_size, int token_size, const void *initial_data)
  Creates a generic stream for streaming data to or from an Epiphany core.
  The function returns NULL on failure.
  - Return
    - A pointer to a section of external memory storing the tokens.
  - Parameters
    - stream_size: The total number of bytes of data in the stream.
    - token_size: The size in bytes of a single token. Must be at least 16.
    - initial_data: (Optional) The data which should be streamed to an Epiphany core.
  If initial_data is nonzero, it is copied to the stream (stream_size bytes). If initial_data is zero, an empty stream of size stream_size is created. In this case, stream_size should be the maximum number of bytes that will be sent up from the Epiphany cores to the host.
  This function prints an error if token_size is less than 16.
  The format of the data pointed to by the return value is as follows: before every token, there are two integers that specify the size of the preceding token and the size of the token itself.
  00000000, nextsize, data, prevsize, nextsize, data, … prevsize, nextsize, data, prevsize, 00000000
  So a header consists of two integers (8 bytes in total). The two sizes do NOT include these headers; they are only the size of the data in between.
  If you want to use the returned pointer directly you have to take care of this data format manually.
  - Remark
    - If initial_data is nonzero, the data is copied, so that after the call it can safely be freed or overwritten by the user.
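To make the header layout concrete, the sketch below packs a few equally sized tokens into that format and walks over them again. None of these helpers (`packed_size`, `pack_tokens`, `walk_tokens`) exist in EBSP; they only model the layout described above, under the assumption that each header integer is a 32-bit int in native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not part of EBSP) modelling the documented layout:
 *   0, nextsize, data, prevsize, nextsize, data, ..., prevsize, 0
 * Assumption: each header integer is a 32-bit int in native byte order. */

#define HEADER_BYTES 8

/* Bytes needed for n tokens of token_size bytes: data plus n + 1 headers. */
static size_t packed_size(size_t n, size_t token_size)
{
    return n * token_size + (n + 1) * HEADER_BYTES;
}

/* Write n equally sized tokens from src into dst using the layout above. */
static void pack_tokens(unsigned char* dst, const unsigned char* src,
                        size_t n, int32_t token_size)
{
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(dst, &prev, 4);            /* size of the preceding token */
        memcpy(dst + 4, &token_size, 4);  /* size of the token that follows */
        memcpy(dst + HEADER_BYTES, src + i * (size_t)token_size,
               (size_t)token_size);
        dst += HEADER_BYTES + token_size;
        prev = token_size;
    }
    int32_t zero = 0;
    memcpy(dst, &prev, 4);                /* closing header: prevsize, 0 */
    memcpy(dst + 4, &zero, 4);
}

/* Walk the packed buffer forward; returns the number of tokens seen and
 * adds up their payload sizes in *payload_bytes. */
static size_t walk_tokens(const unsigned char* p, size_t* payload_bytes)
{
    size_t count = 0;
    *payload_bytes = 0;
    for (;;) {
        int32_t next;
        memcpy(&next, p + 4, 4);
        if (next == 0)                    /* nextsize 0 marks the end */
            break;
        *payload_bytes += (size_t)next;
        p += HEADER_BYTES + next;
        count++;
    }
    return count;
}
```

Under this model, four tokens of 16 bytes occupy 4 * 16 + 5 * 8 = 104 bytes, because there is one header before every token plus one closing header.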
Epiphany¶
- int bsp_stream_open(ebsp_stream *stream, int stream_id)
  Open a stream that was created using bsp_stream_create on the host.
  The first stream created by the host will have stream_id 0.
  - Return
    - Nonzero if successful.
  - Parameters
    - stream: Pointer to an existing ebsp_stream struct to hold the stream data. This struct can be allocated on the stack by the user.
    - stream_id: The index of the stream.
  Usage example:
    ebsp_stream mystream;
    if (bsp_stream_open(&mystream, 3)) {
        // Get some data
        void* buffer = 0;
        bsp_stream_move_down(&mystream, &buffer, 0);
        // The data is now in buffer
        // Finally, close the stream
        bsp_stream_close(&mystream);
    }
  - Remark
    - This function has to be called before performing any other operation on the stream.
  - Remark
    - Every call to this function should be matched by a single call to bsp_stream_close.
- void bsp_stream_close(ebsp_stream *stream)
  Wait for pending transfers to complete and close a stream.
  Behaviour is undefined if stream is not a handle opened by bsp_stream_open.
  - Parameters
    - stream: The handle of the stream, opened by bsp_stream_open.
  Cleans up the stream, and frees any buffers that may have been used by the stream.
- int bsp_stream_move_up(ebsp_stream *stream, const void *data, int data_size, int wait_for_completion)
  Write a local token up to a stream.
  The function always waits for the transfer of the previous token to have finished.
  - Return
    - Number of bytes written. Zero if an error has occurred.
  - Parameters
    - stream: The handle of the stream.
    - data: The data to be sent up the stream.
    - data_size: The size of the data to be sent, i.e. the size of the token. Behaviour is undefined if it is not a multiple of 8. If it is not a multiple of 8 bytes then transfers will be slow.
    - wait_for_completion: If nonzero, this function blocks until the data is completely written to the stream.
  If wait_for_completion is nonzero, this function will wait until the data is transferred. This corresponds to single buffering.
  Alternatively, double buffering can be used as follows. Set wait_for_completion to zero and continue constructing the next token in a different buffer. Usage example:
    int* buf1 = ebsp_malloc(100 * sizeof(int));
    int* buf2 = ebsp_malloc(100 * sizeof(int));
    int* curbuf = buf1;
    int* otherbuf = buf2;
    ebsp_stream s;
    bsp_stream_open(&s, 0); // open stream 0
    while (...) {
        // Fill curbuf
        for (int i = 0; i < 100; i++)
            curbuf[i] = 5;
        // Send up
        bsp_stream_move_up(&s, curbuf, 100 * sizeof(int), 0);
        // Use other buffer
        swap(curbuf, otherbuf);
    }
    ebsp_free(buf1);
    ebsp_free(buf2);
  - Remark
    - Behaviour is undefined if the stream was not opened using bsp_stream_open.
  - Remark
    - Memory is transferred using the DMA1 engine.
- int bsp_stream_move_down(ebsp_stream *stream, void **buffer, int preload)
  Obtain the next token from a stream.
  When calling this function, the token that was obtained at the previous call will be overwritten.
  - Return
    - Number of bytes of the obtained token. If the stream has finished or an error has occurred, this function will return 0.
  - Parameters
    - stream: The handle of the stream.
    - buffer: Receives a pointer to a local copy of the next token.
    - preload: If this parameter is nonzero then the BSP system will preload the next token asynchronously (double buffering).
  - Remark
    - Behaviour is undefined if the stream was not opened using bsp_stream_open.
  - Remark
    - Memory is transferred using the DMA1 engine.
  - Remark
    - When using double buffering, the BSP system will allocate memory for the next token, and will start writing to it using the DMA engine while the current token is processed. This requires more (local) memory, but can greatly increase the overall speed.
- void bsp_stream_seek(ebsp_stream *stream, int delta_tokens)
  Move the cursor in the stream, to change the next token to be obtained.
  If delta_tokens is out of bounds, then the cursor will be moved to the start or end of the stream respectively.
  - bsp_stream_seek(i, INT_MIN) will set the cursor to the start of the stream
  - bsp_stream_seek(i, INT_MAX) will set the cursor to the end of the stream
  - Parameters
    - stream: The handle of the stream.
    - delta_tokens: The number of tokens to skip if delta_tokens > 0, or to go back if delta_tokens < 0.
  Note that if bsp_stream_move_down is used with preload enabled (meaning the last call to that function had preload enabled), then calling bsp_stream_seek will discard any token that was preloaded in memory, so the first call to bsp_stream_move_down after this will yield a token from the new position.
  - Remark
    - This function provides a mechanism through which tokens can be obtained multiple times. It gives you random access to the data in the stream.
  - Remark
    - This function has O(delta_tokens) complexity.