Buffer server overflow

Buffer server overflow

Mateusz Zakarczemny
Hi,
I'm reading the Apex documentation on buffer servers. What happens if the buffers between operators overflow (let's assume a non-partitioned operator)?
I read somewhere that data is spilled to disk. But what happens next? What if disk space is exhausted?

Regards,
Mateusz Zakarczemny

Re: Buffer server overflow

Pramod Immaneni-2
When back pressure is enabled (the default), upstream operators are blocked until downstream operators consume data and free up space.

Since the buffer server also provides fault recovery, it cannot clear out data as soon as downstream operators consume it. It has to keep the data until the next checkpoint completes throughout the DAG, so spillover to disk comes into play if the amount of data between checkpoints is greater than the in-memory buffer capacity.
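
As a rough back-of-the-envelope illustration of that last condition, here is a minimal sketch. The function name, the throughput figures, the 30-second checkpoint interval, and the 512 MB capacity are all hypothetical numbers chosen for the example, not values taken from Apex. It also simplifies by assuming data is retained for exactly one checkpoint interval; in practice it may be held longer, until the checkpoint is complete across the whole DAG.

```python
def spill_estimate_mb(rate_mb_per_s, checkpoint_interval_s, memory_capacity_mb):
    """Estimate how much data (in MB) would not fit in memory between checkpoints.

    Hypothetical helper for illustration only; not part of any Apex API.
    """
    # Data published between two checkpoints has to be retained for recovery.
    retained_mb = rate_mb_per_s * checkpoint_interval_s
    # Whatever exceeds the in-memory capacity is a candidate for spilling to disk.
    return max(0, retained_mb - memory_capacity_mb)


# 10 MB/s for 30 s is 300 MB, which fits into a 512 MB buffer.
low_rate_spill = spill_estimate_mb(10, 30, 512)
# 50 MB/s for 30 s is 1500 MB, so roughly 988 MB would not fit in memory.
high_rate_spill = spill_estimate_mb(50, 30, 512)
```

With these assumed numbers, the first stream never needs the disk, while the second one would depend on spillover (or back pressure) between checkpoints.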

Thanks

Re: Buffer server overflow

Vlad Rozov-3
When spilling to disk is enabled, an upstream operator is blocked from emitting more tuples to the corresponding output port when the size of the buffer (in bytes) exceeds a limit (see the documentation on how to configure the limit). This is the back pressure mechanism that Pramod refers to. There are two ways in which tuples can be removed from the buffer to free up space and unblock the upstream operator: they can either be spooled to a local disk or purged from the buffer completely. The purge happens only after a window (more precisely, the earliest checkpoint window after the window that the tuple belongs to) has been completely processed by the application/DAG. If there is not enough disk space for spooling, the buffer server fails the container that it belongs to. A few JIRAs have been filed to improve the current behavior (for example, to limit the amount of disk space that the buffer server can use for spilling).
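
To make the sequence above concrete, here is a toy model of it in Python. It is only a sketch of the behavior as described in this thread, not Apex code: the class, its methods, and the size limits are all hypothetical, and the real buffer server works on serialized blocks and network connections rather than an in-process list.

```python
class ContainerFailure(Exception):
    """Raised when the toy buffer has no disk space left for spooling."""


class ToyBuffer:
    """Hypothetical stand-in for a buffer server; not an Apex class."""

    def __init__(self, memory_limit, disk_limit):
        self.memory_limit = memory_limit
        self.disk_limit = disk_limit
        self.in_memory = []  # (window_id, size) pairs, oldest first
        self.on_disk = []    # blocks that were spooled to local disk

    @staticmethod
    def _used(blocks):
        return sum(size for _, size in blocks)

    def offer(self, window_id, size):
        """Return True if the data is accepted, False if upstream must block."""
        if self._used(self.in_memory) + size > self.memory_limit:
            return False  # back pressure: the upstream operator is blocked
        self.in_memory.append((window_id, size))
        return True

    def spool_oldest(self):
        """Free memory by moving the oldest in-memory block to disk."""
        window_id, size = self.in_memory[0]
        if self._used(self.on_disk) + size > self.disk_limit:
            # Mirrors the description above: no disk space fails the container.
            raise ContainerFailure("not enough disk space for spooling")
        self.on_disk.append(self.in_memory.pop(0))

    def purge(self, committed_window):
        """Drop all data up to and including a fully processed checkpoint window."""
        self.in_memory = [b for b in self.in_memory if b[0] > committed_window]
        self.on_disk = [b for b in self.on_disk if b[0] > committed_window]


buf = ToyBuffer(memory_limit=100, disk_limit=100)
accepted_first = buf.offer(1, 60)    # fits in memory
accepted_second = buf.offer(2, 60)   # would exceed the limit, upstream blocks
buf.spool_oldest()                   # window 1 moves to disk
accepted_retry = buf.offer(2, 60)    # now there is room again
```

In this model, a purge after the committed checkpoint window frees both memory and disk, while repeated spooling without any purge eventually raises ContainerFailure, which corresponds to the container failure described above.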

Thank you,

Vlad
