Optimizing Checkpointing Performance in Spark
Abstract
Spark [1] is a cluster framework that performs in-memory computing. As with other distributed data processing platforms, fault tolerance plays an important role in the overall architecture. Spark's fault-tolerance mechanism comprises lineage and checkpointing. The latter is expensive because checkpointing an RDD always triggers recomputation of its lineage. In this paper, we analyze the workflow of the current checkpointing design and propose alternatives to improve its performance, one of which is based on an existing approach. We evaluate our results in terms of application-level throughput.
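The recomputation cost mentioned above stems from the fact that in Spark, `rdd.checkpoint()` launches a separate job after the action that first materialized the RDD, replaying the lineage a second time unless the RDD was persisted first. The following is a minimal toy model (plain Python, not Spark itself; the class `ToyRDD` and its counters are illustrative assumptions) sketching that behavior:

```python
class ToyRDD:
    """Toy stand-in for a Spark RDD lineage node (illustration only, not Spark)."""

    def __init__(self, source):
        self.source = source          # upstream computation (the lineage)
        self.cache = None             # filled in by persist()
        self.recomputations = 0       # how many times the lineage was replayed

    def _materialize(self):
        if self.cache is not None:    # cached data short-circuits the lineage
            return self.cache
        self.recomputations += 1      # full lineage replay
        return self.source()

    def persist(self):
        """Analogue of rdd.persist(): compute once and keep the result."""
        self.cache = self._materialize()
        return self

    def collect(self):
        """Analogue of an action: runs a job over the lineage."""
        return self._materialize()

    def checkpoint(self):
        """Analogue of rdd.checkpoint(): Spark runs a *separate* job to write
        the data, so an unpersisted RDD is recomputed here."""
        return self._materialize()


# Without persist(): the action and the checkpoint each replay the lineage.
expensive = ToyRDD(lambda: [x * x for x in range(5)])
expensive.collect()
expensive.checkpoint()
assert expensive.recomputations == 2

# With persist() first: the lineage runs once and the checkpoint reads the cache.
cached = ToyRDD(lambda: [x * x for x in range(5)]).persist()
cached.collect()
cached.checkpoint()
assert cached.recomputations == 1
```

This mirrors the common Spark guidance to call `persist()` before `checkpoint()` so the checkpoint job reads cached partitions instead of replaying the lineage; the paper's proposed alternatives target exactly this redundant recomputation.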
Keywords
Apache Spark, fault tolerance, checkpointing
DOI
10.12783/dtcse/csma2017/17315