Optimizing Checkpointing Performance in Spark

Ya-Meng ZHANG; Yu LUO; Yan-Chen LI

doi:10.12783/dtcse/csma2017/17315

Abstract

Spark [1] is a cluster computing framework that performs in-memory computation. As with other distributed data processing platforms, fault tolerance plays an important role in the overall architecture. Spark's fault tolerance relies on two mechanisms: lineage and checkpointing. The latter is expensive because performing a checkpoint always causes RDD recomputation. In this paper, we analyze the execution workflow of the current design and propose alternatives to improve checkpointing performance, one of which is based on an existing approach. We evaluate our results in terms of application-level throughput.
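
The recomputation cost mentioned in the abstract can be illustrated with a small toy model. The sketch below is not the Spark API and is not the approach proposed in the paper. It is a pure-Python stand-in, and the names `ToyRDD`, `compute_count`, and `run` are hypothetical. It assumes the commonly documented behavior that a checkpoint is written by a separate pass after the first action, so the lineage is replayed a second time unless the dataset was persisted beforehand.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute            # lineage: how to rebuild the data
        self._cached: Optional[List[int]] = None
        self._persist = False
        self._checkpoint_requested = False
        self.checkpoint_data: Optional[List[int]] = None
        self.compute_count = 0             # times the lineage was replayed

    def persist(self) -> "ToyRDD":
        # Ask for the data to be kept in memory once it has been computed.
        self._persist = True
        return self

    def checkpoint(self) -> None:
        # Only marks the dataset; the write happens after the next action.
        self._checkpoint_requested = True

    def _materialize(self) -> List[int]:
        if self.checkpoint_data is not None:
            return self.checkpoint_data    # lineage is cut by the checkpoint
        if self._cached is not None:
            return self._cached            # served from memory
        self.compute_count += 1            # replay the lineage
        data = self._compute()
        if self._persist:
            self._cached = data
        return data

    def count(self) -> int:
        n = len(self._materialize())       # pass 1: the action itself
        if self._checkpoint_requested and self.checkpoint_data is None:
            # pass 2: a separate pass that writes the checkpoint
            self.checkpoint_data = list(self._materialize())
        return n


def run(persist_first: bool) -> int:
    """Return how many times the lineage was computed for one action."""
    rdd = ToyRDD(lambda: [x * x for x in range(1000)])
    if persist_first:
        rdd.persist()
    rdd.checkpoint()
    rdd.count()
    return rdd.compute_count
```

In this model, `run(False)` replays the lineage twice (once for the action, once for the checkpoint write), while `run(True)` replays it once. Persisting before checkpointing trades memory for the avoided recomputation, which is the kind of cost the abstract refers to.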

Keywords

Apache Spark, Fault tolerance, Checkpointing


COMPUTER SCIENCE and ENGINEERING
