Design and Implementation of Parallelization of BLAST Algorithm Based on Spark
Abstract
BLAST (Basic Local Alignment Search Tool) is a widely used local alignment algorithm with high accuracy. It reduces running time while maintaining high precision, but it becomes a performance bottleneck and runs inefficiently when aligning large gene data sets. To address this, a distributed parallel method based on Spark, named Spark_BLAST, is proposed. The method uses Spark's in-memory computation to identify and divide tasks, and thereby realizes distributed parallel execution of the BLAST algorithm. The method was implemented on a Spark cluster with 5 nodes. Compared with a single machine, the Spark cluster reaches a speedup of about 4 without changing the accuracy of the alignment results. The method provides an efficient alignment approach for bioinformatics.
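The divide-and-merge idea described in the abstract can be illustrated with a minimal sketch. This is not the Spark_BLAST implementation: it assumes a round-robin split of the query sequences, uses a local thread pool in place of a Spark cluster, and substitutes a hypothetical `toy_align` function (counting shared k-mers) for BLAST itself. All function names here are illustrative assumptions. The sketch only shows that splitting the queries, aligning each part independently, and merging the hits yields the same result as a serial run, and that a speedup of about 4 on 5 nodes corresponds to a parallel efficiency of about 0.8.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split query records into n parts round-robin (assumed scheme)."""
    return [records[i::n] for i in range(n)]


def toy_align(query, subject, k=3):
    """Hypothetical stand-in for BLAST: count subject k-mers found in the query."""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sum(1 for i in range(len(subject) - k + 1) if subject[i:i + k] in kmers)


def align_partition(part, database):
    """Align every query in one partition against the whole database."""
    return [(qid, sid, toy_align(q, s)) for qid, q in part for sid, s in database]


def parallel_align(queries, database, workers):
    """Divide the queries, align the parts concurrently, then merge the hits."""
    parts = partition(queries, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: align_partition(p, database), parts)
    # Sorting makes the merged output independent of partition order.
    return sorted(hit for part in results for hit in part)


def parallel_efficiency(speedup, nodes):
    """Efficiency = speedup / number of nodes."""
    return speedup / nodes


# Made-up example data, for illustration only.
queries = [("q1", "ACGTACGT"), ("q2", "TTTTGGGG"), ("q3", "ACGTTTTT")]
database = [("s1", "ACGTACGTAA"), ("s2", "GGGGTTTT")]

serial_hits = sorted(align_partition(queries, database))
parallel_hits = parallel_align(queries, database, workers=3)
```

In this sketch `parallel_hits` equals `serial_hits`, mirroring the abstract's claim that parallelization does not change the alignment results. Partitioning by query is only one possible design; the database could be split instead, at the cost of a more involved merge step.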
Keywords
Spark, Parallel computing, Bioinformatics, Sequence alignment, Big data, Basic Local Alignment Search Tool (BLAST)
DOI
10.12783/dtcse/iece2018/26643