Parallel implementation of Bioconductor submodules on several architectures (SMP, MPI, GPGPU)
Background: Bioconductor is an open-source, open-development software project for the analysis and comprehension of genomic data. It is based primarily on the R programming language. As biological data sets grow, research problems are becoming so large and complex that it is impractical or impossible to solve them on a single computer, especially given limited memory. Parallel computing in R, on SMP computers, MPI clusters or GPGPUs, can shorten the run time of such long-running computations.
Results: Three R packages were built to solve alignment, clustering and classification problems, which are commonly encountered in bioinformatics research.
The function pairwiseAlignment in the Bioconductor package Biostrings was adapted into a parallel function for SMP and MPI systems. This function solves global alignment (Needleman-Wunsch), local alignment (Smith-Waterman) and overlap (ends-free) alignment problems for a large number of protein or nucleic acid sequences.
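Aligning many sequence pairs parallelizes naturally because each pair can be scored independently. The following is a minimal sketch of that idea in Python, not the code of the R package: the names nw_score, score_pair and score_all are hypothetical, the scoring values are arbitrary, and a linear gap penalty is assumed for brevity.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * gap]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def score_pair(pair):
    """Top-level helper so that the work item can be sent to a worker."""
    return nw_score(*pair)


def score_all(pairs, executor_cls=ProcessPoolExecutor, workers=2):
    """Score independent sequence pairs in parallel, preserving input order.

    ProcessPoolExecutor uses several cores (the SMP case); passing
    ThreadPoolExecutor instead is handy for a quick in-process check.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(score_pair, pairs))
```

Calling score_all(pairs) with the default process pool spreads the pairs over worker processes; on platforms that start workers by spawning, the call belongs under an `if __name__ == "__main__":` guard. An MPI variant would distribute the same independent work items across cluster nodes rather than local processes.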
The function adSplit in the Bioconductor package adSplit was adapted into a parallel function for SMP and MPI systems. This function searches for annotation-driven splits of patients in microarray data. A split is a partitioning of the patients into two groups. To find such splits, the function refers to GO terms and KEGG pathways.
The function macluster in the Bioconductor package maanova was adapted into a parallel function for GPGPU systems. This function bootstraps K-means or hierarchical clusters and builds a consensus tree (a consensus group for K-means) from the bootstrap results. The function macluster uses the functions hclust and dist from the package stats. The new function gpuMacluster in the package bioconductorGPU uses the functions gpuHclust and gpuDist from the package gputools.
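The bootstrap-and-consensus idea can be illustrated with a small Python sketch. This is not the macluster or gpuMacluster code: the function names are hypothetical, the resampling scheme (drawing array columns with replacement) is an assumption chosen for simplicity, and a deterministic farthest-first initialization is assumed so that the result is reproducible. The repeated distance computations in sqdist are the part that a GPU distance routine would take over.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans_labels(points, k, iters=10):
    """Plain K-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bootstrap_consensus(rows, k, n_boot=50, seed=0):
    """Fraction of bootstrap runs in which each pair of rows co-clusters."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    together = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        # Assumed resampling scheme: draw m columns with replacement.
        cols = [rng.randrange(m) for _ in range(m)]
        sample = [[row[j] for j in cols] for row in rows]
        labels = kmeans_labels(sample, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / n_boot for count in row] for row in together]
```

Rows whose co-clustering fraction is close to 1 would form a consensus group; each bootstrap replicate is independent of the others, which is what makes the procedure a candidate for parallel execution.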
- KONWIHR funding: two months during Multicore-Software-Initiative 2009/2010
- Dr. Ferdinand Jamitzky, LRZ-München