The advent of Next Generation Sequencing (NGS) technologies and the increase in read length and number of reads per run poses a computational challenge to bioinformatics. The demand for sensible, inexpensive, and fast methods to align reads to a reference genome is constantly increasing. Due to the high sensitivity the Smith-Waterman (SW) algorithm is best suited for that. However, its high demand for computational resources makes it unpractical. Here we present an optimal SWimplementation for NGS data and demonstrate the advantages of using common and inexpensive high performance architectures to improve the computing time of NGS applications. We implemented a C++ library (MASon) that exploits graphic cards (CUDA, OpenCL) and CPU vector instructions (SSE, OpenCL) to efficiently handle millions of short local pairwise sequence alignments (36bp - 1,000bp). All libraries can be easily integrated into existing and upcoming NGS applications and allow programmers to optimally utilize moder n hardware, ranging from desktop computers to high-end cluster.