My research focuses on developing scalable algorithms to assess variability, in terms of single nucleotide polymorphisms as well as structural variations, within and among populations. I am particularly interested in developing and applying algorithms that use prior biological observations to achieve high accuracy and short runtimes on hardware infrastructures such as graphics processing units (GPUs) or grid engines.
High-throughput sequencing (HTS) has become a standard method in molecular biological research. Consortium projects like the 1000 Genomes Project or the 10k genomes project show a clear trend toward sequencing more genomes or transcriptomes to assess the variability within organisms, among whole populations, or even across ecosystems. At the same time, it is recognized that standard programs cannot achieve the required speed and accuracy (e.g. in terms of correctly mapped reads) in every scenario. Indeed, the questions related to the easily accessible regions of genomes are almost solved. What remains are regions of the genome that are hard to assess due to their high polymorphism or low complexity (e.g. repeats). This calls for algorithms that perform a thorough search yet still have a short runtime, so that they can cope with the flood of data. My research explores ways to robustly assess and investigate regions of the genome that are highly variable and are often drivers of evolution or of genetic diseases such as cancer.
- Structural Variations
- Sequence Alignment
- High Performance and Multi-Core Computing
- Human Genetics
- Non-model organisms
Selected Software Packages
Phased Diploid Genome Assembly with PacBio sequencing reads
10x Genomics read simulator.
Parallelization script for MUMmer.
Long read mapper to improve mapping and SV calling
Short read mapper especially suitable for high SNP rates
Structural variation caller for PacBio and Oxford Nanopore reads.
Tool set for simulating and evaluating SVs and for merging and comparing SVs within and among samples; it also includes various methods to reformat or summarize SVs.
Method to annotate SVs using GFF, BED, or other VCF files. GitHub
Method to optimally select samples for validation and resequencing based on a multi-sample VCF. GitHub