Computational systems virology
Bioinformatics applies methods from computer science to biological problems. The biological problems we are interested in include characterizing herpesviruses and their mechanisms of infection.
Our approach is based on high-throughput experiments that measure specific parameters during infection in a quantitative and genome-wide manner. The resulting data can be used to generate new hypotheses that are testable in the wet lab and to make statistical inferences about properties of the whole system of virus infection.
A typical workflow in systems virology starts with wet-lab work: Samples must be prepared and subjected to the high-throughput technique (e.g. microarrays, sequencing, mass spectrometry). This usually results in gigabytes of raw data that must be analyzed in an appropriate manner to extract the relevant information. Finally, the still very large tables containing all the information must be interpreted using computational and statistical methods, including (I) basic descriptive analyses, (II) functional methods, (III) network-based methods and (IV) mechanistic models. In general, the results of these analyses lead to new experiments, which in turn provide more and more knowledge of the system.
From data to information
Two properties of high-throughput data complicate their analysis: They are huge, and they are affected by errors inherent to the experiments. Their sheer size necessitates the use of sophisticated methods from several areas of computer science (e.g. algorithmics, databases, machine learning, parallelism, software engineering), while experimental bias and noise must be controlled using appropriate statistical tools.
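As a small illustration of what controlling a simple experimental bias can look like, the following Python sketch corrects for differences in sequencing depth before comparing two samples. It is only a minimal example under stated assumptions, not one of the methods cited on this page: the function names, the pseudocount and the toy counts are all hypothetical, and counts-per-million scaling is just one basic normalization among many.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one sample so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: 1e6 * count / total for gene, count in counts.items()}


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Compute log2 ratios for all genes present in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and dampens ratios for genes with very low values.
    """
    shared_genes = treated.keys() & control.keys()
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in shared_genes
    }


# Hypothetical toy data: the infected sample was sequenced twice as deeply,
# but the relative composition of both samples is identical.
mock = {"geneA": 100, "geneB": 300, "geneC": 600}
infected = {"geneA": 200, "geneB": 600, "geneC": 1200}

raw_lfc = log2_fold_changes(infected, mock)
normalized_lfc = log2_fold_changes(counts_per_million(infected), counts_per_million(mock))
```

Without normalization, every gene in this toy example appears to be roughly twofold up-regulated, purely because of the deeper sequencing. After scaling, the apparent changes vanish. Real data typically require more robust approaches, since a few highly expressed genes can dominate the total count.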
Importantly, the ever-growing number of experimental variants leads to a profound bottleneck in research: Frequently, no “push the button” method is available for data analysis, or the available method is only partly appropriate. Thus, specifically tailored analysis methods and tools are often necessary to control bias introduced by modified experimental protocols or to extract additional information from the data.
In the past, we have developed several methods and software tools to extract relevant information from different types of high-throughput experiments (for sRNA-seq: Erhard & Zimmer 2010; for LC-MS/MS: Erhard & Zimmer 2012; for RIP-Chip: Erhard et al. 2013a; for AGO-PAR-CLIP: Erhard et al. 2013b; for all quantitative NGS experiments: Erhard & Zimmer 2015).
From information to insight
Having huge tables full of numbers describing different parameters of a system may be informative, but is still not enough to understand virus infection. Thus, our second research focus is the interpretation of these tables in the context of what is already known about the system and what data is publicly available.
For instance, by integrating several data sets relevant for microRNA targeting (PAR-CLIP, RIP-Chip, LC-MS/MS, 4sU-microarrays), we have shown that viral and host microRNAs bind to their target sites in a context-dependent manner and that context-dependent binding has a context-dependent impact on gene expression (Erhard et al. 2014). Interestingly, we found that this context cannot be explained by the presence or absence of the microRNA or the target mRNA, but that other factors must be involved that constitute the context (e.g. competition with RNA binding proteins or stable RNA secondary structures).