Our Research
What do we want to understand?
Eukaryotic gene expression is a complex process that is amenable to regulation and fine-tuning at multiple levels. During transcription, the nascent pre-mRNA is capped at the 5′ end, introns are removed by splicing, and the 3′ end is cleaved and polyadenylated. The mature mRNA is then actively exported to the cytoplasm (sometimes to defined locations within the cell), where it is available as a template for synthesis of the encoded protein. Translation itself is regulated at many levels and can further contribute to proteome diversity.
Most of the information on whether and how a gene is expressed is encoded in the DNA sequence. Every step can be regulated in a gene- and cell-specific manner. Understanding how this is achieved and being able to predict the functional consequence of a change in the DNA sequence is at the core of our research interests.
Why do we want to understand it?
The ever-growing number of genetic variants that have been implicated in common and rare diseases holds the promise of being the key to understanding and ultimately treating these diseases. While a subset of these sequence variants lies in coding regions and changes the resulting protein, sometimes offering a straightforward link to the functional consequence underlying the disease phenotype, many disease-associated sequence changes are synonymous or lie in non-coding regions, hampering the interpretability of the DNA variant and obscuring the path to therapeutic intervention.
To close the gap between our understanding of coding and non-coding variants, we need to be able to distinguish functionally relevant from irrelevant DNA changes at the non-coding level as well. A prerequisite for this, however, is a thorough understanding of the rules underlying gene expression. High-throughput approaches developed by us and many other groups can not only significantly expand our knowledge of the principles of gene regulation, but also allow us to predict the effect of non-coding variants and find ways to interfere with (mis)regulation in a controlled and informed way.
RNA-based therapeutics constitute a powerful new avenue of research into drug development, as exemplified by the successful use of antisense oligonucleotides to treat spinal muscular atrophy (“Spinraza”) and the success of RNA-based vaccines (e.g. against SARS-CoV-2). Establishing high-throughput assays and understanding the regulatory grammar of gene expression greatly increase our chances of identifying a suitable drug target.
How do we get there?
We employ a wide range of techniques to expand the scope of research on gene regulation beyond the limits imposed by the endogenous sequence space and beyond classic divides, such as those between individual molecular mechanisms or between experimental and computational approaches.
We perform high-throughput reporter assays in combination with more targeted approaches to investigate the function of the regulatory mechanisms we discover. This involves molecular biology and biochemical techniques, next-generation sequencing, synthetic biology, microscopy, FACS and machine learning. High-throughput testing of synthetic sequence libraries is increasingly recognized as a powerful and indispensable tool to obtain a systematic understanding of biological processes. We build on the extensive work on gene regulation carried out over the last decades and on novel experimental tools to arrive at a comprehensive and predictive understanding of the DNA-encoded rules dictating gene expression across regulatory layers.
We use two main model systems: human cells in culture and the nematode C. elegans.
Specific research directions:
Deciphering the rules of RNA splicing
RNA splicing, i.e. the removal of parts of the primary transcript, is a tightly regulated process. Most human transcripts can be spliced in several ways; this alternative splicing allows many different mRNAs to be made from the same gene. Many properties of a transcript, a gene, or a genomic environment can influence the decision of how to splice, such as binding sites for splicing factors, RNA secondary structure, DNA methylation and nucleosome positioning. In past work we used a massively parallel reporter assay to dissect the different contributions to splicing decisions and to investigate the cell-to-cell variability of alternative splicing (Mikl et al., 2019). In current research projects we use this experimental paradigm to investigate the many unknowns of splicing regulation.
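As a purely illustrative sketch of how a splicing outcome can be summarized per library variant, the snippet below computes the widely used "percent spliced in" (PSI) value from read counts. It is not the readout or analysis of the cited study; the function names, the read-count threshold and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple


def percent_spliced_in(inclusion_reads: int,
                       exclusion_reads: int,
                       min_reads: int = 10) -> Optional[float]:
    """Fraction of reads supporting the inclusion isoform (PSI, 0..1).

    Returns None when coverage is below `min_reads` (an arbitrary,
    hypothetical threshold), since the ratio would be too noisy.
    """
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return None
    return inclusion_reads / total


def psi_per_variant(
    counts: Dict[str, Tuple[int, int]]
) -> Dict[str, Optional[float]]:
    """Map each variant to its PSI, given (inclusion, exclusion) read counts."""
    return {variant: percent_spliced_in(inc, exc)
            for variant, (inc, exc) in counts.items()}


# Made-up counts for three hypothetical library variants.
example_counts = {
    "variant_A": (90, 10),   # mostly included
    "variant_B": (25, 75),   # mostly skipped
    "variant_C": (3, 2),     # too few reads to call
}

if __name__ == "__main__":
    for name, psi in psi_per_variant(example_counts).items():
        print(name, "not enough reads" if psi is None else f"PSI = {psi:.2f}")
```

Comparing such values across variants that differ in a single designed feature (for example, one splicing factor binding site) is one way to attribute a change in splicing to that feature.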
Recoding
Programmed ribosomal frameshifting (PRF) is a strikingly robust and precise process by which the ribosome shifts reading frame in a defined fraction of translation events, leading to the production of two different proteins from the same mRNA. This process is most prominent in viruses like HIV and coronaviruses, where it is a widespread and indispensable mechanism. A major limitation in studying PRF has been that it was not amenable to high-throughput approaches. We recently overcame this limitation by developing a massively parallel reporter assay that enables screening of a large number of sequences for their potential to induce frameshifting (Mikl et al., 2020). We are using this assay to dissect the regulation of frameshifting and to identify ways to interfere with viral PRF events and thereby with replication of the virus.
In humans, PRF has been found to regulate several genes with important roles in metabolic pathways, proliferation, apoptosis and host-virus interactions. PRF can also target an mRNA for degradation. Accordingly, it can have a dual function: diversifying the proteome (productive PRF) and regulating mRNA abundance (destructive PRF). Using high-throughput functional assays, we aim to elucidate the prevalence and role of PRF in humans.
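To make the idea of two proteins from one mRNA concrete, the following minimal sketch translates a toy sequence once in the original reading frame and once with a -1 frameshift, in which the ribosome slips back by one nucleotide and continues in the new frame. The sequence, the shift position and the function names are hypothetical and serve only as an illustration; the code uses the standard genetic code and does not model what triggers frameshifting or how often it occurs.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, start: int = 0) -> str:
    """Translate `seq` from index `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_minus1_shift(seq: str, codons_before_shift: int) -> str:
    """Translate in frame 0 for `codons_before_shift` codons, then slip
    back one nucleotide and continue in the -1 frame.

    Assumes there is no stop codon upstream of the shift position.
    """
    shift_pos = 3 * codons_before_shift
    upstream = translate(seq[:shift_pos])
    downstream = translate(seq, start=shift_pos - 1)
    return upstream + downstream


# Toy sequence (hypothetical): frame 0 hits a stop codon after four codons.
toy_seq = "ATG GCT TTT TTA TAG GGA AGA TCT GGC CTT AA".replace(" ", "")

if __name__ == "__main__":
    print("frame 0 product: ", translate(toy_seq))
    print("-1 shifted product:", translate_with_minus1_shift(toy_seq, 4))
```

In this toy case the unshifted ribosome terminates early, whereas the shifted ribosome bypasses the stop codon and produces a longer protein with a different C-terminal part.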
Dissecting the interplay between gene regulatory layers
Attempts to understand the rules of gene expression have mostly focused on regulatory layers in isolation. In the context of the functioning of a cell or an organism, however, this separation can only be the first step, as all aspects of gene expression are interconnected and ultimately part of one process aimed at carrying out a defined function. The immense complexity of (possible) regulatory connections and the uniqueness of each combination of regulatory regions set very strict limits on the rules that can be learned based on endogenous sequences, emphasizing the need for complementary approaches.
In past research we identified novel cases of such interactions between regulatory layers, specifically a delicate balance between splicing and polyadenylation in the context of robustness in cell polarity (Mikl and Cowan, 2014) and an unexpected effect of the choice of splicing machinery on nuclear export of the RNA and, subsequently, on the relative levels of the resulting protein isoforms (Mikl et al., 2019; Zuckerman et al., 2020).
Large-scale testing of rationally designed or random sequence libraries has contributed immensely to elucidating the grammar of gene regulatory mechanisms in isolation. In ongoing research projects we are using high-throughput testing of rationally designed sequence libraries to arrive at a comprehensive understanding of the rules dictating gene expression across gene regulatory layers. Investigating such interactions in a systematic manner will open the door to understanding the interplay between regulatory domains and contribute to the development of a framework in which gene expression is examined in its entirety.
Investigating gene regulation at the organismal level: C. elegans
Ultimately, we want to understand gene regulation not only at the level of individual cells, but in the context of an organism. In addition, a comparative evolutionary approach to functional genomics will allow us to better understand how organisms distribute the regulatory burden and what driving forces gave rise to the complexity of regulatory connections controlling gene expression.
In the past we identified a gene regulatory mechanism based on alternative splicing and alternative poly-A site selection that confers robustness to early development in the worm (Mikl and Cowan, 2014). C. elegans is arguably the model organism most amenable to large-scale approaches and automated handling and screening, which allows us to perform high-throughput assays at the organismal level. In ongoing research projects, we are using C. elegans to investigate the usage and functional significance of regulatory principles identified in single cells for the development and functioning of an entire organism. In particular, we are using C. elegans as a model system for the link between translational fidelity, recoding and aging.