- Next-Gen Sequencing and disease
- Long non-coding RNAs and their role in cell differentiation
- Predicting novel cis-regulatory modules with prior knowledge of related CRMs
Thanks to the extensive work of biologists over the last few decades, who have tested many sequence fragments for regulatory activity in reporter gene assays, we now have an invaluable collection of known enhancers in a variety of species and tissues.
Our goal here is to use a small set of known CRMs participating in a transcriptional network as “training data” to guide the search for other CRMs with similar functionality in the network. We call this task “supervised CRM prediction”. To this end, we continually develop novel statistical and probabilistic models that capture the similarity between any given sequence and the training data set. We use these models to locate high-scoring regions of the genome, which are candidate CRMs of the same network. Experimental validation in fly and mouse confirms the power of these techniques.
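The idea of scoring genomic windows by their similarity to a training set can be illustrated with a minimal sketch. This is not one of our models: it swaps in a simple k-mer log-odds score (training CRMs versus background sequences), and the function names, the toy sequences, and the parameters `k`, `window` and `step` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train_log_odds(training_crms, background_seqs, k=3, pseudocount=1.0):
    """Per-k-mer log-odds of training CRMs versus background (illustrative stand-in model)."""
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    fg, bg = Counter(), Counter()
    for s in training_crms:
        fg.update(kmer_counts(s, k))
    for s in background_seqs:
        bg.update(kmer_counts(s, k))
    fg_total = sum(fg[m] for m in all_kmers) + pseudocount * len(all_kmers)
    bg_total = sum(bg[m] for m in all_kmers) + pseudocount * len(all_kmers)
    return {
        m: math.log((fg[m] + pseudocount) / fg_total)
        - math.log((bg[m] + pseudocount) / bg_total)
        for m in all_kmers
    }


def score_window(window_seq, log_odds, k):
    """Average log-odds over the k-mers of one window (k-mers with N etc. are skipped)."""
    counts = kmer_counts(window_seq, k)
    n = sum(c for m, c in counts.items() if m in log_odds)
    if n == 0:
        return 0.0
    return sum(log_odds[m] * c for m, c in counts.items() if m in log_odds) / n


def scan(genome, log_odds, k, window, step):
    """Slide a fixed-size window along the sequence and score each position."""
    return [
        (start, score_window(genome[start:start + window], log_odds, k))
        for start in range(0, len(genome) - window + 1, step)
    ]


# Toy data (hypothetical): the "training CRMs" are GATA-rich, the background is not.
training = ["GATAGATAGATAGATA"] * 3
background = ["CCCTTTCCCTTTCCCTTT"]
genome = "CCCTTTCCCTTT" + "GATAGATAGATA" + "CCCTTTCCCTTT"

model = train_log_odds(training, background, k=3)
hits = scan(genome, model, k=3, window=12, step=12)
best_start, best_score = max(hits, key=lambda h: h[1])
print(best_start, round(best_score, 3))
```

In a real setting the high-scoring windows would be ranked genome-wide and the top candidates passed on to experimental testing; the scoring function is the part a more careful statistical model would replace.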
Experimentally validated human CRMs in mouse
Cis-regulatory module discovery
Predicted regulatory network
Our model not only enables us to search genome-wide for other CRMs of the network, but also provides a simple mechanism to statistically infer the effect of each TF on each CRM. The video on the left shows how the model is used to scan the region around a gene (e.g. hkb) for segments that could drive an expression pattern similar to that of the gene. The middle panel plots the expected and predicted expression patterns for each window, and the bottom panel plots the similarity between the expected and predicted patterns for that window.
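The scan described above (predict an expression pattern for each window, then compare it with the expected pattern) can be sketched as follows. This is a toy illustration under stated assumptions, not our actual model: predicted expression is taken to be a logistic function of TF concentration weighted by binding-site counts, similarity is a Pearson correlation, and the TF names, weights, and profiles are hypothetical.

```python
import math


def predict_expression(site_counts, tf_profiles, weights, bias=0.0):
    """Predicted expression at each position along an axis (assumed logistic model)."""
    n_positions = len(next(iter(tf_profiles.values())))
    pattern = []
    for x in range(n_positions):
        # Each TF contributes weight * number of sites in the window * its concentration at x.
        z = bias + sum(
            weights[tf] * site_counts.get(tf, 0) * tf_profiles[tf][x]
            for tf in weights
        )
        pattern.append(1.0 / (1.0 + math.exp(-z)))
    return pattern


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either pattern is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def scan_locus(windows, tf_profiles, weights, expected, bias=0.0):
    """Score every window by similarity of predicted to expected expression."""
    results = []
    for start, site_counts in windows:
        predicted = predict_expression(site_counts, tf_profiles, weights, bias)
        results.append((start, pearson(predicted, expected)))
    return results


# Hypothetical TF concentration profiles along five positions of an axis.
tf_profiles = {
    "act_anterior": [1.0, 0.8, 0.5, 0.2, 0.0],
    "act_posterior": [0.0, 0.2, 0.5, 0.8, 1.0],
    "rep_posterior": [0.0, 0.2, 0.5, 0.8, 1.0],
}
# Assumed effects: positive weight = activator, negative weight = repressor.
weights = {"act_anterior": 2.0, "act_posterior": 2.0, "rep_posterior": -2.0}
# Expected pattern of the gene: expressed at the anterior end.
expected = [1.0, 0.9, 0.5, 0.1, 0.0]
# Windows as (start coordinate, binding-site counts per TF); counts are made up.
windows = [
    (0, {"act_posterior": 3}),
    (500, {"act_anterior": 3, "rep_posterior": 1}),
    (1000, {}),
]

scores = scan_locus(windows, tf_profiles, weights, expected)
best_window = max(scores, key=lambda s: s[1])[0]
print(scores, best_window)
```

In this sketch the sign and magnitude of each weight play the role of the inferred effect of a TF on a CRM; in practice such parameters would be estimated from data rather than set by hand.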
- Transcription factor interactome (iTFs)
- Annotating genome (Genome Surveyor)