Selected Poster Presentations/Talks After 2013 

  • M Kazemian, H Pham, M Brodsky, S Sinha, "Widespread and distinct sequence signatures of combinatorial transcriptional regulation", 2013.

There is a growing realization that transcriptional gene regulation is often combinatorial, with multiple transcription factors (TFs) co-regulating the same genes, either independently or through direct or indirect interactions. Here, we explore the extent and diversity of combinatorial regulation in the Drosophila genome. We utilized the binding motifs of 322 TFs and chromatin accessibility data to produce computational TF-DNA interaction maps through different stages of embryonic development in fruit fly. We examined these binding maps to identify pairs of co-expressed TFs that either prefer to or avoid binding at common locations. We find that TF-TF aversion is as prevalent as co-binding, suggesting a less appreciated aspect of combinatorial regulation. Several TFs had unusually many aversion partners, including known chromatin remodeling TFs. We explored TF-TF co-binding and aversion partnerships in the context of nearly 100 gene expression domains and four stages of development, and found that the frequency of such partnerships varies greatly across expression domains. We then analyzed the common binding locations of TF-pairs for statistical patterns in terms of relative spacing and orientation between binding sites, using a newly designed statistical tool called interacting TF signatures (iTFs). We identified many instances of short distance biases between binding sites of TF-pairs, including examples where such biases are stronger under certain relative orientations. To test if the genomic arrangement of these binding sites might reflect physical interactions between the corresponding TFs, we selected 28 TF-pairs whose binding sites exhibited short distance biases (<10 bp) for further analysis. In vitro pull-down experiments revealed that ~65% of these pairs can directly interact with each other. For 5 of these pairs, we further demonstrate that they bind cooperatively to DNA if both sites are present with the preferred spacing. Overall, this study produces a comprehensive map of various types of sequence signatures of combinatorial TF action.
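
As an illustration of how co-binding and aversion between two TFs could be scored, the sketch below applies a hypergeometric test to the overlap of their bound regions. This is an assumption made for illustration only: the abstract does not state which test was used, and the function name `overlap_tails` and its parameters are hypothetical.

```python
from math import comb


def overlap_tails(n_regions, n_a, n_b, n_both):
    """Hypergeometric tail probabilities for the overlap of two TFs' bound regions.

    n_regions: total number of accessible regions considered (hypothetical input)
    n_a, n_b:  number of regions bound by TF A and by TF B
    n_both:    number of regions bound by both TFs
    Returns (p_enrich, p_deplete): the probability of seeing an overlap at least
    as large, or at most as large, as n_both if binding were independent.
    """
    lo = max(0, n_a + n_b - n_regions)  # smallest possible overlap
    hi = min(n_a, n_b)                  # largest possible overlap
    if not lo <= n_both <= hi:
        raise ValueError("n_both is outside the feasible overlap range")

    total = comb(n_regions, n_b)

    def pmf(k):
        # Probability that exactly k of TF B's regions fall among TF A's regions.
        return comb(n_a, k) * comb(n_regions - n_a, n_b - k) / total

    p_enrich = sum(pmf(k) for k in range(n_both, hi + 1))
    p_deplete = sum(pmf(k) for k in range(lo, n_both + 1))
    return p_enrich, p_deplete


# Example: 100 accessible regions, each TF binds 20 of them.
# A small p_enrich would suggest co-binding; a small p_deplete would suggest aversion.
p_enrich, p_deplete = overlap_tails(100, 20, 20, 15)
```

In a genome-wide setting, p-values from many TF pairs would additionally need multiple-testing correction, which this sketch omits.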

  • C Blatti, M Kazemian, S Celniker, M Brodsky, S Sinha, "Mapping the cis-regulatory landscape of early embryonic development in Drosophila with hundreds of TFs", 2013.

While ModENCODE data enables the genome-wide annotation of potential regulatory elements in Drosophila, it does not generally provide their specific spatial-temporal activity pattern nor identify which transcription factors (TFs) and DNA binding sites drive those patterns. We developed a strategy to produce this type of comprehensive description of the cis-regulatory landscape by modeling TF occupancy from the binding specificities (motifs) for >300 TFs and by examining sets of genes expressed in ~200 distinct early embryonic expression domains annotated in the BDGP in situ image database. First, we predicted each TF's genome-wide binding profile using an HMM-based motif-scanning method and stage-specific DNA accessibility data. Comparison of these profiles to data from 60 ChIP experiments revealed a high degree of agreement (average correlation coefficient >0.6). Next, for each gene set from the ~200 expression domains, we searched for enrichments of predicted TF binding within the regulatory regions. This procedure generated a compendium of >5000 significant associations between TFs and expression terms, with 21% supported by the TF having the associated or a related expression pattern. For this analysis, we identified TFs and expression terms with systematic biases for regulatory regions that are gene-proximal or distal. Finally, we annotated candidate enhancers, defined as stage-specific open chromatin regions, for the likely expression pattern they drive. To predict a specific pattern from regulatory sequence, we fit a regression model incorporating information from TF binding profiles, TF expression, and our functional associations. Our model accurately recovered REDfly enhancers for 18 separate expression domains. By leveraging available comprehensive sets of TF binding specificities and gene expression patterns, we are able to systematically describe embryonic development in terms of TFs and their target regulatory sequences.
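
The comparison between predicted binding profiles and ChIP data can be illustrated with a correlation over genomic windows. The sketch below assumes a Pearson correlation and equal-length lists of per-window scores; the abstract does not specify the type of correlation or the windowing, and the function name `pearson` and the example values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of per-window scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Made-up example: predicted occupancy vs. ChIP signal in five genomic windows.
predicted = [0.1, 0.4, 0.9, 0.3, 0.05]
chip = [0.2, 0.5, 0.8, 0.2, 0.1]
r = pearson(predicted, chip)
```

Averaging such coefficients across TFs would give a summary of the kind reported above, under the stated assumptions.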

  • K Suryamohan, M Kazemian, J Chen, Y Zhang, M Halfon, S Sinha, "Leveraging a knowledge base of Drosophila cis-regulatory modules for regulatory element discovery in diverged insect species", 2013.

Although growing numbers of insect genomes are being sequenced, defining the sequences involved in transcriptional regulation within these genomes remains a challenge. Most effective methods for cis-regulatory module (CRM) discovery rely either on empirical assays or on computational models that depend on sequence alignment to closely related species and knowledge of CRMs or transcription factor binding sites for the organism being studied. The lack of well annotated databases for regulatory regions of DNA for insect species outside of the well-studied Drosophila genus makes such approaches intractable. We previously demonstrated success at computational CRM discovery in Drosophila using a supervised learning approach in which experimentally validated CRMs are used to train a CRM prediction algorithm, with as low as a 10% false-positive rate. We demonstrate here that these same Drosophila CRM training data can be leveraged to identify CRMs in diverged species such as the emerging model insects Nasonia vitripennis, Tribolium castaneum, Anopheles gambiae, and Apis mellifera. Examination of 16 predicted CRMs for regulatory activity in vivo in transgenic Drosophila showed positive regulatory activity in 12 of the 16 CRMs, with 75% clearly associated with the expected gene and about 50% regulating gene expression in the expected pattern. Our results indicate that the extensive experimental CRM data that exists for Drosophila can be used to facilitate CRM discovery in distant insect species with sequenced genomes but little functional data, and suggest that core regulatory strategies have been conserved despite the lack of any clear non-coding sequence alignment.
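
To make the idea of alignment-free, supervised CRM prediction concrete, the toy sketch below trains k-mer log-odds weights on known CRM sequences versus background sequences and scores a candidate sequence. This is not the prediction algorithm referred to in the abstract, which is not described here; it is a simple stand-in technique, and all function names, parameters, and sequences are hypothetical.

```python
from collections import Counter
from math import log


def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a list of DNA sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def train_log_odds(crm_seqs, background_seqs, k=3, pseudo=1.0):
    """Learn a log-odds weight per k-mer: training CRMs vs. background."""
    pos = kmer_counts(crm_seqs, k)
    neg = kmer_counts(background_seqs, k)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudo * len(vocab)
    neg_total = sum(neg.values()) + pseudo * len(vocab)
    return {
        w: log((pos[w] + pseudo) / pos_total) - log((neg[w] + pseudo) / neg_total)
        for w in vocab
    }


def score(seq, weights, k=3):
    """Average log-odds weight over the k-mers of a candidate sequence.

    k-mers unseen in training contribute 0 (no evidence either way).
    """
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(n)) / n


# Made-up training data: "CRMs" rich in GATA-like words, background rich in C.
weights = train_log_odds(["GATAGATAGATA"], ["CCCCCCCCCCCC"], k=3)
crm_like = score("GATAGATA", weights)
background_like = score("CCCCCCCC", weights)
```

Because the score depends only on k-mer content and not on alignment, a model of this kind trained in one species can in principle be applied to sequence windows from another, which is the property the abstract relies on.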