• Chauss et al, “Autocrine vitamin D signaling switches off pro-inflammatory programs of TH1 cells”, Nature Immunology, 2021.

The molecular mechanisms governing orderly shutdown and retraction of CD4+ type 1 helper T (TH1) cell responses remain poorly understood. Here we show that complement triggers contraction of TH1 responses by inducing intrinsic expression of the vitamin D (VitD) receptor and the VitD-activating enzyme CYP27B1, permitting T cells to both activate and respond to VitD. VitD then initiated the transition from pro-inflammatory interferon-γ+ TH1 cells to suppressive interleukin-10+ cells. This process was primed by dynamic changes in the epigenetic landscape of CD4+ T cells, generating super-enhancers and recruiting several transcription factors, notably c-JUN, STAT3 and BACH2, which together with VitD receptor shaped the transcriptional response to VitD. Accordingly, VitD did not induce interleukin-10 expression in cells with dysfunctional BACH2 or STAT3. Bronchoalveolar lavage fluid CD4+ T cells of patients with COVID-19 were TH1-skewed and showed de-repression of genes downregulated by VitD, resulting from lack of substrate (VitD deficiency) and/or abnormal regulation of this system.

  • Osman et al, “TCF-1 controls Treg cell functions that regulate inflammation, CD8+ T cell cytotoxicity and severity of colon cancer”, Nature Immunology, 2021.

The transcription factor TCF-1 is essential for the development and function of regulatory T (Treg) cells; however, its function is poorly understood. Here, we show that TCF-1 primarily suppresses transcription of genes that are co-bound by Foxp3. Single-cell RNA-sequencing analysis identified effector memory T cells and central memory Treg cells with differential expression of Klf2 and memory and activation markers. TCF-1 deficiency did not change the core Treg cell transcriptional signature, but promoted alternative signaling pathways whereby Treg cells became activated and gained gut-homing properties and characteristics of the TH17 subset of helper T cells. TCF-1-deficient Treg cells strongly suppressed T cell proliferation and cytotoxicity, but were compromised in controlling CD4+ T cell polarization and inflammation. In mice with polyposis, Treg cell–specific TCF-1 deficiency promoted tumor growth. Consistently, tumor-infiltrating Treg cells of patients with colorectal cancer showed lower TCF-1 expression and increased TH17 expression signatures compared to adjacent normal tissue and circulating T cells. Thus, Treg cell–specific TCF-1 expression differentially regulates TH17-mediated inflammation and T cell cytotoxicity, and can determine colorectal cancer outcome.


Paper selected for Cover

  • Canaria et al, “STAT5 Represses a STAT3-independent Th17-like program during Th9 cell differentiation”, Journal of Immunology, 2021.

IL-9–producing Th cells, termed Th9 cells, contribute to immunity against parasites and cancers but have detrimental roles in allergic disease and colitis. Th9 cells differentiate in response to IL-4 and TGF-β, but these signals are insufficient to drive Th9 differentiation in the absence of IL-2. IL-2–induced STAT5 activation is required for chromatin accessibility within Il9 enhancer and promoter regions and directly transactivates the Il9 locus. STAT5 also suppresses gene expression during Th9 cell development, but these roles are less well defined. In this study, we demonstrate that human allergy-associated Th9 cells exhibited a signature of STAT5-mediated gene repression that is associated with the silencing of a Th17-like transcriptional signature. In murine Th9 cell differentiation, blockade of IL-2/STAT5 signaling induced the expression of IL-17 and the Th17-associated transcription factor Rorγt. However, IL-2–deprived Th9 cells did not exhibit a significant Th17- or STAT3-associated transcriptional signature. Consistent with these observations, differentiation of IL-17–producing cells under these conditions was STAT3-independent but did require Rorγt and BATF. Furthermore, ectopic expression of Rorγt and BATF partially rescued IL-17 production in STAT3-deficient Th17 cells, highlighting the importance of these factors in this process. Although STAT3 was not required for the differentiation of IL-17–producing cells under IL-2–deprived Th9 conditions, their prolonged survival was STAT3-dependent, potentially explaining why STAT3-independent IL-17 production is not commonly observed in vivo. Together, our data suggest that IL-2/STAT5 signaling plays an important role in controlling the balance of a Th9 versus a Th17-like differentiation program in vitro and in allergic disease.


Paper selected for Cover

  • Yan et al, “Host-Virus Chimeric Events in SARS-CoV-2-Infected Cells Are Infrequent and Artifactual”, Journal of Virology, 2021.

The pathogenic mechanisms underlying severe SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infection remain largely unelucidated. High-throughput sequencing technologies that capture genome and transcriptome information are key approaches to gain detailed mechanistic insights from infected cells. These techniques readily detect both pathogen- and host-derived sequences, providing a means of studying host-pathogen interactions. Recent studies have reported the presence of host-virus chimeric (HVC) RNA in transcriptome sequencing (RNA-seq) data from SARS-CoV-2-infected cells and interpreted these findings as evidence of viral integration in the human genome as a potential pathogenic mechanism. Since SARS-CoV-2 is a positive-sense RNA virus that replicates in the cytoplasm, it does not have a nuclear phase in its life cycle. Thus, it is biologically unlikely to be in a location where splicing events could result in genome integration. Therefore, we investigated the biological authenticity of HVC events. In contrast to true biological events like mRNA splicing and genome rearrangement events, which generate reproducible chimeric sequencing fragments across different biological isolates, we found that HVC events across >100 RNA-seq libraries from patients with coronavirus disease 2019 (COVID-19) and infected cell lines were highly irreproducible. RNA-seq library preparation is inherently error prone due to random template switching during reverse transcription of RNA to cDNA. By counting chimeric events observed when constructing an RNA-seq library from human RNA and spiked-in RNA from an unrelated species, such as the fruit fly, we estimated that ∼1% of RNA-seq reads are artifactually chimeric. In SARS-CoV-2 RNA-seq, we found that the frequency of HVC events was, in fact, not greater than this background “noise.” Finally, we developed a novel experimental approach to enrich SARS-CoV-2 sequences from bulk RNA of infected cells. 
This method enriched viral sequences but did not enrich HVC events, suggesting that the majority of HVC events are, in all likelihood, artifacts of library construction. In conclusion, our findings indicate that HVC events observed in RNA-sequencing libraries from SARS-CoV-2-infected cells are extremely rare and are likely artifacts arising from random template switching of reverse transcriptase and/or sequence alignment errors. Therefore, the observed HVC events do not support SARS-CoV-2 fusion to cellular genes and/or integration into human genomes.

  • Yan et al, “SARS-CoV-2 drives JAK1/2-dependent local complement hyperactivation”, Science Immunology, 2021.

Patients with coronavirus disease 2019 (COVID-19) present a wide range of acute clinical manifestations affecting the lungs, liver, kidneys and gut. Angiotensin converting enzyme (ACE) 2, the best-characterized entry receptor for the disease-causing virus SARS-CoV-2, is highly expressed in the aforementioned tissues. However, the pathways that underlie the disease are still poorly understood. Here, we unexpectedly found that the complement system was one of the intracellular pathways most highly induced by SARS-CoV-2 infection in lung epithelial cells. Infection of respiratory epithelial cells with SARS-CoV-2 generated activated complement component C3a and could be blocked by a cell-permeable inhibitor of complement factor B (CFBi), indicating the presence of an inducible cell-intrinsic C3 convertase in respiratory epithelial cells. Within cells of the bronchoalveolar lavage of patients, distinct signatures of complement activation in myeloid, lymphoid and epithelial cells tracked with disease severity. Genes induced by SARS-CoV-2 and the drugs that could normalize these genes both implicated the interferon-JAK1/2-STAT1 signaling system and NF-κB as the main drivers of their expression. Ruxolitinib, a JAK1/2 inhibitor, normalized interferon signature genes and all complement gene transcripts induced by SARS-CoV-2 in lung epithelial cell lines, but did not affect NF-κB-regulated genes. Ruxolitinib, alone or in combination with the antiviral remdesivir, inhibited C3a protein produced by infected cells. Together, we postulate that combination therapy with JAK inhibitors and drugs that normalize NF-κB-signaling could potentially have clinical application for severe COVID-19.

  • Ebina-Shibuya et al, "Thymic stromal lymphopoietin limits primary and recall CD8+ T-cell anti-viral responses", eLife, 2021.

Thymic stromal lymphopoietin (TSLP) is a cytokine that acts directly on CD4+ T cells and dendritic cells to promote progression of asthma, atopic dermatitis, and allergic inflammation. However, a direct role for TSLP in CD8+ T-cell primary responses remains controversial and its role in memory CD8+ T cell responses to secondary viral infection is unknown. Here, we investigate the role of TSLP in both primary and recall responses in mice using two different viral systems. Interestingly, TSLP limited the primary CD8+ T-cell response to influenza but did not affect T cell function nor significantly alter the number of memory CD8+ T cells generated after influenza infection. However, TSLP inhibited memory CD8+ T-cell responses to secondary viral infection with influenza or acute systemic LCMV infection. These data reveal a previously unappreciated role for TSLP on recall CD8+ T-cell responses in response to viral infection, findings with potential translational implications.

  • Martinez-Fabregas et al, “CDK8 Fine-Tunes IL-6 Transcriptional Activities by Limiting STAT3 Resident Time at the Gene Loci”, Cell Reports, 2020.

Cytokines are highly pleiotropic ligands that regulate the immune response. Here, using interleukin-6 (IL-6) as a model system, we perform detailed phosphoproteomic and transcriptomic studies in human CD4+ T helper 1 (Th-1) cells to address the molecular bases defining cytokine functional pleiotropy. We identify CDK8 as a negative regulator of STAT3 transcriptional activities, which interacts with STAT3 upon IL-6 stimulation. Inhibition of CDK8 activity, using specific small molecule inhibitors, reduces the IL-6-induced phosphoproteome by 23% in Th-1 cells, including STAT3 S727 phosphorylation. STAT3 binding to target DNA sites in the genome is increased upon CDK8 inhibition, which results in a concomitant increase in STAT3-mediated transcriptional activity. Importantly, inhibition of CDK8 activity under Th-17 polarizing conditions results in an enhancement of Th-17 differentiation. Our results support a model where CDK8 regulates STAT3 transcriptional processivity by modulation of its gene loci resident time, critically contributing to diversification of IL-6 responses.

  • Wang et al, “Epstein-Barr Virus Episome Physically Interacts with Active Regions of the Host Genome in Lymphoblastoid Cells”, Journal of Virology, 2020.

The Epstein-Barr virus (EBV) episome is known to interact with the three-dimensional structure of the human genome in infected cells. However, the exact locations of these interactions and their potential functional consequences remain unclear. Recently, high-resolution chromatin conformation capture (Hi-C) assays in lymphoblastoid cells have become available, enabling us to precisely map the contacts between the EBV episome(s) and the human host genome. Using available Hi-C data at a 10-kb resolution, we have identified 15,000 reproducible contacts between EBV episome(s) and the human genome. These contacts are highly enriched in chromatin regions denoted by typical or super enhancers and active markers, including histone H3K27ac and H3K4me1. Additionally, these contacts are highly enriched at loci bound by host transcription factors that regulate B cell growth (e.g., IKZF1 and RUNX3), factors that enhance cell proliferation (e.g., HDGF), or factors that promote viral replication (e.g., NBS1 and NFIC). EBV contacts show nearly 2-fold enrichment in host regions bound by EBV nuclear antigen 2 (EBNA2) and EBNA3 transcription factors. Circular chromosome conformation capture followed by sequencing (4C-seq) using the EBV origin of plasmid replication (oriP) as a “bait” in lymphoblastoid cells further confirmed contacts with active chromatin regions. Collectively, our analysis supports interactions between EBV episome(s) and active regions of the human genome in lymphoblastoid cells.

  • Mani et al, “Restoration of RNA helicase DDX5 suppresses hepatitis B virus (HBV) biosynthesis and Wnt signaling in HBV-related hepatocellular carcinoma”, Theranostics, 2020.

RNA helicase DDX5 is downregulated during hepatitis B virus (HBV) replication and in poor-prognosis HBV-related hepatocellular carcinoma (HCC). The aim of this study is to determine the mechanism and significance of DDX5 downregulation for HBV-driven HCC, and to identify biologics that prevent DDX5 downregulation. Molecular approaches including immunoblotting, qRT-PCR, luciferase transfections, hepatosphere assays, Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq), and RNA-seq were used with cellular models of HBV replication, HBV infection, and HBV-related liver tumors, as well as bioinformatic analyses of liver cancer cells from two independent cohorts. We demonstrate that HBV infection induces expression of the proto-oncogenic miR17~92 and miR106b~25 clusters, which target DDX5 for downregulation. Increased expression of these miRNAs is also detected in HBV-driven HCCs exhibiting reduced DDX5 mRNA. Stable DDX5 knockdown (DDX5KD) in HBV-replicating hepatocytes increased viral replication and resulted in hepatosphere formation, drug resistance, Wnt activation, and pluripotency gene expression. ATAC-seq of DDX5KD compared to DDX5 wild-type (WT) cells identified accessible chromatin regions enriched in genes regulating Wnt signaling. RNA-seq analysis comparing WT versus DDX5KD cells identified enhanced expression of multiple genes involved in the Wnt pathway. Additionally, expression of Dishevelled (DVL1), a key regulator of Wnt pathway activation, was significantly higher in liver cancer cells with low DDX5 expression from two independent cohorts. Importantly, inhibitors (antagomirs) of miR17~92 and miR106b~25 restored DDX5 levels, reduced DVL1 expression, and suppressed both Wnt activation and viral replication. DDX5 is a negative regulator of Wnt signaling and hepatocyte reprogramming in HCCs. Restoration of DDX5 levels by miR17~92/miR106b~25 antagomirs in HBV-infected patients can be explored as both an antitumor and an antiviral strategy.

  • Park et al, “Granzyme A-producing T helper cells are critical for acute graft-versus-host disease”, JCI Insight, 2020.

Acute graft-versus-host disease (aGVHD) can occur after hematopoietic cell transplant in patients undergoing treatment for hematological malignancies or inborn errors. Although CD4+ T helper (Th) cells play a major role in aGVHD, the mechanisms by which they contribute, particularly within the intestines, have remained elusive. We have identified a potentially novel subset of Th cells that accumulated in the intestines and produced the serine protease granzyme A (GrA). GrA+ Th cells were distinct from other Th lineages and exhibited a noncytolytic phenotype. In vitro, GrA+ Th cells differentiated in the presence of IL-4, IL-6, and IL-21 and were transcriptionally unique from cells cultured with either IL-4 or the IL-6/IL-21 combination alone. In vivo, both STAT3 and STAT6 were required for GrA+ Th cell differentiation and played roles in maintenance of the lineage identity. Importantly, GrA+ Th cells promoted aGVHD-associated morbidity and mortality and contributed to crypt destruction within intestines but were not required for the beneficial graft-versus-leukemia effect. Our data indicate that GrA+ Th cells represent a distinct Th subset and are critical mediators of aGVHD.

  • Kolev et al, "Diapedesis mediates protective tissue Th1 immunity via complement C3-licensing", Immunity, 2020.

Intrinsic complement C3 activity is integral to human T helper type 1 (Th1) and cytotoxic T cell responses. Increased or decreased intracellular C3 results in autoimmunity and infections, respectively. The mechanisms regulating intracellular C3 expression remain undefined. We identified complement, including C3, as among the most significantly enriched biological pathways in tissue-occupying cells. We generated C3-reporter mice and confirmed that C3 expression was a defining feature of tissue-immune cells, including T cells and monocytes, occurred during transendothelial diapedesis, and depended on integrin lymphocyte-function-associated antigen 1 (LFA-1) signals. Immune cells from patients with leukocyte adhesion deficiency type 1 (LAD-1) had reduced C3 transcripts and diminished effector activities, which could be rescued proportionally by intracellular C3 provision. Conversely, increased C3 expression by T cells from arthritis patients correlated with disease severity. Our study defines integrins as key controllers of intracellular complement, demonstrates that perturbations in the LFA-1–C3 axis contribute to primary immunodeficiency, and identifies intracellular C3 as a biomarker of severity in autoimmunity.

  • Ren et al, “Transcription factor p73 regulates Th1 differentiation”, Nature Communications, 2020.

Inter-individual differences in T helper (Th) cell responses affect susceptibility to infectious, allergic and autoimmune diseases. To identify factors contributing to these response differences, here we analyze in vitro differentiated Th1 cells from 16 inbred mouse strains. Haplotype-based computational genetic analysis indicates that the p53 family protein, p73, affects Th1 differentiation. In cells differentiated under Th1 conditions in vitro, p73 negatively regulates IFNγ production. p73 binds within, or upstream of, and modulates the expression of Th1 differentiation-related genes such as Ifng and Il12rb2. Furthermore, in mouse experimental autoimmune encephalitis, p73-deficient mice have increased IFNγ production and less disease severity, whereas in an adoptive transfer model of inflammatory bowel disease, transfer of p73-deficient naïve CD4+ T cells increases Th1 responses and augments disease severity. Our results thus identify p73 as a negative regulator of the Th1 immune response, suggesting that p73 dysregulation may contribute to susceptibility to autoimmune disease.

  • Martinez-Fabregas et al, "Kinetics of cytokine receptor trafficking determine signaling and functional selectivity", eLife, 2019.

Cytokines activate downstream signaling networks via assembly of cell surface receptors, but it is unclear whether modulation of cytokine-receptor binding parameters can modify biological outcomes. We have engineered variants of IL-6 with different affinities to the gp130 receptor chain to investigate how cytokine receptor binding kinetics influence functional selectivity. Engineered IL-6 variants showed a range of signaling amplitudes, from minimal to full agonist, and induced biased signaling, with changes in receptor binding kinetics affecting STAT1 phosphorylation more profoundly than STAT3 phosphorylation. We show that this differential signaling arises from defective translocation of ligand-gp130 complexes to the endosomal compartment and competitive STAT1/STAT3 binding to phospho-tyrosines in gp130, and results in unique patterns of STAT3 binding to chromatin. This, in turn, leads to a graded gene expression response and substantial differences in ex vivo differentiation of Th17, Th1 and Treg cells. These results provide a molecular understanding of signaling biased by cytokine receptors, and demonstrate that manipulation of signaling thresholds is a useful strategy to decouple cytokine functional pleiotropy.

  • Chakravorty et al, "Integrated Pan-Cancer Map of EBV-Associated Neoplasms Reveals Functional Host–Virus Interactions", Cancer Research, 2019.

Epstein–Barr virus (EBV) is a complex oncogenic symbiont. The molecular mechanisms governing EBV carcinogenesis remain elusive and the functional interactions between virus and host cells are incompletely defined. Here we present a comprehensive map of the host cell–pathogen interactome in EBV-associated cancers. We systematically analyzed RNA sequencing from >1,000 patients with 15 different cancer types, comparing virus and host factors of EBV+ to EBV− tissues. EBV preferentially integrated at highly accessible regions of the cancer genome, with significant enrichment in super-enhancer architecture. Twelve EBV transcripts, including LMP1 and LMP2, correlated inversely with an EBV reactivation signature. Overexpression of these genes significantly suppressed viral reactivation, consistent with a “virostatic” function. In cancer samples, hundreds of novel frequent missense and nonsense variations in virostatic genes were identified, and variant genes failed to regulate their viral and cellular targets in cancer. For example, one-third of patients with EBV+ NK/T-cell lymphoma carried two novel nonsense variants (Q322X, G342X) of LMP1 and both variant proteins failed to restrict viral reactivation, confirming loss of virostatic function. Host cell transcriptional changes in response to EBV infection classified tumors into two molecular subtypes based on patterns of IFN signature genes and immune checkpoint markers, such as PD-L1 and IDO1. Overall, these findings uncover novel points of interaction between a common oncovirus and the human genome and identify novel regulatory nodes and druggable targets for individualized EBV and cancer-specific therapies.

  • Spolski et al, "IL-21/type I interferon interplay regulates neutrophil-dependent innate immune responses to Staphylococcus aureus", eLife, 2019.

Methicillin-resistant Staphylococcus aureus (MRSA) is a major hospital- and community-acquired pathogen, but the mechanisms underlying host defense to MRSA remain poorly understood. Here, we investigated the role of IL-21 in this process. When administered intra-tracheally into wild-type mice, IL-21 induced granzymes and augmented clearance of pulmonary MRSA but not when neutrophils were depleted or a granzyme B inhibitor was added. Correspondingly, IL-21 induced MRSA killing by human peripheral blood neutrophils. Unexpectedly, however, basal MRSA clearance was also enhanced when IL-21 signaling was blocked, both in Il21r KO mice and in wild-type mice injected with IL-21R-Fc fusion-protein. This correlated with increased type I interferon and an IFN-related gene signature, and indeed anti-IFNAR1 treatment diminished MRSA clearance in these animals. Moreover, we found that IFNβ induced granzyme B and promoted MRSA clearance in a granzyme B-dependent fashion. These results reveal an interplay between IL-21 and type I IFN in the innate immune response to MRSA.

  • Povoleri et al, “Retinoic acid-regulated CD161+ Tregs support wound repair in intestinal mucosa”, Nature Immunology, 2018.

Repair of tissue damaged during inflammatory processes is key to the return of local homeostasis and restoration of epithelial integrity. Here we describe CD161+ regulatory T (Treg) cells as a distinct, highly suppressive population of Treg cells that mediate wound healing. These Treg cells were enriched in intestinal lamina propria, particularly in Crohn’s disease. CD161+ Treg cells had an all-trans retinoic acid (ATRA)-regulated gene signature, and CD161 expression on Treg cells was induced by ATRA, which directly regulated the CD161 gene. CD161 was co-stimulatory, and ligation with the T cell antigen receptor induced cytokines that accelerated the wound healing of intestinal epithelial cells. We identified a transcription-factor network, including BACH2, RORγt, FOSL2, AP-1 and RUNX1, that controlled expression of the wound-healing program, and found a CD161+ Treg cell signature in Crohn’s disease mucosa associated with reduced inflammation. These findings identify CD161+ Treg cells as a population involved in controlling the balance between inflammation and epithelial barrier healing in the gut.

  • Lin et al, “Critical roles for STAT5 tetramers in the maturation and survival of natural killer cells”, Nature Communications, 2017.

Interleukin-15 (IL-15) is essential for the development and maintenance of natural killer (NK) cells. IL-15 activates STAT5 proteins, which can form dimers or tetramers. We previously found that NK cell numbers are decreased in Stat5aStat5b tetramer-deficient double knockin (DKI) mice, but the mechanism was not investigated. Here we show that STAT5 dimers are sufficient for NK cell development, whereas STAT5 tetramers mediate NK cell maturation and the expression of maturation-associated genes. Unlike the defective proliferation of Stat5 DKI CD8+ T cells, Stat5 DKI NK cells have normal proliferation to IL-15 but are susceptible to death upon cytokine withdrawal, with lower Bcl2 and increased active caspases. These findings underscore the importance of STAT5 tetramers in maintaining NK cell homoeostasis. Moreover, defective STAT5 tetramer formation could represent a cause of NK cell immunodeficiency, and interrupting STAT5 tetramer formation might serve to control NK leukaemia.

  • Afzali et al, “BACH2 immunodeficiency illustrates an association between super-enhancers and haploinsufficiency”, Nature Immunology, 2017.

The transcriptional programs that guide lymphocyte differentiation depend on the precise expression and timing of transcription factors (TFs). The TF BACH2 is essential for T and B lymphocytes and is associated with an archetypal super-enhancer (SE). Single-nucleotide variants in the BACH2 locus are associated with several autoimmune diseases, but BACH2 mutations that cause Mendelian monogenic primary immunodeficiency have not previously been identified. Here we describe a syndrome of BACH2-related immunodeficiency and autoimmunity (BRIDA) that results from BACH2 haploinsufficiency. Affected subjects had lymphocyte-maturation defects that caused immunoglobulin deficiency and intestinal inflammation. The mutations disrupted protein stability by interfering with homodimerization or by causing aggregation. We observed analogous lymphocyte defects in Bach2-heterozygous mice. More generally, we observed that genes that cause monogenic haploinsufficient diseases were substantially enriched for TFs and SE architecture. These findings reveal a previously unrecognized feature of SE architecture in Mendelian diseases of immunity: heterozygous mutations in SE-regulated genes identified by whole-exome/genome sequencing may have greater significance than previously recognized.

  • West et al, “TSLP acts on neutrophils to drive complement-mediated killing of methicillin-resistant Staphylococcus aureus”, Science Immunology, 2016.

Community-acquired Staphylococcus aureus infections often present as serious skin infections in otherwise healthy individuals and have become a worldwide epidemic problem fueled by the emergence of strains with antibiotic resistance, such as methicillin-resistant S. aureus (MRSA). The cytokine thymic stromal lymphopoietin (TSLP) is highly expressed in the skin and in other barrier surfaces and plays a deleterious role by promoting T helper cell type 2 (TH2) responses during allergic diseases; however, its role in host defense against bacterial infections has not been well elucidated. We describe a previously unrecognized non-TH2 role for TSLP in enhancing neutrophil killing of MRSA during an in vivo skin infection. Specifically, we demonstrate that TSLP acts directly on both mouse and human neutrophils to augment control of MRSA. Additionally, we show that TSLP also enhances killing of Streptococcus pyogenes, another clinically important cause of human skin infections. Unexpectedly, TSLP mechanistically mediates its antibacterial effect by directly engaging the complement C5 system to modulate production of reactive oxygen species by neutrophils. Thus, TSLP increases MRSA killing in a neutrophil- and complement-dependent manner, revealing a key connection between TSLP and the innate complement system, with potentially important therapeutic implications for control of MRSA infection.

  • Sun et al, “ZSCAN5B and primate-specific paralogs bind RNA polymerase III genes and extra-TFIIIC (ETC) sites to modulate mitotic progression”, Oncotarget, 2016.

Mammalian genomes contain hundreds of genes transcribed by RNA Polymerase III (Pol III), encoding noncoding RNAs and especially the tRNAs specialized to carry specific amino acids to the ribosome for protein synthesis. In addition to this well-known function, tRNAs and their genes (tDNAs) serve a variety of other critical cellular functions. For example, tRNAs and other Pol III transcripts can be cleaved to yield small RNAs with potent regulatory activities. Furthermore, from yeast to mammals, active tDNAs and related “extra-TFIIIC” (ETC) loci provide the DNA scaffolds for the most ancient known mechanism of three-dimensional chromatin architecture. Here we identify the ZSCAN5 TF family, including mammalian ZSCAN5B and its primate-specific paralogs, as proteins that occupy mammalian Pol III promoters and ETC sites. We show that ZSCAN5B binds with high specificity to a conserved subset of Pol III genes in human and mouse. Furthermore, primate-specific ZSCAN5A and ZSCAN5D also bind Pol III genes, although ZSCAN5D preferentially localizes to MIR SINE- and LINE2-associated ETC sites. ZSCAN5 genes are expressed in proliferating cell populations and are cell-cycle regulated, and siRNA knockdown experiments suggested a cooperative role in regulation of mitotic progression. Consistent with this prediction, ZSCAN5A knockdown led to increasing numbers of cells in mitosis and the appearance of cells. Together, these data implicate ZSCAN5 genes in the regulation of Pol III genes and nearby Pol II loci, ultimately influencing cell cycle progression and differentiation in a variety of tissues.

  • Kazemian et al, “Comprehensive assembly of novel transcripts from unmapped human RNA-Sequencing data and their association with cancer”, Molecular Systems Biology, 2015.

Crucial parts of the genome including genes encoding microRNAs and noncoding RNAs went unnoticed for years, and even now, despite extensive annotation and assembly of the human genome, RNA‐sequencing continues to yield millions of unmappable and thus uncharacterized reads. Here, we examined > 300 billion reads from 536 normal donors and 1,873 patients encompassing 21 cancer types, identified ~300 million such uncharacterized reads, and using a distinctive approach de novo assembled 2,550 novel human transcripts, which mainly represent long noncoding RNAs. Of these, 230 exhibited relatively specific expression or non‐expression in certain cancer types, making them potential markers for those cancers, whereas 183 exhibited tissue specificity. Moreover, we used lentiviral‐mediated expression of three selected transcripts that had higher expression in normal donors than in cancer patients and found that each inhibited the growth of HepG2 cells. Our analysis provides a comprehensive and unbiased resource of unmapped human transcripts and reveals their associations with specific cancers, providing potentially important new genes for therapeutic targeting.

  • Wan et al, “Opposing Roles of STAT1 and STAT3 in IL-21 Function in CD4+ T cells”, PNAS, 2015.

IL-21 is a type I cytokine important for immune cell differentiation and function. We found that the transcription factors STAT1 and STAT3 play partially opposing roles in IL-21 function in CD4+ T cells. Both STAT1 and STAT3 control IL-21-mediated gene regulation, with some genes, including Ifng, Tbx21, and Il21, reciprocally regulated by these STATs. IFN-γ production was also differentially regulated by these STATs in vitro during CD4+ T cell differentiation and in vivo during chronic lymphocytic choriomeningitis virus infection. Importantly, IL-21-induced IFNG and TBX21 expression was higher in CD4+ T cells from patients with autosomal dominant hyper-IgE syndrome or with STAT1 gain-of-function mutations, suggesting that dysregulated IL-21-STAT signaling partially explains the clinical manifestations of these patients.

  • Kazemian et al, “Possible HPV38 contamination of endometrial cancer RNA-Seq samples in The Cancer Genome Atlas database”, J. Virology, 2015.
Viruses are causally associated with a number of human malignancies. In this study, we sought to identify new viral-cancer associations by searching RNA-Sequencing datasets from >2000 patients, encompassing 21 cancers from The Cancer Genome Atlas (TCGA), for the presence of viral sequences. In agreement with previous studies, we found human papillomavirus type 16 (HPV16) and HPV18 in oropharyngeal cancer and hepatitis B and C viruses in liver cancer. Unexpectedly, however, we found HPV38, a cutaneous form of HPV associated with skin cancer, in 32 of 168 samples with endometrial cancer. In 12 of the HPV38+ samples, we observed at least one paired read that mapped to both human and HPV38 genomes, indicative of viral integration into host DNA, something not previously demonstrated for HPV38. The expression levels of HPV38 transcripts were relatively low, and all 32 HPV38+ samples belonged to the same experimental batch of 40 samples, whereas none of the other 128 endometrial carcinoma samples were HPV38+, raising doubts about the significance of the HPV38 association. Moreover, the HPV38+ samples contained the same 10 novel single nucleotide variations (SNVs), leading us to hypothesize that one patient was infected with this new isolate of HPV38, which was integrated into his/her genome and may have cross-contaminated other TCGA samples within batch #228. Based on our analysis, we propose guidelines to examine batch effect, virus expression level, and SNVs as part of NGS data analysis for evaluating the significance of viral/pathogen sequences in clinical samples.

Breakthrough paper ...

  • Blatti et al, “Integrating motif, DNA accessibility, and gene expression data to build regulatory maps in an organism”, Nucleic Acids Research, 2015.
Characterization of cell-type-specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF) to DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult or expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely only on gene expression data, low-resolution chromatin accessibility data and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ∼200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain-specific enhancers genome-wide. Throughout this work, we apply our motif- and accessibility-based approach to comprehensively characterize the regulatory network of fruit fly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.
  • Kazemian et al, “Evidence for deep regulatory similarities in early developmental programs across highly diverged insects”, Genome Biology and Evolution, 2014.

Many genes familiar from Drosophila development, such as the so-called gap, pair-rule, and segment polarity genes, play important roles in the development of other insects and in many cases appear to be deployed in a similar fashion, despite the fact that Drosophila-like “long germband” development is highly derived and confined to a subset of insect families. Whether or not these similarities extend to the regulatory level is unknown. Identification of regulatory regions beyond the well-studied Drosophila has been challenging as even within the Diptera (flies, including mosquitoes) regulatory sequences have diverged past the point of recognition by standard alignment methods. Here, we demonstrate that methods we previously developed for computational cis-regulatory module (CRM) discovery in Drosophila can be used effectively in highly diverged (250–350 Myr) insect species including Anopheles gambiae, Tribolium castaneum, Apis mellifera, and Nasonia vitripennis. In Drosophila, we have successfully used small sets of known CRMs as “training data” to guide the search for other CRMs with related function. We show here that although species-specific CRM training data do not exist, training sets from Drosophila can facilitate CRM discovery in diverged insects. We validate in vivo over a dozen new CRMs, roughly doubling the number of known CRMs in the four non-Drosophila species. Given the growing wealth of Drosophila CRM annotation, these results suggest that extensive regulatory sequence annotation will be possible in newly sequenced insects without recourse to costly and labor-intensive genome-scale experiments. We develop a new method, Regulus, which computes a probabilistic score of similarity based on binding site composition (despite the absence of nucleotide-level sequence alignment), and demonstrate similarity between functionally related CRMs from orthologous loci. 
Our work represents an important step toward being able to trace the evolutionary history of gene regulatory networks and defining the mechanisms underlying insect evolution.

  • Duque et al, “Simulations of enhancer evolution provide mechanistic insights into gene regulation”, Mol Biol Evol, 2013.

There is growing interest in models of regulatory sequence evolution. However, existing models specifically designed for regulatory sequences consider the independent evolution of individual transcription factor (TF) binding sites, ignoring that the function and evolution of a binding site depends on its context, typically the cis-regulatory module (CRM) in which the site is located. Moreover, existing models do not account for the gene-specific roles of TF-binding sites, primarily because their roles often are not well-understood. We introduce two models of regulatory sequence evolution that address some of the shortcomings of existing models and implement simulation frameworks based on them. One model simulates the evolution of an individual binding site in the context of a CRM, while the other evolves an entire CRM. Both models use a state-of-the-art sequence-to-expression model to predict the effects of mutations on the regulatory output of the CRM and determine the strength of selection. We use the new framework to simulate the evolution of TF-binding sites in 37 well-studied CRMs belonging to the anterior-posterior patterning system in Drosophila embryos. We show that these simulations provide accurate fits to evolutionary data from 12 Drosophila genomes, which include statistics of binding site conservation on relatively short evolutionary scales and of site loss across larger divergence times. The new framework allows us, for the first time, to test hypotheses regarding the underlying cis-regulatory code by directly comparing the evolutionary implications of the hypothesis to observed evolutionary dynamics of binding sites. Using this capability, we find that explicitly modeling self-cooperative DNA-binding by the TF Caudal (CAD) provides significantly better fits than an otherwise identical evolutionary simulation that lacks this mechanistic aspect. 
This hypothesis is further supported by a statistical analysis of the distribution of inter-site spacing between adjacent CAD sites. Experimental tests confirm direct homodimeric interaction between CAD molecules as well as self-cooperative DNA-binding by CAD. We note that computational modeling of the D. melanogaster CRMs alone did not yield significant evidence to support CAD self-cooperativity. We thus demonstrate how specific mechanistic details encoded in CRMs can be revealed by modeling their evolution and fitting such models to multi-species data.

 Server ... 

  • Kazemian et al, “Widespread and distinct sequence signatures of combinatorial transcriptional regulation”, Nucleic Acids Research, 2013.

Regulation of eukaryotic gene transcription is often combinatorial in nature, with multiple transcription factors (TFs) regulating common target genes, often through direct or indirect mutual interactions. Many individual examples of cooperative binding by directly interacting TFs have been identified, but it remains unclear how pervasive this mechanism is during animal development. Cooperative TF binding should be manifest in genomic sequences as biased arrangements of TF-binding sites. Here, we explore the extent and diversity of such arrangements related to gene regulation during Drosophila embryogenesis. We used the DNA-binding specificities of 322 TFs along with chromatin accessibility information to identify enriched spacing and orientation patterns of TF-binding site pairs. We developed a new statistical approach for this task, specifically designed to accurately assess inter-site spacing biases while accounting for the phenomenon of homotypic site clustering commonly observed in developmental regulatory regions. We observed a large number of short-range distance preferences between TF-binding site pairs, including examples where the preference depends on the relative orientation of the binding sites. To test whether these binding site patterns reflect physical interactions between the corresponding TFs, we analyzed 27 TF pairs whose binding sites exhibited short distance preferences. In vitro protein-protein binding experiments revealed that >65% of these TF pairs can directly interact with each other. For five pairs, we further demonstrate that they bind cooperatively to DNA if both sites are present with the preferred spacing. This study demonstrates how DNA-binding motifs can be used to produce a comprehensive map of sequence signatures for different mechanisms of combinatorial TF action.

Available source code for iTF:

iTFs_v1.0_Mac.tar.gz (608.243 Kb)
iTFs_v1.0_Linux.tar.gz (614.78 Kb)
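The spacing analysis described in the abstract above starts from a simple tally: for the predicted sites of two TFs, count how often they co-occur at each short distance and relative orientation. A minimal Python sketch follows, assuming site calls are already available as hypothetical `(start, strand)` tuples; it does not reproduce the paper's statistical null model for homotypic site clustering.

```python
from collections import Counter

def spacing_histogram(sites_a, sites_b, max_gap=30):
    """Tally (gap, orientation) for pairs of TF A and TF B sites on one sequence.

    sites_a, sites_b: hypothetical lists of (start, strand) tuples, strand '+' or '-'.
    gap = start_b - start_a; only pairs with 0 < gap <= max_gap are counted.
    """
    counts = Counter()
    for start_a, strand_a in sites_a:
        for start_b, strand_b in sites_b:
            gap = start_b - start_a
            if 0 < gap <= max_gap:
                orientation = "same" if strand_a == strand_b else "opposite"
                counts[(gap, orientation)] += 1
    return counts

hist = spacing_histogram([(100, "+"), (200, "-")], [(110, "+"), (205, "+"), (400, "+")])
print(dict(hist))  # {(10, 'same'): 1, (5, 'opposite'): 1}
```

A spacing preference would then be judged against a background tally (for example, over control sequences); choosing that background fairly is the harder problem the paper's statistical approach addresses.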
  • Cheng et al, “Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy”, PLoS Genetics, 2013.

ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called “STAP,” to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed (“primary”) TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidate interacting TFs were chosen based on RNA-seq expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms. 
Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA.
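The cooperative, antagonistic and competitive effects described above can be illustrated with the textbook two-site equilibrium model. This is a hedged sketch of that general idea, not STAP itself. Assumed: one primary site A and one secondary site B, statistical weights `a` and `b` (association constant times TF concentration), and an interaction factor `omega` (>1 cooperative, <1 antagonistic, 0 for mutually exclusive, overlapping sites). The four states (empty, A only, B only, both) give the partition function Z = 1 + a + b + ωab, and the occupancy of A is (a + ωab)/Z.

```python
def primary_occupancy(a, b=0.0, omega=1.0):
    """Probability that primary site A is bound in a two-site equilibrium model.

    a, b: statistical weights of the primary and secondary TF (K * [TF]).
    omega: interaction factor applied when both TFs are bound together.
    """
    z = 1.0 + a + b + omega * a * b  # partition function over the four states
    return (a + omega * a * b) / z

print(primary_occupancy(1.0))                     # 0.5, no secondary TF
print(primary_occupancy(1.0, b=1.0, omega=1.0))   # 0.5, independent binding
print(primary_occupancy(1.0, b=1.0, omega=10.0))  # ~0.846, cooperative
print(primary_occupancy(1.0, b=1.0, omega=0.0))   # ~0.333, competitive
```

With `omega=1` the secondary TF leaves the primary occupancy unchanged, which is why a fitted interaction term is needed to detect either effect.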

  • Enuameh et al, “Global analysis of Drosophila Cys2-His2 zinc finger proteins reveals a multitude of novel recognition motifs and binding determinants”, Genome Research, 2013.

Cys2-His2 zinc finger proteins (ZFPs) are the largest group of transcription factors in higher metazoans. A complete characterization of these ZFPs and their associated target sequences is pivotal to fully annotate transcriptional regulatory networks in metazoan genomes. As a first step in this process, we have characterized the DNA-binding specificities of 129 zinc finger sets from Drosophila using a bacterial one-hybrid system. This data set contains the DNA-binding specificities for at least one encoded ZFP from 70 unique genes and 23 alternate splice isoforms, representing the largest set of characterized ZFPs from any organism described to date. These recognition motifs can be used to predict genomic binding sites for these factors within the fruit fly genome. Subsets of fingers from these ZFPs were characterized to define their orientation and register on their recognition sequences, thereby allowing us to define the recognition diversity within this finger set. We find that the characterized fingers can specify 47 of the 64 possible DNA triplets. To confirm the utility of our finger recognition models, we employed subsets of Drosophila fingers in combination with an existing archive of artificial zinc finger modules to create ZFPs with novel DNA-binding specificity. These hybrids of natural and artificial fingers can be used to create functional zinc finger nucleases for editing vertebrate genomes.

  • Shahinfar et al, “Prediction of Breeding Values for Dairy Cattle Using Artificial Neural Networks and Neuro-Fuzzy Systems”, Computational and Mathematical Methods in Medicine, 2012.

Developing machine learning and soft computing techniques has provided many opportunities for researchers to establish new analytical methods in different areas of science. The objective of this study is to investigate the potential of two types of intelligent learning methods, artificial neural networks (ANN) and neuro-fuzzy systems (NFS), to estimate breeding values (EBV) of Iranian dairy cattle. Initially, the breeding values of lactating Holstein cows for milk and fat yield were estimated using conventional best linear unbiased prediction (BLUP) with an animal model. Once that was established, a multilayer perceptron was used to build an ANN to predict breeding values from the performance data of selection candidates. Subsequently, fuzzy logic was used to form an NFS, a hybrid intelligent system that was implemented via a local linear model tree algorithm. For milk yield, the correlations between EBV and EBV predicted by the ANN and NFS were 0.92 and 0.93, respectively. Corresponding correlations for fat yield were 0.93 and 0.93, respectively. When EBVs for milk and fat yield were predicted simultaneously (multitrait prediction), correlations with the reference EBV were 0.93 and 0.93 for the ANN and 0.94 and 0.95 for the NFS, respectively, for milk and fat production.

  • Kazemian et al, “Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison”, Nucl. Acids Res., 2011.

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, 'enhancers'), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for 'motif-blind' CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to 'supervise' the search. We propose a new statistical method, based on 'Interpolated Markov Models', for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. 

Available source code for the enhancer prediction methods and the Loci Length-aware Hypergeometric Test:

LLHT.tar.tar.gz (36.912 Kb)
HexMCD.zip (64.066 Kb)
IMM.zip (285.467 Kb)
PAC-rc.zip (876.802 Kb)
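The Loci Length-aware Hypergeometric Test described above extends the standard hypergeometric test of gene set enrichment. For reference, here is a minimal sketch of that standard, length-unaware upper-tail test in Python; the length correction itself is not reproduced, and the parameter names are illustrative.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the genome; K: genes with the expected expression pattern;
    n: genes neighboring predicted CRMs; k: observed overlap of the two sets.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

print(hypergeom_upper_tail(10, 5, 5, 5))  # 1/252, about 0.00397
```

This version treats every gene equally, so it ignores the variability in intergenic lengths that, according to the abstract, LLHT accounts for.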
  • Kazemian et al, “Genome surveyor 2.0: cis-regulatory analysis in Drosophila”, Nucl. Acids Res., 2011.
Genome Surveyor 2.0 is a web-based tool for discovery and analysis of cis-regulatory elements in Drosophila, built on top of the GBrowse genome browser for convenient visualization. Genome Surveyor was developed as a tool for predicting transcription factor (TF) binding targets and cis-regulatory modules (CRMs/enhancers), based on motifs representing experimentally determined DNA binding specificities. Since its first publication, we have added substantial new functionality (e.g. phylogenetic averaging of motif scores from multiple species, and a novel CRM discovery technique), increased the number of supported motifs about 4-fold (from ∼100 to ∼400), added provisions for evolutionary comparison across many more Drosophila species (from 2 to 12), and improved the user-interface. The server is free and open to all users, and there is no login requirement. Address: http://veda.cs.uiuc.edu/gs.
  • Zhu et al, “FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system”, Nucl. Acids Res., 2010.

FlyFactorSurvey (http://pgfe.umassmed.edu/TFDBS/) is a database of DNA binding specificities for Drosophila transcription factors (TFs) primarily determined using the bacterial one-hybrid system. The database provides community access to over 400 recognition motifs and position weight matrices for over 200 TFs, including many unpublished motifs. Search tools and flat file downloads are provided to retrieve binding site information (as sequences, matrices and sequence logos) for individual TFs, groups of TFs or for all TFs with characterized binding specificities. Linked analysis tools allow users to identify motifs within our database that share similarity to a query matrix or to view the distribution of occurrences of an individual motif throughout the Drosophila genome. Together, this database and its associated tools provide computational and experimental biologists with resources to predict interactions between Drosophila TFs and target cis-regulatory sequences.
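Position weight matrices such as those distributed by FlyFactorSurvey are typically used to score candidate binding sites by log-odds against a background model. A minimal Python sketch, assuming a position frequency matrix given as per-position base counts, a uniform 0.25 background and a pseudocount of 0.5 (all illustrative choices, not the database's own conventions); only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm

def scan_forward(seq, pwm):
    """Score every window of len(pwm) bases on the forward strand of an uppercase sequence."""
    width = len(pwm)
    return [sum(pwm[j][seq[i + j]] for j in range(width)) for i in range(len(seq) - width + 1)]

# Toy two-position motif "GA" (counts are illustrative only).
counts = [{"A": 0, "C": 0, "G": 10, "T": 0}, {"A": 10, "C": 0, "G": 0, "T": 0}]
scores = scan_forward("TGAC", log_odds_matrix(counts))
print(scores.index(max(scores)))  # 1, the window "GA"
```

A genome-wide scan would also score the reverse complement and apply a score threshold; both are omitted here for brevity.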

  • Kazemian et al, “Quantitative analysis of the Drosophila segmentation regulatory network using pattern generating potentials”, PLoS Biology, 2010.

Cis-regulatory modules that drive precise spatial-temporal patterns of gene expression are central to the process of metazoan development. We describe a new computational strategy to annotate genomic sequences based on their "pattern generating potential" and to produce quantitative descriptions of transcriptional regulatory networks at the level of individual protein-module interactions. We use this approach to convert the qualitative understanding of interactions that regulate Drosophila segmentation into a network model in which a confidence value is associated with each transcription factor-module interaction. Sequence information from multiple Drosophila species is integrated with transcription factor binding specificities to determine conserved binding site frequencies across the genome. These binding site profiles are combined with transcription factor expression information to create a model to predict module activity patterns. This model is used to scan genomic sequences for the potential to generate all or part of the expression pattern of a nearby gene, obtained from available gene expression databases. Interactions between individual transcription factors and modules are inferred by a statistical method to quantify a factor's contribution to the module's pattern generating potential. We use these pattern generating potentials to systematically describe the location and function of known and novel cis-regulatory modules in the segmentation network, identifying many examples of modules predicted to have overlapping expression activities. Surprisingly, conserved transcription factor binding site frequencies were as effective as experimental measurements of occupancy in predicting module expression patterns or factor-module interactions. Thus, unlike previous module prediction methods, this method predicts not only the location of modules but also their spatial activity pattern and the factors that directly determine this pattern. 
As databases of transcription factor specificities and in vivo gene expression patterns grow, analysis of pattern generating potentials provides a general method to decode transcriptional regulatory sequences and networks.

  • Kazemian et al, “Using classifier fusion techniques for protein secondary structure prediction”, Int. J. Comput. Intelligence in Bioinformatics and Systems Biology, 2010.

Classifier fusion techniques are gaining popularity for their ability to improve on the accuracy achieved by individual classifiers. A common approach is to combine the classifiers' outputs using simple methods, such as majority voting. In this paper, we build a meta-classifier by fusing several well-known classifiers for protein secondary structure prediction. Each individual classifier outputs a unique structure for every input residue. We have used the confusion matrix of each protein secondary structure classifier, which is representative of the classifier's expertness, as a general reusable pattern for converting its simple class-label assignment to a class-preference score. The results obtained using several classifier fusion operators have been compared, on some standard datasets from the EVA server, with simple majority voting and with the results provided by the individual classifiers. The comparative analysis showed that the Choquet fuzzy integral operator gave the largest improvement in accuracy, multi-class sensitivity and specificity over both the best-performing individual classifier and the other fusion operators, although all of the classifier fusion techniques yielded some improvement.
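The key step in the abstract above is turning each classifier's hard label into a class-preference score via its confusion matrix. One natural reading, sketched below in Python as an assumption, normalizes the confusion-matrix column for the predicted label, giving an estimate of P(true class | predicted label). The fused score here is a plain average, standing in for the fusion operators compared in the paper (such as the Choquet fuzzy integral), which are not implemented; the matrices are hypothetical.

```python
CLASSES = ("H", "E", "C")  # helix, strand, coil

def label_to_preferences(confusion, label):
    """Turn a hard label into class preferences using a validation confusion matrix.

    confusion[true][predicted] holds counts; the preference for class t is the
    fraction of residues predicted as `label` whose true class was t.
    """
    column = {t: confusion[t][label] for t in CLASSES}
    total = sum(column.values())  # assumes the classifier predicted `label` at least once
    return {t: count / total for t, count in column.items()}

def fuse_mean(confusions, labels):
    """Average the preference vectors of several classifiers; return (best class, scores)."""
    scores = {t: 0.0 for t in CLASSES}
    for confusion, label in zip(confusions, labels):
        for t, p in label_to_preferences(confusion, label).items():
            scores[t] += p / len(labels)
    return max(scores, key=scores.get), scores

# Hypothetical confusion matrices: "weak" is unreliable when it outputs E,
# "strong" is reliable when it outputs H.
weak = {"H": {"H": 50, "E": 5, "C": 5}, "E": {"H": 5, "E": 4, "C": 5}, "C": {"H": 5, "E": 1, "C": 40}}
strong = {"H": {"H": 9, "E": 1, "C": 2}, "E": {"H": 1, "E": 30, "C": 3}, "C": {"H": 0, "E": 2, "C": 40}}

best, _ = fuse_mean([weak, weak, strong], ["E", "E", "H"])
print(best)  # H, although a majority vote would pick E
```

The example shows why a measurement-level score can beat majority voting: two unreliable votes for E are outweighed by one reliable vote for H.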

 In the news ... 

  • Kantorovitz et al, “Motif-Blind, genome-wide discovery of cis-regulatory modules in Drosophila and mouse”, Developmental Cell, 2009.
We present new approaches to cis-regulatory module (CRM) discovery in the common scenario where relevant transcription factors and/or motifs are unknown. Beginning with a small list of CRMs mediating a common gene expression pattern, we search genome-wide for CRMs with similar functionality, using new statistical scores and without requiring known motifs or accurate motif discovery. We cross-validate our predictions on 31 regulatory networks in Drosophila and through correlations with gene expression data. Five predicted modules tested using an in vivo reporter gene assay all show tissue-specific regulatory activity. We also demonstrate our methods' ability to predict mammalian tissue-specific enhancers. Finally, we predict human CRMs that regulate early blood and cardiovascular development. In vivo transgenic mouse analysis of two predicted CRMs demonstrates that both have appropriate enhancer activity. Overall, 7/7 predictions were validated successfully in vivo, demonstrating the effectiveness of our approach for insect and mammalian genomes.
  • Keyhanipoor et al, “Aggregation of web search engines based on users' preferences in WebFusion”, Knowledge-based Systems, 2007.
The information users need is distributed across the databases of various search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. Meta-search engines can provide unified access for their users. In this paper, a novel meta-search engine named WebFusion is introduced. WebFusion learns the expertness of the underlying search engines in a certain category based on the users’ preferences. It also uses the “click-through data concept” to give a content-oriented ranking score to each result page. The click-through data concept is the implicit feedback of the users’ preferences, which is also used as a reinforcement signal in the learning process to predict the users’ preferences and reduce the seeking time in the returned results list. The decision lists of the underlying search engines have been fused using the ordered weighted averaging (OWA) approach, and the application of an optimistic operator as the weighting function has been investigated. Moreover, the results of this approach have been compared with those achieved by some popular meta-search engines such as ProFusion and MetaCrawler. Experimental results demonstrate a significant improvement in average click rate, the variance of clicks, and the average relevancy criterion.
  • M. Kazemian et al. “A new expertness index for assessment of secondary structure prediction engines”, Journal of Computational Biology and Chemistry, 2007.

Improving the prediction accuracy of protein secondary structure is essential for further developments across the whole field of protein research. In this paper, the expertness of protein secondary structure prediction engines has been studied at three levels, and a new criterion has been introduced at the third level. This criterion can be considered an extension of the previous ones based on amino acid index. Using this new criterion, the expertness of some high-scoring secondary structure prediction engines has been reanalyzed and some previously hidden facts have been discovered. The results of this new assessment demonstrated a noticeable harmony among the prediction behavior for each amino acid across all engines. This harmony was also seen between single global propensity and the prediction accuracy of amino acid types in each secondary structure class. Moreover, it is shown that proline and glycine are predicted with lower accuracy in alpha helices and beta strands. In addition, regardless of the different approaches used in the prediction engines, beta strands are predicted with lower accuracy.

  • Kazemian et al, “Architecture for biological database integration”, Special Issue on AI & Specific Applications, ICGST International Journal on Artificial Intelligence and Machine Learning, AIML, 2006.

Laboratory work involves integrating various data sources to solve biological problems. Our philosophy is that different types of data sources together give more information than a single one. By combining data sources intelligently, we are able to obtain a more complete picture of the problem. Here we introduce a general architecture for biological meta-search engines based on the decision fusion concept. This architecture has seven stages. In addition, it has three databases for keeping the statistics of the underlying engines, biological insights and users’ preferences, which evolve as the system is used.

  • Kazemian et al, “Swarm clustering based on flowers pollination by artificial bees”, Studies in computational intelligence, Swarm Intelligence and Data Mining, Springer, 2006.

This chapter presents a new swarm data clustering method based on flower pollination by artificial bees, which we named FPAB. FPAB does not require any parameter settings or any initial information such as the number of classes and the number of partitions in the input data. Initially, in FPAB, bees move the pollens and pollinate them. Each pollen grows in proportion to its garden's flowers, and better growth occurs in better conditions. After some iterations, natural selection reduces the pollens and flowers to form gardens of the same type of flowers. The prototypes of each garden are taken as the initial cluster centers for the Fuzzy C-Means (FCM) algorithm, which is used to reduce obvious misclassification errors. In the next stage, the prototypes of the gardens are treated as single flowers and FPAB is applied to them again. Results from three small data sets show that the partitions produced by FPAB are competitive with those obtained from FCM or AntClass. 

  • Kazemian et al, “Protein secondary structure classifiers fusion using OWA”, Lecture Notes in Computer Science, 2005.

The combination of classifiers has been proposed as a method to improve the accuracy achieved by a single classifier. In this study, the performance of optimistic and pessimistic ordered weighted averaging (OWA) operators for protein secondary structure classifier fusion has been investigated. Each secondary structure classifier outputs a unique structure for each input residue. We used the confusion matrix of each secondary structure classifier as a general reusable pattern for converting this unique label to the measurement level. The results of the optimistic and pessimistic OWA operators have been compared with majority voting and with the five common classifiers used in the fusion process. Using a benchmark set from the EVA server, the results showed a significant improvement in average Q3 prediction accuracy of up to 1.69% over the best classifier's results.
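An OWA operator sorts its inputs before weighting them, so weights concentrated on the largest inputs behave optimistically (toward the maximum) and weights concentrated on the smallest behave pessimistically (toward the minimum). A minimal Python sketch of OWA-based fusion of per-class scores; the weight vectors and scores below are illustrative assumptions, not those used in the paper.

```python
def owa(values, weights):
    """Ordered weighted average: weights[0] applies to the largest value."""
    if len(values) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per value, with weights summing to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

def fuse_owa(class_scores, weights):
    """class_scores maps each class to its scores from every classifier."""
    fused = {c: owa(scores, weights) for c, scores in class_scores.items()}
    return max(fused, key=fused.get), fused

OPTIMISTIC = [0.6, 0.3, 0.1]   # hypothetical weights favoring high scores
PESSIMISTIC = [0.1, 0.3, 0.6]  # hypothetical weights favoring low scores

# Hypothetical per-class scores for one residue from three classifiers.
residue_scores = {"H": [0.9, 0.1, 0.1], "E": [0.05, 0.5, 0.5], "C": [0.05, 0.4, 0.4]}

print(owa([0.2, 0.9, 0.5], OPTIMISTIC))             # about 0.71
print(owa([0.2, 0.9, 0.5], PESSIMISTIC))            # about 0.36
print(fuse_owa(residue_scores, OPTIMISTIC)[0])      # H: one confident classifier wins
print(fuse_owa(residue_scores, PESSIMISTIC)[0])     # E: broad agreement wins
```

The same per-residue scores yield different fused labels depending on the operator's attitude, which is the contrast between optimistic and pessimistic OWA the study evaluates.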