Computation

Epigenomics
A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data.

BMIQ (Beta MIxture Quantile dilation) is a novel model-based intra-array normalisation strategy for 450k data that adjusts the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450k platform.
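
The idea of reshaping the type2 distribution toward the type1 distribution can be illustrated with a much simpler technique than BMIQ itself. The sketch below uses plain empirical quantile mapping, not the beta-mixture model that BMIQ fits. The function name `quantile_map` and the toy beta-values are hypothetical and chosen only for illustration.

```python
def quantile_map(type2, type1):
    """Map each type2 value to the type1 value at the same empirical quantile.

    Simplified stand-in for illustration only: BMIQ fits a beta mixture,
    whereas this function uses the sorted type1 values directly as the
    reference distribution.
    """
    if not type1 or not type2:
        raise ValueError("both probe sets must be non-empty")
    ref = sorted(type1)
    n, m = len(type2), len(ref)
    # Indices of type2 values, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: type2[i])
    mapped = [0.0] * n
    for rank, idx in enumerate(order):
        q = (rank + 0.5) / n                  # mid-rank quantile in (0, 1)
        pos = min(max(q * m - 0.5, 0.0), m - 1)  # fractional index into ref
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        # Linear interpolation between neighbouring reference values.
        mapped[idx] = ref[lo] * (1 - frac) + ref[hi] * frac
    return mapped


if __name__ == "__main__":
    type1_betas = [0.02, 0.05, 0.50, 0.93, 0.97]   # toy values (assumption)
    type2_betas = [0.10, 0.15, 0.45, 0.80, 0.85]   # toy values (assumption)
    print(quantile_map(type2_betas, type1_betas))
```

The ranks of the type2 values are preserved; only their spacing changes to follow the reference distribution.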

Independent Surrogate Variable Analysis

Independent Surrogate Variable Analysis (ISVA) is a method to identify features correlating with a phenotype of interest in the presence of potential confounding factors. ISVA should be useful as a feature selection tool in studies that are subject to confounding.

Denoising Algorithm based on Relevance network Topology

Denoising Algorithm based on Relevance network Topology (DART) is an algorithm designed to evaluate the consistency of prior-information molecular signatures (e.g. in-vitro perturbation expression signatures) in independent molecular data (e.g. gene expression data sets). If a signature is consistent, a network pruning strategy is then used to infer the activation status of the molecular signature in individual samples.

A novel Bayesian network inference algorithm for integrative analysis of heterogeneous deep sequencing data

Deep sequencing has spurred genome-wide mapping of transcription factor binding sites, histone modifications and DNA methylation.

Expression
Functional dissection of regulatory models using gene expression data of deletion mutants.

Genome-wide gene expression profiles are accumulating at an alarming rate. How to integrate these expression profiles, generated by different laboratories, to reverse engineer the cellular regulatory network has been a major challenge.

iTranscriptome (for in silico transcriptome) is a web portal that provides open access to the spatial transcriptome data of the mouse E7.0 embryo.

iTranscriptome (for in silico transcriptome) is a web portal that provides open access to the spatial transcriptome data of the mouse E7.0 embryo. iTranscriptome offers three data search functions: (1) pattern search by gene: querying and displaying the expression pattern of genes of interest in either a corn plot or a digitally reconstructed d-WISH format; (2) gene search by gene: searching for genes that share a similar expression pattern with the queried gene; and (3) search for a set of genes displaying a predefined (20 in the database) or customized (user-defined) expression pattern. Besides these functions, iTranscriptome also enables matching queried cells to their in vivo counterparts in the epiblast by zip code mapping.

Genome&Phenome
Epigenetic Dissection of Intra-Sample-Heterogeneity

This web toolkit is based on the Bioconductor package EpiDISH. The original BioC package contains functions to infer cell-type fractions from DNAm profiles of heterogeneous tissues, using a DNAm reference matrix for common tissue types, together with the CellDMC algorithm to identify differentially methylated cell types in EWAS. In addition to all the functionality provided in the BioC R package, the web toolkit provides interactive visualization tools, which are more user-friendly for those who are not familiar with R programming.

Immunology
CE-BLAST is presented here as an un-supervised program to compare the antigenic similarity between any conformational epitopes in protein antigens.

CE-BLAST is presented here as an un-supervised program to compare the antigenic similarity between any conformational epitopes in protein antigens. Neighboring residue layout, structural contacts and physical-chemical environment are fully considered for each epitope residue at the 3-dimensional level during the comparison. Large-scale independent testing on all known conformational epitopes from PDB and on hemagglutination inhibition data indicated the high accuracy and sensitivity of CE-BLAST. Currently the three built-in epitope databases are: 1) 559 known epitope structures derived from immune-complexes in the PDB database; 2) an epitope database of 1725 modeled HA structures representing 15238 HA1 sequences for H3N2; 3) an epitope database of 1284 modeled HA structures representing 16672 HA1 sequences for H1N1. Users can also define their own epitope sites for the HA1 antigen. Furthermore, users can upload multiple epitopes to search against each other.

Spatial Epitope Prediction for Protein Antigens, particularly for N-linked glycoproteins

SEPPA 3.0 is an enhanced version of SEPPA 2.0. This latest version provides the following improvements: 1) improved performance on common protein antigens through an updated training dataset; 2) new features enabling accurate prediction for N-linked glycoprotein antigens. As the first algorithm to consider N-glycosylation sites, SEPPA 3.0 shows significant advantages over popular peers on both common and N-linked glycoprotein antigens. More information can be found on the help page of SEPPA 3.0.

Population Genomics
A Package for Elementary Analysis of SNP Data

PEAS can handle very large data sets and is versatile, especially in formatting, data splitting, data combining, and sampling of both markers and individuals for further analysis. To fill the gaps in currently available programs, PEAS is designed to calculate individual allele sharing distances and population genetic distances, perform bootstrapping, and calculate LD statistics for large-scale SNP data sets. As an assistant tool for many other popular programs, PEAS also provides formatted input files for programs such as fastPHASE, PHASE, STRUCTURE, Haploview, Arlequin and LDhat. PEAS can also manage the output of some other programs.
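
As an illustration of the individual allele sharing distance mentioned above, here is a minimal sketch under stated assumptions: genotypes are biallelic SNPs coded as the count of one allele (0, 1 or 2), missing calls are `None` and are skipped, and the distance is averaged over the loci called in both individuals. The function names are hypothetical, and the exact definition and input format used by PEAS may differ.

```python
def allele_sharing_distance(g1, g2):
    """Mean per-locus allele sharing distance between two individuals.

    With 0/1/2 allele-count coding, abs(a - b) equals the number of
    alleles NOT shared at a locus: 0 (both shared), 1 (one shared) or
    2 (none shared).
    """
    total = 0
    used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip loci with a missing call in either individual
        total += abs(a - b)
        used += 1
    if used == 0:
        raise ValueError("no loci with both genotypes called")
    return total / used


def distance_matrix(genotypes):
    """Symmetric matrix of pairwise distances for a list of individuals."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = allele_sharing_distance(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    ind_a = [0, 1, 2, None]   # toy genotypes (assumption)
    ind_b = [0, 2, 0, 1]
    print(allele_sharing_distance(ind_a, ind_b))  # prints 1.0
```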

Other
MAP
MAP: model-based analysis of proteomic data to detect proteins with significant abundance changes.

MAP is designed to statistically compare quantitative proteomic data generated from two different cell types or states using isotope-labeling techniques, and to reliably identify proteins showing significant abundance changes between them (or peptides, if the analysis is carried out at the peptide level). The key feature of MAP is that it can directly model technical errors from the two proteomic profiles under comparison, without borrowing information from parallel technical replicates. It considers all detected proteins as a mixture of differentially and non-differentially expressed ones, and chooses those with low intensity changes to model the contribution of technical and systematic error as a function of the protein intensity level by using a novel step-by-step regression analysis. This estimated error function is then used as the reference to calculate a P-value for each protein to represent the significance of its abundance change.

MAmotif is used to compare two ChIP-seq samples of the same protein from different cell types (or conditions, e.g. wild-type vs mutant) and identify transcription factors (TFs) associated with the cell type-biased binding of this protein as its co-factors, by using TF binding information obtained from motif analysis (or from other ChIP-seq data).

MAmotif is used to compare two ChIP-seq samples of the same protein from different cell types (or conditions, e.g. wild-type vs mutant) and identify transcription factors (TFs) associated with the cell type-biased binding of this protein as its co-factors, by using TF binding information obtained from motif analysis (or from other ChIP-seq data). MAmotif automatically combines the MAnorm model, which performs a quantitative comparison of the input ChIP-seq samples, with the Motif-Scan toolkit, which scans ChIP-seq peaks for TF binding motifs, and uses a systematic integrative analysis to search for TFs whose binding sites are significantly associated with the cell type-biased peaks between the two ChIP-seq samples.

MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets.

MAnorm is a robust computational model designed for quantitative comparison of two ChIP-Seq samples from different cell types. The key feature of the MAnorm model is its use of the common peaks (binding sites) of two ChIP-Seq samples to build the reference model for ChIP-seq signal intensity normalization. For each peak site, MAnorm calculates a log2-ratio of normalized ChIP-seq read densities between the two samples (i.e. the M value) to represent the quantitative change of ChIP-seq intensity at this peak, together with a P-value to represent the significance of the ChIP-seq intensity change. MAnorm can also be applied to DNase/ATAC-seq data to detect genomic regions with cell type-biased chromatin accessibility.
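
The normalization idea described above (fit a reference model on common peaks, then apply it to every peak) can be sketched as follows. This is a simplified illustration, not the MAnorm implementation: the A value (mean log2 intensity), the pseudocount, the function names and the use of ordinary least squares in place of a robust fit are all assumptions made for this example, and the P-value calculation is omitted.

```python
import math


def ma_values(reads1, reads2, pseudocount=1.0):
    """Return (M, A) lists for read densities of two samples at the same peaks."""
    m_vals, a_vals = [], []
    for r1, r2 in zip(reads1, reads2):
        x = math.log2(r1 + pseudocount)
        y = math.log2(r2 + pseudocount)
        m_vals.append(x - y)          # M: log2 ratio between samples
        a_vals.append((x + y) / 2.0)  # A: mean log2 intensity (assumption)
    return m_vals, a_vals


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normalized_m(reads1, reads2, is_common):
    """Fit M ~ A on common peaks only, then subtract the fit from all peaks."""
    m, a = ma_values(reads1, reads2)
    common_a = [ai for ai, c in zip(a, is_common) if c]
    common_m = [mi for mi, c in zip(m, is_common) if c]
    intercept, slope = fit_line(common_a, common_m)
    return [mi - (intercept + slope * ai) for mi, ai in zip(m, a)]


if __name__ == "__main__":
    sample1 = [19, 39, 79, 400]          # toy read densities (assumption)
    sample2 = [9, 19, 39, 20]
    common = [True, True, True, False]   # last peak is not shared
    print(normalized_m(sample1, sample2, common))
```

In the toy data the first three peaks differ by a constant scaling, so after normalization their M values are close to zero, while the last peak keeps a large positive M value.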

Improved nucleosome-positioning algorithm

Accurately detecting genome-wide nucleosome positions is important to understanding chromatin remodeling events in gene regulation. NPS is a widely used software package for detecting nucleosome positions from MNase-seq data, but its accuracy needs much improvement...

CoCiter: an efficient tool to infer gene function by assessing the significance of literature co-citation.

A routine approach to inferring functions for a gene set is by using function enrichment analysis based on Gene ontology (GO) or Kyoto Encyclopedia of Genes and Genomes (KEGG) curated terms and pathways…