Epigenetic Dissection of Intra-Sample-Heterogeneity

Welcome to the EpiDISH web toolkit

This web toolkit is based on the Bioconductor package EpiDISH. The original BioC package contains functions to infer cell-type fractions from DNAm profiles of heterogeneous tissues using DNAm reference matrices for common tissue types, together with the CellDMC algorithm to identify differentially methylated cell types in EWAS. In addition to all the functionality of the BioC R package, the web toolkit provides interactive visualization tools, which make it more accessible to users who are not familiar with R programming.


How to use

The webpage is largely self-explanatory. Simply follow the order of the steps in the navigator (or download the pdf example file):

  1. Data preparation: Upload your beta value matrix, POI vector (optional) and covariates matrix (used in CellDMC; optional).
  2. Infer CT fraction: Select a mode and reference(s) to infer cell-type fractions. Check the results with interactive figures and save them as pdf and txt files.
  3. Run CellDMC: Run CellDMC with the previously inferred CT fractions to identify differentially methylated cell types.

Cell-type fractions estimation

Inference of CT fractions proceeds via one of 3 methods, as selected by the user: Robust Partial Correlations (RPC) (Teschendorff et al. 2017), Cibersort (CBS) (Newman et al. 2015), or Constrained Projection (CP) (Houseman et al. 2012).
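
To give a feel for what reference-based inference does, here is a minimal sketch of the idea behind constrained projection: find non-negative fractions that best reproduce a sample's DNAm profile as a mixture of reference profiles. It is an illustration under stated assumptions, not the EpiDISH implementation: the function names and the toy reference matrix are hypothetical, the fractions are forced to sum to exactly one, and a simple projected gradient descent stands in for a proper quadratic programming solver.

```python
# Illustrative sketch only; names and numbers are hypothetical.

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    sorted_desc = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for i, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / i
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def constrained_projection(ref, beta, iterations=5000):
    """Estimate fractions w minimising ||ref . w - beta||^2 on the simplex.

    ref  : list of rows (one per CpG), each holding one value per cell type
    beta : list of beta values for one sample, in the same CpG order
    """
    k = len(ref[0])
    # Squared Frobenius norm: an upper bound on the gradient's Lipschitz constant.
    lipschitz = sum(x * x for row in ref for x in row)
    w = [1.0 / k] * k
    for _ in range(iterations):
        residual = [sum(r * wj for r, wj in zip(row, w)) - b
                    for row, b in zip(ref, beta)]
        grad = [sum(row[j] * res for row, res in zip(ref, residual))
                for j in range(k)]
        w = project_to_simplex([wj - g / lipschitz for wj, g in zip(w, grad)])
    return w


# Toy reference: 6 CpGs (rows) x 3 cell types (columns), made-up values.
REF = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]

# Simulate a noise-free mixed sample and recover its fractions.
sample = [sum(r * f for r, f in zip(row, TRUE_FRACTIONS)) for row in REF]
estimated = constrained_projection(REF, sample)
print([round(f, 3) for f in estimated])
```

On real data the profiles are noisy and the reference never matches perfectly, which is why the choice between RPC, CBS and CP can matter.
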
For now, we provide 18 references: 4 EpiDISH references (blood and generic cell types) and 14 EpiSCORE references (tissue-specific). The 4 EpiDISH references comprise two blood-subtype references, one reference with epithelial cells, fibroblasts, and total immune cells, and one reference with epithelial cells, fibroblasts, adipose cells, and total immune cells, as described in Teschendorff et al. 2017 and Zheng et al. 2018a. If you want to infer the CT fraction of each immune cell type, you may want to use HEpiDISH, an iterative hierarchical procedure. HEpiDISH uses two distinct DNAm references: a primary reference to estimate the fractions of several cell types, and a separate, non-overlapping secondary reference to estimate the fractions of the underlying subtypes of one of the cell types in the primary reference.

The above figure describes how HEpiDISH works. You can find more info in Zheng et al. 2018a.
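
The two-level idea can be sketched in a few lines. The example below assumes that the subtype fractions obtained from the secondary reference are expressed relative to the parent cell type and are rescaled by the parent's fraction from the primary reference. That is a simplified reading of the hierarchical procedure rather than the toolkit's code, and the function name, cell-type labels and numbers are all hypothetical.

```python
# Illustrative sketch only; names, labels and numbers are hypothetical.

def combine_hierarchical(primary, secondary, parent):
    """Merge primary fractions with subtype fractions of one parent cell type.

    primary   : dict, cell type -> fraction of the whole sample
    secondary : dict, subtype -> fraction within the parent cell type
    parent    : key in `primary` whose subtypes are listed in `secondary`
    """
    # Keep every primary cell type except the one being split into subtypes.
    combined = {ct: frac for ct, frac in primary.items() if ct != parent}
    # Rescale each subtype so it is a fraction of the whole sample.
    for subtype, frac in secondary.items():
        combined[subtype] = primary[parent] * frac
    return combined


primary = {"Epi": 0.6, "Fib": 0.1, "IC": 0.3}
secondary = {"B": 0.1, "NK": 0.1, "CD4T": 0.2, "CD8T": 0.1, "Mono": 0.1, "Neutro": 0.4}

combined = combine_hierarchical(primary, secondary, parent="IC")
for cell_type, fraction in combined.items():
    print(f"{cell_type}: {fraction:.3f}")
```

Because the subtype fractions are scaled by the total immune cell fraction, the combined fractions still sum to one.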


The 14 tissue-specific EpiSCORE references encompass 13 tissue-types (bladder, brain, breast, colon, esophagus, heart, kidney, liver, lung, olfactory epithelium, pancreas, prostate and skin) and 40 cell-types, as described in Zhu et al. 2022. The reference matrices are free to download from the EpiSCORE website. In the EpiSCORE references, each marker is assigned a weight that reflects how informative it is. We recommend using the weighted robust partial correlation method implemented in this web server and described in Teschendorff et al. 2022.


Identification of differentially methylated cell-types

An outstanding challenge of epigenome-wide association studies (EWASs) performed in complex tissues is the identification of the specific cell type(s) responsible for the observed differential DNA methylation. We developed a statistical algorithm called CellDMC, which can identify differentially methylated positions within the specific cell type(s) driving the differential methylation. The ability to detect the altered cell types associated with disease and disease risk will facilitate the identification and development of biomarker assays for epigenetic disease risk, in line with the aims of P4 Medicine. CellDMC was published in Zheng et al. 2018b.
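
As a rough illustration of the idea, the sketch below fits, for a single CpG, a linear model with one term per CT fraction plus one fraction-by-POI interaction term per cell type, so that a non-zero interaction coefficient points to the cell type carrying the change. This is a simplified reading of the approach in Zheng et al. 2018b, not the toolkit's code: all function names and numbers are hypothetical, the data are simulated without noise, and the statistical testing of the coefficients is omitted.

```python
# Illustrative sketch only; names and numbers are hypothetical.

def design_row(fractions, poi):
    """One design-matrix row: CT fractions, then fraction x POI interactions."""
    return list(fractions) + [f * poi for f in fractions]


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(design, y):
    """Ordinary least squares via the normal equations (fine for tiny examples)."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Simulated CpG: two cell types, only the first one changes with the POI.
BASELINE = [0.2, 0.8]   # mean methylation per cell type in controls
EFFECT = [0.3, 0.0]     # change per cell type in cases

control_f1 = [0.2, 0.4, 0.6, 0.8]
case_f1 = [0.3, 0.5, 0.7, 0.9]
samples = ([((f, 1.0 - f), 0) for f in control_f1]
           + [((f, 1.0 - f), 1) for f in case_f1])

design = [design_row(fractions, poi) for fractions, poi in samples]
y = [sum(f * b for f, b in zip(fractions, BASELINE))
     + poi * sum(f * e for f, e in zip(fractions, EFFECT))
     for fractions, poi in samples]

coefficients = fit_least_squares(design, y)
interaction = coefficients[len(BASELINE):]
print([round(c, 3) for c in interaction])
```

The model has no separate intercept because the CT fractions already sum to one for every sample. The fit only works when the fractions vary across samples within each POI group, which is one reason CellDMC needs accurate CT fraction estimates as input.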


Abbreviations/Acronyms

  • CT: Cell-type
  • DNAm: DNA methylation
  • EpiDISH: Epigenetic Dissection of Intra-Sample-Heterogeneity
  • POI: Phenotype of interest
  • DMCs: Differentially methylated cytosines
  • DMCTs: Differentially methylated cell types (DMCs in individual cell types)

News

This is the beta version of the EpiDISH toolkit, which is still under development. If you encounter any issues, please contact Shijie C. Zheng.

Contact

Written by Shijie C. Zheng
Email: shijieczheng@gmail.com
Computational Systems Genomics Group
CAS-MPG Partner Institute for Computational Biology
320 Yueyang Rd, Xuhui District
Shanghai 200031, P.R.China

Data privacy statement

The data you upload will NOT be stored, reused, or shared by us in any form.

Beta value matrix

Here you upload your beta value matrix, with rows labeling the CpGs (usually Illumina BeadArray probe IDs) and columns labeling samples. NA values are not allowed. If you only want to infer cell-type fractions (without running CellDMC later), you can upload a subset of the beta value matrix that contains only the cell-type specific CpGs present in the reference matrix.

You can download the example beta value file here. Tip: the first column of your data should have a name, e.g. cpg; its values will be used as feature names later. Both txt and csv formats are acceptable, and you can choose the separator (tab, comma, or semicolon).
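
If you want to check a file before uploading it, the requirements above can be tested locally. The following is a minimal sketch, not part of the toolkit: the function name, sample names and values are hypothetical, and the check that every value lies between 0 and 1 is an extra sanity test that follows from the definition of beta values rather than a documented rule of this web server.

```python
# Illustrative sketch only; names and example values are hypothetical.
import csv
import io


def read_beta_matrix(text, delimiter="\t"):
    """Parse a delimited beta value matrix and check the upload requirements."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    if not header[0].strip():
        raise ValueError("the first column needs a name, e.g. 'cpg'")
    samples = header[1:]
    matrix = {}
    for line in body:
        cpg, values = line[0], line[1:]
        if len(values) != len(samples):
            raise ValueError(f"{cpg}: expected {len(samples)} values, got {len(values)}")
        parsed = []
        for value in values:
            if value.strip() in ("", "NA", "NaN"):
                raise ValueError(f"{cpg}: NA values are not allowed")
            beta = float(value)
            # Extra sanity check (assumption): beta values lie in [0, 1].
            if not 0.0 <= beta <= 1.0:
                raise ValueError(f"{cpg}: beta value {value} is outside [0, 1]")
            parsed.append(beta)
        matrix[cpg] = parsed
    return samples, matrix


example = "cpg\tS1\tS2\ncg00000029\t0.41\t0.52\ncg00000108\t0.93\t0.88\n"
samples, matrix = read_beta_matrix(example)
print(samples)
print(matrix)
```

For a comma- or semicolon-separated file, pass the matching `delimiter` argument.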



POI (Phenotype of interest)

Here you upload your POI vector file. It is used in the CT fraction boxplot and in CellDMC, and is not required for CT fraction inference.

You can download the example POI vector file here. Tip: the first column of your data should have a name, e.g. SampleName; its values will be used as sample names later. Both txt and csv formats are acceptable, and you can choose the separator (tab, comma, or semicolon).



Covariates matrix

Here you upload the covariates matrix used in CellDMC, with rows labeling samples and columns labeling variables.

You can download the example covariates file here. Tip: the first column of your data should have a name, e.g. SampleName; its values will be used as sample names later. Both txt and csv formats are acceptable, and you can choose the separator (tab, comma, or semicolon).
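
Since the first column supplies the sample names, it is worth checking that the POI and covariates files cover the same samples as the columns of the beta value matrix. The sketch below assumes that rows are matched to samples by name. The function name, sample names and covariate values are hypothetical and only illustrate the check.

```python
# Illustrative sketch only; names and example values are hypothetical.

def align_to_samples(sample_order, table):
    """Reorder a {sample name: value} table to follow the beta matrix columns."""
    missing = [s for s in sample_order if s not in table]
    if missing:
        raise ValueError("samples missing from table: " + ", ".join(missing))
    return [table[s] for s in sample_order]


# Column names of the beta value matrix, in file order.
beta_samples = ["S1", "S2", "S3"]

# POI and covariates as read from their files, possibly in a different order.
poi = {"S3": 1, "S1": 0, "S2": 1}
covariates = {"S2": [54, 1], "S1": [61, 0], "S3": [47, 1]}  # e.g. age, sex

aligned_poi = align_to_samples(beta_samples, poi)
aligned_covariates = align_to_samples(beta_samples, covariates)
print(aligned_poi)
print(aligned_covariates)
```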

You can brush (click and drag over) the boxplot to inspect individual data points.