In the database section, PGG.Han visualizes the fine-scale genetic structure of the Han Chinese population and provides multiple types of genetic information for genome-wide SNVs, organized by genetic and geographical sub-populations.

Genetic structure
1. Genetic Affinity

This page shows the genetic relationships within the Han Chinese population at two levels. The first half groups sub-populations by province, and the second half groups them by genetic structure. Click a sub-population on the map to display its genetic relationship with the other sub-populations on the right.

2. Population Structure

This part shows the genetic coordinates of the Han Chinese population, grouped both by province and by genetic structure. Click a sub-population on the map to display its genetic coordinates on the right.
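The page does not state how the genetic coordinates are computed. Assuming they are principal-component (PCA) coordinates, the usual choice for this kind of plot, a minimal sketch of deriving them from a genotype matrix is shown below. The function name `genetic_coordinates` and the 0/1/2 alternate-allele-count encoding are illustrative assumptions, not PGG.Han's implementation.

```python
import numpy as np


def genetic_coordinates(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project individuals onto their top principal components (hypothetical helper).

    genotypes: (n_individuals, n_snps) matrix of alternate-allele counts (0, 1, 2).
    Returns an (n_individuals, n_components) array of coordinates.
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0           # alternate-allele frequency per SNP
    keep = (p > 0) & (p < 1)           # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its expected binomial standard deviation
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Left singular vectors scaled by singular values give the PC scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Individuals from genetically similar sub-populations are expected to cluster together in these coordinates, which is what the map-linked plot lets you compare.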

3. Ancestry

This part shows the genetic composition of the Han Chinese population in the context of worldwide populations. Each individual is represented by a single line divided into K colored segments, whose lengths are proportional to the K inferred ancestral components. Population IDs are shown around the outside of the circular plot. Use the drop-down menu to view the results for different values of K.
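Each line in such a plot corresponds to one row of an ancestry-proportion matrix (K proportions summing to 1). Assuming an ADMIXTURE-style `.Q` file (one individual per line, K whitespace-separated proportions; the page does not name the tool used), the sketch below parses it and averages the components per population. Both helper names are hypothetical.

```python
from collections import defaultdict


def parse_q_file(text: str) -> list[list[float]]:
    """Parse .Q-style text: one individual per line, K proportions per line."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def mean_ancestry(q_rows: list[list[float]], pop_labels: list[str]) -> dict[str, list[float]]:
    """Average the K ancestry proportions over all individuals in each population."""
    totals: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for row, pop in zip(q_rows, pop_labels):
        if pop not in totals:
            totals[pop] = [0.0] * len(row)
        # Accumulate component-wise sums for this population
        totals[pop] = [t + x for t, x in zip(totals[pop], row)]
        counts[pop] += 1
    return {pop: [t / counts[pop] for t in vec] for pop, vec in totals.items()}
```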

4. Gene Flow

This part shows the shared genetic drift among three selected populations, and can also reveal recent gene flow from two source populations into a target population. Select populations for only two of the three fields (Source1, Source2 and Target), then click the ‘Search’ button to obtain the results.
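The page does not name the statistic it reports. Shared drift among three populations and gene flow from two sources into a target are commonly measured with f3-statistics (outgroup f3 and admixture f3), so the sketch below assumes that. It computes a naive f3(C; A, B) from per-SNP allele frequencies; a significantly negative admixture f3(Target; Source1, Source2) is the usual signal of admixture. Tools such as ADMIXTOOLS additionally apply a small-sample bias correction that this sketch omits.

```python
import numpy as np


def f3(p_c: np.ndarray, p_a: np.ndarray, p_b: np.ndarray) -> float:
    """Naive f3(C; A, B): mean over SNPs of (p_C - p_A) * (p_C - p_B).

    p_c, p_a, p_b are per-SNP allele frequencies of populations C, A and B.
    No finite-sample bias correction and no standard error are computed.
    """
    return float(np.mean((p_c - p_a) * (p_c - p_b)))
```

For example, a target whose frequencies lie halfway between two sources gives a negative value, because each per-SNP product is minus the square of half the source difference.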

Variant

High-quality genome-wide SNP genotype data can be queried on this page in two ways: by genomic position or by rsID. For each selected SNV, multiple types of genetic information are available, such as allele frequency, variant annotation and genome diversity. In addition to a map of the frequency distribution and a data table, the page provides external links to other databases.
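The two query modes can be told apart from the query string alone. The sketch below assumes a `chr:position` form such as `chr7:117559590` for positional queries; the exact format accepted by the site is not stated, so the pattern and the `parse_query` name are hypothetical.

```python
import re
from typing import NamedTuple, Union

# Hypothetical accepted forms: "rs123" or "[chr]CHROM:POS" (commas allowed in POS)
RSID_RE = re.compile(r"^rs(\d+)$", re.IGNORECASE)
POS_RE = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y|MT?):(\d+)$", re.IGNORECASE)


class Position(NamedTuple):
    chrom: str
    pos: int


def parse_query(query: str) -> Union[str, Position]:
    """Return a normalized rsID string or a Position; raise ValueError otherwise."""
    q = query.strip().replace(",", "")
    m = RSID_RE.match(q)
    if m:
        return "rs" + m.group(1)
    m = POS_RE.match(q)
    if m:
        return Position(m.group(1).upper(), int(m.group(2)))
    raise ValueError(f"unrecognized query: {query!r}")
```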

In the analysis section, PGG.Han provides: 1) nested AIM (ancestry-informative marker) panels for detecting and controlling population stratification in medical and evolutionary studies; 2) a population-structure-aware shared control for genotype-phenotype association studies (e.g., GWAS); 3) a Han-Chinese-specific reference panel for genotype imputation. Computational tools are implemented in PGG.Han, and an online user interface is provided for data analysis and results visualization.

Data security

All data you upload are protected and can be accessed only by you. You can delete your data from our servers at any time, and we do not use them in our own analyses.

Using the server
1. Prepare your data

Acceptable input: PLINK 1 binary format (*.bed, *.bim, *.fam) or VCF format.

The following information is required:
a. All alleles on the forward strand;
b. GRCh38 coordinates.

2. Registration and Login

Registration is required before first using the PGG.Han Analysis Server. After logging in, the service can be used free of charge.

3. Upload data & QC

We provide two ways to upload data. Small datasets (<100 MB) can be uploaded directly through the web interface over HTTP; larger datasets must be uploaded to our server via FTP. For a better experience, each dataset is limited to at most 200 samples and at most 5,000,000 variants.
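The upload rules above amount to a simple decision. The sketch below encodes them, treating "100 MB" as 100 × 1024 × 1024 bytes (an assumption; the site may use decimal megabytes). The constant and function names are hypothetical.

```python
MAX_HTTP_BYTES = 100 * 1024 * 1024  # "<100MB" read as 100 MiB (assumption)
MAX_SAMPLES = 200
MAX_VARIANTS = 5_000_000


def plan_upload(total_bytes: int, n_samples: int, n_variants: int) -> str:
    """Return "http" or "ftp" for a dataset, or raise if it exceeds the limits."""
    if n_samples > MAX_SAMPLES:
        raise ValueError(f"{n_samples} samples exceeds the limit of {MAX_SAMPLES}")
    if n_variants > MAX_VARIANTS:
        raise ValueError(f"{n_variants} variants exceeds the limit of {MAX_VARIANTS}")
    # Small files go through the web form; everything else via FTP
    return "http" if total_bytes < MAX_HTTP_BYTES else "ftp"
```

Sample and variant counts can be taken from the number of lines in the .fam and .bim files (or sample columns and data lines in a VCF).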

Before starting an analysis, you need to run the data check. Only data that have completed the data check can be used in other analyses.

4. Pipelines

a. Ancestry inference
Various commonly used algorithms/analyses for ancestry inference are applied to dissect the ancestry composition and genetic affinity of an individual of interest. For more details and a demo report, click here.
b. Imputation
Our imputation service aims to achieve the best possible imputation results for Han population data, using a reference panel built from our NGS datasets of Han Chinese. For more details and a demo report, click here.
c. GWAS
We provide a platform for GWAS analysis, together with the largest control dataset of the Han Chinese population (Han100K). Users only need to provide genotype, covariate and phenotype files. For more details and a demo report, click here.
d. Quick Start
A simple, fast and efficient pipeline that integrates three analysis modules: Ancestry Inference, Imputation and GWAS. For more details, click here.

5. Results

All analysis results will be presented in the form of file downloads and visual online reports.

When running imputation, choose the reference panel as follows:
  • If your data carry specific information on Han subgroups, select the matching reference panel: “Subgroup Central China Han”, “Subgroup Northeast Han”, “Subgroup Northwest Han”, “Subgroup Southeast Han”, “Subgroup Southwest Han” or “Subgroup Southcoast Han”.
  • If you need more variants, “Han deep sequencing” is the best choice.
  • If you want faster runs, use “Han100K” or “CONVERGE”.

Currently, the imputation service can only handle biallelic SNPs on autosomes. Because imputation with a reference panel requires quality control and strand recalibration of the input data, some variants may be discarded before imputation. Only variants shared between the input dataset and the reference panel are then used for genotype imputation.
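The filtering described above can be approximated locally to estimate how many of your variants will survive. The sketch below keeps biallelic autosomal SNPs from .bim-style rows and intersects them with a reference panel by chromosome, position and allele pair (either allele order). It also drops strand-ambiguous A/T and C/G SNPs, a common precaution when strands cannot be resolved from alleles alone; that step is an assumption, not a documented PGG.Han rule. All names are hypothetical.

```python
from typing import Iterable, Iterator

AUTOSOMES = {str(c) for c in range(1, 23)}
BASES = {"A", "C", "G", "T"}
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}  # alleles alone cannot fix strand


def usable_snps(bim_rows: Iterable[list[str]]) -> Iterator[tuple[str, int, str, str]]:
    """Yield (chrom, pos, a1, a2) for biallelic, autosomal, non-ambiguous SNPs."""
    for chrom, _id, _cm, pos, a1, a2 in bim_rows:
        chrom = chrom.removeprefix("chr")
        if chrom not in AUTOSOMES:
            continue
        if a1 not in BASES or a2 not in BASES or a1 == a2:
            continue  # indels, multi-base alleles or monomorphic sites
        if frozenset((a1, a2)) in AMBIGUOUS:
            continue
        yield chrom, int(pos), a1, a2


def shared_variants(input_snps, panel_snps):
    """Keep input SNPs whose position and allele pair also occur in the panel."""
    panel = {(c, p, frozenset((x, y))) for c, p, x, y in panel_snps}
    return [s for s in input_snps if (s[0], s[1], frozenset(s[2:])) in panel]
```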