Han Chinese Genome Database

Help

In the database section, PGG.Han provides visualizations of the fine-scale genetic structure of the Han Chinese population and genome-wide allele frequencies for genetic and geographical sub-populations.

Genetic structure

1. Genetic Affinity

This page shows the genetic relationships within the Han Chinese population at two levels. The first half groups the population into sub-populations by province, and the second half groups it by genetic structure. Click a sub-population on the map to display its genetic relationships with the other sub-populations on the right.

2. Population Structure

This part shows the genetic coordinates of the Han Chinese population, grouped both by province and by genetic structure. Click a sub-population on the map to display its genetic coordinates on the right.

3. Ancestry

This part shows the genetic composition of the Han Chinese population in the context of worldwide populations. Each individual is represented by a single line broken into K colored segments, with segment lengths proportional to the K inferred components (Cs). Population IDs are shown around the outside of the circular plot. You can view the results for different values of K by selecting from the drop-down menu.
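The way one individual's line is divided into K segments can be sketched as follows. This is only an illustration of the proportional layout described above, not the code behind the PGG.Han plot; the function name `ancestry_segments` and its parameters are hypothetical.

```python
def ancestry_segments(proportions, total_length=1.0):
    """Split a line of total_length into K consecutive segments.

    proportions holds one individual's K inferred component values;
    each segment's length is proportional to its component.
    Returns a list of (start, end) pairs, one per component.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")

    segments = []
    start = 0.0
    for p in proportions:
        length = total_length * p / total  # normalize in case inputs do not sum to 1
        segments.append((start, start + length))
        start += length
    return segments


# Example with K = 3 made-up component values for one individual.
print(ancestry_segments([0.5, 0.25, 0.25]))
```

Each returned pair marks where one colored segment begins and ends along the individual's line.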

Variant

The high-quality genome-wide SNP genotype data can be queried on this page. We provide two ways of querying: by position or by rsID. In addition to displaying a frequency distribution map and a data table, we also provide external links to other databases.
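As a rough illustration of the two query styles, the sketch below tells an rsID apart from a position. The accepted position syntax is not specified on this page, so the `chr:pos` form used here is an assumption, and `parse_variant_query` is a hypothetical helper rather than part of PGG.Han.

```python
import re

# Assumed formats: "rs" followed by digits, or "<chrom>:<pos>" with an optional "chr" prefix.
RSID_PATTERN = re.compile(r"^rs\d+$", re.IGNORECASE)
POSITION_PATTERN = re.compile(r"^(?:chr)?(\d{1,2}|X|Y):(\d+)$", re.IGNORECASE)


def parse_variant_query(query):
    """Classify a query string as an rsID or a chromosome position."""
    text = query.strip()
    if RSID_PATTERN.match(text):
        return {"type": "rsid", "rsid": text.lower()}

    match = POSITION_PATTERN.match(text)
    if match:
        return {
            "type": "position",
            "chrom": match.group(1).upper(),
            "pos": int(match.group(2)),
        }

    raise ValueError(f"unrecognized variant query: {query!r}")


print(parse_variant_query("rs123"))
print(parse_variant_query("chr12:1000"))
```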

1. How do I choose the imputation tools?

If you are going to run imputation with a reference panel, we recommend the combination “SHAPEIT2 + IMPUTE2”, which achieves the highest imputation precision and sensitivity among all the tool combinations. Because only Beagle4 and PBWT can impute without a reference panel, they are the only options for recovering the missing genotypes in your data without one.
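The recommendation above reduces to a single decision, sketched here. The function name `choose_imputation_tools` is hypothetical and the sketch does not call any of the tools; it only returns the names given on this page.

```python
def choose_imputation_tools(use_reference_panel):
    """Return the tool choices named on this help page for the given setting."""
    if use_reference_panel:
        # Recommended combination when a reference panel is used.
        return ["SHAPEIT2 + IMPUTE2"]
    # Only these two tools can impute without a reference panel.
    return ["Beagle4", "PBWT"]


print(choose_imputation_tools(True))
print(choose_imputation_tools(False))
```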

2. How do I choose the reference panel?

The choice of reference panel depends on what you want from imputation:

If your data contains only Han samples and you only care about the Han population, the reference panel “Han100K” works better than any other reference panel in our imputation service.
If your data contains samples from other populations, the reference panels “1KG”, “HRC” and “SGDP” can be used to produce results that can be analyzed together with global populations. “HRC” includes “1KG”, and its advantage over “1KG” lies mainly in its better representation of European populations. Although “SGDP” contains more populations than “1KG” and “HRC”, its relatively small sample size limits its application.
If you only wish to recover the missing genotypes in your data rather than add new variant sites, please run imputation without a reference panel.
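The three cases above can be written as a small decision helper. This is a sketch of the guidance only; `recommend_reference_panels` and its two boolean parameters are hypothetical names, and an empty list stands for "run without a reference panel".

```python
def recommend_reference_panels(han_only, expand_variant_sites=True):
    """Map the guidance on this page to a list of candidate reference panels.

    han_only: True if the data contains only Han samples and only the
        Han population is of interest.
    expand_variant_sites: False if the goal is only to recover missing
        genotypes rather than to add new variant sites.
    """
    if not expand_variant_sites:
        return []  # no reference panel needed
    if han_only:
        return ["Han100K"]
    # Panels suited to analysis together with global populations.
    return ["1KG", "HRC", "SGDP"]


print(recommend_reference_panels(han_only=True))
print(recommend_reference_panels(han_only=False))
print(recommend_reference_panels(han_only=True, expand_variant_sites=False))
```

When several panels are returned, the trade-offs described above (European representation in “HRC”, sample size in “SGDP”) still have to be weighed by hand.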

3. Will all variants in my data be used in imputation?

Currently the imputation service can only handle biallelic SNP data on autosomes. Beyond that, the answer depends on whether a reference panel is used. If no reference panel is used, all variants in your data will be used in imputation. Imputation with a reference panel requires quality control and strand recalibration of the input data, so some sites might be discarded before imputation.
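To check in advance which of your variants fit the "biallelic SNP on an autosome" restriction, a filter along these lines may help. It is a minimal sketch that assumes human chromosomes 1 to 22 and single-base A/C/G/T alleles; `is_biallelic_autosomal_snp` is a hypothetical helper, and it does not reproduce the quality control or strand recalibration performed by the service.

```python
AUTOSOMES = {str(i) for i in range(1, 23)}  # assumed: human chromosomes 1-22
BASES = {"A", "C", "G", "T"}


def is_biallelic_autosomal_snp(chrom, ref, alts):
    """Return True if the variant is a biallelic SNP on an autosome.

    chrom: chromosome name, with or without a "chr" prefix.
    ref: reference allele string.
    alts: list of alternate allele strings.
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (
        name in AUTOSOMES
        and len(alts) == 1            # biallelic: exactly one alternate allele
        and ref.upper() in BASES      # single-base reference allele
        and alts[0].upper() in BASES  # single-base alternate allele
    )


# Made-up example records: (chrom, ref, alts)
variants = [
    ("chr1", "A", ["G"]),       # kept
    ("X", "A", ["G"]),          # sex chromosome
    ("2", "A", ["G", "T"]),     # multiallelic
    ("3", "AT", ["A"]),         # indel
]
kept = [v for v in variants if is_biallelic_autosomal_snp(*v)]
print(kept)
```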