
Welcome to PGG.Han!
The Han Chinese Genomes Database (PGG.Han) serves as the central repository for the genomic data of the Han Chinese Genomes Project (Phase I). In its current version, PGG.Han archives 8,056,973 genome-wide single nucleotide variants (SNVs) from 102,583 Han Chinese individuals (a.k.a. Han100K). PGG.Han aims to: 1) facilitate understanding of the population structure and history of Han Chinese; 2) screen AIMs panels for detecting and controlling population stratification in medical and evolutionary studies; 3) create a shared control panel for genotype-phenotype association studies (e.g., GWAS); 4) provide a Han-Chinese-specific reference panel for genotype imputation. Computational tools are built into PGG.Han, and an online user interface is provided for data analysis and visualization of results.
Fine-scale population structure: Han Chinese individuals form a cluster distinct from the surrounding groups, including minority groups in China and populations in neighboring countries, suggesting that Han Chinese share a common identity in terms of overall genetic make-up. Sub-populations are nevertheless visible within Han Chinese and correspond to 6 sub-groups: North, Northeast, Central, South, Southwest, and Southeast. Although connections among the groups are also evident, the northern Han Chinese have been influenced more by northern Chinese minorities, and the southern Han Chinese by southern neighbors.
Ancestry Informative Markers (AIMs): We screened nested AIMs panels for detecting population structure and controlling population stratification to improve association testing and population genetic analysis. Our analysis shows that the AIMs panel has sufficient power to detect and control population stratification in Han Chinese, which could significantly reduce false positive rates in both genome-wide association studies (GWAS) and candidate gene association studies (CGAS). We suggest that this AIMs panel be genotyped and used to control and correct for population stratification in the study design or data analysis of future association studies, especially CGAS, which is the most popular approach for validating previously reported genetic associations with diseases in the post-GWAS era. In addition, we provide a computational pipeline that lets users customize the reference populations when screening their own AIMs panels.
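As an illustration of what screening nested AIMs panels can look like, the following minimal sketch ranks SNVs by a simple two-population Fst computed from allele frequencies and then takes the top-ranked markers at several panel sizes, so that each smaller panel is contained in every larger one. This is not the PGG.Han pipeline: the Fst estimator, the equal weighting of the two populations, the function names, and the SNV identifiers and frequencies are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def fst_two_pops(p1: float, p2: float) -> float:
    """Fst for one biallelic SNV from allele frequencies in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled population
    if h_total == 0.0:
        return 0.0  # monomorphic site carries no ancestry information
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def rank_snvs(freqs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Sort SNV ids by descending Fst; ties are broken by id so the order is deterministic."""
    return sorted(freqs, key=lambda snv: (-fst_two_pops(*freqs[snv]), snv))


def nested_panels(ranked: List[str], sizes: Sequence[int]) -> Dict[int, List[str]]:
    """Build panels of increasing size from one ranking, so smaller panels nest in larger ones."""
    return {k: ranked[:k] for k in sorted(sizes)}


# Hypothetical allele frequencies in two reference populations (illustration only).
freqs = {
    "snv_a": (0.10, 0.90),
    "snv_b": (0.50, 0.50),
    "snv_c": (0.20, 0.60),
    "snv_d": (0.05, 0.95),
}
ranked = rank_snvs(freqs)
panels = nested_panels(ranked, sizes=[1, 2, 3])
print(ranked)
print(panels)
```

Changing the two reference populations changes the frequencies, and therefore the ranking and the resulting panels, which is the sense in which such a screen can be customized.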
Han-Chinese-Specific Reference Panel for Genotype Imputation: A reference panel facilitates the mapping of complex diseases in population-based association studies. This approach is well established and has demonstrated its power for populations of European ancestry, but such a panel has been lacking for Han Chinese, the largest ethnic group in East Asia and in the world. We developed a Han-Chinese-specific reference panel and an online server for genotype imputation to facilitate further association studies. Notably, we provide a population-structure-aware reference panel, with which users can customize the imputation reference by selecting samples from particular subpopulations to account for population stratification.
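The idea of customizing the imputation reference by subpopulation can be sketched as a simple filter over sample labels. The example below is only an illustration and does not reflect the interface of the PGG.Han imputation server: the function name, the sample identifiers, and the label assignments are hypothetical, and only the six sub-group names come from the description above.

```python
from typing import Dict, Iterable, List

# The six Han Chinese sub-groups named in the description above.
SUBGROUPS = ("North", "Northeast", "Central", "South", "Southwest", "Southeast")


def select_reference(sample_to_group: Dict[str, str], wanted: Iterable[str]) -> List[str]:
    """Return the sorted sample ids whose sub-group label is among the requested ones."""
    wanted_set = set(wanted)
    unknown = wanted_set - set(SUBGROUPS)
    if unknown:
        raise ValueError(f"unknown subgroup(s): {sorted(unknown)}")
    return sorted(s for s, g in sample_to_group.items() if g in wanted_set)


# Hypothetical sample labels, for illustration only.
samples = {
    "S001": "North",
    "S002": "South",
    "S003": "Southeast",
    "S004": "Northeast",
    "S005": "South",
}
southern_panel = select_reference(samples, ["South", "Southeast"])
print(southern_panel)
```

A study of southern Han Chinese samples could, under this sketch, restrict the reference to the southern sub-groups, while a nationwide study would request all six.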
A shared control panel for GWAS:
Similarly, a shared control panel, like a reference panel, facilitates the mapping of complex diseases in population-based association studies. Such panels are well established and have demonstrated their power for populations of European ancestry, but have been lacking for Han Chinese. Here, taking the sub-population structure into account, we constructed a structure-aware control panel, Han100K, for further population genetic analysis and association studies.
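One way a structure-aware control panel might be used is to draw controls whose sub-group composition matches that of the cases, so that cases and controls do not differ simply because of ancestry. The sketch below illustrates that idea under stated assumptions; it is not how Han100K was constructed, and the function name, the fixed controls-per-case ratio, and the sample identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def matched_controls(case_groups: List[str], control_pool: Dict[str, str], ratio: int = 1) -> List[str]:
    """Pick `ratio` controls per case from the same sub-group as that case."""
    # Index the shared control pool by sub-group, in a deterministic order.
    by_group = defaultdict(list)
    for sample_id in sorted(control_pool):
        by_group[control_pool[sample_id]].append(sample_id)

    chosen: List[str] = []
    for group, n_cases in sorted(Counter(case_groups).items()):
        needed = n_cases * ratio
        available = by_group[group]
        if len(available) < needed:
            raise ValueError(f"not enough controls in {group}: need {needed}, have {len(available)}")
        chosen.extend(available[:needed])
    return chosen


# Hypothetical case labels and control pool, for illustration only.
cases = ["North", "North", "South"]
pool = {"C1": "North", "C2": "South", "C3": "North", "C4": "Central", "C5": "South"}
controls = matched_controls(cases, pool)
print(controls)
```

In practice, matching is usually combined with statistical adjustment for residual stratification; the sketch shows only the selection step.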
We do not offer:
1) individual information, but geographical and linguistic information is available for regional groups (e.g., administrative divisions such as provinces, and genetic subpopulations);
2) raw data for export or analysis, but a number of summary reports are available in the database;
3) bulk download of genotype data, but allele frequency data can be retrieved from PGG.snv (https://www.pggsnv.org), another database also constructed and maintained by us.
The database was designed and created, and is maintained, by the Population Genomics Group (PGG) led by Dr. Shuhua Xu.

Shuhua Xu
Principal Investigator

Chao Zhang
Database

Yang Gao
Pre-processing

Xiaoji Wang
Imputation, Data Processing

Chang Liu
Ancestry Composition

Yuwen Pan
GWAS

Xixian Ma
Ancestry Composition

Xiaoxi Zhang
Data processing

Kai Yuan
Archaic analysis

Yuchen Wang
Ancestry Composition

Yan Lu
Y-DNA & Mt-DNA haplogroups
With great contributions from the IT team of the Big Data Center, led by Dr. Guoqing Zhang.
Guoqing Zhang
Conception & Design
Liyun Yuan
Framework & Module
Yunchao Ling
Framework & Module
Wei Ye
Backend development
Jiaqiang Qian
Jiaqiang Qian
Huidan Chang
Backend development
Ruifang Cao
Data Processing
Xiao Yang
Data Processing
Ling Ma
Web Development
Yuanhu Ju
Web Development
Long Dai
UI Design
Yuanyuan Tang
UI Design
The majority of the genomic data of Han Chinese were contributed by the Han100K Project led by Prof. Dr. Shuhua Xu. Most participants of the Han100K Project are from China, and some are from countries and regions where many Han Chinese reside, such as Singapore and the USA.
See below for the full list of participants of the Han100K Project (* indicates PI).
- CAS-MPG Partner Institute for Computational Biology (PICB) (Shuhua Xu*; Xiaoji Wang; Chao Zhang; Yuchen Wang; Chang Liu; Xixian Ma, ….);
- School of Life Science and Technology, ShanghaiTech University (Shuhua Xu*; Yang Gao; Xiaoxi Zhang);
- Shanghai Jiao Tong University (Yongyong Shi*; Jiawei Shen);
- Obstetrics and Gynecology Hospital, Fudan University (Feng Zhang);
- School of Life Sciences, Fudan University (Li Jin*);
- Paul-Gerson-Unna Research Group of Dermatogenomics, PICB (Sijia Wang);
- Nanjing Medical University School of Public Health (Zhibin Hu*);
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences (Xu Lin*);
- School of Public Health, Huazhong University of Science and Technology (Tangchun Wu*);
- Anhui Medical University (Liangdan Sun; Xuejun Zhang*);
- The First Affiliated Hospital of Xi'an Jiaotong University (Feng Zhu; Xiancang Ma*);
- Chinese Academy of Medical Sciences and Peking Union Medical College (Chen Wu*);
- Massachusetts General Hospital, United States of America (Hailiang Huang*);
- Broad Institute of Harvard and MIT, United States of America (Chia-Yen Chen);
- Department of Medicine, Harvard Medical School, United States of America;
- Human Genetics 2, Genome Institute of Singapore, Singapore (Jianjun Liu*);
- Research Division, Institute of Mental Health, Singapore (Max Lam);
- Icahn School of Medicine at Mount Sinai (René S. Kahn*);
- Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, 200233, People’s Republic of China (Chen Wu*);
- Chinese National Human Genome Center at Shanghai (Wei Huang*; Haifeng Wang).
Group: Population Genomics Group
Institution: CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS)
Address: 320 Yue Yang Road, Shanghai, China 200031
Email: pggadmin@picb.ac.cn
