MPA

Overview of MPA
1. Overview
MPA (Mycobacteriaceae Phenome Atlas, https://www.biosino.org/mpa/) is a Mycobacteriaceae phenome database that integrates phenomic data of Mycobacteriaceae strains obtained through literature mining, third-party database integration, and bioinformatics annotation. The phenotypes of Mycobacteriaceae are inferred from the available phenomic data. A total of 82 microbial phenotypic traits were adopted as data elements of the microbial phenome, comprising 5 categories and 20 subcategories of polyphasic phenotypes and 3 categories and 8 subcategories of functional phenotypes, all of which complement the existing data standards for microbial phenotypes. The phenotypes can be searched and compared on the MPA website. A network analysis of MPA topological data revealed co-evolution between Mycobacterium tuberculosis and some important phenotypes, such as virulence factors, and also uncovered potential pathogenicity-associated phenotypes. MPA may provide novel insights into the pathogenicity mechanisms of Mycobacteriaceae.
2. Summary of data elements in MPA.
The sunburst chart shows the three levels of data elements in MPA. Levels I and II give the names of the categories, and Level III gives the names of the 28 subcategories and the number of phenotypes each contains. Level I includes “Polyphasic phenotypes” and “Functional phenotypes.” Level II contains “Ecology,” “Morphology,” “Physiology,” “Biochemistry,” “Enzymology,” “Gene-related phenotypes,” “Protein-related phenotypes,” and “Compound-related phenotypes.” Level III consists of “Geography,” “Biome,” “Sampling,” “Enrichment,” “Cell,” “Colony morphology,” “Hemolysis,” “C/N source,” “Metabolite production,” “Attributes,” “Temperature,” “pH,” “Halophily,” “Tolerance,” “Bile-susceptible,” “Antibiotica,” “Fatty acids,” “Pathogenicity,” “Biochemistry,” “Enzymology,” “GO terms,” “AMR,” “Virulence factors,” “Amino acid mutations,” “Orthologous groups,” “KEGG metabolites,” “MetaCyc metabolites,” and “smBGCs.” Clicking any level of the sunburst chart takes the user to the Browse page, which lists the strains with the corresponding phenotypes.
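The functional branch of this hierarchy can be written down as a nested mapping. The following is a minimal sketch in Python that uses only the category and subcategory names given in this guide; the data structure and the function name are illustrative assumptions, not MPA's internal format.

```python
# Illustrative only: the functional-phenotype branch of the MPA data elements,
# written as Level I -> Level II -> Level III. The names come from this guide;
# the nested-dict layout is an assumption, not MPA's internal representation.
FUNCTIONAL_PHENOTYPES = {
    "Functional phenotypes": {
        "Gene-related phenotypes": ["GO terms", "AMR"],
        "Protein-related phenotypes": [
            "Virulence factors",
            "Amino acid mutations",
            "Orthologous groups",
        ],
        "Compound-related phenotypes": [
            "KEGG metabolites",
            "MetaCyc metabolites",
            "smBGCs",
        ],
    }
}


def count_levels(tree):
    """Return (number of Level II categories, number of Level III subcategories)."""
    categories = 0
    subcategories = 0
    for level2 in tree.values():
        categories += len(level2)
        subcategories += sum(len(subs) for subs in level2.values())
    return categories, subcategories


print(count_levels(FUNCTIONAL_PHENOTYPES))  # (3, 8)
```

The result matches the 3 categories and 8 subcategories of functional phenotypes stated in the overview.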

Browse Database
The Browse page provides three filters for finding strains of interest: All, Culture-dependent Mycobacteriaceae, and Culture-independent Mycobacteriaceae. Users can view the details of a strain by clicking its name.

Search Database
The MPA server provides both simple search and advanced search. Simple search supports fuzzy queries by species name, genome ID, or compound name, while advanced search supports larger, more sophisticated queries that combine up to 23 terms.
The “Advanced Search” page allows users to search for phenotypes across up to 23 fields combined with “AND”. Some phenotypic traits, such as “Ecosystem Category,” “Spore Formation,” and “Hemolysis Ability,” have a drop-down menu from which users select the phenotype of interest. The remaining phenotypes, such as “Country/Region,” “Cell Shape,” and “Enzyme,” support fuzzy search.

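The search behavior described above can be illustrated with a small local filter. This is a sketch under stated assumptions, not the MPA server's implementation: the records are hypothetical, fuzzy search is approximated by a case-insensitive substring match, and drop-down fields are assumed to require an exact match.

```python
# Illustrative sketch of the Advanced Search semantics described in this guide.
# Records and the matching rules are assumptions; MPA's server-side logic is
# not documented here.
MAX_TERMS = 23  # maximum number of fields that can be combined

# Fields named in this guide as having drop-down menus (assumed exact match).
DROPDOWN_FIELDS = {"Ecosystem Category", "Spore Formation", "Hemolysis Ability"}


def matches(record, field, value):
    """Exact match for drop-down fields, case-insensitive substring otherwise."""
    actual = str(record.get(field, ""))
    if field in DROPDOWN_FIELDS:
        return actual == value
    return value.lower() in actual.lower()


def advanced_search(records, query):
    """Return records satisfying every (field, value) pair in query (logical AND)."""
    if len(query) > MAX_TERMS:
        raise ValueError(f"at most {MAX_TERMS} terms can be combined")
    return [r for r in records if all(matches(r, f, v) for f, v in query.items())]


# Hypothetical records, for illustration only.
strains = [
    {"name": "strain A", "Spore Formation": "No", "Cell Shape": "Rod-shaped"},
    {"name": "strain B", "Spore Formation": "No", "Cell Shape": "Coccus"},
    {"name": "strain C", "Spore Formation": "Yes", "Cell Shape": "rod"},
]

hits = advanced_search(strains, {"Spore Formation": "No", "Cell Shape": "rod"})
print([s["name"] for s in hits])  # ['strain A']
```

Only strain A satisfies both terms: strain B fails the fuzzy “Cell Shape” term, and strain C fails the exact “Spore Formation” term.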
Strain Detail View
The strain detail page has seven sections: Overview, Polyphasic phenotypes, Gene-related phenotypes, Protein-related phenotypes, Compound-related phenotypes, Gene, and References.
The Overview section provides information such as type strain, assembly accession, assembly level, NCBI lineage, GTDB lineage, and statistics of polyphasic and functional phenotypes. Users can go to the corresponding strain in BacDive or PATRIC by clicking the database name under cross links.
MPA shows the statistics of both polyphasic phenotypes and functional phenotypes in the Overview section. The sunburst chart for polyphasic phenotypes shows the proportion of each phenotype for the strain. Hovering the mouse over a phenotype of interest displays the count of that phenotype.
The statistics of functional phenotypes are shown as a bar chart. As with polyphasic phenotypes, the count of a phenotype is displayed when the user hovers the mouse over a bar. Users can select or unselect a phenotype by clicking it in the legend. For example, if GO Terms is unselected, GO Terms turns gray in the legend and its count is no longer displayed in the bar chart. In addition, users can choose a specific count range of phenotypes.
Phenotype categories such as Ecology, Morphology, Physiology, Biochemistry, and Enzymology are included in the Polyphasic phenotypes section of a strain. If a phenotype was curated from the literature, its DOI is provided, and users can go to the original publication by clicking the DOI.
Functional phenotypes include the Gene-related phenotypes, Protein-related phenotypes, and Compound-related phenotypes sections. Almost every key phenotype is hyperlinked, so users can reach more detailed information quickly by clicking it.
The Gene-related phenotypes section includes Gene Ontology (GO) annotations and antimicrobial resistance (AMR). Users can go to any of the three categories of GO terms by clicking the legend. For example, if biological process is chosen, 11 subcategories of biological process are displayed in a tree chart. Clicking the name of a subcategory shows the number of GO terms in it, and the GO terms belonging to that subcategory are listed in the table below. Users can go to the related GO term in the Gene Ontology database and the related gene in the NCBI Gene database by clicking the GO term name and the gene symbol, respectively.
The tab chart displays all AMR mechanisms of the strain. Users can view the drug class, resistance gene, and match level by clicking each tab. Users can go to the drug class page in the CARD database and the gene page in the NCBI Gene database by clicking the drug class name and the gene symbol, respectively.
The Protein-related phenotypes section contains virulence factors, amino acid mutations, and orthologous groups. The tab chart displays all virulence factor classes of the strain. Users can view the virulence factor name, related genes, related functions, and characteristics of all virulence factors in a class by clicking its tab. Users can go to the virulence factor page in the VFDB database or the Victor database by clicking the virulence factor name. Alternatively, for virulence factors that have not been curated, users can go to the related literature in PubMed. In addition, users can go to the virulence factor page in the VFDB database or the Victor database by clicking the related gene of the virulence factor.
The heatmap displays the amino acid mutations and the count of each mutation in the strain. Users can show a specific count of mutations in the strain by clicking the legend item “Amino acid mutations” or “Number of Amino acid mutations>1”. Clicking a spot in the heatmap shows the type of amino acid mutation and its count in the strain. In addition, the detailed information for this mutation is displayed in the table below. Users can go to the protein page in the UniProt database and the gene page in the NCBI Gene database by clicking the mutation name and the gene symbol, respectively.
The heatmap displays the classes of orthologous groups and the protein count of each group in the strain. Users can show a specific protein count range of orthologous groups in the strain by clicking the legend. Clicking a spot in the heatmap shows the type of orthologous group and its protein count in the strain. In addition, the detailed information for this orthologous group is displayed in the table below. Users can go to the related page in the UniProt, Pfam, and TIGRFAMs databases by clicking the protein name, Pfam domain name, and TIGRFAMs domain name, respectively.
The Compound-related phenotypes section includes KEGG metabolites, MetaCyc metabolites, and secondary metabolite biosynthetic gene clusters (smBGCs). KEGG metabolites and MetaCyc metabolites both use a heatmap to display each compound and the number of related pathways. Users can show different ranges of pathway counts in the strain by clicking the legend. Clicking a spot in the heatmap shows the name of the metabolite and its pathway count in the strain. In addition, the detailed information for this metabolite is displayed in the table below. Users can go to the related metabolite page and pathway page of each database by clicking the metabolite name and the pathway name, respectively.
The tab chart displays all smBGC classes of the strain. Users can view all smBGCs in a class by clicking its tab. Users can go to the smBGC page in the antiSMASH database by clicking the smBGC name.
The Gene section lists the genes present in the strain, with gene-related information including gene symbol, orientation, and description. Users can find a gene of interest with the simple search, which supports fuzzy queries on gene symbol. In addition, users can reach the details of a gene in the NCBI Gene database by clicking its gene symbol.
The References section lists the publications from which the curated phenotypes of the strain were taken, with information including author, title, journal, and publication time. Users can reach the page of the original publication by clicking its entry.

Phenotype Comparison
The Phenotype Comparison page supports comparing the phenotypes of up to four strains in one table. MPA provides drop-down menus for genus, species, subspecies/variant, and strain. Using the drop-down menus, users can choose strains of interest and compare the differences in polyphasic and functional phenotypes among them.
A total of 41 phenotypes from the polyphasic and functional phenotypes are displayed one by one. The resulting display supports “Hide empty items”, “Hide same items”, and “Show only selected items”. If “Hide empty items” is selected, phenotypes that do not exist in any of the compared strains are hidden. If “Hide same items” is selected, phenotypes that are the same among all compared strains are hidden. If “Show only selected items” is selected, users can expand and review the phenotypes of interest.
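The effect of the two hide options can be sketched as a filter over a comparison table. The following Python example is illustrative only: strain names, phenotype names, and values are hypothetical, `None` stands for a phenotype with no data, and the function is not MPA's implementation.

```python
# Illustrative sketch of the comparison display options described in this guide.
# All names and values are hypothetical; None means "no data for this strain".
MAX_STRAINS = 4  # the comparison table holds at most four strains


def compare(strains, hide_empty=False, hide_same=False):
    """Build {phenotype: [value per strain]} and apply the two hide options."""
    if len(strains) > MAX_STRAINS:
        raise ValueError(f"at most {MAX_STRAINS} strains can be compared")
    # Collect phenotype names in first-seen order across all strains.
    phenotypes = []
    for data in strains.values():
        for name in data:
            if name not in phenotypes:
                phenotypes.append(name)
    table = {}
    for name in phenotypes:
        values = [data.get(name) for data in strains.values()]
        if hide_empty and all(v is None for v in values):
            continue  # no compared strain has this phenotype
        # Assumption of this sketch: a row that is empty everywhere also
        # counts as "same" and is hidden by hide_same.
        if hide_same and len(set(values)) == 1:
            continue  # identical in every compared strain
        table[name] = values
    return table


selected = {
    "strain A": {"Cell shape": "rod", "Hemolysis": None, "Temperature": "37"},
    "strain B": {"Cell shape": "rod", "Hemolysis": None, "Temperature": "30"},
}

print(compare(selected, hide_empty=True))
# {'Cell shape': ['rod', 'rod'], 'Temperature': ['37', '30']}
print(compare(selected, hide_empty=True, hide_same=True))
# {'Temperature': ['37', '30']}
```

With both options enabled, only the rows that actually differ between the selected strains remain, which is the typical use of the comparison table.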

Co-evolution
The co-evolution analysis results for Mycobacterium tuberculosis and virulence phenotypes are provided on the Co-evolution page. Individual links and a bulk download for the TDA network enrichment patterns of M. tuberculosis and 36 virulence factors (VFs) are also supplied.

Download
The Download page provides genomes, protein-coding sequences, amino acid sequences, and supplementary materials of Mycobacteriaceae strains, as well as variant information of Mycobacterium tuberculosis in MPA. Users can download the data of interest by clicking the download button.