eLMSG (an eLibrary of Microbial Systematics and Genomics) is a web database that integrates microbial systematics, genomics, and phenomics (phenotypes relevant to polyphasic taxonomy). Its taxonomic system comprises all validly published and some effectively published taxa from phylum to genus. At the species rank, the current version includes only the type species of each genus; all other validly published species will be added gradually in future versions. For each taxon, eLMSG presents the Latin name, taxon ID (NCBI Taxonomy), etymology, rank, lineage, dates of effective and/or valid publication, taxonomic description, type (for species, a list of type strains), and references (covering the entire history of the taxon, such as reclassifications and emendations). In addition, species-rank taxa include 16S rRNA gene sequences (as a taxonomic marker) and/or genomic sequences (from the NCBI Assembly and JGI IMG databases). All publicly available genomic data for each type species, from both type and non-type strains, were collected and re-annotated using the same pipeline. All non-type strains were identified using average nucleotide identity (ANI) or 16S rRNA comparison. Furthermore, pan-genomic data at the species rank were computed based on gene family analysis. Finally, for all type species, taxonomic phenotypic data were extracted from the original publications and/or Bergey's Manual of Systematics of Archaea and Bacteria. The phenotypic data (about 60 traits) fall into four categories: morphology, physiology, biochemistry, and enzymology, and are organized in eLMSG as searchable and analyzable data records. eLMSG is a comprehensive web platform that will contribute to microbial systematics, comparative genomics, and evolutionary biology, and it is also an ideal reference database for microbiome research.
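As a rough illustration of what a gene-family-based pan-genome computation can look like, the sketch below partitions gene families into core, accessory, and unique sets from per-strain presence data. It is not the eLMSG pipeline: the function name `partition_pangenome`, the strain names, and the family IDs are all hypothetical, and the three-way split is one common convention assumed here for illustration.

```python
from collections import defaultdict


def partition_pangenome(strain_families):
    """Split gene families into core, accessory, and unique sets.

    strain_families maps a strain name to the set of gene family IDs
    found in that strain's annotated genome (hypothetical input format).
    """
    n_strains = len(strain_families)

    # Count in how many strains each gene family occurs.
    counts = defaultdict(int)
    for families in strain_families.values():
        for fam in set(families):
            counts[fam] += 1

    # Core: present in every strain. Unique: present in exactly one strain
    # (only meaningful when more than one strain is compared).
    core = {f for f, c in counts.items() if c == n_strains}
    unique = {f for f, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy data: three strains of one species, five gene families (all made up).
example = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
core, accessory, unique = partition_pangenome(example)
```

With the toy input, `fam1` is shared by all three strains, `fam2` by two, and the remaining families occur in a single strain each. A real analysis would first have to cluster annotated genes into families, which this sketch takes as given.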
The eLMSG system is part of the National Omics Data Encyclopedia (NODE), a new-generation platform for bio-omic data interchange and management.
NODE consists of three subsystems (data sharing, data submission, and data management) that serve the needs of public users, data owners, and information infrastructure use and management. It has established a stratified system of data management (project, experiment, sample, run, data, and analysis result) and a graded mechanism for data sharing (private, limited access, and open access). NODE stores raw sequence data from "next-generation" sequencing technologies, including 454, Ion Torrent, Illumina, SOLiD, Helicos, and Complete Genomics. NODE is a primary archive of high-throughput sequencing data and is part of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. Data submitted to any of the three organizations are shared among them.
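The stratified data levels and graded sharing mechanism described for NODE can be pictured with a small data model. The sketch below is only an assumption-laden illustration, not NODE's actual schema or API: the `Record` class, its fields, the accession strings, and the rule that "limited access" means an owner-approved user list are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

# The six management levels named in the text.
LEVELS = ("project", "experiment", "sample", "run", "data", "analysis result")


class Access(Enum):
    """The three sharing grades named in the text."""
    PRIVATE = "private"
    LIMITED = "limited access"
    OPEN = "open access"


@dataclass
class Record:
    level: str                # one of LEVELS
    accession: str            # hypothetical identifier format
    owner: str
    access: Access
    approved_users: frozenset = frozenset()  # assumed meaning of "limited"

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def can_view(record, user):
    """Return True if `user` may see `record` under the assumed rules."""
    if record.access is Access.OPEN or user == record.owner:
        return True
    if record.access is Access.LIMITED:
        return user in record.approved_users
    return False  # private records are visible to the owner only


# Usage with made-up accessions and user names.
run = Record("run", "RUN-0001", "alice", Access.LIMITED, frozenset({"bob"}))
project = Record("project", "PRJ-0001", "alice", Access.OPEN)
```

Under these assumptions, `can_view(run, "bob")` is true because `bob` is on the approved list, while an unlisted user can see only the open-access project.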