Description
With the increasing severity of aquatic environmental pollution and global climate change, understanding the diversity, function, and distribution of aquatic microorganisms has become increasingly important. Marine microorganisms are critical to marine ecosystems and have a significant effect on the health of the marine environment. They can also directly drive geochemical cycling and thereby influence Earth's climate and atmosphere. Establishing a user-friendly, high-quality domestic hydrosphere microbiome platform can fill the research gap in this field and give researchers worldwide an easy way to share and analyze microbiome data. Our first platform focuses on an ocean database that provides data, services, and support for related research.
Contact information
Yinzhao Wang E-mail: wyz@sjtu.edu.cn
Liuyang Li E-mail: liuyangli@sjtu.edu.cn
Yaoxun Hu E-mail: 2452177401@qq.com
Citing MASH-Ocean
If you use MASH-Ocean in your work, please consider citing its manuscript:
MASH-Ocean 1.0: Interactive Platform for Investigating Microbial Diversity, Function, and Biogeography with Marine Metagenomic Data. Yinzhao Wang#, Liuyang Li#, Qiang Li#, Yaoxun Hu, Wenjie Li, Zhile Wu, Hungchia Huang, Zhenbo Lv, Wan Liu, Ruifang Cao, Guoping Zhao*, Fengping Wang*, Guoqing Zhang*. iMeta 3, no. 3 (2024): e201.
https://doi.org/10.1002/imt2.201.
Version update
Software information
Software | Version | Description |
---|---|---|
kraken2 | 2.1.2 | An ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences |
bracken | 2.6.1 | A highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample |
krona | 2.8.1 | Visualization tool for relative abundances and confidences of metagenomic classifications |
gtdbtk | 2.3.2 | Assigns taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB) |
eggnog-mapper | 2.1.8 | Fast functional annotation tool based on precomputed Orthologous Groups and phylogenies |
FastSpar | 1.0.0 | FastSpar is a C++ implementation of the SparCC algorithm for rapid and scalable correlation estimation of compositional data |
R | 4.1.0 | A programming language for statistical computing and graphics |
Annotation database information
Database | Version | Description |
---|---|---|
GTDB | Release 214 | The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for 402,709 bacterial and archaeal genomes from domain to genus. |
eggNOG | version 5.0 | eggNOG 5.0 is a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses |
Kraken 2 and Bracken indexes | k2_pluspf_16gb_20220908 | The indexes for Kraken 2 and Bracken built from RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core, protozoa and fungi, with the database capped at 16 GB |
Python package/script information
Python Package | Version | Description |
---|---|---|
Python | 3.8.5 | A general-purpose programming language |
gtdb_to_ncbi_majority_vote.py | N/A | A tool to transform GTDB taxonomy to NCBI taxonomy (https://github.com/Ecogenomics/GTDBTk/blob/master/scripts/gtdb_to_ncbi_majority_vote.py) |
iDIRECT | N/A | Inference of Direct and Indirect Relationships with Effective Copula-based Transitivity (https://github.com/nxiao6gt/iDIRECT/) |
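The GTDB-to-NCBI conversion listed above relies on a majority vote. The sketch below illustrates only that idea, assuming a hypothetical list of reference genomes given as (GTDB taxon, NCBI taxon) pairs at the same rank: the NCBI label carried by most reference genomes in a GTDB taxon is chosen. It is not the code of `gtdb_to_ncbi_majority_vote.py`, which works on full GTDB-Tk outputs and metadata files and differs in detail; the function name and taxon names are hypothetical.

```python
from collections import Counter


def majority_vote(gtdb_taxon, reference_genomes):
    """Return the most common NCBI taxon among reference genomes in gtdb_taxon.

    reference_genomes: iterable of (gtdb_taxon, ncbi_taxon) pairs at one rank
    (hypothetical input layout). Returns None if no reference genome in the
    GTDB taxon has an NCBI label. Ties go to the label encountered first,
    because Counter.most_common orders equal counts by first occurrence.
    """
    votes = Counter(
        ncbi for gtdb, ncbi in reference_genomes if gtdb == gtdb_taxon and ncbi
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical reference genomes: two of three "g__Alpha_A" genomes are
# labelled "g__Alpha" in NCBI, so the vote picks "g__Alpha".
refs = [
    ("g__Alpha_A", "g__Alpha"),
    ("g__Alpha_A", "g__Alpha"),
    ("g__Alpha_A", "g__Beta"),
    ("g__Gamma", "g__Gamma"),
]
print(majority_vote("g__Alpha_A", refs))
```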
Co-occurrence network information
Network level | Permutations for FastSpar | Interaction strength cutoff for FastSpar | P-value cutoff for FastSpar | Interaction strength cutoff for iDIRECT |
---|---|---|---|---|
Phylum | permutations=1000 | \|r\| > 0.1 | P < 0.01 | \|r\| > 0.1 |
Class | permutations=1000 | \|r\| > 0.1 | P < 0.01 | \|r\| > 0.1 |
Order | permutations=1000 | \|r\| > 0.1 | P < 0.01 | \|r\| > 0.1 |
Family | permutations=1000 | \|r\| > 0.2 | P < 0.01 | \|r\| > 0.1 |
Genus | permutations=1000 | \|r\| > 0.4 | P < 0.01 | \|r\| > 0.1 |
Species | permutations=1000 | \|r\| > 0.4 | P < 0.01 | \|r\| > 0.1 |
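The cutoffs in the network table can be read as a two-stage edge filter: a taxon pair is kept when its FastSpar correlation exceeds the level-specific |r| cutoff with P < 0.01, and iDIRECT strengths are then filtered at |r| > 0.1. The sketch below is a minimal illustration, assuming the FastSpar correlation and P-value matrices are already loaded as nested lists in the same taxon order (file parsing is omitted). The function names and the (taxon_a, taxon_b, strength) tuple layout are hypothetical, not MASH-Ocean's pipeline code.

```python
# Hypothetical helpers illustrating the network cutoffs; not MASH-Ocean's code.

# FastSpar |r| cutoffs per taxonomic level (|r| must exceed these values).
FASTSPAR_R_CUTOFF = {
    "phylum": 0.1,
    "class": 0.1,
    "order": 0.1,
    "family": 0.2,
    "genus": 0.4,
    "species": 0.4,
}
FASTSPAR_P_CUTOFF = 0.01  # permutation P value (1,000 permutations)
IDIRECT_CUTOFF = 0.1      # |strength| cutoff after iDIRECT, same at all levels


def fastspar_edges(names, corr, pvals, level):
    """Return (taxon_a, taxon_b, r) for pairs passing the FastSpar cutoffs.

    names: taxon labels; corr and pvals: square matrices (nested lists) in
    the same order as names. Only the upper triangle is scanned because both
    matrices are symmetric.
    """
    r_cutoff = FASTSPAR_R_CUTOFF[level.lower()]
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r, p = corr[i][j], pvals[i][j]
            if abs(r) > r_cutoff and p < FASTSPAR_P_CUTOFF:
                edges.append((names[i], names[j], r))
    return edges


def direct_edges(edges):
    """Keep (taxon_a, taxon_b, strength) edges whose iDIRECT strength passes."""
    return [e for e in edges if abs(e[2]) > IDIRECT_CUTOFF]


names = ["A", "B", "C"]
corr = [[1.0, 0.5, 0.15], [0.5, 1.0, -0.45], [0.15, -0.45, 1.0]]
pvals = [[0.0, 0.001, 0.005], [0.001, 0.0, 0.002], [0.005, 0.002, 0.0]]
print(fastspar_edges(names, corr, pvals, "Genus"))
```

At the genus level the A-C pair (r = 0.15) is dropped, while at the phylum level all three pairs pass. In practice, `direct_edges` would receive iDIRECT's direct-interaction strengths rather than FastSpar's raw correlations.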
Sample collection information
Metagenomic data for MASH were collected from NCBI in September 2020 using 73 keywords covering different types of biomes. To avoid data bias and authorization problems, we excluded samples whose labels contained "enriched", "metatranscription", "amplicon" or "DOE Joint Genome Institute (JGI)", as well as files smaller than 200 MB. In total, we obtained 2,147 samples from different environments for the MASH database.
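The exclusion rules above amount to a simple record filter. The sketch below assumes a hypothetical `SampleRecord` with a free-text label and a file size already expressed in MB, and uses case-insensitive substring matching; the actual matching rules used for MASH are not described here, so treat this only as an illustration.

```python
from dataclasses import dataclass

# Label terms that cause exclusion (taken from the text above).
EXCLUDED_TERMS = (
    "enriched",
    "metatranscription",
    "amplicon",
    "DOE Joint Genome Institute (JGI)",
)
MIN_SIZE_MB = 200  # files below 200 MB are excluded


@dataclass
class SampleRecord:
    """Hypothetical sample record; field names are assumptions."""
    sample_id: str
    label: str
    size_mb: float


def keep_sample(rec):
    """Return True if the record passes both the label and size filters."""
    label = rec.label.lower()
    # Assumption: case-insensitive substring match on the label.
    if any(term.lower() in label for term in EXCLUDED_TERMS):
        return False
    return rec.size_mb >= MIN_SIZE_MB


def filter_samples(records):
    """Keep only records that pass keep_sample."""
    return [r for r in records if keep_sample(r)]


records = [
    SampleRecord("sample_1", "Marine surface water metagenome", 850.0),
    SampleRecord("sample_2", "Coastal amplicon survey", 900.0),
    SampleRecord("sample_3", "Deep-sea sediment metagenome", 120.0),
    SampleRecord("sample_4", "Enriched culture metagenome", 500.0),
]
print([r.sample_id for r in filter_samples(records)])
```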