Description

With water pollution and global climate change growing more severe, understanding the diversity, function, and distribution of aquatic microorganisms has become increasingly important. Marine microorganisms are critical to marine ecosystems and have a significant effect on the health of the marine environment. They also directly drive geochemical cycling and thereby influence the Earth's climate and atmosphere. Establishing a domestically developed, user-friendly, and high-quality hydrosphere microbiome platform can fill a research gap in this field and give researchers worldwide an easy way to share and analyze microbiome data. Our first platform focuses on an oceanic database that provides data, services, and support for related research.

Contact information

Yinzhao Wang E-mail: wyz@sjtu.edu.cn

Liuyang Li E-mail: liuyangli@sjtu.edu.cn

Yaoxun Hu E-mail: 2452177401@qq.com

Citing MASH-Ocean

If you use MASH-Ocean in your work, please consider citing its manuscript:
Mash-Ocean 1.0: Interactive Platform for Investigating Microbial Diversity, Function, and Biogeography with Marine Metagenomic Data. Yinzhao Wang#, Liuyang Li#, Qiang Li#, Yaoxun Hu, Wenjie Li, Zhile Wu, Hungchia Huang, Zhenbo Lv, Wan Liu, Ruifang Cao, Guoping Zhao*, Fengping Wang*, Guoqing Zhang*. — iMeta 3, no. 3 (2024): e201. https://doi.org/10.1002/imt2.201.

Version update

Software information

Software | Version | Description
kraken2 | 2.1.2 | An ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences
bracken | 2.6.1 | A highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample
krona | 2.8.1 | Visualization tool for relative abundances and confidences of metagenomic classifications
gtdbtk | 2.3.2 | Assigns taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database
eggnog-mapper | 2.1.8 | Fast functional annotation tool based on precomputed Orthologous Groups and phylogenies
FastSpar | 1.0.0 | A C++ implementation of the SparCC algorithm for rapid and scalable correlation estimation of compositional data
R | 4.1.0 | A programming language for statistical computing and graphics

Annotation database information

Database | Version | Description
GTDB | Release 214 | The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for 402,709 bacterial and archaeal genomes from domain to genus.
eggNOG | version 5.0 | eggNOG 5.0 is a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses
Kraken 2 and Bracken indexes | k2_pluspf_16gb_20220908 | The indexes for Kraken2 and Bracken using RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core, protozoa and fungi with DB capped at 16 GB
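
As an illustration of how the kraken2 and bracken versions listed above are typically chained with a shared index, the sketch below only builds the two command lines; it does not run them. The file names, thread count, read length, and the species-level setting are assumptions made for the example, not settings documented for MASH-Ocean.

```python
from typing import List, Tuple


def build_classification_commands(
    reads_1: str,
    reads_2: str,
    db_dir: str,
    out_prefix: str,
    read_length: int = 150,   # assumption: must match a k-mer distribution shipped with the index
    threads: int = 8,         # assumption: any positive integer
    level: str = "S",         # assumption: "S" asks Bracken for species-level estimates
) -> Tuple[List[str], List[str]]:
    """Return (kraken2_cmd, bracken_cmd) as argument lists for subprocess.run."""
    report = f"{out_prefix}.kreport"
    kraken_cmd = [
        "kraken2",
        "--db", db_dir,
        "--threads", str(threads),
        "--paired",
        "--report", report,                  # per-taxon summary consumed by Bracken
        "--output", f"{out_prefix}.kraken",  # per-read assignments
        reads_1, reads_2,
    ]
    bracken_cmd = [
        "bracken",
        "-d", db_dir,
        "-i", report,
        "-o", f"{out_prefix}.bracken",
        "-r", str(read_length),
        "-l", level,
    ]
    return kraken_cmd, bracken_cmd


if __name__ == "__main__":
    # Hypothetical paths, shown only to print the resulting commands.
    k_cmd, b_cmd = build_classification_commands(
        "sample_1.fastq", "sample_2.fastq", "k2_pluspf_16gb_20220908", "sample"
    )
    print(" ".join(k_cmd))
    print(" ".join(b_cmd))
```

The two lists can be passed to `subprocess.run(cmd, check=True)` in order, since Bracken reads the report that Kraken 2 writes.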

Python package/script information

Python Package | Version | Description
Python | 3.8.5 | A general-purpose programming language
gtdb_to_ncbi_majority_vote.py | N/A | A tool to transform GTDB taxonomy to NCBI taxonomy (https://github.com/Ecogenomics/GTDBTk/blob/master/scripts/gtdb_to_ncbi_majority_vote.py)
iDIRECT | N/A | Inference of Direct and Indirect Relationships with Effective Copula-based Transitivity (https://github.com/nxiao6gt/iDIRECT/)

Co-occurrence network information

Network Level | Permutations for FastSpar | Interaction strength cutoff for FastSpar | P value cutoff for FastSpar | Interaction strength cutoff for iDIRECT
Phylum | permutations=1000 | |r| > 0.1 | P < 0.01 | |r| > 0.1
Class | permutations=1000 | |r| > 0.1 | P < 0.01 | |r| > 0.1
Order | permutations=1000 | |r| > 0.1 | P < 0.01 | |r| > 0.1
Family | permutations=1000 | |r| > 0.2 | P < 0.01 | |r| > 0.1
Genus | permutations=1000 | |r| > 0.4 | P < 0.01 | |r| > 0.1
Species | permutations=1000 | |r| > 0.4 | P < 0.01 | |r| > 0.1
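
The cutoffs above can be read as a simple per-level edge filter. The following is a minimal sketch of that reading, assuming correlations and P values have already been exported as one record per taxon pair; the `Edge` type and function names are hypothetical and are not part of FastSpar or iDIRECT.

```python
from typing import Dict, Iterable, List, NamedTuple


class Edge(NamedTuple):
    """One taxon pair with its correlation and permutation P value (hypothetical layout)."""
    source: str
    target: str
    r: float
    p: float


# FastSpar interaction strength cutoffs per network level, copied from the table.
FASTSPAR_R_CUTOFF: Dict[str, float] = {
    "Phylum": 0.1, "Class": 0.1, "Order": 0.1,
    "Family": 0.2, "Genus": 0.4, "Species": 0.4,
}
FASTSPAR_P_CUTOFF = 0.01   # same at every level
IDIRECT_R_CUTOFF = 0.1     # same at every level


def filter_fastspar_edges(edges: Iterable[Edge], level: str) -> List[Edge]:
    """Keep edges with |r| above the level's cutoff and P below 0.01 (both strict)."""
    cutoff = FASTSPAR_R_CUTOFF[level]
    return [e for e in edges if abs(e.r) > cutoff and e.p < FASTSPAR_P_CUTOFF]


def filter_idirect_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Keep edges whose direct interaction strength exceeds the iDIRECT cutoff."""
    return [e for e in edges if abs(e.r) > IDIRECT_R_CUTOFF]


if __name__ == "__main__":
    demo = [
        Edge("taxon_a", "taxon_b", 0.3, 0.001),
        Edge("taxon_a", "taxon_c", -0.5, 0.001),
        Edge("taxon_b", "taxon_c", 0.5, 0.01),
    ]
    print(len(filter_fastspar_edges(demo, "Family")))
    print(len(filter_fastspar_edges(demo, "Genus")))
```

Negative correlations are kept when their magnitude passes the cutoff, because the table states the thresholds on |r|.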

Sample collection information

Metagenomic data for MASH were collected from NCBI in September 2020. We used 73 keywords that encompass different types of biomes. Records labeled "enriched", "metatranscription", "amplicon", or "DOE Joint Genome Institute (JGI)", as well as files with a very small size (below 200 MB), were excluded to avoid data bias and authorization problems. In total, we obtained 2,147 samples from different environments for the MASH database.
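
The exclusion rules described above can be sketched as a small predicate. This is an illustration only: the case-insensitive substring match, the function name, and the idea that each record carries a free-text label and a size in MB are assumptions, not the actual collection script.

```python
from typing import Tuple

# Terms quoted in the text above; a record whose label contains any of them is dropped.
EXCLUDED_LABEL_TERMS: Tuple[str, ...] = (
    "enriched",
    "metatranscription",
    "amplicon",
    "DOE Joint Genome Institute (JGI)",
)
MIN_SIZE_MB = 200  # files below this size are excluded


def keep_sample(label: str, size_mb: float) -> bool:
    """Return True if a record passes both the label and the file-size rules."""
    lowered = label.lower()
    # Assumption: matching is a case-insensitive substring test.
    if any(term.lower() in lowered for term in EXCLUDED_LABEL_TERMS):
        return False
    return size_mb >= MIN_SIZE_MB


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    print(keep_sample("marine sediment metagenome", 850.0))
    print(keep_sample("seawater amplicon survey", 850.0))
    print(keep_sample("marine sediment metagenome", 120.0))
```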
