xMetaVar Tutorial

A step-by-step guide to analyzing your metagenomic variant data from start to finish.

Introduction

Welcome to the xMetaVar tutorial. This guide walks you through the full workflow — from uploading your raw data to exploring interactive visualizations. You’ll learn the required file formats, how to set analysis parameters, and how to interpret results to uncover meaningful insights.

Variant Calling

In this section, you'll perform variant calling — detecting genetic variations from your uploaded data. We’ll guide you through uploading files, selecting tools and parameters, running the analysis, and reviewing the results.

Prerequisites

Before you begin, please prepare your genomic sequencing files in FASTQ format (.fastq.gz, .fq.gz, .fastq, or .fq).
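
If you want to check a folder of files before uploading, the extension rule above can be sketched in a few lines of Python. This is an illustrative helper, not part of xMetaVar; the function names are hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions listed in the prerequisites above.
ACCEPTED_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def is_accepted_fastq(filename: str) -> bool:
    """Return True if the file name ends with an accepted FASTQ suffix."""
    # Assumption: the check is case-insensitive.
    name = Path(filename).name.lower()
    return name.endswith(ACCEPTED_SUFFIXES)


def split_by_format(filenames: list) -> tuple:
    """Partition file names into (accepted, rejected), preserving order."""
    accepted = [f for f in filenames if is_accepted_fastq(f)]
    rejected = [f for f in filenames if not is_accepted_fastq(f)]
    return accepted, rejected
```

Calling split_by_format on your file list before uploading shows at a glance which files would need converting or renaming.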

Step 1: Upload Data

Begin by providing your sequencing data and setting up initial quality control.

Quick Start Guide

Navigate to Variant Calling and select the sequencing type (e.g., Paired-End).

Upload your raw sequencing data (FASTQ files).

Enter the corresponding Sample IDs in the text area, one per line.

Choose whether to Skip QC. If not skipped, configure Trimmomatic parameters.
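
To make the last two steps concrete, here is a minimal Python sketch of how a one-per-line Sample ID list could be parsed and how a paired-end Trimmomatic command line is typically assembled. The function names, output file names, and default values are assumptions for illustration (the defaults mirror the example in the Trimmomatic manual); the tutorial does not state how xMetaVar invokes Trimmomatic internally.

```python
def parse_sample_ids(text: str) -> list:
    """Split text-area content into sample IDs, one per line, skipping blanks."""
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    duplicates = sorted({s for s in ids if ids.count(s) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample IDs: {duplicates}")
    return ids


def trimmomatic_pe_command(sample_id, r1, r2, *, leading=3, trailing=3,
                           window=(4, 15), minlen=36):
    """Build a Trimmomatic paired-end argument list (illustrative defaults)."""
    # Assumption: a `trimmomatic` wrapper script is on PATH; some installs
    # use `java -jar trimmomatic.jar` instead.
    return [
        "trimmomatic", "PE", r1, r2,
        f"{sample_id}_R1.paired.fq.gz", f"{sample_id}_R1.unpaired.fq.gz",
        f"{sample_id}_R2.paired.fq.gz", f"{sample_id}_R2.unpaired.fq.gz",
        f"LEADING:{leading}", f"TRAILING:{trailing}",
        f"SLIDINGWINDOW:{window[0]}:{window[1]}", f"MINLEN:{minlen}",
    ]
```

LEADING, TRAILING, SLIDINGWINDOW, and MINLEN are standard Trimmomatic trimming steps; which of them the upload form exposes may differ.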

Detailed Information & Parameters

Step 2: Configure Tools

Configure the analysis pipeline by selecting the desired variant types and their corresponding detection tools.

Quick Start Guide

Select Variant Types

Begin by choosing the types of variants you wish to identify. You can select multiple options.

Configure SNP Calling

For SNP identification, select one of the available tools.

Configure INDEL Calling

QuickVariants is used for INDEL detection. This tool is selected by default.

Configure Structural Variant (SV) Calling

Select the specific types of Structural Variants to call. Each type is identified by a specialized tool.
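
The selection rules in this step can be summarized as a small validation sketch. Everything here is hypothetical: the class and field names are invented for illustration, and listing MIDAS and GT-Pro as the SNP options is an assumption, since the form itself defines the available choices.

```python
from dataclasses import dataclass, field
from typing import Optional

VARIANT_TYPES = ("SNP", "INDEL", "SV")
SNP_TOOLS = ("MIDAS", "GT-Pro")        # assumed SNP options
DEFAULT_INDEL_TOOL = "QuickVariants"   # default stated in this tutorial


@dataclass
class ToolSelection:
    variant_types: set
    snp_tool: Optional[str] = None
    indel_tool: str = DEFAULT_INDEL_TOOL
    sv_types: set = field(default_factory=set)

    def problems(self) -> list:
        """Return human-readable issues; an empty list means the choice is valid."""
        issues = []
        if not self.variant_types:
            issues.append("select at least one variant type")
        unknown = set(self.variant_types) - set(VARIANT_TYPES)
        if unknown:
            issues.append(f"unknown variant types: {sorted(unknown)}")
        if "SNP" in self.variant_types and self.snp_tool not in SNP_TOOLS:
            issues.append("SNP calling needs one tool from SNP_TOOLS")
        if "SV" in self.variant_types and not self.sv_types:
            issues.append("select at least one SV type to call")
        return issues
```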

About the Analysis Tools

To learn more about the specific variant calling tools used in this pipeline, such as MIDAS, GT-Pro, SGVFinder, and others, please visit the About Page for detailed descriptions and methodologies.

Step 3: Monitor and Review Results

After you submit a job, you are redirected automatically to its results page, where you can monitor progress in real time.

On the Results Page

Automatic Redirect

Upon successful job submission, the page will automatically navigate to a unique results page for your analysis.

Real-time Status Monitoring

Track the progress of your job as it moves through the pipeline. The status will update automatically.

Possible statuses: Queued, Running, Succeeded, Failed.
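
The status progression can be thought of as a small state machine with two terminal states. The sketch below polls a status source until the job succeeds or fails. The fetch_status callable is a placeholder you would supply; the tutorial does not describe a programmatic status API, so nothing here reflects a real xMetaVar endpoint.

```python
import time
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


TERMINAL = {JobStatus.SUCCEEDED, JobStatus.FAILED}


def wait_for_job(fetch_status: Callable[[], str],
                 interval_s: float = 5.0,
                 max_polls: int = 1000) -> JobStatus:
    """Poll fetch_status until the job reaches Succeeded or Failed."""
    for _ in range(max_polls):
        status = JobStatus(fetch_status())  # raises ValueError on unknown text
        if status in TERMINAL:
            return status
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")
```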

Review Job Details

You can review the specific parameters and configuration used for this analysis run at any time on the results page.

Step 4: Download Results

Once the analysis is complete, you can browse and download your result files.

How to Download

Browse and Select

Use the interactive file tree to navigate through the output directories. Click on any individual file to download it directly.

Download All as ZIP

To get all result files at once, click the "Download All (.zip)" button to download a compressed archive of the entire results folder.
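
Once you have the ZIP archive, you may want to confirm it is intact and see what it contains before unpacking it. A minimal sketch using Python's standard zipfile module follows; the function name is hypothetical, and the layout of the results folder is not assumed.

```python
import zipfile


def list_archive(source) -> list:
    """Return sorted file paths in a ZIP archive, skipping directory entries.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        bad = archive.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )
```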

Analysis & Visualization

This section focuses on exploring and interpreting your results. You’ll learn how to access summary statistics, navigate the genome browser, and use visualization tools to reveal patterns, correlations, and other biological insights.

Step 1: Upload Data

Upload your pre-computed variant data to generate a suite of interactive visualizations.

Supported Variant Types

This platform supports visualization for various variant types derived from your analysis.

Single Nucleotide Polymorphism (SNP)

Insertion / Deletion (INDEL)

Structural Variation (SV)

General Workflow (INDEL example)

The process is similar for all variant types. The screenshot on the right shows the INDEL visualization upload as an example.

Select Variant Type

First, choose the variant type (e.g., INDEL) you wish to visualize.

Upload Required Files

Upload the specific files required for the selected variant type (see details below).

Configure & Submit

Select a primary grouping column from your metadata, then click the "Submit" button.
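
Before choosing a grouping column, it can help to see how many samples fall into each of its levels. The sketch below assumes the metadata is a tab-delimited table with a header row, which is an assumption made for illustration; check the required file formats for the authoritative layout. The function and column names are hypothetical.

```python
import csv
import io


def grouping_levels(metadata_text: str, group_column: str) -> dict:
    """Count samples per level of the chosen grouping column."""
    # Assumption: tab-delimited text whose first row holds column names.
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    if reader.fieldnames is None or group_column not in reader.fieldnames:
        raise KeyError(f"column {group_column!r} not found in metadata header")
    counts = {}
    for row in reader:
        level = row[group_column]
        counts[level] = counts.get(level, 0) + 1
    return counts
```

A column with a single level, or with one level per sample, is usually a poor grouping choice because it gives the plots nothing to compare.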

Required File Formats by Variant Type

All required files for visualization can be generated from the Variant Calling pipeline. Alternatively, you may upload your own data as long as it strictly adheres to the formats detailed below.

Step 2: Basic Statistics Overview

An interactive overview of your dataset, highlighting key metrics and general distribution patterns of the detected variants (e.g., INDELs).

Key Summary Metrics

This table provides a basic statistical summary, including totals for samples and unique variant sites, as well as counts for different variant types (e.g., insertions vs. deletions).
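
As a rough illustration of how such totals can be derived, the sketch below counts samples, unique sites, and sites per sub-type from a list of INDEL calls. The record layout (sample, species, position, kind) is hypothetical and is not the platform's file format.

```python
from collections import Counter
from typing import NamedTuple


class Call(NamedTuple):
    sample: str
    species: str
    position: int
    kind: str  # "insertion" or "deletion"


def summarize(calls: list) -> dict:
    """Compute sample count, unique site count, and site counts by type."""
    # A site is counted once no matter how many samples carry it.
    sites = {(c.species, c.position, c.kind) for c in calls}
    return {
        "samples": len({c.sample for c in calls}),
        "unique_sites": len(sites),
        "sites_by_type": dict(Counter(kind for _, _, kind in sites)),
    }
```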

Sample & Type Distribution

The PCoA Plot visualizes sample relationships based on their overall variant profiles. The Pie Chart shows the global proportion of primary variant types (e.g., insertion vs. deletion sites).
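
PCoA starts from a matrix of pairwise distances between samples. The tutorial does not say which distance xMetaVar uses, so the sketch below uses Jaccard distance on each sample's set of variant sites purely as an example of what "distance between variant profiles" can mean. The function names are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles: dict) -> tuple:
    """Return (sample names, square distance matrix) in sorted name order."""
    names = sorted(profiles)
    matrix = [
        [jaccard_distance(profiles[i], profiles[j]) for j in names]
        for i in names
    ]
    return names, matrix
```

Samples that share most of their variant sites get a distance near 0 and would plot close together; samples with little overlap get a distance near 1.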

Variant Counts by Species

This stacked bar chart displays the number of unique variant sites identified within each species' genome, often stacked by sub-type (e.g., insertions and deletions).

Variant Counts by Sample

This chart displays the variant count for each individual sample, typically colored by group and stacked by sub-type, sorted to reveal patterns.
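
The data behind a chart like this can be sketched as per-sample counts split by sub-type, then ordered for display. The sort order used here (by group, then by descending total) is an assumption about what "sorted to reveal patterns" means, and the input layout is hypothetical.

```python
from collections import Counter, defaultdict


def per_sample_counts(calls, groups) -> list:
    """Build (group, sample, counts_by_kind, total) rows for a stacked bar chart.

    calls:  iterable of (sample, kind) pairs
    groups: mapping of sample -> group label
    """
    stacks = defaultdict(Counter)
    for sample, kind in calls:
        stacks[sample][kind] += 1
    rows = [
        (groups[s], s, dict(c), sum(c.values())) for s, c in stacks.items()
    ]
    # Assumed ordering: group first, then largest total first, then name.
    rows.sort(key=lambda r: (r[0], -r[3], r[1]))
    return rows
```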

Step 3: Interactive Genome Browser

Explore the genomic context of your detected variants using the integrated JBrowse 2 viewer.

How to Navigate and Explore

Switch or Add Genomes

The browser initially displays a default genome. To explore a different species:

To add a new view: Navigate to File > Add > Linear genome view in the top-left menu.
To close the current view: Click the '✕' icon in the top-right corner of the genome view tab.

A dialog will then appear, allowing you to select any genome from the reference database.

Interpret the Tracks

Each genome view consists of several horizontal tracks:

Reference Sequence: The foundational DNA sequence.
Gene Annotations: Shows the locations of known genes and their features.
Your Variant Sites: A dedicated track displaying the positions of your detected variants (e.g., SNPs, INDELs).

Interact and Get Details

Use your mouse to zoom and pan across the genome. Click on any feature—be it a gene or a variant—to open a detailed information panel.

Step 4: Association Analysis Heatmap

Visualize the association between variant sites and sample phenotypes in an interactive heatmap.

Interpreting the Heatmap

Visualizing Associations

The heatmap displays the top variant sites (e.g., top 50 or 100 by p-value) on one axis and the selected phenotypes on the other. Each cell represents the association strength between a variant and a phenotype.
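
Selecting the "top" sites amounts to ranking by p-value and keeping the first N. The sketch below ranks each site by its smallest p-value across phenotypes, which is one plausible rule rather than the documented one; the input layout and function name are hypothetical.

```python
import heapq


def top_sites(associations: dict, n: int = 50) -> list:
    """Return the n site IDs with the smallest p-value, most significant first.

    associations: site ID -> {phenotype: p-value}
    """
    # Assumption: a site is ranked by its best (minimum) p-value.
    best = {site: min(ps.values()) for site, ps in associations.items()}
    return heapq.nsmallest(n, best, key=best.get)
```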

Color Interpretation

The color of each cell represents the effect size of the association. Typically, a divergent color scale (e.g., red-white-blue) is used, where red indicates a positive correlation, blue indicates a negative correlation, and white indicates little to no association.
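
To see how a divergent scale turns an effect size into a color, consider this minimal sketch of a linear blue-white-red mapping. It illustrates the general technique only; the exact palette and scaling used by the heatmap are not specified in this tutorial, and the function name is hypothetical.

```python
def diverging_color(effect: float, max_abs: float) -> tuple:
    """Map an effect size to an (R, G, B) triple on a blue-white-red scale."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    # Normalize and clamp to [-1, 1] so outliers saturate instead of overflowing.
    t = max(-1.0, min(1.0, effect / max_abs))
    fade = round(255 * (1.0 - abs(t)))  # 255 at zero effect, 0 at the extremes
    if t >= 0:
        return (255, fade, fade)  # white toward red for positive effects
    return (fade, fade, 255)      # white toward blue for negative effects
```

Using the same max_abs for every cell keeps colors comparable across the whole heatmap.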

Step 5: Biomarker Discovery

Identify and evaluate key genetic markers associated with host traits using microSLAM-based association modeling and machine learning (Random Forest) approaches.

Note: This analysis module is available only within the Integrative Analysis workflow.

Step 1: Running microSLAM Association Analysis

Parameter Selection

Select an outcome variable (e.g., disease status) and optional covariates to control confounding effects. microSLAM applies a mixed-effects model to test the association between gene presence/absence and host traits.

Note: Ensure variables contain no special characters or missing values. The outcome must be binary (e.g., Case/Control).
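
The requirements in the note can be checked before submitting. The sketch below is a hypothetical pre-flight check: it interprets "no special characters" as variable names made of letters, digits, and underscores, and treats a short list of common markers as missing values. Both interpretations are assumptions.

```python
import re

SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")           # assumed naming rule
MISSING = {"", "NA", "NaN", "nan", "null", "None"}   # assumed missing markers


def check_design(rows: list, outcome: str, covariates: list) -> list:
    """Return a list of problems; an empty list means the design looks usable.

    rows: one dict per sample, mapping variable name -> string value
    """
    problems = []
    for name in [outcome, *covariates]:
        if not SAFE_NAME.match(name):
            problems.append(f"variable name {name!r} contains special characters")
        if any(row.get(name, "").strip() in MISSING for row in rows):
            problems.append(f"variable {name!r} has missing values")
    levels = {row.get(outcome, "").strip() for row in rows} - MISSING
    if len(levels) != 2:
        problems.append(
            f"outcome {outcome!r} has {len(levels)} levels, expected 2"
        )
    return problems
```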

Step 2: Visualizing Association Results

Volcano Plot Interpretation

The volcano plot summarizes variant-trait associations across strains. Each point represents a variant site, plotted by effect size (x-axis) and –log₁₀(p-value) (y-axis). Sites with strong positive or negative associations appear toward the edges, highlighting significant microbial biomarkers.
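
The plot coordinates follow directly from each site's effect size and p-value. A minimal sketch, in which the 0.05 significance threshold and the function name are assumptions for illustration:

```python
import math


def volcano_point(effect: float, p_value: float, alpha: float = 0.05) -> tuple:
    """Return (x, y, is_significant) for one variant site.

    x is the effect size, y is -log10(p-value).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    y = -math.log10(p_value)
    return effect, y, p_value < alpha
```

Because the y-axis is logarithmic, each unit upward means a tenfold smaller p-value, which is why the most significant sites sit at the top of the plot.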

Step 3: Reviewing Top Association Results

Result Table Overview

A ranked table lists the top 1,000 variant-level associations, including Effect Size (Beta), P-values, and Std. Error. These results enable quick prioritization of functional features for downstream evaluation.

Step 4: Biomarker Modeling & Interpretation

Feature Selection

Select the candidate feature set for Boruta-based biomarker discovery. You can restrict the analysis to statistically significant sites (Significant Candidates, P < 0.05; auto-scaled between 20–5,000 features) or broaden the search to the Top 5,000 ranked sites for more extensive screening.
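
One way to read "auto-scaled between 20–5,000 features" is that the number of significant sites is clamped to that range: padded with the next-best sites when fewer than 20 pass the threshold, and capped when more than 5,000 do. The sketch below implements that reading, which is an interpretation and not confirmed behavior; the names are hypothetical.

```python
def select_candidates(ranked_sites: list, p_values: dict, mode: str = "significant",
                      alpha: float = 0.05, floor: int = 20, cap: int = 5000) -> list:
    """Pick the candidate feature set from sites sorted by ascending p-value."""
    if mode == "top":
        # "Top 5,000" option: take the best-ranked sites regardless of p-value.
        return ranked_sites[:cap]
    n_significant = sum(1 for s in ranked_sites if p_values[s] < alpha)
    # Assumed auto-scaling: clamp the count to the range [floor, cap].
    n_keep = max(floor, min(cap, n_significant))
    return ranked_sites[:n_keep]
```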

Model Evaluation (ROC Curve)

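Assess the discriminative power of selected features using a Random Forest classifier. The ROC curve shows the trade-off between sensitivity and specificity across classification thresholds. The AUC summarizes overall discriminative performance: 0.5 corresponds to chance-level prediction and 1.0 to perfect separation of the two classes.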
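
The AUC has a useful interpretation: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. It is a didactic pairwise implementation with a hypothetical function name, not the platform's code.

```python
def roc_auc(labels: list, scores: list) -> float:
    """AUC as P(score of a positive > score of a negative); ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

This pairwise form is quadratic in the number of samples, which is fine for illustration; production libraries use a sort-based method that gives the same value.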

Feature Importance (SHAP Analysis)

SHAP (SHapley Additive exPlanations) analysis quantifies each feature’s contribution to prediction outcomes, offering interpretability and biological insight into the underlying mechanisms of disease association.
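
The idea behind SHAP can be illustrated with exact Shapley values on a toy model: a feature's value is its average marginal contribution over every order in which features could be added. The sketch below enumerates all orderings, which is only feasible for a handful of features; SHAP implementations rely on much faster algorithms for real models. The feature names and the toy value function are invented for the example.

```python
from itertools import permutations
from math import factorial


def shapley_values(features: list, value) -> dict:
    """Exact Shapley values by averaging marginal contributions over orderings.

    value: function mapping a frozenset of present features to a model output
    """
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        present = frozenset()
        for f in order:
            with_f = present | {f}
            totals[f] += value(with_f) - value(present)
            present = with_f
    n_orders = factorial(len(features))
    return {f: t / n_orders for f, t in totals.items()}


def toy_model(present: frozenset) -> float:
    """Hypothetical risk score: two sites contribute, and they interact."""
    score = 0.1
    if "site_A" in present:
        score += 0.3
    if "site_B" in present:
        score += 0.1
    if "site_A" in present and "site_B" in present:
        score += 0.2  # interaction term, split evenly between A and B
    return score
```

For shapley_values(["site_A", "site_B", "site_C"], toy_model), the values sum to the difference between the full-model score and the empty-model score, and site_C, which the toy model never uses, receives zero.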
