xMetaVar Tutorial

A step-by-step guide to analyzing your metagenomic variant data from start to finish.

Introduction

Welcome to the xMetaVar tutorial. This guide walks you through the full workflow — from uploading your raw data to exploring interactive visualizations. You’ll learn the required file formats, how to set analysis parameters, and how to interpret results to uncover meaningful insights.

Variant Calling

In this section, you'll perform variant calling — detecting genetic variations from your uploaded data. We’ll guide you through uploading files, selecting tools and parameters, running the analysis, and reviewing the results.

Step 1: Upload Data

Begin by providing your sequencing data and setting up initial quality control.

Quick Start Guide

1. Navigate to Variant Calling and select the sequencing type (e.g., Paired-End).
2. Upload your raw sequencing data (FASTQ files).
3. Enter the corresponding Sample IDs in the text area, one per line.
4. Choose whether to Skip QC. If not skipped, configure Trimmomatic parameters.
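
The Sample ID text area expects one ID per line. As a rough sketch of how that list could be prepared locally before upload, the snippet below derives IDs from paired-end FASTQ file names. The naming pattern (`<sample>_R1` / `<sample>_R2`, or `_1` / `_2`) and the helper name `sample_ids_from_fastq` are assumptions for illustration; the tutorial does not prescribe a file naming scheme.

```python
import re

# Assumed naming convention (not specified by xMetaVar):
# <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, or <sample>_1.fq / <sample>_2.fq
PAIR_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$"
)

def sample_ids_from_fastq(filenames):
    """Return sorted sample IDs that have both a read-1 and a read-2 file."""
    reads = {}
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognized FASTQ name: {name}")
        reads.setdefault(match["sample"], set()).add(match["read"])
    # Paired-end input needs both mates for every sample.
    incomplete = sorted(s for s, r in reads.items() if r != {"1", "2"})
    if incomplete:
        raise ValueError(f"Missing mate file for: {', '.join(incomplete)}")
    return sorted(reads)

files = ["S01_R1.fastq.gz", "S01_R2.fastq.gz", "S02_1.fq", "S02_2.fq"]
sample_id_text = "\n".join(sample_ids_from_fastq(files))
print(sample_id_text)  # one ID per line, ready to paste into the text area
```

Checking for missing mates before upload catches a mismatch between files and Sample IDs early, rather than after the job has been queued.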

Step 2: Configure Tools

Configure the analysis pipeline by selecting the desired variant types and their corresponding detection tools.

Quick Start Guide

1. Select Variant Types

Begin by choosing the types of variants you wish to identify. You can select multiple options.

2. Configure SNP Calling

For SNP identification, select one of the available tools.

3. Configure INDEL Calling

QuickVariants is used for INDEL detection. This tool is selected by default.

4. Configure Structural Variant (SV) Calling

Select the specific types of Structural Variants to call. Each type is identified by a specialized tool.

Step 3: Monitor and Review Results

After you submit a job, you are redirected automatically to the results page, where you can monitor its progress in real time.

On the Results Page

Automatic Redirect

Upon successful job submission, the page will automatically navigate to a unique results page for your analysis.

Real-time Status Monitoring

Track the progress of your job as it moves through the pipeline. The status will update automatically.

Possible statuses: Queued, Running, Succeeded, Failed.
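
The four statuses can be read as a small state machine in which Succeeded and Failed are terminal. The sketch below illustrates that idea with a generic polling loop. It is not xMetaVar's API: `fetch_status` is a hypothetical callable supplied by the caller, and the polling interval and budget are arbitrary illustrative values.

```python
import time
from enum import Enum

class JobStatus(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"

# A job stops changing once it reaches one of these states.
TERMINAL = {JobStatus.SUCCEEDED, JobStatus.FAILED}

def wait_for_job(fetch_status, interval_s=5.0, max_polls=100, sleep=time.sleep):
    """Poll fetch_status() until a terminal state; return the distinct states seen."""
    history = []
    for _ in range(max_polls):
        status = JobStatus(fetch_status())  # hypothetical status source
        if not history or history[-1] is not status:
            history.append(status)
        if status in TERMINAL:
            return history
        sleep(interval_s)
    raise TimeoutError("Job did not finish within the polling budget")

# Simulated status source standing in for the results page.
simulated = iter(["Queued", "Queued", "Running", "Running", "Succeeded"])
history = wait_for_job(lambda: next(simulated), sleep=lambda _: None)
print(" -> ".join(s.value for s in history))
```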

Review Job Details

You can review the specific parameters and configuration used for this analysis run at any time on the results page.

Step 4: Download Results

Once the analysis is complete, you can browse and download your result files.

How to Download

1. Browse and Select

Use the interactive file tree to navigate through the output directories. Click on any individual file to download it directly.

2. Download All as ZIP

To get all result files at once, click the "Download All (.zip)" button to download a compressed archive of the entire results folder.
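
After downloading the archive, it can be useful to check its integrity and see how the outputs are organized. The following sketch uses Python's standard `zipfile` module. The directory and file names in the demo archive are hypothetical placeholders, since the tutorial does not list the actual layout of the results folder.

```python
import io
import zipfile
from collections import defaultdict

def files_by_directory(zip_source):
    """Group archive members by top-level directory after an integrity check."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        # testzip() returns the first corrupted member name, or None if all are fine.
        if archive.testzip() is not None:
            raise zipfile.BadZipFile("Archive contains a corrupted member")
        for info in archive.infolist():
            if info.is_dir():
                continue
            top, _, rest = info.filename.partition("/")
            grouped[top if rest else "."].append(info.filename)
    return dict(grouped)

# Demo archive built in memory; these names are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("snp/sample1.vcf", "##fileformat=VCFv4.2\n")
    demo.writestr("indel/sample1.vcf", "##fileformat=VCFv4.2\n")
    demo.writestr("run_parameters.txt", "sequencing_type=Paired-End\n")
buffer.seek(0)

tree = files_by_directory(buffer)
for directory, members in tree.items():
    print(directory, members)
```

For a real download, pass the path of the saved `.zip` file instead of the in-memory buffer.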

Analysis & Visualization

This section focuses on exploring and interpreting your results. You’ll learn how to access summary statistics, navigate the genome browser, and use visualization tools to reveal patterns, correlations, and other biological insights.

Step 1: Upload Data

Upload your pre-computed variant data to generate a suite of interactive visualizations.

Supported Variant Types

This platform supports visualization for various variant types derived from your analysis.

  • SNP: Single Nucleotide Polymorphism
  • INDEL: Insertion / Deletion
  • SV: Structural Variation

General Workflow (INDEL example)

The process is similar for all variant types. The INDEL visualization upload is used as the example here.

1. Select Variant Type

First, choose the variant type (e.g., INDEL) you wish to visualize.

2. Upload Required Files

Upload the specific files required for the selected variant type (see details below).

3. Configure & Submit

Select a primary grouping column from your metadata, then click the "Submit" button.

Required File Formats by Variant Type

All required files for visualization can be generated from the Variant Calling pipeline. Alternatively, you may upload your own data as long as it strictly adheres to the formats detailed below.

Step 2: Basic Statistics Overview

An interactive overview of your dataset, highlighting key metrics and general distribution patterns of the detected variants (e.g., INDELs).

1. Key Summary Metrics

This table provides a basic statistical summary, including totals for samples and unique variant sites, as well as counts for different variant types (e.g., insertions vs. deletions).
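
To make the three kinds of totals concrete, here is a minimal sketch that computes them from a long-format list of variant calls. The record layout `(sample_id, site_id, subtype)` and the example values are assumptions for illustration, not the platform's actual input format.

```python
from collections import Counter

# Hypothetical long-format records: (sample_id, site_id, variant_subtype)
records = [
    ("S01", "contigA:120", "insertion"),
    ("S01", "contigA:450", "deletion"),
    ("S02", "contigA:120", "insertion"),
    ("S02", "contigB:33", "deletion"),
    ("S03", "contigB:33", "deletion"),
]

def summarize(records):
    """Count samples, unique variant sites, and unique sites per subtype."""
    samples = {sample for sample, _, _ in records}
    # A site seen in several samples is counted once.
    site_types = {site: subtype for _, site, subtype in records}
    return {
        "samples": len(samples),
        "unique_sites": len(site_types),
        "sites_by_type": dict(Counter(site_types.values())),
    }

summary = summarize(records)
print(summary)
```

Note the distinction between calls and sites: five calls across three samples collapse to three unique sites here.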

2. Sample & Type Distribution

The PCoA Plot visualizes sample relationships based on their overall variant profiles. The Pie Chart shows the global proportion of primary variant types (e.g., insertion vs. deletion sites).
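
A PCoA starts from a matrix of pairwise distances between samples. The tutorial does not say which distance xMetaVar uses, so the sketch below assumes Jaccard distance on presence/absence of variant sites purely as an illustration of what "overall variant profiles" could mean as input. The sample profiles are made-up values.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of variant sites."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(profiles):
    """Return sample names and a symmetric nested-dict distance matrix."""
    names = sorted(profiles)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = jaccard_distance(profiles[n], profiles[m])
        matrix[n][m] = matrix[m][n] = d
    return names, matrix

# Hypothetical presence/absence profiles (sample -> set of variant sites).
profiles = {
    "S01": {"contigA:120", "contigA:450"},
    "S02": {"contigA:120", "contigB:33"},
    "S03": {"contigB:33"},
}
names, matrix = distance_matrix(profiles)
for n in names:
    print(n, [round(matrix[n][m], 3) for m in names])
```

Samples that share more variant sites get smaller distances and therefore land closer together in the ordination.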

3. Variant Counts by Species

This stacked bar chart displays the number of unique variant sites identified within each species' genome, often stacked by sub-type (e.g., insertions and deletions).

4. Variant Counts by Sample

This chart displays the variant count for each individual sample, typically colored by group and stacked by sub-type, sorted to reveal patterns.
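
The data behind such a chart is a per-sample count broken down by sub-type, ordered so that groups sit together. The sketch below shows one plausible ordering (by group, then by descending total). That ordering, the input layout, and the group labels are assumptions, since the tutorial only says the bars are sorted to reveal patterns.

```python
from collections import Counter, defaultdict

# Hypothetical inputs: one (sample_id, subtype) pair per detected variant,
# plus a sample -> group mapping taken from the metadata.
calls = [
    ("S01", "insertion"), ("S01", "deletion"), ("S01", "deletion"),
    ("S02", "insertion"),
    ("S03", "deletion"), ("S03", "deletion"),
]
groups = {"S01": "Case", "S02": "Control", "S03": "Case"}

def stacked_counts(calls, groups):
    """Return one row per sample with per-subtype counts, sorted for plotting."""
    per_sample = defaultdict(Counter)
    for sample, subtype in calls:
        per_sample[sample][subtype] += 1
    rows = [
        {"sample": s, "group": groups[s], "total": sum(c.values()), **c}
        for s, c in per_sample.items()
    ]
    # Assumed ordering: group first, then largest total first.
    rows.sort(key=lambda r: (r["group"], -r["total"], r["sample"]))
    return rows

rows = stacked_counts(calls, groups)
for row in rows:
    print(row)
```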

Step 3: Interactive Genome Browser

Explore the genomic context of your detected variants using the integrated JBrowse 2 viewer.

How to Navigate and Explore

1. Switch or Add Genomes

The browser initially displays a default genome. To explore a different species:
  • To add a new view: Navigate to File > Add > Linear genome view in the top-left menu.
  • To close the current view: Click the '✕' icon in the top-right corner of the genome view tab.
A dialog will then appear, allowing you to select any genome from the reference database.

2. Interpret the Tracks

Each genome view consists of several horizontal tracks:
  • Reference Sequence: The foundational DNA sequence.
  • Gene Annotations: Shows the locations of known genes and their features.
  • Your Variant Sites: A dedicated track displaying the positions of your detected variants (e.g., SNPs, INDELs).

3. Interact and Get Details

Use your mouse to zoom and pan across the genome. Click any feature, whether a gene or a variant, to open a detailed information panel.

Step 4: Association Analysis Heatmap

Visualize the association between variant sites and sample phenotypes in an interactive heatmap.

Interpreting the Heatmap

Visualizing Associations

The heatmap displays the top variant sites (e.g., top 50 or 100 by p-value) on one axis and the selected phenotypes on the other. Each cell represents the association strength between a variant and a phenotype.
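
Selecting the "top" sites amounts to ranking by p-value and keeping the first N. The sketch below assumes each site is ranked by its smallest p-value across the selected phenotypes. That rule is an assumption, as are the site and phenotype names, because the tutorial does not state how sites are ranked when several phenotypes are shown.

```python
import heapq

def top_sites(p_values, n=50):
    """Return the n sites with the smallest p-value across phenotypes.

    p_values maps site -> {phenotype: p-value}.
    """
    best = {site: min(per_trait.values()) for site, per_trait in p_values.items()}
    # Ties are broken by site name so the result is deterministic.
    return heapq.nsmallest(n, best, key=lambda site: (best[site], site))

# Hypothetical association results.
p_values = {
    "site1": {"BMI": 0.2, "Age": 0.001},
    "site2": {"BMI": 0.03, "Age": 0.5},
    "site3": {"BMI": 0.8, "Age": 0.9},
}
print(top_sites(p_values, n=2))
```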

Color Interpretation

The color of each cell represents the effect size of the association. Typically, a diverging color scale (e.g., red-white-blue) is used, where red indicates a positive association, blue indicates a negative association, and white indicates little to no association.
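
As an illustration of how an effect size maps onto such a scale, the sketch below converts a value to a hex color by linear interpolation from white toward red (positive) or blue (negative). The linear mapping and the clamping at `max_abs` are illustrative choices, not a description of the palette xMetaVar actually uses.

```python
def diverging_color(effect, max_abs):
    """Map an effect size in [-max_abs, max_abs] to a blue-white-red hex color."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    # Clamp to [-1, 1] so extreme values saturate instead of overflowing.
    t = max(-1.0, min(1.0, effect / max_abs))
    fade = round(255 * (1 - abs(t)))  # 255 at zero effect, 0 at full strength
    if t >= 0:
        r, g, b = 255, fade, fade      # white -> red
    else:
        r, g, b = fade, fade, 255      # white -> blue
    return f"#{r:02x}{g:02x}{b:02x}"

for effect in (-2.0, 0.0, 2.0):
    print(effect, diverging_color(effect, max_abs=2.0))
```

Using a scale that is symmetric around zero keeps white anchored at "no effect", which is what makes the two colors comparable.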

Step 5: Biomarker Discovery

Identify and evaluate key genetic markers associated with host traits using microSLAM-based association modeling and machine learning (Random Forest) approaches.

Note: This analysis module is available only within the Integrative Analysis workflow.

Step 1: Running microSLAM Association Analysis

Parameter Selection

Select an outcome variable (e.g., disease status) and, optionally, covariates to control for confounding effects. microSLAM applies a mixed-effects model to test the association between gene presence/absence and host traits.

Note: Ensure variables contain no special characters or missing values. The outcome must be binary (e.g., Case/Control).
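
These requirements can be checked before upload. The sketch below is one possible pre-flight check under stated assumptions: "no special characters" is interpreted as column names limited to letters, digits, and underscores, and the listed strings are treated as missing-value markers. Both interpretations, and the function name `validate_metadata`, are illustrative rather than taken from xMetaVar.

```python
import re

SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")           # assumed meaning of "no special characters"
MISSING_MARKERS = {"", "NA", "NaN", "nan", "None"}   # assumed missing-value markers

def is_missing(value):
    return value is None or str(value).strip() in MISSING_MARKERS

def validate_metadata(rows, outcome, covariates=()):
    """Return a list of problems found in the outcome and covariate columns."""
    problems = []
    for column in (outcome, *covariates):
        if not SAFE_NAME.match(column):
            problems.append(f"special characters in column name: {column!r}")
        if any(is_missing(row.get(column)) for row in rows):
            problems.append(f"missing values in column: {column!r}")
    # The outcome must have exactly two levels, e.g. Case/Control.
    levels = {
        str(row[outcome]).strip()
        for row in rows
        if not is_missing(row.get(outcome))
    }
    if len(levels) != 2:
        problems.append(f"outcome {outcome!r} has {len(levels)} levels, expected 2")
    return problems

good = [
    {"disease_status": "Case", "age": 54},
    {"disease_status": "Control", "age": 61},
]
bad = [
    {"disease status": "Case", "age": "NA"},
    {"disease status": "Case", "age": 61},
]
print(validate_metadata(good, "disease_status", ["age"]))
print(validate_metadata(bad, "disease status", ["age"]))
```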

[Screenshot: microSLAM parameter selection]

Step 2: Visualizing Association Results

Volcano Plot Interpretation

The volcano plot summarizes variant-trait associations across strains. Each point represents a variant site, plotted by effect size (x-axis) and –log₁₀(p-value) (y-axis). Sites with strong positive or negative effects appear toward the left and right edges, and the most significant sites sit highest on the y-axis, so candidate microbial biomarkers cluster in the upper corners.
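
The coordinates of each point follow directly from the association results. In the sketch below, the y-value is computed as -log10(p) and each site gets a label depending on which side of the plot it falls. The cutoffs (`alpha=0.05`, `min_abs_beta=1.0`) and the example results are illustrative assumptions; the tutorial does not state the thresholds used for highlighting.

```python
import math

def volcano_point(beta, p_value, alpha=0.05, min_abs_beta=1.0):
    """Return (x, y, label) for one variant site on a volcano plot."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    y = -math.log10(p_value)  # smaller p-values plot higher
    if p_value < alpha and beta >= min_abs_beta:
        label = "positive"
    elif p_value < alpha and beta <= -min_abs_beta:
        label = "negative"
    else:
        label = "not highlighted"
    return beta, y, label

# Hypothetical (site, effect size, p-value) results.
results = [("siteA", 2.1, 1e-6), ("siteB", -1.8, 0.002), ("siteC", 0.3, 0.4)]
points = {site: volcano_point(beta, p) for site, beta, p in results}
for site, (x, y, label) in points.items():
    print(site, x, round(y, 2), label)
```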

[Figure: microSLAM volcano plot]

Step 3: Reviewing Top Association Results

Result Table Overview

A ranked table lists the top 1,000 variant-level associations, including Effect Size (Beta), P-values, and Std. Error. These results enable quick prioritization of functional features for downstream evaluation.

[Screenshot: association results table]

Step 4: Biomarker Modeling & Interpretation

Feature Selection

Select the candidate feature set for Boruta-based biomarker discovery. You can restrict the analysis to statistically significant sites (Significant Candidates, P < 0.05; auto-scaled between 20–5,000 features) or broaden the search to the Top 5,000 ranked sites for more extensive screening.
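
The two options can be pictured as different cuts through the same ranked list. The sketch below is a guess at the selection logic: it assumes "auto-scaled between 20–5,000" means the number of significant sites is clamped to that range, padding with the next-ranked sites when fewer than 20 are significant. That interpretation, the function name, and the mode names are assumptions, not documented behavior.

```python
def select_candidates(p_values, mode="significant", alpha=0.05, min_n=20, max_n=5000):
    """Pick candidate sites for feature selection from a site -> p-value mapping."""
    ranked = sorted(p_values, key=lambda site: (p_values[site], site))
    if mode == "top":
        return ranked[:max_n]
    if mode != "significant":
        raise ValueError(f"unknown mode: {mode!r}")
    n_significant = sum(1 for p in p_values.values() if p < alpha)
    # Assumed auto-scaling: clamp the candidate count to [min_n, max_n].
    n = max(min_n, min(max_n, n_significant))
    return ranked[:n]

# Hypothetical p-values: 100 sites with p = 0.01, 0.02, ..., 1.00
p_values = {f"site{i:03d}": i / 100 for i in range(1, 101)}
significant = select_candidates(p_values, mode="significant")
top = select_candidates(p_values, mode="top")
print(len(significant), len(top))
```

In this example only four sites have P < 0.05, so under the assumed clamping the significant set is extended to the 20 best-ranked sites.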

Model Evaluation (ROC Curve)

Assess the discriminative power of selected features using a Random Forest classifier. The ROC curve shows the trade-off between sensitivity and specificity across classification thresholds, and the AUC summarizes overall discriminative performance (0.5 corresponds to chance, 1.0 to perfect separation).
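
The AUC has a direct interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way from labels and predicted scores. It is a plain-Python illustration with made-up values, not the platform's Random Forest evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = Case, 0 = Control) and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Because the AUC depends only on how samples are ranked, it is not tied to any single classification threshold, unlike accuracy.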

Feature Importance (SHAP Analysis)

SHAP (SHapley Additive exPlanations) analysis quantifies each feature’s contribution to prediction outcomes, offering interpretability and biological insight into the underlying mechanisms of disease association.

[Figure: ROC curve performance]
[Figure: SHAP feature importance]