I-SVVS

Integrative Stochastic Variational Variable Selection
for Multi-Omics Microbiome Data Analysis

See I-SVVS in Action

Learn about our research and get an introduction to the I-SVVS package

Research Overview

Comprehensive introduction to I-SVVS methodology and publication

Key Innovation

Understanding the breakthrough in multi-omics microbiome analysis

Package Introduction

Overview of the I-SVVS software package and its capabilities

Novel Features

Multi-Omics Integration

Hierarchical Dirichlet Process
  • DMM: Infinite Dirichlet Multinomial Mixture for microbiome data
  • GMM: Infinite Gaussian Mixture for metabolome data
  • HDP: Shared cluster modeling across data types
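As a rough illustration of the DMM component, the sketch below evaluates the Dirichlet-multinomial log-probability of one sample's count vector under a single cluster's Dirichlet parameters. The function name `dirichlet_multinomial_logpmf` and the toy inputs are illustrative assumptions, not part of the I-SVVS package.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts per taxon for one sample.
    alpha:  positive Dirichlet parameters, same length as counts.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod_j(x_j!)
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(a0) / Gamma(n + a0) * prod_j Gamma(x_j + a_j) / Gamma(a_j)
    log_ratio = lgamma(a0) - lgamma(n + a0) + sum(
        lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_ratio


# Toy example: two taxa, symmetric prior, two reads in total.
p = exp(dirichlet_multinomial_logpmf([1, 1], [1.0, 1.0]))
```

With a flat prior over two taxa, the three possible outcomes for two reads are equally likely, which gives a quick sanity check on the formula.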

Stochastic Variational Inference

Advanced Optimization
  • SVI: Scalable approximate posterior inference
  • Natural Gradients: Faster convergence than standard (Euclidean) gradient ascent
  • Mini-batch: Memory-efficient processing

Intelligent Variable Selection

Indicator Variables
  • φ Variables: Bernoulli indicators for feature importance
  • Sparse Learning: Automatic noise filtering
  • Core Set: Minimal representative feature identification

Infinite Mixture Models

Stick-Breaking Process
  • Auto K: Automatic cluster number estimation
  • Truncation: Variational stick-breaking approximation
  • DP Prior: Non-parametric cluster allocation
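The truncated stick-breaking construction listed above can be sketched as follows. This is a minimal illustration that assumes stick fractions drawn from Beta(1, ν), as in a GEM(ν) prior. The helper name `stick_breaking_weights` is hypothetical, and the values `K_max = 10` and `nu = 0.1` simply mirror the defaults quoted on this page.

```python
import numpy as np


def stick_breaking_weights(v):
    """Turn K-1 stick fractions in (0, 1) into K mixture weights."""
    v = np.asarray(v, dtype=float)
    # Stick length left before each break: 1, (1-v1), (1-v1)(1-v2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    # The first K-1 weights take a fraction v_k of what remains;
    # the last weight absorbs whatever is left (the truncation).
    return np.concatenate((v * remaining[:-1], remaining[-1:]))


rng = np.random.default_rng(0)
K_max, nu = 10, 0.1
v = rng.beta(1.0, nu, size=K_max - 1)  # v_k ~ Beta(1, nu)
pi = stick_breaking_weights(v)         # K_max weights that sum to 1
```

Clusters whose weight shrinks towards zero are effectively unused, which is how a truncated model can report fewer than `K_max` clusters.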

Computational Efficiency

Algorithmic Innovation
  • O(log N): Sublinear complexity for large datasets
  • Taylor Expansion: Analytical tractability
  • Vectorization: Optimized NumPy/SciPy implementation

Empirical Validation

Rigorous Testing
  • ARI Scores: Clustering accuracy measured with the Adjusted Rand Index
  • Cross-Domain: Agricultural, clinical, disease research
  • Benchmarking: Outperforms iClusterPlus, Clusternomics

How I-SVVS Works

Algorithmic Architecture

I-SVVS implements a hierarchical Bayesian framework combining infinite mixture models with stochastic variational inference for scalable multi-omics analysis.

Data Layer

  • 🧬 Microbiome: X_micro ∈ ℕ^(N×S), count data (OTUs)
  • ⚗️ Metabolome: X_metab ∈ ℝ^(N×M), continuous data

Hierarchical Integration

Model Layer

  • 📊 Infinite DMM (Dirichlet-Multinomial): π ~ GEM(ν), α_k ~ Dir(ζ)
  • 🌐 HDP Bridge (Shared Clusters): G_0 ~ DP(γ, H), G_j ~ DP(α, G_0)
  • 📈 Infinite GMM (Gaussian Mixture): (μ_k, Σ_k) ~ NIW, β ~ Dir(η)

Variational Optimization

Inference Layer

  • Stochastic VI (ELBO Maximization): 𝓛[q] = 𝔼[log p(X,Z)] - 𝔼[log q(Z)]

Cluster Assignment

Output Layer

  • 🎯 Clusters (Assignments): z_i = argmax_k r_ik
  • Selected Features (Biomarkers): S* = {j : 𝔼[φ_j] > τ}
Step 1: Data Preprocessing & Initialization

Complexity: O(N·S)

Input Validation

  • validate_input(X_micro, X_metab)
  • Compositional data normalization
  • Missing value imputation strategies
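To make the validation step concrete, here is one possible body for `validate_input`. Only the function name comes from this page. The specific checks, error messages, and return values are assumptions for illustration, and the package's own behaviour may differ. Normalization and imputation are not shown.

```python
import numpy as np


def validate_input(X_micro, X_metab):
    """Basic shape and value checks (hypothetical body, for illustration)."""
    X_micro = np.asarray(X_micro)
    X_metab = np.asarray(X_metab, dtype=float)
    if X_micro.ndim != 2 or X_metab.ndim != 2:
        raise ValueError("both inputs must be 2-D (samples x features)")
    if X_micro.shape[0] != X_metab.shape[0]:
        raise ValueError("microbiome and metabolome must cover the same samples")
    # Counts must be whole, non-negative numbers (NaN fails this check too).
    if np.any(X_micro < 0) or not np.all(np.mod(X_micro, 1) == 0):
        raise ValueError("microbiome data must be non-negative integer counts")
    if np.isnan(X_metab).any():
        raise ValueError("metabolome data has missing values; impute them first")
    return X_micro.astype(np.int64), X_metab


X_micro, X_metab = validate_input([[12, 0, 3], [5, 7, 0]], [[0.4, 1.2], [0.9, 0.3]])
```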

Parameter Initialization

  • K_max = 10 (truncation level)
  • ν = 0.1 (DP concentration)
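  • α_k ~ Dir(ζ), β ~ Dir(η)
# Initialize variational parameters
r_ik = softmax(randn(N, K_max))  # cluster responsibilities; each row sums to 1
f_ij = sigmoid(randn(N, S))      # feature-relevance probabilities in (0, 1)
λ_star, ι_star = init_globals()  # global variational parameters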
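The initialization above is pseudocode. A runnable NumPy version of the two local parameters might look like the sketch below. The helper names (`softmax`, `sigmoid`, `init_local_params`) and the toy sizes are assumptions. The global parameters are omitted because their form is not specified here.

```python
import numpy as np


def softmax(a, axis=-1):
    """Row-wise softmax with the usual max-subtraction for stability."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_local_params(N, S, K_max=10, seed=0):
    """Random starting values for responsibilities r and indicators f."""
    rng = np.random.default_rng(seed)
    r = softmax(rng.standard_normal((N, K_max)), axis=1)  # shape (N, K_max)
    f = sigmoid(rng.standard_normal((N, S)))              # shape (N, S)
    return r, f


r, f = init_local_params(N=5, S=8)
```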
Step 2: Stochastic Variational Inference

Complexity: O(log N)

ELBO Optimization

  • 𝓛[q] = 𝔼[log p(X,Z)] - 𝔼[log q(Z)]
  • Natural gradient ascent
  • Mini-batch subsampling

Variable Updates

  • Local: r_ik, f_ij
  • Global: λ*, ι*, ξ*
  • Step size: ρ_t = (τ + t)^(-κ)
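The step-size schedule and the global update it drives can be sketched as below. This follows the generic stochastic variational inference recipe, in which a global parameter is blended with a mini-batch estimate. The defaults `tau=1.0` and `kappa=0.7` and the function names are assumptions, and the fixed `target` stands in for a noisy mini-batch estimate.

```python
import numpy as np


def step_size(t, tau=1.0, kappa=0.7):
    """Schedule rho_t = (tau + t)^(-kappa), with t counted from 1."""
    if tau < 0 or not (0.5 < kappa <= 1.0):
        raise ValueError("need tau >= 0 and 0.5 < kappa <= 1")
    return (tau + t) ** (-kappa)


def svi_global_update(lam, lam_hat, rho):
    """Natural-gradient step: move lam a fraction rho towards lam_hat."""
    return (1.0 - rho) * lam + rho * lam_hat


lam = np.zeros(3)
target = np.array([1.0, 2.0, 3.0])  # stand-in for a mini-batch estimate
for t in range(1, 201):
    lam = svi_global_update(lam, target, step_size(t))
```

Keeping κ in (0.5, 1] makes the step sizes shrink fast enough to average out mini-batch noise but slowly enough to keep making progress.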
Core Update Equations:
r_ik ∝ exp(𝔼[log π_k] + 𝔼[log p(x_i|θ_k)])
f_ij ∝ σ(𝔼[log ε_j] - 𝔼[log(1-ε_j)])
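The responsibility update above is a softmax over clusters. A numerically stable sketch is shown below, assuming the two expectations have already been computed and are passed in as arrays. The function name `update_responsibilities` is hypothetical.

```python
import numpy as np


def update_responsibilities(e_log_pi, e_log_lik):
    """r_ik proportional to exp(E[log pi_k] + E[log p(x_i | theta_k)]).

    e_log_pi:  shape (K,), expected log mixture weights.
    e_log_lik: shape (N, K), expected log-likelihood per sample and cluster.
    """
    log_r = np.asarray(e_log_pi)[None, :] + np.asarray(e_log_lik)
    # Subtract the row maximum before exponentiating to avoid underflow.
    log_r = log_r - log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


r = update_responsibilities(np.log([0.5, 0.5]), np.array([[0.0, np.log(3.0)]]))
```

Working in log space matters here because per-sample log-likelihoods over thousands of features are large negative numbers.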
Step 3: Feature Selection & Model Selection

Complexity: O(S·K)

Indicator Variables

  • φ_ij ~ Bernoulli(ε_j)
  • Automatic relevance determination
  • Sparsity-inducing priors

Cluster Estimation

  • Stick-breaking weights: π_k
  • Truncation optimization
  • Convergence diagnostics
Selection Criterion:
Select every feature j whose average indicator exceeds the threshold:
𝔼[φ_j] = (1/N) Σ_i f_ij > threshold
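The selection criterion amounts to a column mean followed by a threshold. A minimal sketch is given below. The function name `select_features` and the default `threshold=0.5` are assumptions for illustration.

```python
import numpy as np


def select_features(f, threshold=0.5):
    """Return indices j whose mean indicator exceeds the threshold, plus all scores."""
    scores = np.asarray(f, dtype=float).mean(axis=0)  # E[phi_j] for each feature
    selected = np.flatnonzero(scores > threshold)
    return selected, scores


# Two samples, three features.
f = np.array([[0.9, 0.1, 0.7],
              [0.8, 0.2, 0.5]])
selected, scores = select_features(f)
```

Here the first and third features are kept, since their mean indicators (0.85 and 0.6) exceed 0.5.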
Step 4: Results & Post-processing

Complexity: O(N·K)

Cluster Assignment

  • z_i = argmax_k r_ik
  • Uncertainty quantification
  • Silhouette analysis
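Hard assignments and a simple per-sample uncertainty score can both be read off the responsibility matrix. The sketch below uses the Shannon entropy of each row as the uncertainty measure. That choice, and the name `assign_clusters`, are assumptions rather than the package's documented behaviour.

```python
import numpy as np


def assign_clusters(r):
    """Return z_i = argmax_k r_ik and the entropy of each row of r."""
    r = np.asarray(r, dtype=float)
    z = r.argmax(axis=1)
    # Clip before the log so that zero responsibilities contribute 0, not NaN.
    entropy = -(r * np.log(np.clip(r, 1e-300, 1.0))).sum(axis=1)
    return z, entropy


r = np.array([[1.0, 0.0],
              [0.5, 0.5]])
z, entropy = assign_clusters(r)
```

An entropy of 0 marks a fully confident assignment. The maximum, log K, marks a sample that is equally compatible with every cluster.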

Output Generation

  • Selected feature rankings
  • Cluster visualization (t-SNE/UMAP)
  • Performance metrics (ARI, NMI)
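For reference, the Adjusted Rand Index can be computed from a contingency table in a few lines of NumPy. The sketch below is a self-contained illustration, not the package's own implementation. scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` provide tested equivalents.

```python
import numpy as np


def _pairs(x):
    """Number of unordered pairs that can be drawn from x items."""
    return x * (x - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    if labels_true.size < 2:
        return 1.0
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    # Contingency table: rows are true classes, columns are predicted clusters.
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    sum_cells = _pairs(table).sum()
    sum_rows = _pairs(table.sum(axis=1)).sum()
    sum_cols = _pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / _pairs(labels_true.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # labels swapped only
ari_diff = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index ignores how clusters are numbered, so a relabelled but otherwise identical partition still scores 1.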
Output Formats:
clusters.csv - Cluster assignments
features.csv - Selected biomarkers
metrics.json - Performance stats
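The three output files could be produced with the standard library alone, as in the sketch below. The file names come from the list above. The column headers, the function name `write_outputs`, and the example values are assumptions, so the package's real file layout may differ.

```python
import csv
import json
from pathlib import Path


def write_outputs(out_dir, sample_ids, clusters, feature_names, metrics):
    """Write clusters.csv, features.csv and metrics.json into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "clusters.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sample", "cluster"])  # assumed header
        writer.writerows(zip(sample_ids, clusters))
    with open(out / "features.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["feature"])            # assumed header
        writer.writerows([name] for name in feature_names)
    with open(out / "metrics.json", "w") as fh:
        json.dump(metrics, fh, indent=2)
    return out
```

A call such as `write_outputs("results", ["s1", "s2"], [0, 1], ["OTU_7"], {"ARI": 0.9})` would create the directory if needed and write all three files.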

Performance Benchmarks

  • 50x faster computation
  • 2.18 hours vs 2.35 days
  • 16,944 species analyzed
  • 377 soybean samples

Published Research

I-SVVS: Integrative stochastic variational variable selection to explore joint patterns of multi-omics microbiome data

Authors: Tung Dang, Yushiro Fuji, Kie Kumaishi, Erika Usui, Shungo Kobori, Takumi Sato, Yusuke Toda, Kengo Sakurai, Yuji Yamasaki, Hisashi Tsujimoto, Masami Yokota Hirai, Yasunori Ichihashi, Hiroyoshi Iwata

Journal: Briefings in Bioinformatics, 2025, 26(3) | Impact Factor: 7.9

DOI: 10.1093/bib/bbaf132

Abstract: High-dimensional multi-omics microbiome data plays an important role in elucidating microbial communities' interactions with their hosts and environment. This study proposes I-SVVS, a novel framework that applies a specific Bayesian mixture model to each omics data type for integrated analysis of microbiome and metabolome data: an infinite Dirichlet multinomial mixture model for microbiome data and an infinite Gaussian mixture model for metabolomic data. This design is expected to reduce computational time and improve clustering accuracy. On three datasets from soybean, mice, and humans, I-SVVS achieved improved accuracy and faster computation compared to existing methods.

Key Achievement: 50x faster computation, with 2.18 hours for I-SVVS vs 2.35 days for Clusternomics on the soybean dataset

Stochastic variational variable selection for high-dimensional microbiome data

Authors: Tung Dang, Kie Kumaishi, Erika Usui, Shungo Kobori, Takumi Sato, Yusuke Toda, Kengo Sakurai, Yuji Yamasaki, Hisashi Tsujimoto, Masami Yokota Hirai, Yasunori Ichihashi, Hiroyoshi Iwata

Journal: Microbiome, 2022 | Impact Factor: 16.837

DOI: 10.1186/s40168-022-01439-0

Focus: Foundational methodology for single-omics microbiome analysis using the SVVS algorithm. This work established the core algorithmic framework that enabled the development of I-SVVS for multi-omics integration.

Breakthrough: First method capable of analyzing 50,000+ microbial species with 1,000+ samples

Research Impact

  • 24.7 combined impact factor
  • Top-tier journal publications
  • Multi-institution collaboration

Get Started

# Clone the repository
git clone https://github.com/tungtokyo1108/I-SVVS.git
cd I-SVVS

# Install dependencies
pip install -r requirements.txt

# Run integrated analysis
python src/Integrated_SVVS.py --microbiome data/datasetA_microbiome.csv \
                              --metabolome data/datasetA_metabolome.csv \
                              --output results/

# Run microbiome-only analysis
python src/DMM_SVVS.py --data data/datasetA_microbiome.csv

# Run metabolome-only analysis  
python src/GMM_SVVS.py --data data/datasetA_metabolome.csv

Available Datasets

Dataset A (Included)

Soybean root microbiome and metabolome data with drought stress analysis

Dataset B

Mouse gut microbiome data: follow the instructions in the haddad_osa GitHub repo

Dataset C

Large-scale human microbiome data: follow the instructions in the MicrobiomeHD GitHub repo
