Benchmarks

Overview

Benchmarks comparing Mycelia's approaches, alongside external assemblers, on synthetic fixtures and standardized reference datasets.

HPC CI Status

Badges: HPC Coverage, HPC Tests, HPC Benchmarks

The lightweight GitHub Actions CI on master remains the default merge gate. Extended HPC validation, produced by ci/hpc/run_hpc_ci.sh, is published separately by running bash ci/hpc/publish_hpc_results.sh, which updates the hpc-results branch with:

  • latest-hpc-results.json for the full machine-readable run summary
  • latest-tests.json for the Shields HPC test badge endpoint
  • latest-benchmarks.json for the Shields HPC benchmark badge endpoint
  • latest-meta.json for commit, timestamp, and cluster metadata

The raw branch history keeps one archived directory per commit so published status can be traced back to a specific HPC run without committing bulky logs or benchmark artifacts to the main repository history.
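The badge endpoint files listed above are consumed by Shields, whose endpoint badges expect a small JSON object with schemaVersion, label, and message fields (color is optional). The following is a minimal Python sketch of building such a payload from pass/fail counts. The function name, the label text, the message wording, and the counts are all hypothetical; the actual contents written by publish_hpc_results.sh may differ.

```python
import json


def shields_payload(label, passed, failed):
    """Build a Shields endpoint-badge payload from pass/fail counts.

    Hypothetical helper for illustration; not part of Mycelia's CI scripts.
    """
    total = passed + failed
    if total == 0:
        message, color = "no tests", "lightgrey"
    elif failed == 0:
        message, color = f"{passed} passed", "brightgreen"
    else:
        message, color = f"{failed}/{total} failed", "red"
    return {
        "schemaVersion": 1,  # required by the Shields endpoint schema
        "label": label,
        "message": message,
        "color": color,
    }


# Made-up counts, used only to show the shape of the output.
payload = shields_payload("HPC tests", passed=10, failed=0)
print(json.dumps(payload, indent=2))
```

Keeping the badge payload separate from the full run summary means the badge file stays tiny and stable even if the summary format changes.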

Standard Assembler Fixtures

The short-read assembler comparison benchmark now includes two deterministic fixtures that can be regenerated locally without external downloads:

Fixture | Type | Description | Generation
synthetic_isolate_5386 | Isolate | Single 5.4 kb synthetic genome for short-read assembly sanity checks | Pure Julia, fixed seed
synthetic_metagenome_pair | Metagenome | Two-genome low-complexity community with uneven coverage | Pure Julia, fixed seeds
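The fixtures are deterministic because they are generated from fixed seeds. The idea can be sketched in a few lines of Python; this is an illustration only, since Mycelia's own generators are pure Julia. The function name and the seed value are hypothetical, and the 5386 bp length is an assumption read off the fixture name.

```python
import random


def synthetic_genome(length, seed):
    """Return a random DNA string fully determined by (length, seed)."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return "".join(rng.choice("ACGT") for _ in range(length))


# Same length and seed give the same sequence, so the fixture can be
# regenerated locally instead of being downloaded or committed.
a = synthetic_genome(5386, seed=42)
b = synthetic_genome(5386, seed=42)
print(a == b)  # True
```

Using a dedicated generator object rather than the global one keeps the fixture reproducible even if other code draws random numbers in between.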

Run the comparison benchmark with:

julia --project=. benchmarking/assembler_comparison_standard_fixtures.jl

This benchmark compares Mycelia.Rhizomorph.assemble_genome, run_megahit, and run_metaspades on the same generated FASTQ inputs and writes the run plan plus results as CSV files.

Standardized Test Datasets

To ensure rigorous validation across platforms, Mycelia uses the following gold-standard communities:

Mock Communities (Physical & Sequencing)

Source | Product | Complexity | Description
Zymo | D6331 | Medium | Gut Microbiome Standard (21 strains)
Zymo | D6300 | Low | Microbial Community Standard (8 bacteria, 2 yeast)
ATCC | MSA-1002 | Medium | 20 Strain Even Mix
ATCC | MSA-1003 | Medium | 20 Strain Staggered Mix
NIST | RM 8376 | High | Microbial Pathogen DNA Standard

Benchmarking Challenges (Synthetic)

Simulation Targets

For internal testing, we target the following simulation profiles:

  • Depth: Low (10x), Medium (100x), High (1000x)
  • Diversity: Isolate, Defined Community (10), Complex Community (100+)
  • Abundance: Even, Random, Log-normal (staggered)
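To make the depth and abundance axes concrete, here is a minimal Python sketch of how relative abundances and read counts could be derived for a simulated community. The function names, the sigma parameter, and the 150 bp read length are assumptions for illustration; they do not describe Mycelia's simulation code. The read count uses the usual relation depth = reads × read length / genome length.

```python
import math
import random


def abundances(n, profile, seed=0, sigma=1.0):
    """Relative abundances (summing to 1) for n genomes under a named profile."""
    rng = random.Random(seed)
    if profile == "even":
        weights = [1.0] * n
    elif profile == "random":
        weights = [rng.random() for _ in range(n)]
    elif profile == "lognormal":
        # Staggered community: a few dominant members, many rare ones.
        weights = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    else:
        raise ValueError(f"unknown profile: {profile}")
    total = sum(weights)
    return [w / total for w in weights]


def reads_for_depth(genome_length, depth, read_length):
    """Smallest read count with reads * read_length / genome_length >= depth."""
    return math.ceil(depth * genome_length / read_length)


print(abundances(4, "even"))           # [0.25, 0.25, 0.25, 0.25]
print(reads_for_depth(5386, 10, 150))  # 360
```

For a community, the per-genome depth would be the target depth scaled by each genome's relative abundance before converting to read counts.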

Coming Soon

Detailed benchmark results are planned, including:

  • Runtime comparisons
  • Memory usage analysis
  • Assembly quality metrics
  • Accuracy assessments
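As an example of the kind of assembly quality metric such results typically report, N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly. The Python sketch below is a generic illustration of that definition, not a statement of which metrics Mycelia's benchmarks compute; the function name and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Total is 280, so half is 140; the two longest contigs (100 + 80) reach it.
print(n50([100, 80, 50, 30, 20]))  # 80
```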

For current benchmarking code and data, see the benchmarking directory in the repository.