Probabilistic Assembly with Mycelia

Mycelia's probabilistic assembly approach combines probabilistic modelling with biological insight to deliver assemblies with confidence intervals.

🧬 What is Probabilistic Assembly?

Traditional Assembly

Traditional assemblers make deterministic decisions at each step:

  • Fixed k-mer size throughout assembly
  • Binary decisions: keep or discard sequences
  • Quality scores often ignored after initial filtering
  • Heuristic-based error correction

Probabilistic Assembly

Mycelia's probabilistic approach treats assembly as a statistical inference problem:

  • Dynamic k-mer progression based on data characteristics
  • Probabilistic path selection using quality scores
  • Maximum likelihood error correction
  • Self-optimizing parameters through machine learning

Traditional:   Reads → Graph → Heuristic Cleaning → Assembly
Probabilistic: Reads → Quality-Aware Graph → Statistical Inference → Optimal Assembly
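
To make "probabilistic path selection using quality scores" more concrete, here is a minimal sketch of how per-base Phred scores can be turned into a confidence value for each k-mer. It is not Mycelia's implementation: the function names, the example read, and the assumption of independent base errors with Phred+33 encoding are all illustrative.

```julia
# Convert a Phred quality score Q into the probability that the base call is
# wrong: P(error) = 10^(-Q/10).
phred_to_error(q::Integer) = 10.0^(-q / 10)

# Decode a FASTQ quality string into integer scores (Phred+33 encoding assumed).
decode_phred33(qual::AbstractString) = [Int(c) - 33 for c in qual]

# Probability that every base in a k-mer is correct, assuming that errors at
# different positions are independent (a simplifying assumption).
function kmer_correct_probability(quals::AbstractVector{<:Integer})
    return prod(1.0 - phred_to_error(q) for q in quals)
end

# Hypothetical read and quality string, for illustration only.
seq = "ACGTAC"
qual = "IIII5I"   # 'I' decodes to Q40, '5' decodes to Q20
k = 3
scores = decode_phred33(qual)

for i in 1:(length(seq) - k + 1)
    kmer = seq[i:i+k-1]
    p = kmer_correct_probability(scores[i:i+k-1])
    println(kmer, " => ", round(p; digits=4))
end
```

K-mers that overlap the single Q20 base receive a visibly lower probability than those built only from Q40 bases, which is the kind of signal a quality-aware graph can use when weighing competing paths.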

🎯 Why Use Probabilistic Assembly?

1. High Accuracy

  • Preserves quality information throughout assembly
  • Statistically principled error correction
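
As an illustration of what statistically principled error correction can look like, the sketch below picks the most likely true base at one position from several overlapping base calls and their Phred scores. The error model (a wrong call is equally likely to be any of the other three bases) and the names `log_likelihood` and `ml_base` are assumptions made for this example, not Mycelia's API.

```julia
# Log-likelihood of the observed calls if the true base were `candidate`.
# A call with error probability e matches the truth with probability 1 - e
# and is each of the three other bases with probability e / 3.
function log_likelihood(candidate::Char,
                        calls::AbstractVector{Char},
                        quals::AbstractVector{<:Integer})
    total = 0.0
    for (base, q) in zip(calls, quals)
        e = 10.0^(-q / 10)
        total += base == candidate ? log(1 - e) : log(e / 3)
    end
    return total
end

# Return the base with the highest likelihood given all observations.
function ml_base(calls, quals)
    bases = ['A', 'C', 'G', 'T']
    scores = [log_likelihood(b, calls, quals) for b in bases]
    return bases[argmax(scores)]
end

# Hypothetical observations at one position: two low-quality 'A' calls
# and one high-quality 'C' call.
println(ml_base(['A', 'A', 'C'], [10, 10, 40]))
```

With these inputs the single Q40 call outweighs the two Q10 calls, whereas a simple majority vote would choose 'A'. Keeping quality scores available during correction is what makes this kind of decision possible.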

2. Self-Optimizing

  • No manual parameter tuning required
  • Automatically selects optimal k-mer sizes
  • Adapts to your data's characteristics
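
The idea of choosing k from the data can be sketched with a toy rule: step through increasing candidate k values and stop at the first one where no (k-1)-mer is followed by more than one distinct base, meaning a de Bruijn graph built at that k would have no branches. This rule, the function names, and the example read are assumptions for illustration only; Mycelia's actual selection criteria may differ.

```julia
# Return true if some (k-1)-mer prefix is followed by more than one distinct
# base, i.e. a de Bruijn graph built at this k would contain a branch.
function has_branch(reads, k::Int)
    successors = Dict{String,Set{Char}}()
    for read in reads, i in 1:(length(read) - k + 1)
        prefix = read[i:i+k-2]
        next_base = read[i+k-1]
        push!(get!(successors, prefix, Set{Char}()), next_base)
    end
    return any(s -> length(s) > 1, values(successors))
end

# Walk through the candidate k values in order and return the first one that
# yields a branch-free graph; fall back to the largest candidate otherwise.
function choose_k(reads, candidates)
    for k in candidates
        has_branch(reads, k) || return k
    end
    return last(candidates)
end

# Hypothetical read containing the repeat "AC". Odd k values are used so that
# no k-mer can equal its own reverse complement.
reads = ["ACGACT"]
println(choose_k(reads, [3, 5, 7]))
```

At k = 3 the prefix "AC" is followed by both 'G' and 'T', so the graph branches; at k = 5 the repeat is spanned and the ambiguity disappears. Real data also contains sequencing errors, so a practical criterion would need to account for coverage and quality as well.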

3. Quality-Aware

  • First assembler to preserve per-base quality scores
  • Handles varying coverage gracefully

🚀 Quick Start (5 Minutes)

Get your first assembly running in under 5 minutes:

# 1. Install Mycelia (one-time setup)
import Pkg
Pkg.add(url="https://github.com/cjprybol/Mycelia.git")

# 2. Load the package
import Mycelia

# 3. Assemble your genome
assembly = Mycelia.assemble_genome("my_reads.fastq")

# 4. Check your results
println("Assembly complete! $(assembly.num_contigs) contigs, N50: $(assembly.n50)")

# 5. Save the assembly
Mycelia.write_fasta(assembly.contigs, "my_assembly.fasta")

That's it! Mycelia automatically:

  • ✓ Detects your data type
  • ✓ Selects optimal parameters
  • ✓ Performs quality-aware assembly
  • ✓ Validates results
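
The quick start reports an N50 value. For readers new to the metric, the sketch below computes it from contig lengths alone: N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function `n50` and the example lengths are illustrative and independent of how Mycelia computes `assembly.n50`.

```julia
# N50: sort contigs from longest to shortest, accumulate their lengths, and
# return the length of the contig that brings the running total to at least
# half of the total assembly length.
function n50(lengths::AbstractVector{<:Integer})
    isempty(lengths) && return 0
    sorted = sort(lengths; rev=true)
    half = sum(sorted) / 2
    running = 0
    for len in sorted
        running += len
        running >= half && return len
    end
    return last(sorted)
end

# Hypothetical contig lengths in base pairs.
contig_lengths = [100, 80, 50, 30, 20]
println("N50: ", n50(contig_lengths))
```

A higher N50 means that more of the assembly sits in long contigs, but it says nothing about correctness, so it is best read alongside accuracy and completeness metrics.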

🗺️ Choose Your Path

By Data Type

graph TD
    A[What type of data do you have?] --> B[FASTA only<br/>No quality scores]
    A --> C[FASTQ<br/>With quality scores]
    A --> D[Mixed<br/>Short + long reads]
    
    B --> E[K-mer Graphs<br/>→ Tutorial 1]
    C --> F[Qualmer Graphs<br/>→ Tutorial 2]
    D --> G[Hybrid Assembly<br/>→ Tutorial 3]
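
If you are unsure which branch applies to a given file, the first record header usually settles it: FASTA headers start with `>` and FASTQ headers start with `@`. The sketch below applies that check. The function `detect_format` and the suggestion table are hypothetical helpers written for this page, not part of Mycelia, and a mixed short-plus-long-read dataset cannot be recognized from a single file this way.

```julia
# Guess the sequence file format from the first non-empty line.
function detect_format(io::IO)
    for line in eachline(io)
        stripped = strip(line)
        isempty(stripped) && continue
        startswith(stripped, ">") && return :fasta
        startswith(stripped, "@") && return :fastq
        return :unknown
    end
    return :unknown
end

# Convenience method: open a file by path and inspect it.
detect_format(path::AbstractString) = open(detect_format, path)

# Map the detected format to the starting points named in the diagram above.
suggestion = Dict(
    :fasta => "K-mer Graphs (Tutorial 1)",
    :fastq => "Qualmer Graphs (Tutorial 2)",
)

# In-memory example so the sketch runs without any files on disk.
fmt = detect_format(IOBuffer("@read1\nACGT\n+\nIIII\n"))
println(fmt, " => ", get(suggestion, fmt, "unrecognized format"))
```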

By Experience Level

🌱 Beginner - "I just want it to work"

→ Start with Assembly in 5 Minutes

  • Automatic parameter selection
  • Simple one-function interface
  • Clear output interpretation

🌿 Intermediate - "I want to understand and optimize"

→ Continue to Understanding Assembly Methods

  • Compare different approaches
  • Tune for your specific needs
  • Interpret quality metrics

🌳 Advanced - "I want full control"

→ Explore Advanced Assembly Theory

  • Custom graph algorithms
  • Machine learning integration
  • Novel method development

📊 When to Use Each Method

Intelligent Assembly

  • ✅ Automatic parameter optimization
  • ✅ Good for unknown data characteristics
  • ✅ Balances speed and accuracy

Iterative Assembly

  • ✅ Best for low-quality data
  • ✅ Refines assembly through iterations
  • ⏱️ Takes more time

Reinforcement Learning Assembly

  • ✅ Learns from experience
  • ✅ Adapts to new data types
  • 🧪 Experimental feature

🛠️ Next Steps

Ready to Start?

  1. Install Mycelia - Multiple installation options
  2. Run the 5-minute tutorial - See it in action
  3. Choose your workflow - Find the best approach

🌟 Why Mycelia?

Scientific Innovation

  • Novel 6-graph hierarchy - Unifies multiple assembly approaches
  • Quality preservation - First to maintain quality throughout assembly
  • Principled algorithms - Based on proven statistical methods

Practical Benefits

  • No parameter tuning - It should just work
  • Transparent results - Data-driven accuracy and completeness
  • Active development - Regular updates and improvements

Open Source

  • Free to use - MIT licensed
  • Community driven - Contributions welcome
  • Transparent - All algorithms documented

Ready to experience the future of genome assembly? Get started now →