Getting Started with Mycelia

Welcome to Mycelia, an experimental Julia package for bioinformatics and computational biology. This guide will help you install Mycelia and explore its current capabilities.

What is Mycelia?

Mycelia is a research-oriented bioinformatics package that currently provides:

Working Features

  • Basic Sequence I/O: FASTA/FASTQ file reading and writing
  • Read Simulation: PacBio and Nanopore read simulators
  • Tool Integration: Wrappers for established assemblers (MEGAHIT, SPAdes, hifiasm)
  • K-mer Analysis: Canonical k-mer counting and distance calculations
  • HPC Support: SLURM job submission and rclone integration

Experimental Features

  • Novel Assembly Algorithms: Graph-based approaches incorporating quality scores
  • Pangenome Analysis: K-mer based comparative genomics
  • Quality Control: Integration with external QC tools

Planned Features

  • Annotation: Gene prediction integration
  • Phylogenetics: Tree construction from pangenome data
  • Visualization: Interactive genomic plots

Prerequisites

Julia Installation

Mycelia is tested against the Julia LTS and current stable release channels. Install Julia using juliaup:

# Install juliaup
curl -fsSL https://install.julialang.org | sh

# Install the latest LTS Julia and make it the default
juliaup add lts
juliaup default lts

# Or install the latest stable release and make it the default instead
juliaup add release
juliaup default release
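
After juliaup finishes, it can help to confirm which Julia the `julia` command actually launches. The following is a minimal check using only Base; the 1.10 floor is an assumption chosen for illustration, not a documented Mycelia requirement.

```julia
# Print the version of the running Julia and the directory holding its binaries
println("Julia version: ", VERSION)
println("Julia bindir:  ", Sys.BINDIR)

# Hypothetical minimum version, used only to illustrate a version comparison
minimum_version = v"1.10"
version_ok = VERSION >= minimum_version
println(version_ok ? "Version is new enough" : "Consider switching channels with juliaup")
```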

System Dependencies

Mycelia integrates with external bioinformatics tools. For full functionality, you'll need conda. If no conda environment is available, Mycelia will install its own environment using Conda.jl.
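
To see whether a conda executable is already visible to Julia, a quick check along these lines may be useful. It uses only `Sys.which` from Base; whether Mycelia performs the same lookup internally is not stated in this guide, so treat it as a diagnostic sketch.

```julia
# Look for a conda executable on the PATH; Sys.which returns nothing when none is found
conda_path = Sys.which("conda")

if conda_path === nothing
    println("No conda found on PATH")
else
    println("Found conda at: ", conda_path)
end
```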

HPC Environment Setup

If you're using Mycelia on HPC systems, you may need to clear LD_LIBRARY_PATH before starting Julia to avoid conflicts with visualization libraries:

# Launch Julia with clean library path
export LD_LIBRARY_PATH="" && julia

For Jupyter kernels, add this to your kernel.json:

{
  "env": {
    "LD_LIBRARY_PATH": ""
  }
}
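
To confirm that the setting took effect, you can inspect the variable from inside the running Julia session or notebook. This is a small sketch using only `ENV` from Base:

```julia
# Read LD_LIBRARY_PATH as seen by the current Julia process (empty string if unset)
ld_path = get(ENV, "LD_LIBRARY_PATH", "")

if isempty(ld_path)
    println("LD_LIBRARY_PATH is empty or unset")
else
    println("LD_LIBRARY_PATH is set to: ", ld_path)
end
```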

Installation

For Users

Install Mycelia directly from GitHub:

import Pkg
Pkg.add(url="https://github.com/cjprybol/Mycelia.git")

For Developers

Clone and develop the package:

import Pkg
Pkg.develop(url="git@github.com:cjprybol/Mycelia.git")
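
After either install route, a quick way to check that Julia can locate the package in the active environment is `Base.find_package`. This sketch only reports whether the package is found; it does not load it.

```julia
# Base.find_package returns the path to the package entry file, or nothing if it cannot be found
mycelia_path = Base.find_package("Mycelia")

if mycelia_path === nothing
    println("Mycelia is not installed in the active environment")
else
    println("Mycelia found at: ", mycelia_path)
end
```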

Your First Analysis

Let's walk through a complete workflow using a small downloaded or simulated test dataset.

1. Load Mycelia

import Mycelia

2. Download Test Data

Download a reference genome for testing, or generate a random one, then simulate reads from it:

# Download a small viral genome (phiX174)
genome_file = Mycelia.download_genome_by_accession(accession="NC_001422.1")

# Or create a random test sequence
test_genome = Mycelia.random_fasta_record(moltype=:DNA, L=10000)
Mycelia.write_fasta(outfile="test_genome.fasta", records=[test_genome])

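# Simulate reads from the genome
# (assumption: simulate_pacbio_reads writes the FASTQ itself and returns its path)
reads_file = Mycelia.simulate_pacbio_reads(fasta="test_genome.fasta", quantity="20x")
# Later steps refer to "test_reads.fastq"; substitute reads_file if the simulator writes elsewhere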
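
The `quantity="20x"` argument is read here as 20-fold sequencing depth. As a rough sketch of what that implies for the 10,000 bp test genome, the arithmetic looks like this. The mean read length is a hypothetical value picked for illustration; the simulator's actual read length distribution is not described in this guide.

```julia
# Genome length matches the 10,000 bp random test sequence above
genome_length = 10_000
coverage = 20

# Total bases needed to reach 20x depth
total_bases = genome_length * coverage

# Hypothetical mean read length, chosen only for illustration
mean_read_length = 5_000

# Approximate number of reads, rounded up
approx_reads = cld(total_bases, mean_read_length)

println("Total bases: ", total_bases)
println("Approximate reads: ", approx_reads)
```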

3. Quality Control (Using External Tools)

Filter reads using integrated external tools:

# Filter long reads with filtlong
Mycelia.qc_filter_long_reads_filtlong(
    input_file="test_reads.fastq",
    output_file="filtered_reads.fastq",
    min_length=1000,
    min_mean_q=7
)

# Note: Native quality assessment functions are planned but not yet implemented

4. K-mer Analysis

Analyze k-mer composition:

# Count canonical k-mers
import Kmers
kmer_counts = Mycelia.count_canonical_kmers(Kmers.DNAKmer{21}, "test_reads.fastq")
println("Unique k-mers: $(length(kmer_counts))")

# Note: K-mer spectrum analysis and plotting functions are planned
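
If "canonical" k-mers are new to you, the idea is that a k-mer and its reverse complement are counted under a single key, conventionally the lexicographically smaller of the two. The sketch below illustrates that idea on plain strings. It is not Mycelia's implementation, and `revcomp` and `count_canonical_sketch` are hypothetical helper names.

```julia
# Complement lookup for unambiguous DNA bases
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

# Reverse complement of a DNA string (hypothetical helper)
revcomp(s::AbstractString) = join(COMPLEMENT[c] for c in reverse(s))

# Count canonical k-mers: a k-mer and its reverse complement share one key,
# the lexicographically smaller of the two
function count_canonical_sketch(seq::AbstractString, k::Int)
    counts = Dict{String,Int}()
    for i in 1:(length(seq) - k + 1)
        kmer = seq[i:i+k-1]
        canonical = min(kmer, revcomp(kmer))
        counts[canonical] = get(counts, canonical, 0) + 1
    end
    return counts
end

counts = count_canonical_sketch("ACGT", 2)
println(counts)
```

In this toy input, `AC` and `GT` are reverse complements of each other, so both occurrences are tallied under `AC`, while `CG` is its own reverse complement.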

5. Genome Assembly

Assemble using external tools through Mycelia wrappers:

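# Use MEGAHIT for assembly. MEGAHIT is a short-read assembler, so for long reads
# (such as the PacBio reads simulated above) a long-read assembler such as hifiasm may fit better.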
Mycelia.assemble_metagenome_megahit(
    reads=["test_reads.fastq"],
    output_dir="megahit_assembly"
)

# Or use experimental graph-based assembly (research feature)
# Note: This is experimental and may not produce optimal results
# assembly = Mycelia.assemble_with_graph_framework("test_reads.fastq")

# Assembly evaluation functions are planned but not yet implemented
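
Until native evaluation functions are available, a common summary statistic such as N50 can be computed by hand from contig lengths. The sketch below uses made-up lengths and a hypothetical helper named `n50`; reading the lengths from an assembler's output is left out.

```julia
# N50: the length L such that contigs of length >= L cover at least half the assembly
function n50(lengths::AbstractVector{<:Integer})
    isempty(lengths) && throw(ArgumentError("no contig lengths given"))
    half_total = sum(lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the total length is covered
    for len in sort(lengths; rev=true)
        running += len
        running >= half_total && return len
    end
end

# Made-up contig lengths for illustration
contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
println("N50: ", n50(contig_lengths))
```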

6. Comparative Analysis (Experimental)

Compare multiple genomes using k-mer analysis:

# Compare two genomes (placeholder paths; substitute your own FASTA files)
import Kmers
genome_files = ["genome1.fasta", "genome2.fasta"]
pangenome_result = Mycelia.analyze_pangenome_kmers(
    genome_files,
    kmer_type=Kmers.DNAKmer{21}
)
println("Core k-mers: $(length(pangenome_result.core_kmers))")
println("Unique k-mers: $(sum(length(v) for v in values(pangenome_result.unique_kmers_by_genome)))")

# Note: Gene prediction and visualization functions are planned
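
The core and unique k-mer sets reported above can be understood as plain set operations. The following sketch shows the idea on toy data. The k-mer values are made up, and this is an illustration of the concept rather than how `analyze_pangenome_kmers` is implemented.

```julia
# Toy canonical k-mer sets standing in for two genomes (made-up values)
kmers_by_genome = Dict(
    "genome1" => Set(["AAC", "ACC", "AGC"]),
    "genome2" => Set(["ACC", "AGC", "ATC"]),
)

# Core k-mers: present in every genome
core_kmers = intersect(values(kmers_by_genome)...)

# Unique k-mers: present in one genome and absent from all the others
unique_kmers_by_genome = Dict(
    name => setdiff(kmers, (other for (n, other) in kmers_by_genome if n != name)...)
    for (name, kmers) in kmers_by_genome
)

println("Core k-mers: ", length(core_kmers))
println("Unique k-mers: ", sum(length(v) for v in values(unique_kmers_by_genome)))
```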

What's Next?

Explore the available features and help improve the package:

Available Tutorials

  • See the tutorials section for numbered examples
  • Focus on working examples that demonstrate current capabilities

Research Features

  • Explore the experimental graph-based assembly algorithms
  • Test the quality-aware k-mer (qualmer) graph implementation
  • Try the reinforcement learning guided assembly (very experimental)

Contributing

  • Report issues or feature requests on GitHub
  • Help implement planned features
  • Improve documentation for existing functions

Note on CLI Tools

Command-line interface tools are planned but not yet implemented.

API Reference

Browse the complete API documentation for detailed function references and examples.

Getting Help

If you encounter issues:

  1. Check the troubleshooting guide
  2. Browse example workflows
  3. Report bugs on GitHub Issues

Memory and Performance

For large-scale analyses:

# Check memory requirements
estimated_memory = Mycelia.estimate_memory_usage(input_file="large_reads.fastq")
println("Estimated memory needed: $(estimated_memory) GB")

# Monitor memory during analysis
Mycelia.with_memory_monitoring() do
    result = Mycelia.assemble_genome("large_reads.fastq")
end
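
For a quick sanity check before a large run, Base Julia can already report input size and system memory. The sketch below writes a tiny stand-in file so it runs anywhere; point `input_file` at your own reads in practice. File size is only a rough proxy, since this guide does not state how memory use scales with input size.

```julia
# Create a small stand-in FASTQ file so the example runs anywhere
input_file = tempname()
write(input_file, "@read1\nACGT\n+\nIIII\n")

# Compare the input size with the memory the operating system reports
file_gib = filesize(input_file) / 2^30
free_gib = Sys.free_memory() / 2^30
total_gib = Sys.total_memory() / 2^30

println("Input size:   ", round(file_gib; digits=6), " GiB")
println("Free memory:  ", round(free_gib; digits=2), " GiB")
println("Total memory: ", round(total_gib; digits=2), " GiB")
```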

Contributing

Mycelia is open-source and welcomes contributions! See our development guide for details.


Ready to dive deeper? Check out our workflow tutorials for complete biological analyses with real datasets.