Getting Started with Mycelia
Welcome to Mycelia, an experimental Julia package for bioinformatics and computational biology. This guide will help you install Mycelia and explore its current capabilities.
What is Mycelia?
Mycelia is a research-oriented bioinformatics package that currently provides:
Working Features
- Basic Sequence I/O: FASTA/FASTQ file reading and writing
- Read Simulation: PacBio and Nanopore read simulators
- Tool Integration: Wrappers for established assemblers (MEGAHIT, SPAdes, hifiasm)
- K-mer Analysis: Canonical k-mer counting and distance calculations
- HPC Support: SLURM job submission and rclone integration
Experimental Features
- Novel Assembly Algorithms: Graph-based approaches incorporating quality scores
- Pangenome Analysis: K-mer based comparative genomics
- Quality Control: Integration with external QC tools
Planned Features
- Annotation: Gene prediction integration
- Phylogenetics: Tree construction from pangenome data
- Visualization: Interactive genomic plots
Prerequisites
Julia Installation
Mycelia is tested against the Julia LTS and current release versions. Install Julia using juliaup:
# Install juliaup
curl -fsSL https://install.julialang.org | sh
# Install the latest LTS Julia and make it the default
juliaup add lts
juliaup default lts
# Or install the latest release Julia and make it the default instead
juliaup add release
juliaup default release
System Dependencies
Mycelia integrates with external bioinformatics tools, which are managed through conda. For full functionality, you'll need conda. If no conda environment is available, Mycelia will install its own environment using Conda.jl.
HPC Environment Setup
If you're using Mycelia on an HPC system, you may need to clear LD_LIBRARY_PATH
to avoid conflicts between system libraries and the visualization libraries Julia loads:
# Launch Julia with clean library path
export LD_LIBRARY_PATH="" && julia
For Jupyter kernels, add an "env" entry to the kernel's kernel.json:
{
  "env": {
    "LD_LIBRARY_PATH": ""
  }
}
Installation
For Users
Install Mycelia directly from GitHub:
import Pkg
Pkg.add(url="https://github.com/cjprybol/Mycelia.git")
For Developers
Clone the repository and develop the package (this clones over SSH, so a GitHub SSH key is required):
import Pkg
Pkg.develop(url="git@github.com:cjprybol/Mycelia.git")
Your First Analysis
Let's walk through a complete workflow using small test data: a public viral genome or a randomly generated test sequence.
1. Load Mycelia
import Mycelia
2. Download Test Data
Download a reference genome (or generate a random one) and simulate reads from it:
# Download a small viral genome (phiX174)
genome_file = Mycelia.download_genome_by_accession(accession="NC_001422.1")
# Or create a random test sequence
test_genome = Mycelia.random_fasta_record(moltype=:DNA, L=10000)
Mycelia.write_fasta(outfile="test_genome.fasta", records=[test_genome])
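# Simulate ~20x PacBio reads from the random test genome
reads_file = Mycelia.simulate_pacbio_reads(fasta="test_genome.fasta", quantity="20x")
# reads_file is the path to the simulated reads; copy it to the name used below
# (assumes the simulator writes an uncompressed FASTQ file)
cp(reads_file, "test_reads.fastq"; force=true)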
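The quantity="20x" argument presumably requests 20-fold depth of coverage. The arithmetic behind a coverage target is simple: bases to simulate = depth × genome length, and the number of reads is roughly that total divided by the mean read length. The sketch below illustrates this with a hypothetical helper, expected_read_count, and an assumed mean read length of 8 kb; it does not call Mycelia.

```julia
# Hypothetical helper (not part of Mycelia): approximate read count for a
# target depth. Total bases = depth × genome length; dividing by the mean
# read length gives the approximate number of reads.
function expected_read_count(genome_length::Integer, depth::Real, mean_read_length::Real)
    total_bases = depth * genome_length
    return ceil(Int, total_bases / mean_read_length)
end

# 20x coverage of a 10 kb genome, like the random test genome above
genome_length = 10_000
total_bases = 20 * genome_length                            # bases to simulate
n_reads = expected_read_count(genome_length, 20, 8_000)     # assumed 8 kb mean read length
println("Bases to simulate: $total_bases, approx. reads: $n_reads")
```

Real simulators draw read lengths from a distribution, so the actual read count will vary around this estimate.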
3. Quality Control (Using External Tools)
Filter reads with an external tool (filtlong) through Mycelia's wrapper:
# Filter long reads with filtlong
Mycelia.qc_filter_long_reads_filtlong(
input_file="test_reads.fastq",
output_file="filtered_reads.fastq",
min_length=1000,
min_mean_q=7
)
# Note: Native quality assessment functions are planned but not yet implemented
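To make the min_length and min_mean_q thresholds concrete, the sketch below filters a single read by length and mean quality. It uses hypothetical helpers (phred_scores, mean_phred, passes_filter), assumes Phred+33 quality encoding, and computes mean quality by averaging per-base error probabilities. This is one common definition, not filtlong's implementation, whose scoring is more involved.

```julia
import Statistics

# Hypothetical helpers; NOT filtlong's implementation.

# Decode a Phred+33 quality string into integer scores ('+' => 10, 'I' => 40).
phred_scores(qual::AbstractString) = [Int(c) - 33 for c in qual]

# Mean quality: average the per-base error probabilities, then convert the
# average back to the Phred scale.
function mean_phred(qual::AbstractString)
    error_probs = [10.0^(-q / 10) for q in phred_scores(qual)]
    return -10 * log10(Statistics.mean(error_probs))
end

# Keep a read only if it is long enough and its mean quality is high enough.
passes_filter(seq, qual; min_length=1000, min_mean_q=7) =
    length(seq) >= min_length && mean_phred(qual) >= min_mean_q
```

Averaging error probabilities weights low-quality bases more heavily than a plain average of Phred scores would: a read with one Q40 base and one Q10 base averages to about Q13 rather than Q25.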
4. K-mer Analysis
Analyze k-mer composition:
import Kmers
# Count canonical 21-mers in the simulated reads
kmer_counts = Mycelia.count_canonical_kmers(Kmers.DNAKmer{21}, "test_reads.fastq")
println("Distinct canonical k-mers: $(length(kmer_counts))")
# Note: K-mer spectrum analysis and plotting functions are planned
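Canonical counting treats a k-mer and its reverse complement as the same key, because both come from opposite strands of the same DNA molecule. The sketch below shows the idea on plain strings using hypothetical names (reverse_complement, canonical, count_canonical). Mycelia's count_canonical_kmers works with Kmers.jl types and may differ in details such as how ambiguous bases are handled.

```julia
# Hypothetical string-based illustration of canonical k-mer counting.
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

reverse_complement(s::AbstractString) = String([COMPLEMENT[c] for c in reverse(s)])

# The canonical form is the lexicographically smaller of a k-mer and its
# reverse complement, so both strands map to the same key.
function canonical(kmer::AbstractString)
    rc = reverse_complement(kmer)
    return kmer <= rc ? String(kmer) : rc
end

function count_canonical(seq::AbstractString, k::Integer)
    counts = Dict{String,Int}()
    for i in 1:(length(seq) - k + 1)
        kmer = seq[i:i+k-1]
        # Skip k-mers containing ambiguous bases such as N
        all(c -> haskey(COMPLEMENT, c), kmer) || continue
        key = canonical(kmer)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end
```

For example, in "ACGT" the 2-mers AC and GT are reverse complements, so count_canonical("ACGT", 2) counts both under AC. For real data, packed representations such as Kmers.jl's DNAKmer{K} are far more memory-efficient than String keys.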
5. Genome Assembly
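Assemble reads with external assemblers through Mycelia's wrappers:
# MEGAHIT is a short-read (e.g. Illumina) assembler; it is shown here to illustrate
# the wrapper interface. For the simulated PacBio reads, a long-read assembler
# such as hifiasm is a better fit.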
Mycelia.assemble_metagenome_megahit(
reads=["test_reads.fastq"],
output_dir="megahit_assembly"
)
# Or use experimental graph-based assembly (research feature)
# Note: This is experimental and may not produce optimal results
# assembly = Mycelia.assemble_with_graph_framework("test_reads.fastq")
# Assembly evaluation functions are planned but not yet implemented
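Graph-based assemblers are commonly built on a de Bruijn graph, in which nodes are (k-1)-mers and each k-mer in a read adds an edge from its prefix to its suffix. The sketch below is a minimal illustration with hypothetical functions (de_bruijn_graph, extend_path). It ignores reverse complements, sequencing errors, and quality scores, all of which Mycelia's experimental assemblers are designed to take into account, so it is not Mycelia's algorithm.

```julia
# Hypothetical sketch, not Mycelia's implementation.
# Build a de Bruijn graph: each k-mer adds an edge prefix => suffix.
function de_bruijn_graph(reads::Vector{String}, k::Integer)
    graph = Dict{String,Set{String}}()
    for read in reads, i in 1:(length(read) - k + 1)
        kmer = read[i:i+k-1]
        prefix, suffix = kmer[1:end-1], kmer[2:end]
        push!(get!(graph, prefix, Set{String}()), suffix)
    end
    return graph
end

# Walk forward from a start node while each node has exactly one successor,
# appending the last base of each successor to the growing contig.
function extend_path(graph, start::String)
    contig = start
    node = start
    visited = Set([start])
    while haskey(graph, node) && length(graph[node]) == 1
        succ = only(graph[node])
        succ in visited && break  # stop on cycles
        push!(visited, succ)
        contig *= succ[end]
        node = succ
    end
    return contig
end
```

With k = 3, the overlapping reads "ACGTT" and "GTTCA" share the edge GT => TT, so walking from AC reconstructs the merged sequence "ACGTTCA".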
6. Comparative Analysis (Experimental)
Compare multiple genomes using k-mer analysis (replace the placeholder file names with your own genomes):
# Compare two genomes
genome_files = ["genome1.fasta", "genome2.fasta"]
pangenome_result = Mycelia.analyze_pangenome_kmers(
genome_files,
kmer_type=Kmers.DNAKmer{21}
)
println("Core k-mers: $(length(pangenome_result.core_kmers))")
println("Genome-specific k-mers: $(sum(length(v) for v in values(pangenome_result.unique_kmers_by_genome)))")
# Note: Gene prediction and visualization functions are planned
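Core and genome-specific k-mers reduce to set operations: core k-mers are the intersection of all genomes' k-mer sets, and a genome's specific k-mers are those absent from every other genome. The sketch below shows this logic with hypothetical functions (kmer_set, pangenome_partition, jaccard_distance) on plain strings, and adds a Jaccard distance as one example of a k-mer distance. For brevity it uses forward-strand k-mers only; real genomes should use canonical k-mers.

```julia
# Hypothetical illustration, not Mycelia's pangenome result type.
kmer_set(seq::AbstractString, k::Integer) =
    Set(seq[i:i+k-1] for i in 1:(length(seq) - k + 1))

function pangenome_partition(kmer_sets::Vector{Set{String}})
    core = intersect(kmer_sets...)  # present in every genome
    unique_by_genome = map(eachindex(kmer_sets)) do i
        others = union(Set{String}(), (kmer_sets[j] for j in eachindex(kmer_sets) if j != i)...)
        setdiff(kmer_sets[i], others)  # present only in genome i
    end
    return core, unique_by_genome
end

# Jaccard distance: 1 - |A ∩ B| / |A ∪ B|
jaccard_distance(a::Set, b::Set) = 1 - length(intersect(a, b)) / length(union(a, b))
```

For example, the 3-mer sets of "ACGTA" and "ACGTC" share ACG and CGT, each has one genome-specific k-mer, and their Jaccard distance is 1 - 2/4 = 0.5.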
What's Next?
Explore the available features and help improve the package:
Available Tutorials
- See the tutorials section for numbered examples
- Focus on working examples that demonstrate current capabilities
Research Features
- Explore the experimental graph-based assembly algorithms
- Test the quality-aware k-mer (qualmer) graph implementation
- Try the reinforcement-learning-guided assembly (very experimental)
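One way to picture a quality-aware k-mer is a k-mer paired with the Phred scores of the bases it was read from, so that downstream graph steps can weigh how trustworthy each observation is. The sketch below is a hypothetical illustration (QualityKmer, quality_kmers, correct_probability are not Mycelia names); Mycelia's qualmer types may be defined and aggregated differently.

```julia
# Hypothetical quality-aware k-mer; Mycelia's own qualmer types may differ.
struct QualityKmer
    kmer::String
    quals::Vector{Int}  # decoded Phred scores, one per base
end

# Extract every k-mer of a read with the matching slice of quality scores.
function quality_kmers(seq::AbstractString, quals::Vector{Int}, k::Integer)
    length(seq) == length(quals) ||
        throw(ArgumentError("sequence and qualities differ in length"))
    return [QualityKmer(seq[i:i+k-1], quals[i:i+k-1]) for i in 1:(length(seq) - k + 1)]
end

# Probability that every base in the k-mer was read correctly,
# assuming independent per-base errors.
correct_probability(q::QualityKmer) = prod(1 - 10.0^(-x / 10) for x in q.quals)
```

Under this model a 2-mer with two Q10 bases is correct with probability 0.9 × 0.9 = 0.81, so low-quality k-mers can be down-weighted rather than discarded outright.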
Contributing
- Report issues or feature requests on GitHub
- Help implement planned features
- Improve documentation for existing functions
Note on CLI Tools
Command-line interface tools are planned but not yet implemented.
API Reference
Browse the complete API documentation for detailed function references and examples.
Getting Help
If you encounter issues:
- Check the troubleshooting guide
- Browse example workflows
- Report bugs on GitHub Issues
Memory and Performance
For large-scale analyses, estimate memory requirements before starting and monitor usage during long runs:
# Check memory requirements
estimated_memory = Mycelia.estimate_memory_usage(input_file="large_reads.fastq")
println("Estimated memory needed: $(estimated_memory) GB")
# Monitor memory during analysis
Mycelia.with_memory_monitoring() do
    result = Mycelia.assemble_genome("large_reads.fastq")
end
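If you want a quick check that relies only on Julia's standard library, the sketch below wraps a workload and reports free system memory beforehand and the process's peak resident set size afterwards. The helper name report_memory is hypothetical and is not Mycelia's with_memory_monitoring; note that Sys.maxrss reports the peak over the whole process lifetime, not just the wrapped call.

```julia
# Hypothetical helper using only Base; not Mycelia's with_memory_monitoring.
function report_memory(f)
    free_before = Sys.free_memory() / 2^30   # bytes -> GiB
    result = f()
    peak_rss = Sys.maxrss() / 2^30           # peak RSS of this process so far, in GiB
    println("Free memory before: $(round(free_before; digits=2)) GiB")
    println("Peak RSS after run: $(round(peak_rss; digits=2)) GiB")
    return result
end

# Example: a workload that allocates roughly 80 MB of Float64 values
x = report_memory() do
    sum(rand(10_000_000))
end
```

Because the helper returns the wrapped function's result, it can be dropped around an existing call without changing the surrounding code.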
Contributing
Mycelia is open-source and welcomes contributions! See our development guide for details.
Ready to dive deeper? Check out our workflow tutorials for complete biological analyses with real datasets.