API Documentation

Welcome to the Mycelia API documentation! This guide organizes both implemented functions and planned features by biological workflow. Mycelia already covers much of a typical bioinformatics analysis and integrates many external tools; experimental algorithms and additional features are still being added.

🧬 Quick Start

New to Mycelia? Start with our workflow-based guides:

📋 By Workflow Stage

Follow the typical bioinformatics analysis workflow:

1. Data Acquisition & Simulation

Download genomic data from public databases and simulate synthetic datasets for testing.

Working Functions: download_genome_by_accession, simulate_pacbio_reads, simulate_nanopore_reads

2. Quality Control & Preprocessing

Assess and improve sequencing data quality before analysis.

Working Functions: analyze_fastq_quality, calculate_gc_content, assess_duplication_rates, qc_filter_short_reads_fastp, qc_filter_long_reads_filtlong, trim_galore_paired

Planned: filter_by_quality, per-base quality visualization
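To make two of the quality metrics named above concrete, here is a minimal sketch of GC content and exact-duplicate rate on plain strings. The helper names `gc_fraction` and `duplication_rate` are hypothetical and chosen for illustration; they are not Mycelia's `calculate_gc_content` or `assess_duplication_rates`, whose signatures and return types may differ.

```julia
# Hypothetical helpers for illustration only; not Mycelia's implementations.

# Fraction of G/C bases among the unambiguous A/C/G/T bases of a sequence.
# Ambiguous bases such as N are skipped rather than counted.
function gc_fraction(seq::AbstractString)
    gc = 0
    total = 0
    for base in uppercase(seq)
        if base == 'G' || base == 'C'
            gc += 1
            total += 1
        elseif base == 'A' || base == 'T'
            total += 1
        end
    end
    return total == 0 ? 0.0 : gc / total
end

# Fraction of reads that are exact copies of another read in the set.
function duplication_rate(reads::AbstractVector{<:AbstractString})
    isempty(reads) && return 0.0
    return 1 - length(unique(reads)) / length(reads)
end

reads = ["ACGT", "ACGT", "GGCC", "ATAT"]
println(gc_fraction("ACGTNN"))
println(duplication_rate(reads))
```

Real FASTQ data would normally be streamed from disk rather than held as a vector of strings; the sketch only shows what the two numbers measure.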

3. Sequence Analysis & K-mers

Analyze sequence composition, count k-mers, and extract genomic features.

Working Functions: count_canonical_kmers, jaccard_distance, kmer_counts_to_js_divergence

Planned: kmer_frequency_spectrum, estimate_genome_size
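As a rough illustration of what canonical k-mer counting and a k-mer Jaccard distance involve, the sketch below works on plain uppercase A/C/G/T strings. All function names here (`reverse_complement`, `canonical_kmer_counts`, `kmer_jaccard_distance`) are hypothetical; Mycelia's `count_canonical_kmers` and `jaccard_distance` may take different argument types and use different data structures.

```julia
# Hypothetical helpers for illustration only; not Mycelia's implementations.
# Assumes uppercase sequences containing only A, C, G, and T.

# Reverse complement of a DNA string.
function reverse_complement(seq::AbstractString)
    complement = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')
    return join(complement[base] for base in reverse(seq))
end

# Count k-mers, storing each k-mer under the lexicographically smaller of
# itself and its reverse complement (its "canonical" form), so that both
# strands contribute to the same key.
function canonical_kmer_counts(seq::AbstractString, k::Integer)
    counts = Dict{String,Int}()
    for i in 1:(length(seq) - k + 1)
        kmer = seq[i:(i + k - 1)]
        canonical = min(kmer, reverse_complement(kmer))
        counts[canonical] = get(counts, canonical, 0) + 1
    end
    return counts
end

# Jaccard distance between the k-mer sets of two count tables:
# 1 - (size of intersection) / (size of union).
function kmer_jaccard_distance(a::Dict{String,Int}, b::Dict{String,Int})
    set_a, set_b = Set(keys(a)), Set(keys(b))
    isempty(set_a) && isempty(set_b) && return 0.0
    return 1 - length(intersect(set_a, set_b)) / length(union(set_a, set_b))
end

counts_a = canonical_kmer_counts("ACGT", 2)
counts_b = canonical_kmer_counts("ACGA", 2)
println(counts_a)
println(kmer_jaccard_distance(counts_a, counts_b))
```

In `"ACGT"` with k = 2, the k-mers `AC` and `GT` are reverse complements of each other, so both are counted under `AC`. This presence/absence Jaccard distance ignores the counts themselves; a divergence computed on k-mer frequencies would use them.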

4. Genome Assembly (planned)

Assemble genomes from sequencing reads using various approaches.

Working Functions: assemble_metagenome_megahit, assemble_metagenome_metaspades (external tools)

Experimental: Graph-based assembly framework

Planned: assemble_genome, polish_assembly

5. Assembly Validation

Validate and assess the quality of genome assemblies.

Working Functions: assess_assembly_quality, validate_assembly, run_quast, run_busco, run_mummer, CheckM/CheckM2 integration

Planned: Mauve integration

6. Gene Annotation

Predict genes and assign functional annotations.

Working Functions: Pyrodigal, BLAST+, MMSeqs2, TransTerm, tRNAscan-SE, MLST integrations

Planned: GO term analysis, Reactome pathway analysis, PDB integration via UniRef annotations

7. Comparative Genomics (planned)

Compare genomes, build pangenomes, and construct phylogenetic trees.

Working Functions: analyze_pangenome_kmers, build_genome_distance_matrix

Planned: construct_phylogeny, calculate_synteny

8. Visualization & Reporting

Create plots, figures, and reports for analysis results.

Working Functions: plot_kmer_frequency_spectra, visualize_genome_coverage, plot_embeddings, plot_taxa_abundances, coverage plots, taxonomic visualizations

Planned: Per-base quality plots, assembly statistics visualization, phylogenetic tree plotting

📁 By Data Type

Working with specific file formats and data structures:

Data type documentation is planned for future releases.

    FASTA/FASTQ Files (planned)

    Reading, writing, and manipulating sequence files.

    Assembly Files (planned)

    Working with contigs, scaffolds, and assembly statistics.

    Annotation Files (planned)

    Handling GFF3, GenBank, and other annotation formats.

    Alignment Files (planned)

    Processing BAM/SAM files and alignment results.

    Phylogenetic Trees (planned)

    Tree construction, manipulation, and visualization.

    🎯 By Analysis Goal

    Cross-cutting concerns and specific use cases:

    Basic Workflows

    Complete examples for common analysis tasks.

    Advanced Usage

    Complex workflows and optimization techniques.

    Function Index

    Alphabetical listing of all functions with brief descriptions.

    Parameter Guide

    Common parameters and their usage across functions.

    🔍 Finding What You Need

    By Task

    • "I want to assemble a genome" → Genome Assembly (planned)
    • "I need to validate my assembly" → Assembly Validation (planned)
    • "I want to compare genomes" → Comparative Genomics (planned)
    • "I need to check data quality" → Quality Control

    By Data Type

    • "I have FASTQ files" → FASTA/FASTQ Files (planned)
    • "I have assembly contigs" → Assembly Files (planned)
    • "I have gene annotations" → Annotation Files (planned)

    By Experience Level

    💡 Usage Patterns

    Function Documentation Format

    Each function is documented with:

    """
        function_name(required_param, optional_param="default")
    
    Brief description of what the function does.
    
    ## Purpose
    When and why to use this function in your workflow.
    
    ## Arguments
    - `required_param`: Description and expected data type
    - `optional_param`: Description, default value, and alternatives
    
    ## Returns
    Description of return value and structure.
    
    ## Examples
    
        # Basic usage
        result = function_name("input.fasta")
    
        # Advanced usage with options
        result = function_name("input.fasta", optional_param="custom_value", threads=4)
    
    ## Related Functions
    - [`related_function`](@ref) - What it does
    - [`workflow_next_step`](@ref) - Next step in workflow
    
    ## Performance Notes
    - Memory usage: ~X GB for typical datasets
    - Runtime: ~X minutes for Y-sized genomes
    - Scaling: Linear/quadratic with input size
    
    ## See Also
    - [Workflow Guide](../workflows/relevant-workflow.md)
    - [Data Type Guide](../data-types/relevant-type.md)
    """

    Cross-References

    Functions are linked to:

    • Workflow context - Where they fit in analysis pipelines
    • Related functions - What to use before/after
    • Data types - What formats they accept/produce
    • Examples - Real usage scenarios
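To make the documentation format described earlier more concrete, here is a sketch of a small function documented in that layout. The function `sequence_length_stats` is hypothetical and exists only to illustrate the docstring structure; it is not part of Mycelia's API. The example inside the docstring is written as an indented code block so that it can sit inside the surrounding string without nested fences.

```julia
# Hypothetical function used only to illustrate the docstring layout;
# it is not part of Mycelia's API.

"""
    sequence_length_stats(lengths; min_length=0)

Summarize a collection of sequence lengths.

## Purpose
Use after reading sequences to get a quick size overview before QC or assembly.

## Arguments
- `lengths`: Vector of sequence lengths (integers)
- `min_length`: Ignore sequences shorter than this; defaults to `0` (keep all)

## Returns
A named tuple `(n, total, mean)`; `mean` is `NaN` when no sequence passes the filter.

## Examples

    stats = sequence_length_stats([100, 250, 50]; min_length=100)
    stats.n    # number of sequences kept
"""
function sequence_length_stats(lengths::AbstractVector{<:Integer}; min_length::Integer=0)
    kept = filter(len -> len >= min_length, lengths)
    n = length(kept)
    total = sum(kept; init=0)
    return (n=n, total=total, mean=n == 0 ? NaN : total / n)
end

stats = sequence_length_stats([100, 250, 50]; min_length=100)
println(stats)
```

Because the docstring sits directly above the function definition, Julia attaches it to the function, and it can be read in the REPL help mode with `?sequence_length_stats`.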

    🚀 Integration with Tutorials

    This API documentation integrates with the tutorial system:

    • Tutorials show complete workflows with explanation
    • API docs provide detailed function reference
    • Examples bridge the gap with focused use cases

    For hands-on learning, see the Tutorials, which use these functions in complete bioinformatics workflows.


    This documentation is automatically generated from function docstrings and organized for biological workflows. Functions are tested through the tutorial system to ensure accuracy.