Sequence Analysis & K-mers
Functions for analyzing sequence composition, counting k-mers, and extracting genomic features from sequencing data.
Overview
Sequence analysis forms the foundation of many bioinformatics algorithms. Mycelia provides comprehensive tools for:
- K-mer counting and analysis using efficient algorithms
- Genome size estimation from k-mer frequency spectra
- Error detection and correction using k-mer patterns
- Sequence composition analysis and bias detection
- Memory-efficient processing of large datasets
Common Workflows
1. Basic K-mer Analysis
# Count k-mers in sequencing reads
kmer_counts = Mycelia.count_kmers("reads.fastq", k=21)
# Analyze k-mer frequency spectrum
spectrum = Mycelia.kmer_frequency_spectrum(kmer_counts)
2. Genome Size Estimation
# Estimate genome size from k-mer spectrum
genome_size = Mycelia.estimate_genome_size_from_kmers(kmer_counts, coverage_peak=30)
3. Contamination Detection
# Detect contamination using k-mer profiles
contamination = Mycelia.detect_contamination_kmers("reads.fastq", expected_profile)
K-mer Counting
Basic K-mer Counting
Mycelia.count_kmers
— Functioncount_kmers(
_::Type{Kmers.Kmer{A<:BioSequences.DNAAlphabet, K}},
sequence::BioSequences.LongSequence
) -> Any
Count the frequency of each k-mer in a DNA sequence.
Arguments
::Type{Kmers.Kmer{A,K}}
: K-mer type with alphabet A and length Ksequence::BioSequences.LongSequence
: Input DNA sequence to analyze
Returns
A sorted dictionary mapping each k-mer to its frequency count in the sequence.
Type Parameters
A <: BioSequences.DNAAlphabet
: DNA alphabet typeK
: Length of k-mers
count_kmers(
_::Type{Kmers.Kmer{A<:BioSequences.RNAAlphabet, K}},
sequence::BioSequences.LongSequence
) -> Any
Count the frequency of each k-mer in an RNA sequence.
Arguments
Kmer
: Type parameter specifying the k-mer length K and RNA alphabetsequence
: Input RNA sequence to analyze
Returns
Dict{Kmers.Kmer, Int}
: Sorted dictionary mapping each k-mer to its frequency count
count_kmers(
_::Type{Kmers.Kmer{A<:BioSequences.AminoAcidAlphabet, K}},
sequence::BioSequences.LongSequence
) -> OrderedCollections.OrderedDict{K, Int64} where K<:(Kmers.Kmer{BioSequences.AminoAcidAlphabet, _A, _B} where {_B, _A})
Count the frequency of amino acid k-mers in a biological sequence.
Arguments
Kmers.Kmer{A,K}
: Type parameter specifying amino acid alphabet (A) and k-mer length (K)sequence
: Input biological sequence to analyze
Returns
A sorted dictionary mapping each k-mer to its frequency count in the sequence.
count_kmers(
_::Type{KMER_TYPE},
record::Union{FASTX.FASTA.Record, FASTX.FASTQ.Record}
) -> Any
Count the frequency of amino acid k-mers in a biological sequence.
Arguments
Kmers.Kmer{A,K}
: Type parameter specifying amino acid alphabet (A) and k-mer length (K)sequence
: Input biological sequence to analyze
Returns
A sorted dictionary mapping each k-mer to its frequency count in the sequence.
count_kmers(
_::Type{KMER_TYPE},
records::AbstractArray{T<:Union{FASTX.FASTA.Record, FASTX.FASTQ.Record}, 1}
) -> Any
Count k-mers across multiple sequence records and return a sorted frequency table.
Arguments
KMER_TYPE
: Type parameter specifying the k-mer length (e.g.,DNAKmer{3}
for 3-mers)records
: Vector of FASTA/FASTQ records to analyze
Returns
Dict{KMER_TYPE, Int}
: Sorted dictionary mapping k-mers to their frequencies
count_kmers(
_::Type{KMER_TYPE},
sequences::Union{FASTX.FASTA.Reader, FASTX.FASTQ.Reader}
) -> Any
Counts k-mer occurrences in biological sequences from a FASTA/FASTQ reader.
Arguments
KMER_TYPE
: Type parameter specifying the k-mer length and encoding (e.g.,DNAKmer{4}
for 4-mers)sequences
: A FASTA or FASTQ reader containing the biological sequences to analyze
Returns
A dictionary mapping k-mers to their counts in the input sequences
count_kmers(
_::Type{KMER_TYPE},
fastx_files::AbstractArray{T<:AbstractString, 1}
) -> Any
Count k-mers across multiple FASTA/FASTQ files and merge the results.
Arguments
KMER_TYPE
: Type parameter specifying the k-mer length (e.g.,DNAKmer{4}
for 4-mers)fastx_files
: Vector of paths to FASTA/FASTQ files
Returns
Dict{KMER_TYPE, Int}
: Dictionary mapping k-mers to their total counts across all files
count_kmers(
_::Type{KMER_TYPE},
fastx_file::AbstractString
) -> Any
Count k-mers in a FASTA/FASTQ file and return their frequencies.
Arguments
KMER_TYPE
: Type parameter specifying the k-mer type (e.g.,DNAKmer{K}
)fastx_file
: Path to input FASTA/FASTQ file
Returns
Dict{KMER_TYPE, Int}
: Dictionary mapping each k-mer to its frequency
Mycelia.count_canonical_kmers
— Functioncount_canonical_kmers(_::Type{KMER_TYPE}, sequences) -> Any
Count canonical k-mers in biological sequences. A canonical k-mer is the lexicographically smaller of a DNA sequence and its reverse complement, ensuring strand-independent counting.
Arguments
KMER_TYPE
: Type parameter specifying the k-mer size and structuresequences
: Iterator of biological sequences to analyze
Returns
Dict{KMER_TYPE,Int}
: Dictionary mapping canonical k-mers to their counts
Mycelia.fasta_list_to_dense_kmer_counts
— Functionfasta_list_to_dense_kmer_counts(
;
fasta_list,
k,
alphabet,
temp_dir_parent,
count_element_type,
result_file,
force,
cleanup_temp
)
Create a dense k-mer counts table for a set of FASTA files, with disk-backed temporary storage, custom element type, robust error handling, and optional output file caching.
Mycelia.fasta_list_to_sparse_kmer_counts
— Functionfasta_list_to_sparse_kmer_counts(
;
fasta_list,
k,
alphabet,
temp_dir_parent,
count_element_type,
rarefaction_data_filename,
rarefaction_plot_basename,
show_rarefaction_plot,
result_file,
out_dir,
force,
rarefaction_plot_kwargs...
)
Create a sparse kmer counts table (SparseMatrixCSC) from a list of FASTA files using a 3-pass approach. Pass 1 (Parallel): Counts kmers per file and writes to temporary JLD2 files. Pass 2 (Serial): Aggregates unique kmers, max count, nnz per file, and rarefaction data from temp files. Generates and saves a k-mer rarefaction plot. Pass 3 (Parallel): Reads temporary counts again to construct the final sparse matrix.
Optionally, a results filename can be provided to save/load the output. If the file exists and force
is false, the result is loaded and returned. If force
is true or the file does not exist, results are computed and saved.
Output Directory Behavior
- All auxiliary output files (e.g., rarefaction data, plots) are written to a common output directory.
- By default, this is:
- The value of
out_dir
if provided. - Otherwise, the directory containing
result_file
(if provided and has a directory component). - Otherwise, the current working directory (
pwd()
).
- The value of
- If you provide an absolute path for an output file (e.g.
rarefaction_data_filename
), that path is used directly. - If both
out_dir
and a relative filename are given, the file is written toout_dir
.
Arguments
fasta_list::AbstractVector{<:AbstractString}
: A list of paths to FASTA files.k::Integer
: The length of the kmer.alphabet::Symbol
: The alphabet type (:AA, :DNA, :RNA).temp_dir_parent::AbstractString
: Parent directory for creating the temporary working directory. Defaults toBase.tempdir()
.count_element_type::Union{Type{<:Unsigned}, Nothing}
: Optional. Specifies the unsigned integer type for the counts. Ifnothing
(default), the smallestUInt
type capable of holding the maximum observed count is used.rarefaction_data_filename::AbstractString
: Filename for the TSV output of rarefaction data. If a relative path, will be written toout_dir
.rarefaction_plot_basename::AbstractString
: Basename for the output rarefaction plots. If a relative path, will be written toout_dir
.show_rarefaction_plot::Bool
: Whether to display the rarefaction plot after generation. Defaults totrue
.result_file::Union{Nothing, AbstractString}
: Optional. If provided, path to a file to save/load the full results (kmers, counts, etc) as a JLD2 file.out_dir::Union{Nothing, AbstractString}
: Optional. Output directory for auxiliary outputs. Defaults as described above.force::Bool
: If true, recompute and overwrite the output file even if it exists. Defaults tofalse
.rarefaction_plot_kwargs...
: Keyword arguments to pass toplot_kmer_rarefaction
for plot customization.
Returns
NamedTuple{(:kmers, :counts, :rarefaction_data_path)}
:kmers
: A sortedVector
of unique kmer objects.counts
: ASparseArrays.SparseMatrixCSC{V, Int}
storing kmer counts.rarefaction_data_path
: Path to the saved TSV file with rarefaction data.
Raises
ErrorException
: If inputfasta_list
is empty, alphabet is invalid, or required Kmer/counting functions are not found.
Example: K-mer Counting Comparison
# Compare different k-mer counting approaches
reads_file = "reads.fastq"
# Dense counting (stores all possible k-mers)
dense_counts = Mycelia.count_kmers(reads_file, k=15, method="dense")
println("Dense matrix size: $(size(dense_counts))")
# Sparse counting (only observed k-mers)
sparse_counts = Mycelia.count_kmers(reads_file, k=21, method="sparse")
println("Unique k-mers: $(length(sparse_counts))")
# Canonical k-mers (combines forward and reverse complement)
canonical_counts = Mycelia.count_canonical_kmers(reads_file, k=21)
println("Canonical k-mers: $(length(canonical_counts))")
Advanced K-mer Analysis
<!– buildkmergraph, findkmeroverlaps, extractkmerpaths, analyzekmerconnectivity not yet implemented as documented –>
Mycelia.build_kmer_graph_next
— Functionbuild_kmer_graph_next(
kmer_type,
observations::AbstractVector{<:Union{FASTX.FASTA.Record, FASTX.FASTQ.Record}};
graph_mode
) -> MetaGraphsNext.MetaGraph{Int64, Graphs.SimpleGraphs.SimpleDiGraph{Int64}, Label, VertexData, EdgeData, Nothing, WeightFunction, Float64} where {Label, VertexData, EdgeData, WeightFunction}
Create a next-generation, type-stable k-mer graph using MetaGraphsNext.
This implementation uses canonical k-mers as vertices with strand-aware edges that respect biological transition constraints for both single-strand and double-strand sequences.
Arguments
kmer_type
: Type of k-mer (e.g., DNAKmer{K})observations
: Vector of FASTA/FASTQ recordsgraph_mode
:SingleStrand
for directional sequences,DoubleStrand
for DNA (default)
Returns
MetaGraphsNext.MetaGraph
with canonical vertices and strand-aware edges
Details
- Vertices: Always canonical k-mers (lexicographically smaller of kmer/reverse_complement)
- Edges: Strand-aware transitions that respect biological constraints
- SingleStrand mode: Only forward-strand transitions allowed
- DoubleStrand mode: Both forward and reverse-complement transitions allowed
Mycelia.build_stranded_kmer_graph
— Functionbuild_stranded_kmer_graph(
kmer_type,
observations::AbstractVector{<:Union{FASTX.FASTA.Record, FASTX.FASTQ.Record}}
) -> MetaGraphs.MetaDiGraph{T, Float64} where T<:Integer
Create a weighted, strand-specific kmer (de bruijn) graph from a set of kmers and a series of sequence observations in FASTA format.
Mycelia.build_directed_kmer_graph
— Functionbuild_directed_kmer_graph(; fastq, k, plot)
Constructs a directed graph representation of k-mer transitions from FASTQ sequencing data.
Arguments
fastq
: Path to input FASTQ filek
: K-mer size (default: 1). Must be odd and prime. If k=1, optimal size is auto-determinedplot
: Boolean to display quality distribution plot (default: false)
Returns
MetaDiGraph with properties:
- assembly_k: k-mer size used
- kmer_counts: frequency of each k-mer
- transition_likelihoods: edge weights between k-mers
- kmermeanquality, kmertotalquality: quality metrics
- branchingnodes, unbranchingnodes: topological classification
- likelyvalidkmer_indices: k-mers above mean quality threshold
- likelysequencingartifact_indices: potential erroneous k-mers
Note
For DNA assembly, quality scores are normalized across both strands.
Example: K-mer Graph Construction
# Build k-mer overlap graph
kmer_graph = Mycelia.build_kmer_graph(
sequences,
k=31,
min_overlap=20,
min_coverage=5
)
# Analyze graph properties
graph_stats = Mycelia.analyze_kmer_connectivity(kmer_graph)
println("Graph nodes: $(graph_stats.n_nodes)")
println("Graph edges: $(graph_stats.n_edges)")
println("Connected components: $(graph_stats.n_components)")
K-mer Frequency Analysis
Frequency Spectra
<!– kmerfrequencyspectrum, analyzespectrumpeaks, fitspectrummodel not yet implemented as documented –>
Mycelia.jellyfish_counts_to_kmer_frequency_histogram
— Functionjellyfish_counts_to_kmer_frequency_histogram(
jellyfish_counts_file
) -> Any
jellyfish_counts_to_kmer_frequency_histogram(
jellyfish_counts_file,
outfile
) -> Any
Convert a Jellyfish k-mer count file into a frequency histogram.
Arguments
jellyfish_counts_file::String
: Path to the gzipped TSV file containing Jellyfish k-mer countsoutfile::String=replace(jellyfish_counts_file, r"\.tsv\.gz$" => ".count_histogram.tsv")
: Optional output file path
Returns
String
: Path to the generated histogram file
Description
Processes a Jellyfish k-mer count file to create a frequency histogram where:
- Column 1: Number of k-mers that share the same count
- Column 2: The count they share
Uses system sorting with LC_ALL=C for optimal performance on large files.
Notes
- Requires gzip, sort, uniq, and sed command line tools
- Uses intermediate disk storage for sorting large files
- Skips processing if output file already exists
Mycelia.plot_kmer_frequency_spectra
— Functionplot_kmer_frequency_spectra(
counts;
log_scale,
kwargs...
) -> Plots.Plot
Plots a histogram of kmer counts against # of kmers with those counts
Returns the plot object for adding additional layers and saving
Creates a scatter plot visualizing the k-mer frequency spectrum - the relationship between k-mer frequencies and how many k-mers occur at each frequency.
Arguments
counts::AbstractVector{<:Integer}
: Vector of k-mer counts/frequencieslog_scale::Union{Function,Nothing} = log2
: Function to apply logarithmic scaling to both axes. Set tonothing
to use linear scaling.kwargs...
: Additional keyword arguments passed toStatsPlots.plot()
Returns
Plots.Plot
: A scatter plot object that can be further modified or saved
Details
The x-axis shows k-mer frequencies (how many times each k-mer appears), while the y-axis shows how many distinct k-mers appear at each frequency. Both axes are log-scaled by default using log2.
Example: K-mer Spectrum Analysis
# Generate and analyze k-mer frequency spectrum
kmer_counts = Mycelia.count_kmers("reads.fastq", k=21)
spectrum = Mycelia.kmer_frequency_spectrum(kmer_counts)
# Identify characteristic peaks
peaks = Mycelia.analyze_spectrum_peaks(spectrum)
println("Coverage peak at frequency: $(peaks.main_peak)")
println("Error peak at frequency: $(peaks.error_peak)")
# Fit mathematical model to spectrum
model = Mycelia.fit_spectrum_model(spectrum, model_type="negative_binomial")
println("Model fit R²: $(model.r_squared)")
Genome Characteristics from K-mers
<!– estimategenomesizefromkmers, estimatecoveragefromkmers, detectrepetitivekmers, calculategenome_complexity not yet implemented as documented –>
Example: Genome Size Estimation
# Estimate genome size using k-mer spectrum
kmer_counts = Mycelia.count_kmers("reads.fastq", k=21)
spectrum = Mycelia.kmer_frequency_spectrum(kmer_counts)
# Find coverage peak
coverage_peak = Mycelia.find_coverage_peak(spectrum)
println("Estimated coverage: $(coverage_peak)x")
# Calculate genome size
total_kmers = sum(spectrum.frequencies .* spectrum.counts)
genome_size = total_kmers ÷ coverage_peak
println("Estimated genome size: $(genome_size) bp")
# Alternative method with error correction
corrected_estimate = Mycelia.estimate_genome_size_from_kmers(
kmer_counts,
error_correction=true,
ploidy=1
)
println("Corrected genome size: $(corrected_estimate.size) bp")
println("Confidence interval: $(corrected_estimate.confidence_interval)")
Error Detection and Correction
K-mer Based Error Detection
<!– identifyerrorkmers, correctsequencingerrors, validateerrorcorrection not yet implemented as documented –>
Example: Error Detection
# Identify likely error k-mers
error_kmers = Mycelia.identify_error_kmers(
kmer_counts,
min_coverage=3,
max_coverage=100
)
println("Potential error k-mers: $(length(error_kmers))")
println("Error rate estimate: $(length(error_kmers) / length(kmer_counts) * 100)%")
# Correct errors in reads
corrected_reads = Mycelia.correct_sequencing_errors(
"reads.fastq",
error_kmers,
correction_method="consensus"
)
# Validate correction effectiveness
validation = Mycelia.validate_error_correction(
"reads.fastq",
corrected_reads,
reference_genome="reference.fasta"
)
println("Error reduction: $(validation.error_reduction_percent)%")
Sequence Composition Analysis
Nucleotide Composition
<!– calculatenucleotidefrequencies, analyzedinucleotidefrequencies, calculatecodonusage, detectsequencebias not yet implemented as documented –>
Example: Comprehensive Composition Analysis
# Analyze nucleotide composition
composition = Mycelia.calculate_nucleotide_frequencies("sequences.fasta")
println("GC content: $(composition.gc_content)%")
println("AT content: $(composition.at_content)%")
# Dinucleotide analysis
dinuc_freq = Mycelia.analyze_dinucleotide_frequencies("sequences.fasta")
println("CpG frequency: $(dinuc_freq.CG)")
println("Expected CpG: $(dinuc_freq.expected_CG)")
println("CpG O/E ratio: $(dinuc_freq.CG_oe_ratio)")
# Detect composition bias
bias_analysis = Mycelia.detect_sequence_bias(composition)
if bias_analysis.bias_detected
println("Sequence bias detected:")
println(" Type: $(bias_analysis.bias_type)")
println(" Strength: $(bias_analysis.bias_strength)")
end
Codon Usage Analysis
<!– calculatecodonusage, analyzecodonbias, comparecodonusage, optimizecodonusage not yet implemented as documented –>
Example: Codon Usage Analysis
# Analyze codon usage in coding sequences
cds_file = "coding_sequences.fasta"
codon_usage = Mycelia.calculate_codon_usage(cds_file, genetic_code="standard")
# Calculate codon bias metrics
bias_metrics = Mycelia.analyze_codon_bias(codon_usage)
println("Effective Number of Codons (ENC): $(bias_metrics.enc)")
println("Codon Adaptation Index (CAI): $(bias_metrics.cai)")
println("Frequency of Optimal Codons (FOP): $(bias_metrics.fop)")
# Compare with reference organism
reference_usage = Mycelia.load_reference_codon_usage("escherichia_coli")
comparison = Mycelia.compare_codon_usage(codon_usage, reference_usage)
println("Similarity to E. coli: $(comparison.similarity_score)")
Memory-Efficient Processing
Large Dataset Handling
<!– streamkmercounting, processkmersinchunks, parallelkmeranalysis, memoryefficient_counting not yet implemented as documented –>
Example: Large File Processing
# Process large files with memory constraints
large_file = "large_dataset.fastq"
# Stream processing for memory efficiency
kmer_counts = Mycelia.stream_kmer_counting(
large_file,
k=21,
chunk_size=100000,
memory_limit_gb=8
)
# Parallel processing for speed
parallel_counts = Mycelia.parallel_kmer_analysis(
large_file,
k=21,
n_workers=8,
merge_strategy="sum"
)
Optimized Data Structures
<!– createsparsekmermatrix, buildcompressedkmerindex, usebloomfiltercounting, implementcountminsketch not yet implemented as documented –>
Example: Memory Optimization
# Use probabilistic data structures for very large datasets
bloom_filter = Mycelia.create_kmer_bloom_filter(
estimated_kmers=1_000_000_000,
false_positive_rate=0.01
)
# Streaming k-mer presence testing
for read in Mycelia.stream_fastq("huge_dataset.fastq")
for kmer in Mycelia.extract_kmers(read, k=21)
if Mycelia.probably_present(bloom_filter, kmer)
# Process likely present k-mer
Mycelia.process_kmer(kmer)
end
end
end
Comparative K-mer Analysis
Multi-Sample Analysis
<!– comparekmerprofiles, buildkmerdistancematrix, clusterbykmersimilarity, identifysamplespecific_kmers not yet implemented as documented –>
Example: Multi-Sample K-mer Comparison
# Compare k-mer profiles across multiple samples
sample_files = ["sample1.fastq", "sample2.fastq", "sample3.fastq"]
# Build k-mer profiles for each sample
profiles = [Mycelia.count_kmers(file, k=21) for file in sample_files]
# Calculate pairwise distances
distance_matrix = Mycelia.build_kmer_distance_matrix(profiles, metric="jaccard")
println("Pairwise k-mer distances:")
display(distance_matrix)
# Cluster samples by k-mer similarity
clusters = Mycelia.cluster_by_kmer_similarity(profiles, method="hierarchical")
println("Sample clusters: $(clusters)")
Contamination Detection
<!– detectcontaminationkmers, identifyforeignkmers, classifykmersources, removecontaminatingkmers not yet implemented as documented –>
Example: Contamination Detection
# Detect contamination using k-mer profiles
sample_kmers = Mycelia.count_kmers("sample.fastq", k=21)
reference_kmers = Mycelia.count_kmers("reference_genome.fasta", k=21)
# Identify foreign k-mers
foreign_kmers = Mycelia.identify_foreign_kmers(
sample_kmers,
reference_kmers,
min_abundance=5
)
contamination_rate = length(foreign_kmers) / length(sample_kmers)
println("Contamination rate: $(contamination_rate * 100)%")
# Classify contamination sources
contamination_sources = Mycelia.classify_kmer_sources(
foreign_kmers,
database_kmers=["human", "bacterial", "viral"]
)
for (source, proportion) in contamination_sources
println("$(source): $(proportion * 100)%")
end
Visualization and Reporting
K-mer Plots
<!– plotkmerspectrum, plotkmercomposition, plotgenomesizeestimation, createkmer_dashboard not yet implemented as individual functions –>
Example: K-mer Visualization
# Create comprehensive k-mer analysis plots
kmer_counts = Mycelia.count_kmers("reads.fastq", k=21)
# K-mer frequency spectrum
spectrum_plot = Mycelia.plot_kmer_spectrum(kmer_counts,
title="21-mer Frequency Spectrum",
log_scale=true)
# Genome size estimation plot
size_plot = Mycelia.plot_genome_size_estimation(kmer_counts,
show_confidence_interval=true)
# Combined dashboard
kmer_dashboard = Mycelia.create_kmer_dashboard(kmer_counts)
Mycelia.save_plot(kmer_dashboard, "kmer_analysis.png")
Performance Considerations
Algorithm Selection
- Dense matrices: Use for small k (k ≤ 12) or small genomes
- Sparse matrices: Use for large k (k ≥ 15) or large genomes
- Streaming: Use for memory-constrained environments
- Parallel processing: Use for time-critical applications
Memory Usage
- k=15: ~1 GB for dense counting
- k=21: ~17 GB for dense counting (use sparse)
- k=31: Sparse counting only
- Large genomes: Consider streaming or chunked processing
Computational Complexity
- Time: O(n) for sequence length n
- Space: O(4^k) for dense, O(unique k-mers) for sparse
- Parallel: Near-linear speedup for independent samples
Common Issues and Solutions
Memory Limitations
# Handle memory constraints
if Mycelia.estimate_memory_usage(file, k=21) > Mycelia.available_memory()
# Use streaming approach
kmer_counts = Mycelia.stream_kmer_counting(file, k=21, chunk_size=50000)
else
# Use standard approach
kmer_counts = Mycelia.count_kmers(file, k=21)
end
Parameter Selection
# Choose optimal k-mer size
optimal_k = Mycelia.select_optimal_k(
genome_size_estimate=5_000_000,
error_rate=0.01,
coverage=30
)
println("Recommended k-mer size: $(optimal_k)")
Related Functions
Data Structures
KmerCounts
- K-mer count data structureKmerSpectrum
- Frequency spectrum representationKmerGraph
- K-mer overlap graph
File I/O
read_fastq
- Read sequence filessave_kmer_counts
- Save k-mer countsload_kmer_counts
- Load saved counts
Related Workflows
Previous Steps
- Quality Control - Preprocess reads before k-mer analysis
Next Steps
- Genome Assembly (planned) - Use k-mer analysis for assembly
- Comparative Genomics (planned) - Compare k-mer profiles
See Also
- Tutorial 3: K-mer Analysis
- Assembly Workflows (planned)
- Performance Optimization