Assembly with canu(2)¶

Generate corrected reads¶

The command to run the canu correction is:

canu -correct

with the following parameters:

What?	parameter	Our value
The input read file	-nanopore-raw	~/workdir/data_artic/basecall_small_porechopped.fastq.gz
The output directory	-d	~/workdir/assembly/small_correct
Prefix for output files	-p	assembly
Use a grid engine	useGrid	false
Genome Size	genomeSize	30k
Minimum Read Length	minReadLength	300
Minimum Overlap Length	minOverlapLength	20 or try out different value
Optional: Coverage of corrected reads	corOutCoverage	something smaller than our coverage (~600)
Optional: Min coverage for corrected reads	corMinCoverage	0 to get all
Optional: Correction Sensitivity	corMhapSensitivity	normal

The corOutCoverage parameter defines to which coverage the reads are corrected, longest reads are corrected first. It is advisable to set this parameter high, to get more sequences into the assembly. corMinCoverage set to low value, will report low covered reads as well and corMhapSensitivity=normal is advised for higher coverage.

The complete command is:

canu -correct -d ~/workdir/assembly/small_correct -p assembly useGrid=false -nanopore-raw ~/workdir/data_artic/basecall_small_porechopped.fastq.gz genomeSize=30k minReadLength=300 minOverlapLength=20

Get error statistics¶

Let’s get some error statistics for the corrected reads. Map the corrected reads to the Wuhan reference:

minimap2 -a ~/workdir/wuhan.fasta ~/workdir/assembly/small_correct/assembly.correctedReads.fasta.gz | samtools view -b - | samtools sort - > ~/workdir/assembly/small_correct/corrected_reads_vs_wuhan.sorted.bam

Then run qualimap:

qualimap bamqc -bam ~/workdir/assembly/small_correct/corrected_reads_vs_wuhan.sorted.bam -nw 5000 -nt 14 -c -outdir ~/workdir/assembly/small_correct/qualimap/

Then open the results in a web browser:

firefox ~/workdir/assembly/small_correct/qualimap/qualimapReport.html

Inspect the results, how much did our error rate decrease?

Generate and assemble trimmed reads¶

The trimming stage identifies unsupported regions in the input and trims or splits reads to their longest supported range. The assembly stage makes a final pass to identify sequencing errors; constructs the best overlap graph (BOG); and outputs contigs, an assembly graph, and summary statistics.

Now run the trimming and assembly step using the following command:

canu -trim-assemble

You need to define the following parameters:

-nanopore-corrected <corrected reads file>

The output directory should be named:

~/workdir/assembly/small_assembly/

In addition, we need some further parameters:

useGrid=false (we don't have a cluster)
minReadLength=<minimum read length>
minOverlapLength=<minimum overlap length>
genomeSize=<size of the target genome, i.e. 50k>

Use the same parameters as before (although you could use different settings here).

Go to the next page, if you need help.

References¶

Canu https://github.com/marbl/canu

Minimap2 https://github.com/lh3/minimap2

QualiMap http://qualimap.bioinfo.cipf.es/doc_html/index.html

samtools http://www.htslib.org