Basecalling with Guppy¶
Guppy is a data processing toolkit that contains the Oxford Nanopore Technologies’ basecalling algorithms, and several bioinformatic post-processing features. It is provided as binaries to run on Windows, OS X and Linux platforms, as well as being integrated with MinKNOW, the Oxford Nanopore device control software.
Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy. Furthermore, Guppy now performs modified basecalling (5mC, 6mA and CpG) from the raw signal data, producing an additional FAST5 file of modified base probabilities.
The command we are using for basecalling with Guppy is:
guppy_basecaller_cpu
Let’s have a look at the usage message for guppy_basecaller_cpu:
guppy_basecaller_cpu --help
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 3.1.5+781ed57
Usage:
With config file:
guppy_basecaller_cpu -i <input path> -s <save path> -c <config file> [options]
With flowcell and kit name:
guppy_basecaller_cpu -i <input path> -s <save path> --flowcell <flowcell name>
--kit <kit name>
List supported flowcells and kits:
guppy_basecaller_cpu --print_workflows
Beside the path of our fast5 files (-i), the basecaller requires an output path (-s) and a config file or the flowcell/kit combination. In order to get a list of possible flowcell/kit combinations and config files, we use:
guppy_basecaller_cpu --print_workflows
flowcell kit barcoding config_name
FLO-MIN110 SQK-CAS109 dna_r10_450bps_hac
FLO-MIN110 SQK-CS9109 dna_r10_450bps_hac
FLO-MIN110 SQK-DCS108 dna_r10_450bps_hac
FLO-MIN110 SQK-DCS109 dna_r10_450bps_hac
FLO-MIN110 SQK-LRK001 dna_r10_450bps_hac
FLO-MIN110 SQK-LSK108 dna_r10_450bps_hac
FLO-MIN110 SQK-LSK109 dna_r10_450bps_hac
FLO-MIN110 SQK-LSK109-XL dna_r10_450bps_hac
FLO-MIN110 SQK-LSK110 dna_r10_450bps_hac
FLO-MIN110 SQK-LWP001 dna_r10_450bps_hac
FLO-MIN110 SQK-PCS108 dna_r10_450bps_hac
FLO-MIN110 SQK-PCS109 dna_r10_450bps_hac
FLO-MIN110 SQK-PRC109 dna_r10_450bps_hac
FLO-MIN110 SQK-PSK004 dna_r10_450bps_hac
FLO-MIN110 SQK-RAD002 dna_r10_450bps_hac
FLO-MIN110 SQK-RAD003 dna_r10_450bps_hac
FLO-MIN110 SQK-RAD004 dna_r10_450bps_hac
FLO-MIN110 SQK-RAS201 dna_r10_450bps_hac
FLO-MIN110 SQK-RLI001 dna_r10_450bps_hac
FLO-MIN110 VSK-VBK001 dna_r10_450bps_hac
FLO-MIN110 VSK-VSK001 dna_r10_450bps_hac
FLO-MIN110 VSK-VSK002 dna_r10_450bps_hac
FLO-MIN110 SQK-16S024 included dna_r10_450bps_hac
FLO-MIN110 SQK-PCB109 included dna_r10_450bps_hac
FLO-MIN110 SQK-RBK001 included dna_r10_450bps_hac
FLO-MIN110 SQK-RBK004 included dna_r10_450bps_hac
FLO-MIN110 SQK-RLB001 included dna_r10_450bps_hac
FLO-MIN110 SQK-LWB001 included dna_r10_450bps_hac
FLO-MIN110 SQK-PBK004 included dna_r10_450bps_hac
FLO-MIN110 SQK-RAB201 included dna_r10_450bps_hac
FLO-MIN110 SQK-RAB204 included dna_r10_450bps_hac
FLO-MIN110 SQK-RPB004 included dna_r10_450bps_hac
FLO-MIN110 VSK-VMK001 included dna_r10_450bps_hac
FLO-MIN110 VSK-VMK002 included dna_r10_450bps_hac
FLO-PRO001 SQK-LSK109 dna_r9.4.1_450bps_hac_prom
FLO-PRO001 SQK-LSK109-XL dna_r9.4.1_450bps_hac_prom
FLO-PRO001 SQK-DCS109 dna_r9.4.1_450bps_hac_prom
FLO-PRO001 SQK-PCS109 dna_r9.4.1_450bps_hac_prom
FLO-PRO001 SQK-PRC109 dna_r9.4.1_450bps_hac_prom
FLO-PRO001 SQK-PCB109 included dna_r9.4.1_450bps_hac_prom
FLO-PRO002 SQK-LSK109 dna_r9.4.1_450bps_hac_prom
FLO-PRO002 SQK-LSK109-XL dna_r9.4.1_450bps_hac_prom
FLO-PRO002 SQK-DCS109 dna_r9.4.1_450bps_hac_prom
FLO-PRO002 SQK-PCS109 dna_r9.4.1_450bps_hac_prom
FLO-PRO002 SQK-PRC109 dna_r9.4.1_450bps_hac_prom
FLO-PRO002 SQK-PCB109 included dna_r9.4.1_450bps_hac_prom
FLO-MIN111 SQK-CAS109 dna_r10.3_450bps_hac
FLO-MIN111 SQK-CS9109 dna_r10.3_450bps_hac
FLO-MIN111 SQK-DCS108 dna_r10.3_450bps_hac
FLO-MIN111 SQK-DCS109 dna_r10.3_450bps_hac
FLO-MIN111 SQK-LRK001 dna_r10.3_450bps_hac
FLO-MIN111 SQK-LSK108 dna_r10.3_450bps_hac
FLO-MIN111 SQK-LSK109 dna_r10.3_450bps_hac
FLO-MIN111 SQK-LSK109-XL dna_r10.3_450bps_hac
FLO-MIN111 SQK-LSK110 dna_r10.3_450bps_hac
FLO-MIN111 SQK-LWP001 dna_r10.3_450bps_hac
FLO-MIN111 SQK-PCS108 dna_r10.3_450bps_hac
FLO-MIN111 SQK-PCS109 dna_r10.3_450bps_hac
FLO-MIN111 SQK-PRC109 dna_r10.3_450bps_hac
FLO-MIN111 SQK-PSK004 dna_r10.3_450bps_hac
FLO-MIN111 SQK-RAD002 dna_r10.3_450bps_hac
FLO-MIN111 SQK-RAD003 dna_r10.3_450bps_hac
FLO-MIN111 SQK-RAD004 dna_r10.3_450bps_hac
FLO-MIN111 SQK-RAS201 dna_r10.3_450bps_hac
FLO-MIN111 SQK-RLI001 dna_r10.3_450bps_hac
FLO-MIN111 VSK-VBK001 dna_r10.3_450bps_hac
FLO-MIN111 VSK-VSK001 dna_r10.3_450bps_hac
FLO-MIN111 VSK-VSK002 dna_r10.3_450bps_hac
FLO-MIN111 SQK-16S024 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-PCB109 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-RBK001 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-RBK004 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-RLB001 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-LWB001 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-PBK004 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-RAB201 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-RAB204 included dna_r10.3_450bps_hac
FLO-MIN111 SQK-RPB004 included dna_r10.3_450bps_hac
FLO-MIN111 VSK-VMK001 included dna_r10.3_450bps_hac
FLO-MIN111 VSK-VMK002 included dna_r10.3_450bps_hac
FLO-FLG001 SQK-CAS109 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-CS9109 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-DCS108 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-DCS109 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LRK001 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LSK108 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LSK109 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LSK109-XL dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LSK110 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LWP001 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-PCS108 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-PCS109 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-PRC109 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-PSK004 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RAD002 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RAD003 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RAD004 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RAS201 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RLI001 dna_r9.4.1_450bps_hac
FLO-FLG001 VSK-VBK001 dna_r9.4.1_450bps_hac
FLO-FLG001 VSK-VSK001 dna_r9.4.1_450bps_hac
FLO-FLG001 VSK-VSK002 dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-16S024 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-PCB109 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RBK001 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RBK004 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RLB001 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-LWB001 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-PBK004 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RAB201 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RAB204 included dna_r9.4.1_450bps_hac
FLO-FLG001 SQK-RPB004 included dna_r9.4.1_450bps_hac
FLO-FLG001 VSK-VMK001 included dna_r9.4.1_450bps_hac
FLO-FLG001 VSK-VMK002 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-CAS109 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-CS9109 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-DCS108 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-DCS109 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LRK001 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LSK108 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LSK109 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LSK109-XL dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LSK110 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LWP001 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-PCS108 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-PCS109 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-PRC109 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-PSK004 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RAD002 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RAD003 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RAD004 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RAS201 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RLI001 dna_r9.4.1_450bps_hac
FLO-MIN106 VSK-VBK001 dna_r9.4.1_450bps_hac
FLO-MIN106 VSK-VSK001 dna_r9.4.1_450bps_hac
FLO-MIN106 VSK-VSK002 dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-16S024 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-PCB109 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RBK001 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RBK004 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RLB001 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LWB001 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-PBK004 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RAB201 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RAB204 included dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-RPB004 included dna_r9.4.1_450bps_hac
FLO-MIN106 VSK-VMK001 included dna_r9.4.1_450bps_hac
FLO-MIN106 VSK-VMK002 included dna_r9.4.1_450bps_hac
FLO-PRO001 SQK-RNA002 rna_r9.4.1_70bps_hac_prom
FLO-PRO002 SQK-RNA002 rna_r9.4.1_70bps_hac_prom
FLO-FLG001 SQK-RNA001 rna_r9.4.1_70bps_hac
FLO-FLG001 SQK-RNA002 rna_r9.4.1_70bps_hac
FLO-MIN106 SQK-RNA001 rna_r9.4.1_70bps_hac
FLO-MIN106 SQK-RNA002 rna_r9.4.1_70bps_hac
FLO-MIN107 SQK-RNA001 rna_r9.4.1_70bps_hac
FLO-MIN107 SQK-RNA002 rna_r9.4.1_70bps_hac
FLO-MIN107 SQK-DCS108 dna_r9.5_450bps
FLO-MIN107 SQK-DCS109 dna_r9.5_450bps
FLO-MIN107 SQK-LRK001 dna_r9.5_450bps
FLO-MIN107 SQK-LSK108 dna_r9.5_450bps
FLO-MIN107 SQK-LSK109 dna_r9.5_450bps
FLO-MIN107 SQK-LSK308 dna_r9.5_450bps
FLO-MIN107 SQK-LSK309 dna_r9.5_450bps
FLO-MIN107 SQK-LSK319 dna_r9.5_450bps
FLO-MIN107 SQK-LWP001 dna_r9.5_450bps
FLO-MIN107 SQK-PCS108 dna_r9.5_450bps
FLO-MIN107 SQK-PCS109 dna_r9.5_450bps
FLO-MIN107 SQK-PSK004 dna_r9.5_450bps
FLO-MIN107 SQK-RAD002 dna_r9.5_450bps
FLO-MIN107 SQK-RAD003 dna_r9.5_450bps
FLO-MIN107 SQK-RAD004 dna_r9.5_450bps
FLO-MIN107 SQK-RAS201 dna_r9.5_450bps
FLO-MIN107 SQK-RLI001 dna_r9.5_450bps
FLO-MIN107 VSK-VBK001 dna_r9.5_450bps
FLO-MIN107 VSK-VSK001 dna_r9.5_450bps
FLO-MIN107 VSK-VSK002 dna_r9.5_450bps
FLO-MIN107 SQK-LWB001 included dna_r9.5_450bps
FLO-MIN107 SQK-PBK004 included dna_r9.5_450bps
FLO-MIN107 SQK-RAB201 included dna_r9.5_450bps
FLO-MIN107 SQK-RAB204 included dna_r9.5_450bps
FLO-MIN107 SQK-RBK001 included dna_r9.5_450bps
FLO-MIN107 SQK-RBK004 included dna_r9.5_450bps
FLO-MIN107 SQK-RLB001 included dna_r9.5_450bps
FLO-MIN107 SQK-RPB004 included dna_r9.5_450bps
FLO-MIN107 VSK-VMK001 included dna_r9.5_450bps
FLO-MIN107 VSK-VMK002 included dna_r9.5_450bps
FLO-PRO111 SQK-LSK109 dna_r10.3_450bps_hac_prom
FLO-PRO111 SQK-LSK109-XL dna_r10.3_450bps_hac_prom
FLO-PRO111 SQK-LSK110 dna_r10.3_450bps_hac_prom
FLO-PRO111 SQK-DCS109 dna_r10.3_450bps_hac_prom
FLO-PRO111 SQK-PCS109 dna_r10.3_450bps_hac_prom
FLO-PRO111 SQK-PRC109 dna_r10.3_450bps_hac_prom
FLO-PRO111 SQK-PCB109 included dna_r10.3_450bps_hac_prom
Our dataset was generated using the FLO-MIN106 flowcell, and the LSK109 kit, pick the appropriate model.
Note: You need to add “.cfg” to the model name in the command line or the basecaller exits with an error because the .cfg could not be found.
Try to get the basecalling running, use the following optional arguments:
--compress_fastq Compress fastq output files with gzip.
--num_callers arg Number of parallel basecallers to create.
--cpu_threads_per_caller arg Number of CPU worker threads per basecaller.
Important: You have 14 CPUs in your virtual machine. The number of callers multiplied with the number of CPU threads per caller should not exceed 14.
The output directory should be named:
~/workdir/data_artic/basecall_tiny/
If you are stuck, you can skip to the next page and get help with the command.