Length filtering¶
First of all, if not active, activate the artic-ncov2019 conda environment:
conda activate artic-ncov2019
The we will use the command:
artic guppyplex
to perform a length filtering on the basecalled data and combine all reads into one single file:
usage: artic guppyplex [-h] [-q] --directory directory
[--max-length max_length] [--min-length min_length]
[--quality quality] [--sample sample]
[--skip-quality-check] [--prefix PREFIX]
[--output output]
optional arguments:
-h, --help show this help message and exit
-q, --quiet Do not output warnings to stderr
--directory directory
Basecalled and demultiplexed (guppy) results directory
--max-length max_length
remove reads greater than read length
--min-length min_length
remove reads less than read length
--quality quality remove reads against this quality filter
--sample sample sampling frequency for random sample of sequence to
reduce excess
--skip-quality-check Do not filter on quality score (speeds up)
--prefix PREFIX Prefix for guppyplex files
--output output FASTQ file to write
Task: Use artic guppyplex
to filter for reads with a minimum size of 400 and a maximum size of 700. Your output files should be named:
~/workdir/data_artic/basecall_filtered_01.fastq
Do the filtering for the first (01) of the 5 datasets only and stick to that dataset for the next parts. When you are quick, you can repeat the procedure for the remaining datasets later.
References¶
ARTIC bioinformatics SOP https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html