This tutorial describes how to produce zOTUs and an abundance table from the raw sequenced reads generated for ITS2 region.
The raw data are used in in FAST pipeline.
wget https://github.com/ZeweiSong/FAST_example/archive/v1.2.tar.gz tar xf v1.2.tar.gz
To automate this tutorial we suggest to set an environment that includes:
Install:
vsearch
- Github link [10.7717/peerj.2584]trimmomatic
- Installation instructions [10.1093/bioinformatics/btu170]ITSx
- Installation instructions [10.1111/2041-210X.12073]HMMER
- Installation instructions [10.1093/nar/gkt263]ITS2 TAG.ME model
- Download modelsfind FAST_example-1.2/ITS2/ -name "*.fastq" -exec mv {} . \; ls *_R1.fastq | cut -d "_" -f1 > list VSEARCH=/path/to/your/vsearch TRIMMOMATIC=/path/to/your/trimmomatic.jar ITSX=/path/to/your/ITSx
PS: Your hmmer directory must me exported in your PATH variable.
Trim reads to remove low quality regions.
Edit parameters in the script:
-threads
- Number of CPUs to use.TRAILING:3
- Trim the 3' region until reaches the Phred score of 3.LEADING:3
- Trim the 5' region until reaches the Phred score of 3.SLIDINGWINDOW:4:20
- Slide a window of size 4. Inside the window the mean Phred quality must to be 20 or more. If not, trim the read at that point.MINLEN:50
- Drop trimmed reads smaller than 50 bases.ILLUMINACLIP
- Find and clip Illumina adapters. The FASTA files containing some adapters sequences are found in a directory in your Trimmomatic folder.for file in `cat list`; do java -jar $TRIMMOMATIC PE -threads 10 -phred33 $file\_R1.fastq $file\_R2.fastq $file\_R1.paired.fastq.gz $file\_R1.unpaired.fastq.gz $file\_R2.paired.fastq.gz $file\_R2.unpaired.fastq.gz ILLUMINACLIP:/Users/gabriel/Downloads/Trimmomatic-0.38/adapters/adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:50; done
Merge overlapping reads to create a contiguous sequence that represents the whole amplicon.
--fastq_mergepairs
: Call the merging function using FASTQ files, and the first argument is the forward read.
--reverse
: Argument to declare the reverse FASTQ file.
--fastaout
: Output file name in FASTA format.
--relabel
: Format the FASTA headers with the declared label. This is important to generate the zOTU table with multiple samples.
After the merging, will concatenate all FASTA files into one merged.fasta
.
for file in `cat list`; do $VSEARCH --fastq_mergepairs $file\_R1.paired.fastq.gz --reverse $file\_R2.paired.fastq.gz --fastaout $file\_merged.fasta --relabel $file\:; done cat *_merged.fasta > merged.fasta
Select unique sequences.
--derep_fulllength
: Call the dereplication function that will select one unique sequence to represent all the identical sequences (100% identity and 100% coverage).
--sizeout
: Add the number of identical sequences to the FASTA header.
--output
: Output file name in FASTA format.
$VSEARCH --derep_fulllength merged.fasta --sizeout --output derep.fasta
Perform denoising according to the UNOISE version 3 algorithm by Robert Edgar.
--cluster_unoise
: Call the denoising function on the input FASTA file.
--consout
: Output consensus FASTA file.
$VSEARCH --cluster_unoise derep.fasta --consout denoise.fasta
Detect chimeras present in the FASTA file.
--uchime3_denovo
: Call the function on the input FASTA file.
--nonchimeras
: Output FASTA file containing the non-chimeric sequences.
--label
: Relabel the FASTA headers with the prefix zOTU
.
$VSEARCH --uchime3_denovo denoise.fasta --nonchimeras uchim.fasta --label zOTU
Map all sequences to the zOTU database to count them, and save the counts in a Table Separated Values zOTU file.
--usearch_global
: Call the function to map the merged.fasta
file in the database.
--db
: Define the uchim.fasta
file as the reference database.
--id
: Set 97% identity as threshold for mapping.
--otutabout
: Output TSV file.
$VSEARCH --usearch_global merged.fasta -db uchim.fasta --id 0.97 --otutabout zotu_table.txt
Use ITSx tool to extract the ITS regions from the zOTU sequences.
-i
: Input FASTA file containing the zOTU sequences.
-o
: Prefix to name the output files.
-p
: Directory containing the HMM profiles to search. This directory can be found inside your ITSx folder.
-t
: Restrict the search to Fungi models.
$ITSX -i uchim.fasta -o zOTU -p /path/to/ITSfolder/ITSx_db/HMMs/ -t F
Perfom the taxanomic classification using TAG.ME in R.
library(tagme) taxonomy = tagmeFromFasta(file = "zOTU.ITS2.fasta", db = "/path/to/your/TAG.ME/ITS2/") Loading required package: randomForest randomForest 4.6-14 Type rfNews() to see new features/changes/bug fixes. Loading required package: seqinr Starting Species... 51 assigned Starting Genus... 101 assigned Starting Family... 213 assigned Starting Order... 290 assigned Starting Class... 344 assigned Starting Phylum... 374 assigned Starting Domain... 571 assigned Printing Unassigned...