ITS analysis and Classification



This tutorial describes how to produce zOTUs and an abundance table from the raw sequenced reads generated for ITS2 region.

Download reads

The raw data are used in in FAST pipeline.

wget https://github.com/ZeweiSong/FAST_example/archive/v1.2.tar.gz
tar xf v1.2.tar.gz

Set environment

To automate this tutorial we suggest to set an environment that includes:

  • defining the list of files to be processed.
  • create aliases to the necessary tools.

Install:

find FAST_example-1.2/ITS2/ -name "*.fastq" -exec mv {} . \;
ls *_R1.fastq | cut -d "_" -f1 > list
VSEARCH=/path/to/your/vsearch
TRIMMOMATIC=/path/to/your/trimmomatic.jar
ITSX=/path/to/your/ITSx

PS: Your hmmer directory must me exported in your PATH variable.

Run quality trimming

Trim reads to remove low quality regions.
Edit parameters in the script:

  • -threads - Number of CPUs to use.
  • TRAILING:3 - Trim the 3' region until reaches the Phred score of 3.
  • LEADING:3 - Trim the 5' region until reaches the Phred score of 3.
  • SLIDINGWINDOW:4:20 - Slide a window of size 4. Inside the window the mean Phred quality must to be 20 or more. If not, trim the read at that point.
  • MINLEN:50 - Drop trimmed reads smaller than 50 bases.
  • ILLUMINACLIP - Find and clip Illumina adapters. The FASTA files containing some adapters sequences are found in a directory in your Trimmomatic folder.
for file in `cat list`; do java -jar $TRIMMOMATIC PE -threads 10 -phred33 $file\_R1.fastq $file\_R2.fastq $file\_R1.paired.fastq.gz $file\_R1.unpaired.fastq.gz $file\_R2.paired.fastq.gz $file\_R2.unpaired.fastq.gz ILLUMINACLIP:/Users/gabriel/Downloads/Trimmomatic-0.38/adapters/adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:50; done

Vsearch pipeline

Merge reads

Merge overlapping reads to create a contiguous sequence that represents the whole amplicon.
--fastq_mergepairs: Call the merging function using FASTQ files, and the first argument is the forward read.
--reverse: Argument to declare the reverse FASTQ file.
--fastaout: Output file name in FASTA format.
--relabel: Format the FASTA headers with the declared label. This is important to generate the zOTU table with multiple samples.

After the merging, will concatenate all FASTA files into one merged.fasta.

for file in `cat list`; do $VSEARCH --fastq_mergepairs $file\_R1.paired.fastq.gz --reverse $file\_R2.paired.fastq.gz --fastaout $file\_merged.fasta --relabel $file\:; done

cat *_merged.fasta > merged.fasta

Dereplicate

Select unique sequences.
--derep_fulllength: Call the dereplication function that will select one unique sequence to represent all the identical sequences (100% identity and 100% coverage).
--sizeout: Add the number of identical sequences to the FASTA header.
--output: Output file name in FASTA format.

$VSEARCH --derep_fulllength merged.fasta --sizeout --output derep.fasta

Denoise

Perform denoising according to the UNOISE version 3 algorithm by Robert Edgar.
--cluster_unoise: Call the denoising function on the input FASTA file.
--consout: Output consensus FASTA file.

$VSEARCH --cluster_unoise derep.fasta --consout denoise.fasta

Remove chimeras

Detect chimeras present in the FASTA file.
--uchime3_denovo: Call the function on the input FASTA file.
--nonchimeras: Output FASTA file containing the non-chimeric sequences.
--label: Relabel the FASTA headers with the prefix zOTU.

$VSEARCH  --uchime3_denovo denoise.fasta --nonchimeras uchim.fasta --label zOTU

Create a zOTU Count Table

Map all sequences to the zOTU database to count them, and save the counts in a Table Separated Values zOTU file.
--usearch_global: Call the function to map the merged.fasta file in the database.
--db: Define the uchim.fasta file as the reference database.
--id: Set 97% identity as threshold for mapping.
--otutabout: Output TSV file.

$VSEARCH --usearch_global merged.fasta -db uchim.fasta --id 0.97 --otutabout zotu_table.txt

Extract ITS regions

Use ITSx tool to extract the ITS regions from the zOTU sequences.
-i: Input FASTA file containing the zOTU sequences.
-o: Prefix to name the output files.
-p: Directory containing the HMM profiles to search. This directory can be found inside your ITSx folder.
-t: Restrict the search to Fungi models.

$ITSX -i uchim.fasta -o zOTU -p /path/to/ITSfolder/ITSx_db/HMMs/ -t F

TAG.ME

Perfom the taxanomic classification using TAG.ME in R.

library(tagme)

taxonomy = tagmeFromFasta(file = "zOTU.ITS2.fasta", db = "/path/to/your/TAG.ME/ITS2/")

Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Loading required package: seqinr

Starting Species... 51  assigned

Starting Genus... 101  assigned

Starting Family... 213  assigned

Starting Order... 290  assigned

Starting Class... 344  assigned

Starting Phylum... 374  assigned

Starting Domain... 571  assigned

Printing Unassigned...