16s datasets are great for identifying microbial taxa in a sample and quantifying abundance of those microbes but they’re not very helpful for understanding what functions the microbes are performing. PICRUST helps us predict these functions. PICRUSt uses an extended ancestral-state reconstruction algorithm to identify a closely related microbe with known full genome sequence to each OTU. It uses the full genome sequence information to predict which gene families are present in the microbial community.
Here is the github site: http://picrust.github.io/picrust/index.html Here is the link to the publication: http://www.nature.com/nbt/journal/v31/n9/abs/nbt.2676.html
Get closed reference otu table
PICRUST can only work with OTUS that were picked by closed reference OTU picking.
filter_otus_from_otu_table.py -i ~/16s_analysis/otu_table_mc2_w_tax_no_pynast_failures_no_chimeras_frequency_filtered.biom -o ~/16s_analysis/closed_otu_table.biom --negate_ids_to_exclude -e /usr/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
How many closed ref OTUs?
biom summarize-table -i closed_otu_table.biom
normalize by picrust’s method
normalize_by_copy_number.py -i ~/16s_analysis/closed_otu_table.biom -o ~/16s_analysis/closed_otu_table_norm.biom
The output of PICRUST is the same format as an OTU table but instead of OTUS, KEGG Orthology terms are used. For a tab delimited output:
predict_metagenomes.py -f -i ~/16s_analysis/closed_otu_table_norm.biom -o ~/16s_analysis/metagenome_predictions.txt -a nsti_per_sample.txt
For a .biom output:
predict_metagenomes.py -i ~/16s_analysis/closed_otu_table_norm.biom -o ~/16s_analysis/metagenome_predictions.biom -a nsti_per_sample.txt
We can take a look at the tab delimited otu table
head metagenome_predictions.txt
And count the rows to see how many KO were predicted
wc -l metagenome_predictions.txt
What do you think about the number of KOs predicted considering the number of OTUs we’re using in this analysis?
Collapse to higher KO hierarchy term
categorize_by_function.py -f -i metagenome_predictions.biom -c KEGG_Pathways -l 3 -o metagenome_predictions.L3.txt
We also want this output in .biom format to use in R
categorize_by_function.py -i metagenome_predictions.biom -c KEGG_Pathways -l 3 -o metagenome_predictions.L3.biom
Move predicted_metagenomes.L3.biom to your desktop to use in R
R SCRIPT
library("phyloseq"); packageVersion("phyloseq")
library("DESeq2")
packageVersion("DESeq2")
library("ggplot2")
theme_set(theme_bw())
library('biom')
x = read_biom("metagenome_predictions.L3.biom")
otumat = as(biom_data(x), "matrix")
OTU = otu_table(otumat, taxa_are_rows=TRUE)
mapping <- import_qiime_sample_data(mapfilename = 'R_map.txt')
phylo <- merge_phyloseq(OTU, mapping)
phylo
###############################################################################
###Test for DE KO terms between individuals that got sick and those that didn't
###############################################################################
final_pheno = phyloseq_to_deseq2(phylo, ~ Final_phenotype)
final_pheno_results = DESeq(final_pheno, test="Wald")
final_pheno_res = results(final_pheno_results)
summary(final_pheno_res)
head(final_pheno_res)
alpha = 0.05
final_pheno_sigtab = final_pheno_res[which(final_pheno_res$padj < alpha), ]
final_pheno_sigtab= cbind(as(final_pheno_sigtab, "data.frame"), as(tax_table(phylo)[rownames(final_pheno_sigtab), ], "matrix"))
head(final_pheno_sigtab)
final_pheno_sigtab
write.table(final_pheno_sigtab, "Final_pheno_L3.txt", sep="\t")