This is the first section of the phyloseq lab. In this section you will learn how to import data from a common format and how to manipulate, investigate, and merge data together into a single “experiment-level” object. You will also learn how to save this object into a compressed and self-describing file, that you can easily and quickly load into a new R session without repeating the entire parsing and organizing process.
The phyloseq object that you create and save in this section will be used in the next section; but don’t worry! If you have trouble or don’t finish a copy of the object we are trying to make in this tutorial is included with these materials, the file named: mp-phyloseq-lab-00.RDS
A secondary goal of this section is for you to become familiar or even comfortable reading and using the documentation of R functions that accompanies every good R package. Don’t give up too quickly! Every time you get stuck and start searching for an example you are practicing how to get around R on your own. Each of the functions we will use are well-documented, and the tutorial text provides each main function that you will use.
In programming there are many ways to accomplish a task. I have provided an answer key in the form of a working example of code that accomplishes the task being asked of you. If your version of the code does not match exactly, that is okay, yours is still correct as long as it works! Please try not to use the answer key until you have exhausted your other options, including an earnest attempt at exploring the documentation of the functions mentioned in each prompt, as well as your own trial-and-error attempts to accomplish the task.
wget
, tar
, etc..tgz
), or uncompressed file. Note this location or save the path as an object. If you do not have the relative path immediately available from the earlier QIIME lab, you can download the lab to the current directory from the following link: ftp://ftp.microbio.me/qiime/tutorial_files/moving_pictures_tutorial-1.9.0.tgz
Hint, you do not need to leave your R session to do this. R can initiate system commands (bash) using the system()
function. So in this case: system("wget ftp://ftp.microbio.me/qiime/tutorial_files/moving_pictures_tutorial-1.9.0.tgz")
system("tar -xvzf moving_pictures_tutorial-1.9.0.tgz")
There is a specific biom file we want to work with. The QIIME tutorial might have many.
otu_table_mc2_w_tax_no_pynast_failures.biom
. Hint: use the autocomplete feature of system paths by typing the tab button when the cursor is within a pair of quotation marks. The path within the tutorial is: moving_pictures_tutorial-1.9.0/illumina/precomputed-output/otus
fullInputPath
file.exists()
file.info()
library()
command.packageVersion()
command.We want to use a special import function that is not yet fully integrated into phyloseq. This is to accomodate the new version of the biom format that is based on HDF5 and has become the default output from QIIME.
biomformat::read_biom()
, and save the result as an object named biom1
. Make sure to specify the package with the ::
notation, because the older biom package also includes a read_biom()
function. Save this data, the output of biomformat::read_biom()
as biom1
. (The phyloseq package will be migrating to biomformat in the next release.)import_biom()
function can already properly interpret biom1
and convert it to proper phyloseq format. Save this, the output of import_biom()
as mp0
.tax_table(mp0) <- tax_table(mp0)[, 1:7]
This is to fix a weird importing artifact. But now you know how to do it :-)print()
function, or simply typing the object name and pressing enter. What does it show you?rank_names()
function. What are the taxonomic ranks available in this data?If not, it should be in: moving_pictures_tutorial-1.9.0/illumina/map.tsv
import_qiime_sample_data()
function, and save this as an object named qsd
.merge_phyloseq()
function.Read out the sample variables that are now present in your object.
Read phylogenetic tree. Add it to the data object.
treeFile1
read_tree()
function to import the tree. Save it as tree1
.mp2
. Again, to do this you’ll use merge_phyloseq()
with arguments mp1
and tree1
.The OTU centroids, often called the “representative sequences”, is the set of sequences that represent the center of an OTU in sequence space. There is one such sequence per OTU, and they are all different from one another. In principle they should be at least as dissimilar from one another as the OTU radius that you defined during OTU clustering. This is commonly 3% sequence dissimilarity, but this valuse is arbitrary, and you may select a different, smaller value in practce.
moving_pictures_tutorial-1.9.0/illumina/precomputed-output/otus/rep_set.fna
.Biostrings::readDNAStringSet()
function for reading fasta files, and save this DNAStringSet object as bs1
.names(bs1) <- gsub("\\s.+$", "", names(bs1))
bs1
to your data object using the merge_phyloseq()
function, and save the result as mp3
.ntaxa()
function on the unmerged objects and on mp3
.intersect(taxa_names(object1), taxa_names(object2))
. Note that if there are many taxa, you might want to save the result of intersect()
as an object and query it in pieces, or its length()
, rather than fill up your terminal with OTU identifiers.You’ve got an object you want to use for later analysis (or many of them!); AND it is well-organized and self-describing with a tree, sequences, taxonomy, sample data, and of course the counts of each OTU in each sample (OTU table). Before doing further analysis, let’s end this session by saving our hard-earned object as a compressed, serialized R object. These objects
Save the file as mp-phyloseq-lab.RDS, using the saveRDS()
function.
mapply()
function to loop over two vectors vectors simultaneously, one with the objects (a list) and the other a character vector of the file names you will store.mget()
and paste0()
functions to store a list of the 4 mp*
objects we created is a helpful fist step.If you want to store many objects as one file, or you want to store all the objects at the end of your session, then you may find that save()
or save.image()
to be easier to use than saveRDS()
. The files are usually given the extension .RData
instead of .RDS
, and the reverse operation (loading the stored data into R session) is accomplished with the load()
function, rather than readRDS()
.
Try for yourself!
This is an R Markdown document (or the HTML result of one). Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
You can also embed plots, for example: