General

The R and adegenet exercise is an introduction to the R statistical computing language for those who have little experience with it so far. We will discuss the basic features of R, how to do calculations, what objects are, and how to manipulate them. We will also cover how to get data into and out of R and how to create basic plots. In the second part of this tutorial you will use R and the package adegenet to perform a principal component analysis (PCA) as an example of applying R to genomic data. More details on PCA and how to apply it to genomic data will be covered in another exercise using the program smartPCA.
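To give a flavour of these topics before the exercise, here is a minimal sketch in base R. It is not taken from R_tutorial.R: the sample table and genotype matrix are made-up toy values, and the PCA uses base R's `prcomp()` as a stand-in for the adegenet workflow so that the example runs without extra packages or data files.

```r
# Calculations and objects: store values in a vector and compute with them
x <- c(2, 4, 6, 8)
mean_x <- mean(x)      # arithmetic mean of the vector
x_squared <- x^2       # operations are applied element by element

# A small data frame of sample information (hypothetical values)
samples <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4"),
  population = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Getting data out of R and back in again via a temporary CSV file
tmp_file <- tempfile(fileext = ".csv")
write.csv(samples, tmp_file, row.names = FALSE)
samples_in <- read.csv(tmp_file, stringsAsFactors = FALSE)

# Toy genotype matrix: 4 individuals x 5 SNPs, coded as 0/1/2 allele copies
geno <- matrix(
  c(0, 1, 2, 0, 1,
    0, 1, 2, 1, 1,
    2, 0, 0, 2, 1,
    2, 1, 0, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(samples$id, paste0("snp", 1:5))
)

# PCA with base R (a stand-in here, not the adegenet functions)
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# Basic plot: first two principal components, coloured by population
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(factor(samples$population)), pch = 19,
     xlab = "PC1", ylab = "PC2")
```

In the exercise itself the genotype and sample data come from files, and the PCA is run with adegenet rather than `prcomp()`.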

The tutorial will use the integrated development environment RStudio and the R script R_tutorial.R. Most of the code in R_tutorial.R does not depend on any other files, but the PCA using adegenet requires data files (sample information and genotype data) to be loaded into R. We will go through the code line by line and discuss what is happening and how you can make use of it when writing your own code. A presentation will accompany the exercise and provide additional information. You can either download the files from this page and work through the exercise locally in RStudio (or however you prefer to run R), or access the RStudio installation on the server via your browser.

Going Further

Many R packages are available for analysing genetic data, especially in the Bioconductor project. Please be aware that Bioconductor packages need to be installed differently from packages on the Comprehensive R Archive Network (CRAN). You can find instructions here. On CRAN there are very helpful “task views” on specific topics, e.g. statistical genetics or phylogenetics, describing the most important packages and what they can be used for. Further learning material on the general use of R from other workshops is available here.
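As a rough illustration of the difference between the two installation routes, the sketch below installs one package from each source. The Bioconductor package name (SNPRelate) is only an example chosen for illustration and is not required for this exercise. Both calls need an internet connection.

```r
# CRAN packages are installed with install.packages()
install.packages("adegenet")

# Bioconductor packages are installed through the BiocManager package,
# which itself comes from CRAN
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("SNPRelate")  # example Bioconductor package

# Once installed, packages from either source are loaded the same way
library(adegenet)
```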

Survey

Please let us know how you got along with the exercise in this very short opinion poll.