R for Bioinformatic Analyses

Hannah Tavalire and Bill Cresko - University of Oregon

January 2019 - Cesky Krumlov

Lecture 1 - Using R for Biostatistical Analyses

But first a beautiful chair

Before we talk about how amazing R is…

  • navigate into this directory: ~/workshop_materials/evomics_stats_2019/
  • type ‘git pull’

Why use R?

  • R is a statistical programming language (derived from S)
  • Superb data management & graphics capabilities
  • You can write your own functions
  • Powerful and flexible
  • Runs on all computer platforms
  • Well established system of packages and documentation
  • Active development and dedicated community
  • Can use a nice GUI front end such as RStudio
  • Reproducibility
    • keep your scripts to see exactly what was done
    • distribute these with your data
    • embed your R analyses in polished RMarkdown files
  • FREE

R resources

Running R

  • Need to make sure that you have R installed
  • Run R from the command line
    • just type R
    • can run it locally as well as on clusters
  • Install an R Integrated Development Environment (IDE)
    • RStudio: http://www.rstudio.com
    • Makes working with R much easier, particularly for a new R user
    • Runs on Windows, Mac or Linux OS
    • We’re running as a server on the AWS instances

RStudio

Exercise 1.1 - Exploring RStudio

  • Open RStudio by adding :8787 to your AMI URL
  • Take a few minutes to familiarize yourself with the RStudio environment by locating the following features:
    • See what types of new files can be made in RStudio by clicking the top left icon, then open a new R script.
    • The windows clockwise from top left are: the code editor, the workspace and history, the plots and files window, and the R console.
    • In the plots and files window, click on the packages and help tabs to see what they offer.
  • Now open the file called Exercises_for_R_Lectures.Rmd in /workshop_materials/evomics_stat_2019/03.Exercises/
    • This file will serve as your digital notebook for parts of the workshop and contains the other exercises.

Introduction to RMarkdown

RMarkdown

Exercise 1.2 - Intro to RMarkdown Files

  • Take a few minutes to familiarize yourself with RMarkdown files by completing exercise 1.2 in your exercises document.

BASICS of R

  • Commands can be submitted through
    • terminal, console or scripts
    • can be embedded as code chunks in RMarkdown
  • These slides evaluate code chunks and show their output
    • output is shown after the two # symbols
    • the number in [] is the index of the first item on that line of output
  • R follows the normal order of mathematical operations (PEMDAS)

BASICS of R

Input code chunk and then output

## [1] 16

Input code chunk and then output

## [1] 16
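
The code chunks that produced these two outputs did not survive in these notes. As a minimal sketch, two expressions that would each print 16 are shown below; the specific expressions are assumptions chosen to illustrate the order of operations, not necessarily the ones used in the lecture.

```r
# Assumed example expressions; both evaluate to 16.
first <- 4 * 4            # simple multiplication
second <- (4 + 3 * 2^2)   # exponent first (2^2 = 4), then 3 * 4 = 12, then 4 + 12

print(first)
print(second)
```

Note that the second expression would give a different answer if R simply evaluated left to right.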

Assigning Variables

  • A better way to do this is to assign variables
  • Variables are assigned values using the <- operator.
  • Variable names must begin with a letter (or a period that is not followed by a digit) and may contain letters, digits, periods and underscores.
  • Do keep in mind that R is case sensitive.

Assigning Variables

## [1] 6
## [1] 4
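
The assignments behind this output are not shown in these notes. A minimal sketch that would reproduce the 6 and the 4, assuming the variable names x and y and the starting value 2 (all assumptions), is:

```r
# Assumed values: assign 2 to x, then build y from x.
x <- 2
print(x * 3)   # uses the stored value of x

y <- x * 3     # store the result in a new variable
print(y - 2)
```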

These do not work
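
The failing examples themselves are missing from these notes. As a hedged sketch, the helper below (a hypothetical function named fails, not part of R) tries a line of code held as text and reports whether R rejects it; the three test strings are assumptions chosen to show a name starting with a digit, an expression on the left of <-, and a valid name.

```r
# Hypothetical helper: returns TRUE if the code in `code` raises an error.
fails <- function(code) {
  result <- try(eval(parse(text = code)), silent = TRUE)
  inherits(result, "try-error")
}

print(fails("3y <- 3"))    # a name cannot begin with a digit
print(fails("3*y <- 3"))   # cannot assign to an arithmetic expression
print(fails("y3 <- 3"))    # a digit after the first letter is fine
```

Keeping the bad lines as strings lets the script run to the end instead of stopping at the first error.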

Arithmetic operations on functions

  • Arithmetic operations can be performed easily on variables and on the results of functions, as well as on numbers.
## [1] 14
## [1] 144
## [1] 2.484907
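
The chunk that generated this output is not included in these notes. A minimal sketch consistent with the three values, assuming x holds 12 (an assumption inferred from the output), is:

```r
# Assumed value for x, inferred from the printed results.
x <- 12

print(x + 2)    # addition
print(x^2)      # exponentiation
print(log(x))   # natural logarithm, a built-in function
```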

Arithmetic operations on functions

  • Note that the last of these, log, is a built-in function of R, and therefore the argument of the function needs to be put in parentheses
  • These parentheses will be important, and we’ll come back to them later when we add further arguments after the first one in the parentheses
  • The outcome of calculations can be assigned to new variables as well, and the results can be checked using the print() function

Arithmetic operations on functions

## [1] 67
## [1] 69022864
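
The code for this slide is missing from these notes. One sketch that yields both printed values, assuming y is 67 and x is 124 (the value of x and the formula for z are assumptions chosen to match the output), is:

```r
# Assumed inputs; z stores the outcome of a calculation.
y <- 67
print(y)

x <- 124
z <- (x * y)^2   # multiply first, then square the product
print(z)
```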

STRINGS

  • Operations can be performed on character variables as well
  • Note that “characters” need to be set off by quotation marks to differentiate them from numbers
  • The c() function concatenates (combines) its arguments into a single vector
  • Note that we are using the same variable names as we did previously, which means that we’re overwriting our previous assignment
  • A good rule of thumb is to use new names for each variable, and make them short but still descriptive

STRINGS

## [1] "I Love"
## [1] "Biostatistics"
## [1] "I Love"        "Biostatistics"
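
The input for this output is not shown in these notes. A minimal sketch that would print the same three lines, assuming the variable names x, y and z, is:

```r
# Character values must be wrapped in quotation marks.
x <- "I Love"
print(x)

y <- "Biostatistics"
print(y)

# c() combines the two strings into one character vector of length 2.
z <- c(x, y)
print(z)
```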

VECTORS

  • In general R thinks in terms of vectors
    • an ordered set of character, factor or numerical values (even “I Love” is a vector of length one)
    • it will benefit any R user to try to write scripts with that in mind
    • it will simplify most things
  • Vectors can be assigned directly using the ‘c()’ function and then entering the exact values.

VECTORS

##  [1]  2  3  4  2  1  2  4  5 10  8  9
##  [1]  5  6  7  5  4  5  7  8 13 11 12
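
The chunk behind this output is absent from these notes. As a sketch, the code below builds the first vector with c() and then adds 3 to every element at once; the variable names n and z are assumptions.

```r
# Build a numeric vector by listing its values.
n <- c(2, 3, 4, 2, 1, 2, 4, 5, 10, 8, 9)
print(n)

# Arithmetic is applied element by element, so no loop is needed.
z <- n + 3
print(z)
```

This element-wise behavior is what "thinking in vectors" refers to: one expression operates on all eleven values.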

FACTORS

  • The vector x is now what is called a character vector (“I Love”).
  • Sometimes we would like to treat the characters as if they were categories, or groups, for subsequent calculations.
  • These are called factors, and we can redefine our character variables as factors.
  • This might seem a bit strange, but it’s important for statistical analyses where we might want to see the mean or variance for two different treatments.

FACTORS

## [1] I Love
## Levels: I Love
  • Note that factor levels are reported alphabetically
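
The conversion code is not shown in these notes. A minimal sketch, assuming x holds "I Love" and using a hypothetical treatment vector to show the alphabetical ordering of levels, is:

```r
# Convert a character vector to a factor.
x <- "I Love"
x_factor <- as.factor(x)
print(x_factor)

# Hypothetical treatment labels, listed with "treated" first.
treatment <- factor(c("treated", "control", "treated"))
print(levels(treatment))   # levels are sorted, so "control" comes first
```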

FACTORS

  • We can also determine how R “sees” a variable using the str() or class() functions.
  • This is a useful check when importing datasets or verifying that you assigned a class correctly
##  chr "I Love"
## [1] "character"
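
The calls that produced this output are not included in these notes. A minimal sketch, assuming x still holds the string "I Love", is shown below, with two extra checks added for comparison.

```r
x <- "I Love"

str(x)            # compact summary of the structure
print(class(x))   # the class as a character string

# For comparison: the same value as a factor, and a plain number.
print(class(as.factor(x)))
print(class(3))
```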

Types or ‘classes’ of vectors of data