Introduction to R

Karin, Madee, Kalle, Nikos

January 10, 2020

Content

  • about R and RStudio
  • data handling:
    • data collection and organisation
    • manipulation and analysis
  • visualisation
    • interpretation and presentation

About R and RStudio

Goals and purposes of using R

  • powerful, flexible/customisable
  • easy for sharing and developing
  • reproducible science: scripts and markdowns

R help & the community

  • Stack Overflow
  • man pages and examples
  • cheat sheets

Connecting to the R server

In your own computer’s browser, open a new tab and type your AMI address followed by “:8787”, e.g. 

ec2-…compute-1.amazonaws.com:8787

username: evomics
password: genomics2020

Scripts and markdowns

Markup languages (HTML, LaTeX, markdown) are used to format text. Both the exercises and the presentation for this lab were produced using the rmarkdown package. The code for these is available in the files Intro_to_R.Rmd and Intro_to_R_EXERCISE.Rmd. For longer-form writing, including reports and papers, have a look at the bookdown package.

Data handling: collection and organisation

Conventions about data format

  • Observations are entered in rows
  • Variables are entered in columns
  • A column of data should contain only one data type

Data handling: collection and organisation

Best practice

  • Store a copy of data in nonproprietary formats
  • Keep an uncorrected copy of the raw file when doing analyses
  • Maintain effective metadata about the data
  • Create a folder for each relevant task to keep an overview of your analyses

Error messages

  • Warnings do not stop your code: read them, then decide whether they can be ignored.
  • Look for a specific hint in the message, e.g. 
    • wrong data format, characters/factors instead of numbers
    • library not loaded, package not installed
  • Try running example code, and check the example input
  • Google is your friend
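
To illustrate the "wrong data format" hint, here is a minimal sketch with made-up values (the names values, msg and total are assumptions for this example). Numbers stored as text trigger an error, and checking the data type points to the fix.

```r
# Numbers accidentally stored as text (hypothetical example values)
values <- c("12", "13", "14")

# sum() refuses character input; capture the error message instead of stopping
msg <- tryCatch(sum(values), error = function(e) conditionMessage(e))
print(msg)

# Check the data type, convert to numbers, and try again
class(values)
total <- sum(as.numeric(values))
print(total)
```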

Getting started: Analyses

How R works: Operators

Arithmetic (basic math): +, -, *, /, ^, …

## [1] 39
## [1] 14
## [1] 144
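
The code behind the output above is not shown in this version of the slides. A minimal sketch, assuming three expressions that happen to give the same results:

```r
# Assumed expressions; the originals may have differed
a <- 12 + 13 + 14   # addition
b <- 28 / 2         # division
d <- 12^2           # exponentiation

print(a)
print(b)
print(d)
```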

How R works: Operators

  • Arithmetic (basic math): '+' '-' '*' '/' '^' etc.
  • Relational: '<' '>' '==' '!='
  • Logical: '!' '&' '&&' '|' '||'
  • Assignment: '<-' '='
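
A short sketch of the relational and logical operators in use. The variables x and y and the result names are assumptions chosen for illustration.

```r
x <- 12
y <- 14

smaller   <- x < y       # relational: is x smaller than y?
equal     <- x == y      # relational: are they equal?
different <- x != y      # relational: are they different?
negated   <- !(x < y)    # logical NOT flips TRUE/FALSE

# & and | compare vectors element by element
elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)

# && and || expect single values and return a single value
either <- (x < y) || (x == y)

print(c(smaller, equal, different, negated))
print(elementwise)
print(either)
```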

How R works: Operators

Assignment (<-, =)

## [1] 39
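
A sketch of assignment, assuming the variable names total and total2. Both operators store a value under a name; `<-` is the more common convention in R code.

```r
total <- 12 + 13 + 14   # assign with <-
total2 = 12 + 13 + 14   # assign with = (also works at the top level)

total                   # typing the name prints the stored value
```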

How R works: Functions

function(data, additional arguments)

## [1] 39
## [1] 39
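
As an illustration of the pattern above, a sketch using the base function sum(). The second call passes na.rm = TRUE as an additional argument; the input values are assumptions.

```r
# Data only
plain <- sum(12, 13, 14)

# Data plus an additional argument: ignore missing values
without_na <- sum(12, 13, 14, NA, na.rm = TRUE)

print(plain)
print(without_na)
```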

How R works: Putting it all together

## [1] 12 13 14
## [1] 39
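
One way the output above could be produced, combining assignment, the c() function and sum(). The variable name x is an assumption.

```r
x <- c(12, 13, 14)   # combine three numbers into a vector and store it
x                    # print the vector
total <- sum(x)      # apply a function to the stored vector
total
```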

Functions to start with: Importing data and inspecting it

BREAK for exercise: read in your files

Open the Intro_to_R_EXERCISE.html file in your browser.
Read through the introduction and complete the “Load data” exercise (until “Inspect and modify the data frames”).

Data structures

Data structures: vector

generating a vector

##  num [1:3] 12 13 14
## [1] 3
##  chr [1:4] "platypus" "nudibranch" "potoo" "peacock mantis shrimp"
## [1] "peacock mantis shrimp"
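
A sketch that would give output like the above, assuming the vector names x and animals:

```r
# Numeric vector
x <- c(12, 13, 14)
str(x)                 # structure: type, length, first values
n <- length(x)         # number of elements
n

# Character vector
animals <- c("platypus", "nudibranch", "potoo", "peacock mantis shrimp")
str(animals)
fourth <- animals[4]   # select the fourth element by position
fourth
```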

Data structures: vector

modifying a vector:

## [1] 12 13
## [1] "platypus"   "nudibranch" "potoo"
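
Two possible ways to drop elements from a vector, by position or by value. The vector names are assumptions carried over for illustration.

```r
x <- c(12, 13, 14)
animals <- c("platypus", "nudibranch", "potoo", "peacock mantis shrimp")

# A negative index removes the element at that position
x <- x[-3]
x

# A logical test keeps only the elements for which it is TRUE
animals <- animals[animals != "peacock mantis shrimp"]
animals
```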

Data structures: data.frame

generating a data frame

## 'data.frame':    5 obs. of  3 variables:
##  $ day : int  6 7 8 9 10
##  $ A.M.: Factor w/ 2 levels "L","P": 1 1 2 1 2
##  $ P.M.: Factor w/ 2 levels "L","P": 1 2 2 2 2
##   day A.M. P.M.
## 1   6    L    L
## 2   7    L    P
## 3   8    P    P
## 4   9    L    P
## 5  10    P    P
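
A sketch of how such a data frame could be built. The name schedule is an assumption; stringsAsFactors = TRUE is set explicitly so that the text columns become factors, as in the output above, regardless of the R version.

```r
schedule <- data.frame(
  day  = 6:10,
  A.M. = c("L", "L", "P", "L", "P"),
  P.M. = c("L", "P", "P", "P", "P"),
  stringsAsFactors = TRUE
)

str(schedule)   # one line per column: name, type, first values
schedule        # print the whole table
```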

Data structures: data.frame

modifying data frames: adding to a data frame

##   day A.M. P.M. night
## 1   6    L    L     P
## 2   7    L    P     P
## 3   8    P    P     P
## 4   9    L    P     P
## 5  10    P    P     P
## 6  11    L    P     P
## 7  12    P    P  <NA>
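
A minimal sketch of adding rows with rbind() and a column with `$`, assuming the names schedule and new_days. Text columns are kept as characters here for simplicity.

```r
schedule <- data.frame(
  day  = 6:10,
  A.M. = c("L", "L", "P", "L", "P"),
  P.M. = c("L", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# Add rows: the new rows need the same column names
new_days <- data.frame(
  day  = 11:12,
  A.M. = c("L", "P"),
  P.M. = c("P", "P"),
  stringsAsFactors = FALSE
)
schedule <- rbind(schedule, new_days)

# Add a column: one value per row, NA where nothing is known
schedule$night <- c(rep("P", 6), NA)
schedule
```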

Data structures: data.frame

modifying data frames: removing from a data frame
data.frame[rows, columns]

##   day A.M. night PM
## 1   6    L     P  L
## 2   7    L     P  P
## 3   8    P     P  P
## 4   9    L     P  P
## 5  10    P     P  P
## 6  11    L     P  P
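
One possible route to the table above, assuming a data frame named schedule: drop a row with a negative index, copy a column under a new name, then delete the original by assigning NULL.

```r
schedule <- data.frame(
  day   = 6:12,
  A.M.  = c("L", "L", "P", "L", "P", "L", "P"),
  P.M.  = c("L", "P", "P", "P", "P", "P", "P"),
  night = c(rep("P", 6), NA),
  stringsAsFactors = FALSE
)

schedule <- schedule[-7, ]     # remove row 7, keep all columns
schedule$PM <- schedule$P.M.   # copy the column under a new name
schedule$P.M. <- NULL          # remove the original column
schedule
```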

Data structures: data.frame

Select parts of a data frame

## [1] "P" "P" "P" "P" "P" "P"
##   night
## 1     P
## 2     P
## 3     P
## 4     P
## 5     P
## 6     P
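
The two outputs above differ in type: a plain vector and a one-column data frame. A sketch of selections that behave this way, assuming a data frame named schedule:

```r
schedule <- data.frame(
  day   = 6:11,
  A.M.  = c("L", "L", "P", "L", "P", "L"),
  night = rep("P", 6),
  PM    = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

night_vector <- schedule$night     # $ returns the column as a vector
night_frame  <- schedule["night"]  # single brackets keep the data frame

night_vector
night_frame
```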

Data structures: data.frame

Select parts of a data frame
data.frame$column, data.frame[rows, columns]

##   A.M. night PM
## 1    L     P  L
## 3    P     P  P
## 5    P     P  P
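
A sketch of selecting rows and columns by position in one step, assuming a data frame named schedule. Rows come before the comma, columns after it.

```r
schedule <- data.frame(
  day   = 6:11,
  A.M.  = c("L", "L", "P", "L", "P", "L"),
  night = rep("P", 6),
  PM    = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# Rows 1, 3 and 5; every column except the first
part <- schedule[c(1, 3, 5), -1]
part
```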

Data structures: data.frame

Select parts of a data frame
data.frame$column, data.frame[rows, columns]

##   day A.M. night PM
## 1   6    L     P  L
## 2   7    L     P  P
## 4   9    L     P  P
## 6  11    L     P  P
##   day A.M. night PM
## 1   6    L     P  L
## 2   7    L     P  P
## 3   9    L     P  P
## 4  11    L     P  P
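
Rows can also be selected with a condition. The sketch below assumes a data frame named schedule and uses base R only; the second output above, with renumbered rows, resembles what dplyr's filter() returns, and here the row names are reset by hand instead.

```r
schedule <- data.frame(
  day   = 6:11,
  A.M.  = c("L", "L", "P", "L", "P", "L"),
  night = rep("P", 6),
  PM    = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# Keep the rows where A.M. equals "L"; original row names are kept
lecture_mornings <- schedule[schedule$A.M. == "L", ]
lecture_mornings

# Reset the row names to 1, 2, 3, ...
rownames(lecture_mornings) <- NULL
lecture_mornings
```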

Data structures: data.frame

modifying data frames
rownames and column names

## [1] "day"   "AM"    "night" "PM"
## [1] "1" "2" "3" "4" "5" "6"
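
A sketch of inspecting and changing names with colnames() and rownames(), assuming a data frame named schedule whose second column starts out as A.M.:

```r
schedule <- data.frame(
  day   = 6:11,
  A.M.  = c("L", "L", "P", "L", "P", "L"),
  night = rep("P", 6),
  PM    = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

colnames(schedule)[2] <- "AM"   # rename only the second column
colnames(schedule)
rownames(schedule)              # default row names are "1", "2", ...
```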

Data structures: data.frame

##     date session1 session2 session3
## mon    6        L        L        P
## tue    7        L        P        P
## wed    8        P        P        P
## thu    9        L        P        P
## fri   10        P        P        P
## sat   11        L        P        P
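
One way to arrive at a table like the one above, assuming a data frame named schedule: reorder the columns by name, then replace all column and row names at once.

```r
schedule <- data.frame(
  day   = 6:11,
  AM    = c("L", "L", "P", "L", "P", "L"),
  night = rep("P", 6),
  PM    = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# Reorder the columns, then rename columns and rows
schedule <- schedule[, c("day", "AM", "PM", "night")]
colnames(schedule) <- c("date", "session1", "session2", "session3")
rownames(schedule) <- c("mon", "tue", "wed", "thu", "fri", "sat")
schedule
```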

Functions to start with: Importing data and inspecting it

BREAK for exercise: look at your files

Continue with your exercise. Inspect and modify the files you loaded previously (stop at the visualisation).

tidyverse functionalities

joining data frames

##    date session1 session2 session3          faculty
## 1     6        L        L        P    Sonya Dyhrman
## 2     7        L        P        P        Mike Zody
## 3     7        L        P        P      Sophie Shaw
## 4     8        P        P        P        Mike Zody
## 5     9        L        P        P Antoine Limasset
## 6     9        L        P        P  Camille Marchet
## 7     9        L        P        P      Sergey Nurk
## 8    10        P        P        P Malachi Griffith
## 9    11        L        P        P     Evan Eichler
## 10   11        L        P        P      Guy Leonard
## 11   11        L        P        P  Josephine Paris

tidyverse functionalities

joining data frames

##    date session1 session2 session3          faculty
## 1     6        L        L        P    Sonya Dyhrman
## 2     7        L        P        P        Mike Zody
## 3     7        L        P        P      Sophie Shaw
## 4     8        P        P        P        Mike Zody
## 5     9        L        P        P Antoine Limasset
## 6     9        L        P        P  Camille Marchet
## 7     9        L        P        P      Sergey Nurk
## 8    10        P        P        P Malachi Griffith
## 9    11        L        P        P     Evan Eichler
## 10   11        L        P        P      Guy Leonard
## 11   11        L        P        P  Josephine Paris
## 12   13     <NA>     <NA>     <NA>       Brian Haas
## 13   17     <NA>     <NA>     <NA>      Chris Wheat
## 14   16     <NA>     <NA>     <NA> Christa Schleper
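
The two join outputs above differ in whether unmatched dates are kept. A sketch with dplyr on a shortened version of the tables; the names schedule, faculty, matched and everything are assumptions, and the original slides may have used different join functions.

```r
library(dplyr)

schedule <- data.frame(
  date     = c(6, 7, 8),
  session1 = c("L", "L", "P"),
  session2 = c("L", "P", "P"),
  session3 = c("P", "P", "P")
)
faculty <- data.frame(
  date    = c(6, 7, 7, 8, 13),
  faculty = c("Sonya Dyhrman", "Mike Zody", "Sophie Shaw",
              "Mike Zody", "Brian Haas")
)

# Keep every row of schedule; dates with several faculty are repeated
matched <- left_join(schedule, faculty, by = "date")
matched

# Keep every row of both tables; missing sessions are filled with NA
everything <- full_join(schedule, faculty, by = "date")
everything
```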

tidyverse functionalities

ggplot2

tidyverse functionalities

reshape2: melt

##   date variable value
## 1    6 session1     L
## 2    7 session1     L
## 3    8 session1     P
## 4    9 session1     L
## 5   10 session1     P
## 6   11 session1     L
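
A sketch of reshaping from wide to long format with melt(), assuming a data frame named schedule. The id.vars column is kept as it is; all other columns are stacked into variable and value.

```r
library(reshape2)

schedule <- data.frame(
  date     = 6:11,
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = rep("P", 6),
  stringsAsFactors = FALSE
)

long <- melt(schedule, id.vars = "date")
head(long)   # first six rows: all of session1
```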

tidyverse functionalities

tidyr: pivot_longer

## # A tibble: 6 x 3
##    date name     value
##   <dbl> <chr>    <chr>
## 1     6 session1 L    
## 2     6 session2 L    
## 3     6 session3 P    
## 4     7 session1 L    
## 5     7 session2 P    
## 6     7 session3 P
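
The same reshaping sketched with tidyr, assuming a data frame named schedule. cols = -date means "every column except date"; the new columns get the default names name and value.

```r
library(tidyr)

schedule <- data.frame(
  date     = 6:11,
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = rep("P", 6),
  stringsAsFactors = FALSE
)

long <- pivot_longer(schedule, cols = -date)
head(long)   # rows are ordered by date, then by session
```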

tidyverse functionalities

ggplot2
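
The plots from these slides are not reproduced in this version. As a stand-in, a minimal sketch of a ggplot2 call on the long-format schedule; the choice of a tile plot, the axis labels and all object names are assumptions, not the original figure.

```r
library(ggplot2)
library(tidyr)

schedule <- data.frame(
  date     = 6:11,
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = rep("P", 6),
  stringsAsFactors = FALSE
)
long <- pivot_longer(schedule, cols = -date)

# Data + aesthetic mapping + a geometry layer, combined with "+"
p <- ggplot(long, aes(x = factor(date), y = name, fill = value)) +
  geom_tile(colour = "white") +
  labs(x = "date", y = "session", fill = "type")

print(p)
```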

ggplot2

Additional resources:
https://www.r-graph-gallery.com/
https://ggplot2.tidyverse.org/
https://r4ds.had.co.nz/data-visualisation.html

Bonus tutorials

Flow control with while loop

See also: for loops and if-else statements

##     date session1 session2 session3 cumulative_working_hours
## mon    6        L        L        P                        9
## tue    7        L        P        P                       18
## wed    8        P        P        P                       27
## thu    9        L        P        P                       36
## fri   10        P        P        P                       45
## sat   11        L        P        P                       54
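
A sketch of a while loop that fills in a running total row by row. The 9 working hours per day are an assumption inferred from the output above, and the names schedule, i and total are chosen for illustration.

```r
schedule <- data.frame(
  date     = 6:11,
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = rep("P", 6),
  stringsAsFactors = FALSE
)

hours_per_day <- 9                               # assumed value
schedule$cumulative_working_hours <- NA_real_    # empty column to fill

i <- 1
total <- 0
while (i <= nrow(schedule)) {    # repeat as long as the condition is TRUE
  total <- total + hours_per_day
  schedule$cumulative_working_hours[i] <- total
  i <- i + 1                     # without this the loop would never end
}
schedule
```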

Custom functions

## [1] 8
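
The function that produced the output above is not shown in this version of the slides. A hypothetical example of defining and calling a custom function; the name add_numbers and its arguments are invented for illustration.

```r
# name <- function(arguments) { body }; the last value is returned
add_numbers <- function(a, b) {
  a + b
}

result <- add_numbers(3, 5)
result
```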

Pipes

Three ways of doing the same thing

##   date
## 1    8
## 2   10
##   date
## 1    8
## 2   10
##   date
## 1    8
## 2   10
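
A sketch of three equivalent ways to filter rows and then select a column with dplyr: nested calls, an intermediate object, and the pipe. The condition session1 == "P" is an assumption that fits the output above; the object names are invented.

```r
library(dplyr)

schedule <- data.frame(
  date     = 6:11,
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = rep("P", 6),
  stringsAsFactors = FALSE
)

# 1. Nested: read from the inside out
nested <- select(filter(schedule, session1 == "P"), date)

# 2. Intermediate object: one step per line
practicals <- filter(schedule, session1 == "P")
stepwise <- select(practicals, date)

# 3. Pipe: the result of each step is passed on to the next
piped <- schedule %>%
  filter(session1 == "P") %>%
  select(date)

nested
stepwise
piped
```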

git

  • Git is a tool for version control, not specific to R
  • It makes collaborative software development and tracking software changes a lot easier
  • Many bioinformatics tools have their code published and versioned on GitHub/GitLab/Bitbucket/etc.
  • git clone URL copies a git repo to your machine
  • git add filename stages the latest version of the file filename for the next commit
  • git commit -m "My first commit message" creates a new “version” containing the files you’ve added