Content
- about R and RStudio
- data handling:
  - data collection and organization
  - manipulation and analysis
  - visualisation
  - interpretation and presentation
man pages and examples
In your own computer’s browser, open a new tab and type your AMI address followed by “:8787”, e.g.
ec2-…compute-1.amazonaws.com:8787
username: evomics
password: genomics2020
Markup languages (HTML, LaTeX, markdown) are used to format text. Both the exercises and the presentation for this lab were produced with the rmarkdown package. The code for them is available in the files Intro_to_R.Rmd and Intro_to_R_EXERCISE.Rmd. For longer-form writing, including reports and papers, have a look at the bookdown package.
Conventions about data format
Best practice
Arithmetic (basic math): +, -, *, /, ^, …
## [1] 39
## [1] 14
## [1] 144
'+' '-' '*' '/' '^' etc.
'<' '>' '==' '!='
'!' '&' '&&' '|' '||'
'<-' '='
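The commands behind the output above were not preserved on the slide, so the following is only an illustrative sketch of the arithmetic, comparison and logical operators listed here. The specific numbers are chosen for illustration.

```r
# Arithmetic operators
12 + 13 + 14      # 39
12^2              # 144

# Comparison operators return TRUE or FALSE
12 < 13           # TRUE
12 == 13          # FALSE
12 != 13          # TRUE

# Logical operators: ! (not), & (and), | (or) work element-wise on vectors
!TRUE                              # FALSE
c(TRUE, FALSE) & c(TRUE, TRUE)     # TRUE FALSE
c(TRUE, FALSE) | c(FALSE, FALSE)   # TRUE FALSE

# && and || compare single values and are typically used inside if ()
(12 < 13) && (13 < 14)             # TRUE
```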
Assignment (<-, =)
## [1] 39
function(data, additional arguments)
## [1] 39
## [1] 39
## [1] 12 13 14
## [1] 39
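A minimal sketch of assignment and function calls that would give output like the lines above. The names `x` and `total` are hypothetical, since the original commands are not shown.

```r
# Assign a numeric vector to the name x
x <- c(12, 13, 14)
x                  # 12 13 14

# Call a function on the data: function(data, additional arguments)
sum(x)             # 39

# Store a result; <- is the conventional assignment operator
total <- sum(x)
total              # 39

# = also assigns at the top level, but <- is more common in R code
total = sum(x)

# Additional arguments are passed by name after the data
round(mean(x), digits = 1)   # 13
```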
Open the Intro_to_R_EXERCISE.html file in your browser.
Read through the introduction and complete the “Load data” exercise (until “Inspect and modify the data frames”).
generating a vector
## num [1:3] 12 13 14
## [1] 3
## chr [1:4] "platypus" "nudibranch" "potoo" "peacock mantis shrimp"
## [1] "peacock mantis shrimp"
modifying a vector:
## [1] 12 13
## [1] "platypus" "nudibranch" "potoo"
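The code for these vector slides is not shown, so here is one possible way to reproduce the output above. The variable names `numbers` and `animals` are assumptions.

```r
# Generate a numeric vector and inspect it
numbers <- c(12, 13, 14)
str(numbers)       # structure: num [1:3] 12 13 14
length(numbers)    # 3

# Generate a character vector and pick one element by position
animals <- c("platypus", "nudibranch", "potoo", "peacock mantis shrimp")
str(animals)
animals[4]         # "peacock mantis shrimp"

# Modify a vector: a negative index drops an element,
# a range of positions keeps only those elements
numbers <- numbers[-3]    # 12 13
animals <- animals[1:3]   # "platypus" "nudibranch" "potoo"
```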
generating a data frame
# data <- read.csv('~/Path/to/file.csv')
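data <- data.frame(
  day = 6:10,
  A.M. = c("L", "L", "P", "L", "P"),
  P.M. = c("L", "P", "P", "P", "P")
)
# in R >= 4.0 the A.M. and P.M. columns are chr, not Factor, by default
str(data) # structure of the data frame; dim(data) gives rows and columns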
## 'data.frame': 5 obs. of 3 variables:
## $ day : int 6 7 8 9 10
## $ A.M.: Factor w/ 2 levels "L","P": 1 1 2 1 2
## $ P.M.: Factor w/ 2 levels "L","P": 1 2 2 2 2
## day A.M. P.M.
## 1 6 L L
## 2 7 L P
## 3 8 P P
## 4 9 L P
## 5 10 P P
modifying data frames: adding to a data frame
data <- rbind(data, c(11, "L", "P")) # add row
# the column day is now a character
data["night"] <- "P" # add column 'night', fill with 'P'
library(dplyr)
data <- bind_rows(data, c(day = 12, A.M. = "P", P.M. = "P")) # add row with dplyr bind_rows
data
## day A.M. P.M. night
## 1 6 L L P
## 2 7 L P P
## 3 8 P P P
## 4 9 L P P
## 5 10 P P P
## 6 11 L P P
## 7 12 P P <NA>
modifying data frames: removing from a data frame
data.frame[rows, columns]
data <- data[-7, ] # remove row 7
data["PM"] <- data$P.M. # create column PM
data$P.M. <- NULL # remove column P.M.
data
## day A.M. night PM
## 1 6 L P L
## 2 7 L P P
## 3 8 P P P
## 4 9 L P P
## 5 10 P P P
## 6 11 L P P
Select parts of a data frame
## [1] "P" "P" "P" "P" "P" "P"
## night
## 1 P
## 2 P
## 3 P
## 4 P
## 5 P
## 6 P
Select parts of a data frame
data.frame$column, data.frame[rows, columns]
## A.M. night PM
## 1 L P L
## 3 P P P
## 5 P P P
## day A.M. night PM
## 1 6 L P L
## 2 7 L P P
## 4 9 L P P
## 6 11 L P P
## day A.M. night PM
## 1 6 L P L
## 2 7 L P P
## 3 9 L P P
## 4 11 L P P
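The selection commands themselves did not survive on these slides. The sketch below rebuilds the data frame by hand (with `day` as a number for simplicity, which is an assumption) and shows selections that would produce output of the same shape.

```r
data <- data.frame(
  day = c(6, 7, 8, 9, 10, 11),
  A.M. = c("L", "L", "P", "L", "P", "L"),
  night = "P",
  PM = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# data.frame$column returns the column as a vector
data$night

# data.frame["column"] returns a one-column data frame
data["night"]

# data.frame[rows, columns]: rows 1, 3 and 5, all columns except the first
odd_rows <- data[c(1, 3, 5), -1]
odd_rows

# A logical condition in the row position keeps the matching rows;
# the original row names (1, 2, 4, 6) are kept
lectures <- data[data$A.M. == "L", ]
lectures

# Resetting the row names renumbers the rows from 1
renumbered <- lectures
rownames(renumbered) <- NULL
renumbered
```

dplyr's `filter(data, A.M. == "L")` is a common alternative for the last step.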
modifying data frames
rownames and column names
## [1] "day" "AM" "night" "PM"
data <- data[, c(1, 2, 4, 3)]
colnames(data) <- c("date", "session1", "session2", "session3")
rownames(data)
## [1] "1" "2" "3" "4" "5" "6"
## date session1 session2 session3
## mon 6 L L P
## tue 7 L P P
## wed 8 P P P
## thu 9 L P P
## fri 10 P P P
## sat 11 L P P
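Some of the renaming steps are missing from the slide text. The following sketch, starting from a hand-built copy of the data frame, shows one way to arrive at the column and row names printed above; the exact commands used in the original are an assumption.

```r
data <- data.frame(
  day = c(6, 7, 8, 9, 10, 11),
  A.M. = c("L", "L", "P", "L", "P", "L"),
  night = "P",
  PM = c("L", "P", "P", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# Rename a single column by position
colnames(data)[2] <- "AM"
colnames(data)                  # "day" "AM" "night" "PM"

# Reorder the columns, then rename all of them at once
data <- data[, c(1, 2, 4, 3)]   # day, AM, PM, night
colnames(data) <- c("date", "session1", "session2", "session3")

# Replace the default row names ("1", "2", ...) with weekday labels
rownames(data) <- c("mon", "tue", "wed", "thu", "fri", "sat")
data

# Row names can then be used to select rows
data["wed", ]
```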
Continue with your exercise. Inspect and modify the files you loaded previously (stop at the visualisation).
joining data frames
library(tidyverse)
data$date <- as.numeric(data$date)
# join data sets
schedule <- left_join(data, speakers)
schedule
## date session1 session2 session3 faculty
## 1 6 L L P Sonya Dyhrman
## 2 7 L P P Mike Zody
## 3 7 L P P Sophie Shaw
## 4 8 P P P Mike Zody
## 5 9 L P P Antoine Limasset
## 6 9 L P P Camille Marchet
## 7 9 L P P Sergey Nurk
## 8 10 P P P Malachi Griffith
## 9 11 L P P Evan Eichler
## 10 11 L P P Guy Leonard
## 11 11 L P P Josephine Paris
joining data frames
## date session1 session2 session3 faculty
## 1 6 L L P Sonya Dyhrman
## 2 7 L P P Mike Zody
## 3 7 L P P Sophie Shaw
## 4 8 P P P Mike Zody
## 5 9 L P P Antoine Limasset
## 6 9 L P P Camille Marchet
## 7 9 L P P Sergey Nurk
## 8 10 P P P Malachi Griffith
## 9 11 L P P Evan Eichler
## 10 11 L P P Guy Leonard
## 11 11 L P P Josephine Paris
## 12 13 <NA> <NA> <NA> Brian Haas
## 13 17 <NA> <NA> <NA> Chris Wheat
## 14 16 <NA> <NA> <NA> Christa Schleper
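The two tables above differ in which unmatched rows are kept. The second call is not shown on the slide; a full join is one call that gives a table of that shape. The sketch below uses two small hypothetical data frames (`sessions` and `speakers`, with placeholder speaker names) to show the difference.

```r
library(dplyr)

sessions <- data.frame(
  date = c(6, 7, 8),
  session1 = c("L", "L", "P"),
  stringsAsFactors = FALSE
)
speakers <- data.frame(
  date = c(6, 7, 7, 13),
  faculty = c("Speaker A", "Speaker B", "Speaker C", "Speaker D"),
  stringsAsFactors = FALSE
)

# left_join keeps every row of the first data frame;
# a date with two speakers appears twice, a date with none gets NA
left <- left_join(sessions, speakers, by = "date")
left

# full_join also keeps rows that occur only in the second data frame;
# their session columns are filled with NA
full <- full_join(sessions, speakers, by = "date")
full
```

Naming the key with `by = "date"` avoids relying on dplyr guessing the shared column.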
ggplot2
reshape2: melt
## date variable value
## 1 6 session1 L
## 2 7 session1 L
## 3 8 session1 P
## 4 9 session1 L
## 5 10 session1 P
## 6 11 session1 L
tidyr: pivot_longer
## # A tibble: 6 x 3
## date name value
## <dbl> <chr> <chr>
## 1 6 session1 L
## 2 6 session2 L
## 3 6 session3 P
## 4 7 session1 L
## 5 7 session2 P
## 6 7 session3 P
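ggplot2 generally expects data in this long format, with one row per observation. As a sketch, assuming the schedule data frame from the earlier slides, the reshaping and a simple plot could look like this. The choice of `geom_tile()` is only an illustration and not necessarily the plot shown in the lab.

```r
library(tidyr)
library(ggplot2)

data <- data.frame(
  date = c(6, 7, 8, 9, 10, 11),
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = "P",
  stringsAsFactors = FALSE
)

# Gather all columns except date into name/value pairs
long <- pivot_longer(data, cols = -date)
head(long)

# One tile per date and session, coloured by session type
p <- ggplot(long, aes(x = date, y = name, fill = value)) +
  geom_tile()
p
```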
ggplot2
Additional resources:
https://www.r-graph-gallery.com/
https://ggplot2.tidyverse.org/
https://r4ds.had.co.nz/data-visualisation.html
See also: for loops and if-else statements
data["cumulative_working_hours"] <- NA
n <- 0
k <- dim(data)[1]
while (n < k) {
  n <- n + 1
  data$cumulative_working_hours[n] <- 9 * n
}
data
## date session1 session2 session3 cumulative_working_hours
## mon 6 L L P 9
## tue 7 L P P 18
## wed 8 P P P 27
## thu 9 L P P 36
## fri 10 P P P 45
## sat 11 L P P 54
## [1] 8
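The same column can be filled with a for loop or without any loop at all. This is a minimal sketch on a cut-down data frame containing only the `date` column; the if-else at the end is added only to illustrate the syntax.

```r
data <- data.frame(date = 6:11)
data$cumulative_working_hours <- NA

# for loop: n takes the values 1, 2, ..., number of rows
for (n in seq_len(nrow(data))) {
  data$cumulative_working_hours[n] <- 9 * n
}

# Vectorised alternative: arithmetic applies to the whole vector at once
vectorised <- 9 * seq_len(nrow(data))

# if-else: run one branch or the other depending on a condition
if (all(data$cumulative_working_hours == vectorised)) {
  message("Both approaches give the same result")
} else {
  message("The results differ")
}
```

In R the vectorised form is usually shorter and faster than an explicit loop.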
Three ways of doing the same thing
data_all_P <- filter_at(data, vars(starts_with("session")), all_vars(. == "P"))
select(data_all_P, date)
## date
## 1 8
## 2 10
## date
## 1 8
## 2 10
## date
## 1 8
## 2 10
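Only the `filter_at()` version is shown in the slide text. Two further approaches that return the same dates are sketched below, together with the newer dplyr spelling of the first one; these are possible alternatives, not necessarily the ones used in the lab. `if_all()` requires dplyr 1.0.4 or later.

```r
library(dplyr)

data <- data.frame(
  date = c(6, 7, 8, 9, 10, 11),
  session1 = c("L", "L", "P", "L", "P", "L"),
  session2 = c("L", "P", "P", "P", "P", "P"),
  session3 = "P",
  stringsAsFactors = FALSE
)

# Base R: combine one condition per column with &
all_p_base <- data[
  data$session1 == "P" & data$session2 == "P" & data$session3 == "P",
  "date",
  drop = FALSE
]

# Base R: count the "P" entries in each row
session_cols <- c("session1", "session2", "session3")
is_all_p <- rowSums(data[, session_cols] == "P") == length(session_cols)
all_p_rowsums <- data[is_all_p, "date", drop = FALSE]

# dplyr: filter() with if_all(), the current replacement for filter_at()
all_p_dplyr <- data %>%
  filter(if_all(starts_with("session"), ~ .x == "P")) %>%
  select(date)

all_p_dplyr
```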
- git clone URL: copies a git repo to your machine
- git add filename: stages the latest version of the file filename for the next commit
- git commit -m "My first commit message": creates a new “version” containing the files you’ve added