R?
RStudio
R analyses in polished RMarkdown files
R resources
R installed
RStudio by adding :8787 to your AWS URL
RStudio environment by locating the following features:
RStudio by clicking the top left icon - open a new R script.
RMarkdown
R code into descriptive files
R chunks into RMarkdown documents
RMarkdown Files
R code chunks in RMarkdown
# symbols
[]
R follows the normal priority of mathematical evaluation (PEMDAS)
Input code chunk and then output
## [1] 16
Input code chunk and then output
## [1] 16
<- operator.
R is case sensitive.
## [1] 6
## [1] 4
These do not work
## [1] 14
## [1] 144
## [1] 2.484907
log is a built-in function of R, so the object of the function needs to be put in parentheses
print command
## [1] 67
## [1] 69022864
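The points above (assignment with `<-`, case sensitivity, and calling a built-in function with parentheses) can be sketched as follows. The values are illustrative assumptions and are not the ones that produced the outputs shown in the slides.

```r
# Assign values with the <- operator (illustrative values, not the slide's own)
x <- 2 * 3
y <- x + 8

# R is case sensitive: 'x' exists, but 'X' is a different name
exists("x")
exists("X")   # FALSE unless something called X was defined earlier

# Built-in functions take their argument in parentheses
log_y <- log(y)

# print() displays a value explicitly
print(y^2)
```

Typing `X` at the console in a fresh session would give an "object not found" error, which is likely what a "these do not work" example demonstrates.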
c stands for concatenate
## [1] "I Love"
## [1] "Biostatistics"
## [1] "I Love" "Biostatistics"
R thinks in terms of vectors
R user to try to write scripts with that in mind
## [1] 2 3 4 2 1 2 4 5 10 8 9
## [1] 5 6 7 5 4 5 7 8 13 11 12
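A minimal sketch of what "thinking in vectors" means in practice. The values are copied from the first output line above. The operation (adding 3) is an assumption inferred from the second output line, and the name `n` is borrowed from the function list later in these notes.

```r
# Values copied from the first output line above
n <- c(2, 3, 4, 2, 1, 2, 4, 5, 10, 8, 9)

# Adding a scalar is applied to every element at once (assumed operation)
n_plus_3 <- n + 3
print(n_plus_3)

# The same vectorised style works for other arithmetic, with no loop needed
n_doubled <- n * 2
```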
x is now what is called a vector of character values (“I Love”).
factors, and we can redefine our character variables as factors.
## [1] I Love
## Levels: I Love
## chr "I Love"
## [1] "character"
R “sees” a variable using str() or class() functions.
int stands for integers
dbl stands for doubles, or real numbers
chr stands for character vectors, or strings
dttm stands for date-times (a date + a time)
lgl stands for logical, vectors that contain only TRUE or FALSE
fctr stands for factors, which R uses to represent categorical variables with fixed possible values
date stands for dates
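As a rough illustration of how these types show up at the console, the sketch below reuses the "I Love" example and checks a few of the types listed above. The object name `x_factor` is hypothetical.

```r
x <- c("I Love")           # a character vector of length one
class(x)                   # "character"

x_factor <- as.factor(x)   # redefine the character variable as a factor
class(x_factor)            # "factor"
levels(x_factor)

# Numbers are doubles unless an integer is requested with the L suffix
typeof(1)                  # "double"
typeof(1L)                 # "integer"

# Logical vectors hold TRUE, FALSE, or NA
is.logical(c(TRUE, FALSE, NA))
```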
FALSE
TRUE
NA which is ‘not available’.
R numbers are doubles by default.
NA
NaN which is ‘not a number’
Inf
-Inf
Many functions exist to operate on vectors.
mean(n)
median(n)
var(n)
log(n)
exp(n)
sqrt(n)
sum(n)
length(n)
sample(n, replace = T) #has an additional argument (replace=T)
R and it is easy enough to write your own functions if none already exist to do what you want to do.
seq
sample
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3
## [15] 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
## [29] 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1
## [43] 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5
## [57] 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
## [71] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3
## [85] 8.4 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7
## [99] 9.8 9.9 10.0
## [1] 10.0 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0 8.9 8.8 8.7
## [15] 8.6 8.5 8.4 8.3 8.2 8.1 8.0 7.9 7.8 7.7 7.6 7.5 7.4 7.3
## [29] 7.2 7.1 7.0 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6.0 5.9
## [43] 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0 4.9 4.8 4.7 4.6 4.5
## [57] 4.4 4.3 4.2 4.1 4.0 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1
## [71] 3.0 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2.0 1.9 1.8 1.7
## [85] 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3
## [99] 0.2 0.1 0.0
## [1] 100.00 98.01 96.04 94.09 92.16 90.25 88.36 86.49 84.64 82.81
## [11] 81.00 79.21 77.44 75.69 73.96 72.25 70.56 68.89 67.24 65.61
## [21] 64.00 62.41 60.84 59.29 57.76 56.25 54.76 53.29 51.84 50.41
## [31] 49.00 47.61 46.24 44.89 43.56 42.25 40.96 39.69 38.44 37.21
## [41] 36.00 34.81 33.64 32.49 31.36 30.25 29.16 28.09 27.04 26.01
## [51] 25.00 24.01 23.04 22.09 21.16 20.25 19.36 18.49 17.64 16.81
## [61] 16.00 15.21 14.44 13.69 12.96 12.25 11.56 10.89 10.24 9.61
## [71] 9.00 8.41 7.84 7.29 6.76 6.25 5.76 5.29 4.84 4.41
## [81] 4.00 3.61 3.24 2.89 2.56 2.25 1.96 1.69 1.44 1.21
## [91] 1.00 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
## [101] 0.00
## [1] 100.00 98.01 96.04 94.09 92.16 90.25 88.36 86.49 84.64 82.81
## [11] 81.00 79.21 77.44 75.69 73.96 72.25 70.56 68.89 67.24 65.61
## [21] 64.00 62.41 60.84 59.29 57.76 56.25 54.76 53.29 51.84 50.41
## [31] 49.00 47.61 46.24 44.89 43.56 42.25 40.96 39.69 38.44 37.21
## [41] 36.00 34.81 33.64 32.49 31.36 30.25 29.16 28.09 27.04 26.01
## [51] 25.00 24.01 23.04 22.09 21.16 20.25 19.36 18.49 17.64 16.81
## [61] 16.00 15.21 14.44 13.69 12.96 12.25 11.56 10.89 10.24 9.61
## [71] 9.00 8.41 7.84 7.29 6.76 6.25 5.76 5.29 4.84 4.41
## [81] 4.00 3.61 3.24 2.89 2.56 2.25 1.96 1.69 1.44 1.21
## [91] 1.00 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
## [101] 0.00
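The four outputs above look like an ascending sequence, the same sequence in descending order, and its element-wise square computed twice. A sketch that would produce vectors of that shape is below. The object names are taken from the plotting code later in these notes, but using `rev()` and the two ways of squaring are assumptions about how the originals were made.

```r
seq_1 <- seq(0, 10, by = 0.1)     # 0.0, 0.1, ..., 10.0
seq_2 <- rev(seq_1)               # assumption: descending copy made by reversing
seq_square <- seq_2^2             # element-wise square
seq_square_new <- seq_2 * seq_2   # same values via element-wise multiplication

length(seq_1)
all.equal(seq_square, seq_square_new)

# Summary functions collapse a whole vector to a single number
mean(seq_1)
```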
x <- rnorm(n = 10000, mean = 0, sd = 10)
y <- sample(1:10000, 10000, replace = T)
xy <- cbind(x, y)
plot(xy)
dnorm() generates the probability density, which can be plotted using the curve() function.
add=TRUE
x <- rnorm(1000, 0, 100)
hist(x, xlim = c(-500, 500))
curve(50000 * dnorm(x, 0, 100), xlim = c(-500, 500), add = TRUE,
col = "Red")
hist function.
plot function (as well as a number of more sophisticated plotting functions).
high level plotting function, which sets the stage
Low level plotting functions will tweak the plots and make them beautiful
seq_1 <- seq(0, 10, by = 0.1)
plot(seq_1, xlab = "space", ylab = "function of space", type = "p",
col = "red")
par(mfrow = c(2, 2))
plot(seq_1, xlab = "time", ylab = "p in population 1", type = "p",
col = "red")
plot(seq_2, xlab = "time", ylab = "p in population 2", type = "p",
col = "green")
plot(seq_square, xlab = "time", ylab = "p2 in population 2",
type = "p", col = "blue")
plot(seq_square_new, xlab = "time", ylab = "p in population 1",
type = "l", col = "yellow")
Complete Exercises 1.3-1.8
R you can generate your own random data set drawn from nearly any distribution very easily.
mydata <- data.frame(habitat, temp, elevation)
row.names(mydata) <- c("Reedy Lake", "Pearcadale", "Warneet",
"Cranbourne", "Lysterfield", "Red Hill", "Devilbend", "Olinda")
head(mydata)
##             habitat temp elevation
## Reedy Lake    mixed  3.4       0.0
## Pearcadale      wet  3.4       9.2
## Warneet         wet  8.4       3.8
## Cranbourne      wet  3.0       5.0
## Lysterfield     dry  5.6       5.6
## Red Hill        dry  8.1       4.1
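For readers who want to rebuild something like this table, here is a sketch using only the six rows visible in the `head()` output. The values for the remaining two sites are not shown, so they are left out, and storing `habitat` as a factor is an assumption.

```r
# Only the six sites visible in the head() output are used here
habitat   <- factor(c("mixed", "wet", "wet", "wet", "dry", "dry"))
temp      <- c(3.4, 3.4, 8.4, 3.0, 5.6, 8.1)
elevation <- c(0.0, 9.2, 3.8, 5.0, 5.6, 4.1)

mydata <- data.frame(habitat, temp, elevation)
row.names(mydata) <- c("Reedy Lake", "Pearcadale", "Warneet",
                       "Cranbourne", "Lysterfield", "Red Hill")

str(mydata)                  # how R "sees" each column
mydata["Warneet", "temp"]    # index by row name and column name

# Mean temperature for the wet sites only
mean_temp_wet <- mean(mydata$temp[mydata$habitat == "wet"])
```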
R is being able to import data from an external source
R.
R look in the CWD, whereas a full path can also be used
write.csv(YourFile, "yourfile.csv", quote = F, row.names = T)
write.table(YourFile, "yourfile.txt", quote = F, row.names = T,
sep = "\t")
Complete Exercises 1.9-1.10
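A small round trip may make the export and import steps more concrete. This is only a sketch: `YourFile` here is a hypothetical two-row data frame, and the file is written to a temporary directory through a full path rather than to the current working directory. Note that `write.csv()` always uses a comma separator and ignores a `sep` argument with a warning, so only `write.table()` takes `sep`.

```r
# Hypothetical stand-in for the slide's YourFile object
YourFile <- data.frame(temp = c(3.4, 8.4), elevation = c(0.0, 3.8),
                       row.names = c("Reedy Lake", "Warneet"))

# A full path is used here instead of relying on the current working directory
csv_path <- file.path(tempdir(), "yourfile.csv")
write.csv(YourFile, csv_path, quote = FALSE, row.names = TRUE)

# row.names = 1 turns the first column back into row names on import
read_back <- read.csv(csv_path, row.names = 1)

getwd()   # where a bare file name such as "yourfile.csv" would be looked up
```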
knit button to render markdown
> "You know the greatest danger facing us is ourselves, an irrational fear of the unknown.
But there’s no such thing as the unknown — only things temporarily hidden, temporarily not understood."
>
> --- Captain James T. Kirk
“You know the greatest danger facing us is ourselves, an irrational fear of the unknown. But there’s no such thing as the unknown — only things temporarily hidden, temporarily not understood.”
— Captain James T. Kirk
\[ \large a^x, \sqrt[n]{x}, \vec{\jmath}, \tilde{\imath}\]
\[ \large \alpha, \beta, \gamma\]
\[ \large\approx, \neq, \nsim \]
\[\large \partial, \mathbb{R}, \flat\]
Binomial sampling equation
\[\large f(k) = {n \choose k} p^{k} (1-p)^{n-k}\]
Poisson Sampling Equation
\[\large Pr(Y=r) = \frac{e^{-\mu}\mu^r}{r!}\]
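Both sampling equations can be checked numerically against R's built-in density functions. The sketch below uses arbitrary illustrative values for n, k, p, μ and r. `dbinom()` and `dpois()` are standard functions in the stats package.

```r
# Binomial: f(k) = choose(n, k) * p^k * (1 - p)^(n - k); values are illustrative
n <- 10; k <- 3; p <- 0.5
binom_manual  <- choose(n, k) * p^k * (1 - p)^(n - k)
binom_builtin <- dbinom(k, size = n, prob = p)

# Poisson: Pr(Y = r) = exp(-mu) * mu^r / r!
mu <- 2; r <- 4
pois_manual  <- exp(-mu) * mu^r / factorial(r)
pois_builtin <- dpois(r, lambda = mu)

all.equal(binom_manual, binom_builtin)
all.equal(pois_manual, pois_builtin)
```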
\[\iint xy^2\,dx\,dy =\frac{1}{6}x^2y^3\]
$$ \begin{matrix}
-2 & 1 & 0 & 0 & \cdots & 0 \\
1 & -2 & 1 & 0 & \cdots & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 \\
0 & 0 & 1 & -2 & \ddots & \vdots \\
\vdots & \vdots & \vdots & \ddots & \ddots & 1 \\
0 & 0 & 0 & \cdots & 1 & -2
\end{matrix} $$
\[ \begin{matrix} -2 & 1 & 0 & 0 & \cdots & 0 \\ 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ 0 & 0 & 1 & -2 & \ddots & \vdots \\ \vdots & \vdots & \vdots & \ddots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & 1 & -2 \end{matrix} \]
This equation, \(y=\frac{1}{2}\), is included inline
Whereas this equation \[y=\frac{1}{2}\] is put on a separate line
Say you perform an experiment on two different strains of stickleback fish, one from an ocean population (RS) and one from a freshwater lake (BP) by making them microbe free. Microbes in the gut are known to interact with the gut epithelium in ways that lead to a proper maturation of the immune system.
You carry out an experiment by treating multiple fish from each strain so that some of them have a conventional microbiota, and some are inoculated with only one bacterial species. You then measure the levels of gene expression in the stickleback gut using RNA-seq. You suspect that the sex of the fish might be important so you track it too.
Hadley Wickham and others have written R packages to modify data
These packages do many of the same things as base functions in R
However, they are specifically designed to do them faster and more easily
Wickham also wrote the package ggplot2 for elegant graphics creation
GG stands for ‘Grammar of Graphics’
dplyr for vectors
filter().
arrange().
select().
mutate().
summarise().
filter(), arrange() & select()
mutate() & transmute()
mutate(): This function will add a new variable that is a function of other variable(s)
transmute(): This function will replace the old variable with the new variable
group_by( ) & summarize( )
This first function allows you to aggregate data by values of categorical variables (factors)
Once you have done this aggregation, you can then calculate values (in this case the mean) of other variables split by the new aggregated levels of the categorical variable
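The dplyr verbs named above can be chained together as in the sketch below. It assumes the dplyr package is installed and uses the built-in `mtcars` data set purely as a stand-in, since the stickleback data are not shown here. The `kpl` column and the 0.425 conversion factor are illustrative.

```r
library(dplyr)

# mtcars ships with base R; it is used here only as a stand-in data set
cars_kpl <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%   # keep rows matching a condition
  arrange(desc(mpg)) %>%         # reorder rows, highest mpg first
  select(mpg, cyl, wt) %>%       # keep columns by name
  mutate(kpl = mpg * 0.425)      # add a new variable alongside the old ones

# transmute() keeps only the variables it creates
kpl_only <- transmute(mtcars, kpl = mpg * 0.425)

# group_by() + summarise(): mean mpg within each level of cyl
mean_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))
```

`summarise()` and `summarize()` are synonyms in dplyr, so either spelling works.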
“Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space”
— Edward Tufte
Draw graphical elements clearly, minimizing clutter
Represent magnitudes honestly and accurately
“Graphical excellence begins with telling the truth about the data” – Tufte 1983
geom_bar function
Now try this…
and this…
and finally this…
geom_histogram and geom_freqpoly function
With this function you can make a histogram
This allows you to make a frequency polygon
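The code for these slides did not survive, so the following is only a guess at the kind of calls they showed. It assumes ggplot2 is installed and uses its bundled `mpg` data set, which the later examples in these notes also use. The choice of variables and the `binwidth` value are illustrative.

```r
library(ggplot2)

# mpg ships with ggplot2; the variable choices below are illustrative
p_bar  <- ggplot(data = mpg) + geom_bar(mapping = aes(x = class))
p_hist <- ggplot(data = mpg) + geom_histogram(mapping = aes(x = hwy), binwidth = 2)
p_poly <- ggplot(data = mpg) + geom_freqpoly(mapping = aes(x = hwy), binwidth = 2)

print(p_bar)   # printing a ggplot object draws it
```

A histogram and a frequency polygon bin the data in the same way. The polygon joins the bin counts with a line instead of drawing bars.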
geom_boxplot function
Boxplots are very useful for visualizing data
ggplot(data = mpg, mapping = aes(x = reorder(class, hwy, FUN = median),
y = hwy)) + geom_boxplot() + coord_flip()
geom_point & geom_smooth functions
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy), method = "loess")
ggplot(data = mpg, aes(displ, hwy)) + geom_point(aes(color = class)) +
geom_smooth(se = FALSE, method = "loess") + labs(title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov")
---
title: "Flexdashboard Options"
output:
  flexdashboard::flex_dashboard:
    vertical_layout: fill    # or scroll
    orientation: rows        # or columns
---