`R` is…
`R`?
`Rstudio`
`R` analyses in polished `RMarkdown` files
`R` resources
`R` installed
Access `RStudio` by adding `:8787` to your AMI url
Get to know the `Rstudio` environment by locating the following features:
In `Rstudio`, open a new R script by clicking the top left icon.
`RMarkdown`
Put `R code` into descriptive files to keep your life organized
Insert `R chunks` into `Rmarkdown` documents
`RMarkdown` Files
`R` `code chunks` in `RMarkdown`
`#` symbols
`[]`
`R` follows the normal order of mathematical evaluation (PEMDAS)
Input code chunk and then output
## [1] 16
Input code chunk and then output
## [1] 16
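As a minimal sketch of the evaluation order described above, the chunk below compares expressions with and without parentheses. The expressions and object names are hypothetical; they are not the ones used in the original chunks.

```r
# Hypothetical expressions chosen to illustrate operator precedence
a <- 2 + 3 * 4        # multiplication before addition: 2 + 12
b <- (2 + 3) * 4      # parentheses first: 5 * 4
c_val <- 2 * 2^3      # exponent before multiplication: 2 * 8

print(a)      # 14
print(b)      # 20
print(c_val)  # 16
```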
Assign values to objects with the `<-` operator.
`R` is case sensitive.
## [1] 6
## [1] 4
These do not work
## [1] 14
## [1] 144
## [1] 2.484907
`log` is a built-in function of `R`, and therefore the object of the function needs to be put in parentheses
`print` command
## [1] 67
## [1] 69022864
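The outputs 14, 144 and 2.484907 shown earlier are consistent with an object holding the value 12. The sketch below assumes that value and the object name `x` (both are guesses, not taken from the original chunk) to show assignment, arithmetic on an object, and a function call with parentheses.

```r
# Assumed object name and value; the original chunk is not shown
x <- 12

x + 2          # 14
x^2            # 144
log(x)         # natural logarithm, about 2.484907
print(x)       # print() displays the value of an object

# R is case sensitive: `x` and `X` are different names
```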
`c` stands for concatenate
## [1] "I Love"
## [1] "Biostatistics"
## [1] "I Love" "Biostatistics"
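A small sketch of how `c()` could produce the output above. The object names `x`, `y` and `z` are assumptions for illustration.

```r
x <- "I Love"
y <- "Biostatistics"
z <- c(x, y)   # concatenate two character values into one vector

print(z)
length(z)      # 2
```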
`R` thinks in terms of vectors, so it helps an `R` user to write scripts with that in mind
## [1] 2 3 4 2 1 2 4 5 10 8 9
## [1] 5 6 7 5 4 5 7 8 13 11 12
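The two outputs above differ by 3 in every position, which is what a vectorized addition would give. The sketch below assumes the vector is named `n` and that the operation was `n + 3`; neither is confirmed by the original chunk.

```r
# Values copied from the first output above; the name `n` is assumed
n <- c(2, 3, 4, 2, 1, 2, 4, 5, 10, 8, 9)

# Arithmetic is applied element by element, no loop needed
n_plus_3 <- n + 3
print(n_plus_3)
```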
`x` is now what is called a vector of character values (“I Love”).
Character values can also be treated as `factors`, and we can redefine our character variables as factors.
## [1] I Love
## Levels: I Love
Check how `R` “sees” a variable using the `str()` or `class()` functions.
## chr "I Love"
## [1] "character"
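A minimal sketch of converting a character value to a factor and inspecting both. The name `x_factor` is hypothetical.

```r
x <- "I Love"
str(x)                     # compact description of the object
class(x)                   # "character"

x_factor <- as.factor(x)   # redefine the character variable as a factor
class(x_factor)            # "factor"
levels(x_factor)           # "I Love"
```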
`int` stands for integers
`dbl` stands for doubles, or real numbers
`chr` stands for character vectors, or strings
`dttm` stands for date-times (a date + a time)
`lgl` stands for logical, vectors that contain only TRUE or FALSE
`fctr` stands for factors, which R uses to represent categorical variables with fixed possible values
`date` stands for dates
`FALSE`, `TRUE`, `NA` which is ‘not available’.
`R` numbers are doubles by default.
`NA`, `NaN` which is ‘not a number’, `Inf`, `-Inf`
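The special values above can be explored with a few base functions. This is an illustrative sketch; the object names are hypothetical.

```r
typeof(1)     # "double": numbers are doubles by default
typeof(1L)    # "integer": the L suffix requests an integer

missing_value <- NA
not_a_number  <- 0 / 0    # NaN
pos_inf       <- 1 / 0    # Inf
neg_inf       <- -1 / 0   # -Inf

is.na(missing_value)      # TRUE
is.nan(not_a_number)      # TRUE

# Missing values propagate unless they are removed explicitly
mean_with_na    <- mean(c(1, NA, 3))                 # NA
mean_without_na <- mean(c(1, NA, 3), na.rm = TRUE)   # 2
```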
Many functions exist to operate on vectors.
mean(n)
median(n)
var(n)
log(n)
exp(n)
sqrt(n)
sum(n)
length(n)
sample(n, replace = TRUE) # has an additional argument (replace = TRUE)
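As a rough illustration, the functions listed above can be applied to a small numeric vector. The vector `n` below reuses the values printed earlier in these notes; the name `resampled` is hypothetical.

```r
n <- c(2, 3, 4, 2, 1, 2, 4, 5, 10, 8, 9)

sum(n)       # 50
length(n)    # 11
mean(n)      # sum(n) / length(n)
median(n)    # 4
sqrt(n)      # element-wise square root, same length as n

# Resample n with replacement: same length, values drawn from n
resampled <- sample(n, replace = TRUE)
```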
Many more functions are built into `R`, and it is easy enough to write your own functions if none already exist to do what you want to do.
`seq`
`sample`
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3
## [15] 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
## [29] 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1
## [43] 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5
## [57] 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
## [71] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3
## [85] 8.4 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7
## [99] 9.8 9.9 10.0
## [1] 10.0 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0 8.9 8.8 8.7
## [15] 8.6 8.5 8.4 8.3 8.2 8.1 8.0 7.9 7.8 7.7 7.6 7.5 7.4 7.3
## [29] 7.2 7.1 7.0 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6.0 5.9
## [43] 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0 4.9 4.8 4.7 4.6 4.5
## [57] 4.4 4.3 4.2 4.1 4.0 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1
## [71] 3.0 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2.0 1.9 1.8 1.7
## [85] 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3
## [99] 0.2 0.1 0.0
## [1] 100.00 98.01 96.04 94.09 92.16 90.25 88.36 86.49 84.64 82.81
## [11] 81.00 79.21 77.44 75.69 73.96 72.25 70.56 68.89 67.24 65.61
## [21] 64.00 62.41 60.84 59.29 57.76 56.25 54.76 53.29 51.84 50.41
## [31] 49.00 47.61 46.24 44.89 43.56 42.25 40.96 39.69 38.44 37.21
## [41] 36.00 34.81 33.64 32.49 31.36 30.25 29.16 28.09 27.04 26.01
## [51] 25.00 24.01 23.04 22.09 21.16 20.25 19.36 18.49 17.64 16.81
## [61] 16.00 15.21 14.44 13.69 12.96 12.25 11.56 10.89 10.24 9.61
## [71] 9.00 8.41 7.84 7.29 6.76 6.25 5.76 5.29 4.84 4.41
## [81] 4.00 3.61 3.24 2.89 2.56 2.25 1.96 1.69 1.44 1.21
## [91] 1.00 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
## [101] 0.00
## [1] 100.00 98.01 96.04 94.09 92.16 90.25 88.36 86.49 84.64 82.81
## [11] 81.00 79.21 77.44 75.69 73.96 72.25 70.56 68.89 67.24 65.61
## [21] 64.00 62.41 60.84 59.29 57.76 56.25 54.76 53.29 51.84 50.41
## [31] 49.00 47.61 46.24 44.89 43.56 42.25 40.96 39.69 38.44 37.21
## [41] 36.00 34.81 33.64 32.49 31.36 30.25 29.16 28.09 27.04 26.01
## [51] 25.00 24.01 23.04 22.09 21.16 20.25 19.36 18.49 17.64 16.81
## [61] 16.00 15.21 14.44 13.69 12.96 12.25 11.56 10.89 10.24 9.61
## [71] 9.00 8.41 7.84 7.29 6.76 6.25 5.76 5.29 4.84 4.41
## [81] 4.00 3.61 3.24 2.89 2.56 2.25 1.96 1.69 1.44 1.21
## [91] 1.00 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
## [101] 0.00
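The chunk that produced the four outputs above is not shown. One way they could arise is sketched below: the definitions of `seq_2`, `seq_square` and `seq_square_new` are assumptions inferred from the printed values, not the original code.

```r
seq_1 <- seq(0, 10, by = 0.1)     # 0.0, 0.1, ..., 10.0
seq_2 <- rev(seq_1)               # assumed: the same sequence reversed

# Two equivalent ways to square every element (assumed)
seq_square     <- seq_2^2
seq_square_new <- seq_2 * seq_2

length(seq_1)                     # 101
all.equal(seq_square, seq_square_new)
```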
Complete Exercises 1.3-1.6
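# Draw 10,000 values from a normal distribution (mean 0, sd 10)
x <- rnorm(n = 10000, mean = 0, sd = 10)
# Sample 10,000 integers from 1:10000 with replacement
y <- sample(1:10000, 10000, replace = TRUE)
# Bind the two vectors as columns of a matrix and plot x against y
xy <- cbind(x, y)
plot(xy)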
`dnorm()` generates the probability density, which can be plotted using the `curve()` function.
Use `add=TRUE` to draw the curve on top of an existing plot.
x <- rnorm(1000, 0, 100)
hist(x, xlim = c(-500, 500))
curve(50000 * dnorm(x, 0, 100), xlim = c(-500, 500), add = TRUE,
col = "Red")
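The factor 50000 in the `curve()` call rescales the density to the histogram's count axis. It would equal the number of observations times the bin width if the bins are 50 units wide, which is an assumption about the default breaks. A sketch of computing that factor instead of hard-coding it:

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
x <- rnorm(1000, 0, 100)

h <- hist(x, plot = FALSE)         # compute the bins without drawing
bin_width <- diff(h$breaks)[1]     # default breaks are equally spaced
scale_factor <- length(x) * bin_width

# scale_factor * dnorm(...) is then on the same scale as the bar counts
```

Alternatively, `hist(x, freq = FALSE)` plots densities directly, so the curve needs no rescaling.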
`R` `hist` function.
`plot` function (as well as a number of more sophisticated plotting functions).
`high level` plotting function, which sets the stage
`Low level` plotting functions will tweak the plots and make them beautiful
seq_1 <- seq(0, 10, by = 0.1)
plot(seq_1, xlab = "space", ylab = "function of space", type = "p",
col = "red")
par(mfrow = c(2, 2))
plot(seq_1, xlab = "time", ylab = "p in population 1", type = "p",
col = "red")
plot(seq_2, xlab = "time", ylab = "p in population 2", type = "p",
col = "green")
plot(seq_square, xlab = "time", ylab = "p2 in population 2",
type = "p", col = "blue")
plot(seq_square_new, xlab = "time", ylab = "p in population 1",
type = "l", col = "yellow")
Complete Exercises 1.7-1.8
In `R` you can generate your own random data set drawn from nearly any distribution very easily.
mydata <- data.frame(habitat, temp, elevation)
row.names(mydata) <- c("Reedy Lake", "Pearcadale", "Warneet",
"Cranbourne", "Lysterfield", "Red Hill", "Devilbend", "Olinda")
head(mydata)
## habitat temp elevation
## Reedy Lake mixed 3.4 0.0
## Pearcadale wet 3.4 9.2
## Warneet wet 8.4 3.8
## Cranbourne wet 3.0 5.0
## Lysterfield dry 5.6 5.6
## Red Hill dry 8.1 4.1
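The vectors `habitat`, `temp` and `elevation` are not defined in the code shown. The sketch below rebuilds them from the six rows printed by `head(mydata)`; the two remaining sites (Devilbend and Olinda) are left out because their values are not shown, and the helper name `site` is hypothetical.

```r
# Values taken from the printed output above (first six sites only)
site      <- c("Reedy Lake", "Pearcadale", "Warneet",
               "Cranbourne", "Lysterfield", "Red Hill")
habitat   <- factor(c("mixed", "wet", "wet", "wet", "dry", "dry"))
temp      <- c(3.4, 3.4, 8.4, 3.0, 5.6, 8.1)
elevation <- c(0.0, 9.2, 3.8, 5.0, 5.6, 4.1)

mydata <- data.frame(habitat, temp, elevation)
row.names(mydata) <- site

str(mydata)                  # one factor and two numeric columns
mydata["Warneet", "temp"]    # 8.4
```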
A key part of using `R` is being able to import data from an external source
By default `R` will look in the PWD (present working directory), whereas a full path can also be used
write.csv(YourFile, "yourfile.csv", quote = FALSE, row.names = TRUE)
write.table(YourFile, "yourfile.txt", quote = FALSE, row.names = TRUE,
sep = "\t")
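A minimal round-trip sketch: write a small data frame to a CSV file and read it back. The data frame contents and the use of a temporary directory are assumptions for illustration; `YourFile` stands in for whatever object is being exported.

```r
# Hypothetical data frame standing in for YourFile
YourFile <- data.frame(temp = c(3.4, 8.4), elevation = c(0.0, 3.8),
                       row.names = c("Reedy Lake", "Warneet"))

# A full path can be used instead of relying on the working directory
csv_path <- file.path(tempdir(), "yourfile.csv")
write.csv(YourFile, csv_path, quote = FALSE, row.names = TRUE)

# Read the file back, using the first column as row names
reloaded <- read.csv(csv_path, row.names = 1)
all.equal(YourFile, reloaded)
```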
Add `:8787` to your AMI url to access RStudio again
`git pull`s today, so as not to overwrite your work!
`mv` command
`R` Functions
`R` can guess what you mean because of order…
## [1] 17.600298 1.043059 -2.164877 5.635400 4.439286 -9.417456
## [7] -26.140246 3.868278 -7.574435 12.027035
You can also confuse `R` and get something you really didn’t want…
## [1] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
`R` Functions
## [1] 6.869129 10.663631 5.367006 19.060287 10.631596 13.703436 5.277918
## [8] 4.030967 11.677516 7.926794
## [1] 6.869129 10.663631 5.367006 19.060287 10.631596 13.703436 5.277918
## [8] 4.030967 11.677516 7.926794
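The calls behind the outputs above are not shown. The sketch below uses `rnorm()` as an assumed example of how argument order matters: unnamed arguments are matched by position, named arguments can come in any order, and swapping positional arguments silently changes the meaning.

```r
# Positional matching: n = 10, mean = 0, sd = 10
set.seed(42)                      # arbitrary seed so the two calls are comparable
by_position <- rnorm(10, 0, 10)

# Named arguments can be given in any order
set.seed(42)
by_name <- rnorm(sd = 10, n = 10, mean = 0)
identical(by_position, by_name)   # TRUE

# Arguments in the wrong positions: mean = 1000, sd = 0
swapped <- rnorm(10, 1000, 0)     # ten copies of 1000
```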
`R`, that allows you to analyze just a subset of the data.
Complete Exercises 1.9-1.10
`git pull`
`knit` button to render markdown
> "You know the greatest danger facing us is ourselves, an irrational fear of the unknown.
But there’s no such thing as the unknown — only things temporarily hidden, temporarily not understood."
>
> --- Captain James T. Kirk
“You know the greatest danger facing us is ourselves, an irrational fear of the unknown. But there’s no such thing as the unknown — only things temporarily hidden, temporarily not understood.”
— Captain James T. Kirk
- list_element
    - sub_list_element #double tab to indent
    - sub_list_element #double tab to indent
    - sub_list_element #double tab to indent
- list_element
    - sub_list_element #double tab to indent
# note the space after each dash- this is important!
\[ \large a^x, \sqrt[n]{x}, \vec{\jmath}, \tilde{\imath}\]
\[ \large \alpha, \beta, \gamma\]
\[ \large\approx, \neq, \nsim \]
\[\large \partial, \mathbb{R}, \flat\]
Binomial sampling equation
\[\large f(k) = {n \choose k} p^{k} (1-p)^{n-k}\]
Poisson Sampling Equation
\[\large Pr(Y=r) = \frac{e^{-\mu}\mu^r}{r!}\]
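As a quick check of the two sampling equations above, the sketch below evaluates them by hand and compares the results with the built-in `dbinom()` and `dpois()` functions. The parameter values are arbitrary choices for illustration.

```r
# Binomial: f(k) = choose(n, k) * p^k * (1 - p)^(n - k)
n <- 10; k <- 3; p <- 0.5                       # arbitrary example values
binom_manual <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(binom_manual, dbinom(k, size = n, prob = p))

# Poisson: Pr(Y = r) = exp(-mu) * mu^r / r!
mu <- 2; r <- 3                                 # arbitrary example values
pois_manual <- exp(-mu) * mu^r / factorial(r)
all.equal(pois_manual, dpois(r, lambda = mu))
```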
\[\iint xy^2\,dx\,dy =\frac{1}{6}x^2y^3\]
$$ \begin{matrix}
-2 & 1 & 0 & 0 & \cdots & 0 \\
1 & -2 & 1 & 0 & \cdots & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 \\
0 & 0 & 1 & -2 & \ddots & \vdots \\
\vdots & \vdots & \vdots & \ddots & \ddots & 1 \\
0 & 0 & 0 & \cdots & 1 & -2
\end{matrix} $$
\[ \begin{matrix} -2 & 1 & 0 & 0 & \cdots & 0 \\ 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ 0 & 0 & 1 & -2 & \ddots & \vdots \\ \vdots & \vdots & \vdots & \ddots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & 1 & -2 \end{matrix} \]
This equation, \(y=\frac{1}{2}\), is included inline
Whereas this equation \[y=\frac{1}{2}\] is put on a separate line
Say you perform an experiment on two different strains of stickleback fish, one from an ocean population (RS) and one from a freshwater lake (BP), by making them microbe-free. Microbes in the gut are known to interact with the gut epithelium in ways that lead to proper maturation of the immune system.
You carry out an experiment by treating multiple fish from each strain so that some of them have a conventional microbiota, and some are inoculated with only one bacterial species. You then measure the levels of gene expression in the stickleback gut using RNA-seq. You suspect that the sex of the fish might be important so you track it too.
`R`?
Hadley Wickham and others have written `R` packages to modify data
These packages do many of the same things as base functions in `R`
However, they are specifically designed to do them faster and more easily
Wickham also wrote the package `ggplot2` for elegant graphics creation
GG stands for ‘Grammar of Graphics’
`dplyr`
for vectors
`filter()`
`arrange()`
`select()`
`mutate()`
`summarise()`
`filter()`, `arrange()` & `select()`
`mutate()` & `transmute()`
`mutate()` will add a new variable that is a function of other variable(s)
`transmute()` will keep only the new variable, dropping the old ones
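A small sketch contrasting the two functions, assuming the `dplyr` package is installed. It uses the built-in `mtcars` data rather than the course data set, and the names `small_cars` and `wt_kg` are hypothetical.

```r
library(dplyr)

# First three rows of two columns from the built-in mtcars data
small_cars <- head(mtcars[, c("mpg", "wt")], 3)

# mutate() adds the new column and keeps the existing ones
added <- mutate(small_cars, wt_kg = wt * 1000 * 0.4536)

# transmute() keeps only the newly created column
only_new <- transmute(small_cars, wt_kg = wt * 1000 * 0.4536)

names(added)      # "mpg" "wt" "wt_kg"
names(only_new)   # "wt_kg"
```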
`group_by()` & `summarize()`
`group_by()` allows you to aggregate data by values of categorical variables (factors)
Once you have done this aggregation, you can then use `summarize()` to calculate values (in this case the mean) of other variables split by the new aggregated levels of the categorical variable
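A minimal sketch of the grouping and summarising steps, assuming `dplyr` is installed. It uses the built-in `mtcars` data, with `cyl` standing in for a categorical variable; the result name `mean_mpg_by_cyl` is hypothetical.

```r
library(dplyr)

# Group cars by number of cylinders, then take the mean mpg per group
mean_mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(mean_mpg_by_cyl)   # one row per value of cyl
```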
“Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space”
— Edward Tufte
Draw graphical elements clearly, minimizing clutter
Represent magnitudes honestly and accurately
“Graphical excellence begins with telling the truth about the data” – Tufte 1983
`geom_bar` function
Now try this…
`geom_bar` function
and this…
`geom_bar` function
and finally this…
`geom_histogram` and `geom_freqpoly` function
With this function you can make a histogram
`geom_histogram` and `geom_freqpoly` function
This allows you to make a frequency polygon
`geom_boxplot` function
Boxplots are very useful for visualizing data
`geom_boxplot` function
ggplot(data = mpg, mapping = aes(x = reorder(class, hwy, FUN = median),
y = hwy)) + geom_boxplot() + coord_flip()
`geom_point` & `geom_smooth` functions
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy), method = "loess")
ggplot(data = mpg, aes(displ, hwy)) + geom_point(aes(color = class)) +
geom_smooth(se = FALSE, method = "loess") + labs(title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov")
---
title: "Flexdashboard Options"
output:
  flexdashboard::flex_dashboard:
    vertical_layout: fill   # 'fill' or 'scroll'
    # or
    orientation: rows
---