To start this tutorial, you first need to download and install R (https://cran.r-project.org/bin/windows/base/ for Windows) and RStudio (https://posit.co/downloads/).
Prerequisites:
Knowledge of statistical theory: you should have a solid understanding of statistics
Understanding of the various types of graphs used to represent data
Some prior programming experience, in any language
TOLERANCE TO FAILURE!
# Note: everything after the hash symbol (#) on a line is a comment in R,
# and, in RStudio, sections can be created by ending a comment with four or more dashes:
# Title of the section ----
# 3 * 2 # Elements
What is an object in R?
x = 1 # Creating object x
y <- 2
z = 3
w = NA # NA marks a missing value
Functions in R have the following syntax:
function_name(argument1, argument2, argument3, ...)
What is a function, and what is an argument?
# Play with the function mean
a = mean(c(x, y, z, w), na.rm = TRUE) # na.rm = TRUE removes the missing value (NA) before computing the mean
x + y # Sum x plus y
x * y
y
help(mean)
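To make the idea of functions and arguments concrete, here is a minimal sketch of a user-defined function. The name rescale01() is hypothetical, invented only for this example: it takes a required argument x and an optional argument na.rm whose default value is TRUE.

```r
# Hypothetical function, for illustration only:
# rescales the values in x so that they lie between 0 and 1.
# 'x' is required; 'na.rm' has a default (TRUE), so it can be omitted.
rescale01 = function(x, na.rm = TRUE) {
  rng = range(x, na.rm = na.rm)      # minimum and maximum of x
  (x - rng[1]) / (rng[2] - rng[1])   # the last expression is the returned value
}

rescale01(c(1, 2, 3, NA))                    # uses the default na.rm = TRUE
rescale01(x = c(10, 20, 30), na.rm = FALSE)  # arguments passed by name
```

Arguments can be passed by position (as in the first call) or by name (as in the second).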
Generating sequences of elements (vectors)
b = c(1:5,NA)
mean(b, na.rm = TRUE)
Try other functions: sum(), median(), summary() and read.csv()
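A short sketch of these functions applied to the same vector b. read.csv() normally reads a file; here its text argument is used instead so that the example runs without a file (the two rows of data are made up).

```r
b = c(1:5, NA)
sum(b)                   # NA: a missing value propagates
sum(b, na.rm = TRUE)     # 15
median(b, na.rm = TRUE)  # 3
summary(b)               # minimum, quartiles, mean, maximum and number of NA's

# read.csv() usually reads a file; 'text' reads CSV lines from a string instead
tab = read.csv(text = "Sex,Height\nF,160\nM,175")
tab
```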
Basic data types (Numeric, Integer, Complex, Logical, Character)
x = 10.5 # Assign a double value
x # print the value of x in the console
How do I know the type of element stored in the object x?
class(x) # print the class of the variable or object x
typeof(x) # print the type of element of the variable or object x
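y = 5 # a number without the L suffix is stored as a double
y = as.integer(5) # convert the double to an integer
y = round(5.7) # rounds to 6, but the result is still a double
Ask R what class of object y is and what type of element is stored inside it.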
class(y)
typeof(y)
Other ways to generate integers
y = 4L
is.integer(y)
Check the following case:
y = "5"
is.integer(y) # Is "y" an integer?
Force the object y to be an integer
y = as.integer(y) # convert "y" to an integer
is.integer(y)
w = 17.5
w = "17.5"
is.character(w)
class(w)
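v = as.double("5.27") # Convert a numeric string to a double
as.integer("Malaria") # NA, with a warning: this text cannot be converted to a number
y = 5.12L # R warns that the integer literal contains a decimal and keeps it as a double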
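A quick summary of the types covered so far, checked with typeof() and class(). All values are arbitrary examples.

```r
typeof(10.5)             # "double"
typeof(5)                # "double": numbers without the L suffix are doubles
typeof(5L)               # "integer"
typeof(as.integer(5.7))  # "integer"
as.integer(5.7)          # 5: as.integer() truncates the decimals, it does not round
typeof(2 + 3i)           # "complex"
typeof(TRUE)             # "logical"
typeof("5")              # "character"
class(10.5)              # "numeric": class() reports doubles as "numeric"
```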
Logical and comparison operators
!, !=, <, >, <=, >=, ==, |, &, ||, &&
Mathematical operators: +, -, *, /, ^ (or **)
x = 1; y = 2
x==y
x!=y
!TRUE
TRUE & TRUE
class(z)
!z # z is 3, and any non-zero number is treated as TRUE, so !z returns FALSE
as.integer(TRUE) # the numeric value of TRUE
as.integer(FALSE) # the numeric value of FALSE
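Because TRUE counts as 1 and FALSE as 0, sum() and mean() applied to a logical vector give a count and a proportion. The values in v are made up for this sketch.

```r
v = c(3, 8, 15, 22)   # made-up values
v > 10                # FALSE FALSE TRUE TRUE
sum(v > 10)           # 2: number of values greater than 10
mean(v > 10)          # 0.5: proportion of values greater than 10
```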
x = as.character(3.14)
class(x)
fname = "Joe"; lname ="Smith"
Other types of elements are: formulas, complex numbers, functions, methods, …
paste & paste0
hname = paste(fname, lname, sep=" ") # Concatenate two strings
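The difference between paste() and paste0() is the separator: paste() uses a space by default, while paste0() uses none. The "sample_" labels are only an illustration.

```r
fname = "Joe"; lname = "Smith"
paste(fname, lname)             # "Joe Smith": sep = " " is the default
paste(fname, lname, sep = "_")  # "Joe_Smith"
paste0(fname, lname)            # "JoeSmith": no separator
paste0("sample_", 1:3)          # "sample_1" "sample_2" "sample_3"
```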
substr, sub & strsplit
substr("Mary has a little lamb", start=6, stop=8)
sub("has", "hasn't", "Mary has a little lamb")
strsplit("Mary has a little lamb", "") # "" splits the string into single characters
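Splitting at spaces instead returns the individual words. Note that strsplit() always returns a list (one element per input string), and that sub() only replaces the first match.

```r
words = strsplit("Mary has a little lamb", " ")  # split at every space
class(words)        # "list"
words[[1]]          # "Mary" "has" "a" "little" "lamb"
length(words[[1]])  # 5
sub("a", "A", "Mary has a little lamb")  # "MAry has a little lamb": only the first "a" changes
```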
Clean your environment
rm(list = ls()) # remove all objects from the workspace
Data structures: scalars, vectors, lists(*), data.frames, data.tables, tibbles, arrays, and S3 & S4 class objects
Create your first vector
a = c(1,2,3,4,5)
What function is c()?
What class of object is a, and what type of elements are stored in a?
class(a)
typeof(a)
Other ways to create sequences of elements:
b = (1:20)
d = seq(from=3, to=21, by=3)
d = as.integer(seq(from=3, to=21, by=3))
e = as.double(rep(1:5, times=4))
f = rep(1:5, each=4)
g = rep(c("a","b","c","d","e"), each=4)
Create a vector of 20 random numbers drawn from a normal distribution with mean 20 and standard deviation 5
h = rnorm(20, mean = 20, sd=5)
mean(h)
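rnorm() returns different numbers every time it is called, so each reader gets a different h. Fixing the random seed with set.seed() makes the draws reproducible; the seed value 42 is an arbitrary choice.

```r
set.seed(42)
h1 = rnorm(20, mean = 20, sd = 5)
set.seed(42)                       # same seed again
h2 = rnorm(20, mean = 20, sd = 5)
identical(h1, h2)                  # TRUE: the same 20 numbers
```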
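Operations between vectors are performed element by element. They work cleanly between vectors of the same length, or when the length of the longer vector is a multiple of the length of the shorter one, because the shorter vector is recycled (if it is not a multiple, R still recycles but gives a warning). Also, the vectors must contain elements of the same type and class (there are exceptions).
What is the size of the vectors a, b, and d?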
length(a)
length(b)
length(d)
Perform mathematical operations with these three vectors (a, b and d), and see what happens
a + b
a * b
b / d
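A sketch of what recycling does with the three vectors above: a (length 5) fits exactly 4 times into b (length 20), while d (length 7) does not divide 20.

```r
a = c(1, 2, 3, 4, 5)                # length 5
b = 1:20                            # length 20
d = seq(from = 3, to = 21, by = 3)  # length 7
(a + b)[1:6]  # 2 4 6 8 10 7: after 5 elements, a starts again from its first value
b / d         # works, but warns: longer object length is not a multiple of shorter object length
```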
What about characters? Inspect the function paste
j = c("a","b","c","d")
k = c("c","d")
paste(j, k, sep = "")
q = c(a, b) # numbers only: q stays numeric
i = c(a, g) # numbers and characters: the numbers are converted to character
class(q)
class(i)
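When c() combines elements of different types, R converts them all to the most flexible type, in the order logical, integer, double, character. A few small examples:

```r
c(1, TRUE, FALSE)   # 1 1 0: logical values become numbers
c(1, "a")           # "1" "a": numbers become character
c(TRUE, "a")        # "TRUE" "a"
class(c(1L, 2.5))   # "numeric": integers become doubles
```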
Use brackets to access positions within a vector
d
d[5]
h[5]
We can omit one or more elements using negative indices
d[-5]
d[c(-1,-3)]
What happens if an index is out of the scope of the vector?
length(h)
h[21] # NA: h has only 20 elements
We can use a logical vector to access positions in another vector
Create a logical vector indicating which elements in the vector
h are greater than 20.
high = h > 20
high
Access the values in the vector h that are greater than 20 using brackets []
h[high]
h[h > 20]
Access the values less than or equal to 20
low = h <= 20
low
h[low]
h[h <= 20]
puppy = c("Chimu", "Correa", "Frenchie", "3y")
names(puppy) = c("PetName", "OwnerLastName", "Breed", "Age")
What is the breed of the pet?
puppy["Breed"]
What is the name of the Pet?
puppy["PetName"]
Matrices: all the elements must be of the same type
Build a matrix called em of 5 rows and 4 columns with the elements of the vector e
em = matrix(e, nrow=5,ncol=4)
By default, the matrix is filled column by column (use byrow = TRUE to fill it by rows)
em
Build a matrix called fm of 4 rows and 5 columns with the elements of the vector f
fm = matrix(f, nrow=4, ncol=5)
Mathematical operations with matrices
em + fm # Error: non-conformable arrays (em is 5 x 4 but fm is 4 x 5)
Transpose the matrix fm by swapping the columns and rows
t(fm)
Try the sum again
em + t(fm)
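dim() returns the number of rows and columns of a matrix, which shows why the first sum fails and the second one works: element-wise operations between matrices need identical dimensions.

```r
e = as.double(rep(1:5, times = 4))
f = rep(1:5, each = 4)
em = matrix(e, nrow = 5, ncol = 4)
fm = matrix(f, nrow = 4, ncol = 5)
dim(em)      # 5 4
dim(fm)      # 4 5: a different shape from em, so em + fm fails
dim(t(fm))   # 5 4: after transposing, the shapes match and em + t(fm) works
```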
By columns
cbind(em,t(fm)) # combines the columns of two matrices or vectors of the same dimension (equal number of rows)
By rows
rbind(em,t(fm)) # combines the rows of two matrices or vectors of the same dimension (equal number of columns)
What class of object is generated after binding two vectors?
A = cbind(g,h) # g is character, so the numbers in h are converted to character too
typeof(A)
class(A)
Deconstruct a matrix to a vector
c(em)
Write the following code and discuss what happens
m = matrix(1:20, nrow = 5, ncol = 4)
m[9]
Now try this:
m[1,]
Then this:
m[,2]
Select the element of the second row and third column
m[2,3]
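Because a matrix is stored column by column, a single index such as m[9] counts down the columns. For a matrix with nrow(m) rows, the element in row i and column j sits at position (j - 1) * nrow(m) + i. The sketch checks this for one arbitrarily chosen element.

```r
m = matrix(1:20, nrow = 5, ncol = 4)
i = 4; j = 2                 # an arbitrary row and column
m[i, j]                      # 9
m[(j - 1) * nrow(m) + i]     # 9: the same element through a single index
```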
Data frames are commonly used to store tables. They are lists of vectors of equal length, stored as columns: all the elements of a given column must be of the same type, but different columns can contain different types of elements.
Create the data.frame df
df = data.frame(Sex = c(rep('F', 20), rep('M', 20)),
Height = c(rnorm(20, 160, 10), rnorm(20, 170, 15)))
View the data frame in a different window
View(df)
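Try to access the data in the data.frame df
df[2] # a data.frame containing only the second column
df[2][2] # Error: df[2] has a single column, so there is no second column to select
df[[2]] # the second column as a vector
df[[2]][3] # the third element of the second column
df[3,] # the third row
df[3,"Height"] # the Height value of the third row
df[["Height"]] # the Height column as a vector
df$Height # same as above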
Get all rows for Females or for Males
df[df[['Sex']] == 'F',]
df[df[['Sex']] == 'M',]
Get the mean height of females and males
mean(df[df[['Sex']] == 'F',][['Height']])
mean(df[df[['Sex']] == 'M',][['Height']])
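The two means can also be computed in a single call with base R, using tapply() or aggregate(). The data are regenerated here so the sketch runs on its own; set.seed(1) is an arbitrary seed added only for reproducibility.

```r
set.seed(1)  # arbitrary seed, only so that the numbers are reproducible
df = data.frame(Sex = c(rep('F', 20), rep('M', 20)),
                Height = c(rnorm(20, 160, 10), rnorm(20, 170, 15)))

tapply(df$Height, df$Sex, mean)                 # one mean per level of Sex
aggregate(Height ~ Sex, data = df, FUN = mean)  # the same means, as a data.frame
```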
if(!require(tidyverse)){
install.packages('tidyverse')
library(tidyverse)
}
if(!require(tidyr)){
install.packages('tidyr')
library(tidyr)
}
if(!require(dplyr)){
install.packages('dplyr')
library(dplyr)
}
if(!require(magrittr)){
install.packages('magrittr')
library(magrittr)
}
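The install-then-load pattern above is repeated once per package. A compact sketch of the same idea with a loop follows. load_or_install() is a hypothetical helper name, and the sketch assumes the packages come from CRAN (packages from Bioconductor, such as Biostrings, still need BiocManager).

```r
# Hypothetical helper: installs each package if it is missing, then loads it.
load_or_install = function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)                # needs internet access
    }
    library(pkg, character.only = TRUE)    # attach the package
  }
  invisible(pkgs %in% .packages())         # TRUE for each package now attached
}
```

For example, load_or_install(c("tidyverse", "magrittr")) would replace the separate blocks above.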
Check the following functions: %>%, %<>%, filter(), select(), mutate(), group_by() and summarize()
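A small sketch of select(), mutate() and the assignment pipe %<>% on a made-up data frame (toy is a hypothetical name; the heights are invented). %>% passes the left-hand side as the first argument of the next function, and %<>% does the same but also assigns the result back to the left-hand object.

```r
library(dplyr)
library(magrittr)   # provides the %<>% assignment pipe

# Made-up data, only for illustration
toy = data.frame(Sex = c("F", "F", "M", "M"),
                 Height = c(158, 165, 172, 180))

toy %>% select(Height)                    # keep only the Height column
toy %>% mutate(Height_m = Height / 100)   # add a column with the height in metres
toy %<>% filter(Height > 160)             # filter and assign the result back to toy
toy                                       # 3 rows remain
```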
Filter Females
df %>% filter(Sex == 'F')
Calculate the mean and standard deviation of the height of Males and Females simultaneously
df %>% group_by(Sex) %>% summarize(average = mean(Height),
sd = sd(Height))
Check this website: http://www.sthda.com/english/wiki/ggplot2-essentials
if(!require(ggplot2)){
install.packages('ggplot2')
library(ggplot2)
}
Let’s create a histogram of the Height of Males and Females
df %>% ggplot(aes(x = Height, fill = Sex))+
geom_histogram(position = 'identity', binwidth = 5, alpha = .5)+
theme_bw()
df %>% ggplot(aes(x = Height, fill = Sex))+
geom_histogram(position = 'stack', binwidth = 5, alpha = .5)+
theme_bw()
Let’s create a boxplot of the Height of Males and Females
df %>% ggplot(aes(x = Sex, y = Height, fill = Sex))+
geom_boxplot()+
theme_bw()
Let’s create a jitter plot with a violin plot as background of the Height of Males and Females
df %>% ggplot(aes(x = Sex, y = Height, color = Sex))+
geom_violin()+
geom_jitter(width = .2)+
theme_bw()
Change the labels of the x axis and remove the legend
df %>% ggplot(aes(x = Sex, y = Height, color = Sex))+
geom_violin()+
geom_jitter(width = .2)+
scale_x_discrete(breaks = c('F', 'M'), labels = c('Females', 'Males'))+
theme_bw()+
theme(legend.position = 'none')
Lists: generic objects that can contain different structures (vectors, matrices or data.frames)
B = list(g,h,d, df)
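Elements of a list are accessed with single brackets (which return a smaller list), double brackets (which return the element itself) or, for named lists, with $. The list lst and its contents are invented for this sketch.

```r
lst = list(chars = c("a", "b"), numbers = 1:3, table = data.frame(x = 1:2))
length(lst)        # 3
lst[1]             # a list that contains only the first element
lst[[2]]           # the element itself: 1 2 3
lst$numbers[2]     # 2
lst[["table"]]$x   # 1 2
```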
Regular expressions: a sequence of characters (or even a single character) that describes a pattern to be found in a text.
Check this website: https://www.datacamp.com/tutorial/regex-r-regular-expressions-guide
grep() returns the indices of the strings that contain a match, and grepl() returns a logical vector showing which strings contain a match.
gsub() replaces every match in each string with a specified string (sub() replaces only the first match).
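A short sketch of grep(), grepl() and gsub() on a made-up vector of species names. The pattern "^A" uses the anchor ^, which matches only at the start of a string.

```r
species = c("Plasmodium falciparum", "Plasmodium vivax", "Anopheles gambiae")
grep("Plasmodium", species)              # 1 2: indices of the strings with a match
grepl("Plasmodium", species)             # TRUE TRUE FALSE
grep("^A", species, value = TRUE)        # "Anopheles gambiae": value = TRUE returns the strings
gsub("a", "A", species[2])               # "PlAsmodium vivAx": every "a" is replaced
gsub("Plasmodium", "P.", species)        # "P. falciparum" "P. vivax" "Anopheles gambiae"
```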
if(!require(adegenet)){
install.packages("adegenet")
library(adegenet)
}else{
library(adegenet)
}
if(!require(ade4)){
install.packages("ade4")
library(ade4)
}else{
library(ade4)
}
if(!require(poppr)){
install.packages("poppr")
library(poppr)
}else{
library(poppr)
}
if(!require(dplyr)){
install.packages("dplyr")
library(dplyr)
}else{
library(dplyr)
}
if(!require(magrittr)){
install.packages("magrittr")
library(magrittr)
}else{
library(magrittr)
}
if(!require(tidyr)){
install.packages("tidyr")
library(tidyr)
}else{
library(tidyr)
}
if(!require(ggplot2)){
install.packages("ggplot2")
library(ggplot2)
}else{
library(ggplot2)
}
if(!require(cowplot)){
install.packages("cowplot")
library(cowplot)
}else{
library(cowplot)
}
if(!require(vegan)){
install.packages("vegan")
library(vegan)
}else{
library(vegan)
}
library(parallel) # parallel is part of base R, so it never needs to be installed
if(!require(ape)){
install.packages("ape")
library(ape)
}else{
library(ape)
}
if(!require(pegas)){
install.packages("pegas")
library(pegas)
}else{
library(pegas)
}
if(!require(RColorBrewer)){
install.packages("RColorBrewer")
library(RColorBrewer)
}else{
library(RColorBrewer)
}
if(!require(Hmisc)){
install.packages('Hmisc')
library(Hmisc)
}else{
library(Hmisc)
}
if(!require(ggpubr)){
install.packages('ggpubr')
library(ggpubr)
}else{
library(ggpubr)
}
if(!require(doMC)){
install.packages('doMC')
library(doMC)
}else{
library(doMC)
}
if(!require(svMisc)){
install.packages('svMisc')
library(svMisc)
}else{
library(svMisc)
}
if(!require(Biostrings)){
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Biostrings")
library(Biostrings)
}else{
library(Biostrings)
}