Stats 1 - R Refresher Flashcards
How do you assign a number to a variable in R? How can you manipulate these variables mathematically?
Store information in variables - a <- 4
You can perform mathematical operations on this variable – a * a, a^2 (squared), sqrt(a)
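A minimal sketch of the operations above (the second variable name b is just an illustration):

```r
a <- 4          # store the number 4 in a variable called a
a * a           # multiplication: 16
a^2             # squaring: 16
sqrt(a)         # square root: 2
b <- a + 1      # results can be stored in a new variable (b is now 5)
```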
What are Vectors in R and how do you build them?
Vectors - Vector = Set of data –> Think of it as a row or column in a spreadsheet
Code
(Name) <- c(0, 1, 2, 3, 4)
Can you apply mathematical functions to a vector?
You can apply many mathematical functions to a vector –> e.g. mean(v), var(v) (variance), median(v), sum(v), prod(v), length(v) (how many elements)
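A short example of these functions on a small vector (the name v is chosen here for illustration):

```r
v <- c(0, 1, 2, 3, 4)
mean(v)     # 2
var(v)      # 2.5 (sample variance, divides by n - 1)
median(v)   # 2
sum(v)      # 10
prod(v)     # 0, because one of the elements is 0
length(v)   # 5
v * 2       # arithmetic works element by element: 0 2 4 6 8
```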
What are the types of parentheses in R?

What are the different variable types?
There are 4 different variable types
- Integer
- Float/Numeric –> Real numbers
- String –> character/text
- Boolean/Logical –> TRUE or FALSE (treated as 1 and 0 in arithmetic)
How to ask ‘R’ what variable type you have?
To figure out the variable type –> class(v)
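For example, class() on one value of each of the four types (the variable names are made up for this sketch):

```r
int_val  <- 4L        # the L suffix makes an integer
num_val  <- 4.5       # real number
chr_val  <- "four"    # text
lgl_val  <- TRUE      # logical

class(int_val)   # "integer"
class(num_val)   # "numeric"
class(chr_val)   # "character"
class(lgl_val)   # "logical"
```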
How can you convert between variable types?
Type of Conversions and Special Values:
- as.x() –> convert between variable types
- as.integer(3.1) –> 3
- as.roman(155) –> CLV
- as.character(155) –> ‘155’
Note that it is put between quotation marks –> indicates that it is now a character
- as.logical(5) –> TRUE
Note that R maps every non-zero number to TRUE, whereas 0 maps to FALSE
Note that ‘R’ –> uses E notation for scientific notation –> 1e4 or 5e-2
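The conversions above can be tried directly. This is a small sketch; the result names are only for illustration, and as.roman comes from the utils package that ships with R:

```r
i <- as.integer(3.1)        # 3 (the decimal part is dropped)
r <- utils::as.roman(155)   # CLV
s <- as.character(155)      # "155"
t5 <- as.logical(5)         # TRUE (any non-zero number)
t0 <- as.logical(0)         # FALSE

big   <- 1e4                # E notation: 10000
small <- 5e-2               # E notation: 0.05
```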
What are data structures in R?
Data structures = Different ways to store and manipulate data
Outline the data structure - Vector.
Vector - Fundamental data structure/object in R
Think of vectors as single rows or columns on a spreadsheet
Example
v1 <- c(0.02, 0.5, 1)
v2 <- c("a", "bc", "def", "ghij")
v3 <- c(TRUE, TRUE, FALSE)
BUT! Vectors can only store data of one TYPE –> e.g. all numeric, all character, etc.
If you try to combine them –> it will homogenize them
v1 <- c(0.02, "Mary", 1)
'0.02' 'Mary' '1'
The function c “coerces” arguments that are of mixed types (strings/text, real numbers, logical arguments, etc) to a common type.
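A quick way to see the coercion happen is to check the class of the result (v_mixed and v_num_lgl are names invented for this sketch):

```r
v_mixed <- c(0.02, "Mary", 1)     # numbers mixed with text
class(v_mixed)                    # "character": everything became text

v_num_lgl <- c(1.5, TRUE, FALSE)  # numbers mixed with logicals
class(v_num_lgl)                  # "numeric": TRUE became 1, FALSE became 0
v_num_lgl                         # 1.5 1.0 0.0
```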
Outline the data structure - Matrix & Array.
Matrix –> is a 2 dimensional vector (has both rows and columns)
mat1 <- matrix(1:25, 5, 5)
This creates a matrix with number 1 to 25 in a 5x5 grid.
dim(mat1) –> retrieves the dimensions of the matrix
Array –> can store data in more than two dimensions (e.g., a stack of 2-D matrices).
For example…
array1 <- array(1:50, c(5, 5, 2))
array1 produces two 5x5 matrices stacked on top of each other –> with values going from 1 to 50
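A short sketch of building and indexing both structures, using the same dimensions as the cards above:

```r
mat1 <- matrix(1:25, 5, 5)          # filled column by column
dim(mat1)                           # 5 5
mat1[2, 3]                          # row 2, column 3: 12

array1 <- array(1:50, c(5, 5, 2))   # two 5x5 layers
dim(array1)                         # 5 5 2
array1[1, 1, 2]                     # first cell of the second layer: 26
```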

In matrices and arrays, do the data variable types need to be homogenous?
Just like vectors, the data types MUST be homogeneous!
R will automatically coerce the values to a common type if you don’t
Outline the data structure - Data Frame
How can you build a data frame?
Data Frame
Very important & useful data structure
Why??? –> Each column can have a different data type! (Within a single column all values must still share one type, because each column is itself a vector.)
Building Data frames
- Build the individual columns –> like vectors
Col1
Col2
Col3
- Join these columns into the same ‘spreadsheet’ using the data.frame function
MyDF <- data.frame(Col1, Col2, Col3)
- Change the Column names using the following code –> No Spaces allowed
names(MyDF) <- c("MyFirstColumn", "MySecondColumn", "My.Third.Column")
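Putting the three steps together, as a sketch with made-up column contents (the values in Col1, Col2 and Col3 are assumptions for illustration):

```r
# Step 1: build the individual columns as vectors of equal length
Col1 <- 1:3
Col2 <- c("A", "B", "C")
Col3 <- c(TRUE, FALSE, TRUE)

# Step 2: join them into one data frame
MyDF <- data.frame(Col1, Col2, Col3)

# Step 3: rename the columns (no spaces)
names(MyDF) <- c("MyFirstColumn", "MySecondColumn", "My.Third.Column")

str(MyDF)   # shows the type of each column
```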

How can you select specific columns in a data frame?
You can target specific columns instead of having to print the entire data frame –> using the $ symbol
MyDF$MyFirstColumn
or….
You can also access specific rows/columns using numerical indexing
MyDF[R,C] –> Number of row and column –> if left blank all rows/columns will be considered
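Both ways of selecting can be compared on a tiny data frame (its contents are invented for this sketch):

```r
MyDF <- data.frame(MyFirstColumn  = c(10, 20, 30),
                   MySecondColumn = c("A", "B", "C"),
                   stringsAsFactors = FALSE)

MyDF$MyFirstColumn   # whole column by name: 10 20 30
MyDF[2, 1]           # row 2, column 1: 20
MyDF[2, ]            # row 2, all columns
MyDF[, 2]            # all rows, column 2: "A" "B" "C"
```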
Outline the data structure - Lists
How can you create a list?
List –> just a way to combine things together –> simple ordered collection of objects
List is used to collect a group of data objects of different sizes and types (e.g., one whole data frame and one vector can both be in a single list)
Creating a list
MyList <- list(species=c("Quercus robur","Fraxinus excelsior"), age=c(123, 84))
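Once a list exists, its elements can be pulled out by name or position. A short sketch using the same list:

```r
MyList <- list(species = c("Quercus robur", "Fraxinus excelsior"),
               age     = c(123, 84))

MyList$species        # element by name: both species names
MyList[["age"]][2]    # second value of the age element: 84
length(MyList)        # number of elements in the list: 2
names(MyList)         # "species" "age"
```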
Data frames are more flexible and you can use them for more things, why use matrices at all?
The problem is that data frames are slow when working with large amounts of numeric data and performing mathematical calculations. Hence, in such cases you should convert the data frame to a matrix
BUT! For statistical analysis, plotting, etc –> Data frames are more convenient
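Converting is a single call to as.matrix. A sketch, assuming an all-numeric data frame (df and m are illustrative names):

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

m <- as.matrix(df)   # all-numeric data frame becomes a numeric matrix
is.matrix(m)         # TRUE
colSums(m)           # x = 6, y = 15
cross <- t(m) %*% m  # matrix multiplication, which needs a matrix
```

If the data frame contains a character column, as.matrix will coerce every value to character, so select the numeric columns first.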
When you have a large data frame and you want to isolate specific columns, what should you do?
Example with the Covid Data set
incomebased <- subset(coviddata, select = c(Income, TotalCases, TotalDeaths, location))
Generic:
NewName <- subset(dataset, select = c(The exact names of the different columns))
How can you subset/isolate specific categorical levels from within a data frame column?
Example –> want to separate the high and low income data from the rest of the covid data set.
country <- subset(incomebased, incomebased$Income == "High" | incomebased$Income == "Low")
Generic:
Name <- subset(dataset, dataset$Column == "Target")
To target multiple categorical levels, combine the conditions with ‘|’ (OR).
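Both kinds of subsetting can be tried on a toy stand-in for the covid data. The rows and the extra Population column below are invented for illustration; only the column names Income, TotalCases, TotalDeaths and location come from the cards above:

```r
# Hypothetical miniature version of the data set
coviddata <- data.frame(
  location    = c("A", "B", "C", "D"),
  Income      = c("High", "Low", "Middle", "High"),
  TotalCases  = c(100, 50, 75, 20),
  TotalDeaths = c(10, 5, 7, 1),
  Population  = c(1000, 2000, 1500, 500),
  stringsAsFactors = FALSE
)

# Keep only selected columns
incomebased <- subset(coviddata, select = c(Income, TotalCases, TotalDeaths, location))

# Keep only rows where Income is "High" OR "Low"
country <- subset(incomebased, Income == "High" | Income == "Low")
nrow(country)   # 3 (the "Middle" row is dropped)
```

Inside subset() the column can be written as just Income, so the incomebased$ prefix is optional.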
How to organise your workflow?
Just keep everything organised
Input files
- R script
- Text Data file
Output Files
- Graphics File
- Results output
- R data file

Difference between a relative and absolute path?
Relative Paths
- Relative path –> e.g. read.csv("../data/trees.csv"), where ../ means "go up one directory"
- Relative path –> tells R to load data that lies in a different directory (folder) relative to your current location (the working directory)
Absolute Path
- One that specifies the whole path on your computer
- Note –> Absolute paths are specific to a computer –> hence, they should be avoided
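A small sketch of the difference, using only base R functions (the trees.csv path is the one from the card above and is not assumed to exist on your machine):

```r
wd <- getwd()   # absolute path of the current working directory
print(wd)

# Build the relative path piece by piece; file.path joins with "/"
rel_path <- file.path("..", "data", "trees.csv")
print(rel_path)                                   # "../data/trees.csv"

# Show which absolute location the relative path points to from here
# (mustWork = FALSE avoids an error if the file does not exist)
print(normalizePath(rel_path, mustWork = FALSE))
```

The relative path stays the same on every computer, while the absolute one printed at the end changes with the machine and the working directory.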
What is the *apply family in R?
- There is a family of functions called *apply in R that vectorize your code for you.
- For example, apply can be used when you want to apply a function to the rows or columns of a matrix
- Better used with matrices –> why? If you give apply a data frame, R will need to coerce the data frame to a matrix first.
Example
- Take the mean of each row –> RowMeans <- apply(M, 1, mean)
This code takes the mean from each row ‘1’ from the Matrix ‘M’
- Take the mean of each column –> ColMeans <- apply(M, 2, mean)
This code takes the mean from each column ‘2’ from the Matrix ‘M’
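A worked example on a small matrix (the contents of M are made up here so the results are easy to check by hand):

```r
M <- matrix(1:6, nrow = 2, ncol = 3)
# M looks like:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

RowMeans <- apply(M, 1, mean)   # one mean per row:    3 4
ColMeans <- apply(M, 2, mean)   # one mean per column: 1.5 3.5 5.5
```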
What are the basic plotting commands in R?
- Change marker colour –> col = "colour name" (e.g. col = "red")
- Change marker type –> pch = # (e.g. pch = 20)
- Change x-axis label –> xlab = ""
- Change y-axis label –> ylab = ""
- Change the main title –> main = ""
- Change border colour in histogram –> border = ""
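These are all arguments passed inside the plotting call. A sketch with made-up data (x and y are illustrative vectors, not from the predator-prey data set):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

# Scatter plot with red filled circles and custom labels
plot(x, y, col = "red", pch = 19,
     xlab = "x value", ylab = "y value", main = "Example scatter plot")

# Histogram with grey bars and blue bar borders
h <- hist(y, col = "grey", border = "blue",
          xlab = "y value", ylab = "Count", main = "Example histogram")
```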

Scatter plot Example
plot(log10(MyDF$Predator.mass),log10(MyDF$Prey.mass),pch=20)

Histogram Example
hist(log10(MyDF$Predator.mass), xlab = "log10(Predator Mass (g))", ylab = "Count")

How can we combine multiple plots on the same page?
We can create two plots and use the par function to compare them
How do we do that?
You start your code with the following
- par(mfcol=c(2,1))
- par(mfg = c(1,1))
The first line tells R to create a multi-panel plot –> 2,1 indicates the layout – (Rows, Columns)
The second line tells R the location of the next plot, e.g. here the following plot is placed in row 1, column 1
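A sketch of a full two-panel figure, using simulated values in place of the real predator and prey masses (pred and prey are invented here):

```r
set.seed(1)                      # make the simulated data reproducible
pred <- rnorm(100, mean = 2)     # stand-in for log10 predator mass
prey <- rnorm(100, mean = 0)     # stand-in for log10 prey mass

par(mfcol = c(2, 1))             # 2 rows, 1 column of panels
par(mfg = c(1, 1))               # next plot goes in row 1, column 1
hist(pred, xlab = "log10(Predator Mass (g))", ylab = "Count", main = "Predator")

par(mfg = c(2, 1))               # next plot goes in row 2, column 1
hist(prey, xlab = "log10(Prey Mass (g))", ylab = "Count", main = "Prey")
```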

How can you overlay plots?
To overlay plots you don’t use the par function. Instead, just write the code for each histogram and define the colour and transparency –> the add = T argument in the second hist call tells R to draw it on top of the existing plot
hist(log10(MyDF$Predator.mass), # Predator histogram
xlab="log10(Body Mass (g))", ylab="Count", col = rgb(1, 0, 0, 0.5), # Note 'rgb', fourth value is transparency
main = "Predator-prey size Overlap")
hist(log10(MyDF$Prey.mass), col = rgb(0, 0, 1, 0.5), add = T) # Plot prey
legend('topleft',c('Predators','Prey'), # Add legend
fill=c(rgb(1, 0, 0, 0.5), rgb(0, 0, 1, 0.5))) # Define legend colors

Boxplot example
Boxplots are useful for getting a visual summary of the distribution of your data.
boxplot(log10(MyDF$Predator.mass), xlab = "Location", ylab = "log10(Predator Mass)", main = "Predator mass")

How to save your graphics to a pdf?
How to save your graphics to the correct location (the graphics output file)?
Open your code with: pdf("../results/Pred_Prey_Overlay.pdf", 11.7, 8.3) # Open a blank pdf page using a relative path; the numbers are the page dimensions (width, height) in inches
Follow this with any piece of graphics code…
End your code with –> graphics.off() or dev.off() # Tells R to close the pdf so nothing more is added to it; dev.off() closes the current device, graphics.off() closes all open devices
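The open / plot / close pattern, sketched with a temporary file so it runs anywhere. The file name example_plot.pdf and the use of tempdir() are assumptions for this sketch; in a real project you would use a relative path into your results directory instead:

```r
# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file, 11.7, 8.3)    # open the pdf device (width, height in inches)
plot(1:10, (1:10)^2,        # any plotting code goes here
     xlab = "x", ylab = "x squared", main = "Saved to pdf")
dev.off()                   # close the device so the file is written

file.exists(out_file)       # TRUE once the device has been closed
```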
How can you subset out a specific categorical level?
covid <- subset(covid, covid$Diabet_Cat != "High")
!= –> means "not equal to", so it is used to subset out (exclude) a level
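A sketch on a toy data frame (the rows and the country column are invented; only the column name Diabet_Cat comes from the card above):

```r
covid <- data.frame(country    = c("A", "B", "C"),
                    Diabet_Cat = c("High", "Low", "Medium"),
                    stringsAsFactors = FALSE)

# Keep every row whose Diabet_Cat is NOT "High"
covid <- subset(covid, Diabet_Cat != "High")
covid$Diabet_Cat   # "Low" "Medium"
```

If the column is stored as a factor, the removed level is still listed in levels() afterwards; droplevels(covid) removes the unused level.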