Stats 1 - R Refresher Flashcards

1
Q

How do you assign a number to a variable in R? How can you manipulate these variables mathematically?

A

Store information in variables using the assignment operator –> a <- 4

You can then use the variable in mathematical operations –> a * a or a^2 (squared), sqrt(a) (square root)
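
A minimal sketch of the operations above. The names squared and root are only illustrative and are not from the original card.

```r
# Assign the value 4 to a variable named a
a <- 4

# Use the variable in calculations
squared <- a * a   # same result as a^2
root <- sqrt(a)    # square root of a

print(squared)
print(root)
```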

2
Q

What are Vectors in R and how do you build them?

A

Vector = a set of data –> Think of it as a single row or column in a spreadsheet

Code

(Name) <- c(0, 1, 2, 3, 4)

3
Q

Can you apply mathematical functions to a vector?

A

You can apply many mathematical functions to a vector –> e.g. mean(v), var(v) (variance), median(v), sum(v), prod(v), length(v) (number of elements)
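
As a quick illustration, the functions above can be tried on the example vector from the previous card. The result names (v_mean, v_var, ...) are assumptions made for this sketch.

```r
# Build a vector with c()
v <- c(0, 1, 2, 3, 4)

v_mean <- mean(v)      # arithmetic mean
v_var <- var(v)        # sample variance (divides by n - 1)
v_median <- median(v)  # middle value
v_sum <- sum(v)        # sum of all elements
v_prod <- prod(v)      # product of all elements (0 here, because v contains 0)
v_len <- length(v)     # number of elements

print(c(v_mean, v_var, v_median, v_sum, v_prod, v_len))
```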

4
Q

What are the types of parentheses in R?

A
5
Q

What are the different variable types?

A

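There are 4 basic variable types

  1. Integer –> whole numbers (class "integer"), e.g. 4L
  2. Float/Numeric –> real numbers (class "numeric")
  3. String –> character/text (class "character")
  4. Boolean –> TRUE or FALSE (class "logical"); these behave as 1 and 0 in arithmetic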
6
Q

How do you ask R what variable type you have?

A

To figure out the variable type –> class(v)
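
A short sketch of class() applied to one value of each basic type. The variable names are illustrative only.

```r
# class() reports the type of whatever it is given
int_class <- class(4L)       # the L suffix makes an integer
num_class <- class(4)        # plain numbers are numeric
chr_class <- class("four")   # text in quotes is character
lgl_class <- class(TRUE)     # TRUE/FALSE are logical

print(c(int_class, num_class, chr_class, lgl_class))
```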

7
Q

How can you convert between variable types?

A

Type conversions and special values:

  1. as.x() –> general pattern for converting between variable types (replace x with the target type)
  2. as.integer(3.1) –> 3
  3. as.roman(155) –> CLV
  4. as.character(155) –> "155"

Note that the result is shown in quotation marks –> indicates that it is now a character

  5. as.logical(5) –> TRUE

Note that R maps every non-zero number to TRUE, whereas 0 maps to FALSE

Note that R uses E notation for scientific notation –> 1e4 (10000) or 5e-2 (0.05)
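
The conversions above can be checked directly. This is a small sketch; the variable names are assumptions, and as.roman() comes from the utils package that R attaches by default.

```r
as_int <- as.integer(3.1)                 # drops the decimal part
as_chr <- as.character(155)               # number becomes text
as_rom <- as.character(as.roman(155))     # Roman numeral, stored as text for printing
true_from_five <- as.logical(5)           # any non-zero number becomes TRUE
false_from_zero <- as.logical(0)          # zero becomes FALSE

# E notation is just another way to write a number
big <- 1e4
small <- 5e-2

print(list(as_int, as_chr, as_rom, true_from_five, false_from_zero, big, small))
```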

8
Q

What are data structures in R?

A

Data structures = Different ways to store and manipulate data

9
Q

Outline the data structure - Vector.

A

Vector - Fundamental data structure/object in R

Think of vectors as single rows or columns on a spreadsheet

Example

v1 <- c(0.02, 0.5, 1)

v2 <- c("a", "bc", "def", "ghij")

v3 <- c(TRUE, TRUE, FALSE)

BUT! Vectors can only store data of one TYPE –> e.g. all numeric, all character, etc.

If you try to combine different types –> R will homogenize them

v1 <- c(0.02, "Mary", 1)

v1 now contains "0.02" "Mary" "1" –> every element has become a character

The function c() "coerces" arguments of mixed types (strings/text, real numbers, logical values, etc.) to a common type.
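
A small sketch of coercion in action. The names mixed and flags_and_numbers are made up for illustration.

```r
# Mixing numbers and text: everything becomes character
mixed <- c(0.02, "Mary", 1)
print(mixed)
print(class(mixed))

# Mixing logicals and numbers: TRUE/FALSE become 1/0
flags_and_numbers <- c(TRUE, FALSE, 2)
print(flags_and_numbers)
print(class(flags_and_numbers))
```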

10
Q

Outline the data structure - Matrix & Array.

A

Matrix –> a 2-dimensional version of a vector (has both rows and columns)

mat1 <- matrix(1:25, 5, 5)

This creates a matrix with the numbers 1 to 25 in a 5x5 grid (filled column by column).

dim(mat1) –> retrieves the dimensions of the matrix

Array –> can store data in more than two dimensions (e.g., a stack of 2-D matrices).

For example…

array1 <- array(1:50, c(5, 5, 2))

array1 is two 5x5 matrices stacked on top of each other –> with values going from 1 to 50
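
A runnable sketch of both structures, with a few lookups to show how the values are laid out. The lookup variable names are assumptions.

```r
# 5x5 matrix filled column by column with 1..25
mat1 <- matrix(1:25, 5, 5)
mat_dims <- dim(mat1)           # rows, columns
mat_value <- mat1[2, 3]         # row 2, column 3

# Two 5x5 layers holding 1..50
array1 <- array(1:50, c(5, 5, 2))
arr_dims <- dim(array1)         # rows, columns, layers
first_of_layer2 <- array1[1, 1, 2]  # top-left value of the second layer

print(mat_dims)
print(arr_dims)
```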

11
Q

In matrices and arrays, do the data variable types need to be homogeneous?

A

Just like vectors, the data types MUST be homogeneous!

R will automatically homogenize them if you don’t

12
Q

Outline the data structure - Data Frame

How can you build a data frame?

A

Data Frame

Very important & useful data structure

Why??? –> Each column can have a different data type (but every value within a single column must share one type)

Building Data frames

  1. Build the individual columns –> like vectors (all the same length)

Col1

Col2

Col3

  2. Join these columns into the same ‘spreadsheet’ using the data.frame function

MyDF <- data.frame(Col1, Col2, Col3)

  3. Change the column names using the following code –> no spaces allowed

names(MyDF) <- c("MyFirstColumn", "MySecondColumn", "My.Third.Column")
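
The three steps above can be sketched end to end. The column contents here are made-up example values, since the original card leaves Col1, Col2 and Col3 unspecified.

```r
# Step 1: build the columns as vectors of equal length (hypothetical values)
Col1 <- c(1, 2, 3)
Col2 <- c("a", "b", "c")
Col3 <- c(TRUE, FALSE, TRUE)

# Step 2: join them into one data frame
MyDF <- data.frame(Col1, Col2, Col3)

# Step 3: rename the columns (no spaces in the names)
names(MyDF) <- c("MyFirstColumn", "MySecondColumn", "My.Third.Column")

# str() shows each column with its own type
str(MyDF)
```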

13
Q

How can you select specific columns in a data frame?

A

You can target specific columns instead of having to print the entire data frame –> using the $ symbol

MyDF$MyFirstColumn

or….

You can also access specific rows/columns using numerical indexing

MyDF[R, C] –> R = row number, C = column number –> if either is left blank, all rows/columns are returned
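
A brief sketch of both ways of selecting, using a small made-up data frame (the values are hypothetical).

```r
# Small example data frame (hypothetical values)
MyDF <- data.frame(MyFirstColumn = c(10, 20, 30),
                   MySecondColumn = c("a", "b", "c"))

first_col <- MyDF$MyFirstColumn  # whole column by name
one_cell <- MyDF[2, 1]           # row 2, column 1
row_two <- MyDF[2, ]             # column left blank: all columns of row 2
col_one <- MyDF[, 1]             # row left blank: all rows of column 1

print(one_cell)
print(row_two)
```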

14
Q

Outline the data structure - Lists

How can you create a list?

A

List –> just a way to combine things together –> simple ordered collection of objects

A list is used to collect a group of data objects of different sizes and types (e.g., one whole data frame and one vector can both be in a single list)

Creating a list

MyList <- list(species = c("Quercus robur", "Fraxinus excelsior"), age = c(123, 84))
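
A short sketch of creating the list and pulling elements back out. The extra element added at the end is a made-up example of storing a data frame inside a list.

```r
MyList <- list(species = c("Quercus robur", "Fraxinus excelsior"),
               age = c(123, 84))

all_species <- MyList$species      # access an element by name
first_age <- MyList[["age"]][1]    # double brackets, then index inside it

# Lists can hold objects of different types and sizes (hypothetical addition)
MyList$measurements <- data.frame(x = 1:3)
n_elements <- length(MyList)

print(all_species)
print(first_age)
```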

15
Q

Data frames are more flexible and you can use them for more things, why use matrices at all?

A

Problem is that data frames are slow when working with large amounts of numerical data and performing mathematical calculations –> hence, in such cases you should convert the data frame to a matrix

BUT! For statistical analysis, plotting, etc. –> data frames are more convenient
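
A minimal sketch of the conversion, assuming an all-numeric data frame (the values are made up). It only shows the mechanics with as.matrix(); it does not measure speed.

```r
# All-numeric data frame (hypothetical values)
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# Convert to a matrix so matrix algebra can be used
m <- as.matrix(df)

# Example calculation: cross-product of the columns
cross <- t(m) %*% m
print(cross)
```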

16
Q

When you have a large data frame and you want to isolate specific columns, what should you do?

A

Example with the Covid data set

incomebased <- subset(coviddata, select = c(Income, TotalCases, TotalDeaths, location))

Note –> stringsAsFactors is an argument of read.csv() and data.frame(), not of subset(), so it is left out here

Generic:

NewName <- subset(dataset, select = c(the exact names of the different columns))

17
Q

How can you subset/isolate specific categorical levels from within a data frame column?

A

Example –> want to separate the high and low income data from the rest of the covid data set.

country <- subset(incomebased, incomebased$Income == "High" | incomebased$Income == "Low")

Generic:

Name <- subset(DatasetName, DatasetName$Column == "Target")

To target multiple categorical levels, combine the conditions with ‘|’ (OR).
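
Both kinds of subsetting (columns, then rows) can be sketched on a tiny stand-in for the covid data. The data values and the Population column are invented for illustration; only the column names Income, TotalCases, TotalDeaths and location come from the cards.

```r
# Tiny stand-in for the covid data set (hypothetical values)
coviddata <- data.frame(
  location = c("A", "B", "C", "D"),
  Income = c("High", "Low", "Middle", "High"),
  TotalCases = c(100, 50, 75, 20),
  TotalDeaths = c(10, 5, 7, 1),
  Population = c(1000, 2000, 1500, 500)
)

# Keep only the columns of interest
incomebased <- subset(coviddata,
                      select = c(Income, TotalCases, TotalDeaths, location))

# Keep only rows where Income is "High" OR "Low"
country <- subset(incomebased,
                  incomebased$Income == "High" | incomebased$Income == "Low")

print(country)
```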

18
Q

How to organise your workflow?

A

Just keep it all organised

Input files

  1. R script
  2. Text Data file

Output Files

  1. Graphics File
  2. Results output
  3. R data file
19
Q

Difference between a relative and absolute path?

A

Relative Paths

  • Relative path –> e.g. read.csv("../data/trees.csv"), where ../ means "go up one directory"
  • Relative path –> tells R to load data that lies in a different directory (folder), relative to your current location (the working directory)

Absolute Path

  • One that specifies the whole path on your computer
  • Note –> Absolute paths are specific to a computer –> hence, they should be avoided
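
A self-contained sketch of how a relative path is resolved. It builds a throwaway project folder inside tempdir(); the folder name project_demo, the code/data layout and the file contents are all assumptions made for this example.

```r
# Build a small project layout in a temporary folder (hypothetical layout)
base_dir <- file.path(tempdir(), "project_demo")
dir.create(file.path(base_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(base_dir, "code"), recursive = TRUE, showWarnings = FALSE)

# Write a tiny data file into data/
write.csv(data.frame(height = c(10, 12)),
          file.path(base_dir, "data", "trees.csv"), row.names = FALSE)

# Work from code/, then go up one level (../) and into data/
old_wd <- setwd(file.path(base_dir, "code"))
trees <- read.csv("../data/trees.csv")
setwd(old_wd)  # restore the previous working directory

print(trees)
```

The same relative path works on any computer that has the same folder layout, which is why it is preferred over an absolute path.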
20
Q

What is the *apply family in R?

A
  • There is a family of functions called *apply in R that vectorize your code for you.
  • For example, apply can be used when you want to apply a function to the rows or columns of a matrix
  • Better used with matrices –> why? If you give apply a data frame, R will need to coerce it to a matrix first.

Example

  1. Take the mean of each row –> RowMeans <- apply(M, 1, mean)

This code takes the mean of each row (the ‘1’ selects rows) of the matrix ‘M’

  2. Take the mean of each column –> ColMeans <- apply(M, 2, mean)

This code takes the mean of each column (the ‘2’ selects columns) of the matrix ‘M’
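
A worked sketch on a small matrix, so the results can be checked by hand. The matrix contents are an assumption; the cards do not say what M holds.

```r
# 2x3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
M <- matrix(1:6, nrow = 2, ncol = 3)

RowMeans <- apply(M, 1, mean)  # 1 = apply over rows
ColMeans <- apply(M, 2, mean)  # 2 = apply over columns

print(RowMeans)
print(ColMeans)
```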

21
Q

What are the basic plotting commands in R?

A
  1. Change marker colour –> col = "colour"
  2. Change marker type –> pch = # (a number)
  3. Change x-axis label –> xlab = ""
  4. Change y-axis label –> ylab = ""
  5. Change the main title –> main = ""
  6. Change border colour in histogram –> border = ""
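
A minimal sketch that uses each of the arguments above. The data are randomly generated stand-ins and the labels are made up.

```r
# Made-up data for illustration
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

# Scatter plot: marker colour, marker type, axis labels and title
plot(x, y, col = "blue", pch = 20,
     xlab = "x value", ylab = "y value", main = "Example scatter plot")

# Histogram: fill colour and border colour
h <- hist(x, col = "grey", border = "red",
          xlab = "x value", main = "Example histogram")
```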
22
Q

Scatter plot Example

A

plot(log10(MyDF$Predator.mass),log10(MyDF$Prey.mass),pch=20)

23
Q

Histogram Example

A

hist(log10(MyDF$Predator.mass), xlab = "log10(Predator Mass (g))", ylab = "Count")

24
Q

How can we combine multiple plots on the same page?

A

We can create two plots and use the par function to place them on the same page so they can be compared

How do we do that?

You start your code with the following

  1. par(mfcol=c(2,1))
  2. par(mfg = c(1,1))

The first line tells R to create a multi-panel plot –> 2,1 indicates the organisation – (Rows, Columns)

The second line tells R where the next plot goes –> e.g. the following plot is placed in row 1, column 1
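
A sketch of a full two-panel figure built this way. The predator and prey vectors are randomly generated stand-ins for the real columns, and panel_layout is just a name used here to show the layout R is using.

```r
# Made-up data standing in for the log10 body masses
set.seed(42)
predator <- rnorm(100, mean = 3)
prey <- rnorm(100, mean = 1)

par(mfcol = c(2, 1))   # 2 rows, 1 column of panels
par(mfg = c(1, 1))     # next plot goes in row 1, column 1
hist(predator, xlab = "log10(Predator mass)", main = "Predator")

par(mfg = c(2, 1))     # next plot goes in row 2, column 1
hist(prey, xlab = "log10(Prey mass)", main = "Prey")

panel_layout <- par("mfcol")  # query the layout currently in use
print(panel_layout)
```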

25
Q

How can you overlay plots?

A

To overlay plots you don’t use the par function. Instead, write the code for each histogram and define the colour and transparency –> add = T in the second hist() call tells R to draw it on top of the existing plot

hist(log10(MyDF$Predator.mass), # Predator histogram
     xlab = "log10(Body Mass (g))", ylab = "Count",
     col = rgb(1, 0, 0, 0.5), # Note 'rgb', fourth value sets the transparency (alpha)
     main = "Predator-prey size Overlap")

hist(log10(MyDF$Prey.mass), col = rgb(0, 0, 1, 0.5), add = T) # Plot prey

legend("topleft", c("Predators", "Prey"), # Add legend
       fill = c(rgb(1, 0, 0, 0.5), rgb(0, 0, 1, 0.5))) # Define legend colors

26
Q

Boxplot example

A

Boxplots are useful for getting a visual summary of the distribution of your data.

boxplot(log10(MyDF$Predator.mass), xlab = "Location", ylab = "log10(Predator Mass)", main = "Predator mass")

27
Q

How to save your graphics to a pdf?

A

How to save your graphics to the correct location –> the graphics file?

Open your code with: pdf("../results/Pred_Prey_Overlay.pdf", 11.7, 8.3) # Open a blank pdf page using a relative path; the numbers are the page width and height in inches

Follow this with any piece of graphics code…

End your code with –> graphics.off() or dev.off() # Closes the pdf so R stops adding things to it
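
The open, plot, close sequence can be sketched as follows. To keep the example runnable anywhere, it writes to a temporary file rather than ../results/; the file name example_plot.pdf and the plotted data are assumptions.

```r
# Hypothetical output location; the card uses a relative path into ../results/
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file, 11.7, 8.3)   # open the pdf: width and height in inches

# Any graphics code goes here
plot(1:10, (1:10)^2, pch = 20,
     xlab = "x", ylab = "x squared", main = "Saved to pdf")

dev.off()                  # close the pdf so the file is written

print(file.exists(out_file))
```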

28
Q

How can you subset out a specific categorical level?

A

covid <- subset(covid, covid$Diabet_Cat != "High")

!= –> means ‘not equal to’, so every row where Diabet_Cat is ‘High’ is dropped