Stats 1 - R Refresher Flashcards
How do you assign a number to a variable in R? How can you manipulate these variables mathematically?
Store information in variables with the assignment operator <- –> e.g. a <- 4
You can then apply mathematical operations and functions to this variable –> e.g. a * a, a^2 (squared), sqrt(a) (square root)
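A minimal sketch of assigning a value and doing arithmetic on it. The names `b`, `sq` and `root` are only illustrative:

```r
# Assign the number 4 to the variable a using the assignment operator <-
a <- 4

# Arithmetic on the stored value
b <- a * a       # multiplication: 16
sq <- a^2        # ^ raises to a power: also 16
root <- sqrt(a)  # square root: 2

print(c(b, sq, root))
```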
What are Vectors in R and how do you build them?
Vectors - Vector = an ordered set of values –> think of it as a single row or column in a spreadsheet
Code (the function c() combines values into a vector)
v <- c(0, 1, 2, 3, 4)
Can you apply mathematical functions to a vector?
You can apply many mathematical functions to a vector –> e.g. mean(v), var(v) (variance), median(v), sum(v), prod(v), length(v) (number of elements)
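A short sketch applying these functions to the example vector c(0, 1, 2, 3, 4); the values in the comments are what R returns for this particular vector:

```r
# Example vector
v <- c(0, 1, 2, 3, 4)

mean(v)    # arithmetic mean: 2
var(v)     # sample variance (divides by n - 1): 2.5
median(v)  # middle value: 2
sum(v)     # total: 10
prod(v)    # product of all elements: 0, because v contains a 0
length(v)  # number of elements: 5
```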
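What are the types of parentheses in R?
- ( ) –> call functions and group calculations, e.g. sqrt(a) or (2 + 3) * 4
- [ ] –> index (select) elements, e.g. v[2] or MyDF[1, 2]
- { } –> group several lines of code into one block, e.g. the body of a function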
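A minimal sketch of where each bracket type shows up in R. The function `square_plus_one` and the list `lst` are hypothetical examples, and `[[ ]]` (double square brackets for lists) is included as a related form:

```r
v <- c(10, 20, 30)

# ( ) calls a function and groups arithmetic
total <- sum(v)          # 60
grouped <- (2 + 3) * 4   # 20

# [ ] indexes into a vector (also matrices and data frames)
second <- v[2]           # 20

# [[ ]] extracts a single element from a list
lst <- list(x = 1, y = "a")
y_value <- lst[["y"]]    # "a"

# { } groups several lines into one block, e.g. a function body
square_plus_one <- function(x) {
  y <- x^2
  y + 1
}
square_plus_one(3)       # 10
```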
What are the different variable types?
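There are 4 main variable types
- Integer –> whole numbers
- Float/Numeric –> real numbers (R calls these "numeric")
- String –> character/text (R calls these "character")
- Boolean –> TRUE or FALSE, treated as 1 and 0 in calculations (R calls these "logical")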
How do you ask R what variable type you have?
To figure out the variable type –> class(v)
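A quick sketch of class() on one value of each type. Note that a plain number such as 5 is stored as "numeric"; the L suffix (5L) is needed to get an integer:

```r
class(5L)       # "integer"   (the L suffix makes a whole number an integer)
class(5)        # "numeric"   (plain numbers are stored as real numbers)
class(3.14)     # "numeric"   (R's name for floats/real numbers)
class("hello")  # "character" (R's name for strings)
class(TRUE)     # "logical"   (R's name for Booleans)
```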
How can you convert between variable types?
Types of Conversions and Special Values:
- as.x() –> convert between variable types
- as.integer(3.1) –> 3
- as.roman(155) –> CLV
- as.character(155) –> "155"
Note that it is now shown between quotation marks –> this indicates that it is a character
- as.logical(5) –> TRUE
Note that R maps every nonzero number to TRUE, whereas 0 becomes FALSE
Note that R uses E notation for scientific notation –> 1e4 = 10000, 5e-2 = 0.05
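A sketch of these conversions in one place. The as.numeric() lines are extra illustrations, assuming the usual behaviour that text which is not a number becomes the special value NA (with a warning); as.roman() comes from the utils package, which R loads by default:

```r
as.integer(3.1)                     # 3 (the decimal part is dropped)
as.character(155)                   # "155"
as.logical(5)                       # TRUE  (any nonzero number)
as.logical(0)                       # FALSE
as.numeric("2.5")                   # 2.5
as.numeric("abc")                   # NA, with a warning: the text is not a number
as.character(utils::as.roman(155))  # "CLV"
1e4                                 # 10000
5e-2                                # 0.05
```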
What are data structures in R?
Data structures = Different ways to store and manipulate data
Outline the data structure - Vector.
Vector - Fundamental data structure/object in R
Think of vectors as single rows or columns on a spreadsheet
Example
v1 <- c(0.02, 0.5, 1)
v2 <- c("a", "bc", "def", "ghij")
v3 <- c(TRUE, TRUE, FALSE)
BUT! Vectors can only store data of one TYPE –> e.g. all numeric, all character, etc.
If you try to combine different types –> R will coerce them all to one common type
v1 <- c(0.02, "Mary", 1)
[1] "0.02" "Mary" "1"
The function c() "coerces" arguments of mixed types (strings/text, real numbers, logical values, etc.) to a common type; here every element became a character.
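A sketch of coercion in action. The extra lines show the order R follows when it picks the common type (logical, then integer, then numeric, then character):

```r
mixed <- c(0.02, "Mary", 1)
mixed          # "0.02" "Mary" "1"
class(mixed)   # "character"

# Coercion goes to the "highest" type present:
# logical -> integer -> numeric -> character
c(TRUE, 2L)    # logical + integer   -> integer:   1 2
c(TRUE, 2.5)   # logical + numeric   -> numeric:   1.0 2.5
c(1, "a")      # numeric + character -> character: "1" "a"
```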
Outline the data structure - Matrix & Array.
Matrix –> a 2-dimensional vector (has both rows and columns)
mat1 <- matrix(1:25, 5, 5)
This creates a matrix with the numbers 1 to 25 in a 5x5 grid (filled column by column).
dim(mat1) –> retrieves the dimensions of the matrix
Array –> can store data in more than two dimensions (e.g., a stack of 2-D matrices).
For example…
array1 <- array(1:50, c(5, 5, 2))
array1 produces two 5x5 matrices stacked on top of each other –> with values going from 1 to 50
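A sketch of building and indexing the matrix and array from this card. Indices go [row, column] for a matrix and [row, column, layer] for this 3-D array:

```r
mat1 <- matrix(1:25, 5, 5)          # 5 rows, 5 columns, filled column by column
dim(mat1)                           # 5 5
mat1[2, 1]                          # row 2, column 1: 2
mat1[1, 2]                          # row 1, column 2: 6

array1 <- array(1:50, c(5, 5, 2))   # two 5x5 layers
dim(array1)                         # 5 5 2
array1[1, 1, 2]                     # first cell of the second layer: 26
array1[, , 1]                       # the whole first 5x5 layer (values 1 to 25)
```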
In matrices and arrays, do the data types need to be homogeneous?
Just like vectors, the data types MUST be homogeneous!
If they are not, R will automatically coerce them to a common type
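A small sketch of what happens when one text value sneaks into a matrix: every entry becomes a character, so numeric calculations on the matrix stop working:

```r
m <- matrix(c(1, 2, "three", 4), nrow = 2)
m           # every entry is now text: "1", "2", "three", "4"
typeof(m)   # "character"

# A numeric calculation on the coerced matrix fails, e.g. m * 2 gives
# the error "non-numeric argument to binary operator"
```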
Outline the data structure - Data Frame
How can you build a data frame?
Data Frame
Very important & useful data structure
Why??? –> Each column can have a different data type (e.g. one numeric column and one character column)! The values within a single column, however, must all be the same type.
Building Data frames
- Build the individual columns as vectors of the same length –> e.g. Col1, Col2, Col3
- Join these columns into the same ‘spreadsheet’ using the data.frame function
MyDF <- data.frame(Col1, Col2, Col3)
- Change the column names using the following code –> avoid spaces (a name with spaces must be wrapped in backticks to be used with $)
names(MyDF) <- c("MyFirstColumn", "MySecondColumn", "My.Third.Column")
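A minimal sketch of the three steps, with made-up column values. stringsAsFactors = FALSE is set so that the text column stays character in every R version (R 4.0 and later already do this by default):

```r
# Step 1: build the individual columns (hypothetical values), all the same length
Col1 <- c(1.5, 2.0, 3.2)          # numeric
Col2 <- c("oak", "ash", "elm")    # character
Col3 <- c(TRUE, FALSE, TRUE)      # logical

# Step 2: join them into one data frame
MyDF <- data.frame(Col1, Col2, Col3, stringsAsFactors = FALSE)

# Step 3: rename the columns
names(MyDF) <- c("MyFirstColumn", "MySecondColumn", "My.Third.Column")

str(MyDF)   # 3 obs. of 3 variables, each column keeps its own type
```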
How can you select specific columns in a data frame?
You can target specific columns instead of having to print the entire data frame –> using the $ symbol
MyDF$MyFirstColumn
or….
You can also access specific rows/columns using numerical indexing
MyDF[r, c] –> r = row number, c = column number –> leave either blank to select all rows or all columns (e.g. MyDF[2, ] or MyDF[, 3])
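A sketch comparing $ selection with [row, column] indexing on a small data frame; the values are made up for illustration:

```r
MyDF <- data.frame(
  MyFirstColumn   = c(1.5, 2.0, 3.2),
  MySecondColumn  = c("oak", "ash", "elm"),
  My.Third.Column = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

MyDF$MyFirstColumn   # whole column by name: 1.5 2.0 3.2
MyDF[2, 1]           # row 2, column 1: 2
MyDF[2, ]            # all columns of row 2
MyDF[, 2]            # all rows of column 2: "oak" "ash" "elm"
MyDF[1:2, c("MyFirstColumn", "My.Third.Column")]  # rows 1-2 of two named columns
```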
Outline the data structure - Lists
How can you create a list?
List –> just a way to combine different things together –> a simple ordered collection of objects
List is used to collect a group of data objects of different sizes and types (e.g., one whole data frame and one vector can both be in a single list)
Creating a list
MyList <- list(species = c("Quercus robur", "Fraxinus excelsior"), age = c(123, 84))
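A sketch of pulling elements back out of the list, plus a second list (`MyList2`, a hypothetical example) holding a data frame and a character value side by side:

```r
MyList <- list(species = c("Quercus robur", "Fraxinus excelsior"),
               age = c(123, 84))

MyList$species        # access an element by name with $
MyList[["age"]]       # or with double square brackets: 123 84
MyList[["age"]][1]    # first value inside that element: 123

# Elements can have different types and sizes, e.g. a data frame plus a text value
MyList2 <- list(trees = data.frame(height = c(20, 25)), note = "survey 1")
length(MyList2)       # 2
```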
If data frames are more flexible and can be used for more things, why use matrices at all?
The problem is that data frames can be slow when working with large amounts of numbers and heavy mathematical calculations –> in such cases, you should convert the data frame to a matrix (e.g. with as.matrix())
BUT! For statistical analysis, plotting, etc –> Data frames are more convenient
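A sketch of switching between the two, assuming an all-numeric data frame (if any column were text, as.matrix() would produce a character matrix instead). It only shows the conversion; it makes no timing claim:

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

m <- as.matrix(df)   # all-numeric data frame -> numeric matrix
class(m)             # "matrix" "array" (in R 4.0 and later)
m %*% t(m)           # matrix multiplication works on the matrix
colSums(m)           # column totals: x = 6, y = 15

# Summaries with named columns remain convenient on the data frame
summary(df)
```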