Data Analytics R Flashcards
Add 5 and 49
5+49
Subtract 5 from 49
49-5
Display a sequence of integers from 1 to 20
1:20
Multiply 3 by 5
3*5
Divide 12 by 4
12 / 4
Two to the power of three
2^3
Square root of 2
sqrt(2)
Sine of pi
sin(pi)
Exponential of 1
exp(1)
Log 10 to the base of e
log(10)
Log 10 to the base of 10
log(10, base = 10)
Add a comment in R
#
Add a mass comment in R
Ctrl + Shift + C in RStudio (comments or uncomments the selected lines)
Remove a variable
rm(var) or remove(var)
How do you investigate a dataset or function in R?
?function
What is the first thing you should do when presented with a dataset?
Investigate the variables - use head(dataset) or ?dataset in the console.
How do you refer to variables created in the R markdown?
Inline code in backticks: `r var_name`
How do you print to the markdown?
- Type the variable name that you want to print directly
- paste()
- paste0()
How do you assign to an object? What is the benefit of this?
Use <-
We can store it in the R workspace and save it for future use.
How do you calculate the mean?
mean(x)
How do you calculate the variance?
var(x)
Longhand: sum((x - mean(x))^2) / (length(x) - 1)
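A quick check that the longhand formula agrees with var(). This is a minimal sketch; the values in x are made up for illustration.

```r
# Hypothetical sample data
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample variance written out: squared deviations from the mean,
# divided by (number of observations - 1)
longhand <- sum((x - mean(x))^2) / (length(x) - 1)

# Compare with the built-in function, allowing for floating-point rounding
same <- isTRUE(all.equal(var(x), longhand))
print(same)
```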
Find the size / the number of objects in a vector or list.
length(x)
Find the maximum, minimum and range of a vector of objects.
- max(x)
- min(x)
- range(x) - this returns both the minimum and the maximum
What function is used to collect things together into a vector?
x <- c(0, 7, 8)
What is a vector in R?
A vector is a sequence of data elements of the same basic type. Members of a vector are called Components.
Join the vector x and y together
c(x, y)
Extract the 4th, 6th and 8th elements of a vector
x[ c(4, 6, 8) ]
Extract everything from the 3rd to 9th element
x[ 3:9 ]
Extract the second element of a vector
x[ 2 ]
Remove everything from the 3rd to 9th element
x[ - (3:9) ]
Can arithmetic operators be performed element-by-element on vectors?
Yes, the operation is performed on each element.
Eg y^x is y1 ^ x1, y2 ^ x2, y3 ^ x3 etc.
- If you sum two vectors in R it takes the element-wise sum
What functions can be used to obtain patterned vectors?
- rep()
- seq()
How do you generate a sequence?
- seq()
seq(from, to, by, length.out)
from: Starting element of the sequence
to: Ending element of the sequence
by: Difference between the elements
length.out: Desired length of the sequence
How do you replicate elements in a vector?
rep(x, times = 1, length.out = NA, each = 1)
x: The object to replicate
times: The number of times to replicate object
length.out: Repeated x as many times as necessary to create vector of this length
each: Number of times to replicate individual elements in object
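The seq() and rep() arguments described above can be tried out as follows. The inputs are arbitrary illustrative values.

```r
# seq(): control the step with 'by' or the number of elements with 'length.out'
s1 <- seq(1, 10, by = 3)           # 1 4 7 10
s2 <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep(): repeat the whole vector, each element, or fill to a fixed length
r1 <- rep(c("a", "b"), times = 2)        # "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)         # "a" "a" "b" "b"
r3 <- rep(c("a", "b"), length.out = 3)   # "a" "b" "a"

print(s1); print(s2); print(r1); print(r2); print(r3)
```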
Create a character vector containing three colours
colours <- c("red", "yellow", "green")
Extract or replace substrings in a character vector.
substr(x, start, stop)
x: the current character vector
start: position of the first character to extract
stop: position of the last character to extract
What are different ways to format pastes?
sep = ":"
Leveraging 'collapse' to combine a vector into a single string - paste(c('Apple', 'Banana', 'Cherry'), collapse = ', ')
What four things should be at the top of an R notebook?
install.packages("tidyverse")
# install.packages("plotly")
library(tidyverse)
library(plotly)
What does paste() do for a vector?
It works element by element, returning a character vector with one pasted string per element
How do you print vectors as one whole (ie not like paste which prints each element individually)?
cat()
Need to have “\n” to break between prints.
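A short comparison of paste(), paste0() and cat(). The fruits vector is a made-up example.

```r
fruits <- c("Apple", "Banana", "Cherry")

# paste() works element by element and returns a character vector
labelled <- paste("Fruit:", fruits)        # "Fruit: Apple" "Fruit: Banana" "Fruit: Cherry"

# collapse joins the elements into one single string
joined <- paste(fruits, collapse = ", ")   # "Apple, Banana, Cherry"

# paste0() is paste() with no separator
ids <- paste0("id", 1:3)                   # "id1" "id2" "id3"

# cat() writes everything on one line; "\n" ends the line
cat(fruits, "\n")
```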
How do you find the remainder of division?
Modulo - %%
What does a variable allow you to do?
Store a value or function in R
What types of data are used in R?
- Numerics
- Integers (which are also numerics)
- Logical
- Characters (text or string)
How can you check the data type of a variable?
class() function
- class(varName)
What is a vector?
A one-dimensional array that can hold numeric data, character data or logical data. It is a simple tool to store data.
How do you create a vector?
The combine function c().
Place the vector elements separated by a comma.
Can use the created vector to do calculations.
How do you name a vector? Why might you want to do this?
names(vector) function
- names(some_vector) <- c("Name", "Profession")
- Naming aids understanding of the data you are using, and what each element refers to
How can you calculate the sum of all elements in a vector?
sum(x)
How do you compare values?
What are these called?
< for less than
> for greater than
<= for less than or equal to
>= for greater than or equal to
== equal to each other
!= not equal to each other
Relational operators
What is the comparison operator for equal to?
==
How do you select elements of a vector (or matrix or dataframe)?
Use of square brackets - indicate which element you want to select eg vector[3] or vector[c(2,3,4)] or vector[2:4].
Or could use the names of the vector elements (assigned with names(vector)) eg vector["Position"]
What is the index of the first element of a vector in R?
1
What is returned when you use comparison operators on a vector in R?
The command tests every element of the vector to see if the condition stated by the comparison operator is true or false. Get a vector of logicals.
Instead of selecting a subset of days to investigate yourself, you can get R to return only the days with eg a positive return. How do you do this?
selection_vector <- vector > 0
new_vector <- vector[selection_vector]
- R knows how to handle it when you pass in a logical vector into the square brackets. It will only select elements that correspond to TRUE in the selection vector.
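The selection-vector idea can be sketched like this. The daily returns are hypothetical values.

```r
# Hypothetical daily returns, named by weekday
returns <- c(Mon = -2, Tue = 5, Wed = 0, Thu = 3, Fri = -1)

# The comparison gives one TRUE/FALSE per element
selection_vector <- returns > 0

# Only elements where the selection vector is TRUE are kept
positive_days <- returns[selection_vector]
print(positive_days)   # Tue = 5, Thu = 3
```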
What is a matrix in R?
A collection of elements of the same data type arranged into a fixed number of rows and columns (2D)
How do you construct a matrix in R?
The matrix() function
matrix(1:9, byrow = TRUE, nrow = 3)
or matrix(c(1,2,3,4,5,6,7,8,9), byrow = TRUE, nrow = 3)
or matrix(1:6, nrow = 2, ncol = 3)
- byrow indicates that the matrix is filled by the rows (FALSE if the matrix is filled by the columns)
Can also add the argument dimnames = list(rownames, colnames), where rownames and colnames are character vectors (row names come first). Because they are variables, they are not wrapped in quotes.
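A small example of building a named matrix. The names row_names and col_names are illustrative variables, not anything built in.

```r
row_names <- c("r1", "r2")
col_names <- c("a", "b", "c")

# Fill by rows; dimnames takes the row names first, then the column names
m <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(row_names, col_names))
print(m)

# Names can then be used for selection
value <- m["r2", "b"]   # 5
totals <- rowSums(m)    # r1 = 6, r2 = 15
```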
How do you name a matrix in R? Why would you want to do this?
- You can add names for the rows and columns:
rownames(matrix)
colnames(matrix)
- Naming a matrix helps us read the data and is useful for selecting certain elements from the matrix
How do you calculate the sum of each row in a matrix?
rowSums(matrix)
What function do you use to add a new column to a matrix?
The cbind() function - this merges matrices and/or vectors together by a column
- cbind(matrix1, matrix2, vector)
What function do you use to add a new row to a matrix?
The rbind() function
How do you investigate the contents of the workspace?
The ls() function
How do you select elements from a matrix?
Square brackets
- matrix[1,2] selects the element at the first row and second column
- matrix[,1] selects all elements of the first column
- matrix[1,] selects all elements of the first row
How do standard operators like + / - * work on matrices in R?
Standard operators work in an element-wise way on matrices in R.
NB: The matrix1 * matrix2 creates a matrix where each element is the product of the corresponding elements. This is not standard matrix multiplication (achieved by %*%)
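The difference between * and %*% is easiest to see on a small matrix. This sketch uses an arbitrary 2x2 example.

```r
# Filled by columns (the default): rows are (1, 3) and (2, 4)
m <- matrix(1:4, nrow = 2)

# Element-wise product: each element multiplied by its counterpart
elementwise <- m * m      # rows (1, 9) and (4, 16)

# True matrix multiplication: rows of the first times columns of the second
matprod <- m %*% m        # rows (7, 15) and (10, 22)

print(elementwise)
print(matprod)
```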
What is a factor?
A factor is a statistical data type used to store categorical variables.
How do you create factors in R?
The factor() function.
factor_vector <- factor(vector)
Firstly, you need to create a vector that contains all observations that belong to a limited number of categories.
By default the function factor() transforms a vector into an unordered factor.
How do you create an ordered vector of factors?
This is possible for ordinal categorical variables, those with a natural ordering.
factor_temp_vector <- factor(temp_vector, ordered = TRUE, levels = c("low", "medium", "high"))
How would you change the names of factors?
Change the names of the factor levels for clarity or other reasons using the function levels()
levels(x) <- c("name1", "name2", …)
Check the order in which you assign the levels
What do you need to check before renaming factor levels?
The order of the current labels - check the output. R will automatically assign alphabetically if the order is not assigned.
How can you give a quick overview of the contents of a variable?
summary()
Male and female are what kind of factor levels?
Unordered (nominal) - using comparison operators is meaningless, so R returns NA with a warning. R attaches an equal value to the levels for such factors.
How do you create an ordered factor?
Use the factor() function with two additional arguments.
factor(x, ordered = TRUE, levels = c("lev1", "lev2"…))
It may be more efficient to internally code the levels of the factor as integers. How do you do this?
as.integer(x), where x is the factor holding the categorical variable
How do you output the levels of a factor?
levels(x)
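The factor cards above can be tied together in one small example. The temperature readings are invented for illustration.

```r
temp_vector <- c("high", "low", "medium", "low")

# Ordered factor: the levels argument fixes the order low < medium < high
temp_factor <- factor(temp_vector, ordered = TRUE,
                      levels = c("low", "medium", "high"))

lvls <- levels(temp_factor)                  # "low" "medium" "high"
codes <- as.integer(temp_factor)             # 3 1 2 1
is_hotter <- temp_factor[1] > temp_factor[2] # TRUE, because high > low

print(lvls); print(codes); print(is_hotter)
summary(temp_factor)                         # counts per level
```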
What is an array in R?
An array is a more general way to store data. Array objects can hold data with two or more dimensions.
How do you create an array?
The array() function
- eg a <- array(1:24, c(3,4,2))
- array(numbers, dimensions)
- creates a 3x4x2 array
What is a data frame in R?
R provides a data structure, called a data frame, for collecting vectors into one object, which we can imagine as a table. More specifically, a data frame is an ordered collection of vectors, where the vectors must all be the same length but can be different types.
They are like matrices but the columns have their own names. Columns can be different types.
How do you investigate the column names of a data frame?
names(df)
How do you access the columns in the data frame?
Use the $ symbol
df$colName - this alone will print the data in the column and any associated levels
What function applies a function to each value in a vector?
sapply(x, function)
How do you check if two vectors are completely identical (same length and same elements in the same position)?
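identical(x, y): This returns TRUE only if the two vectors are exactly the same, including their length and the position of every element. all(x == y) checks that all corresponding elements are equal, but it recycles the shorter vector when the lengths differ and can return NA when elements are missing.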
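A small illustration of why length matters when comparing vectors. The vectors are arbitrary, chosen so that recycling hides the difference.

```r
x <- c(1, 2, 3, 1, 2, 3)
y <- c(1, 2, 3)

# y is recycled to length 6, so every element-wise comparison is TRUE
elementwise_equal <- all(x == y)    # TRUE

# identical() also compares length (and type), so it reports the difference
exactly_equal <- identical(x, y)    # FALSE

print(elementwise_equal)
print(exactly_equal)
```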
What does the for() statement do?
Allows one to specify that a certain operation should be repeated a fixed number of times.
How do you create a vector of length 12, which can hold numeric values?
Fib <- numeric(12)
How do you assign the value 1 to the first two elements of a vector?
vector[1:2] <- 1
What is the format of a for loop in R?
eg for (i in 3:12)
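Putting the last three cards together, one possible sketch fills Fib with the first 12 Fibonacci numbers. The loop body is an assumption about what the cards are building towards.

```r
Fib <- numeric(12)   # vector of 12 zeros
Fib[1:2] <- 1        # first two elements set to 1

# Each remaining element is the sum of the two before it
for (i in 3:12) {
  Fib[i] <- Fib[i - 2] + Fib[i - 1]
}

print(Fib)   # 1 1 2 3 5 8 13 21 34 55 89 144
```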
What does an if statement allow you to do?
Allows you to control the statements that are executed
What are functions?
Self-contained units of code. They generally take inputs, do calculations and produce outputs.
What is the format of a function?
fun <- function(y) {
x <- 3*y
x  # the last evaluated expression is the return value
}
What does attach() do?
Adds a data frame to R's search path so you can use its column names directly, eg colName rather than df$colName. Often used with datasets loaded from a package.
What happens when you try to add these two vectors?
p <- c(3,5,6,8)
q <- c(3,3,3)
You get a warning message about object length.
R uses the recycle rule when vectors have different lengths, i.e. it re-uses elements from the shorter vector (starting at the beginning of the vector).
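Running the example shows the recycling rule in action.

```r
p <- c(3, 5, 6, 8)
q <- c(3, 3, 3)

# q is recycled to (3, 3, 3, 3); R warns because 4 is not a multiple of 3
total <- p + q
print(total)   # 6 8 9 11
```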
I want to select only the rows from a data frame where the column “Gender” has “M” in it. How do I do this?
output <- df[df$Gender == "M", ]
How do you count the number of NA elements in a vector?
num_NA <- sum(is.na(x))
Does the length() function consider NA values?
Yes
What are two ways to remove NAs from a vector?
na.omit(x) or x[!is.na(x)]
How do you remove all rows with NA values?
df[complete.cases(df), ]
complete.cases(df): This function returns a logical vector indicating which rows have no NA values. Rows that have no missing values are marked as TRUE.
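The NA cards above, applied to a made-up vector and data frame:

```r
x <- c(1, NA, 3, NA)

num_NA <- sum(is.na(x))       # 2
total_length <- length(x)     # 4, because NA values are counted
without_NA <- x[!is.na(x)]    # 1 3

# Hypothetical data frame with missing values in two different rows
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
clean <- df[complete.cases(df), ]   # only the first row has no NA
print(clean)
```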
When creating a function, how do you create a default argument? eg raising one number to the power of another
powerFunc <- function(base, exponent = 2) {
result <- base^exponent
print(result)
}
If a second argument isn’t specified, the exponent defaults to 2.
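Calling the function with and without the second argument shows the default taking effect. The function is repeated here so that the snippet runs on its own.

```r
powerFunc <- function(base, exponent = 2) {
  result <- base^exponent
  print(result)
}

squared <- powerFunc(3)                # exponent defaults to 2, prints 9
cubed <- powerFunc(2, 3)               # positional second argument, prints 8
named <- powerFunc(2, exponent = 5)    # named argument, prints 32
```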
What is a data frame in R?
A data frame collects vectors into one object, which we can imagine as a table. More specifically, a data frame is an ordered collection of vectors, where the vectors must all be the same length but can be different types.
A dataframe has the observations as rows and the variables as columns.
What functions can you use to get an oversight of very large data frames?
head() shows the first observations and tail() shows the last observations. Both print a top line called “header” which contains the different variables in the dataset.
Another method used to get rapid oversight of the data is the function str() - this shows the structure of the dataset. Structure: number of observations, number of variables, list of variable names, the data type of each variable and its first observations
How do you construct a data frame?
Using data.frame(), you can pass in all the vectors of equal length.
Use str() to confirm your understanding of the data frame.
How do you select elements from a data frame?
Square brackets
df[1,2] selects the value at the first row and second column
df[1,] selects all elements of the first row
Can use variable names for columns as well as numerics.
How do you select an entire column of a data frame?
df[, "colName"]
df$colName
How do you select elements from a data frame with a certain condition in a certain column?
The subset() function
subset(df, subset = condition)
eg subset(df, subset = rings) - this works when rings is a logical (TRUE/FALSE) column
eg subset(df, subset = (diameter < 1))
How do you sort the data according to a certain variable in the dataset?
order(x) - gives the indices that would put the elements in ascending order (the position of the smallest element first, then the next smallest, and so on)
x[order(x)] - uses the output of order(x) to index the vector, producing the vector rearranged in ascending order.
For a df:
positions <- order(df$column)
df[positions, ]
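A worked example of order() on a vector and on a data frame. The names and scores are invented.

```r
x <- c(30, 10, 20)

idx <- order(x)        # 2 3 1: the smallest value is at position 2
sorted <- x[idx]       # 10 20 30

# Same idea for a data frame: reorder the rows by one column
df <- data.frame(name = c("a", "b", "c"), score = c(30, 10, 20))
positions <- order(df$score)
df_sorted <- df[positions, ]
print(df_sorted)

# Descending order
df_desc <- df[order(df$score, decreasing = TRUE), ]
```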
What is a list in R?
A list in R programming is a generic object consisting of an ordered collection of objects.
It allows you to gather a variety of objects under one name.
How do you construct a list?
The list() function
The arguments are the list components
How do you construct a named list?
Naming the components is useful
list <- list(name1 = var1, name2 = var2)
If you want to name list components after it has been created - use the names() function
names(list) <- c("name1", "name2")
How do you name the components of a list?
Use the names() function
names(list) <- c("name1", "name2")
How do you select elements from a list?
A list is built with numerous elements and components, so getting a single element is not always straightforward.
One way to select a component is using the numbered position (note double brackets)
list[[1]]
You can also refer to the names of the components
list[["reviews"]]
list$reviews
Can also select specific elements from a component in the list
list[[1]][2] - select from the first component the second element
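The selection styles above, on a small hypothetical list. The name my_list and its contents are illustrative.

```r
my_list <- list(title = "Example", reviews = c(4.5, 3.8, 5.0))

by_position <- my_list[[2]]          # the reviews vector
by_name <- my_list[["reviews"]]      # same component, selected by name
by_dollar <- my_list$reviews         # same again, using $

# Second element of the second component
second_review <- my_list[[2]][2]     # 3.8

# Single brackets return a list, not the component itself
still_a_list <- my_list["reviews"]
print(class(still_a_list))           # "list"
```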
How would you change all values that are 0 in the column X in data to 2?
data$X[data$X == 0] <- 2
How can you output the even rows of a data frame?
rows <- nrow(data)
even_rows <- seq_len(rows) %% 2
data[even_rows == 0, ]
Or in one line: data[seq_len(nrow(data)) %% 2 == 0, ]
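Both approaches can be checked against a small made-up data frame. A seq() version is included as a third option.

```r
data <- data.frame(id = 1:6, value = letters[1:6])

# Modulo approach: row numbers with remainder 0 when divided by 2
even_a <- data[seq_len(nrow(data)) %% 2 == 0, ]

# seq() approach: generate the even row numbers directly
even_b <- data[seq(2, nrow(data), by = 2), ]

print(even_a)   # rows with id 2, 4 and 6
```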