Functions: Matrices and Dataframes Flashcards
exploring the data frame called bsale
head(bsale) # Show me the first few rows
str(bsale) # Show me the structure of the data
View(bsale) # Open the data in a new window
names(bsale) # What are the names of the columns?
nrow(bsale) # How many rows are there in the data?
calculate descriptives from column vectors
mean(bsale$age) # What was the mean age?
table(bsale$color) # How many boats were there of each color?
max(bsale$price) # What was the maximum price?
notice that you have to specify both the data frame and the column you want a statistic of, in the form dataframe$column
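Here is a minimal sketch of the same three calls on a small data frame. `boats` and its values are hypothetical stand-ins invented for illustration; they are not the original bsale data.

```r
# Hypothetical stand-in for bsale (invented values, for illustration only)
boats <- data.frame(age   = c(5, 12, 8, 20),
                    color = c("green", "red", "green", "blue"),
                    price = c(1500, 900, 2100, 400))

mean(boats$age)     # 11.25
table(boats$color)  # counts per color: blue 1, green 2, red 1
max(boats$price)    # 2100

# Without the boats$ prefix, R looks for a standalone object called age
# and fails unless one happens to exist in your workspace:
# mean(age)
```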
adding new columns to a data frame
bsale$id <- 1:nrow(bsale)
bsale$age.decades <- bsale$age / 10
bsale$profit <- bsale$price - bsale$cost
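The three assignments above can be tried on a small hypothetical data frame. `boats` and its price and cost values are invented for illustration only.

```r
# Hypothetical data, for illustration only
boats <- data.frame(age   = c(5, 12, 8, 20),
                    price = c(1500, 900, 2100, 400),
                    cost  = c(1000, 700, 1800, 500))

boats$id <- 1:nrow(boats)                 # 1, 2, 3, 4
boats$age.decades <- boats$age / 10       # 0.5, 1.2, 0.8, 2.0
boats$profit <- boats$price - boats$cost  # 500, 200, 300, -100

names(boats)  # "age" "price" "cost" "id" "age.decades" "profit"
```

Assigning to a column name that does not exist yet creates it; each new vector has one value per row.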
What was the mean price of green boats?
with(bsale, mean(price[color == "green"]))
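A sketch comparing the with() version to the longer $ version, on a hypothetical `boats` data frame with invented values. Both lines should give the same answer.

```r
# Hypothetical data, for illustration only
boats <- data.frame(color = c("green", "red", "green", "blue"),
                    price = c(1500, 900, 2100, 400))

# Long form: repeat the data frame name for every column
long.way <- mean(boats$price[boats$color == "green"])   # 1800

# with(): name the data frame once
short.way <- with(boats, mean(price[color == "green"]))  # 1800
```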
matrix
can contain either character or numeric columns, but not both (every element must be the same type)
combinations of vectors of the SAME LENGTH
data frame
can contain BOTH character and numeric columns
the more flexible and more widely used data structure in R
combinations of vectors
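A short sketch of the practical difference. Combining a numeric and a character vector into a matrix coerces everything to character, while a data frame keeps each column's own type. (In R versions before 4.0, data.frame() turned strings into factors by default, so `chr` would print as a factor there.)

```r
# Matrix: every element must share one type, so the numbers become text
m <- cbind(c(1, 2, 3), c("a", "b", "c"))
class(m[, 1])        # "character"

# Data frame: each column keeps its own type
df <- data.frame(num = c(1, 2, 3), chr = c("a", "b", "c"))
sapply(df, class)    # num: "numeric", chr: "character"
```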
common functions to create matrices and data frames
cbind(), rbind()
cbind() and rbind() both create matrices by combining several vectors of the same length. cbind() combines vectors as columns, while rbind() combines them as rows
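A minimal example of the difference, using two made-up vectors of length 3:

```r
x <- 1:3
y <- 4:6

by.col <- cbind(x, y)  # 3 x 2 matrix: x and y become columns
by.row <- rbind(x, y)  # 2 x 3 matrix: x and y become rows

dim(by.col)  # 3 2
dim(by.row)  # 2 3
```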
matrix()
The matrix() function creates a matrix from a single vector of data. The function has 4 main inputs: data (a vector of data), nrow (the number of rows you want in the matrix), ncol (the number of columns you want in the matrix), and byrow (a logical value indicating whether you want to fill the matrix by rows; the default, FALSE, fills it by columns).
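A small example of how byrow changes the fill order for the same six values:

```r
# Fill column by column (the default, byrow = FALSE)
m.col <- matrix(data = 1:6, nrow = 2, ncol = 3)
m.col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Fill row by row
m.row <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
m.row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```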
data.frame()
survey <- data.frame("index" = c(1, 2, 3, 4, 5),
                     "sex" = c("m", "m", "m", "f", "f"),
                     "age" = c(99, 46, 23, 54, 23))
functions for previewing matrices and data frames
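The exploration calls used on bsale at the top of these cards, plus a few related base R functions, are applied here to the built-in ToothGrowth data. This is one reasonable selection, not an exhaustive list.

```r
# ToothGrowth ships with base R (datasets package), so no setup is needed
head(ToothGrowth)     # first 6 rows
tail(ToothGrowth)     # last 6 rows
str(ToothGrowth)      # column names, types, and first few values
summary(ToothGrowth)  # per-column summaries
names(ToothGrowth)    # "len" "supp" "dose"
nrow(ToothGrowth)     # 60
ncol(ToothGrowth)     # 3
dim(ToothGrowth)      # 60 3
# View(ToothGrowth)   # spreadsheet-style viewer (interactive sessions only)
```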
changing a column name in a data frame
names(df)[names(df) == "old.name"] <- "new.name"
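The same pattern applied to the survey data frame from the data.frame() card. The new name `age.years` is a hypothetical choice for illustration.

```r
survey <- data.frame(index = c(1, 2, 3, 4, 5),
                     sex   = c("m", "m", "m", "f", "f"),
                     age   = c(99, 46, 23, 54, 23))

# names(survey) == "age" is TRUE only at the position of the age column,
# so only that element of names(survey) is replaced
names(survey)[names(survey) == "age"] <- "age.years"
names(survey)  # "index" "sex" "age.years"
```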
add a new column to a data frame (or overwrite an existing one)
survey$sex <- c("m", "m", "f", "f", "m")  # survey already has a sex column, so this overwrites it
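A sketch where the column really is new: this version of survey is assumed to start without a sex column, so the same assignment creates one.

```r
# Assumed starting point: survey without a sex column
survey <- data.frame(index = c(1, 2, 3, 4, 5),
                     age   = c(99, 46, 23, 54, 23))

# sex does not exist yet, so assigning to survey$sex creates it
# (the vector needs one value per row; a single value is recycled to every row)
survey$sex <- c("m", "m", "f", "f", "m")

names(survey)  # "index" "age" "sex"
```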
slicing with [ , ]
Return row 1 (all columns)
df[1, ]
Return column 5 (all rows)
df[, 5]
Return rows 1 to 5 of column 2
df[1:5, 2]
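The three patterns can be tried on the built-in mtcars data frame (32 rows, 11 columns), which has enough columns for df[, 5] to work:

```r
row1  <- mtcars[1, ]         # row 1, all columns (still a data frame)
col5  <- mtcars[, 5]         # column 5 (drat), all rows (a numeric vector)
part  <- mtcars[1:5, 2]      # rows 1 to 5 of column 2 (cyl)
named <- mtcars[1:5, "cyl"]  # same thing, selecting the column by name
```

Leaving the row or column position empty means "all of them".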
slicing example
Give me the rows 1-6 and column 1 of ToothGrowth
ToothGrowth[1:6, 1]
## [1]  4.2 11.5  7.3  5.8  6.4 10.0
slicing example 2
Give me rows 1-3 and columns 1 and 3 of ToothGrowth
ToothGrowth[1:3, c(1,3)]
##    len dose
## 1  4.2  0.5
## 2 11.5  0.5
## 3  7.3  0.5
slicing example 3
Give me the 1st row (and all columns) of ToothGrowth
ToothGrowth[1, ]
##   len supp dose
## 1 4.2   VC  0.5
slicing with logical vectors
Create a new df with only the rows of ToothGrowth
# where supp equals VC
ToothGrowth.VC <- ToothGrowth[ToothGrowth$supp == "VC", ]
# where supp equals OJ and dose < 1
ToothGrowth.OJ.a <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose < 1, ]
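To see why this works, it helps to look at the logical vector itself: the condition yields one TRUE or FALSE per row, and putting it in the row slot keeps only the TRUE rows. The name `is.vc` is just a helper variable for this sketch.

```r
# The condition produces one TRUE/FALSE per row of ToothGrowth
is.vc <- ToothGrowth$supp == "VC"
head(is.vc)  # TRUE TRUE TRUE TRUE TRUE TRUE
sum(is.vc)   # 30 rows have supp == "VC"

# [is.vc, ] keeps only the rows where is.vc is TRUE
ToothGrowth.VC <- ToothGrowth[is.vc, ]
nrow(ToothGrowth.VC)    # 30

# & requires both conditions to be TRUE for a row to be kept
ToothGrowth.OJ.a <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose < 1, ]
nrow(ToothGrowth.OJ.a)  # 10 (the OJ rows at dose 0.5)
```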
subset()
Get rows of ToothGrowth where len < 20 AND supp == "OJ" AND dose >= 1
one of the most useful functions when using data frames
subset(x = ToothGrowth,
       subset = len < 20 &
         supp == "OJ" &
         dose >= 1)
##     len supp dose
## 41 19.7   OJ    1
## 49 14.5   OJ    1
create a subset data frame
oj <- subset(x = ToothGrowth,
             subset = supp == "OJ")
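A quick check of the result, plus subset()'s select argument, which picks columns in the same call. The name `oj.len` is just an illustrative variable.

```r
oj <- subset(x = ToothGrowth, subset = supp == "OJ")
nrow(oj)  # 30

# select keeps only the listed columns (column names are written unquoted)
oj.len <- subset(x = ToothGrowth, subset = supp == "OJ", select = c(len, dose))
names(oj.len)  # "len" "dose"
```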
with()
The with() function helps save you some typing when you are using multiple columns from a data frame. Specifically, it lets you name a data frame (or another object, such as a list) once at the start of an expression; then, for every object you refer to inside that expression, R first looks for it among that data frame's columns.
with() examples
health <- data.frame("age" = c(32, 24, 43, 19, 43),
                     "height" = c(1.75, 1.65, 1.50, 1.92, 1.80),
                     "weight" = c(70, 65, 62, 79, 85))
# Save typing by using with()
with(health, height / weight ^ 2)
## [1] 0.00036 0.00039 0.00039 0.00031 0.00025
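A sketch confirming that with() is only a shorthand: the long $ version and the with() version compute exactly the same values.

```r
health <- data.frame(age    = c(32, 24, 43, 19, 43),
                     height = c(1.75, 1.65, 1.50, 1.92, 1.80),
                     weight = c(70, 65, 62, 79, 85))

# Without with(): repeat health$ for every column
long.way <- health$height / health$weight ^ 2

# With with(): name the data frame once
short.way <- with(health, height / weight ^ 2)

identical(long.way, short.way)  # TRUE
```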