RStudio Code for Intro to Statistics, Module 4 Flashcards
We have a data set “wild_cats” with 13 rows and 3 columns. Only the first 5 rows list a species in the “Spp” column (“cheetah”, “lion”, “ocelot”, “lynx”, and “tiger”).
The “BodyWt” and “BrainWt” columns give values for these five species. Rows 6-13 contain NA in the “BrainWt” column and are blank in the “Spp” and “BodyWt” columns.
How can we shrink the data set to just the five relevant species?
query_cats = is.na(wild_cats$BrainWt)
index_cats = which(query_cats)
wild_cats_new = wild_cats[-index_cats , ]
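A minimal sketch of the approach above, run on a hypothetical hand-built stand-in for “wild_cats” (the weights are placeholders, not real measurements). It also shows one pitfall of negative indexing with which(): if no NA values are found, which() returns an empty vector and the negative index removes every row. Indexing with the logical vector !is.na(...) avoids that.

```r
# Hypothetical stand-in for wild_cats: 5 species rows followed by 8 empty rows.
# The weights are made-up placeholders, not real measurements.
wild_cats = data.frame(
  Spp     = c("cheetah", "lion", "ocelot", "lynx", "tiger", rep("", 8)),
  BodyWt  = c(50, 150, 10, 20, 200, rep(NA, 8)),
  BrainWt = c(300, 500, 200, 250, 1300, rep(NA, 8)),
  stringsAsFactors = FALSE
)

# Flashcard approach: locate the NA rows, then drop them with a negative index.
query_cats = is.na(wild_cats$BrainWt)
index_cats = which(query_cats)
wild_cats_new = wild_cats[-index_cats, ]
nrow(wild_cats_new)      # 5

# Pitfall: on data with no NAs, which() returns integer(0), and
# indexing with -integer(0) selects zero rows instead of all of them.
no_na_index = which(is.na(wild_cats_new$BrainWt))
emptied = wild_cats_new[-no_na_index, ]
nrow(emptied)            # 0

# Safer: keep the rows whose BrainWt is NOT missing (logical indexing).
wild_cats_safe = wild_cats[!is.na(wild_cats$BrainWt), ]
nrow(wild_cats_safe)     # 5
nrow(wild_cats_safe[!is.na(wild_cats_safe$BrainWt), ])  # still 5
```

The logical version gives the same result on the original data and behaves correctly when it is run a second time on already-cleaned data.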
Now that we have a table of only our five relevant cat species, we want to make a scatter-plot relating the values in the “BodyWt” and “BrainWt” columns to one another.
We want “BodyWt” to be the x-variable and “BrainWt” to be the y-variable.
What line of code should we write?
plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt)
We have generated a scatter-plot relating body weight and brain weight in the “wild_cats_new” data set, but we want to change our open circles on the graph to solid black equilateral triangles.
What is the new line of code we run to generate a graph with such points?
plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
pch = 17)
We have generated a scatter-plot relating body weight and brain weight in the “wild_cats_new” data set, but we want to change the label on the x-axis to “Body Weight (kg)” and the label on the y-axis to “Brain Weight (g)”.
What is the new line of code we run to generate a graph with these axis labels?
plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
xlab = "Body Weight (kg)" ,
ylab = "Brain Weight (g)")
We have generated a scatter-plot relating body weight and brain weight in the “wild_cats_new” data set, but we want to specify that the x-axis starts at 0 kilograms and runs to 250 kilograms, and the y-axis starts at 150 grams and runs to 1500 grams.
What is the new line of code we run to generate a graph with these axis limits?
plot(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
xlim = c(0 , 250) ,
ylim = c(150 , 1500))
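With this call, R does not make the plotting region run exactly from 0 to 250 and from 150 to 1500: under the default axis style (xaxs = "r", yaxs = "r") it extends each range by 4% at each end. The sketch below, using hypothetical placeholder weights, reads the plotting region back with par("usr") to show this, then uses xaxs = "i" and yaxs = "i" for axes that start and end exactly at the limits. pdf(NULL) opens a null graphics device so the sketch runs without a screen; in an interactive RStudio session you can drop that line and the dev.off() line.

```r
pdf(NULL)  # null graphics device: nothing is drawn, but coordinates are set up

# Hypothetical placeholder data (not real measurements).
BodyWt  = c(50, 150, 10, 20, 200)
BrainWt = c(300, 500, 200, 250, 1300)

# Default style "r": the requested limits are padded by 4% on each side.
plot(x = BodyWt, y = BrainWt, xlim = c(0, 250), ylim = c(150, 1500))
usr_default = par("usr")   # c(x_min, x_max, y_min, y_max) of the plot region
usr_default                # -10  260  96  1554

# Style "i" (internal): the axes start and end exactly at the given limits.
plot(x = BodyWt, y = BrainWt, xlim = c(0, 250), ylim = c(150, 1500),
     xaxs = "i", yaxs = "i")
usr_exact = par("usr")
usr_exact                  # 0  250  150  1500

invisible(dev.off())
```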
What line of code would we write if we wanted to hard-code the word “TIGER” at the coordinates (62 , 1320) on a scatter-plot?
text(x = 62 , y = 1320, labels = "TIGER")
What line of code would we write if we wanted to assign appropriate labels to all five of the wild cat species in our scatter-plot, with the labels themselves appearing to the right of the plotted symbol?
text(x = wild_cats_new$BodyWt ,
y = wild_cats_new$BrainWt ,
labels = wild_cats_new$Spp ,
pos = 4)
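Putting the scatter-plot options from these cards into one script gives a sketch like the following, run on a hypothetical stand-in for “wild_cats_new” with placeholder numbers. The one addition is xpd = TRUE in text(): by default, text is clipped to the plot region, so a label placed to the right of a point near the edge can be cut off. pdf(NULL) only makes the sketch runnable without a screen.

```r
pdf(NULL)  # null graphics device so the sketch runs without a screen

# Hypothetical stand-in for wild_cats_new (placeholder numbers, not real measurements).
wild_cats_new = data.frame(
  Spp     = c("cheetah", "lion", "ocelot", "lynx", "tiger"),
  BodyWt  = c(50, 150, 10, 20, 200),
  BrainWt = c(300, 500, 200, 250, 1300),
  stringsAsFactors = FALSE
)

plot(x = wild_cats_new$BodyWt,
     y = wild_cats_new$BrainWt,
     pch = 17,                       # solid triangles
     xlab = "Body Weight (kg)",
     ylab = "Brain Weight (g)",
     xlim = c(0, 250),
     ylim = c(150, 1500))

# pos: 1 = below, 2 = left, 3 = above, 4 = right of each point.
# xpd = TRUE lets a label near the edge extend past the plot box instead of being clipped.
text(x = wild_cats_new$BodyWt,
     y = wild_cats_new$BrainWt,
     labels = wild_cats_new$Spp,
     pos = 4,
     xpd = TRUE)

invisible(dev.off())
```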
We have the data set “malaria_Afr”, which has nine columns and 1,508 rows (samples).
The columns are:
“outdoor_occupation” (0 or 1),
“microsc” (0 or 1 or NA),
“pcr” (0 or 1 or NA),
“x” (numbers between 702725.8 and 715037.3),
“y” (numbers between 8913068 and 8928818),
“gender” (0 or 1),
“age” (numbers between 0.25 and 90),
“time_Afr” (numbers between 0 and 36), and
“occupation” (14 different jobs listed)
We want to create a frequency table of the “occupation” column (the number of samples in each occupation) and call it “occ_table”.
What code should we write?
occ_table = table(malaria_Afr$occupation)
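table() counts how many times each distinct value appears. A small sketch with a hypothetical occupation vector (made-up values, not the real malaria_Afr data) shows two defaults worth knowing: text categories come out in alphabetical order, and NA values are left out of the counts unless useNA is set.

```r
# Hypothetical stand-in for malaria_Afr$occupation (made-up values).
occupation = c("farmer", "student", "farmer", "trader", NA, "student", "farmer")

# Count how many samples fall in each occupation.
occ_table = table(occupation)
occ_table            # farmer 3, student 2, trader 1 (alphabetical order)

sum(occ_table)       # 6: the NA entry is not counted by default

# useNA = "ifany" adds a separate <NA> count when missing values exist.
occ_table_na = table(occupation, useNA = "ifany")
sum(occ_table_na)    # 7
```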
After creating our “occ_table” from the “malaria_Afr” data set, we want a bar plot showing how many of the 1,508 sampled individuals work in each occupation.
What code should we write?
barplot(occ_table)
After writing “barplot(occ_table)”, we want to make the x-axis labels vertical so that they all fit on the plot.
How should the code be altered?
barplot(occ_table, las = 2)
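las sets the orientation of axis tick labels, so las = 2 turns the x-axis category names perpendicular to the axis. Long names can then run off the bottom of the figure; enlarging the bottom margin with par(mar = ...) is one common fix. The sketch uses a hypothetical occ_table built from made-up values and also captures barplot()'s invisible return value, the x position of each bar's midpoint.

```r
pdf(NULL)  # null graphics device so the sketch runs without a screen

# Hypothetical counts standing in for occ_table (made-up values).
occ_table = table(c("farmer", "farmer", "student", "trader", "trader", "trader"))

# las controls label orientation for both axes:
#   0 = parallel to the axis (default), 1 = always horizontal,
#   2 = perpendicular to the axis,      3 = always vertical.
# mar is c(bottom, left, top, right) in lines of text; the default is c(5.1, 4.1, 4.1, 2.1).
old_par = par(mar = c(8, 4, 4, 2))  # more room at the bottom for rotated labels
mids = barplot(occ_table, las = 2)
par(old_par)                        # restore the previous margins

# barplot() invisibly returns the bar midpoints; with the default
# width = 1 and space = 0.2 these are 0.7, 1.9, 3.1, ...
mids

invisible(dev.off())
```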
After creating our bar plot of occupations from the “malaria_Afr” data set with vertical labels, we want to add the overall title “Primary Occupation”.
How should the code be altered?
barplot(occ_table, las = 2,
main = "Primary Occupation")
After establishing our “Primary Occupation” bar plot, we decide that we want the bars to be red rather than the default gray.
How should the line of code be altered?
barplot(occ_table, las = 2,
main = "Primary Occupation",
col = "red")
Now that our “Primary Occupation” bar plot has red bars, we want to label the y-axis “Number of Samples” and the x-axis “Profession”.
How should the line of code be altered?
barplot(occ_table, las = 2,
main = "Primary Occupation",
col = "red",
ylab = "Number of Samples",
xlab = "Profession")
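Assembled into one runnable script, the options read as follows. As an optional refinement not on the card, sort(..., decreasing = TRUE) orders the bars from most to least common, which often makes a frequency bar plot easier to read. The counts are hypothetical, and pdf(NULL) only makes the sketch runnable without a screen.

```r
pdf(NULL)  # null graphics device so the sketch runs without a screen

# Hypothetical counts standing in for occ_table (made-up values).
occ_table = table(c("farmer", "farmer", "student", "trader", "trader", "trader"))

# Optional: order the bars from most to least common.
occ_sorted = sort(occ_table, decreasing = TRUE)
names(occ_sorted)   # "trader" "farmer" "student"

old_par = par(mar = c(8, 5, 4, 2))  # extra bottom margin for the rotated labels
barplot(occ_sorted, las = 2,
        main = "Primary Occupation",
        col = "red",
        ylab = "Number of Samples",
        xlab = "Profession")
par(old_par)                        # restore the previous margins

invisible(dev.off())
```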
We want to use a box plot to examine how the variable “age” varies with the variable “occupation” in the “malaria_Afr” data set, and we want the y-axis labeled “Age [years]”.
What would be the correct line of code?
boxplot(age ~ occupation,
data = malaria_Afr, ylab = "Age [years]")
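boxplot() draws the plot and also invisibly returns the summary numbers behind it, which is a quick way to check what each box shows. The sketch below uses a hypothetical eight-row stand-in for malaria_Afr (made-up ages and occupations) and adds las = 2, optional here, to turn the long occupation names perpendicular to the axis.

```r
pdf(NULL)  # null graphics device so the sketch runs without a screen

# Hypothetical stand-in for malaria_Afr (made-up ages and occupations).
malaria_demo = data.frame(
  age        = c(25, 40, 33, 18, 60, 45, 12, 15),
  occupation = c("farmer", "farmer", "farmer", "trader", "trader",
                 "trader", "student", "student"),
  stringsAsFactors = FALSE
)

# Formula interface: one box of age for each occupation.
b = boxplot(age ~ occupation, data = malaria_demo,
            ylab = "Age [years]", las = 2)

b$names    # box order: "farmer" "student" "trader" (alphabetical)
b$n        # observations per box: 3 2 3
b$stats    # rows: lower whisker, lower hinge, median, upper hinge, upper whisker

invisible(dev.off())
```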
In the “malaria_Afr” data set, the column “gender” is coded as 0 for men and 1 for women. Before making a box plot, we want these codes expressed as the words ‘Men’ and ‘Women’.
What query sequence could we use to create a new column, “gender_M”, that holds ‘Men’ for code 0 and ‘Women’ for code 1?
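malaria_Afr$gender_M = 'Women'            # start every row as 'Women' (code 1)
query_M = malaria_Afr$gender == 0         # TRUE for men (code 0)
index_M = which(query_M)                  # row numbers of the men
malaria_Afr$gender_M[index_M] = 'Men'     # relabel those rows as 'Men'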
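The same recoding can also be done without a query/index sequence. In the sketch below, on a hypothetical five-row data frame, ifelse() assigns both words in one step, and factor() with labels produces a categorical variable whose level order (Men, then Women) sets the left-to-right order of the boxes in boxplot(). Note that ifelse() returns NA wherever gender is NA rather than filling in a default word.

```r
# Hypothetical stand-in for malaria_Afr$gender (0 = men, 1 = women).
malaria_demo = data.frame(gender = c(0, 1, 1, 0, 1))

# ifelse(): "Women" where gender == 1, "Men" otherwise (NA stays NA).
malaria_demo$gender_M = ifelse(malaria_demo$gender == 1, "Women", "Men")

# factor(): maps 0 -> "Men" and 1 -> "Women" and fixes the level order.
malaria_demo$gender_f = factor(malaria_demo$gender,
                               levels = c(0, 1),
                               labels = c("Men", "Women"))

malaria_demo
```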