Quiz Questions Flashcards
Do setting the working directory and creating a project accomplish the same task? “yes” or “no”.
yes
When writing in vectors or variables from scratch in R, which variable takes quotes?
character variables
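As a small illustration (the vector names are made up for this example), character values are written in quotes while numeric values are not:

```r
# Character values go in quotes; numeric values do not.
student <- c("Jim", "Valerie")  # character vector
test_score <- c(80, 91)         # numeric vector

# class() reports the type R assigned to each vector.
class(student)
class(test_score)
```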
What file types can you load into R?
csv .csv
excel .xlsx
R .Rdata
stata .dta
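A minimal sketch of loading these file types follows. The temporary files and object names are made up for illustration, and the Excel and Stata lines assume the readxl and haven packages are installed, so they are left as comments:

```r
# Write a small CSV to a temporary file, then read it back with base R.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(student = c("Jim", "Valerie"), test_score = c(80, 91)),
          csv_path, row.names = FALSE)
scores_csv <- read.csv(csv_path)

# Save an object to an .Rdata file, remove it, then load it back.
rdata_path <- tempfile(fileext = ".Rdata")
saved_scores <- scores_csv
save(saved_scores, file = rdata_path)
rm(saved_scores)
load(rdata_path)  # restores the object named saved_scores

# Excel and Stata files need add-on packages (assumed to be installed):
# scores_xlsx <- readxl::read_excel("scores.xlsx")  # hypothetical file name
# scores_dta  <- haven::read_dta("scores.dta")      # hypothetical file name
```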
What command would you use to make the equivalent of an Excel Pivot Table in R? What is the code?
summarize data
summary(data)
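To see what this looks like in practice, here is a sketch on a made-up data frame. The grouped breakdown with aggregate() is not part of the answer above; it is included as an assumption about what a pivot-table-style view might involve:

```r
# Hypothetical data: test scores for students in two sections.
data <- data.frame(section = c("A", "A", "B", "B"),
                   test_score = c(80, 91, 75, 88))

# Per-column summary statistics (min, quartiles, mean, max for numeric columns).
summary(data)

# Assumption: a pivot-table-style breakdown, here the mean score by section.
by_section <- aggregate(test_score ~ section, data = data, FUN = mean)
by_section
```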
What does the table( ) command do in R?
It helps you count values in variables or vectors.
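For instance, with a made-up vector of letter grades:

```r
# Hypothetical vector of letter grades.
grades <- c("A", "B", "A", "C", "A")

# table() counts how many times each distinct value appears.
grade_counts <- table(grades)
grade_counts
```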
What does ifelse( ) do? Provide an example to demonstrate your knowledge of the command.
It classifies values according to a condition, often into a binary 0 or 1.
For example, say I have a data frame with two test scores, one from Jim and one from Valerie, and I want to classify whether each is an A grade. I could write something like the following:
# create data frame
test_scores = data.frame(student = c("Jim", "Valerie"),
                         test_score = c(80, 91))
# load the tidyverse for %>% and mutate()
library(tidyverse)
# create variable to classify whether scores were an A (1 = A, 0 = not an A)
test_scores =
  test_scores %>%
  mutate(A_grade = ifelse(test_score >= 90, 1, 0))
Calculate the sample standard deviation of the following three numbers: 3, 6, and 9. Please show all of your work (hint: sample mean > sample variance > standard deviation).
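Sample mean:
$$\bar{x} = \frac{\sum x_i}{N} = \frac{3 + 6 + 9}{3} = \frac{18}{3} = 6$$
Sample variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{(3 - 6)^2 + (6 - 6)^2 + (9 - 6)^2}{3 - 1} = \frac{9 + 0 + 9}{2} = \frac{18}{2} = 9$$
Sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{9} = 3$$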
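The same steps can be checked in R. This is a sketch with made-up variable names; it compares the hand calculation with the built-in var() and sd(), which both use the N - 1 denominator:

```r
x <- c(3, 6, 9)
n <- length(x)

# Step 1: sample mean.
sample_mean <- sum(x) / n

# Step 2: sample variance, dividing by N - 1.
sample_var <- sum((x - sample_mean)^2) / (n - 1)

# Step 3: sample standard deviation.
sample_sd <- sqrt(sample_var)

# Built-in equivalents for comparison.
var(x)
sd(x)
```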
If you had a choice between having your data normally distributed or distributed according to any other distribution, which one would you pick and why? Hint: you may want to draw something.
I would pick a normal distribution. It is a symmetric, unimodal bell curve, so the share of data within each standard deviation of the mean is predictable: roughly 68% within one, 95% within two, and 99.7% within three. Relatedly, the normal distribution is the foundation for most statistical inference.
Then draw graph: Standard Normal Distribution with Standard Deviation Labels
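As a rough check on the labels for that graph, the share of a standard normal distribution within 1, 2, and 3 standard deviations of the mean can be computed with pnorm(). The variable names are made up for this sketch:

```r
# Number of standard deviations from the mean.
k <- 1:3

# Area under the standard normal curve between -k and +k.
share_within <- pnorm(k) - pnorm(-k)
round(share_within, 3)
```

Calling curve(dnorm(x), from = -4, to = 4) followed by abline(v = -3:3, lty = 2) would draw the bell curve with a dashed line at each standard deviation.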
Draw a boxplot with its five main elements, making sure to explain each of them.
- minimum: the lowest value of the data
- first quartile (Q1): 25th percentile of the data
- median (Q2): the middle value of the data (not the mean!)
- third quartile (Q3): 75th percentile
- maximum: the highest value of the data
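The five elements above can be computed directly in R. This sketch uses made-up scores and the default quantile() method:

```r
# Hypothetical data: nine scores.
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# quantile() returns the minimum, Q1, median, Q3, and maximum by default.
five_numbers <- quantile(scores)
five_numbers
```

Calling boxplot(scores) would draw the corresponding plot.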
How are frequency distributions different from probability distributions? Show your understanding of the difference using an example. When writing your answer, make sure to mention the Central Limit Theorem and Law of Large Numbers.
Frequency distributions capture how often values occur in the data we actually observe, whereas probability distributions capture how often values should occur in theory. For example, a fair coin has a theoretical probability of 0.5 for heads, but 10 flips might produce 7 heads. As our sample size approaches infinity (i.e., as 𝑁 → ∞), our frequency distribution converges on the true probability distribution according to the Law of Large Numbers. However, that is not all we learn as 𝑁 → ∞. More specifically, per the Central Limit Theorem, the distribution of our sample means becomes more normally distributed as 𝑁 → ∞, even when the underlying data are not normal. That is a good thing, given the earlier answer about preferring normally distributed data.
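Both ideas can be illustrated with a small coin-flip simulation. This is a sketch only: the seed, sample sizes, and variable names are arbitrary choices for the example.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

# Law of Large Numbers: the observed share of heads approaches 0.5 as N grows.
flips_small <- rbinom(10, size = 1, prob = 0.5)
flips_large <- rbinom(100000, size = 1, prob = 0.5)
mean(flips_small)
mean(flips_large)

# Central Limit Theorem: the means of many samples of 100 flips cluster
# around 0.5 in a bell shape, even though each flip is only 0 or 1.
sample_means <- replicate(5000, mean(rbinom(100, size = 1, prob = 0.5)))
mean(sample_means)
sd(sample_means)  # theory suggests sqrt(0.5 * 0.5 / 100) = 0.05
```

Calling hist(sample_means) would show the bell shape of the sample means.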
What is an estimand?
the true quantity of interest in a statistical analysis
What is the 𝑧-score associated with a 95% confidence interval?
1.96
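This value can be recovered from the standard normal quantile function. The sketch below assumes a two-sided interval, so half of the remaining 5% sits in each tail:

```r
confidence <- 0.95
alpha <- 1 - confidence

# Two-sided interval: alpha / 2 in each tail, so look up the 97.5th percentile.
z_95 <- qnorm(1 - alpha / 2)
round(z_95, 2)
```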
If you had the choice between a highly biased but reliable estimate or a highly noisy estimate, which one would you pick and why?
I would pick the highly biased but reliable estimate. If the bias is consistent, I could simply adjust for it. By contrast, the noisy estimate would be very difficult to get any grasp on.
What is a Type II error in the context of Null Hypothesis Significance Testing (NHST)? Provide an example to show that you understand the concept.
A Type II error is a false negative: that is, failing to reject the null hypothesis when it is false.
Example: We know that over a large sample size, eating ice cream generally makes Denly very happy. Thus, eating ice cream generally is going to push Denly’s happiness toward the positive right tail of the distribution. By contrast, if we ran a test and found that eating ice cream did not make Denly happy, that would generally be a false negative. In other words, we should have rejected the null hypothesis that eating ice cream has no effect on Denly’s happiness. Because we didn’t do so, we committed a Type II error.
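A simulation can show how often this happens when a real effect exists but the sample is small. Everything here is an assumption made for illustration: the effect size, sample size, number of simulated studies, and the 0.05 threshold.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

true_effect <- 0.5   # hypothetical true boost in happiness (the null is false)
n_per_study <- 10    # hypothetical small sample per study

# Run many simulated studies and test the null hypothesis of zero effect.
p_values <- replicate(
  2000,
  t.test(rnorm(n_per_study, mean = true_effect, sd = 1))$p.value
)

# Share of studies that fail to reject a false null: the Type II error rate.
type2_rate <- mean(p_values >= 0.05)
type2_rate
```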
Which is bigger: a 95% confidence interval or a 99% confidence interval? Draw them to show that you understand.
The 99% confidence interval is bigger. It leaves only 1% of the distribution in the tails (the pink areas in the drawing), compared with 5% for the 95% CI, so it has to stretch further from the estimate to capture more of the distribution.
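The difference in width can be computed for a made-up estimate. The estimate and standard error below are hypothetical values chosen only for this sketch:

```r
estimate <- 50   # hypothetical sample mean
std_error <- 2   # hypothetical standard error

# Critical values for two-sided 95% and 99% intervals.
z_95 <- qnorm(0.975)
z_99 <- qnorm(0.995)

# Interval = estimate plus or minus z times the standard error.
ci_95 <- estimate + c(-1, 1) * z_95 * std_error
ci_99 <- estimate + c(-1, 1) * z_99 * std_error

diff(ci_95)  # width of the 95% interval
diff(ci_99)  # width of the 99% interval
```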