Quiz Questions Flashcards

1
Q

Do setting the working directory and creating a project accomplish the same task? “yes” or “no”.

A

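yes. Both tell R which folder to read files from and save files to; opening an RStudio project sets the working directory to the project folder automatically.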
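
A minimal base-R sketch of checking and setting the working directory. `tempdir()` is an assumed stand-in for a real project folder; in practice, opening an RStudio project sets this for you.

```r
# Show the folder R currently reads from and saves to
old_wd <- getwd()
print(old_wd)

# Point R at another folder; tempdir() stands in for a real project folder
project_dir <- tempdir()
setwd(project_dir)
print(getwd())

# Relative paths such as "data.csv" now resolve inside that folder
print(file.path(getwd(), "data.csv"))
```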

2
Q

When typing vectors or variables from scratch in R, which type of variable takes quotes?

A

character (string) variables, e.g. "Jim"; numeric values are typed without quotes
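
A short sketch contrasting the two types; the names `student`, `test_score`, and `score_as_text` are illustrative.

```r
# Character values need quotes; numeric values do not
student <- c("Jim", "Valerie")
test_score <- c(80, 91)

class(student)     # "character"
class(test_score)  # "numeric"

# Quoting a number turns it into text, so arithmetic no longer works on it
score_as_text <- c("80", "91")
class(score_as_text)  # "character"
```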

3
Q

What file types can you load into R?

A

CSV: .csv
Excel: .xlsx
R data: .RData
Stata: .dta
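
A minimal sketch of loading two of these formats with base R only, using a hypothetical `scores` data frame written to a temporary folder so the example is self-contained. Excel and Stata files need add-on packages (readxl and haven), shown as comments rather than run.

```r
# Build a small example data frame to save and reload
scores <- data.frame(student = c("Jim", "Valerie"), test_score = c(80, 91))

# .csv: write.csv() / read.csv() ship with base R
csv_path <- file.path(tempdir(), "scores.csv")
write.csv(scores, csv_path, row.names = FALSE)
scores_csv <- read.csv(csv_path)

# .RData: save() stores R objects by name; load() restores them
rdata_path <- file.path(tempdir(), "scores.RData")
save(scores, file = rdata_path)
rm(scores)
load(rdata_path)  # recreates `scores` in the current environment

# Excel and Stata files need add-on packages (not run here):
# readxl::read_excel("scores.xlsx")
# haven::read_dta("scores.dta")
```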

4
Q

What command would you use to make the equivalent of an Excel Pivot Table in R? What is the code?

A

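summarize() from the dplyr package (loaded with the tidyverse), usually after group_by():
data %>% group_by(group_var) %>% summarize(new_var = sum(value_var))
Note that summary(data) is a different command: it prints descriptive statistics for each column.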
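
A minimal sketch of a pivot-table-style summary with dplyr, assuming a hypothetical `sales` data frame; `region` plays the role of the pivot table's rows and `amount` the values field.

```r
library(dplyr)

# Hypothetical sales data: one row per transaction
sales <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  amount = c(100, 150, 200, 50, 25)
)

# Pivot-table equivalent: rows = region, values = total and count of amount
pivot <- sales %>%
  group_by(region) %>%
  summarize(total = sum(amount), n = n())

print(pivot)
```

Each row of `pivot` is one region: North totals 250 across 2 sales, and South totals 275 across 3.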

5
Q

What does the table( ) command do in R?

A

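It builds a frequency table: it counts how many times each distinct value appears in a variable or vector. Given two variables, it produces a cross-tabulation.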
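
A quick sketch with made-up `grades` and `semester` vectors showing a one-way count and a two-way cross-tabulation.

```r
# Count how often each value appears
grades <- c("A", "B", "A", "C", "A", "B")
counts <- table(grades)
print(counts)  # A: 3, B: 2, C: 1

# Two variables give a cross-tabulation (rows x columns)
semester <- c("Fall", "Fall", "Spring", "Spring", "Spring", "Fall")
crosstab <- table(grades, semester)
print(crosstab)
```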

6
Q

What does ifelse( ) do? Provide an example to demonstrate your knowledge of the command.

A

It classifies each value one of two ways based on a condition, often as a binary 0 or 1. Its arguments are the test, the value to return where the test is TRUE, and the value to return where it is FALSE.

For example, say I have a data frame of two test scores from Jim and Valerie, and I want to classify whether each is an A grade. I could write something like the following:
# create data frame
test_scores = data.frame(student = c("Jim", "Valerie"),
                         test_score = c(80, 91))
# create variable to classify whether scores were an A
library(tidyverse)
test_scores =
  test_scores %>%
  mutate(A_grade = ifelse(test_score >= 90, 1, 0))
# result: A_grade is 0 for Jim (80) and 1 for Valerie (91)

7
Q

Calculate the sample standard deviation of the following three numbers: 3, 6, and 9. Please show all of your work (hint: sample mean > sample variance > standard deviation).

A

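Sample mean:
$$\bar{x} = \frac{\sum x_i}{N} = \frac{3 + 6 + 9}{3} = \frac{18}{3} = 6$$

Sample variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{(3 - 6)^2 + (6 - 6)^2 + (9 - 6)^2}{3 - 1} = \frac{9 + 0 + 9}{2} = \frac{18}{2} = 9$$

Sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{9} = 3$$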
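
The same three steps in R, as a check on the hand calculation; `mean()`, `var()`, and `sd()` all use the sample (N - 1) definitions.

```r
x <- c(3, 6, 9)

# Step 1: sample mean
x_bar <- sum(x) / length(x)                 # 6

# Step 2: sample variance, dividing by N - 1
s2 <- sum((x - x_bar)^2) / (length(x) - 1)  # 9

# Step 3: standard deviation is the square root of the variance
s <- sqrt(s2)                               # 3

# Built-in functions use the same N - 1 formula
c(mean(x), var(x), sd(x))                   # 6 9 3
```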

8
Q

If you had a choice between having your data normally distributed or distributed according to any other distribution, which one would you pick and why? Hint: you may want to draw something.

A

I would pick a normal distribution. It is a symmetric, unimodal bell curve, so the share of data within a given number of standard deviations of the mean is fixed and predictable: about 68% within 1 SD, 95% within 2 SD, and 99.7% within 3 SD. Relatedly, the normal distribution is the foundation for most statistical inference.

Then draw a graph: [Figure: standard normal distribution with standard deviation labels]
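
A sketch that computes those 68/95/99.7 shares directly from the standard normal CDF with `pnorm()`; the plotting line is left commented out because it opens a graphics device.

```r
# Share of a normal distribution within k standard deviations of the mean
within_k_sd <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within_k_sd, 4)  # 0.6827 0.9545 0.9973

# Sketch of the bell curve (opens a plot device):
# curve(dnorm(x), from = -4, to = 4)
```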

9
Q

Draw a boxplot with its five main elements, and explain each of them.

A
  • minimum: the lowest value of the data
  • first quartile (Q1): 25th percentile of the data
  • median (Q2): the middle value of the data (not the mean!)
  • third quartile (Q3): 75th percentile of the data
  • maximum: the highest value of the data
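
A sketch using hypothetical data `x` that computes the five elements with `quantile()` and shows that `boxplot.stats()` returns the same five numbers the boxplot draws.

```r
# Hypothetical data
x <- c(2, 4, 4, 5, 7, 8, 9, 10, 12)

# Minimum, Q1, median, Q3, maximum
quantile(x)             # 2 4 7 9 12

# The five numbers boxplot() draws (whisker ends, box edges, median line)
boxplot.stats(x)$stats  # 2 4 7 9 12

# Draw it (opens a plot device):
# boxplot(x, horizontal = TRUE)
```

One caveat: R's boxplot whiskers stop at the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier, so when outliers are present the whisker ends are not the raw minimum and maximum.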
10
Q

How are frequency distributions different from probability distributions? Show your understanding of the difference using an example. When writing your answer, make sure to mention the Central Limit Theorem and Law of Large Numbers.

A

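Frequency distributions capture how often values occur in observed data, whereas probability distributions describe how data would behave in theory. As our sample size grows without bound (i.e., as N → ∞), our frequency distribution converges on the true probability distribution, according to the Law of Large Numbers. However, that is not all we learn as N → ∞. Per the Central Limit Theorem, the sampling distribution of the sample mean becomes approximately normal as N → ∞, even when the underlying data are not normally distributed. That is a good thing, per the answer to the normal-distribution question above. For example, a fair die has a flat probability distribution (each face has probability 1/6): after 10 rolls the observed frequencies may be lopsided, after 10,000 rolls each face appears close to 1/6 of the time (Law of Large Numbers), and the averages of repeated batches of rolls pile up in a bell curve around 3.5 (Central Limit Theorem).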
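
A simulation sketch of the die example, assuming a fair six-sided die and an arbitrary seed: observed frequencies illustrate the Law of Large Numbers, and the means of repeated 30-roll batches illustrate the Central Limit Theorem.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Law of Large Numbers: observed face frequencies approach 1/6
rolls <- sample(1:6, size = 10000, replace = TRUE)
observed_freq <- table(rolls) / length(rolls)
round(observed_freq, 3)

# Central Limit Theorem: means of 30 rolls pile up in a bell shape around 3.5
sample_means <- replicate(5000, mean(sample(1:6, size = 30, replace = TRUE)))
c(mean = mean(sample_means), sd = sd(sample_means))
# theory: mean 3.5, sd = sqrt(35/12) / sqrt(30), about 0.312
# hist(sample_means)  # draws the approximately normal histogram
```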

11
Q

What is an estimand?

A

the true quantity of interest in a statistical analysis (for example, a population mean), which we try to recover from sample data using an estimator
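
A sketch separating estimand, estimator, and estimate with a simulated, hypothetical population whose true mean we can compute; in real research the population, and therefore the estimand's value, is unknown.

```r
set.seed(1)  # arbitrary seed

# Hypothetical population of 100,000 test scores
population <- rnorm(100000, mean = 50, sd = 10)

# Estimand: the true population mean (unknown in practice)
estimand <- mean(population)

# Estimator: the sample mean; estimate: its value in one random sample
one_sample <- sample(population, size = 100)
estimate <- mean(one_sample)

c(estimand = estimand, estimate = estimate)
```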

12
Q

What is the z-score associated with a 95% confidence interval?

A

1.96
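
A one-line check in R: a two-sided 95% interval leaves 2.5% in each tail, so the z-score is the 97.5th percentile of the standard normal, from `qnorm()`.

```r
# Two-sided 95% interval: 2.5% in each tail, so take the 97.5th percentile
z_95 <- qnorm(0.975)
round(z_95, 2)  # 1.96
```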

13
Q

If you had the choice between a highly biased but reliable estimate or a highly noisy estimate, which one would you pick and why?

A

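I would pick the highly biased but reliable estimate. Because it lands in about the same place every time, its bias can be measured and corrected (for example, by subtracting it). By contrast, a noisy estimate varies so much from sample to sample that any single result could be far from the truth, and there is no simple correction.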
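
A simulation sketch of that trade-off, assuming two hypothetical estimators: one that is always about 2 units too high but tightly clustered, and one that is unbiased but widely scattered. The bias of +2 is assumed known here, which is what makes the correction possible.

```r
set.seed(7)  # arbitrary seed
true_value <- 10

# Hypothetical estimators, each evaluated on 1,000 repeated samples
biased_reliable <- rnorm(1000, mean = true_value + 2, sd = 0.5)  # always about 2 too high
noisy_unbiased  <- rnorm(1000, mean = true_value, sd = 5)        # right on average, scattered

# Spread across repetitions: the noisy estimator is far less reliable
c(sd_biased = sd(biased_reliable), sd_noisy = sd(noisy_unbiased))

# If the bias (+2) is known, subtracting it recovers the truth closely
corrected <- biased_reliable - 2
mean(abs(corrected - true_value))       # typical miss about 0.4
mean(abs(noisy_unbiased - true_value))  # typical miss about 4
```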

14
Q

What is a Type II error in the context of Null Hypothesis Significance Testing (NHST)? Provide an example to show that you understand the concept.

A

A Type II error is a false negative: that is, failing to reject the null hypothesis when it is false.

Example: We know that, over a large sample, eating ice cream generally makes Denly very happy, so eating ice cream pushes Denly's happiness toward the positive right tail of the distribution. If we ran a test and found that eating ice cream did not make Denly happy, that would be a false negative. In other words, we should have rejected the null hypothesis that eating ice cream has no effect on Denly's happiness; because we didn't, we committed a Type II error.
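
A simulation sketch of how Type II errors arise, assuming a hypothetical true effect of 0.3 standard deviations, 20 observations per study, and one-sample t-tests at α = 0.05. Because the null is false in every simulated study, each non-significant result is a false negative.

```r
set.seed(123)  # arbitrary seed

# Hypothetical setup: ice cream truly raises happiness by 0.3 SD,
# so the null hypothesis of "no effect" is false
true_effect <- 0.3
n <- 20

# Run 2,000 studies; each tests H0: mean change = 0 at alpha = 0.05
p_values <- replicate(2000, t.test(rnorm(n, mean = true_effect, sd = 1), mu = 0)$p.value)

# Every non-significant result here is a Type II error (false negative)
type_ii_rate <- mean(p_values >= 0.05)
type_ii_rate      # most studies this small miss the real effect
1 - type_ii_rate  # statistical power
```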

15
Q

Which is bigger: a 95% confidence interval or a 99% confidence interval? Draw them to show that you understand.

A

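The 99% confidence interval is bigger (wider). It leaves only 1% of the distribution in the tails (0.5% in each), in contrast to 5% (2.5% in each) for the 95% interval, so it uses a larger critical value (about 2.58 instead of 1.96) and has to stretch farther from the estimate to reach the higher level of confidence.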
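
A sketch computing both intervals for one hypothetical sample with the normal approximation (estimate ± z × SE); the ratio of the widths depends only on the two z-scores.

```r
set.seed(99)  # arbitrary seed
x <- rnorm(50, mean = 100, sd = 15)  # hypothetical sample
se <- sd(x) / sqrt(length(x))

# Normal-approximation intervals: estimate +/- z * SE
ci_95 <- mean(x) + c(-1, 1) * qnorm(0.975) * se
ci_99 <- mean(x) + c(-1, 1) * qnorm(0.995) * se

diff(ci_99) / diff(ci_95)  # about 1.31: the 99% interval is about 31% wider
```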
