Quizzes after midterm - the questions Flashcards

Question 1

Q

Which of the following commands will NOT work in R? (Two correct answers.)

t <- "Hello world!"
x <- c("10", 20, 30)

a. x[1] * 10
b. print(t)
c. sum(x)
d. t[1]

Answer

A

a. x[1] * 10
c. sum(x)
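
A minimal sketch of why (a) and (c) fail, assuming a fresh R session. c() coerces mixed elements to a single type, so x becomes a character vector. The as.numeric() repair at the end is an illustration and not part of the quiz.

```r
t <- "Hello world!"
x <- c("10", 20, 30)

# One string in c() coerces every element to character.
class(x)

# Arithmetic and sum() both raise an error on character input.
# try() captures the error so the script can continue.
res_a <- try(x[1] * 10, silent = TRUE)
res_c <- try(sum(x), silent = TRUE)
inherits(res_a, "try-error")
inherits(res_c, "try-error")

# print(t) and t[1] work: t is a length-one character vector.
print(t)
t[1]

# Illustrative repair: convert back to numeric before doing arithmetic.
x_num <- as.numeric(x)
sum(x_num)
```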

Question 2

Q

What will the following code output?
ldt_data$Freq <- c(5, 10, 15)
for (i in ldt_data$Freq) {
  if (i < 10) {
    print("low")
  } else {
    print("high")
  }
}

a. low, high
b. high, high, high
c. low, high, high
d. low, low, high

Answer

A

c. low, high, high

Question 3

Q

What are the issues in the following R code? The digits on left side indicate line numbers. (Two correct answers.)
1 numbers <- c(1, 2, 3, 4, 5)
2 sum <- 0
3 for (i in numbers)
4 sum <- sum + i
5 print[sum]

a. Line 2 does not show the right way to assign a variable.
b. Everything in Line 5 needs to be inside a parenthesis.
c. Line 3 needs to end with a curly brace
d. Line 5 cannot have square brackets.

Answer

A

c. Line 3 needs to end with a curly brace
d. Line 5 cannot have square brackets.
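
One possible corrected version of the snippet, with the loop body wrapped in curly braces and print() called with parentheses. Renaming the accumulator from sum to total is an assumption made here to avoid masking the built-in sum() function; the quiz itself does not ask for it.

```r
numbers <- c(1, 2, 3, 4, 5)
total <- 0                # accumulator; renamed from `sum` (a stylistic choice)

for (i in numbers) {      # curly braces delimit the loop body
  total <- total + i
}

print(total)              # functions are called with (), not []
```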

Question 4

Q

What is the issue in the following R code?
x <- 10
y = 5
result <- x + y
print(results)

a. The operators “=” and “<-” cannot be used in the same chunk of code
b. There is an error in writing the print function
c. The variable results is not defined
d. The assignment operator <- is incorrectly used

Answer

A

c. The variable results is not defined

Question 5

Q

A linguist is conducting a survey to find out which languages their local community speaks at home. The survey requires participants to state the name(s) of the language(s) they speak at home. What type of data is the linguist working with?

a. Ratio
b. Ordinal
c. Interval
d. Nominal

Answer

A

d. Nominal

Question 6

Q

Marie is exploring the possibility of transforming different data types into ordinal data. Assist Marie in identifying the data type(s) that are suitable for such transformation.

a. Both interval and ratio data
b. Ratio data
c. Nominal data
d. Interval data

Answer

A

a. Both interval and ratio data
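
As a rough illustration of such a transformation, the sketch below bins a ratio variable into ordered categories with cut(). The ages, the break points, and the labels are all invented for this example.

```r
# Hypothetical ages in years (ratio data), invented for illustration.
age <- c(12, 25, 37, 48, 63, 71)

# cut() bins the numeric values; ordered_result = TRUE returns an ordered factor.
# With the default right = TRUE the bins are (0,17], (17,64], (64,Inf].
age_group <- cut(age,
                 breaks = c(0, 17, 64, Inf),
                 labels = c("child", "adult", "senior"),
                 ordered_result = TRUE)

age_group
is.ordered(age_group)

# Ordered factors support rank comparisons, which is what makes the data ordinal.
age_group[1] < age_group[6]
```

The reverse direction is not possible: once only the category is stored, the exact ages cannot be recovered.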

Question 7

Q

Imagine you are analyzing the data frame provided in the previous question in R. The name of the dataset is Dataset1. You want to exclude all of the Single people from your analysis. Which code do you use to generate a table that only contains data from non-single people?

a. filter(Dataset1, Single != 0)
b. filter(Dataset1, NOT Single == 0)
c. filter(Dataset1, Single == 0)
d. filter(Dataset1, Single =! 1)

Answer

A

c. filter(Dataset1, Single == 0)

Question 8

Q

Assume the same dataset as above, and that you have run the following R expression. Which of the following are true? (Two correct answers.)

Dataset1 %>%
  mutate(age_shifted = Age + 5)

a. Running names(Dataset1) next will print out names of four different columns.
b. Running names(Dataset1) next will print out names of five different columns.
c. It will convert the Age variable into ordinal data.
d. It will create a new column named “age_shifted”

Answer

A

a. Running names(Dataset1) next will print out names of four different columns.
d. It will create a new column named “age_shifted”
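
The reason both (a) and (d) hold is that the result is never assigned back to Dataset1. The sketch below shows the same behaviour with base R's transform(), used here as a stand-in for mutate() so that the example runs without loading dplyr. The data frame values and the Shoe_size column name are assumptions, since the original table is not reproduced in these cards.

```r
# Hypothetical stand-in for Dataset1; values and the Shoe_size name are assumptions.
Dataset1 <- data.frame(Age = c(23, 54, 41),
                       Single = c(1, 0, 0),
                       Married = c(0, 1, 1),
                       Shoe_size = c(7, 10, 8))

# Like mutate(), transform() returns a modified copy and leaves its input untouched.
shifted <- transform(Dataset1, age_shifted = Age + 5)

names(shifted)    # includes the new age_shifted column
names(Dataset1)   # unchanged, because nothing was assigned back to Dataset1
```

To keep the new column, the result would have to be assigned, for example Dataset1 <- Dataset1 %>% mutate(age_shifted = Age + 5).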

Question 9

Q

Assume the same Dataset1 as above. Which of the following R commands will not work? (Two correct answers.)

a. Dataset1 %>%
     mutate(Age_new = Age + 1) %>%
     filter(Age > 50)
b. Dataset1 %>%
     filter(Shoe size < 9) %>%
     mutate(Shoe size > 4)
c. Dataset1 %>%
     filter(Shoe size < 9) %>%
     select(Age, Single, Married)
d. Dataset1 %>%
     select(Age, Single, Married) %>%
     filter(Shoe size < 9)

Answer

A

b. Dataset1 %>%
     filter(Shoe size < 9) %>%
     mutate(Shoe size > 4)
d. Dataset1 %>%
     select(Age, Single, Married) %>%
     filter(Shoe size < 9)

Question 10

Q

A phonetic study investigates whether there is a significant difference in the vowel duration between two dialects of English: Dialect A and Dialect B. The null hypothesis states that the mean vowel durations in the two dialects are equal. After conducting a statistical test, the researchers obtain a p-value of 0.08. Assuming a significance level of 0.05, what is the correct conclusion?

a. The p-value is greater than the significance level, so we fail to reject the null hypothesis.
b. There is insufficient evidence to reject the null hypothesis, but we cannot conclude that the mean vowel durations are equal.
c. The p-value is less than the significance level, so we reject the null hypothesis and conclude there is a difference.
d. There is insufficient evidence to reject the null hypothesis, so we conclude that the mean vowel durations are equal.

Answer

A

a. The p-value is greater than the significance level, so we fail to reject the null hypothesis.

Question 11

Q

Using the dataframe from the previous question, you want to inspect whether individuals with children earn more on average than those without children. Which of the following ggplot functions are appropriate for this task? (Two correct answers.)

a. ggplot(df, x=has_children, y=salary) + geom_histogram()

b. df %>% ggplot(aes(x=has_children, y=salary)) + geom_boxplot()

c. ggplot(df, aes(x=children), y=salary) + geom_boxplot()

d. ggplot(df, aes(x=has_children, y=salary)) + geom_boxplot()

Answer

A

b. df %>% ggplot(aes(x=has_children, y=salary)) + geom_boxplot()

d. ggplot(df, aes(x=has_children, y=salary)) + geom_boxplot()

Question 12

Q

Using the dataframe from Question 2, which of the following categorizations best describes the types of data for the variables? (Two correct answers.)

a. both years_at_university and salary are interval data
b. years_at_university is ordinal data, salary is interval data
c. salary is ratio data, has_children is nominal data
d. years_at_university is ratio data, has_children is nominal data

Answer

A

c. salary is ratio data, has_children is nominal data
d. years_at_university is ratio data, has_children is nominal data

Question 13

Q

Using the dataframe from Question 2, which of the following ggplot functions is most appropriate to visually examine whether there is a relationship between the time spent at university and whether participants have children?

a. ggplot(df) + geom_point(x=has_children, y=years_at_university)

b. ggplot(df, aes(x=years_at_university, y=has_children)) + geom_histogram()

c. ggplot(df, aes(x=years_at_university, y=has_children)) + geom_boxplot()

d. ggplot(df, aes(x=has_children, y=years_at_university)) + geom_point()

Answer

A

d. ggplot(df, aes(x=has_children, y=years_at_university)) + geom_point()

Question 14

Q

Which of the following statements are most accurate when analyzing a skewed distribution? (Two correct answers; choose the “best ones”)

a. Skewness affects measures like the median and standard deviation more than the mean.

b. The mean is always greater than the median in a positively skewed distribution.

c. The mean is often greater than the median in a positively skewed distribution.

d. In a negatively skewed distribution, the median is always greater than the mean.

Answer

A

b. The mean is always greater than the median in a positively skewed distribution.

d. In a negatively skewed distribution, the median is always greater than the mean.
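
The pattern described in (b) and (d) can be seen on a small invented sample. This is only one illustrative case under made-up numbers and does not by itself establish the general claim.

```r
# Hypothetical right-skewed sample: one large value stretches the right tail.
pos_skew <- c(1, 2, 2, 3, 3, 3, 4, 30)
mean(pos_skew)     # pulled upward by the outlier
median(pos_skew)   # stays with the bulk of the data

# Mirroring the sample gives a left-skewed one, where the relation flips.
neg_skew <- -pos_skew
mean(neg_skew)
median(neg_skew)
```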

Question 15

Q

In an acoustic phonetics study, a researcher wants to examine whether there is a significant association between the occurrence frequency of certain swearing words and the gender of the speakers. Which statistical test would be most appropriate for this analysis?

a. Unpaired U-test

b. Paired-samples t-test

c. Unpaired t-test

d. Chi-square test

Answer

A

d. Chi-square test
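
A sketch of how such a test could be run in R with chisq.test() on a contingency table of counts. The counts and the category labels below are invented purely for illustration.

```r
# Hypothetical counts, invented for illustration only:
# rows are speaker gender, columns are whether a swear word was used.
counts <- matrix(c(30, 70,
                   50, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("female", "male"),
                                 swear_word = c("used", "not_used")))

# chisq.test() tests for association between the row and column variables.
# For a 2 x 2 table it applies Yates' continuity correction by default.
result <- chisq.test(counts)

result$parameter   # degrees of freedom
result$p.value     # compare against the chosen significance level
```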

Question 16

Q

Following are the first 10 rows of a dataset containing 720 rows where each row represents one participant/person. (In the columns Single and Married, 0 indicates not single/married while 1 indicates the participants are single/married.)

John wants to statistically verify whether the shoe size of married people is significantly different from that of unmarried people. He checked and found that the data for both married and unmarried people are normally distributed.

Which test of significance should John perform?

a. Unpaired U-test

b. Paired t-test

c. Paired U-test

d. Unpaired t-test

Answer

A

d. Unpaired t-test

Question 17

Q

With the same data as above, John forms a new alternative hypothesis: the mean shoe size of married people is higher than that of unmarried people. John has already created two subsets of data for married and unmarried people.
Which of the following R function calls is appropriate for the task?

a. t.test(married_df$shoe_size, unmarried_df$shoe_size, alternative = "greater")

b. t.test(married_df$shoe_size, unmarried_df$shoe_size, paired = T)

c. t.test(married_df$shoe_size, unmarried_df$shoe_size)

d. t.test(married_df$shoe_size, unmarried_df$shoe_size, alternative = "less")

Answer

A

a. t.test(married_df$shoe_size, unmarried_df$shoe_size, alternative = "greater")
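
A small runnable sketch of the chosen call. The two data frames are filled with invented shoe sizes, since the quiz's actual data are not included in these cards.

```r
# Hypothetical shoe sizes, invented for illustration; not the quiz's data.
married_df   <- data.frame(shoe_size = c(9, 10, 9.5, 10.5, 11, 9, 10))
unmarried_df <- data.frame(shoe_size = c(8, 8.5, 9, 7.5, 8, 9.5, 8.5))

# alternative = "greater" tests whether the mean of the first sample
# exceeds the mean of the second. By default t.test() runs an unpaired
# Welch test (paired = FALSE, var.equal = FALSE).
res <- t.test(married_df$shoe_size, unmarried_df$shoe_size,
              alternative = "greater")

res$alternative
res$statistic
res$p.value
```

The order of the two arguments matters: swapping them while keeping alternative = "greater" would test the opposite direction.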

Question 18

Q

Which of the following cases would prevent you from running an unpaired t-test? (Select two correct answers)

a. When there are exactly two groups to compare.

b. When the data in any of the groups are not normally distributed.

c. When the dependent variable in one group is measured on an ordinal scale.

d. When the datapoints are independent of each other.

Answer

A

b. When the data in any of the groups are not normally distributed.

c. When the dependent variable in one group is measured on an ordinal scale.

Question 19

Q

Which of the following are true about one-tailed vs two-tailed t-tests? (Select two correct answers)

a. A one-tailed test is appropriate when the hypothesis does not predict the direction of the effect.

b. A two-tailed test has a stricter threshold for rejecting the null hypothesis compared to a one-tailed test.

c. A two-tailed test is used when the research hypothesis predicts a specific direction of the effect.

d. A one-tailed test is more likely to detect a significant effect in a specific direction compared to a two-tailed test.

Answer

A

b. A two-tailed test has a stricter threshold for rejecting the null hypothesis compared to a one-tailed test.
d. A one-tailed test is more likely to detect a significant effect in a specific direction compared to a two-tailed test.
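
Statements (b) and (d) can be checked numerically. In the sketch below, on invented data, the observed difference lies in the predicted direction, and the one-tailed p-value comes out as half of the two-tailed one.

```r
# Hypothetical measurements for two independent groups, invented for illustration.
a <- c(9, 10, 9.5, 10.5, 11, 9, 10)
b <- c(8, 8.5, 9, 7.5, 8, 9.5, 8.5)

two_tailed <- t.test(a, b)                           # default: alternative = "two.sided"
one_tailed <- t.test(a, b, alternative = "greater")  # predicts mean(a) > mean(b)

two_tailed$p.value
one_tailed$p.value

# Same t statistic and degrees of freedom, but only one tail is counted.
all.equal(one_tailed$p.value, two_tailed$p.value / 2)
```

If the observed difference had gone in the opposite direction, the one-tailed p-value would instead be large, which is the cost of committing to a direction in advance.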

Question 20

Q