Quizzes after midterm - the questions
Which of the following commands will NOT work in R? (Two correct answers.)
t <- "Hello world!"
x <- c("10", 20, 30)
a. x[1] * 10
b. print(t)
c. sum(x)
d. t[1]
a. x[1] * 10
c. sum(x)
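A minimal sketch of why options a and c fail: `c()` coerces mixed elements to a single type, so `x` ends up as a character vector. The `tryCatch()` wrappers and the names `mult_result`, `sum_result`, and `x_num` are additions for illustration only.

```r
t <- "Hello world!"
x <- c("10", 20, 30)   # mixing a string with numbers coerces everything to character

class(x)               # "character"

# Arithmetic and sum() are not defined for character vectors, so both raise errors.
# tryCatch() is used here only so the script keeps running after each error.
mult_result <- tryCatch(x[1] * 10, error = function(e) "error")
sum_result  <- tryCatch(sum(x),    error = function(e) "error")

# print(t) and t[1] both work, because t is an ordinary character vector of length 1.
print(t)
t[1]

# Converting to numeric first makes both failing operations work.
x_num <- as.numeric(x)
x_num[1] * 10
sum(x_num)
```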
What will the following code output?
ldt_data$Freq <- c(5, 10, 15)
for (i in ldt_data$Freq) {
  if (i < 10) {
    print("low")
  } else {
    print("high")
  }
}
a. low, high
b. high, high, high
c. low, high, high
d. low, low, high
c. low, high, high
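The loop can be traced with a small sketch. It assumes a plain vector `freq` as a stand-in for `ldt_data$Freq` and collects the labels in a hypothetical vector `labels` instead of printing them.

```r
freq <- c(5, 10, 15)        # stand-in for ldt_data$Freq
labels <- character(0)      # collects one label per element

for (i in freq) {
  if (i < 10) {
    labels <- c(labels, "low")
  } else {
    # 10 is not strictly less than 10, so it falls into this branch
    labels <- c(labels, "high")
  }
}

labels
```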
What are the issues in the following R code? The digits on the left side indicate line numbers. (Two correct answers.)
1 numbers <- c(1, 2, 3, 4, 5)
2 sum <- 0
3 for (i in numbers)
4 sum <- sum + i
5 print[sum]
a. Line 2 does not show the right way to assign a variable.
b. Everything in Line 5 needs to be inside a parenthesis.
c. Line 3 needs to end with a curly brace.
d. Line 5 cannot have square brackets.
c. Line 3 needs to end with a curly brace.
d. Line 5 cannot have square brackets.
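A sketch of the snippet with the two flagged lines changed as the answer key suggests: the loop body is wrapped in curly braces and `print` is called with parentheses. The accumulator is renamed to `total` here, an illustrative choice to avoid shadowing the built-in `sum()`.

```r
numbers <- c(1, 2, 3, 4, 5)
total <- 0                   # renamed from `sum` for this sketch

for (i in numbers) {         # body wrapped in curly braces
  total <- total + i
}

print(total)                 # parentheses call the function; print[sum] would try to index it
```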
What is the issue in the following R code?
x <- 10
y = 5
result <- x + y
print(results)
a. The operators “=” and “<-” cannot be used in the same chunk of code
b. There is an error in writing the print function
c. The variable results is not defined
d. The assignment operator <- is incorrectly used
c. The variable results is not defined
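A short sketch of the name mismatch: the code assigns to `result` but prints `results`. The `exists()` checks are added for illustration only.

```r
x <- 10
y = 5                 # `=` and `<-` can both assign at the top level
result <- x + y

exists("result")      # the assigned name
exists("results")     # the misspelled name was never created

print(result)         # printing the name that was actually assigned works
```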
A linguist is conducting a survey to find out which languages their local community speaks at home. The survey requires participants to state the name(s) of the language(s) they speak at home. What type of data is the linguist working with?
a. Ratio
b. Ordinal
c. Interval
d. Nominal
d. Nominal
Marie is exploring the possibility of transforming different data types into ordinal data. Assist Marie in identifying the data type(s) that are suitable for such transformation.
a. Both interval and ratio data
b. Ratio data
c. Nominal data
d. Interval data
a. Both interval and ratio data
Imagine you are analyzing the data frame provided in the previous question in R. The name of the dataset is Dataset1. You want to exclude all of the Single people from your analysis. Which code do you use to generate a table that only contains data from non-single people?
a. filter(Dataset1, Single != 0)
b. filter(Dataset1, NOT Single == 0)
c. filter(Dataset1, Single == 0)
d. filter(Dataset1, Single =! 1)
c. filter(Dataset1, Single == 0)
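A minimal sketch of the filtering step, assuming the dplyr package is installed. The original table is not reproduced here, so the data frame below is a hypothetical stand-in in which `Single` and `Married` are assumed to be coded 1 (yes) and 0 (no).

```r
library(dplyr)

# Hypothetical stand-in for Dataset1 (values invented for illustration).
Dataset1 <- data.frame(
  Age     = c(23, 35, 52, 61),
  Single  = c(1, 0, 0, 1),
  Married = c(0, 1, 1, 0)
)

# Keep only the rows where Single is 0, i.e. the non-single people.
non_single <- filter(Dataset1, Single == 0)
non_single
```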
Assume the same dataset as above, and that you have run the following R expression. Which of the following are true? (Two correct answers.)
Dataset1 %>%
mutate(age_shifted = Age + 5)
a. Running names(Dataset1) next will print out names of four different columns.
b. Running names(Dataset1) next will print out names of five different columns.
c. It will convert the Age variable into ordinal data.
d. It will create a new column named “age_shifted”.
a. Running names(Dataset1) next will print out names of four different columns.
d. It will create a new column named “age_shifted”.
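A sketch of why both answers hold, assuming the dplyr package is installed: `mutate()` returns a new data frame with the extra column, but `Dataset1` itself is unchanged unless the result is assigned back. The data frame, the column name `Shoe_size`, and the name `shifted` are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical four-column stand-in for Dataset1 (values invented for illustration).
Dataset1 <- data.frame(
  Age       = c(23, 35, 52, 61),
  Single    = c(1, 0, 0, 1),
  Married   = c(0, 1, 1, 0),
  Shoe_size = c(7, 9, 10, 8)
)

# The result is stored under a new name only so it can be inspected.
shifted <- Dataset1 %>%
  mutate(age_shifted = Age + 5)

names(Dataset1)   # the original data frame keeps its four columns
names(shifted)    # the returned data frame has the extra age_shifted column
```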
Assume the same Dataset1 as above. Which of the following R commands will not work? (Two correct answers.)
a. Dataset1 %>%
mutate(Age_new = Age + 1) %>%
filter(Age > 50)
b. Dataset1 %>%
filter(Shoe size < 9) %>%
mutate(Shoe size > 4)
c. Dataset1 %>%
filter(Shoe size < 9) %>%
select(Age, Single, Married)
d. Dataset1 %>%
select(Age, Single, Married) %>%
filter(Shoe size < 9)
b. Dataset1 %>%
filter(Shoe size < 9) %>%
mutate(Shoe size > 4)
d. Dataset1 %>%
select(Age, Single, Married) %>%
filter(Shoe size < 9)
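A sketch of the ordering problem in option d, assuming the dplyr package is installed: once `select()` drops a column, a later `filter()` can no longer refer to it. The data frame is a hypothetical stand-in, and the column is named `Shoe_size` here because a name containing a space would need backticks in R.

```r
library(dplyr)

# Hypothetical stand-in for Dataset1 (values invented for illustration).
Dataset1 <- data.frame(
  Age       = c(23, 35, 52, 61),
  Single    = c(1, 0, 0, 1),
  Married   = c(0, 1, 1, 0),
  Shoe_size = c(7, 9, 10, 8)
)

# Filtering first works: Shoe_size still exists when filter() runs.
ok <- Dataset1 %>%
  filter(Shoe_size < 9) %>%
  select(Age, Single, Married)

# Selecting first drops Shoe_size, so the later filter() cannot find it.
# tryCatch() is used only to capture the error for illustration.
failed <- tryCatch(
  Dataset1 %>%
    select(Age, Single, Married) %>%
    filter(Shoe_size < 9),
  error = function(e) "error"
)
```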
A phonetic study investigates whether there is a significant difference in the vowel duration between two dialects of English: Dialect A and Dialect B. The null hypothesis states that the mean vowel durations in the two dialects are equal. After conducting a statistical test, the researchers obtain a p-value of 0.08. Assuming a significance level of 0.05, what is the correct conclusion?
a. The p-value is greater than the significance level, so we fail to reject the null hypothesis.
b. There is insufficient evidence to reject the null hypothesis, but we cannot conclude that the mean vowel durations are equal.
c. The p-value is less than the significance level, so we reject the null hypothesis and conclude there is a difference.
d. There is insufficient evidence to reject the null hypothesis, so we conclude that the mean vowel durations are equal.
a. The p-value is greater than the significance level, so we fail to reject the null hypothesis.
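The decision rule in this question can be written as a one-line comparison. The variable names below are hypothetical; the two numbers come from the question.

```r
p_value <- 0.08   # p-value reported in the question
alpha   <- 0.05   # significance level

# Reject the null hypothesis only when the p-value is below the significance level.
reject_null <- p_value < alpha
reject_null
```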
Using the dataframe from the previous question, you want to inspect whether individuals with children earn more on average than those without children. Which of the following ggplot functions are appropriate for this task? (Two correct answers.)
a. ggplot(df, x=has_children, y=salary) + geom_histogram()
b. df %>% ggplot(aes(x=has_children, y=salary)) + geom_boxplot()
c. ggplot(df, aes(x=children), y=salary) + geom_boxplot()
d. ggplot(df, aes(x=has_children, y=salary)) + geom_boxplot()
b. df %>% ggplot(aes(x=has_children, y=salary)) + geom_boxplot()
d. ggplot(df, aes(x=has_children, y=salary)) + geom_boxplot()
Using the dataframe from Question 2, which of the following categorizations best describes the types of data for the variables? (Two correct answers.)
a. both years_at_university and salary are interval data
b. years_at_university is ordinal data, salary is interval data
c. salary is ratio data, has_children is nominal data
d. years_at_university is ratio data, has_children is nominal data
c. salary is ratio data, has_children is nominal data
d. years_at_university is ratio data, has_children is nominal data
Using the dataframe from Question 2, which of the following ggplot functions is most appropriate to visually examine whether there is a relationship between the time spent at university and whether participants have children?
a. ggplot(df) + geom_point(x=has_children, y=years_at_university)
b. ggplot(df, aes(x=years_at_university, y=has_children)) + geom_histogram()
c. ggplot(df, aes(x=years_at_university, y=has_children)) + geom_boxplot()
d. ggplot(df, aes(x=has_children, y=years_at_university)) + geom_point()
d. ggplot(df, aes(x=has_children, y=years_at_university)) + geom_point()
Which of the following statements are most accurate when analyzing a skewed distribution? (Two correct answers; choose the “best ones”)
a. Skewness affects measures like the median and standard deviation more than the mean.
b. The mean is always greater than the median in a positively skewed distribution.
c. The mean is often greater than the median in a positively skewed distribution.
d. In a negatively skewed distribution, the median is always greater than the mean.
b. The mean is always greater than the median in a positively skewed distribution.
d. In a negatively skewed distribution, the median is always greater than the mean.
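As one illustrative case, the sketch below compares the mean and median of a small right-skewed sample. The values are invented for illustration, and a single sample shows only what happens in this example, not a general rule.

```r
# Made-up sample with one large value that creates a long right tail.
durations <- c(1, 2, 2, 3, 3, 3, 20)

sample_mean   <- mean(durations)     # pulled upward by the extreme value
sample_median <- median(durations)   # the middle value, unaffected by how large the tail is

sample_mean > sample_median
```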
In an acoustic phonetics study, a researcher wants to examine whether there is a significant association between the occurrence frequency of certain swear words and the gender of the speakers. Which statistical test would be most appropriate for this analysis?
a. Unpaired U-test
b. Paired-samples t-test
c. Unpaired t-test
d. Chi-square test
d. Chi-square test
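A minimal sketch of how such a test might be run in base R with `chisq.test()`. The counts, the category labels, and the names `counts` and `res` are invented for illustration and do not come from any study.

```r
# Hypothetical contingency table: rows are speaker gender, columns are whether
# an utterance contained one of the target words (counts are made up).
counts <- matrix(
  c(30, 10,
    15, 25),
  nrow = 2,
  dimnames = list(
    gender = c("female", "male"),
    swear  = c("yes", "no")
  )
)

# Chi-square test of independence between the two categorical variables.
res <- chisq.test(counts)
res$p.value
```

A p-value below the chosen significance level would suggest an association between the two variables in this made-up table.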