Statistics Flashcards
Which measure of central tendency represents the most frequently occurring value in a dataset?
A) Mean
B) Median
C) Mode
D) Range
C) Mode
Which measure of central tendency is best used when the dataset contains outliers?
A) Mean
B) Median
C) Mode
D) Range
B) Median
difference between the highest and lowest observations in a dataset
Range
True or False
The median is less affected by outliers compared to the mean, making it a better measure of central tendency when outliers are present
True
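A minimal sketch of why the median resists outliers, using made-up income figures (the numbers are illustrative assumptions, not survey data):

```python
import statistics

# Hypothetical incomes in thousands; 1000 is a deliberate outlier.
incomes = [30, 32, 35, 38, 40]
incomes_with_outlier = incomes + [1000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

# The mean jumps from 35 to roughly 195.8; the median only moves to 36.5.
print(mean_before, median_before, mean_after, median_after)
```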
Which measure of variability indicates the average distance of each data point from the mean?
A) Range
B) Interquartile range
C) Variance
D) Standard deviation
D) Standard deviation
used to measure how far the data values are dispersed from the mean
variance
True or False
Standard deviation measures the average distance of each data point from the mean, providing insight into the spread of the data.
True
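The variance and standard deviation cards can be checked by hand on a small example. This sketch assumes a made-up dataset and uses the population formulas (dividing by n). Strictly, the standard deviation is the square root of the average squared distance from the mean.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

mean = sum(data) / len(data)
# Population variance: average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: square root of the variance, in the data's own units.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

The same values come back from `statistics.pvariance(data)` and `statistics.pstdev(data)`.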
True or False
In a positively skewed distribution, the mean is lower than the median, which is lower than the mode, due to the tail on the right side.
False
In a positively skewed distribution, the mean is greater than the median, which is greater than the mode, due to the tail on the right side
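As an illustration of the ordering in a positively skewed distribution, here is a small sketch with an invented right-tailed dataset:

```python
import statistics

# Made-up data with a long right tail (9 and 15 pull the mean upward).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mean = statistics.mean(right_skewed)      # 4.6
median = statistics.median(right_skewed)  # 3.0
mode = statistics.mode(right_skewed)      # 2

print(mean > median > mode)  # True
```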
Which of the following is a measure of the central location of a dataset?
A) Standard deviation
B) Variance
C) Median
D) Interquartile range
C) Median
Which measure of spread is defined as the difference between the first quartile and the third quartile?
A) Range
B) Standard deviation
C) Variance
D) Interquartile range
D) Interquartile range
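A short sketch of the interquartile range calculation. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles` and made-up data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 7.0 4.0
```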
In hypothesis testing, what is the p-value?
A) The probability of accepting the null hypothesis
B) The probability of rejecting the null hypothesis when it is true
C) The probability of observing results at least as extreme as the test results, assuming the null hypothesis is true
D) The level of significance
C) The probability of observing results at least as extreme as the test results, assuming the null hypothesis is true
Which of the following is a Type I error?
A) Rejecting the null hypothesis when it is true
B) Accepting the null hypothesis when it is false
C) Failing to reject the null hypothesis when it is false
D) Failing to accept the null hypothesis when it is true
A) Rejecting the null hypothesis when it is true
Suppose you are describing the findings of a survey on the annual income of residents of Makati City, a population that includes both extremely wealthy and extremely poor people. Which two measures would you use?
A) Mean and Mode
B) Mean and Range
C) Mean and Standard Deviation
D) Mode and Standard Deviation
C) Mean and Standard Deviation
In a confidence interval, what does the margin of error represent?
A) The range of values within which the population parameter lies
B) The standard deviation of the sample
C) The maximum error allowed in the estimate
D) The sample mean
C) The maximum error allowed in the estimate
indicates the range, above and below the sample estimate, within which we expect the true population parameter to lie
margin of error
What does a confidence level of 95% mean?
A) There is a 95% probability that the sample mean is within the confidence interval
B) 95% of the population data lies within the confidence interval
C) In repeated sampling, about 95% of the confidence intervals constructed this way contain the true population parameter
D) There is a 5% chance that the sample mean lies outside the confidence interval
C) In repeated sampling, about 95% of the confidence intervals constructed this way contain the true population parameter
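The margin of error and confidence interval cards can be tied together with a rough sketch. It assumes a made-up sample and a normal (z) approximation; for a sample this small, a t critical value would usually be preferred.

```python
import math
from statistics import NormalDist, mean, stdev

sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]  # illustrative values
n = len(sample)

sample_mean = mean(sample)
standard_error = stdev(sample) / math.sqrt(n)

# Critical value for a two-sided 95% interval (about 1.96).
z = NormalDist().inv_cdf(0.975)

margin_of_error = z * standard_error
ci_low = sample_mean - margin_of_error
ci_high = sample_mean + margin_of_error

print(sample_mean, margin_of_error, (ci_low, ci_high))
```

The interval is simply the estimate plus or minus the margin of error, so the margin is half the interval's width.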
What is the purpose of a t-test?
A) To compare the variances of two populations
B) To compare the means of two populations
C) To test the independence of two variables
D) To test the relationship between two variables
B) To compare the means of two populations
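One way to see what a two-sample t-test compares is to compute the test statistic directly. This sketch assumes Welch's version (no equal-variance assumption) and invented measurements; `welch_t` is a hypothetical helper name, and turning the statistic into a p-value would additionally need the t distribution.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t statistic for the difference between two sample means."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Standard error of the difference, using each group's sample variance.
    se = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / se

group_a = [5.1, 4.9, 5.6, 5.8, 6.0]  # illustrative values
group_b = [4.1, 4.5, 4.0, 4.8, 4.3]

t_stat = welch_t(group_a, group_b)
print(t_stat)  # a large positive value suggests group_a's mean is higher
```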
the variable we want to explain or forecast; its values depend on something else; denoted y
Dependent Variable
the variable that explains the other one; denoted x
Independent Variable
point where the regression line crosses the Y-axis, representing the value of Y when X is zero
intercept
What does the coefficient of determination (R²) indicate?
A) The strength of the linear relationship between two variables
B) The percentage of variation in the dependent variable explained by the independent variable
C) The slope of the regression line
D) The correlation between two variables
B) The percentage of variation in the dependent variable explained by the independent variable
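The intercept and R² cards can be illustrated with a hand-computed simple linear regression. The data points below are made up for the example.

```python
x = [1, 2, 3, 4, 5]  # independent variable (illustrative)
y = [2, 4, 5, 4, 5]  # dependent variable (illustrative)

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x  # predicted y when x is zero

# R^2 = 1 - (unexplained variation / total variation).
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(round(slope, 3), round(intercept, 3), round(r_squared, 3))  # 0.6 2.2 0.6
```

Here about 60% of the variation in y is explained by x.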
explaining or predicting a single Y variable from two or more X variables
Multiple Regression Analysis
occurs when independent variables in a regression model are highly correlated
Multicollinearity
Which of the following best describes heteroscedasticity in regression analysis?
A) The error terms have constant variance
B) The error terms have increasing or decreasing variance
C) The error terms are normally distributed
D) The error terms are autocorrelated
B) The error terms have increasing or decreasing variance
means that the variance of the error terms changes across observations
Heteroscedasticity
used to model the relationship between one or more predictor variables and a binary outcome
Logistic Regression
In logistic regression, what type of dependent variable is used?
A) Continuous
B) Ordinal
C) Nominal
D) Binary
D) Binary
Logistic regression is used for modeling binary (____) outcomes
(dichotomous)
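A minimal sketch of how a logistic regression model turns a predictor into a probability of a binary outcome. The coefficients and the function name `predict_probability` are hypothetical, chosen for illustration rather than fitted to data.

```python
import math

def predict_probability(x, intercept, slope):
    """Logistic (sigmoid) function applied to a linear predictor."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical coefficients, not estimated from any real dataset.
intercept, slope = -4.0, 0.8

p_low = predict_probability(2, intercept, slope)   # well below 0.5
p_high = predict_probability(8, intercept, slope)  # well above 0.5

# Convert the probability into a binary (0/1) prediction at a 0.5 cutoff.
label_low = 1 if p_low >= 0.5 else 0
label_high = 1 if p_high >= 0.5 else 0

print(p_low, p_high, label_low, label_high)
```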
What is the purpose of an ANOVA test?
A) To compare the means of two groups
B) To compare the variances of two groups
C) To compare the means of three or more groups
D) To test the independence of two categorical variables
C) To compare the means of three or more groups
In ANOVA, what does a significant F-test indicate?
A) All group means are equal
B) At least one group mean is different
C) The variances are equal across groups
D) There is a linear relationship between groups
B) At least one group mean is different
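The F statistic behind a one-way ANOVA compares variation between group means to variation within groups. A sketch on three invented groups:

```python
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # illustrative data

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 3))  # 13.0
```

A large F suggests at least one group mean differs; it does not say which one.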
What is the null hypothesis in a chi-square test of independence?
A) The two variables are independent
B) The two variables are dependent
C) The two variables have equal variances
D) The two variables have equal means
A) The two variables are independent
purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying.
chi-square test
In a chi-square test, what is the expected frequency?
A) The observed frequency in each category
B) The frequency expected if the null hypothesis is true
C) The sum of observed frequencies
D) The total number of observations
B) The frequency expected if the null hypothesis is true
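Under the null hypothesis of independence, each expected frequency is (row total × column total) / grand total. A sketch with a made-up 2×2 table of observed counts:

```python
observed = [[10, 20],
            [30, 40]]  # illustrative counts

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)               # [[12.0, 18.0], [28.0, 42.0]]
print(round(chi_square, 4))   # 0.7937
```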
Which of the following is a characteristic of a simple random sample?
A) Each member of the population has an equal chance of being selected
B) The population is divided into subgroups and samples are taken from each subgroup
C) Samples are chosen based on convenience
D) Samples are chosen based on specific characteristics
A) Each member of the population has an equal chance of being selected
What is stratified sampling?
A) Dividing the population into strata and randomly selecting samples from each stratum
B) Selecting samples based on convenience
C) Selecting every nth member of the population
D) Grouping the population into clusters and randomly selecting clusters
A) Dividing the population into strata and randomly selecting samples from each stratum
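The difference between simple random and stratified sampling can be sketched as follows. The population, the "region" attribute, and the proportional allocation rule are all assumptions made for the example.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 60 people in the north, 40 in the south.
population = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]

# Simple random sample: every member has an equal chance of selection.
simple_sample = random.sample(population, 10)

# Stratified sample: split into strata, then sample randomly within each one.
strata = {}
for person in population:
    strata.setdefault(person["region"], []).append(person)

stratified_sample = []
for region, members in strata.items():
    k = round(10 * len(members) / len(population))  # proportional allocation
    stratified_sample.extend(random.sample(members, k))

counts = {r: sum(p["region"] == r for p in stratified_sample) for r in strata}
print(counts)  # {'north': 6, 'south': 4}
```

The stratified sample always matches the population's 60/40 split, while the simple random sample only does so on average.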
What is the main advantage of using a larger sample size?
A) It reduces the population size
B) It increases the standard error
C) It increases the accuracy of the sample mean
D) It reduces the variability of the population
C) It increases the accuracy of the sample mean
The coefficient of variation (CV) is measured in terms of what unit?
A) Same unit as the data
B) Squared unit
C) Percent
D) Square root of the given unit
C) Percent
standardized measure of the dispersion of a probability distribution or frequency distribution
coefficient of variation (CV)
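The coefficient of variation is the standard deviation divided by the mean, usually expressed as a percentage. A sketch assuming made-up data and the population standard deviation:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mean = statistics.mean(data)      # 5
std_dev = statistics.pstdev(data)  # 2.0

# Unit-free: the units of the standard deviation and mean cancel out.
cv_percent = std_dev / mean * 100
print(cv_percent)  # 40.0
```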
What is the shape of the normal distribution?
A) Skewed left
B) Skewed right
C) Symmetrical bell-shaped
D) Uniform
C) Symmetrical bell-shaped
Which of the following best describes the central limit theorem?
A) The sum of a large number of random variables is normally distributed
B) The mean of a large number of random variables is normally distributed
C) The variance of a large number of random variables is normally distributed
D) The median of a large number of random variables is normally distributed
B) The mean of a large number of random variables is normally distributed
states that the distribution of the sample mean approaches a normal distribution as the sample size becomes large
central limit theorem
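The central limit theorem can be explored with a small simulation. This sketch assumes an exponential population (mean 1, standard deviation 1, strongly right-skewed) and arbitrary choices of sample size and number of repetitions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(size):
    # Draw `size` values from a skewed exponential population and average them.
    return statistics.mean([random.expovariate(1.0) for _ in range(size)])

sample_size = 50
sample_means = [sample_mean(sample_size) for _ in range(2000)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: the means centre on 1 with spread about 1 / sqrt(50).
print(mean_of_means, sd_of_means, 1 / math.sqrt(sample_size))
```

Even though individual draws are skewed, a histogram of `sample_means` would look close to a bell curve.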
What does the term “statistical power” refer to?
A) The probability of making a Type I error
B) The probability of making a Type II error
C) The probability of correctly rejecting the null hypothesis
D) The probability of accepting the null hypothesis
C) The probability of correctly rejecting the null hypothesis
likelihood that a test will detect an effect when there is an effect to be detected
Statistical power
What is the purpose of standardizing a variable?
A) To change the variable’s mean to 1
B) To change the variable’s standard deviation to 0
C) To make the variable’s mean 0 and standard deviation 1
D) To convert the variable to a binary format
C) To make the variable’s mean 0 and standard deviation 1
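Standardizing subtracts the mean and divides by the standard deviation, giving z-scores. A sketch on invented values, using the population standard deviation:

```python
import statistics

values = [10, 20, 30, 40, 50]  # illustrative values

mu = statistics.mean(values)
sigma = statistics.pstdev(values)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(v - mu) / sigma for v in values]

print(z_scores)
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```

Up to floating-point rounding, the standardized values have mean 0 and standard deviation 1.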
What is the purpose of a boxplot?
A) To display the frequency of data
B) To show the distribution of data based on a five-number summary
C) To display the relationship between two variables
D) To show the central tendency of data
B) To show the distribution of data based on a five-number summary
visually displays the distribution of a dataset using the minimum, first quartile, median, third quartile, and maximum
Boxplot
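The five-number summary that a boxplot draws can be computed directly. This sketch assumes made-up data and the "inclusive" quartile method of `statistics.quantiles`; other quartile conventions give slightly different values.

```python
import statistics

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # illustrative values

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, median, q3, max(data))

print(five_number_summary)  # (7, 36.75, 40.5, 42.75, 49)
```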
Which of the following statements about measures of variability must always be true if the standard deviation is equal to 1?
A) The standard deviation is equal to the variance
B) The standard deviation is less than the variance
C) The standard deviation is greater than the variance
D) None of the above
A) The standard deviation is equal to the variance
Which of the following statements is true about a normal distribution?
A) It is skewed to the right
B) It is skewed to the left
C) It is symmetric about the mean
D) It has two peaks
C) It is symmetric about the mean
What is the purpose of using a scatter plot?
A) To display the frequency of different categories
B) To show the relationship between two variables
C) To compare the means of different groups
D) To show the distribution of a single variable
B) To show the relationship between two variables
What does the null hypothesis in hypothesis testing typically state?
A) There is an effect or difference
B) There is no effect or difference
C) The effect or difference is greater than expected
D) The effect or difference is less than expected
B) There is no effect or difference
typically states that there is no effect or difference, serving as a starting point for statistical testing
null hypothesis
What is a confidence interval?
A) A range of values within which the sample mean lies
B) A range of values within which the population parameter lies
C) The range between the smallest and largest values in a dataset
D) The range of values within one standard deviation of the mean
B) A range of values within which the population parameter lies
What is the main purpose of a control group in an experiment?
A) To provide a comparison for the experimental group
B) To increase the sample size
C) To reduce the variability within the data
D) To eliminate the need for random sampling
A) To provide a comparison for the experimental group
In the context of regression analysis, what is multiple regression used for?
A) Analyzing the effect of a single independent variable on a dependent variable
B) Analyzing the effect of multiple independent variables on a dependent variable
C) Analyzing the effect of a single independent variable on multiple dependent variables
D) Analyzing the effect of multiple independent variables on multiple dependent variables
B) Analyzing the effect of multiple independent variables on a dependent variable
What is the difference between a bar chart and a histogram?
A) A bar chart displays categorical data, while a histogram displays numerical data
B) A bar chart displays numerical data, while a histogram displays categorical data
C) A bar chart is used for one variable, while a histogram is used for two variables
D) There is no difference; they are the same
A) A bar chart displays categorical data, while a histogram displays numerical data
What is an outlier in a dataset?
A) A value that is exactly equal to the mean
B) A value that is very different from the other values in the dataset
C) A value that occurs most frequently
D) A value that falls within the interquartile range
B) A value that is very different from the other values in the dataset
Which of the following refers to the degree of flatness or peakedness of a curve?
A) Central Tendency
B) Dispersion
C) Skewness
D) Kurtosis
D) Kurtosis
measures the “tailedness” of the distribution, indicating whether the data are heavy-tailed or light-tailed relative to a normal distribution.
It provides information about the height and sharpness of the central peak relative to that of a normal distribution
Kurtosis
describes how widely the values of a variable are spread out
Dispersion
measure of the asymmetry of a distribution; right (positive) or left (negative) skewness
Skewness
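Skewness and kurtosis can both be computed from central moments. This is a sketch using the simple moment-based (population) formulas and invented data; statistical packages often apply small-sample corrections, so their results may differ slightly. The function names are hypothetical.

```python
def central_moment(data, k):
    """Average of (x - mean)^k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    # Third standardized moment: 0 for symmetric data, positive for a right tail.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Fourth standardized moment minus 3, so a normal distribution scores 0.
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(excess_kurtosis(symmetric))  # about -1.3 (lighter tails than normal)
```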