Statistics Flashcards

1
Q

Which measure of central tendency represents the most frequently occurring value in a dataset?

A) Mean
B) Median
C) Mode
D) Range

A

C) Mode

2
Q

Which measure of central tendency is best used when the dataset contains outliers?

A) Mean
B) Median
C) Mode
D) Range

A

B) Median

3
Q

Difference between the highest and lowest observations in a dataset

A

Range

4
Q

True or False

The median is less affected by outliers compared to the mean, making it a better measure of central tendency when outliers are present

A

True
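
Worked example (a minimal sketch; the income figures are made up for illustration): a single extreme value pulls the mean far from the bulk of the data, while the median barely moves.

```python
import statistics

# Hypothetical incomes in thousands; the last value is an outlier.
incomes = [30, 32, 35, 38, 40, 500]

mean_income = statistics.mean(incomes)      # dragged upward by the outlier
median_income = statistics.median(incomes)  # average of the two middle values

print("mean:", mean_income)
print("median:", median_income)
```

Here the mean lands well above five of the six values, which is why the median is usually preferred when outliers are present.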

5
Q

Which measure of variability indicates the typical distance of each data point from the mean, in the same units as the data?

A) Range
B) Interquartile range
C) Variance
D) Standard deviation

A

D) Standard deviation

6
Q

Measure of how far the data values are dispersed from the mean, computed as the average squared deviation

A

variance

7
Q

True or False

Standard deviation measures the typical distance of each data point from the mean, providing insight into the spread of the data.

A

True
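
Worked example (a sketch with made-up scores, treating the data as a whole population): variance is the average squared deviation from the mean, and standard deviation is its square root, which brings the measure back to the units of the data.

```python
import statistics

# Hypothetical scores, treated as a complete population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(scores)  # mean of squared deviations
std_dev = statistics.pstdev(scores)      # square root of the variance

print("variance:", variance)
print("standard deviation:", std_dev)
```

For a sample rather than a population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.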

8
Q

True or False

In a positively skewed distribution, the mean is lower than the median, which is lower than the mode, due to the tail on the right side.

A

False

In a positively skewed distribution, the mean is typically greater than the median, which is typically greater than the mode, because the long right tail pulls the mean upward.
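
Worked example (a sketch using an invented right-skewed dataset): the three measures of central tendency fall in the order mode, median, mean.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode_value = statistics.mode(values)
median_value = statistics.median(values)
mean_value = statistics.mean(values)

print("mode:", mode_value, "median:", median_value, "mean:", mean_value)
```

This ordering is a rule of thumb for unimodal data, not a guarantee for every skewed dataset.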

9
Q

Which of the following is a measure of the central location of a dataset?

A) Standard deviation
B) Variance
C) Median
D) Interquartile range

A

C) Median

10
Q

Which measure of spread is defined as the difference between the third quartile and the first quartile (Q3 − Q1)?

A) Range
B) Standard deviation
C) Variance
D) Interquartile range

A

D) Interquartile range
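
Worked example (a sketch with made-up data): the interquartile range is Q3 minus Q1. Quartile conventions differ between textbooks and software; this uses the default "exclusive" method of Python's `statistics.quantiles`.

```python
import statistics

# Hypothetical dataset with an odd number of evenly spaced values.
data = [1, 3, 5, 7, 9, 11, 13]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```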

11
Q

In hypothesis testing, what is the p-value?

A) The probability of accepting the null hypothesis
B) The probability of rejecting the null hypothesis when it is true
C) The probability of observing results at least as extreme as the test results, assuming the null hypothesis is true
D) The level of significance

A

C) The probability of observing results at least as extreme as the test results, assuming the null hypothesis is true
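
Worked example (a sketch; the coin-flip scenario and the helper name `binomial_upper_tail` are illustrative assumptions): suppose a coin lands heads 9 times in 10 flips and the null hypothesis is that the coin is fair. The one-sided p-value is the probability of 9 or more heads under that null hypothesis.

```python
from math import comb

def binomial_upper_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of seeing 9 or 10 heads in 10 flips of a fair coin.
p_value = binomial_upper_tail(10, 9)

print("one-sided p-value:", p_value)
```

The result is 11/1024, roughly 0.011, so at a 5% significance level the fair-coin hypothesis would be rejected.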

12
Q

Which of the following is a Type I error?

A) Rejecting the null hypothesis when it is true
B) Accepting the null hypothesis when it is false
C) Failing to reject the null hypothesis when it is false
D) Failing to accept the null hypothesis when it is true

A

A) Rejecting the null hypothesis when it is true

13
Q

If you were describing the findings of a survey on annual income in Makati City, where respondents include both extremely wealthy and extremely poor people, which two measures would you use?

A) Mean and Mode
B) Mean and Range
C) Mean and Standard Deviation
D) Mode and Standard Deviation

A

C) Mean and Standard Deviation

14
Q

In a confidence interval, what does the margin of error represent?

A) The range of values within which the population parameter lies
B) The standard deviation of the sample
C) The maximum expected difference between the estimate and the true population parameter
D) The sample mean

A

C) The maximum expected difference between the estimate and the true population parameter
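
Worked example (a sketch with made-up measurements, assuming a normal approximation): the margin of error is the critical value times the standard error, and the confidence interval is the sample mean plus or minus that margin. For a sample this small a t critical value would normally be used; the z value keeps the example within the standard library.

```python
import math
import statistics

# Hypothetical sample of eight measurements.
sample = [48, 52, 50, 49, 51, 53, 47, 50]

# Two-sided 95% critical value from the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(len(sample))
margin = z * std_err
interval = (mean - margin, mean + margin)

print("mean:", mean)
print("margin of error:", round(margin, 3))
print("95% interval:", tuple(round(v, 3) for v in interval))
```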

15
Q

Indicates how far above or below the sample estimate we expect the true population parameter to lie

A

margin of error

16
Q

What does a confidence level of 95% mean?

A) There is a 95% probability that the sample mean is within the confidence interval
B) 95% of the population data lies within the confidence interval
C) In repeated sampling, about 95% of the confidence intervals constructed this way would contain the true population parameter
D) There is a 5% chance that the sample mean lies outside the confidence interval

A

C) In repeated sampling, about 95% of the confidence intervals constructed this way would contain the true population parameter

17
Q

What is the purpose of a t-test?

A) To compare the variances of two populations
B) To compare the means of two populations
C) To test the independence of two variables
D) To test the relationship between two variables

A

B) To compare the means of two populations
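
Worked example (a sketch; the two groups are invented and `welch_t` is an illustrative helper, not a library function): the t statistic is the difference in sample means divided by its estimated standard error. This is the unequal-variance (Welch) form; turning it into a p-value needs the t distribution, which is outside the standard library.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical measurements from two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_stat = welch_t(group_a, group_b)
print("t statistic:", t_stat)
```

A large absolute t value suggests the difference in means is big relative to the sampling noise.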

18
Q

The variable whose values we want to explain or forecast; its values depend on something else; denoted y

A

Dependent Variable

19
Q

The variable that explains the other one; denoted x

A

Independent Variable

20
Q

point where the regression line crosses the Y-axis, representing the value of Y when X is zero

A

intercept

21
Q

What does the coefficient of determination (R²) indicate?

A) The strength of the linear relationship between two variables
B) The percentage of variation in the dependent variable explained by the independent variable
C) The slope of the regression line
D) The correlation between two variables

A

B) The percentage of variation in the dependent variable explained by the independent variable
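
Worked example (a sketch with made-up x and y values): fit a least-squares line by hand, then compute R² as one minus the ratio of residual variation to total variation in y. The intercept is the fitted value of y when x is zero.

```python
# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("slope:", slope, "intercept:", intercept, "R^2:", r_squared)
```

With these numbers about 60% of the variation in y is explained by x.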

22
Q

explaining or predicting a single Y variable from two or more X variables

A

Multiple Regression Analysis

23
Q

occurs when independent variables in a regression model are highly correlated

A

Multicollinearity

24
Q

Which of the following best describes heteroscedasticity in regression analysis?

A) The error terms have constant variance
B) The error terms have increasing or decreasing variance
C) The error terms are normally distributed
D) The error terms are autocorrelated

A

B) The error terms have increasing or decreasing variance

25
Q

means that the variance of the error terms changes across observations

A

Heteroscedasticity

26
Q

Used to model the relationship between two data factors when the outcome is categorical (typically binary)

A

Logistic Regression

27
Q

In logistic regression, what type of dependent variable is used?

A) Continuous
B) Ordinal
C) Nominal
D) Binary

A

D) Binary
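
Worked example (a sketch; the coefficients and the function name `predict_probability` are hypothetical, not fitted from data): logistic regression passes a linear score through the logistic (sigmoid) function, so the output is a probability between 0 and 1 for the "1" class of a binary outcome.

```python
import math

def predict_probability(x, intercept, slope):
    """Probability of the positive class for one predictor value x."""
    score = intercept + slope * x          # linear part, any real number
    return 1.0 / (1.0 + math.exp(-score))  # squashed into (0, 1)

# Hypothetical coefficients: the decision boundary sits at x = 2.
for hours in (1, 2, 3):
    print(hours, round(predict_probability(hours, -4.0, 2.0), 3))
```

Predictions above 0.5 are usually classified as 1 and those below as 0, though the threshold is a modelling choice.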

28
Q

Fill in the blank: Logistic regression is used for modeling binary (____) outcomes.

A

(dichotomous)

29
Q

What is the purpose of an ANOVA test?

A) To compare the means of two groups
B) To compare the variances of two groups
C) To compare the means of three or more groups
D) To test the independence of two categorical variables

A

C) To compare the means of three or more groups

30
Q

In ANOVA, what does a significant F-test indicate?

A) All group means are equal
B) At least one group mean is different
C) The variances are equal across groups
D) There is a linear relationship between groups

A

B) At least one group mean is different
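
Worked example (a sketch with three invented groups): the one-way ANOVA F statistic is the ratio of between-group variance to within-group variance. A large F suggests at least one group mean differs; judging significance still requires the F distribution, which is not computed here.

```python
# Hypothetical observations for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
ss_within = sum((v - m) ** 2
                for g, m in zip(groups, group_means) for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print("F statistic:", f_stat)
```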

31
Q

What is the null hypothesis in a chi-square test of independence?

A) The two variables are independent
B) The two variables are dependent
C) The two variables have equal variances
D) The two variables have equal means

A

A) The two variables are independent

32
Q

The purpose of this test is to determine whether a difference between observed data and expected data is due to chance or to a relationship between the variables being studied.

A

chi-square test

33
Q

In a chi-square test, what is the expected frequency?

A) The observed frequency in each category
B) The frequency expected if the null hypothesis is true
C) The sum of observed frequencies
D) The total number of observations

A

B) The frequency expected if the null hypothesis is true
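
Worked example (a sketch using an invented 2x2 table): under the null hypothesis of independence, each expected frequency is (row total × column total) / grand total, and the chi-square statistic sums (observed − expected)² / expected over all cells.

```python
# Hypothetical 2x2 contingency table of observed counts.
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

print("expected:", expected)
print("chi-square:", chi_square)
```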

34
Q

Which of the following is a characteristic of a simple random sample?

A) Each member of the population has an equal chance of being selected
B) The population is divided into subgroups and samples are taken from each subgroup
C) Samples are chosen based on convenience
D) Samples are chosen based on specific characteristics

A

A) Each member of the population has an equal chance of being selected

35
Q

What is stratified sampling?

A) Dividing the population into strata and randomly selecting samples from each stratum
B) Selecting samples based on convenience
C) Selecting every nth member of the population
D) Grouping the population into clusters and randomly selecting clusters

A

A) Dividing the population into strata and randomly selecting samples from each stratum

36
Q

What is the main advantage of using a larger sample size?

A) It reduces the population size
B) It increases the standard error
C) It increases the accuracy of the sample mean
D) It reduces the variability of the population

A

C) It increases the accuracy of the sample mean

37
Q

The coefficient of variation (CV) is expressed in what unit?

A) The same unit as the data
B) Squared unit
C) Percent
D) Square root of the given unit

A

C) Percent

38
Q

standardized measure of the dispersion of a probability distribution or frequency distribution

A

coefficient of variation (CV)
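
Worked example (a sketch with made-up values, treated as a population): the coefficient of variation is the standard deviation divided by the mean, usually reported as a percentage, which makes it unit-free.

```python
import statistics

# Hypothetical measurements, treated as a complete population.
values = [2, 4, 4, 4, 5, 5, 7, 9]

cv_percent = statistics.pstdev(values) / statistics.mean(values) * 100

print("CV (%):", cv_percent)
```

Because the units cancel, the CV can be used to compare spread across variables measured on different scales, provided the mean is not close to zero.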

39
Q

What is the shape of the normal distribution?

A) Skewed left
B) Skewed right
C) Symmetrical bell-shaped
D) Uniform

A

C) Symmetrical bell-shaped

40
Q

Which of the following best describes the central limit theorem?

A) The sum of a large number of random variables is normally distributed
B) The mean of a large number of independent random variables is approximately normally distributed
C) The variance of a large number of random variables is normally distributed
D) The median of a large number of random variables is normally distributed

A

B) The mean of a large number of independent random variables is approximately normally distributed

41
Q

states that the distribution of the sample mean approaches a normal distribution as the sample size becomes large

A

central limit theorem
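
Worked example (a simulation sketch; the sample size, number of repetitions, and seed are arbitrary choices): draw many samples from a strongly skewed exponential distribution and look at the sample means. Their average sits near the population mean of 1, and their spread is close to 1/√n, as the theorem predicts.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
REPETITIONS = 2000

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

print("average of sample means:", round(statistics.fmean(means), 3))
print("spread of sample means:", round(statistics.stdev(means), 3))
print("theoretical spread:", round(1 / SAMPLE_SIZE ** 0.5, 3))
```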

42
Q

What does the term “statistical power” refer to?

A) The probability of making a Type I error
B) The probability of making a Type II error
C) The probability of correctly rejecting the null hypothesis
D) The probability of accepting the null hypothesis

A

C) The probability of correctly rejecting the null hypothesis

43
Q

likelihood that a test will detect an effect when there is an effect to be detected

A

Statistical power

44
Q

What is the purpose of standardizing a variable?

A) To change the variable’s mean to 1
B) To change the variable’s standard deviation to 0
C) To make the variable’s mean 0 and standard deviation 1
D) To convert the variable to a binary format

A

C) To make the variable’s mean 0 and standard deviation 1
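
Worked example (a sketch with made-up values, using the population standard deviation): standardizing subtracts the mean and divides by the standard deviation, giving z-scores with mean 0 and standard deviation 1.

```python
import statistics

# Hypothetical raw values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
std_dev = statistics.pstdev(values)

# Each z-score says how many standard deviations a value is from the mean.
z_scores = [(v - mean) / std_dev for v in values]

print(z_scores)
```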

45
Q

What is the purpose of a boxplot?

A) To display the frequency of data
B) To show the distribution of data based on a five-number summary
C) To display the relationship between two variables
D) To show the central tendency of data

A

B) To show the distribution of data based on a five-number summary

46
Q

visually displays the distribution of a dataset using the minimum, first quartile, median, third quartile, and maximum

A

Boxplot

47
Q

Which of the following statements about measures of variability must always be true if the standard deviation is equal to 1?

A) The standard deviation is equal to the variance
B) The standard deviation is less than the variance
C) The standard deviation is greater than the variance
D) None of the above

A

A) The standard deviation is equal to the variance

48
Q

Which of the following statements is true about a normal distribution?

A) It is skewed to the right
B) It is skewed to the left
C) It is symmetric about the mean
D) It has two peaks

A

C) It is symmetric about the mean

49
Q

What is the purpose of using a scatter plot?

A) To display the frequency of different categories
B) To show the relationship between two variables
C) To compare the means of different groups
D) To show the distribution of a single variable

A

B) To show the relationship between two variables

50
Q

What does the null hypothesis in hypothesis testing typically state?

A) There is an effect or difference
B) There is no effect or difference
C) The effect or difference is greater than expected
D) The effect or difference is less than expected

A

B) There is no effect or difference

51
Q

typically states that there is no effect or difference, serving as a starting point for statistical testing

A

null hypothesis

52
Q

What is a confidence interval?

A) A range of values within which the sample mean lies
B) A range of values within which the population parameter lies
C) The range between the smallest and largest values in a dataset
D) The range of values within one standard deviation of the mean

A

B) A range of values within which the population parameter lies

53
Q

What is the main purpose of a control group in an experiment?

A) To provide a comparison for the experimental group
B) To increase the sample size
C) To reduce the variability within the data
D) To eliminate the need for random sampling

A

A) To provide a comparison for the experimental group

54
Q

In the context of regression analysis, what is multiple regression used for?

A) Analyzing the effect of a single independent variable on a dependent variable
B) Analyzing the effect of multiple independent variables on a dependent variable
C) Analyzing the effect of a single independent variable on multiple dependent variables
D) Analyzing the effect of multiple independent variables on multiple dependent variables

A

B) Analyzing the effect of multiple independent variables on a dependent variable

55
Q

What is the difference between a bar chart and a histogram?

A) A bar chart displays categorical data, while a histogram displays numerical data
B) A bar chart displays numerical data, while a histogram displays categorical data
C) A bar chart is used for one variable, while a histogram is used for two variables
D) There is no difference; they are the same

A

A) A bar chart displays categorical data, while a histogram displays numerical data

56
Q

What is an outlier in a dataset?

A) A value that is exactly equal to the mean
B) A value that is very different from the other values in the dataset
C) A value that occurs most frequently
D) A value that falls within the interquartile range

A

B) A value that is very different from the other values in the dataset

57
Q

Which of the following refers to the degree of flatness or peakedness of a curve?

A) Central Tendency
B) Dispersion
C) Skewness
D) Kurtosis

A

D) Kurtosis

58
Q

measures the “tailedness” of the distribution, indicating whether the data are heavy-tailed or light-tailed relative to a normal distribution.

It provides information about the height and sharpness of the central peak relative to that of a normal distribution

A

Kurtosis

60
Q

Represents the extent to which the values of a specific variable are expected to spread out.

A

Dispersion

61
Q

measure of the asymmetry of a distribution; right (positive) or left (negative) skewness

A

Skewness
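
Worked example (a sketch; the helper names and datasets are illustrative, and these are the simple moment-based population formulas, while statistical software often applies sample corrections): skewness is the third central moment scaled by the standard deviation cubed, and excess kurtosis is the fourth central moment scaled by the variance squared, minus 3 so that a normal distribution scores 0.

```python
def central_moment(data, k):
    """k-th central moment of the data (population form)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 10]  # hypothetical data with a long right tail

print("skewness (symmetric):", skewness(symmetric))
print("skewness (right-skewed):", skewness(right_skewed))
print("excess kurtosis (symmetric):", excess_kurtosis(symmetric))
```

Positive skewness indicates a right tail, negative skewness a left tail; negative excess kurtosis indicates lighter tails than a normal distribution.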