Statistics mid year exam practice Flashcards

1
Q

We need to clean our data because:

A: We need to exclude people who don’t complete all items

B: It makes our conclusions correct

C: This is what everyone does

D: If we don’t clean our data our statistical analysis may be invalid or unreliable

A

D: If we don’t clean our data our statistical analysis may be invalid or unreliable

2
Q

Accepted methods of dealing with outliers DO NOT include:

A: Excluding scores beyond ±3 SDs from the mean

B: Trimming the outlying scores so that they fall within ±3 SDs of the mean

C: Excluding scores beyond ±2 SDs from the mean

D: Ignoring them and proceeding to statistical analysis

A

D: Ignoring them and proceeding to statistical analysis
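
The two accepted approaches named in the options (excluding versus trimming) can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the data, and the use of the sample SD are all assumptions made for the example.

```python
from statistics import mean, stdev

def exclude_outliers(scores, k=3.0):
    """Drop every score lying more than k sample SDs from the mean."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) <= k * s]

def trim_outliers(scores, k=3.0):
    """Pull scores beyond k SDs back to the boundary, mean ± k * SD."""
    m, s = mean(scores), stdev(scores)
    low, high = m - k * s, m + k * s
    return [min(max(x, low), high) for x in scores]

# Hypothetical data: 22 ordinary scores plus one extreme value (90).
scores = list(range(10, 21)) * 2 + [90]
kept = exclude_outliers(scores)   # the 90 is dropped, so n falls to 22
trimmed = trim_outliers(scores)   # the 90 is pulled in, so n stays at 23
```

Excluding costs a case, while trimming keeps the sample size but alters a score; which is preferable depends on the analysis.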

3
Q

We can improve our Cronbach alpha score by:

A: Re-running the analysis

B: Removing a ‘poor’ item

C: Changing the Likert scale from 1-4 to 0-3

D: Removing the .70 criterion for satisfactory scale reliability

A

B: Removing a ‘poor’ item
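
To see why removing a poor item can raise alpha, here is a small sketch that computes Cronbach's alpha and the "alpha if item deleted" values. The item scores are made up for illustration, and the helper names are hypothetical; statistical packages report the same quantities directly.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of item columns, each a list with one score per respondent."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]          # scale total per respondent
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical responses from six people; item 4 does not track the other three.
items = [
    [1, 2, 3, 4, 4, 5],
    [2, 2, 3, 4, 5, 5],
    [1, 3, 3, 3, 4, 5],
    [4, 1, 5, 2, 3, 1],
]
alpha_all = cronbach_alpha(items)
alpha_drop = alpha_if_deleted(items)
```

With these made-up scores, alpha for all four items falls below the .70 criterion, and dropping the fourth item lifts it well above.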

4
Q

An ad hoc inference is:

A: Based on scientifically captured data

B: A scientific judgement

C: As reliable as a statistical inference

D: A ‘gut’ or intuitive inference based on recent past experience

A

D: A ‘gut’ or intuitive inference based on recent past experience

5
Q

Sample statistics differ from population parameters as:

A: Sample statistics are represented by Roman letters

B: Sample statistics are represented by Greek letters

C: Sample statistics cannot be used in statistical formulae

D: Population parameters refer to a subset from the sample

A

A: Sample statistics are represented by Roman letters

6
Q

Cohen’s d suggests that:

A: .2 is a small effect
B: Scores >1 are not possible
C: .5 is a large effect
D: .8 is a small effect

A

A: .2 is a small effect
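
As a quick illustration, Cohen's d for two independent groups can be computed with a pooled SD and labelled with the conventional benchmarks (.2 small, .5 medium, .8 large). The group scores and the label cut-offs below "small" are assumptions for the example.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd

def label(d):
    """Conventional benchmarks; anything under .2 is called negligible here."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

d_small = cohens_d([2, 4, 6, 8, 10], [1, 3, 5, 7, 9])   # means differ by 1, SD about 3.16
d_big = cohens_d([5, 6, 7], [1, 2, 3])                  # means differ by 4 SDs
```

The second comparison shows that d is not capped at 1: it is a difference expressed in SD units, not a proportion.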

7
Q

The main advantage of multiple imputation is?

(A) You get a larger sample size

(B) The data is now significant

(C) Your conclusions after your analyses are more likely to be correct

(D) It guarantees a better estimate than simple imputation

A

(C) Your conclusions after your analyses are more likely to be correct

8
Q

When data is MCAR:

(A) The dataset must be abandoned

(B) The missing data must be cleaned before proceeding

(C) You do not have to impute data to get accurate findings.

(D) The missing data must be replaced before analyzing

A

(C) You do not have to impute data to get accurate findings.

9
Q

Cronbach alpha score of .50 suggests:

(A) That items need to be removed

(B) The data is MCAR

(C) The items of the measure are not all highly correlated with each other.

(D) The measure has satisfactory reliability

A

(C) The items of the measure are not all highly correlated with each other.

10
Q

Listwise deletion:

(A) Involves only using data from participants who have provided it for the variables in your analysis

(B) Involves manually removing any participant from the dataset if they have any missing data on any variable.

(C) Involves conducting a one-tailed test instead of a two-tailed test.

(D) Increases the magnitude of r.

A

(A) Involves only using data from participants who have provided it for the variables in your analysis

11
Q

The correlation coefficient is:

A: Used only to determine the direction of the relationship between variables

B: Determines a causes b

C: Unable to determine the strength of the relationship

D: Bounded between -1 and 1

A

D: Bounded between -1 and 1

12
Q

The assumptions that need to be satisfied for Pearson correlation are:

A: Heteroscedasticity and independence

B: Bivariate normality and heteroscedasticity

C: That the data comes from a valid and reliable sample

D: Bivariate normality and independence

A

D: Bivariate normality and independence

13
Q

Outliers:

A: Increase the strength of the correlation

B: Have no impact on the relationship between variables

C: Can increase, reduce, or have no substantive impact on the strength of a relationship

D: Reduce the strength of a correlation

A

C: Can increase, reduce, or have no substantive impact on the strength of a relationship

14
Q

The p value associated with an r value:

A: Tells us if the direction of the relationship is statistically significant

B: Tells us if the p and r value are related

C: Tells us if the strength of the relationship is strong

D: Tells us if the r value significantly differs from the null hypothesis

A

D: Tells us if the r value significantly differs from the null hypothesis

15
Q

The ρ (rho) symbol refers to:

A: The direction of the relationship between variables

B: The sample correlation value

C: The population r

D: The probability of the r value being significant

A

C: The population r

16
Q

A Type I error refers to:

A: Cases when the null hypothesis is incorrectly rejected

B: Making an error in calculating the statistic

C: Cases when the alternate hypothesis is correctly accepted

D: Cases when the null hypothesis is correctly rejected

A

A: Cases when the null hypothesis is incorrectly rejected
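
A small simulation can make the idea concrete: when the null hypothesis is true, a test at alpha = .05 should still reject in roughly 5% of samples, and each of those rejections is a Type I error. The sketch below assumes a two-tailed z test with a known SD of 1; the function name, sample size, and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, mean

def type1_rate(n_sims=2000, n=30, alpha=0.05, seed=1):
    """Share of samples drawn under a TRUE null that the test rejects anyway."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # population mean really is 0
        z = mean(sample) * n ** 0.5                   # z statistic with sigma = 1
        if abs(z) > z_crit:
            rejections += 1                           # null incorrectly rejected
    return rejections / n_sims

rate = type1_rate()
```

The observed rate should land close to alpha, which is what "adopting a Type I error rate" means in practice.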

17
Q

Simple linear regression:

A: Does not give an indication of the strength of a relationship between variables

B: Can involve measuring the association of several independent variables with the dependent variable

C: Gives an indication of the strength of the relationship between variables and is used for assessing straight line associations

D: Is best used for estimating curvilinear relationships

A

C: Gives an indication of the strength of the relationship between variables and is used for assessing straight line associations

18
Q

The advantage of reporting standardized statistics is:

A: That it explains more variance than the unstandardized form

B: That it explains less variance than the unstandardized form

C: That a person unfamiliar with the variable properties can still interpret the effect size

D: That it tells us if the variables are significantly related to each other

A

C: That a person unfamiliar with the variable properties can still interpret the effect size

19
Q

X is the independent variable and Y is the ...

A: Mediator

B: Predictor variable

C: Dependent Variable

D: Slope variable

A

C: Dependent Variable

20
Q

The WAIS intelligence quotient (IQ) is normally distributed with a mean of 100 and a standard deviation of 15. If an IQ between 85 and 115 is deemed normal, what percentage (approximately) of the general population would be considered normal?

(A) 30%

(B) 50%

(C) 70%

(D) 90%

A

(C) 70%
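
The range 85 to 115 is the mean ± 1 SD, so the answer follows from the area under the normal curve within one SD of the mean. A quick check using Python's standard library (variable names are illustrative):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
# Proportion of the population with an IQ between 85 and 115 (mean ± 1 SD).
share_normal = iq.cdf(115) - iq.cdf(85)
```

The result is about .68, which rounds to the 70% option.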

21
Q

In the four bivariate relationships given below, in terms of Pearson product-moment
correlations (r), the strongest relationship is the one for which r is:

(A) .69.

(B) -.70.

(C) -.10.

(D) .01.

A

(B) -.70.
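
Strength is judged by the absolute value of r; the sign only gives the direction. A tiny sketch of that comparison, using the four options as data:

```python
# The four r values from the options, keyed by option letter.
options = {"A": 0.69, "B": -0.70, "C": -0.10, "D": 0.01}

# Rank by magnitude, ignoring the sign.
strongest = max(options, key=lambda k: abs(options[k]))
```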

22
Q

In correlational analysis, skewed variables are transformed to be normally distributed because

(A) this increases the relationship between variables.

(B) this decreases the relationship between variables.

(C) this removes the effect of influential points.

(D) skewness might invalidate the test

A

(D) skewness might invalidate the test

23
Q

Which one of the following does not increase the power of a test of r between X and Y?

(A) Standardizing the scores of X and Y.

(B) Adopting a larger Type I error rate.

(C) Conducting a one-tailed test instead of a two-tailed test.

(D) Increasing the magnitude of r.

A

(A) Standardizing the scores of X and Y.
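
Standardizing cannot change power because it does not change r at all: Pearson's r is already a standardized statistic. A minimal sketch with made-up scores (the helper names are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def zscores(values):
    """Convert raw scores to z scores using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r_raw = pearson_r(x, y)
r_standardized = pearson_r(zscores(x), zscores(y))   # same value as r_raw
```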

24
Q

In clinical trials, it is ethically responsible to use a sample size that is sufficient to detect an effect but not so large as to inconvenience the patients any longer than necessary. Which sample size should the researcher use to ensure that she has sufficient power?

(A) N = 15 (power of .70)

(B) N = 22 (power of .80)

(C) N = 30 (power of .90)

(D) N = 37 (power of .95)

A

(B) N = 22 (power of .80)

25
Q

A scatterplot:

A: Uses a formula to calculate the p value

B: Can be used to identify heterogeneous subsamples and outliers in the data set

C: Can be used to assess if two variables are significantly correlated

D: Is used as a last resort to identify outliers in the data set

A

B: Can be used to identify heterogeneous subsamples and outliers in the data set

26
Q

Removing outliers from the dataset

A: Increases the strength of the association between variables

B: Reduces the strength of the association between variables

C: Is inappropriate as you are altering the dataset

D: May reduce the ability to report a significant finding as the sample size will decrease

A

D: May reduce the ability to report a significant finding as the sample size will decrease

27
Q

‘Fanning’ of the data points to one side of the scatterplot may indicate:

A: Heterogeneity of variance

B: Linearity

C: Homogeneity of variance

D: Non-normality

A

A: Heterogeneity of variance

28
Q

Multiple regression involves:

A: The assessment of multiple Y’s on X

B: The use of several simple regressions to produce a mean

C: The assessment of the relationship of multiple predictors on multiple dependent variables

D: The assessment of the relationship of multiple predictors on Y

A

D: The assessment of the relationship of multiple predictors on Y

29
Q

Multicollinearity is:

A: An assumption that needs to be satisfied to proceed with a multiple regression

B: A necessary element to proceed with a multiple regression

C: A problem as highly correlated dependent variables violates the assumption for multiple regression

D: A problem as highly correlated independent variables violates the assumption for multiple regression

A

D: A problem as highly correlated independent variables violates the assumption for multiple regression

30
Q

A multiple regression may not be statistically significant due to:

A: The R² being too large

B: The sample being too large

C: The p value being too small

D: The R² being too small

A

D: The R² being too small

31
Q

When values of X do not exactly predict values of Y, the R² value is:

(A) less than 0.

(B) 0.

(C) somewhere between 0 and 1.

(D) 1

A

(C) somewhere between 0 and 1.

32
Q

A scatterplot of the residuals is NOT used to check for:

(A) normality of distribution of errors.

(B) the relationship between variables.

(C) the presence of influential cases.

(D) homogeneity of variance of errors.

A

(B) the relationship between variables.

Answer is (B). A (normal) scatterplot is indeed used to check on the relationship between variables. However, a scatterplot of residuals (i.e., errors of prediction) is used to check for normality, homoscedasticity and influential cases.

33
Q

When a regression line is fitted to the data, the lack of fit is best indicated by:

(A) A negative correlation coefficient.

(B) A random distribution of the residuals.

(C) A non-random distribution of the residuals.

(D) A small unstandardized regression coefficient.

A

(C) A non-random distribution of the residuals.

Answer is (C). Following on from the previous question, if the regression equation (i.e., line) is a fitting mathematical description of the relationship, then the errors (i.e., lack of fit) would be randomly distributed. On the other hand, if the fit is not good, then the distribution of residuals would display some non-random trend of non-normality, non-linearity or heteroscedasticity.

34
Q

Which case will NOT be considered an influential case in a regression of Y on X?

(A) A case with a mean of X and a large Y.

(B) A case with a small X and a mean of Y.

(C) A case with a large X and a small Y.

(D) A case with a mean of X and a mean of Y.

A

(D) A case with a mean of X and a mean of Y.

Answer is (D). A case with a mean of X and a mean of Y will not cause the regression line to
pivot one way or another; that is, it will not affect the regression coefficient (slope).
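
The claim can be checked numerically: adding a case that sits exactly at the mean of X and the mean of Y leaves the least-squares slope unchanged, whereas a case with a large X and a small Y pulls the line around. The data and function name below are made up for illustration.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of Y regressed on X."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

base = slope(x, y)
with_mean_case = slope(x + [mean(x)], y + [mean(y)])   # extra case at (mean X, mean Y)
with_extreme_case = slope(x + [10], y + [0])           # extra case: large X, small Y
```

In this example the extreme case is enough to flip the sign of the slope, while the case at the means changes nothing.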

35
Q

Residual analysis is conducted in regression to:

(A) get a more significant statistical result (i.e., t value)

(B) get a higher r value.

(C) reduce the influence of misrepresentative cases.

(D) reduce the sample size.

A

(C) reduce the influence of misrepresentative cases.

Answer is (C). Influential cases are not representative of the relationship.

36
Q

Suppose we assessed whether the use of various psychological techniques was related to improved sporting performance and found that the set of 3 techniques predicted performance, R² = .41, F(3, 25), p = .04. We could state:

A: That the finding was non-significant

B: That there were 4 independent variables

C: That there were 25 independent variables

D: That there were 29 scores in the data set

A

D: That there were 29 scores in the data set
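
The sample size can be read off the degrees of freedom. In F(3, 25), the first value is the number of predictors (k = 3) and the second is N − k − 1 = 25, so N = 29. A one-line sketch, assuming the regression model includes an intercept (the function name is hypothetical):

```python
def sample_size_from_f(df_model, df_residual):
    """N = k + (N - k - 1) + 1 for a regression with an intercept."""
    return df_model + df_residual + 1

n_predictors = 3
n_cases = sample_size_from_f(df_model=3, df_residual=25)
```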

37
Q

You are more likely to attain a higher R² when:

A: Your set of independent variables share substantial variance with each other

B: Your independent variables capture substantial unique components of the dependent variable

C: Your independent variables are highly correlated

D: Your dependent variable is not related with the independent variables

A

B: Your independent variables capture substantial unique components of the dependent variable

38
Q

If we are looking for evidence to support the hypothesis that 3 psychological techniques predict sporting performance, we would hope that:

A: The tolerance statistic of several independent variables is < .10

B: The dependent variable is not too highly correlated with the independent variables

C: The three predictors are highly correlated with each other

D: All three independent variables are uniquely related to the dependent variable

A

D: All three independent variables are uniquely related to the dependent variable

39
Q

In the lectures you have been exposed to various forms of regression, including:

A: Reduction regression

B: Poisson regression

C: Logistic regression

D: Simple regression

A

D: Simple regression

40
Q

Hierarchical regression is not useful for:

A: Reducing predictors in a data set

B: Hypothesis testing

C: Removing the effects of nuisance variables

D: Covariance analysis

A

A: Reducing predictors in a data set
Hierarchical regression doesn’t do this; statistical regression does.

41
Q

If the R² change is not significant in a hierarchical regression we can say that:

A: The predictors shared too much variance

B: The N was too small

C: The group of predictors in this step did not improve the model fit

D: None of the variables were related to the DV

A

C: The group of predictors in this step did not improve the model fit

42
Q

In multiple regression, the effect size of the prediction is given by the:

(A) multiple correlation coefficient.

(B) squared multiple correlation coefficient.

(C) sum of the squared beta coefficients.

(D) sum of the individual squared correlations between the independent variables and the dependent variable.

A

(A) multiple correlation coefficient.

Answer is (A). The multiple correlation coefficient (R) is the correlation between the observed Y and the predicted Y. The latter is derived from a linear weighted combination of variables that best predict Y. The square of R is the effect size.

43
Q

In a standard multiple regression where X1 and X2 are used to predict Y, the square of the multiple correlation (R²) is:

(A) the sum of the correlations between Y and X1 and between Y and X2.

(B) the sum of the squared correlations between Y and X1 and between Y and X2.

(C) the proportion of variance in Y uniquely and jointly predicted by X1 and X2.

(D) the proportion of variance in Y uniquely predicted by X1 plus the proportion of variance in Y uniquely predicted by X2.

A

(C) the proportion of variance in Y uniquely and jointly predicted by X1 and X2.

Answer is (C). The explained variance of Y (i.e., R²) comes from two sources: one from the sum of unique contributions made by individual predictors and one from shared contributions made by two or more predictors.
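
The split into unique and shared variance can be worked through for two predictors using the standard formula for R² from the three pairwise correlations. The correlation values here are hypothetical, chosen only to show the arithmetic.

```python
def r2_two_predictors(ry1, ry2, r12):
    """R² for Y regressed on X1 and X2, from the pairwise correlations."""
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2.
ry1, ry2, r12 = 0.5, 0.4, 0.3

r2 = r2_two_predictors(ry1, ry2, r12)
unique_x1 = r2 - ry2 ** 2      # squared semi-partial of X1: what X1 adds beyond X2
unique_x2 = r2 - ry1 ** 2      # squared semi-partial of X2: what X2 adds beyond X1
shared = r2 - unique_x1 - unique_x2
sum_of_squared_rs = ry1 ** 2 + ry2 ** 2
```

Because the predictors overlap, R² here is smaller than the sum of the squared correlations, which is why options (A), (B) and (D) fail.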

44
Q

The simple correlations of X1, X2, and X3 with Y are .30, .20, and .10 respectively. A multiple regression of X1 and X2 predicting Y was just statistically significant at p = .04. If X3 is added to the equation predicting Y, we cannot be sure that:

(A) the p value will decrease.

(B) the R value will increase.

(C) the tolerances of X1 and X2 will decrease.

(D) the semi-partial correlations of X1 and X2 will decrease.

A

(A) the p value will decrease.

Answer is (A). We cannot be confident that X3 will make a significant contribution to the prediction but we can be confident that the addition of a third predictor will bring about (B), (C) and (D).

45
Q

In a standard multiple regression, X1, X2 and X3 were used to predict Y. The obtained F statistic had a probability of .001. The unstandardized equation of predicted Y was 0.8X1 + 0.4X2 + 0.3X3 + 0.5. The semi-partial correlations of X1, X2 and X3 were .15, .25, and .05, respectively. We can conclude that:

(A) The set of predictors did not significantly predict Y.

(B) X1 is a stronger predictor of Y than X2 or X3.

(C) X3 alone would not significantly predict Y.

(D) X2 makes the best unique contribution to the prediction of Y.

A

(D) X2 makes the best unique contribution to the prediction of Y.

Answer is (D). The unstandardised regression coefficients of X1, X2 and X3 are dependent on their scale of measurement and range, so their magnitude cannot be used to convey strength of prediction. On the other hand, a higher semi-partial correlation does indicate a stronger contribution.

46
Q

In a standard multiple regression of X1, X2 and X3 predicting Y, the tolerance value of X1 is the proportion of variance in X1 that is:

(A) shared with X2 and X3 but not with Y.

(B) shared with Y but not with X2 and X3.

(C) not shared with X2 and X3.

(D) shared with X2 and X3.

A

(C) not shared with X2 and X3.

Answer is (C). Tolerance is defined as the proportion of variance of a predictor that is NOT shared with (or predictable by) other predictors.
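
In other words, the tolerance of X1 is 1 minus the R² obtained when X1 is regressed on X2 and X3. A sketch working from the correlations among the predictors, using the standard two-predictor R² formula; the correlation values and function names are hypothetical.

```python
def r2_two_predictors(r_a, r_b, r_ab):
    """R² for a target regressed on two variables, from pairwise correlations."""
    return (r_a ** 2 + r_b ** 2 - 2 * r_a * r_b * r_ab) / (1 - r_ab ** 2)

def tolerance_x1(r12, r13, r23):
    """Proportion of variance in X1 NOT predictable from X2 and X3."""
    return 1 - r2_two_predictors(r12, r13, r23)

# Moderately correlated predictors: most of X1's variance is its own.
tol_ok = tolerance_x1(r12=0.5, r13=0.4, r23=0.3)

# X1 nearly duplicates X2: tolerance drops below the .10 warning level.
tol_bad = tolerance_x1(r12=0.95, r13=0.1, r23=0.1)
```

Note that Y never enters the calculation: tolerance describes overlap among the predictors only.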

47
Q

Three types of statistical regression are:

A: Hierarchical, Forward, Backward

B: Backward, Stepwise, Forward

C: Stairwise, Forward, Reverse

D: Reverse, Forward, Statistical

A

B: Backward, Stepwise, Forward

48
Q

It is NOT possible to use a measure as an independent variable in a linear regression if the format is:

A: La Trobe University, Murdoch University, Melbourne University

B: < 20 years old, 20-29 years, 30-39 years

C: Voting Yes/No

D: Height in millimetres

A

A: La Trobe University, Murdoch University, Melbourne University

CORRECT! This is a non-scalable discrete variable!

49
Q

A binary logistic regression should be used when

A: When linear regression is non-significant

B: The independent variable is dichotomised

C: The dependent variable is continuous

D: When the dependent variable is dichotomised

A

D: When the dependent variable is dichotomised

50
Q

In a study that had collected initial weight, weight loss after training, motivation to lose weight, duration of training period, and amount of exercise, what is likely to be the dependent variable?

(A) Weight loss after training.

(B) Motivation to lose weight.

(C) Duration of training period.

(D) Amount of exercise.

A

(A) Weight loss after training.

Answer is (A). Although the aim of the study was not specified, it would be reasonable to assume
that weight loss after training is the dependent variable.

51
Q

In the study described in the previous question, which of the following regression methods would allow the most sophisticated test of a theory on weight loss?

(A) Multiple regression.

(B) Hierarchical regression.

(C) Stepwise regression

(D) Backward regression.

A

Answer is (B). Both (C) and (D) are exploratory methods and do not represent a theoretically justifiable approach to the selection of predictors in regression. Of (A) and (B), it is (B) that would require further theoretical justification of the order in which predictors are entered into the equation.

52
Q

In the weight-loss study described above, if you were to select a predictor to enter in the first step, which would you select?

(A) Initial weight

(B) Motivation to lose weight.

(C) Duration of training period.

(D) Amount of exercise.

A

Answer is (A). It is (more) reasonable to assume that (A) affects the contribution of other predictors
to predict weight loss as it is chronologically an antecedent cause.

53
Q

In a hierarchical regression predicting Y, predictors X1 and X2 as Subset A were first entered by the researcher, followed by predictors X3 and X4 as Subset B. Which one of the following
statements is definitely true?

(A) Predictors in Subset A are not correlated with predictors in Subset B.

(B) Predictors in Subset A account for more variance in Y than predictors in Subset B.

(C) Total R² is the sum of the changes in R² due to the entry of Subset A and Subset B.

(D) Total R² is the sum of the squared correlations between Y and each predictor

A

Answer is (C). The choice of predictors in each subset was made on theoretical grounds (e.g., contributions of predictors in Subset B are to be evaluated only after adjusting for contributions of predictors in Subset A). As such, in a hierarchical regression, (A) and (B) are not of interest. Option (D) is false because the sum of the squared correlations between Y and each of the predictors could, indeed, exceed 1. For example, if we have two predictors that both correlate about .80 with Y (i.e., each accounting for 64% of the variance in Y), the sum of their squared correlations would be more than 1, and we know that no matter how many predictors we use, we cannot predict more than 100% of any dependent variable.
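
The .80 example can be worked through numerically. The sketch below simplifies each subset to a single predictor and assumes the two predictors correlate .60 with each other; that value, like the function name, is an assumption made for illustration.

```python
def r2_two_predictors(ry1, ry2, r12):
    """R² for Y regressed on X1 and X2, from the pairwise correlations."""
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

ry1, ry2, r12 = 0.8, 0.8, 0.6          # r12 is an assumed value

r2_step1 = ry1 ** 2                    # step 1: X1 entered alone
r2_total = r2_two_predictors(ry1, ry2, r12)
r2_change = r2_total - r2_step1        # step 2: what X2 adds after X1

sum_of_squared_rs = ry1 ** 2 + ry2 ** 2
```

Here the step 1 R² (.64) plus the R² change (.16) gives the total R² (.80), as option (C) states, while the sum of the squared correlations (1.28) exceeds 1 and so cannot be a proportion of variance, which rules out option (D).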