Week 1: The Descriptive Stats of Outcomes - How is the Data Distributed and How can we Assess the Distribution Flashcards

1
Q


What distribution is needed for parametric tests?

A

A normal distribution

2
Q

The normal distribution curve is also referred to as the

A

bell curve

3
Q

The normal distribution is symmetrical, meaning

A

The distribution curve can be divided down the middle to produce two equal halves that mirror each other

4
Q

The bell curve can be described using two parameters - (2)

A
  1. Mean (central tendency)
  2. Standard deviation (dispersion)
5
Q

μ is

A

mean

6
Q

σ is

A

standard deviation

7
Q

Diagram shows:

A

e.g., the area from the mean to 1σ above it contains 34.1% of the values
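The percentages on the normal curve can be checked by taking differences of the standard normal cumulative distribution function. A minimal sketch using Python's standard-library statistics.NormalDist; the helper name area_between is hypothetical:

```python
from statistics import NormalDist

# Standard normal distribution: mean (mu) = 0, standard deviation (sigma) = 1
std_normal = NormalDist(mu=0, sigma=1)

def area_between(lo: float, hi: float) -> float:
    """Proportion of values lying between lo and hi standard deviations from the mean."""
    return std_normal.cdf(hi) - std_normal.cdf(lo)

# From the mean to 1 sigma above it: about 34.1% of the values
print(f"mean to +1 sigma:  {area_between(0, 1):.1%}")
# Both sides of the mean together
print(f"within +/-1 sigma: {area_between(-1, 1):.1%}")
print(f"within +/-2 sigma: {area_between(-2, 2):.1%}")
```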

8
Q

Many statistical tests (parametric) cannot be used if the data are not

A

normally distributed

9
Q

The mean is the sum of

A

scores divided by the number of scores

10
Q

Mean is a good measure of

A

central tendency for roughly symmetric distributions

11
Q

The mean can be a misleading measure of central tendency in skewed distributions as

A

it can be greatly influenced by scores in the tail, e.g., extreme values
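To see the effect, compare the mean and median before and after adding one extreme score. A minimal sketch with made-up (hypothetical) scores:

```python
from statistics import mean, median

scores = [10, 11, 12, 13, 14]     # hypothetical, roughly symmetric scores
with_outlier = scores + [100]     # the same scores plus one extreme value in the tail

print(mean(scores), median(scores))              # mean and median agree
print(mean(with_outlier), median(with_outlier))  # mean is dragged towards the tail; median barely moves
```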

12
Q

Aside from the mean, what are the 2 other measures of central tendency? - (2)

A
  1. Median
  2. Mode
13
Q

The median is (2)

A

the middle score when scores are ordered.

the middle of a distribution: half the scores are above the median and half are below the median.
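Finding the median by hand means ordering the scores and taking the middle one, or averaging the two middle ones when there is an even number of scores. A minimal sketch (median_score is a hypothetical helper), checked against Python's statistics.median:

```python
import statistics

def median_score(scores):
    """Middle score once the scores are ordered (mean of the two middle scores if n is even)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: average the two middle scores

print(median_score([7, 3, 5]))       # 5
print(median_score([8, 2, 6, 4]))    # 5.0
# The standard library gives the same answers
print(statistics.median([7, 3, 5]), statistics.median([8, 2, 6, 4]))
```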

14
Q

The median is relatively unaffected by … and can be used with… (2)

A
  • extreme scores or skewed distributions
  • can be used with ordinal, interval and ratio data.
15
Q

The mode is the most

A

frequently occurring score in a distribution; it is always a score that actually occurred

16
Q

The mode is the only measure of central tendency that can be used with

A

nominal data

17
Q

The mode is greatly subject to

A

sample fluctuations and is therefore not recommended to be used as the only measure of central tendency

18
Q

Many distributions have more than one

A

mode
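Python's statistics.multimode (available from Python 3.8) returns every most-common value, which makes it a convenient way to spot a distribution with more than one mode. It also works on nominal categories, where a mean or median would make no sense. The scores and categories below are hypothetical:

```python
from statistics import multimode

# Hypothetical scores with two equally common values (a bimodal set)
scores = [3, 4, 4, 5, 6, 6, 7]
print(multimode(scores))        # [4, 6]

# The mode also works for nominal (category) data
eye_colour = ["brown", "blue", "brown", "green"]
print(multimode(eye_colour))    # ['brown']
```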

19
Q

The mean, median and mode are identical in

A

symmetric, unimodal distributions (e.g., the normal distribution)

20
Q

For a positively skewed distribution, the

A

mean is usually greater than the median, which in turn is greater than the mode

21
Q

For a negatively skewed distribution

A

usually the mode is greater than the median, which is greater than the mean
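The ordering of the three measures can be checked on a small hypothetical dataset. The negatively skewed set is simply the positively skewed one mirrored, so the ordering flips:

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, with a long tail of high values
pos_skew = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]
print(mean(pos_skew), median(pos_skew), mode(pos_skew))   # 4.2 3.0 2

# Mirror the same scores to get a negatively skewed set (long tail of low values)
neg_skew = [14 - x for x in pos_skew]
print(mean(neg_skew), median(neg_skew), mode(neg_skew))   # 9.8 11.0 12
```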

22
Q

Kurtosis in Greek means

A

bulge or bend

23
Q

What is central tendency?

A

the tendency for the values of a random variable to cluster around its mean, mode, or median.

24
Q

Diagram of normal kurtosis, positive excess kurtosis (leptokurtic) and negative excess kurtosis (platykurtic)

A
25
Q

What does lepto mean?

A

prefix meaning thin

26
Q

What does platy mean?

A

a prefix meaning flat or wide (think Plateau)

27
Q

Tests of normality - (2)

A

Kolmogorov-Smirnov test
Shapiro-Wilk test

28
Q

The results of tests of normality are dependent on

A

sample size

29
Q

If you have a very large sample, you will find that these normality tests often come out as … even when your data can visually look … - (2)

A

significant
normally distributed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

If you have a small sample, the normality tests may come out non-significant, even when the data are not normally distributed, due to

A

a lack of power in the test to detect a genuine departure from normality
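The Kolmogorov-Smirnov and Shapiro-Wilk tests are too involved to sketch briefly, so the code below swaps in a simpler test that shows the same sample-size dependence: dividing a skewness value by its standard error to get a z-score. It assumes the standard-error formula commonly reported for sample skewness, √(6n(n − 1)/((n − 2)(n + 1)(n + 3))), and a two-tailed 5% cut-off of 1.96. The skew value of 0.3 is hypothetical:

```python
import math

def se_skewness(n: int) -> float:
    """Standard error of sample skewness (assumed formula; needs n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skew_is_significant(skew: float, n: int, z_crit: float = 1.96) -> bool:
    """True if |skew / SE| exceeds the critical z (two-tailed, alpha = .05)."""
    return abs(skew / se_skewness(n)) > z_crit

skew = 0.3  # hypothetical, fairly mild skew
for n in (30, 1000):
    z = skew / se_skewness(n)
    print(n, round(z, 2), skew_is_significant(skew, n))
```

The same mild skew is non-significant with 30 scores but significant with 1000, because the standard error shrinks as n grows.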

31
Q

There is no hard and fast rule for

A

determining whether data are normally distributed or not

32
Q

Plot your data, because this helps inform the decisions you make with respect to

A

normality

33
Q

Even if a normality test is significant, if the data look normally distributed visually (e.g., in a large sample), you can still do

A

parametric tests

34
Q

A frequency distribution or a histogram is a plot of how many times

A

each score occurs
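A frequency distribution is just a count of how often each score occurs. A minimal sketch using collections.Counter, with hypothetical scores and a crude text histogram:

```python
from collections import Counter

def frequency_table(scores):
    """Return (score, count) pairs sorted by score."""
    return sorted(Counter(scores).items())

# Hypothetical scores
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]
for score, count in frequency_table(scores):
    print(f"{score}: {'#' * count}")   # one '#' per occurrence of the score
```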

35
Q

2 main ways a distribution can deviate from normal - (2)

A
  1. Lack of symmetry (called skew)
  2. Pointiness (called kurtosis)
36
Q

In a normal distribution, the values of skew and kurtosis are 0, meaning…

A

the distribution is symmetric and its tails are as they should be
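Skewness and kurtosis can be computed from raw scores. This sketch assumes the bias-adjusted sample formulas (often labelled G1 and G2), which are believed to match what SPSS reports; compare against your own output before relying on them. Kurtosis here is excess kurtosis, so normal data give about 0. The function names and data are hypothetical:

```python
import math

def _central_moment(xs, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Bias-adjusted sample skewness (G1); 0 for a perfectly symmetric sample."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 scores")
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("scores have no variability")
    g1 = _central_moment(xs, 3) / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (G2); about 0 for normally distributed data."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 scores")
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("scores have no variability")
    g2 = _central_moment(xs, 4) / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

flat = list(range(1, 11))             # 1..10: symmetric and flat
right_tail = [1, 1, 1, 2, 2, 3, 10]   # hypothetical scores with a long right tail
print(skewness(flat), excess_kurtosis(flat))   # about 0, and about -1.2 (platykurtic)
print(skewness(right_tail))                    # positive
```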

37
Q

Is age nominal or continuous?

A

Continuous

38
Q

Is gender continuous or nominal?

A

Nominal

39
Q

Is height continuous or nominal?

A

Continuous

40
Q

Which of the following best describes a confounding variable?

A. A variable that affects the outcome being measured as well as, or instead of, the independent variable

B. A variable that is manipulated by the experimenter

C. A variable that has been measured using an unreliable scale

D. A variable that is made up only of categories

A

A

41
Q

If a test is valid, what does it mean?

A. The test measures what it claims to measure.

B. The test will give consistent results. (= reliability)

C. The test has internal consistency. (= a measure of the correlations between different items on the same test, to see whether they measure the same construct)

D. The test measures a useful construct or variable. (= a test can measure something useful but still not be valid)

A

A

42
Q

A variable that measures the effect that manipulating another variable has is known as:

A. DV

B. A confounding variable

C. Predictor variable

D. IV

A

A

43
Q

The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called:

A. Measurement error

B. Reliability

C. The ‘fit’ of the model

D. Variance

A

A

44
Q

A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:

A. Positively skewed

B. Leptokurtic = distribution with positive kurtosis

C. Platykurtic = distribution with negative kurtosis

D. Negatively skewed = high scores are most frequent (bars highest on the right hand side)

A

A

45
Q

Which of the following is designed to compensate for practice effects?

A. Counterbalancing

B. Repeated measures design = practice effects are an issue in repeated measures designs

C. Giving participants a break between tasks = this compensates for boredom effects

D. A control condition = provides a reference point

A

A

46
Q

Variation due to variables that have not been measured is

A. Unsystematic variation

B. Homogeneous variance = assumption that the variance in each population is equal

C. Systematic variation = due to experimental manipulation

D. Residual variance = indicates how well the constructed regression line fits the actual data

A

A

47
Q

The purpose of a control condition is to

A. Allow inferences about cause

B. Control for participants’ characteristics = this is done by randomisation

C. Show up relationships between predictor variables

D. Rule out a tertium quid

A

A. Allow inferences about cause

48
Q

If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?

A. -2

B. 11

C. 2

D. -1.41

A

A: z = (18 - 26)/4 = -8/4 = -2
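The same arithmetic as a small function. The extra line, which assumes the scores are normally distributed, converts the z-score into the proportion of scores expected to fall below it:

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(18, mean=26, sd=4)
print(z)                       # -2.0
# Proportion of a normal distribution expected to score below this z
print(NormalDist().cdf(z))     # about 0.023
```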

49
Q

The standard deviation is the square root of the

A. Variance

B. Coefficient of determination = r squared

C. Sum of squares = sum of squared deviances

D. Range = largest score minus smallest score

A

A
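The relationship can be checked with Python's statistics module, which provides both population versions (dividing by N) and sample versions (dividing by N - 1). The scores are hypothetical:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

# Population variance and standard deviation (divide by N)
print(statistics.pvariance(data), statistics.pstdev(data))
# Sample versions (divide by N - 1): the SD is still the square root of the variance
var = statistics.variance(data)
sd = statistics.stdev(data)
print(math.isclose(sd, math.sqrt(var)))   # True
```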

50
Q

Complete the following sentence: A large standard deviation (relative to the value of the mean itself)...

A. Indicates that the data points are distant from the mean (i.e., the mean is a poor fit of the data)

B. Indicates that the data points are close to the mean

C. Indicates that the mean is a good fit of the data

D. Indicates that you should analyse the data with parametric tests

A

A

51
Q

The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the “expected value” of the number of patients who are successfully treated?

A. 32

B. 20

C. 8

D. 40

A

A: the expected value is n × p = 40 × 0.80 = 32 (80% of 40)
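The shortcut n × p comes from the binomial model, which assumes (the card does not state it) that each patient's outcome is independent with the same 0.80 success probability. This sketch checks the shortcut against the definition of expected value, summing k × P(k successes):

```python
from math import comb

n, p = 40, 0.80   # 40 patients, each with a 0.80 chance of success

# Shortcut for a binomial count: expected value = n * p
print(n * p)   # 32.0

# Same answer from the definition: sum of k * P(k successes) over every possible k
expected = sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))
print(round(expected, 6))   # 32.0
```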

52
Q

Imagine a test for a certain disease.

Suppose the probability of a positive test result is .95 if someone has the disease, but the probability is only .08 that someone has the disease if his or her test result was positive.

A patient receives a positive test, and the doctor tells him that he is very likely to have the disease. The doctor’s response is:

A. Confusion of the inverse

B. Law of small numbers

C. Gambler’s fallacy

D. Correct, because the test is 95% accurate when someone has the disease = incorrect, as the doctor based the conclusion on the inverse probability, P(positive | disease), instead of P(disease | positive)

A

A
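Bayes' theorem shows how P(positive | disease) = .95 can coexist with P(disease | positive) = .08. The card does not give the base rate or the false-positive rate, so this sketch assumes hypothetical values (1% prevalence, 10% false positives) that happen to give a figure close to .08:

```python
def p_disease_given_positive(sensitivity, prevalence, false_positive_rate):
    """Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)."""
    true_pos = sensitivity * prevalence                  # P(+ and D)
    false_pos = false_positive_rate * (1 - prevalence)   # P(+ and not D)
    return true_pos / (true_pos + false_pos)

# P(+ | D) = .95 from the card; prevalence and false-positive rate are hypothetical
posterior = p_disease_given_positive(0.95, prevalence=0.01, false_positive_rate=0.10)
print(round(posterior, 3))   # 0.088: far below .95, so "95% accurate" does not mean "95% likely"
```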

53
Q

Which of these variables would be considered not to have met the assumptions of parametric tests based on the normal distribution?

(Hint: many statistical tests rely on data measured at the interval level)

A. Gender

B. Reaction time

C. Temperature

D. Heart rate

A

A

54
Q

The test statistics we use to assess a linear model are usually _______ based on the normal distribution

(Hint: These tests are used when all of the assumptions of a normal distribution have been met)

A. Parametric

B. Non-parametric

C. Robust

D. Not

A

A

55
Q

Which of the following is not an assumption of the general linear model?

A. Dependence

B. Additivity

C. Linearity

D. Normally distributed residuals

A

A: independence, not dependence, is an assumption of parametric tests

56
Q

Looking at the table below, which of the following statements is the most accurate?

Hint: The further the values of skewness and kurtosis are from zero, the more likely it is that the data are not normally distributed

A. For the number of hours spent practising, there is not an issue of kurtosis

B. For level of musical skill, data are heavily negatively skewed

C. For the number of hours spent practising, there is an issue of kurtosis

D. For the number of hours spent practising, the data are fairly positively skewed

A

A - correct

B. Incorrect, as the value of skewness is –0.079, which suggests that the data are only very slightly negatively skewed because the value is close to zero

C. Incorrect, as the value of kurtosis is 0.098, which is fairly close to zero, suggesting that kurtosis was not a problem for these data

D. Incorrect, as the value of skewness for the number of hours spent practising is –0.322, suggesting that the data are only slightly negatively skewed

57
Q

Diagram of skewness

A
58
Q

In SPSS output, if the value of skewness is between -1 and 1, then

A

all good (no problem with skew)

59
Q

In SPSS output, if the value of skewness is below -1 or above 1, then

A

the data are skewed

60
Q

In SPSS output, if the value of skewness is below -1, then

A

negatively skewed

61
Q

In SPSS output, if the value of skewness is above 1, then

A

positively skewed
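The skewness rule of thumb from these cards can be written as a small function (classify_skew is a hypothetical name). The value -0.322 is the hours-practising skewness quoted in the musical-skill question; the other values are hypothetical:

```python
def classify_skew(skew: float) -> str:
    """Rule of thumb from these notes: skewness between -1 and 1 is acceptable."""
    if skew < -1:
        return "negatively skewed"
    if skew > 1:
        return "positively skewed"
    return "all good"

for value in (-1.4, -0.322, 0.5, 1.7):
    print(value, classify_skew(value))
```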

62
Q

Diagram of leptokurtic, platykurtic and mesokurtic (normal) distributions

A
63
Q

What does kurtosis tell you?

A

how much of our data lies around the ends/tails of our histogram, which helps us to identify when outliers may be present in the data.

64
Q

A distribution with positive kurtosis, so much of the data is in the tails, will be

A

pointy or leptokurtic

65
Q

A distribution with negative kurtosis, so the data lies more in the middle, will be

A

flatter, or platykurtic

66
Q

Kurtosis is the sharpness of the

A

peak of a frequency-distribution curve

67
Q

If our (excess) kurtosis value is 0, then the distribution is

A

mesokurtic, as in a normal distribution

68
Q

If the kurtosis value in SPSS is between -2 and 2, then

A

all good! (no problem with kurtosis)

69
Q

If the kurtosis value in SPSS is less than -2, then

A

platykurtic

70
Q

If the kurtosis value in SPSS is greater than 2, then

A

leptokurtic

71
Q

Are we good for skewness and kurtosis in this SPSS output?

A

Good, because the skewness is between -1 and 1 and the kurtosis values are between -2 and 2.

72
Q

Are we good for skewness and kurtosis in this SPSS output?

A

Bad, because although the skewness is between -1 and 1, we have a problem with kurtosis: its value of 2.68 lies outside the range -2 to 2.
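Both rules of thumb from these cards can be combined into one check that returns a list of problems (check_skew_kurtosis is a hypothetical name). The first call uses the hours-practising values quoted in the musical-skill question; in the second, the kurtosis of 2.68 is from this card and the skew of 0.5 is hypothetical:

```python
def check_skew_kurtosis(skew: float, kurtosis: float) -> list[str]:
    """Return a list of problems using the thresholds in these notes (empty list = all good)."""
    problems = []
    if not -1 <= skew <= 1:
        problems.append("positively skewed" if skew > 1 else "negatively skewed")
    if kurtosis > 2:
        problems.append("leptokurtic")
    elif kurtosis < -2:
        problems.append("platykurtic")
    return problems

print(check_skew_kurtosis(-0.322, 0.098))   # []
print(check_skew_kurtosis(0.5, 2.68))       # ['leptokurtic']
```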

73
Q

Correlational research doesn’t allow us to rule out the presence of a

A

third variable = confounding variable

e.g., we find that drownings and ice cream sales are correlated and conclude that ice cream sales cause drowning. Are we correct? Maybe not: both could be driven by the weather.
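A small simulation makes the point. It assumes a hypothetical model in which temperature drives both ice cream sales and drownings, with no causal link between the two. They still correlate strongly, and the correlation all but disappears once temperature's straight-line influence is removed from both (a partial correlation):

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """What is left of y after removing its straight-line relationship with x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

random.seed(1)
# Hypothetical model: temperature drives both outcomes; neither causes the other
temperature = [random.uniform(10, 35) for _ in range(200)]
ice_cream = [20 * t + random.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 1.5) for t in temperature]

print(round(pearson_r(ice_cream, drownings), 2))   # strong positive correlation
# Control for temperature: correlate what is left after removing its influence
print(round(pearson_r(residuals(ice_cream, temperature), residuals(drownings, temperature)), 2))
```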

74
Q

The tertium quid is a variable that you may not have considered that could be

A

influencing your results, e.g., the weather in the ice cream and drowning example

75
Q

How can we rule out a tertium quid? - (2)

A

Use RCTs.

Randomised controlled trials allow us to even out confounding variables between the groups

76
Q

Correlation does not mean

A

causation

77
Q

To infer causation,

A

we need to actively manipulate the variable we are interested in, and control against a group (condition) where this variable was not manipulated.

78
Q

According to Andy, correlation does not mean causation because

A

“causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results”

79
Q

Aside from checking the skewness and kurtosis assumptions in the data, also check for

A

linearity or, less commonly, additivity

80
Q

Additivity refers to the combined

A

effect of many predictors (their separate effects add together)

81
Q

What does this diagram show in terms of additivity/linearity? - (5)

A

There is a linear effect when the data increase at a steady rate, like the graph on the left.

Your cost increases steadily as the number of chocolate bars increases.

The graph on the right shows a non-linear effect: there is no steady increase, but rather a sharp change in the data.

So you might feel fine if you eat a few chocolate bars, but after that the risk of having a stomach ache increases rapidly the more chocolate you eat.

This is very important to check, because many statistical tests are based on linear models: if the relationship is not linear, your analysis will be wrong even if your other assumptions are met.
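One simple way to see the difference between the two graphs is to look at the change from one value to the next: a linear effect has constant steps, a non-linear one does not. The 80p price and the squared "ache score" below are hypothetical stand-ins for the two graphs:

```python
def first_differences(values):
    """Change from each value to the next; constant differences mean a linear (straight-line) effect."""
    return [b - a for a, b in zip(values, values[1:])]

bars = range(1, 7)
cost_pence = [80 * n for n in bars]   # hypothetical price: 80p per bar
ache_score = [n ** 2 for n in bars]   # hypothetical stomach-ache score that accelerates

print(first_differences(cost_pence))   # [80, 80, 80, 80, 80]: linear
print(first_differences(ache_score))   # [3, 5, 7, 9, 11]: growing, so non-linear
```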

82
Q

The discrepancy between a measurement and the actual value in the population is … and not …

A

measurement error and NOT variance

83
Q

Measurement error can happen across all psychological experiments, from … to …

A

recording instrument failure to human error

84
Q

What are the 2 types of measurement errors? - (2)

A
  1. Systematic
  2. Random
85
Q

What is systematic measurement error?

A

Error that is predictable, typically constant or proportional to the true value, and always affects the results of an experiment in a predictable direction

86
Q

Example of systematic measurement error

A

For example, if I know I am 5ft 2in and when I get measured I’m told I’m 6ft, this is a systematic error and fairly easy to identify; these errors usually happen when there is a problem with your experiment

87
Q

What is random measurement error?

A

Measured values being inconsistent when repeated measurements of a constant attribute or quantity are taken.

88
Q

Example of random measurement error

A

For example, my height is 5ft 2in when I measure it in the morning but 5ft when I measure it in the evening. Because my measurements were taken at different times, there is some variability (for those of you who believe you shrink throughout the day).
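The two kinds of error leave different fingerprints, which a quick simulation shows. It assumes a hypothetical true height of 157 cm, a miscalibrated instrument that always adds 25 cm (systematic error), and unbiased readings with a random scatter of about 1 cm (random error):

```python
import random
import statistics

random.seed(0)
true_height = 157.0   # hypothetical true height in cm (roughly 5ft 2in)

# Systematic error: a miscalibrated instrument adds the same 25 cm every time
systematic = [true_height + 25 for _ in range(1000)]
# Random error: unbiased readings that scatter either side of the true value
random_err = [true_height + random.gauss(0, 1) for _ in range(1000)]

for label, readings in (("systematic", systematic), ("random", random_err)):
    errors = [r - true_height for r in readings]
    # Systematic error shows up as a biased average; random error as spread around zero
    print(label, round(statistics.fmean(errors), 2), round(statistics.pstdev(errors), 2))
```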

89
Q

What is variance?

A

Average squared deviation of each number from its mean.
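Computing the variance by hand: take each score's deviation from the mean, square it, add the squares up (the sum of squares, SS) and divide. Dividing by N gives the population variance; dividing by N - 1 gives the sample estimate. A minimal sketch (variance_by_hand is a hypothetical name) with hypothetical scores, checked against Python's statistics module:

```python
import statistics

def variance_by_hand(scores, sample=True):
    """Sum of squared deviations from the mean, divided by N - 1 (sample) or N (population)."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)   # sum of squares (SS)
    return ss / (n - 1) if sample else ss / n

scores = [1, 2, 3, 4, 5]   # hypothetical scores: mean = 3, SS = 10
print(variance_by_hand(scores, sample=False), statistics.pvariance(scores))
print(variance_by_hand(scores), statistics.variance(scores))
```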

90
Q

Variability is an inherent part of

A

things being measured and of the measurement process

91
Q

Diagram of variance formula

A
92
Q

What does the central limit theorem state? - (2)

A

The sampling distribution of the mean approaches a normal distribution as the sample size increases, whatever the shape of the population distribution (provided its variance is finite). A common rule of thumb is that this holds well for sample sizes over 30.

Relatedly, as the sample size increases, the sample mean and standard deviation tend to be closer in value to the population mean μ and standard deviation σ.
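A quick simulation illustrates the theorem. It assumes a hypothetical, strongly skewed population (exponential, with mean 1 and SD 1), draws many samples of size 50, and looks at the distribution of their means. The means cluster around the population mean with a spread close to σ/√n, the standard error:

```python
import random
import statistics

random.seed(42)
n, reps = 50, 2000
# Hypothetical skewed population: exponential with mean 1 and SD 1
sample_means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(statistics.fmean(sample_means), 3))   # close to the population mean of 1
print(round(statistics.stdev(sample_means), 3))   # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141
```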

93
Q

What does a histogram show? - (2)

A

Frequency of scores

The distribution of the data, including skewness and kurtosis

94
Q

What does a boxplot show? - (2)

A

To identify outliers

Shows median rather than mean (good for non-normally distributed data)
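The numbers behind a boxplot can be sketched with statistics.quantiles, flagging outliers with Tukey's common 1.5 × IQR rule. Packages (including SPSS) use slightly different quartile definitions, so exact quartiles may differ from your output; boxplot_summary and the scores are hypothetical:

```python
import statistics

def boxplot_summary(scores):
    """Median, quartiles and Tukey's 1.5 x IQR outlier fences."""
    q1, med, q3 = statistics.quantiles(scores, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in scores if x < lower or x > upper]
    return {"median": med, "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical scores with one extreme value
print(boxplot_summary(scores))
```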

95
Q

What are line graphs?

A

simply bar charts with lines instead of bars

96
Q

Bar charts are a good way to display

A

means (and standard errors)

97
Q

What does a scatterplot illustrate? - (2)

A

a relationship between two variables, e.g. correlation or regression

Only use regression lines for regressions!

98
Q

What are matrix scatterplots? - (2)

A

A particular kind of scatterplot that can be used instead of a 3-D scatterplot

It is clearer to read

99
Q

Using the data provided, how would you summarise the skew?

A. The data have an issue with positive skew
B. The data have an issue with negative skew
C. The data are normally distributed

A

B

100
Q

What is the median number of bullets shot at a partner by females?

A

67.00

101
Q

What descriptive statistic does the red arrow represent?

A. Inter quartile range

B. Median

C. Mean

D. Range

A

A

102
Q

What is the mean for males and the SD for females? - (2)

A

Males M = 27.29

Females SD = 12.20

103
Q

What are the respective standard errors of the mean for females and males?

A

3.26 & 3.42
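The standard error of the mean is the SD divided by the square root of the sample size. The sample sizes behind this SPSS output are not shown, so the sketch uses hypothetical numbers to show how the same SD gives a smaller standard error as the sample grows:

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# Hypothetical SD of 10 with increasing sample sizes
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))   # 2.0, 1.0, 0.5
```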