Practical W2: Basics of Statistics Flashcards

1
Q

One of the first things that’s super important after collecting your data is to graphically look at your data by making a

A

histogram

2
Q
A
3
Q

There are two main ways in which a distribution can deviate from normal - (2)

A
  • skewness
  • kurtosis
4
Q

Diagram of positive and negative skew

A
5
Q

If the skewness value in SPSS is between -1 and 1 then

A

it’s fine

6
Q

If the skewness value in SPSS is less than -1 then

A

it is a negative skew = non-normal distribution

7
Q

If the skewness value in SPSS is greater than 1 then

A

positive skew = non-normal distribution

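These SPSS-style skewness and kurtosis values can also be computed outside SPSS. Below is a minimal sketch, assuming Python with NumPy and SciPy and made-up example data; the bias-corrected options roughly correspond to the adjusted estimates SPSS reports, though the exact corrections can differ slightly.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)                  # hypothetical example data
    scores = rng.normal(loc=50, scale=10, size=200)

    skewness = stats.skew(scores, bias=False)                  # sample skewness
    kurt = stats.kurtosis(scores, fisher=True, bias=False)     # excess kurtosis (normal = 0)

    print(f"skewness = {skewness:.2f}, kurtosis = {kurt:.2f}")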
8
Q

Diagram of skewness value shown in SPSS

A
9
Q

Kurtosis is basically looking at how

A

‘pointy’ your histogram is

10
Q

Kurtosis tells us how much our data lies around the

A

ends/tails of our histogram which helps us to identify when outliers may be present in the data.

11
Q

A distribution with positive kurtosis, where much of the data is in the tails, will be very

A

pointy or leptokurtic

12
Q

A distribution with negative kurtosis, where the data lies more in the middle, will be more

A

flatter or platykurtic

13
Q

A normal distribution will have a kurtosis value of

A

0 (mesokurtic)

14
Q

Characteristic of a negative skew

A

the tail is pointing towards the lower values and the data is clustered at the higher values

15
Q

Characteristic of a positive skew

A

the tail is pointing towards the higher values and the data is clustered at the lower values

16
Q

Diagram of mesokurtic (normal), leptokurtic and platykurtic distribution curves

A
17
Q

Kurtosis value in SPSS between -2 and 2 is

A

all good, normal kurtosis

18
Q

If the kurtosis value in SPSS is less than -2 then it shows

A

platykurtic (non-normal, issue with kurtosis)

19
Q

If the kurtosis value in SPSS is greater than 2

A

leptokurtic (non-normal, shows issues with kurtosis)

20
Q

Diagram of kurtosis value in SPSS

A
21
Q

Are the kurtosis and skewness values here fine?

A

Good, because the skewness value is between -1 and 1 and the kurtosis value is between -2 and 2.

22
Q

Are the kurtosis and skewness values fine here?

A

Bad, because although the skewness is between -1 and 1, we have a problem with kurtosis: its value of 2.68 falls outside the -2 to 2 range.

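The rules of thumb from the last few cards (skewness between -1 and 1, kurtosis between -2 and 2) can be wrapped into a simple check. A minimal sketch in Python; these cut-offs are the practical's rules of thumb rather than a formal normality test, and the example values are hypothetical.

    def roughly_normal(skewness, kurtosis):
        # Rule-of-thumb check: skewness in [-1, 1] and kurtosis in [-2, 2]
        return (-1 <= skewness <= 1) and (-2 <= kurtosis <= 2)

    print(roughly_normal(0.3, 1.5))    # True  - both values within range
    print(roughly_normal(0.3, 2.68))   # False - kurtosis problem, as in the card above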
23
Q

3 ways to transform your data to make it closer to a normal distribution - (3)

A
  1. exponential
  2. power
  3. log
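A minimal sketch of what these transformations might look like in Python with NumPy (assumed here in place of SPSS's transformation menus); log and square-root transforms need non-negative data, and which transformation helps depends on the direction of the skew.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 5.0, 9.0, 20.0, 60.0])  # hypothetical positively skewed scores

    log_x = np.log(x)    # log transformation (often used to reduce positive skew)
    pow_x = x ** 0.5     # power transformation (here a square root)
    exp_x = np.exp(x)    # exponential transformation (sometimes used for negative skew)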
24
Q

There is a tertium quid which prompts the saying that

A

correlation does not imply causation

25
What is tertium quid a term for?
a third factor
26
The tertium quid is a variable that you may not have considered that
could be influencing your result
27
The tertium quid (third factor) is known as a
confounding variable
28
Example of a tertium quid variable you may not have considered that could be influencing your results - (2)
We find that drownings and ice cream sales are correlated and conclude that ice cream sales cause drowning. Are we correct? No, since it is most likely that both are actually due to the weather: when it's hotter outside people eat more ice cream and go more frequently to the pool or to the beach to swim. The fact that more people go swimming is the reason why there are more drownings.
29
If one or both of the skewness/kurtosis values is out of range then the assumptions for
parametric tests are not satisfied
30
Rule out tertium quid (third factor) through
RCTs = even out confounding variables between groups
31
In an RCT, you randomly assign your participants to two or more groups - (2)
one group receives no intervention or experimental manipulation (so your control), the other group receives the intervention or treatment, and then you can directly compare the dependent variables.
32
To infer causation we need to
actively manipulate the variable we are interested in, and control against a group (condition) where this variable was not manipulated.
33
Example of a control condition in lesion studies - (2)
a double dissociation experiment, where one test is affected by a lesion in one area but not in a second area, and then a different test is conducted which is affected by the second area but not the first. The only way we can actually infer causation is by comparing the two controlled situations: one where the cause (the lesion) is present and one where the lesion is absent.
34
Another assumption for parametric tests is having
linearity/additivity
35
Linearity refers to the - (2)
combined effect of several predictors, which should form a straight line or show a linear relationship - the data increases at a steady rate, as in the graph
36
What does this graph show?
Your cost increases steadily as the number of chocolate bars increases
37
This graph shows a multiplicative/non-linear relationship (not a steady but a sharp increase/change in the data), which does not meet an assumption of
parametric tests
38
What does this graph show?
You might feel OK if you eat a few chocolate bars, but after that the risk of you having a stomach-ache increases quite rapidly the more chocolates you eat.
39
Why is it important to check for linearity in your data?
If it is not linear, your statistical analysis will be wrong even if your other assumptions are correct, because a lot of statistical tests are based on linear models.
40
When we talk about additivity/linearity we are referring to the combined effect of
several predictors
41
What is measurement error?
The discrepancy between the actual value we’re trying to measure and the number we use to represent that value.
42
Example of measurement error - (2)
Conducting an experiment where I was measuring the length of a tree in cm, and someone else in my research group measured the same tree using a different metric and got a different value from me - that's a measurement error. This is an example of human error, but recording instrument failure is another possibility.
43
What are the 2 types of measurement error? - (2)
  • Systematic measurement error
  • Random measurement error
44
Measurement error can happen across all psychological experiments from...
recording instrument failure to human error.
45
What is systematic measurement error?
when the error is proportional to the true value and affects the results of the experiment in a predictable direction
46
What is example of systematic measurement error?
For example, if I know I am 5ft 2 and when I go to get measured I'm told I'm 6ft, this is a systematic error and pretty identifiable - these usually happen when there is a problem with your experiment.
47
What is random measurement error and when does it usually occur? - (2)
when the measured values are inconsistent when repeated measures of a constant attribute or quantity are taken; this error happens by chance and is more related to natural variability
48
Example of random measurement error - (2)
My height is 5ft 2 when I measure it in the morning but it's 5ft when I measure myself in the evening. This is because my measurements were taken at different times, so there would be some variability - for those of you who believe you shrink throughout the day.
49
Measurement error is completely different from variance, in the sense that variance is the
average spread of your data
50
Variance is specifically the average squared deviation of
each number from its mean
51
Variance helps us assess group differences to determine whether the populations that our samples come from
differ from each other
52
How to calculate variance?
53
Example of variance in line graph (orange dots and lines are variance)
54
55
The purpose of a control condition is to allow inferences about causality, as Field's quote was:
the only way to infer causality is through comparison of two controlled situations: one in which the cause is present and one in which the cause is absent
56
What are residuals?
difference between the observed value of the dependent variable and the predicted value (usually mean).
57
GLM assumption is that residuals will be
normally distributed - observed values of a variable will be normally distributed around the predicted value.
58
Last assumption of GLM: Homoscedasticity which is that
residuals have constant variance at every level of x – for each level of the independent variable the amount of error or “noise” has a similar variance
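A minimal sketch of how these residual checks might look in Python, assuming NumPy and SciPy; the data and grouping variable are made up, and the residuals here are simply observed values minus the mean, as in the definition above.

    import numpy as np
    from scipy import stats

    y = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0])  # hypothetical outcome scores
    residuals = y - y.mean()          # observed value minus predicted value (the mean)

    # Normality of residuals: Shapiro-Wilk test (p > .05 suggests no evidence of non-normality)
    w_stat, p_value = stats.shapiro(residuals)
    print(f"Shapiro-Wilk p = {p_value:.3f}")

    # Homoscedasticity: residual spread should be similar at each level of the predictor
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    print(residuals[group == 0].var(ddof=1), residuals[group == 1].var(ddof=1))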
59
What is a dependent variable?
A dependent variable (or outcome variable) is a variable that is thought to be affected by changes in an independent variable. 
60
What is a confounding variable? - (2)
A confounding variable is a variable which has an unintentional effect on the dependent variable.  When carrying out experiments we attempt to control these extraneous variables; however, there is always the possibility that one of these variables is not controlled and if this affects the dependent variable in a systematic way, we call this a confounding variable.
61
A predictor variable is
a variable that is thought to predict another variable.
62
What is an independent variable? - (2)
An independent variable is a variable that is thought to be the cause of some effect. This term is usually used in experimental research to denote a variable that the experimenter has manipulated.
63
We cannot control for everything - especially in the sale of chocolate bars we might expect other variables to impact the popularity of chocolate, so in an LM (linear model) we can add something called - (4)
a predictor variable; these are additional variables that are related to your variable of interest. For example, the time of year may be a predictor variable - like over Easter you may see an increase in sales. In a GLM you can plug in this predictor variable and any others to expand your model using predictor variables, i.e. independent variables you may not be directly interested in. When we have several predictors in a regression it is a multiple regression.
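A minimal sketch of adding predictor variables to a linear model in Python, assuming scikit-learn and NumPy; the chocolate-sales numbers and the "Easter week" predictor are hypothetical, just to illustrate a multiple regression with more than one predictor.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    price       = np.array([1.0, 1.2, 0.9, 1.5, 1.1, 0.8])   # hypothetical price per bar
    easter_week = np.array([0,   0,   0,   1,   1,   0])     # hypothetical time-of-year predictor
    sales       = np.array([120, 110, 135, 180, 170, 140])   # hypothetical weekly sales

    X = np.column_stack([price, easter_week])     # several predictors -> multiple regression
    model = LinearRegression().fit(X, sales)

    print(model.coef_, model.intercept_)          # one slope per predictor, plus an intercept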
64
The central limit theorem tells us that if we have enough participants (typically more than 30) the sampling distribution of the mean approaches a
normal distribution
65
The central limit theorem states that the sampling distribution of the mean approaches a normal distribution, as the sample size increases. This fact holds especially true for
sample sizes over 30 --> N >30
66
As the sample size increases, the sample mean and standard deviation will be (CLT)
closer in value to the population mean μ and standard deviation σ.
67
The central limit theorem tells us that no matter what the distribution of the population is, the shape of the sampling distribution will approach normality as the sample size (N)
increases
68
How is CLT useful? - (2)
researchers never know which mean in the sampling distribution is the same as the population mean, but by selecting many random samples from a population the sample means will cluster together, allowing the researcher to make a very good estimate of the population mean.
69
as the sample size (N) increases the (CLT)
sampling error will decrease
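A minimal simulation sketch of the central limit theorem in Python, assuming NumPy; the population here is deliberately skewed (exponential), yet the means of samples of N = 40 pile up in a roughly normal shape around the population mean.

    import numpy as np

    rng = np.random.default_rng(0)
    sample_means = [rng.exponential(scale=1.0, size=40).mean() for _ in range(5000)]

    print(np.mean(sample_means))   # close to the population mean of 1.0
    print(np.std(sample_means))    # close to sigma / sqrt(N) = 1 / sqrt(40) ≈ 0.16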
70
In a normal distribution the values of skew and kurtosis are
0
71
Definition of tertium quid
the possibility that an apparent relationship between two variables is actually caused by the effect of a third variable on them both (often called the third-variable problem)
72
Definition of confounding variable
a variable (that we may or may not have measured) other than the predictor variables in which we’re interested that potentially affects an outcome variable.
73
Confounding variable jeopardises the
reliability and validity of an experiment's outcome
74
Confounding variables can be measured using reliable and
unreliable scales
75
A test can still measure a useful construct or variable but still not be
valid
76
Internal consistency is - (2) and example
It measures whether several items that propose to measure the same general construct produce similar scores, e.g. participants expressed agreement with a statement like "I enjoyed rock music" and disagreed with a statement like "I hate rock music"
77
DV or outcome variable is variable thought to be affected by changes in
independent variable
78
An independent variable is a variable that is thought to be the cause of
some effect
79
Reliability is whether an instrument can be
interpreted consistently across different situations
80
What is the 'fit' of a model?
The ‘fit’ of the model is the degree to which a statistical model represents the data collected
81
Counterbalancing can compensate for
practice effects, as it ensures that they produce no systematic variation between our conditions, since it counterbalances the order in which a person completes the conditions
82
Practice effects are an issue in what design?
repeated measures design
83
Giving participants a break between tasks is a technique used to compensate for
boredom effects
84
The homogeneity of variance assumption is that the variance
within each of the populations is equal
85
Residual variance helps us confirm how well a - (2)
regression line that we constructed fits the actual data set. The smaller the variance, the more accurate the predictions are
86
The coefficient of determination is the correlation
coefficient squared: the amount of variability in one variable shared by another
87
The sum of squares, variance and standard deviation are all measures of the
dispersion or spread of data around the mean
88
The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the "expected value" of the number of patients who are successfully treated? Calculation
32, because 80% of 40 patients is 32 (or 40 x 0.80 = 32)
89
The sum of squared errors is the sum of the
squared deviances
90
Assumptions of parametric data - (4)
  1. Normally distributed data
  2. Homogeneity of variance: variances should be the same throughout the data
  3. Data measured at least at the interval level
  4. Independence