rmb Flashcards
what is a simple way to summarise a large set of numbers?
counting how often each number occurs (frequency)
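A minimal Python sketch of frequency counting, using the standard library's `Counter`. The `rolls` data is made up for illustration.

```python
from collections import Counter

# made-up data: eight rolls of a die
rolls = [3, 1, 3, 6, 2, 3, 1, 6]

# Counter maps each distinct value to the number of times it occurs
freq = Counter(rolls)

for value in sorted(freq):
    print(value, freq[value])
```

Each printed row is a value and its frequency, which is the information a histogram bar shows.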
what type of data do we use histograms for?
continuous
where is the centre of a histogram?
1
what is the benefit of using more bins in a histogram?
shows the distribution with higher resolution (but can get noisy)
how does a change in mean affect distribution shape metrics?
a change in mean keeps the shape of the distribution the same but shifts its centre of mass along the axis, so the highest bars sit over the most likely values
how does a change in variance affect distribution shape metrics?
a change in variance stretches or compresses the distribution: high variance means the values are spread over a wide range, low variance means they fall within a narrow range
how does a change in skewness affect distribution shape metrics?
a dataset with negative skewness has a long tail pointing towards the lower (more negative) values; a dataset with positive skewness has a long tail pointing towards the higher values
how does a change in kurtosis affect distribution shape metrics?
kurtosis reflects the peakedness and tail heaviness of our dataset
so data with high kurtosis will have a sharp peak and heavy tails, and data with low kurtosis will have a flatter peak and lighter tails
what is a dataset?
a collection of data acquired for a specific purpose
may relate to multiple experiments or hypotheses
what is a variable?
a number that can ‘vary’ (e.g. take a high or a low value) depending on an attribute that we’re trying to measure
name all the types of variables
nominal
ordinal
interval
ratio
what is nominal data?
no relationship between the different possibilities on the scale. sometimes called categorical data
there is a distinct set of possible answers, with no particular order relating them to each other
e.g. country of origin
what is ordinal data?
a natural order between possibilities but nothing else. can’t interpret the ‘magnitude’ of differences
e.g. Likert scales
what is interval data?
the possibilities are ordered and have interpretable magnitudes, though ‘zero’ does not have special meaning
e.g. temperature in degrees Celsius (0C does not mean ‘no temperature’)
what is ratio data?
like interval data, but now zero is directly interpretable and we can interpret ratios between values
e.g. reaction times
what is continuous data?
a variable that can change freely to take any value
for example - temp could be 4C, 10.34C or -0.0000513C
what is discrete data?
a numeric variable that takes one of a fixed set of values
for example - number of cars owned
what is a sample?
the data we’ve actually collected
what is a population?
in most cases a theoretical or hidden quantity which represents the distribution we would have seen if we were able to collect all possible data to completely describe the group of people we’re interested in
the total set of everyone within a group that we want to test
do very large datasets reflect the wider population better or worse than small datasets?
better
what does a sampling distribution tell us?
how variable the sample mean is across repeated samples from a given population
how does a larger standard deviation differ from a smaller one on a distribution graph?
with a larger standard deviation the mean/centre of the sampling distribution stays very similar but its breadth is much larger
if a larger sample produced a higher standard error of the mean, what does this suggest?
that each sample in the larger population is more variable so we can be less precise in our estimation of the mean from one sample of the second population compared to the first
if a dataset is normally distributed, how can we calculate the standard error of the mean?
SEM = σ / √N
dividing the standard deviation of the data by the square root of the number of samples
what are confidence intervals?
95% confidence intervals define a range of values which have a 95% chance of containing the population mean
how do you calculate confidence intervals?
95% CI = 1.96 * SEM
upper = X̄ + CI
lower = X̄ - CI
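As a rough worked example of the SEM and 95% CI formulas above, here is a short Python sketch. The data and function names are hypothetical, and the standard deviation divides by N, which is an assumption chosen to match the formula used elsewhere in these cards.

```python
import math

def mean(xs):
    """Sum of the data points divided by the number of data points."""
    return sum(xs) / len(xs)

def sem(xs):
    """Standard error of the mean: SD / sqrt(N), with the SD dividing by N."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sd / math.sqrt(len(xs))

def ci95(xs):
    """Return (lower, upper): the sample mean minus/plus 1.96 * SEM."""
    half_width = 1.96 * sem(xs)
    return mean(xs) - half_width, mean(xs) + half_width

# made-up sample of 16 data points (mean 5, SD 2)
data = [2, 4, 4, 4, 5, 5, 7, 9] * 2
lower, upper = ci95(data)
print(f"mean={mean(data):.2f}, SEM={sem(data):.2f}, 95% CI=[{lower:.2f}, {upper:.2f}]")
# prints: mean=5.00, SEM=0.50, 95% CI=[4.02, 5.98]
```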
what does the centre line on a distribution graph mean?
the mean
what do the white lines inside the distribution represent?
one standard deviation away from the mean on each side
what is the standard normal distribution?
a special case of the normal distribution in which the mean is zero and the standard deviation is one
what does Shapiro-Wilk test for?
an objective test for whether data is normally distributed
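what does the Shapiro-Wilk W statistic tell us?
a metric indicating how ‘normal’ the data is; higher values (closer to 1) indicate more normal data
what does the Shapiro-Wilk p-value tell us?
the probability of seeing a W this low if the data really were normally distributed; a small p-value (e.g. below 0.05) indicates a significant departure from normality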
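A hedged sketch of running the test in Python, assuming SciPy is installed. `scipy.stats.shapiro` returns the W statistic and the p-value; the two datasets are simulated here purely for illustration.

```python
import random
from scipy import stats

random.seed(0)  # fixed seed so the simulated data are reproducible

# simulated data: one roughly normal sample, one strongly skewed sample
normal_like = [random.gauss(100, 15) for _ in range(200)]
skewed = [random.expovariate(1.0) for _ in range(200)]

w_norm, p_norm = stats.shapiro(normal_like)
w_skew, p_skew = stats.shapiro(skewed)

print(f"normal-like: W={w_norm:.3f}, p={p_norm:.3f}")
print(f"skewed:      W={w_skew:.3f}, p={p_skew:.3g}")
```

The skewed sample should give a lower W and a very small p-value, while the normal-like sample should give a W close to 1.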
what does a horizontal line in the centre of a box whisker graph represent?
the median
what do the edges of the box in a box whisker graph represent?
the interquartile range
what do the vertical line and the dots that occur outside of it represent?
line - 95% intervals of data
dots - often outliers
what does it mean if a sample bias is systematic?
that the bias will continue to be true even if we recruit a larger sample
e.g. perhaps certain people are just more or less likely to respond to a recruitment email
what is ecological validity?
a measure of how test performance predicts behaviours in real-world settings
what does WEIRD stand for?
Western
Educated
Industrialised
Rich
Democratic
what does a histogram represent?
the sample distribution of our value of interest
what is each sample we measure an approximation of?
the underlying population which we can’t actually measure
how do you calculate the sample mean?
X̄ = Σ xj / N
the sample mean is the sum of all the individual data points divided by the total number of data points
how do you calculate the sample standard deviation?
σ = √( Σ (xj - X̄)^2 / N )
the sample standard deviation is the square root of the sum of the squared difference between the sample mean and each individual data point divided by the total number of data points
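A minimal Python sketch of the two formulas above, using only the standard library. The function names and the `scores` data are made up for illustration. It divides by N to match the card; note that many statistics packages (for example Python's `statistics.stdev`) divide by N - 1 instead.

```python
import math

def sample_mean(xs):
    """Sum of all the data points divided by the number of data points."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Square root of the mean squared difference from the sample mean (divides by N)."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
mean = sample_mean(scores)
sd = sample_sd(scores)
print(mean, sd)  # prints: 5.0 2.0
```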
what is the standard error of the mean?
the likely variability in our estimate of the population mean from a given sample
does the SEM increase, decrease, or stay the same when the sample size grows larger?
decreases, larger samples are more reliable
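To illustrate, here is a small Python sketch of SEM = σ / √N for three sample sizes. The standard deviation of 15 is an assumed value, held fixed so that only N changes.

```python
import math

sigma = 15.0  # assumed standard deviation, made up for illustration

# SEM = sigma / sqrt(N) for increasing sample sizes
sems = {n: sigma / math.sqrt(n) for n in (10, 100, 1000)}

for n, value in sems.items():
    print(f"N={n:>4}: SEM={value:.2f}")
```

With these numbers the SEM falls from about 4.74 to 1.50 to 0.47: every hundredfold increase in N divides the SEM by ten.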
what does a wide standard error tell us compared to a small standard error?
a wide standard error tells us that our estimate of the mean is highly variable, so if we repeated the study ten times we could expect to get quite different means
however if our standard error is very small, it’s telling us that we’re going to get nearly the same mean every time
a small standard error usually means the number of data points is high (or the data have low variability)
what are confidence intervals?
a range of values around the sample mean that has 95% chance of containing the population mean
what is a statistical hypothesis?
a comparison to a single value
what is a null hypothesis?
the sample mean is indistinguishable from the reference value
why can we not just look at the difference between the mean and the specified value?
noise
no measurement is perfect, there is often an associated error with any data point
sampling bias
we can’t measure data from everybody
therefore, we are only working with an estimate of the mean of our group - not the true mean
what is the formula for a t-test?
t(24) = (X̄ - μ) / SEM
t = t-value
24 = degrees of freedom (one less than the number of data points in our dataset)
X̄ = mean of the observed data
μ = comparison value (what we compare our observed mean to)
SEM = standard error of the mean
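A rough Python sketch of the one-sample t-test formula, using only the standard library. The function name, data and comparison value are hypothetical. One assumption to flag: the standard deviation here divides by N - 1, which is the usual convention for t-tests in statistics software, whereas the standard deviation card in this deck divides by N.

```python
import math

def one_sample_t(xs, mu):
    """Return (t, df) for a one-sample t-test of xs against comparison value mu."""
    n = len(xs)
    mean = sum(xs) / n
    # SD with N - 1 in the denominator (assumption: the usual t-test convention)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    sem = sd / math.sqrt(n)
    # difference between observed mean and comparison value, divided by the SEM
    return (mean - mu) / sem, n - 1

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, mean = 5
t, df = one_sample_t(scores, mu=3)  # compare the sample mean against 3
print(f"t({df}) = {t:.2f}")         # prints: t(7) = 2.65
```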
what is a one-sample t-test ?
the difference between the mean of observed data and a hypothesised comparison value, all divided by the standard error of the mean of the observed data
what is a t-value?
a test statistic - intended to provide a single number that tells us the extent to which the data sample matches the null hypothesis
what does a small t value suggest for a one sample t-test?
indicates that the SEM is much larger than the difference
this means we aren’t likely to be able to distinguish between the sample mean and the comparison value
what does a large t-value suggest for a one sample t-test?
indicates that the difference is larger than the SEM
this means that we are likely to be able to distinguish the sample mean from the comparison
when does the t-value grow for a one sample t-test?
when the difference between the observed data mean and comparison value gets bigger
this is because the top of the fraction gets larger whilst the bottom stays the same
when does the t-value shrink for a one sample t-test?
as the variance of the observed data gets bigger
this is because the bottom of the fraction gets larger whilst the top stays the same
what is apophenia?
the tendency to see meaningful connections between unrelated things
why do we assume the null to be true until shown otherwise?
this is proof by contradiction
it puts the burden of proof on the alternative hypothesis
give an example of a one-sample hypothesis and a one-sample null hypothesis
attendance in class is more than 80%
attendance in class is NO different from 80%
what does a t-test account for?
uncertainty in our estimate of the mean by using the standard error of the mean
what were Will, Merritt, Jenkins & Kingstone interested in when studying the Medusa effect?
whether pictures capture something of the mind that is significant to us, albeit at reduced potency
what does a two-sample hypothesis entail?
a test hypothesis that asks whether two groups have different means
what does the statistical null hypothesis state for a two sample t-test?
the sample means of the two groups cannot be distinguished
what is a between subjects design?
two independent groups of data points
each participant is in a single group and contributes a single data point
what is a within subjects design?
two dependent groups of data points
each participant completes two conditions and contributes two data points
sometimes called repeated measures
what is an independent samples t-test?
the difference between the two groups of data, all divided by the standard error of that difference
it is a ratio between the size of the difference and the precision to which it is estimated
what is the equation for an independent samples t-test?
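t(df) = (X̄1 - X̄2) / (Sp × √(2/N))
(mean of group 1 - mean of group 2) / pooled standard error of the difference, where Sp is the pooled standard deviation and N is the number of data points in each group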
how do you find the standard error of the difference?
by using the pooled standard deviation of the two groups
what is a pooled standard deviation?
a single deviation to represent the variability in both groups - assuming that both groups have the same variability
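A minimal Python sketch of an independent samples t-test with a pooled standard deviation, assuming two groups of equal size N (the case the Sp × √(2/N) form covers). The function names and data are made up for illustration.

```python
import math

def sum_sq(xs):
    """Sum of squared differences from the group mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_sd(a, b):
    """One SD for both groups, assuming both have the same underlying variability."""
    return math.sqrt((sum_sq(a) + sum_sq(b)) / (len(a) + len(b) - 2))

def independent_t(a, b):
    """Return (t, df) for two equally sized independent groups."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal group sizes")
    n = len(a)
    se_diff = pooled_sd(a, b) * math.sqrt(2 / n)  # standard error of the difference
    t = (sum(a) / n - sum(b) / n) / se_diff
    return t, 2 * n - 2

group1 = [4, 5, 6, 7, 8]  # made-up data, mean = 6
group2 = [1, 2, 3, 4, 5]  # made-up data, mean = 3
t, df = independent_t(group1, group2)
print(f"t({df}) = {t:.2f}")  # prints: t(8) = 3.00
```

Swapping the two groups flips the sign of t, which is why a negative value means group 1 sits below group 2.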
what does a large positive t-value indicate for an independent samples t-test?
the mean of group 1 is above the mean of group 2
what does a near zero t-value indicate?
the mean of group 1 is indistinguishable from the mean of group 2
what does a large negative t-value indicate for an independent samples t-test?
the mean of group 1 is below the mean of group 2
what does Levene’s test test for?
homogeneity of variance
what does Levene’s test assess?
assesses the null hypothesis that different groups of samples are from populations with equal variances
what does a significant value indicate for Levene’s test?
that the groups are likely to have different variances - suggesting that a pooled estimate of standard deviation is not appropriate
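A hedged sketch of running Levene's test in Python, assuming SciPy is installed. `scipy.stats.levene` returns a test statistic and a p-value (by default it centres on the group medians). Both datasets and the 0.05 threshold are assumptions chosen for illustration.

```python
from scipy import stats

# made-up data: similar centres, very different spreads
narrow = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
wide = [2.0, 18.5, 7.3, 14.9, 1.2, 19.6, 5.5, 11.0]

stat, p = stats.levene(narrow, wide)
print(f"Levene statistic={stat:.2f}, p={p:.4f}")

# a significant result suggests unequal variances,
# so a pooled standard deviation would not be appropriate
pooled_ok = p >= 0.05
print("pooled SD appropriate:", pooled_ok)
```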