5. Review of Descriptive statistics and hypothesis Flashcards
what are descriptive statistics?
statistics used simply to describe the data collected, whether it is sample or population data. They are a way to screen the data and observe trends
what are inferential statistics?
use sample statistics to infer something about a population
to test whether a difference/relationship seen in sample data is sufficiently large to accept it may be real in the population
allows us to test hypotheses and make decisions based on sample data
what do equations aim to do?
achieve specific things for specific purposes
what are equations made up of?
subcomponents all of which do something useful for achieving that purpose
what do equations produce?
numbers that are meaningful with respect to that purpose
what is the first thing we want to do when we imagine a set of data?
have a look at its distribution and we might want to think about how to characterise that distribution numerically
what are the characteristics of a data set?
central tendency, variability and shape
what is central tendency?
mean
median
mode
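As a rough illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module. The scores are hypothetical values invented for the example, not data from these notes.

```python
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [3, 5, 5, 6, 8, 9]

mean = statistics.mean(scores)      # arithmetic average: (sum of x) / n
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```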
what is variability?
sum of squares, variance, standard deviation, range, standard error
what is the normal distribution?
a function that represents the distribution of many random variables as a symmetrical bell-shaped graph.
what is modality (with regard to the shape of a distribution)?
the number of central clusters that a distribution possesses
what are the two types of modality?
unimodal and bimodal
unimodal
scores vary around one central point
bimodal
scores vary around two “central” points
what does kurtosis mean?
“Peakedness” - how tightly clustered are scores around the mean?
skew
the symmetry of the tails of the distribution
what are the characteristics of a “normal” curve?
distribution is unimodal
has moderate peakedness
and has symmetric tails.
what does Sigma designate?
“The sum of” - so simply add them up
what is the symbol for sigma?
∑
what does ∑x mean?
the sum of all values of x
what does the mean tell us?
something useful about the center of the data-set
what does the mean not tell us?
it does not tell us anything about the variability around the mean
what is the equation of the mean?
mean=M= (∑x)/n
what is a simple way we can calculate how each participant’s score varies with respect to the mean?
subtract the mean from each participant’s score.
X-M
how does subtracting the mean from each participant’s score characterise the data set as a whole?
by summing the deviations using sigma:
∑(X-M)
This will always sum to zero.
This is because we have subtracted the mean from each score that contributes to the mean, so the positive and negative deviations cancel out. The simple sum of deviations therefore tells us nothing about the variability around the mean.
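A minimal sketch of why ∑(X-M) is always zero, using hypothetical scores chosen so the arithmetic is exact. With less tidy data, floating-point rounding may leave a tiny non-zero remainder rather than exactly 0.

```python
# Hypothetical scores, invented for illustration.
scores = [2, 4, 6, 8, 10]

n = len(scores)
mean = sum(scores) / n                    # M = (sum of x) / n

# Deviation of each score from the mean: X - M
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so the sum is zero.
total_deviation = sum(deviations)

print(mean, deviations, total_deviation)
```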
what is ^2 (to the power of 2) also known as?
squared
what does X^2 designate?
X squared or X * X (X multiplied by itself)
why is using “square” handy?
because the square of negative numbers is positive
what is the abbreviation of sum of squared deviation?
SS
what is another way to say “Sum of squared deviation”
sum of squares
what is the equation for sum of squares or sum of squared deviations?
SS= ∑(X-M)^2
what does the sum of squares tell us?
it tells us something about the total variability in the data set, but does not really characterise the degree to which each participant varies around the mean
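The formula SS = ∑(X-M)^2 can be sketched as follows. The scores are hypothetical; squaring each deviation removes the sign, so the values no longer cancel.

```python
# Hypothetical scores, invented for illustration.
scores = [2, 4, 6, 8, 10]

mean = sum(scores) / len(scores)

# Square each deviation so negatives become positive, then add them up.
sum_of_squares = sum((x - mean) ** 2 for x in scores)

print(sum_of_squares)
```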
what is the abbreviation for variance?
SD^2
or
σ^2
how do we calculate the variance?
by dividing the sum of squares by the number of observations minus 1
That is:
σ^2= SS/(n-1)
what is the complete equation for variance?
σ^2 = ∑(X-M)^2 / (n-1)
what happens when you take the square root of the variance?
we can calculate the standard deviation
what is the abbreviation for standard deviation?
σ or SD
what is the complete equation for the standard deviation?
σ = √( SS / (n-1) )
what is the standard deviation?
the average amount of variability around the mean. This is useful as any information about the degree of variability around the mean is important
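As a sketch, the variance and standard deviation formulas above can be computed by hand and checked against the standard library, whose `statistics.variance` and `statistics.stdev` also divide by n-1. The scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 6, 8, 10]

n = len(scores)
mean = sum(scores) / n
sum_of_squares = sum((x - mean) ** 2 for x in scores)

variance = sum_of_squares / (n - 1)   # SS / (n - 1)
sd = math.sqrt(variance)              # square root of the variance

# The library versions use the same n - 1 denominator.
print(variance, statistics.variance(scores))
print(sd, statistics.stdev(scores))
```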
what is the degrees of freedom?
the number of values in the final calculations of a statistic that are free to vary
what is the abbreviation of degrees of freedom?
df
what is the initial degrees of freedom equal to?
the number of observations
what is the abbreviation for the number of observations?
N
what is the equation for degrees of freedom when testing variability?
N-1
why do we subtract 1 from N when calculating variability (standard deviation) using degrees of freedom?
because when calculating the SD you first have to calculate the mean. In doing so, you use up one of your degrees of freedom. Therefore the df that remains for calculating the SD is N-1
what does using a degree of freedom where N-1 allow?
more accurate estimate of population parameters, which is what we want to do since we want to make inferences
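One way to see why N-1 gives a better estimate of the population value is a small simulation. This is only a sketch under assumed settings: a normal population with SD 10 (true variance 100), samples of 5, and a fixed random seed, none of which come from these notes.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SD = 10        # assumed population SD, so the true variance is 100
SAMPLE_SIZE = 5     # assumed (small) sample size
N_SAMPLES = 10_000  # number of simulated samples

def sum_of_squares(sample):
    """Sum of squared deviations around the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(0, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    ss = sum_of_squares(sample)
    divide_by_n.append(ss / SAMPLE_SIZE)
    divide_by_n_minus_1.append(ss / (SAMPLE_SIZE - 1))

# Average estimate across all simulated samples.
avg_biased = sum(divide_by_n) / N_SAMPLES
avg_unbiased = sum(divide_by_n_minus_1) / N_SAMPLES

print(avg_biased, avg_unbiased)
```

Under these assumptions, dividing by N tends to underestimate the population variance, while dividing by N-1 lands close to the true value of 100 on average.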
what is the usual chosen measure of central tendency?
the mean
what does the chosen measure of central tendency provide?
provides an estimate of the level of performance in each condition
what is the usual measure of variability?
standard deviation
what does the measure of variability tell us?
how reliable the estimate is
What can outliers or extreme scores do?
affect both the measures of central tendency (especially the mean) and variability
what is the golden rule when measuring central tendency?
the measure of central tendency cannot be accurately interpreted without an accompanying measure of variability
finish the sentence:
Depending on the characteristics of the distribution the measure of central tendency may…
not be a good indicator of how the subjects performed
what is the measure of central tendency an estimate of?
effect size
what is the measure of variability an estimate of?
error
what is the general form that most of the statistical tests can use?
Stat = (Estimate of Effect Size) / (Estimate of Error)
This is known as the stat value
how do we do inferential statistics?
compare the stat value against an appropriate probability distribution
what can we infer if the stat value is sufficiently far from the center of the probability distribution?
that the stat value is significantly different from the mean
what does it mean to be significantly different or significantly far?
when p is less than the chosen significance level
when do we decide our p value (or significance level)?
before we do our statistical test
what is the central tendency characteristic of a normal distribution?
mean = median = mode
what is the standard normal distribution?
Mean = 0
SD = 1
where every score or point on the distribution is associated with a probability of how often that score arises
what is the Z score?
a standardised score: it tells us how many SDs away from the M a particular score is
how can we calculate the z score?
if we know the mean (M) and the standard deviation (SD) of our data set, we can convert any score (X) to a Z-score simply by subtracting the mean and then scaling (i.e. dividing) by the SD
what is the equation for a z score?
Z= (X-M) / SD
how does one find the probability associated with a Z score?
in the Z table at the back of any leading stats book, or using an online Z calculator
How do we know if a score is an outlier
if the Z score is > 3
what do we do with a Z > 3
we would exclude these scores from further analysis
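A minimal sketch of converting scores to Z scores with Z = (X-M) / SD and flagging outliers. The scores are hypothetical, and using the absolute value (so that scores more than 3 SDs below the mean are also flagged) is an assumption of this example rather than something stated in these notes.

```python
import statistics

# Hypothetical scores, invented for illustration; the last one is extreme.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 50, 50, 95]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD, dividing by n - 1

def z_score(x, mean, sd):
    """Number of SDs that score x lies from the mean."""
    return (x - mean) / sd

z_scores = [z_score(x, mean, sd) for x in scores]

# Assumption: treat |Z| > 3 as an outlier in either direction.
outliers = [x for x, z in zip(scores, z_scores) if abs(z) > 3]

print(outliers)
```

Note that with very small samples no score can reach Z > 3 when the SD is computed from the same data, which is why the example uses 20 scores.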
what is an appropriate estimate of effect size?
the difference between an individual’s score and the mean of the distribution of the group of (individual’s) scores
This is appropriate because we are treating the group as the population of interest
what is an appropriate estimation of error?
the SD of the distribution of the group of (individual) scores
this is appropriate because we are essentially treating the group as the population of interest
what is finding a z score a case of?
hypothesis testing
what is the Z test asking?
does this particular individual belong to or differ from a particular population (of which we know the mean and SD).
more generally, we are asking questions about a group of people, where the population mean and the SD may be unknown
what do we need to do if we want to compare the mean of a group of people’s scores?
This is normally the case in an experiment
we need to compare this against a distribution of group mean scores
what is another way to say a distribution of group mean scores?
a distribution of means
the larger the set of means…
the smaller the variability
what is the comparison between a distribution of sampling means and a distribution of any given sample?
it has a much lower variability. The variability shrinks in proportion to the square root of the number of observations (it is inversely proportional to √n)
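The lower variability of a distribution of sample means can be sketched with a simulation. The population values (mean 100, SD 10), the sample size of 25 and the seed are assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 100, 10  # assumed population parameters
SAMPLE_SIZE = 25            # assumed observations per sample
N_SAMPLES = 5_000           # number of simulated samples

sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(sum(sample) / SAMPLE_SIZE)

# Variability of the distribution of sample means...
sd_of_means = statistics.stdev(sample_means)
# ...compared with the population SD divided by the square root of n.
predicted = POP_SD / SAMPLE_SIZE ** 0.5

print(sd_of_means, predicted)
```

Under these assumptions the SD of the simulated means comes out close to 10 / √25 = 2, far smaller than the SD of 10 for individual scores.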
what should we do if we want to test a sample mean?
compare it to a distribution of sample means. But we do not need a whole bunch of sample means to form a distribution to test our particular sample mean of interest against
what does the behaviour of a normal distribution allow us to do?
to make an estimate of the variance (error) of the distribution of sample means
What is the standard error of the mean?
is the sample SD divided by the square root of the number of observations in the sample
what is the abbreviation of the standard error of the mean?
S_M (S with a subscript M)
what is the equation of the standard error of the mean?
S_M= σ / √n
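As a sketch, the standard error of the mean can be computed from a sample as the sample SD divided by √n. The helper name `standard_error` and the scores are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical scores, invented for illustration.
scores = [2, 4, 6, 8, 10]

s_m = standard_error(scores)
print(s_m)
```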
If we have a known population mean ( μ =100) and standard deviation (SD =10), we can determine whether a sample mean (M=104.75, n=20) is “significantly” different to the population. How can we do this?
using the Z equation.
Z= (M-μ) / S_M
= (104.75 – 100) / (10 / √20)
= 4.75 / 2.24
= 2.12
As Z = 2.12 is inside the critical region (below -1.96 or above 1.96), we can reject the null hypothesis and say there is a significant difference
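The worked example above can be reproduced as a short sketch. The function name `z_test` is hypothetical, and the two-tailed significance level of .05 is an assumption that matches the ±1.96 critical values.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Z = (M - mu) / S_M, with S_M = sigma / sqrt(n)."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-tailed probability from the standard normal distribution.
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

z, p = z_test(sample_mean=104.75, pop_mean=100, pop_sd=10, n=20)

ALPHA = 0.05  # assumed significance level, decided before testing
print(round(z, 2), p < ALPHA)  # prints: 2.12 True
```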
what is the z equation for determining whether a sample mean is significantly different to the population?
Z= (M-μ) / S_M
when would you use a one sample t-test?
sometimes the population parameters are not known. Where the population mean is known but the SD is not, we can use a one sample t-test
what is the equation for a one sample t-test?
t = (M-μ) / ( S / √n )
what is S in the one sample t-test equation?
S = estimated population standard deviation
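A minimal sketch of the one sample t-test equation, t = (M-μ) / (S / √n), where S is estimated from the sample with n-1. The function name, the scores and the population mean of 100 are hypothetical. The resulting t would still need to be compared against a t table with df = n-1, which this sketch does not do.

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    """Return (t, df) for a one sample t-test."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)     # estimated population SD (n - 1)
    standard_error = s / math.sqrt(n)
    t = (m - pop_mean) / standard_error
    return t, n - 1

# Hypothetical scores, invented for illustration.
scores = [104, 98, 110, 105, 99, 107, 102, 111]

t, df = one_sample_t(scores, pop_mean=100)
print(round(t, 3), df)
```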
what is not used when estimating the population distribution?
the standard normal distribution (since one or more population parameter is unknown)
What is used instead of the standard normal distribution when estimating the population distribution?
a special family of distributions called t distributions
what are t distributions?
approximations of the Z distribution, which change shape according to the size of the degrees of freedom
why do t distributions change shape according to the size of the degrees of freedom?
because the larger our sample, the more accurate our sample statistics estimate the population parameters
what is the degrees of freedom?
the number of observations (N) minus the number of estimates made (e.g. the mean)
what do we need when using a table of the t distribution?
we need to know the df and specify whether we want a one-tailed or two-tailed test at the chosen significance level
what does a larger degrees of freedom do to a t distribution?
makes the distribution taller; a smaller df makes it flatter
how do t distributions and normal distributions differ?
they are similar, but the t distribution is slightly different for each sample size. It gets closer to normal as the sample size increases.
why is there more error involved in a t distribution?
because we have estimated the population variance, so slightly more of the distribution lies in the tails
what does a smaller sample size of a t distribution indicate?
the smaller the df, the larger the critical t value that must be exceeded
what are the types of t tests
single sample t test
repeated measures t test
what is the equation for a single sample t test?
t= (M-μ) / S_M
what is the equation for a repeated measures t test?
same as single sample (t= (M-μ) / S_M ) but calculated from difference scores not raw scores.
Remember µ = 0 under H0 (no difference)
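A sketch of the repeated measures t-test: the same single sample equation, applied to difference scores with µ = 0. The before and after scores and the function name are hypothetical.

```python
import math
import statistics

def repeated_measures_t(before, after):
    """t = (M_D - 0) / S_M, computed from difference scores."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    t = (mean_diff - 0) / standard_error  # mu = 0 under H0
    return t, n - 1

# Hypothetical paired scores from the same six participants.
before = [10, 12, 9, 14, 11, 13]
after = [12, 15, 10, 15, 14, 15]

t, df = repeated_measures_t(before, after)
print(round(t, 3), df)
```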
when dealing with the difference between means, what is something we need?
a corresponding distribution and error term
what is the equation of independent groups design t test?
t = (M_1 - M_2) / S_Diff
what is the equation for S_Diff?
S_Diff = √(S^2_M1 + S^2_M2)
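The independent groups equations can be sketched as below, taking each S^2_M as that group's variance divided by its n (the squared standard error of its mean). The group scores and the function name are hypothetical, and the sketch stops at the t value without looking up df or a critical value.

```python
import math
import statistics

def independent_groups_t(group1, group2):
    """t = (M_1 - M_2) / S_Diff, with S_Diff = sqrt(S^2_M1 + S^2_M2)."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # Squared standard error of each mean: variance / n
    s2_m1 = statistics.variance(group1) / len(group1)
    s2_m2 = statistics.variance(group2) / len(group2)
    s_diff = math.sqrt(s2_m1 + s2_m2)
    return (m1 - m2) / s_diff

# Hypothetical scores for two separate groups of participants.
group1 = [5, 7, 6, 8, 9]
group2 = [3, 4, 5, 4, 4]

t = independent_groups_t(group1, group2)
print(round(t, 3))
```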
how do you calculate effect size for a z distribution?
individual score - sample mean
what is the error of a z distribution?
sample standard deviation
what is the general statistic form?
Stat = (Estimate of effect size) / (Estimate of error)
what is the equation for testing an individual against a sample?
Z=(X-M)/SD
what is the equation for testing a sample against a known population (where the population SD is unknown)?
t= (M-μ) /(S/√n)
what is the equation for testing a sample where population parameters are known?
Z= (M-μ) / (σ/√n)
what is the equation for testing two samples against each other?
t= (M_1-M_2) / S_Diff
what is the question of error and statistical significance?
Is the difference we see sufficiently large given the amount of associated error? Is it more likely to be an effect of the IV or just sampling error?
what are the three assumptions to be made when making a statistical test?
- all observations are independent
- Distributions are normally distributed
- Variance of one group is not too much larger than the other
what is the assumption that all observations are independent?
usually a methodological question. Ensure no one person’s performance is affected by or affects someone else’s
what is the assumption that distributions are normal?
- check histograms of both groups, plus skewness and kurtosis
- if sample N > 30, the shape of the sample distribution is less important, as the theoretical distribution of the difference between the means will be approximately normal
what is the assumption that variance of one group is not too much larger than the other (homogeneity of variance)?
- If checking manually, a largest variance more than 4 times the smallest variance is problematic
- SPSS checks this automatically using Levene’s test
- Breaches of the homogeneity assumption can inflate Type 1 error
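The manual rule of thumb for homogeneity of variance (largest variance more than 4 times the smallest) can be sketched as follows. The function name and group scores are hypothetical, and this is only the manual ratio check, not Levene’s test.

```python
import statistics

def variance_ratio(*groups):
    """Ratio of the largest to the smallest group variance."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical scores for two groups.
group1 = [5, 7, 6, 8, 9]
group2 = [3, 4, 5, 4, 4]

ratio = variance_ratio(group1, group2)
problematic = ratio > 4  # rule of thumb from the manual check

print(ratio, problematic)
```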
what does a statistically significant result not prove?
that the IV caused the DV
what does causation and interpretation of results depend heavily on?
the nature and integrity of the research design
what does statistical significance indicate?
that the result seen is highly unlikely to have happened by chance alone.