Midterm Flashcards
Descriptive statistics
just a way to describe the data - charts and graphs
Inferential statistics
allows you to make predictions from the data
- raw data put through statistical tests to come up with a conclusion about a population
- allows generalizations from a sample group
nominal level of measurement
named categories
-yes or no, race, gender, country, ethnicity, hair color
ordinal level of measurement
categorical data that’s ranked or ordered - has innate order
example: good, fair, poor; strongly agree to strongly disagree
* Weight could also be ordinal like boxes with >50kg, <50kg, etc.
interval level of measurement
equal distance b/w each value
Difference b/w 30 and 35 degrees Celsius is the same as the difference b/w 40 and 45 degrees Celsius
ratio level of measurement
same attributes as interval measurement, but has absolute 0 and no negative values
*example: length in cm - can’t have negative cm measurement
Interval and ratio can be difficult to distinguish b/w (she will not make us do this)
Weight could be ordinal and ratio
Data classified as discrete
Can take on one value out of a limited number of options
• Number of kids (1,2,3, etc. but can’t have 1.2)
• Heart rate, number of pregnancies, number of hospital admissions, number of students in a class, shoe sizes, number of questions answered correctly
Dichotomous – a specific discrete variable where there are only two values
• Gender (M or F) - limited number of options (only 2 options)
Under 65 or over 65 would also be dichotomous, discrete, and ordinal…it wouldn’t be interval or ratio because there is not an equal distance b/w the two categories
Yes or no (dichotomous, nominal)
*all dichotomous data is nominal?
Data classified as continuous
Can take on any number w/in a range
Only limited by precision of measurement tool used
Look at height – a tool with only cm marks gives 158 cm or 159 cm, but height really could be measured more precisely
o Some tests you can only use discrete or continuous – be able to pick this out in a study
*ex. a stadiometer can measure height to the 1/10 or the 1/100 of a cm
central tendency
measured with what level of data?
gives you typical value (average), and three ways you can determine this are mean, median, and mode. Gives us a point value, one number that represents the whole data set.
Measured with interval and ratio level data
Mean can only be calculated with?
mean is the average - doesn’t work with categories, and can only be calculated with interval or ratio level data
Median can be calculated with?
the middle value of the ordered data set
calculated with ordinal, interval, ratio
Mode can be calculated with?
any type of data
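A quick sketch of all three using Python’s statistics module (the scores list here is made up for illustration):

```python
import statistics

# hypothetical interval/ratio data set
scores = [1, 2, 2, 3, 10]

mean = statistics.mean(scores)      # needs interval or ratio data
median = statistics.median(scores)  # needs at least ordinal data
mode = statistics.mode(scores)      # works with any level, even nominal

print(mean, median, mode)  # 3.6 2 2
```

Note how the one extreme value (10) pulls the mean up to 3.6 while the median and mode stay at 2 - one reason a single point value may not represent the whole data set well.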
Dispersion/variability
how closely the numbers cluster around the mean, median, and mode
- the range and spread of the data from the center
Gives information about the spread of scores and indicates how well a measure of central tendency represents the “middle/average” value in the data set
So, you will often see a mean reported and then a standard deviation (so this is using central tendency & dispersion).
Variance, Standard deviation - Under DISPERSION
What level of data do you typically see these with?
What does a small SD tell you?
Variance and standard deviation are typically seen with interval or ratio level data. Don’t need to be able to calculate a standard deviation - just be able to look at one and figure out what it is telling us. So, if an article reports a really small standard deviation, that tells you all the data points cluster closely together around the mean.
Variance and standard deviation are related. The square root of variance is standard deviation.
Variance
the average squared difference b/w the data values and the mean of the data set
*the average degree to which each point differs from the mean
SD
standard deviation = the avg amount that data values will vary from the mean - how closely values are clustered
*example: if SD is small, the data values are close together and variance is small; if SD is large, there is more variability in the data so it’s more spread out
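A minimal sketch showing the square-root relationship b/w variance and SD (the data set is made up; pvariance/pstdev are the population versions):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set, mean = 5

pvar = statistics.pvariance(data)  # average squared difference from the mean
psd = statistics.pstdev(data)      # square root of the variance

print(pvar, psd)
```

Here the variance is 4 and the SD is 2 - taking the square root puts the SD back in the same units as the original data.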
Range
Very simple measure of dispersion
Calculate by subtracting the minimum value in the data set from the maximum value = range. The smaller the range, the more clustered and less variable the data set is.
Ex: 9,3,2,6,7,8,7,5 so 9-2 = 7 = RANGE
Interquartile range
difference b/w the 75th percentile and the 25th percentile
ex. 1,1,2,2,2,3,3,3,4,4,5 - the median is 3, the 25th percentile is 2 and the 75th percentile is 4, so 4-2 = 2 is the interquartile range
The range and interquartile range provide a rough estimate of the variability of a data set but don’t use all of the data values
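Both of the worked examples from the cards above can be checked with a short Python sketch:

```python
import statistics

# range, using the data set from the range card
data = [9, 3, 2, 6, 7, 8, 7, 5]
value_range = max(data) - min(data)
print(value_range)  # 7

# interquartile range, using the IQR card's data
data2 = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5]
q1, q2, q3 = statistics.quantiles(data2, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1
print(iqr)  # 2.0
```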
Frequency distribution - kurtosis of curve: leptokurtic
thin - peaked curve
shows what continuous data looks like
Frequency distribution - kurtosis of curve: mesokurtic
more normal curve
*shows what continuous data looks like
Frequency distribution - kurtosis of curve: platykurtic
flat curve
*shows what continuous data looks like
Describe the normal attributes of a normal distribution - bell curve
frequency distribution of data in which the data values are equally distributed around the center of the data point; normal bell curve; mean, median, and mode equal; symmetrical-not skewed
68% - within 1 SD of mean
95% - within 2 SD of mean
99.7% - within 3 SD of mean
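The 68/95/99.7 rule can be checked by simulation - a sketch, where the sample size and seed are arbitrary choices:

```python
import random

# simulate draws from a normal distribution with mean 0 and SD 1
random.seed(0)
sample = [random.gauss(0, 1) for _ in range(100_000)]

def within(k):
    """Fraction of the sample within k SDs of the mean."""
    return sum(abs(x) <= k for x in sample) / len(sample)

print(within(1), within(2), within(3))  # roughly 0.68, 0.95, 0.997
```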
Kurtosis
measure of how peaked/flat a distribution is
Skewness
measure of whether the set is symmetrical or off center
Distribution is said to be normal when both measures of skewness and kurtosis fall b/w
-1 and 1
and, not normal if fall below -1 or above 1
What does the level of significance mean?
When does researcher determine this value? What type of error is this?
researcher determines level before collecting data
% of the time that researcher will conclude that there is a statistically significant difference b/w groups or relationship b/w variables when there truly isn’t
Type I error
- aka the probability of incorrectly rejecting the null hypothesis
- level of significance (a)
type II error
probability of concluding there is no difference b/w groups or NO relationship b/w variables when there truly is – probability of accepting the null when it is not true
*fail to reject null hypothesis by error
P-value definition
probability that the difference, or one larger found, could arise by chance
*the difference b/w the two groups was statistically significant (p= 0.02) since it was less than 0.05 – reject the null hypothesis b/c there is a difference
If the p value is less than level of sig…
reject the null b/c there IS a difference
If the p value is greater than level of sig…
fail to reject null (accept the null) b/c there is NO difference
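The two decision cards above boil down to one comparison - a minimal sketch (alpha = 0.05 is just the usual choice of level of significance):

```python
def decide(p_value, alpha=0.05):
    """Compare p-value to the level of significance chosen beforehand."""
    if p_value < alpha:
        return "reject the null - there IS a difference"
    return "fail to reject the null - NO difference"

print(decide(0.02))  # reject the null - there IS a difference
print(decide(0.30))  # fail to reject the null - NO difference
```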
When is an independent t-test appropriate to use? What assumptions must be met?
when you are comparing the means of two independent groups of subjects
Assumptions: independence, normality, homogeneity of variances, DV level of measurement interval or ratio
In an SPSS output what does N stand for?
the sample size
With Levene’s test for equality of variances - looking at homogeneity of variance…what do we want? Where is the P value for this test listed?
Want to fail to reject the null hypothesis, so you want P value to be greater than 0.05
Listed under “SIG”
Remember that if homogeneity of variance is not met, you can’t use a t-test
For t-test always use the line equal variances assumed*
Use what formula to report t-test findings
t(df) = t-statistic, p-value
t(22) = 2.225, p = 0.037
Degrees of freedom - df = what?
you would add up your total number of people (12 + 12 = 24) and then subtract the # of groups (2), so 24 - 2 = 22 is your df
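The notes say you don’t have to calculate a t-test by hand, but a sketch of where t and df come from may help - this is the pooled-variance ("equal variances assumed") version, and the two groups here are made-up data:

```python
import math
import statistics

def independent_t(group1, group2):
    """Independent t-test with equal variances assumed (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2  # total subjects minus # of groups
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df

# hypothetical scores for two independent groups
t, df = independent_t([1, 2, 3], [2, 4, 6])
print(round(t, 3), df)  # -1.549 4
```

Reported in the formula from the card above, this would read t(4) = -1.549.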
M & SD come from which chart t-test?
group statistics chart
when is a paired t-test appropriate to use? what is a paired t test also called?
when you want to compare the means of two paired groups of subjects
ex. pre/post test, twin studies, husband/wives
you can’t assume independence b/c not two independent groups - somehow groups are connected*
don’t have to worry about homogeneity of variance for this reason
also called a t test for dependent groups**
assumptions needed for t-test for paired data?
normality, DV (DEPENDENT VARIABLE) level of measurement interval/ratio
The steps of hypothesis testing - 6 steps
1- develop null and research hypothesis
2- choose level of sig
3- determine which statistical test is appropriate
4- run analysis to obtain test statistic and p value
5- make decision about rejecting or failing to reject the null hypothesis
6- make a conclusion
T test for paired data hypothesis - there will be a difference in IQ b/w the preschool and home group. Is this directional or non-directional?
non-directional, so 2 tailed test
t - test results for paired data - here you can find
mean difference in scores b/w preschool and no preschool…see slide page 3, mod 2 session 4
df for paired t test =
df = total # of pairs -1
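A sketch of the paired version - the test is run on the differences within each pair, which is why df = pairs - 1 (the pre/post scores here are made up):

```python
import math
import statistics

def paired_t(pre, post):
    """t-test for paired (dependent) data, computed on the differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)  # number of pairs
    df = n - 1      # total # of pairs minus 1
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, df

# hypothetical pre/post scores for the same three subjects
t, df = paired_t([10, 12, 14], [12, 13, 16])
print(round(t, 2), df)  # 5.0 2
```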
When is an ANOVA appropriate to use?
What assumptions must be met?
when comparing the means of 3 or more independent groups of subjects
*same as independent t test:
independence of groups, normality of data, homogeneity of variances, DV level of measurement must be interval/ratio
ex. 1 group - printed d/c instructions only; 1 group - verbal d/c instructions only; 1 group - printed and verbal
After ANOVA if significance is found (reject the null) then what has to be done?
post hoc testing
ANOVA degrees of freedom
n=? N=? k =?
k = # of groups
n = number in each group, but ONLY if all group sizes are equal and if they’re not you use big N which is the total sample size
N = total sample size
df(between) =
k -1
df(within) =
nk-k or N-k
df(total) =
nk-1 or N-1
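The df formulas and the F-ratio (between-groups variation over within-groups variation) can be sketched together - the three groups below are made-up data:

```python
import statistics

def one_way_anova(groups):
    """One-way ANOVA: F = variation between group means / variation within groups."""
    k = len(groups)                  # number of groups
    N = sum(len(g) for g in groups)  # total sample size
    grand_mean = statistics.mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2
                    for g in groups for x in g)
    df_between, df_within = k - 1, N - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# hypothetical scores for three independent groups
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(df_b, df_w, round(f, 3))  # 2 6 13.0
```

Reported in the card’s format, this would read F(2,6) = 13.0; total df + 1 = 8 + 1 = 9 recovers the total number of people.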
Show results of ANOVA as
F (between, within) = F#, p-value
F(2,51) = 13.630, p < 0.001 (SPSS may print p = .000, but you can’t have p = 0.000, so report it as p < 0.001)
F in ANOVA is
the F-statistic is this ratio: F = variation between sample means / variation within the samples.
In ANOVA how can you find the total number of people in the study
total df + 1
Post hoc test
if you find significance in ANOVA then must do post hoc to compare all the groups for significance to see which one is different
when do you use a two-way ANOVA?
compare effects of more than one independent variable on a dependent variable
two-way ANOVA assumptions
1-independence
2-normality
3-homogeneity of variance
4-DV interval/ratio
Two-way ANOVA looks for what effects?
- what is the main effect of IV A on the DV?
- what is the main effect of IV B on the DV?
- SO, post hoc testing would be used if we found a statistical sig when examining these two variables…
So, interaction effect…
-what is the interaction effect of IV A and IV B on the DV?
Two-way ANOVA test b/w subjects interpreted as?
F(df effect, df error) = F-statistic, p-value