2.4 - A Statistical Primer Flashcards
Descriptive Statistics
a set of techniques used to organize, summarize, and interpret data
Statistics used to describe and understand the data:
Frequency, central tendency, variability
Data Distribution
1) whether some scores occurred more often than others
2) whether all the scores were clumped in the middle or more evenly spaced across the whole range
Histogram
a bar graph of a frequency distribution
*horizontal axis shows the scores (or ranges of scores); vertical axis shows the frequency
Frequency
the number of observations that fall within a certain category or range of scores
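A minimal Python sketch of turning raw scores into a frequency table and a rough text histogram. The scores and the bin width of 10 are hypothetical, and `frequency_table` is an illustrative helper, not a standard function. The bars are printed sideways, so the bar length plays the role of the vertical (frequency) axis.

```python
from collections import Counter

def frequency_table(scores, bin_width):
    """Count how many scores fall into each range [start, start + bin_width)."""
    # Map each score to the lower edge of its bin, then count observations per bin.
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))

# Hypothetical quiz scores (out of 100), for illustration only.
scores = [52, 58, 61, 64, 67, 68, 71, 73, 74, 75, 77, 79, 82, 85, 91]

table = frequency_table(scores, bin_width=10)
for start, count in table.items():
    # Label assumes integer scores; bar length = frequency of that range.
    print(f"{start}-{start + 9}: {'#' * count} ({count})")
```

Here the 70-79 range has the highest frequency (6 observations), so it would be the tallest bar in a histogram.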
Normal Distribution (Bell Curve)
a symmetrical distribution with values clustered around a central, mean value
Negatively Skewed Distribution
a distribution in which the curve has an extended tail to the left of the cluster
Positively Skewed Distribution
a distribution in which the curve has an extended tail to the right of the cluster
Why do skews occur?
there is an upper or lower limit to the data
(ex. a person cannot take less than 0 minutes on a quiz, so the curve of quiz times cannot continue to the left beyond the 0 point)
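One common way to put a number on skew is the moment coefficient of skewness, m3 / m2^1.5, where m2 and m3 are the average squared and cubed distances from the mean. This formula is an assumption here (the flashcards do not name one), and the quiz times below are hypothetical. A positive value means a longer right tail; a negative value means a longer left tail.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5

# Hypothetical quiz times in minutes: nobody can go below 0,
# but a few slow students stretch out the right tail.
quiz_times = [4, 5, 5, 6, 6, 6, 7, 7, 8, 12, 18]
print(round(skewness(quiz_times), 2))  # positive: positively skewed
```

Cubing the deviations keeps their sign, so a few far-right scores dominate the sum and push the coefficient above zero.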
Central Tendency
a measure of the central point of a distribution
*usually measured by the mean, but there are exceptions
Three different measures of Central Tendency
mean, median, mode
Mean
the arithmetic average of a set of numbers
ex. class averages
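The mean written as a formula, for n scores x_1, ..., x_n (the three scores in the worked example are made up for illustration):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{70 + 80 + 90}{3} = \frac{240}{3} = 80
```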
Median
the 50th percentile - the point on the horizontal axis at which 50% of all observations are lower, and 50% of all observations are higher
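A small Python sketch of finding the median by hand: sort the scores and take the middle one, or average the two middle ones when the count is even. The data are hypothetical; `statistics.median` from the standard library is used only as a cross-check.

```python
import statistics

def median(values):
    """Middle value of the sorted data; with an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5, 3, 9]   # sorted: 1 3 5 7 9 -> median 5
even = [4, 10, 2, 8]    # sorted: 2 4 8 10 -> median (4 + 8) / 2 = 6.0
print(median(odd), median(even))
print(statistics.median(odd), statistics.median(even))  # standard library agrees
```

In both cases, half of the observations lie below the median and half lie above it.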
Mode
the category with the highest frequency (category w/ most observations)
*measure that is used least
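The mode is the natural choice for categorical data, where a mean or median cannot be computed. A minimal sketch with hypothetical ballots (candidate names are invented), using `collections.Counter` and the standard-library `statistics` module:

```python
from collections import Counter
import statistics

# Hypothetical ballots: the data are categories (names), not numbers.
ballots = ["Lee", "Patel", "Lee", "Nguyen", "Patel", "Lee"]

counts = Counter(ballots)
print(counts.most_common())      # frequency of each category, highest first
print(statistics.mode(ballots))  # category with the highest frequency

# With a tie there is more than one mode; multimode returns all of them.
print(statistics.multimode(["A", "B", "A", "B"]))
```

Note how little the mode reveals: it names the winning category but says nothing about how close the other categories were.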
Which measure of central tendency should be used for which kind of data?
Normally distributed data (mean, median, and mode are equal) - mean
Categorical data - mode (the measure used least; provides less info than the other two) (ex. when you vote for a candidate, the mode = the candidate w/ the most votes)
Skewed data (positively/negatively) - median (extreme values have a large effect on the mean but barely affect the median)
*the longer the tail, the more the mean is pulled away from the centre of the curve
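A quick Python check of why the median is preferred for skewed data. The salaries (in thousands) are hypothetical: most cluster around 45 to 55, with two very high earners forming a long right tail.

```python
import statistics

# Hypothetical salaries in thousands: a tight cluster plus two extreme values.
salaries = [42, 45, 47, 48, 50, 51, 52, 55, 150, 300]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
print(f"mean = {mean}, median = {median}")  # the mean sits far above the cluster

# Make the top earner even richer: the mean jumps, the median does not move.
salaries_more_extreme = salaries[:-1] + [600]
print(statistics.mean(salaries_more_extreme), statistics.median(salaries_more_extreme))
```

The median only depends on the middle of the sorted data, so stretching the tail changes the mean but leaves the median where it was.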
Variability
the degree to which scores are dispersed in a distribution
(some distributions are spread out, others are clustered)
Higher Variability = a larger # of cases closer to the extreme ends of the continuum for that set of data
(ex. lots of excellent AND poor students in one class)
Lower Variability = most scores are similar
(ex. class filled with all “B” students)
*can be caused by measurement errors, imperfect measurement tools, differences between participants in the study, characteristics of participants on that given day (ex. mood, fatigue levels)
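A minimal sketch comparing variability in two hypothetical classes that share the same mean (75) but differ in spread. It uses the range (highest minus lowest score), the simplest spread measure; the class data and the `score_range` helper are invented for illustration.

```python
def score_range(scores):
    """Simplest spread measure: distance between the highest and lowest score."""
    return max(scores) - min(scores)

# Hypothetical classes, both with a mean of 75.
mixed_class = [50, 55, 60, 90, 95, 100]  # many excellent AND poor students
b_class = [72, 74, 75, 75, 76, 78]       # most scores are similar

for name, scores in (("mixed", mixed_class), ("B", b_class)):
    mean = sum(scores) / len(scores)
    print(name, mean, score_range(scores))
```

Same central tendency, very different variability: this is why a single average never fully describes a distribution.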
Standard Deviation
a measure of variability around the mean (estimate of the average distance from the mean)
*links central tendency and variability
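A Python sketch of the standard deviation: take each score's distance from the mean, square it, average the squares (the variance), then take the square root. This is the population form (divide by n); the sample form divides by n - 1. The scores are hypothetical, and the standard-library functions are shown only for comparison.

```python
import math
import statistics

def population_sd(scores):
    """Square root of the average squared distance from the mean."""
    mean = sum(scores) / len(scores)
    variance = sum((x - mean) ** 2 for x in scores) / len(scores)
    return math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mean = 5
print(population_sd(scores))       # 2.0
print(statistics.pstdev(scores))   # population SD from the standard library
print(statistics.stdev(scores))    # sample SD (divides by n - 1), slightly larger
```

Because it is built from distances to the mean, the standard deviation is exactly where central tendency and variability meet.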
The ______ always marks the 50th percentile of the distribution.
median
The ______ is a measure of variability around the mean of a distribution.
standard deviation
A histogram is created that presents data on the number of mistakes made on a memory test by participants in a research study. The vertical axis indicates?
the frequency: how many participants made each number of errors (the horizontal axis shows the number of errors)
In a survey of recent graduates, your university reports that the salaries of its former students are positively skewed. What are the consequences of choosing the mean rather than the median or the mode in this case?
The mean is likely to provide a number that is higher than the largest cluster of scores
Hypothesis Test
a statistical method of evaluating whether differences among groups are meaningful, or could have been arrived at by chance alone
Statistical Significance
the means of the groups are farther apart than you would expect them to be by random chance alone
- proposed by Ronald Fisher
- not used for limited numbers of potential participants
Null Hypothesis & Experimental Hypothesis
Null = any differences between groups (or conditions) are due to chance
Experimental = assumes that any differences are due to a variable controlled by the experimenter
P-value
the probability of getting results at least as extreme as those observed if the null hypothesis is true (i.e., if chance alone were at work)
lower p-value = less likely that the results were a fluke, and therefore more confidence that the difference reflects a real effect
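One way to see what a p-value means is a permutation test, a swapped-in technique the flashcards do not mention. If the null hypothesis is true, the group labels are arbitrary, so we shuffle the scores between groups many times and count how often chance alone produces a mean difference at least as large as the one observed. The scores, the number of shuffles, and the seed are all assumptions for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Estimates how often a mean difference at least as large as the observed one
    appears when scores are assigned to groups by chance alone (the null hypothesis).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        a, b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical scores for an experimental and a control group.
experimental = [55, 58, 60, 61, 63, 65, 66, 68]
control = [50, 52, 53, 55, 56, 57, 59, 60]
print(permutation_p_value(experimental, control))
```

A small result means shuffled (chance-only) groups rarely differ this much, which is evidence against the null hypothesis.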
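Ronald Fisher
- introduced significance testing (rejecting the null hypothesis when results are unlikely to be due to chance)
- proposed a p-value cut-off point: results with a p-value below it are considered significant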
Paul Meehl
- rejected significance testing
- more tests = more chance of a fluke, so the p-value standard must be lowered as the # of tests increases
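A sketch of the multiple-testing problem. If every null hypothesis is true and the tests are independent (an assumption), the chance of at least one "significant" fluke is 1 - (1 - alpha)^m for m tests. The Bonferroni correction, which divides alpha by m, is one standard way to lower the p-value standard as tests increase; it is a swapped-in technique, not something the flashcards attribute to Meehl. The cut-off alpha = 0.05 is the common convention, assumed here.

```python
def chance_of_at_least_one_fluke(alpha, n_tests):
    """P(at least one false 'significant' result) across n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Bonferroni correction: split the overall alpha evenly across the tests."""
    return alpha / n_tests

alpha = 0.05  # conventional cut-off, assumed for illustration
for n in (1, 5, 20):
    print(n, round(chance_of_at_least_one_fluke(alpha, n), 3), bonferroni_threshold(alpha, n))
```

With 20 tests, the chance of at least one fluke is well over 50%, while each individual test must now reach p < 0.0025 under Bonferroni.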
Jacob Cohen
- developed power analysis
- goal: calculate effect sizes, i.e., how large or small a difference is
- effect sizes allow researchers to adjust how strongly they believe their hypothesis is true
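A minimal sketch of one widely used effect size, Cohen's d: the difference between two means divided by the pooled standard deviation. The groups are hypothetical. Cohen's rough conventions treat d of about 0.2 as small, 0.5 as medium, and 0.8 as large. Power analysis itself is not shown here.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical groups: means 6 and 4, each with a standard deviation of 2.
print(cohens_d([4, 6, 8], [2, 4, 6]))  # 1.0 -> a large effect by Cohen's conventions
```

Unlike a p-value, d does not grow just because the sample gets bigger; it describes how far apart the groups are in standard-deviation units.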
A hypothesis test is conducted after an experiment to?
A) determine whether the two groups in the study are exactly the same
B) determine how well the two groups are correlated
C) see if the groups are significantly different, as opposed to being different due to chance
D) summarize the distribution using a single score
see if the groups are significantly different, as opposed to being different due to chance
Imagine an experiment where the mean of the experimental group is 50 and the mean of the control group is 40. Given that the two means are obviously different, is it still possible for a researcher to say that the two groups are not significantly different?
A) Yes, the two groups could overlap so much that the difference was not significant
B) Yes, if the difference was not predicted by the hypothesis
C) No, because the two groups are so far apart that the difference must be significant
D) No, in statistics a difference of 10 points is just enough to be significant
Yes, the two groups could overlap so much that the difference was not significant
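A sketch of why overlap matters. Both pairs of hypothetical groups below have means of 50 and 40, but in one pair the scores are tightly clustered and in the other they overlap heavily. Welch's t statistic (the mean difference divided by its estimated standard error) is used here as an illustrative test statistic; the flashcards do not specify which test to use. With 5 scores per group, |t| must exceed about 2.3 (8 degrees of freedom) to be significant at .05, two-sided.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: mean difference divided by its estimated standard error."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

# Hypothetical data: both pairs of groups have means of 50 and 40.
tight_exp, tight_ctl = [48, 49, 50, 51, 52], [38, 39, 40, 41, 42]
wide_exp, wide_ctl = [20, 35, 50, 65, 80], [10, 25, 40, 55, 70]

print(round(welch_t(tight_exp, tight_ctl), 2))  # little overlap: large t
print(round(welch_t(wide_exp, wide_ctl), 2))    # heavy overlap: small t
```

The same 10-point difference gives t = 10 for the tight groups but only about 0.67 for the overlapping ones, so only the first difference would be called significant.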