Johnny, CH.1 - Intro to Statistical Reasoning Flashcards
Research Process
What are the steps in the Research Process?
- initial observation
- theory
- hypothesis (identify variables)
- prediction
- data collection (measure variables)
- data analysis
see picture 1!
Before going from your initial observation to your theory, what must you do first?
You must identify one or more Variables, in order to be able to collect data later on
What is a Theory?
An explanation or set of principles that is well substantiated by repeated testing and explains a broad phenomenon
- A theory is general, and not specific to your observations
What is a Hypothesis?
A proposed explanation for a fairly narrow phenomenon or set of observations
- A hypothesis is specific and theory driven, and attempts to explain what has been observed
- The step from “Hypothesis” to “Generate Predictions” transforms your hypothesis (something unobservable) into a prediction (something observable)
What are the 2 types of Variables?
- Predictor Variable (PV) (what Johnny calls the Independent Variable (IV)): It is thought to predict the Outcome Variable
- Outcome Variable (OV) (what Johnny calls the Dependent Variable (DV)): It changes as a function of changes in a Predictor Variable
What are the different types of variables?
- categorical
> binary (2 categories)
> nominal (>2 categories)
> ordinal (many ordered categories)
- continuous
> interval (equal intervals represent the same difference; no true 0)
> ratio (equal intervals represent equal differences; true 0)
see powerpoint!
Validity and Reliability
What is Validity?
Whether an instrument measures what it sets out to measure
What are the 4 types of Validity (mentioned by Johnny, not in general)?
- Criterion Validity: How accurately a test measures the outcome it was designed to measure.
~ Concurrent Validity: The extent of agreement between two measures when data are recorded simultaneously
~ Predictive Validity: The ability of a test to predict a future outcome.
- Content Validity: Degree to which test items represent the constructs being measured
What is Reliability?
Whether an instrument can be interpreted consistently across many studies
Research Designs
What are the two main research methods?
- Correlational research method
- Experimental method
What is the correlational research method?
Observing natural events and their correlations without manipulation
What are some problems with the correlational research method?
- Doesn’t establish the direction of causality between two variables (which variable affects which)
- Tertium Quid (3rd variable problem)
What is the experimental method?
The researcher manipulates the IV and observes the effects of that manipulation on the DV
- it establishes causality
What are the different possible designs in an experimental research method?
- Between-groups/subject design (or else, independent design): There are different groups with different people for each condition
- Within-subject/repeated measures design: All participants are in all conditions
What is Variance?
The statistical measure of Variability
What are the 2 types of variance in an experimental research method?
- Systematic: Due to manipulation
- Unsystematic: Created by unknown (random) factors
What can researchers do to maximize systematic and minimize unsystematic variance?
Randomize participants to conditions OR randomize the order in which participants receive conditions.
EXAMPLE: in repeated measures design, there are the following problems
- Practice effects: Participants perform differently in the 2nd condition because of familiarity with the situation and measures
- Boredom effects.
Randomizing the order of conditions solves the above problems:
- Half of the participants: Condition 1, then Condition 2
- The other half: Condition 2, then Condition 1
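The half-and-half ordering above can be sketched in code. This is only an illustration: the participant labels, the function name `counterbalance` and the `seed` parameter are made up for this card, not something from the lecture.

```python
import random

def counterbalance(participants, seed=None):
    """Randomly split participants into two halves that get opposite condition orders."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)              # random assignment to one of the two orders
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("Condition 1", "Condition 2")
    for person in shuffled[half:]:
        orders[person] = ("Condition 2", "Condition 1")
    return orders

# Hypothetical participant labels
orders = counterbalance(["P1", "P2", "P3", "P4", "P5", "P6"], seed=42)
for person, order in sorted(orders.items()):
    print(person, "->", " then ".join(order))
```

Because practice and boredom effects now fall equally often on each condition, they add to unsystematic variance instead of being confused with the effect of the manipulation.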
Data Analysis - Distribution
What are Histograms (or frequency distributions)?
A visual representation of the distribution of quantitative data. (See picture 2)
What is the skew of a frequency distribution?
It’s the measure of asymmetry of a distribution
(See Picture 3 for different skews)
What is the kurtosis of a frequency distribution?
A statistical measure used to describe the degree to which scores cluster in the tails or the peak of a frequency distribution
- Leptokurtic: too many scores in the tails (Kurtosis>0)
- Platykurtic: too few scores in the tails (Kurtosis<0)
- Mesokurtic: Normal Distribution (Kurtosis=0)
!!! Pointiness of distribution does not play a role !!!
(See picture 4 for examples of the above)
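To see how skew and kurtosis come out as numbers, here is a rough sketch using the simple moment-based formulas. The data sets are invented, and statistical packages may apply sample-size corrections, so their values can differ slightly from these.

```python
def central_moment(scores, k):
    """Average of the k-th power of the deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)

def skewness(scores):
    """0 for a symmetric distribution, > 0 when the long tail is on the right."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Kurtosis relative to the normal distribution, so a normal curve scores 0."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(skewness(symmetric))           # symmetric, so no skew
print(skewness(right_skewed))        # positive skew
print(excess_kurtosis(symmetric))    # below 0: platykurtic
print(excess_kurtosis(heavy_tails))  # above 0: leptokurtic
```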
What is the central tendency of a distribution?
Where the center of the distribution is
How do we compute the central tendency of a distribution?
Using the mean, mode and median
What is the mode?
- Mode: The score that occurs most frequently in the data
~ If there are two most frequent scores, the distribution is bimodal
~ If there are more than two most frequent scores, the distribution is multimodal
What is the median?
- Median: Middle score when the scores are ranked in order (in the case of an even number of scores, e.g. 10, the median equals the sum of the two middle scores divided by two: (5th + 6th)/2)
~ Unaffected by extreme scores at either end of the distribution
What is the mean?
- Mean: The sum of all scores divided by the number of scores (no need to explain the formula)
~ Affected by extreme scores
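A quick check of the three measures with Python's standard `statistics` module. The scores are invented for illustration, with one extreme score (40) added on purpose.

```python
import statistics

# Hypothetical scores: ten values, the last one is extreme
scores = [2, 3, 3, 4, 5, 6, 7, 8, 9, 40]

mean = statistics.mean(scores)         # sum of scores / number of scores
median = statistics.median(scores)     # even N, so (5th + 6th) / 2
modes = statistics.multimode(scores)   # list of the most frequent score(s)

# Same data without the extreme score
trimmed = scores[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(mean, median, modes)
print(mean_trimmed, median_trimmed)
```

Dropping the extreme score moves the mean a lot (8.7 to about 5.2) but barely moves the median (5.5 to 5), which is what "affected by extreme scores" means in practice.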
What is the dispersion of scores?
How spread out the scores are. Some measures of dispersion:
- x(largest) - x(smallest): Range of scores
- QUANTILES: Values that split the data into equal proportions, e.g.
~ Quartiles (4 equal parts)
~ Noniles (9 equal parts)
~ Percentiles (100 equal parts)
- Look at how spread out each score is from the center of the distribution: deviance (or error) = xi - x(mean)
~ Total Deviance: Sum of (xi - x(mean))
~ Because the Total Deviance always equals 0 (positive and negative deviances cancel out), we use the Sum of Squared Errors instead: SS = Sum of (xi - x(mean))^2.
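The dispersion measures above, worked through on a small made-up set of five scores. The data and variable names are assumptions for illustration only.

```python
import statistics

scores = [1, 3, 4, 3, 2]  # hypothetical scores

score_range = max(scores) - min(scores)        # x(largest) - x(smallest)
quartiles = statistics.quantiles(scores, n=4)  # three cut points -> 4 equal parts

mean = sum(scores) / len(scores)
deviances = [x - mean for x in scores]         # xi - x(mean) for each score
total_deviance = sum(deviances)                # cancels out to 0 (up to float rounding)
ss = sum(d ** 2 for d in deviances)            # Sum of Squared Errors

print(score_range, quartiles)
print(total_deviance, ss)
```

Squaring each deviance before adding stops the positive and negative values from cancelling, which is why SS is usable where the Total Deviance is not.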
What is a problem with the SS and what do we do in that case?
- the size of SS depends on number of scores
- Variance (S^2) = SS/(N-1)
> Problem: the variance is in squared units. Solution: take the square root of the variance -> the standard deviation (S).
- As S increases, the distribution gets wider and flatter (scores are more spread out around the mean)
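A small sketch of going from SS to the variance and the standard deviation, checked against Python's `statistics` module. The five scores are made up for illustration.

```python
import math
import statistics

scores = [1, 3, 4, 3, 2]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n

ss = sum((x - mean) ** 2 for x in scores)  # grows as more scores are added
variance = ss / (n - 1)                    # S^2 = SS / (N - 1), in squared units
sd = math.sqrt(variance)                   # S, back in the original units

print(round(ss, 2), round(variance, 2), round(sd, 2))

# The standard library uses the same N - 1 formulas
print(statistics.variance(scores), statistics.stdev(scores))
```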
How do different types of variables change how distributions are presented?
- Discrete/categorical Variables: p (proportion) = height of bar
- Continuous Variable: p = area under curve
What is a probability distribution?
It is like a histogram, but drawn as a smooth curve instead of bars.
- All in-between values are possible
- To convert any set of scores to a scale with a Mean of 0 and an SD of 1 (as in the standard normal distribution), we use z-scores.
~ Formula of z-scores: Z = [X - X(MEAN)]/SD
- If z of a certain score was below 0, that original score was below the mean
- If z of a certain score was above 0, that original score was above the mean
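The z-score formula can be tried out like this. The scores are invented, and the sketch assumes the SD is the sample standard deviation (the N - 1 version).

```python
import statistics

def z_scores(scores):
    """Convert raw scores to z-scores: Z = (X - mean) / SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD, assumed here
    return [(x - mean) / sd for x in scores]

scores = [1, 3, 4, 3, 2]  # hypothetical scores, mean = 2.6
z = z_scores(scores)

for raw, standardized in zip(scores, z):
    side = "below" if standardized < 0 else "above"
    print(raw, round(standardized, 2), side, "the mean")
```

After the conversion the z-scores themselves have a mean of 0 and an SD of 1, whatever the original units were.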
What is a frequency distribution?
It can be either a table or a chart that shows each possible score on a scale of measurement along with the number of times that score occurred in the data
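A frequency distribution in table form can be built in a few lines. The scores and the assumption that the scale runs from the lowest to the highest observed score are made up for this example.

```python
from collections import Counter

scores = [3, 1, 2, 3, 3, 2, 5]  # hypothetical scores
counts = Counter(scores)        # tallies how often each score occurs

# List every possible score on the scale, including ones that never occurred
table = [(score, counts[score]) for score in range(min(scores), max(scores) + 1)]

for score, frequency in table:
    print(score, frequency)
```

Plotting the same frequencies as bars gives the histogram version of this table.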
Reporting Data
What are some guiding, general principles when it comes to reporting data?
- Choose a mode of presentation that optimizes the understanding of the data
- If you present 3 or fewer numbers, try using one sentence to report that data
- If you need to present 4-20 items, use something like a table
- If you need to present >20 items, use something like a graph
What are some other issues to be considered when reporting data?
How many decimal places you use when reporting numbers
- Fewer decimal places are generally better, so round where you can, but bear in mind the precision of the measure you’re reporting