Stats Flashcards
2 Basic Mathematical Principles important for EPPP
Squaring Decimals
Square-rooting Decimals
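A quick arithmetic check of why these two skills matter: for decimals between 0 and 1, squaring makes the number smaller and taking the square root makes it larger (e.g., going from r to r squared, or from variance to SD). This is a minimal sketch; the helper names are hypothetical.

```python
import math

def square(x: float) -> float:
    """Square a number; for decimals between 0 and 1 the result gets smaller."""
    return x * x

def square_root(x: float) -> float:
    """Square root; for decimals between 0 and 1 the result gets larger."""
    return math.sqrt(x)

print(round(square(0.5), 4))        # 0.25
print(round(square(0.3), 4))        # 0.09
print(round(square_root(0.25), 4))  # 0.5
print(round(square_root(0.09), 4))  # 0.3
```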
Critical Factor in determining the type of stat test to be used
Type of data, particularly for the DV
4 Types of Data
*NOIR
Nominal
Ordinal
Interval
Ratio
Nominal data
Unordered categorical data; numbers are assigned for identification purposes only and carry no further meaning
Ex- sex, political party, race
Can compute percentages
Ordinal Data
Ordered categorical data
Ex- subjects grouped according to SES
Interval Data
Numerical scores w/equal intervals between values, but no absolute zero (zero is arbitrary, e.g., temp in Celsius or Fahrenheit)
Ratio data
Numerical score, has an absolute zero
Ex- money in bank, EPPP score, weight
Means can be calculated, as well as ratio comparisons across values (e.g., $200 is twice as much as $100)
2 Broad classes of statistics
Descriptive
Inferential
With descriptive stats, the data collected is ____, whereas with inferential stats, the goal is to make inferences about the ___ from the ___
simply described
population
sample
2 basic groups of Descriptive stats
1. Stats on the whole group’s data
2. Stats describing an ind’s score relative to the group
Descriptive stats on group data include
measures of central tendency
measures of variability
Graphs
Measures of Central Tendency
Mean-avg score
Median- score at 50th percentile
Mode-most frequently occurring score
The best measure of central tendency is typically the ___
mean
If data is skewed (extreme scores present) the most accurate measure of central tendency is ___
median
Measures of Variability
Standard Deviation-avg spread from the mean
Variance-avg squared deviation of scores from the mean
Range-diff between lowest & highest score obtained
Standard deviation is the __ __ of the variance
square root
Variance is the standard deviation ___
squared
Data that are not normally distributed are ___ or ___, meaning that scores are not symmetrically distributed above & below the mean, or the peak is sharper or flatter than in a normal curve
skewed, kurtotic
In a positive skew, how are measures of central tendency impacted?
Mode is lowest, mean is highest
In a negative skew, how are measures of central tendency impacted?
Mode is highest, mean is lowest
Leptokurtic distribution
Very sharp peak
Platykurtic Distribution
Flattened
Normal Distribution
Bell shaped
Norm-referenced score
provides info as to how a person scored relative to the group
The most informative norm-referenced score is the ___ ___.
Percentile rank
Graphs for percentile ranks are ___ or ___
flat, rectangular
Standard scores
based on the mean & standard deviation of the sample; express a score’s distance from the mean in SD units
Examples of standard scores
z-scores, T-scores, IQ scores, SAT scores, EPPP scores
z-score
most basic standard score
corresponds directly to standard deviation units, mean of 0, SD of 1
Ex- z score of +2 means the score is 2 SDs above the mean
Shape of z score distribution always same as raw score distribution
z-score formula
z = (score - mean) / standard deviation
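The z-score formula as a small function, assuming an IQ-style scale (mean 100, SD 15) purely for illustration. Note that the subtraction has to happen before the division; writing `score - mean / sd` would divide only the mean.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """z = (score - mean) / SD: distance from the mean in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd

# Hypothetical IQ-style scale: mean 100, SD 15
print(z_score(130, 100, 15))  # 2.0  (2 SDs above the mean)
print(z_score(85, 100, 15))   # -1.0 (1 SD below the mean)
```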
Parameters vs. Statistics
Population values vs Sample Values
mu
population mean
sigma
population standard deviation
Sampling Error
Samples are not perfectly representative of the population (sample means not identical to pop mean)
Standard Error of the Mean
The avg amount of deviation in a distribution of sample means
Standard Error of the Mean formula
SEM = population SD / square root of N (sample size)
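A minimal sketch of the SEM formula, assuming the IQ population SD of 15 as an example value. It also shows that SEM shrinks as N grows (quadrupling N halves the SEM).

```python
import math

def standard_error_of_mean(population_sd: float, n: int) -> float:
    """SEM = population SD / sqrt(N)."""
    return population_sd / math.sqrt(n)

print(standard_error_of_mean(15, 25))   # 3.0
print(standard_error_of_mean(15, 100))  # 1.5
```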
Central Limit Theorem
If an infinite number of equal-sized samples are drawn from a population, the means of these samples will form an approximately normal distribution, even if the population itself is not normal.
The mean of the means (the grand mean) will equal the population mean
The standard deviation of the means will equal the SD of the population divided by the square root of the sample size (standard error of the mean)
*the shape of a sampling distribution of means approaches normality as sample size increases
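The Central Limit Theorem can be checked with a small simulation. This is a sketch under stated assumptions: the population is a made-up, deliberately skewed set of 100 scores, and samples are drawn with replacement to mimic an infinitely large population. The grand mean of the sample means should land near mu, and their SD near sigma / square root of n.

```python
import random
import statistics

def sampling_distribution_of_means(population, sample_size, n_samples, seed=0):
    """Draw many random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical, deliberately skewed population of scores
population = [1] * 50 + [2] * 25 + [5] * 15 + [20] * 10
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 36
means = sampling_distribution_of_means(population, n, 5000)
print(round(mu, 2), round(statistics.fmean(means), 2))                 # grand mean vs mu
print(round(sigma / n ** 0.5, 2), round(statistics.pstdev(means), 2))  # SEM vs SD of means
```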
Standard Error of the mean helps us determine
If an obtained mean is most likely due to treatment/experimental effects vs chance (sampling error)
Ex: if the SEM of IQ is 3 (population mean = 100) and testing the effectiveness of an IQ enhancement program yields a mean sample IQ of 103, this difference (1 standard error) is likely due to chance. By contrast, a sample IQ of 110 would be more than 3 standard errors above the mean ((110 - 100)/3 ≈ 3.3), meaning that this difference is likely statistically significant
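The IQ example worked as arithmetic: convert each sample mean into standard-error units from the population mean of 100. The 1.96 cutoff (two-tailed, alpha = .05, normal approximation) is an assumption added for illustration; the function name is hypothetical.

```python
def standard_errors_from_mean(sample_mean, population_mean, sem):
    """How many standard errors the sample mean lies from the population mean."""
    return (sample_mean - population_mean) / sem

CRITICAL_Z = 1.96  # assumed two-tailed cutoff for alpha = .05

for sample_mean in (103, 110):
    z = standard_errors_from_mean(sample_mean, 100, 3)
    verdict = "likely beyond chance" if abs(z) > CRITICAL_Z else "plausibly chance"
    print(sample_mean, round(z, 2), verdict)
# 103 1.0 plausibly chance
# 110 3.33 likely beyond chance
```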
Key concepts in hypothesis testing
Null Hypothesis
Alternative Hypothesis
Rejection Region
Null Hypothesis
States that there are no differences between groups; experimental research always hopes to reject the null hyp
*results almost always stated in terms of the null hypothesis
Alternative Hypothesis
Directly states that there are differences between groups
Rejection region/Region of Unlikely Values
The tail end of the curve; unlikely that a researcher will obtain means in this region simply by chance. Suggests that treatment did have an effect & null hyp is rejected
Size of the rejection region corresponds to the ___ ___
alpha level
Ex: alpha of .05 indicates that rejection region is 5% of the curve
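Where the rejection region starts for a given alpha, assuming a z (standard normal) test; t-test cutoffs also depend on df. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+). A one-tailed test puts all of alpha in one tail, so its cutoff is closer to the mean.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """z value marking the start of the rejection region on a standard normal curve."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    print(alpha, round(critical_z(alpha), 2), round(critical_z(alpha, two_tailed=False), 2))
# 0.05 1.96 1.64
# 0.01 2.58 2.33
```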
Acceptance/Retention region
No sig diffs between groups, null hyp is accepted
2 Factors contributing to conclusions re: stat significance
1. Treatment Effects
2. Sampling Error
The only way to know w/certainty if a tx effect is significant is to:
Replicate study numerous times
4 Possible Outcomes in terms of Correctness of Research Findings
Type I Error
Type II Error
Power
Correct Decision w/no name
Type I Error
Null is rejected, but this is a mistake; diffs are found when they do not actually exist
The size of ___ directly corresponds to likelihood of making Type I Error
Alpha
Conventional cutoffs for alpha (.05, .01, .001) indicate that:
obtained means are different enough to be attributed to tx effects and not to chance
Type II Error
Null is accepted, but this is a mistake, or no diffs are found where differences do actually exist
The value of ___ corresponds to the probability of making Type II error
beta
Power
Null is rejected, and this is correct
Defined as the ability to correctly reject the null
Factors affecting Power
Increased w/:
- Large Sample Size
- Small random error
- Magnitude of intervention is large
- Statistical test is parametric
- Test is one-tailed
___ has the most sig measurable effect on power; as ___ increases, so does power.
Beta (power = 1 - beta); Alpha
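The power factors above can be seen numerically. This sketch assumes a one-sample z test with a known population SD and a true effect expressed in SD units; the function and parameter names are hypothetical. It returns power = 1 - beta, the probability of landing in the rejection region when the effect is real.

```python
from statistics import NormalDist

def z_test_power(effect_size: float, n: int, alpha: float = 0.05,
                 two_tailed: bool = True) -> float:
    """Power (1 - beta) of a one-sample z test for a true effect of `effect_size` SDs."""
    norm = NormalDist()
    shift = effect_size * n ** 0.5  # true mean difference in standard-error units
    if two_tailed:
        crit = norm.inv_cdf(1 - alpha / 2)
        # probability of landing in either tail of the rejection region
        return norm.cdf(shift - crit) + norm.cdf(-shift - crit)
    crit = norm.inv_cdf(1 - alpha)
    return norm.cdf(shift - crit)

print(round(z_test_power(0.5, 20), 2))                    # baseline
print(round(z_test_power(0.5, 40), 2))                    # larger sample size
print(round(z_test_power(0.5, 20, alpha=0.10), 2))        # larger alpha
print(round(z_test_power(0.5, 20, two_tailed=False), 2))  # one-tailed test
```

With an effect size of 0, "power" collapses to alpha, since every rejection is then a Type I error.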
Correct Decision w/no name
Null is accepted and this is correct
In determining the appropriate statistical test, you must first determine:
what type of question is being addressed in the research
Commonly asked questions in research
Questions of Difference between groups
Questions of Relationship & Prediction
Questions of Structure or Fit
Steps to Select the Appropriate Test of Difference
- Type of Data of the DV (Nominal, Ordinal, Interval, Ratio)
- Number of IVs and Levels of IVs
- Sample/Group Independence vs. Correlation
If the DV is Nominal or Ordinal, a ___ test will be used
non-parametric, for example chi-square, Mann-Whitney, Wilcoxon
If the DV is interval or ratio data, a ___ test will be used
parametric, for example t-test or ANOVA
If there is more than one DV (interval or ratio data), a ___ will be the stat test of choice
MANOVA
Independent Groups
Subjects randomly assigned to conditions or are grouped based on a pre-existing characteristic (gender or ethnicity)
3 Factors Resulting in Correlated Groups
- Repeated measures
- Subjects matched prior to assignment to groups (i.e. matched on income, IQ, etc)
- Inherent relationship between subjects (twins, siblings, spouses)
In order to use a parametric test, what 3 assumptions must be met?
- Data is interval or ratio
- Homoscedasticity-similar variability or SDs in the different groups
- Data must be normally distributed
* If one of these is not met, the stat of choice will typically be a non-parametric test, such as one used for ordinal data
Assumption for the chi square test
Non-parametric test
Assumes independence of observations (no repeated measures design)
Degrees of freedom
# of possible variations in outcome that can be obtained *calculated differently based on the type of stat test
Single Sample Chi Square
Nominal data collected for 1 IV
Ex: 100 psychologists sampled as to their political affiliation (political party seen as columns or groups)
Single Sample Chi Square degrees of freedom formula
df= #columns - 1
Multiple Sample Chi Square degrees of freedom formula
Nominal data collected for 2 IVs
df= (#rows - 1) x (#columns -1)
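Both chi-square df formulas as code. The counts (4 political parties, 3 ethnic groups) are hypothetical values chosen to match the psychologist examples.

```python
def chi_square_df_single(n_columns: int) -> int:
    """Single-sample chi square (one IV): df = #columns - 1."""
    return n_columns - 1

def chi_square_df_multiple(n_rows: int, n_columns: int) -> int:
    """Multiple-sample chi square (two IVs): df = (#rows - 1) x (#columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)

# Hypothetical: 4 political parties (one IV)
print(chi_square_df_single(4))       # 3
# Hypothetical: 3 ethnic groups (rows) x 4 parties (columns)
print(chi_square_df_multiple(3, 4))  # 6
```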
Standard Error of the mean has a direct relationship with the ____ ____ ____ and an inverse relationship with ___ ___
population standard deviation
sample size
*SEM increases as SD increases and sample size decreases
2 Way ANOVA calculates:
3 F ratios (one for each main effect and one for the interaction)
df formula for single sample t test
df=N - 1
N- number of subjects
when do we use a one sample t test?
interval or ratio data collected for one group of subjects
Ex-BDI obtained for 30 subjects
when do we use a t test for matched or correlated samples?
interval or ratio data collected for 2 correlated groups of subjects
Ex- BDI obtained for 2 matched groups of 15 people (so 30 total)
df formula for matched samples t test
df= #pairs - 1
when do we use a Multiple sample chi square?
nominal data collected for 2 IVs
Ex- 100 psychologists sampled as to voting pref and ethnicity
when do we use a t test for independent samples?
interval or ratio data collected for 2 independent groups of subjects
Ex-BDI obtained for 2 groups of 15 randomly assigned subjects (30 total)
df formula for t test for independent samples
df= N -2
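The three t-test df formulas side by side, applied to the BDI examples above (30 subjects in each design). Function names are hypothetical.

```python
def df_one_sample_t(n_subjects: int) -> int:
    """One-sample t test: df = N - 1."""
    return n_subjects - 1

def df_matched_t(n_pairs: int) -> int:
    """Matched/correlated samples t test: df = #pairs - 1."""
    return n_pairs - 1

def df_independent_t(n_total: int) -> int:
    """Independent samples t test: df = N - 2 (N = total subjects in both groups)."""
    return n_total - 2

print(df_one_sample_t(30))   # 29
print(df_matched_t(15))      # 14 (15 matched pairs)
print(df_independent_t(30))  # 28 (2 groups of 15)
```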
One Way ANOVA
interval or ratio data collected for more than 2 groups of subjects
Ex- 60 subjects assigned to one of 4 tx groups
Formulas for df in one way ANOVA
df total= N - 1
df between groups= #groups - 1
df within groups= df total - df between groups (equivalently, N - #groups)
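The one-way ANOVA df formulas applied to the example of 60 subjects assigned to one of 4 tx groups; the function name is hypothetical.

```python
def one_way_anova_df(n_total: int, n_groups: int) -> dict:
    """df total, between, and within for a one-way ANOVA."""
    df_total = n_total - 1
    df_between = n_groups - 1
    df_within = df_total - df_between  # equivalently n_total - n_groups
    return {"total": df_total, "between": df_between, "within": df_within}

print(one_way_anova_df(60, 4))
# {'total': 59, 'between': 3, 'within': 56}
```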
Formula for Expected Frequency in Chi Square when N & the groups are given
Expected Freq = N / total # of cells
Ex- 4x2 chi square with a sample of 160: total # of cells is 8; 160/8 = 20, so expected freq in each cell = 20
Formula for expected freq in any cell when data are given for a chi square
Expected freq for any cell= (sum of the row x sum of the column)/ N
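Both expected-frequency formulas as a sketch. The 2x2 table of observed counts is hypothetical; each expected cell is (row sum x column sum) / N.

```python
def expected_equal(n_total: int, n_cells: int) -> float:
    """Expected frequency per cell when N is spread evenly: N / total # of cells."""
    return n_total / n_cells

def expected_from_table(observed):
    """Expected frequency for each cell: (row sum x column sum) / N."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]

print(expected_equal(160, 4 * 2))  # 20.0

# Hypothetical 2x2 table of observed counts
observed = [[30, 10],
            [20, 40]]
print(expected_from_table(observed))  # [[20.0, 20.0], [30.0, 30.0]]
```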
When do you use a one-way ANOVA?
when more than 2 groups are being compared on one IV
Ex- comparing 4 diff depression txs
preferable to using multiple t tests to avoid increasing probability of Type I error
Stat for One Way ANOVA
F Ratio
Want to find high variability between groups and low within
Formula for F Ratio; Guidelines for significance
F ratio= Mean Square between groups/Mean Square within groups
*Mean square is measure of avg variability
F Ratio= 1, no significance
Typically sig when above 2.0
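A minimal sketch of how the F ratio is built from mean squares, using made-up scores for 3 groups. Whether a given F is significant actually depends on the df (via the critical F value), so the 2.0 guideline above is only a rough rule of thumb.

```python
import statistics

def one_way_anova_f(groups):
    """F = MS between / MS within for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # variability of group means around the grand mean (weighted by group size)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # variability of scores around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)      # df between = #groups - 1
    ms_within = ss_within / (n_total - k)  # df within = N - #groups
    return ms_between / ms_within

# Hypothetical scores for 3 treatment groups: high between, low within variability
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(round(one_way_anova_f(groups), 2))  # 27.0
```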
A significant F Ratio with an ANOVA means:
There are differences between groups, but you do not know which ones. Must perform post hoc analyses
Post hoc analyses following significant ANOVA involve:
many pairwise comparisons
Possible post hoc tests following sig ANOVA, in order from most to least protection from Type I error
Scheffé, Tukey, Duncan, Dunnett, Newman-Keuls, Fisher's least sig diff
*reverse order for protection from Type II error
When to use a Two Way ANOVA & main advantage over 2 separate one way ANOVAs
Groups are being compared on 2 IVs (ex- sex and treatment); main advantage is that it examines interaction effects between the IVs in addition to main effects for each IV, which 2 separate one way ANOVAs cannot do
In a 2 way ANOVA, if there are sig main & interaction effects, which is interpreted first?
Interactions
To identify Main & Interaction effects from a 2 Way ANOVA table of means on the test, you:
- Find the sum of each column (if sums are different, there is a main effect for the first IV)
- Find the sum of each row (if sums are different, there is a main effect for the second IV)
- Divide the table into 2x2 squares and add the diagonal means in each square (if the diagonal sums are diff, there is an interaction effect for those IVs)
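The column/row/diagonal shortcut for a 2x2 table of cell means, as a sketch. It flags any difference at all, as in the exam shortcut; a real analysis would test each effect for significance. The cell means and function name are hypothetical. The diagonal rule works because the interaction contrast (a - b) - (c - d) equals (a + d) - (b + c).

```python
def two_by_two_effects(cells):
    """
    Exam shortcut for a 2x2 table of cell means (columns = first IV, rows = second IV).
    Any difference is flagged; a real analysis would test significance.
    """
    (a, b), (c, d) = cells
    col_sums = (a + c, b + d)   # compare columns -> main effect of first IV
    row_sums = (a + b, c + d)   # compare rows -> main effect of second IV
    diag_sums = (a + d, b + c)  # compare diagonals -> interaction
    return {
        "main_effect_A": col_sums[0] != col_sums[1],
        "main_effect_B": row_sums[0] != row_sums[1],
        "interaction": diag_sums[0] != diag_sums[1],
    }

# Hypothetical crossover: the effect of one IV reverses across levels of the other
print(two_by_two_effects([[10, 20],
                          [20, 10]]))
# {'main_effect_A': False, 'main_effect_B': False, 'interaction': True}
```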
When do we use a MANOVA?
When there is more than one outcome measure or DV
When an IV is quantitative, how do we analyze the data?
Trend Analysis
Ex: IV is dosage of a drug, length of time, etc
Data may be non-linear, so interest is less in group diffs than in trends in the data (e.g., linear, quadratic)
Stats depicting relationships between variables are termed ____, while stats that predict are termed ___ or ___
correlations
regressions or regression analyses
Bivariate correlations
look at the relationship between two variables, X (predictor) and Y (criterion)
Range of Correlation Coefficient
-1.0 to +1.0 (describes strength and direction of the correlation)
Graphic depictions of correlations
Scatterplot: each data point reps an ind’s score on both X and Y; the more closely the points cluster around a straight line, the stronger the correlation
Correlation coefficient tells you
how the variability or spread of Y scores for any given X score compares to the total variability of Y scores
Ex- if there is no correlation at all (coefficient of 0.0), for any given X, the range of possible Y could be anywhere from bottom to top of possible scores
Coefficient of Determination
correlation coefficient squared
Represents amount of variability in Y that is explained or accounted for by X
Ex- correlation coefficient of .50 for level of education and income
.5 squared= .25, meaning that 25% of variability in income is explained by education level
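Pearson r and the coefficient of determination computed directly. The five pairs of scores are hypothetical; the last line repeats the education/income arithmetic from the card.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from paired interval/ratio scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(round(r, 2), round(r ** 2, 2))  # 0.8 0.64 -> 64% of Y variability accounted for

print(0.50 ** 2)  # 0.25 -> the education/income example
```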
Simple Linear Regression Equation
Derived anytime the correlation coefficient is other than 0.0, based on line of best fit through the scatter plot of scores
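A least-squares sketch of the line of best fit, Y' = a + bX, using hypothetical scores. The slope is the sum of cross-products divided by the sum of squares of X, and the line always passes through the point (mean of X, mean of Y).

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) for the line of best fit Y' = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical paired scores
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 0.6 0.8
print(round(a + b * 6, 2))       # 5.4 (predicted Y for X = 6)
```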
3 basic assumptions of bivariate correlations
Linear relationship between X and Y
Homoscedasticity-similar spread of scores across scatter plot
Unrestricted range of scores on both X and Y
Impact of restriction of range
Correlation, reliability, and validity coefficients are typically lowered (often substantially) when the range of either variable is restricted
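Restriction of range can be demonstrated with a simulation. This sketch assumes normally distributed X and Y with a true correlation of about .60 (made-up values), then keeps only cases with high X scores, as if only top applicants were admitted. The correlation in the restricted group comes out noticeably lower.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r from paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
rho = 0.6  # assumed true correlation
xs, ys = [], []
for _ in range(5000):
    x = rng.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    xs.append(x)
    ys.append(y)

full_r = pearson_r(xs, ys)
# Keep only cases with high X scores (range of X is restricted)
kept = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])
print(round(full_r, 2), round(restricted_r, 2))
```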
For Bivariate correlations, if both X and Y are interval or ratio data, you use
Pearson r
For Bivariate correlations, if both X and Y are ordinal (rank ordered) data, you use
Spearman’s rho or Kendall’s Tau
Zero Order Correlation
most basic correlation
analyzes rel btwn X and Y without controlling for any extraneous (third) variables
Partial Correlation (First Order)
examines rel btwn X and Y when effect of a third, confounding variable is removed
Ex: examine relationship btwn GPA & SAT scores after removing impact of parental education
Part (Semipartial) Correlation
examines rel btwn X and Y when the effect of a third, confounding variable is removed from only one of the orig variables
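Partial and part correlations can both be computed from the three zero-order correlations. This sketch uses the standard first-order formulas; the input correlations (SAT, GPA, parental education) are hypothetical values. Because the partial correlation divides by a smaller quantity, it is never smaller in magnitude than the corresponding part correlation.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation: Z removed from both X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def part_r(r_xy, r_xz, r_yz):
    """Part (semipartial) correlation: Z removed from X only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

# Hypothetical: X = SAT, Y = GPA, Z = parental education
print(round(partial_r(0.50, 0.40, 0.30), 3))
print(round(part_r(0.50, 0.40, 0.30), 3))
```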
Moderator Variable (in Bivariate Corr)
A variable that influences the strength of relationship between predictor & criterion
Ex- relationship between income & smoking may differ in strength at diff ages
Mediator Variable (in Bivar Corr)
Explains why there is a rel between predictor & criterion
Ex- if effect of education removed from link btwn SES and smoking, corr goes down to almost 0
Multivariate Tests of correlation & prediction
Involve several predictors or IVs & one or more criteria or DVs:
- Multiple R
- Multiple Regression
- Canonical R & Canonical Analysis
- Discriminant Function Analysis
- Loglinear Analysis
- Path Analysis
- Structural Equation Modeling
Multiple R
Correlation btwn 2 or more IVs and one DV, where Y is always interval or ratio data and at least one X is interval or ratio data
Coefficient of Multiple Determination
Index of amt of variability in criterion Y that is accounted for by all predictors (Xs).
Multiple Regression
Uses Multiple R to derive equation that allows prediction of the criterion based on values of the predictors
- To optimally predict, want low corr btwn predictors (Xs) and moderate to high corr btwn each predictor and the criterion
- Compensatory technique b/c low scores on one predictor can be compensated for by high scores on another
Multicollinearity
Problem that occurs w/multiple regression equation when predictors are highly correlated with one another
2 most common subtypes of multiple regression
Stepwise-computerized, forward or backward
Hierarchical-researcher controls, adds variables to regr analysis in order most consistent w/theory proposed
Canonical R & Canonical Analysis
Extension of multiple R
Corr btwn 2 or more IVs (predictor set) and 2 or more DVs (criterion set)
*compensatory approach
Discriminant Function Analysis
Used when there are 2 or more predictors (Xs) and one nominal (categorical) criterion variable
Ex: predicting likelihood of passing or failing EPPP (categorical Y) based on time spent studying and number of practice tests completed
*compensatory
Loglinear Analysis
Used to predict categorical criterion (Y) based on categorical predictors (Xs)
Ex: type of grad program (categorical X) and sex (categorical X) used as predictors for passing or failing EPPP (cat Y)
*compensatory
2 Approaches that apply correlational techniques to causal modeling
Path Analysis
Structural Equation Modeling
Tests of Structure
determine which variables in the set fit best together or form coherent subsets that are relatively independent of one another
Includes:
Factor Analysis, Cluster Analysis
Factor Analysis
Extracts as many sig factors as possible from the data (strongest to weakest); the stronger the factor, the more variability in scores it accounts for
Eigenvalue
indicates strength of a factor (amount of variance it accounts for); factors w/eigenvalues less than 1.0 are typically not interpreted
Factor Analysis starts w/___ ___ and computes ___ ___, which are correlations between a variable and the underlying factor
correlation matrix
factor loadings
Factor Rotation
Makes factor loadings more distinct & interpretable
2 types of factor rotation
Orthogonal (axes remain perpendicular; factors are uncorrelated)
Oblique (axes are not perpendicular; factors may be correlated)
Cluster analysis
Gather data on variety of DVs and look for naturally occurring subgroups in the data, without a priori hypotheses