Definitions Flashcards
measurement
- Assigning numbers or codes to aspects of objects or events according to rules
- Positioning observations along a numerical continuum
- Classifying observations into categories
Observation
Unit upon which measurement is made
Variable
Measurable characteristic that varies among persons, places, or objects
Nominal measurements
Observation variable that has two or more categories, with no intrinsic ordering to the categories. Nonparametric.
Examples: sex, blood type
aka. Categorical variable, attribute variable, qualitative variables
Ordinal measurements
Observation variable that has categories that can be put into rank order. Differs from interval b/c the spacing b/w values is not equal. Nonparametric.
Examples: stage of cancer (on a point scale); economic status (low, med, high)
Quantitative measurements
Observation variables are along meaningful numeric scale.
- Interval = equal spacing b/w values, but no absolute zero (i.e. Fahrenheit, Celsius)
- Ratio = has an absolute zero, so values can be meaningfully added (i.e. age, body weight, Kelvin)
aka, ratio/interval measurement, numeric variable, scale variable, continuous variable.
Surveys
Type of study used to quantify population characteristics. Relies on the “sampling” rule of statistics b/c data for the entire population are rarely available.
Simple Random Sample (SRS)
Randomly sample the population to collect data so that:
1) each population member has the same probability of being selected into the sample
2) the selection of any individual into the sample does not bias the selection of any other individual
aka. sampling independence
Cautions
Samples that tend to over- or under-represent certain segments of the population can bias survey results.
Undercoverage
Type of sample caution. Occurs when some groups in the source population are left out or underrepresented; undermines equal selection probabilities.
Volunteer Bias
Type of sample caution. Occurs b/c self-selected participants in a survey are atypical of the population. Ex: web survey volunteers may hold a particular viewpoint that causes them to participate.
Nonresponse Bias
Type of sample caution. Occurs when a large % of individuals refuse to participate in the survey; nonresponders differ from responders, which skews survey results.
Probability Sample
Each member of pop has known probability of being selected. Include SRS, stratified random samples, cluster samples, and multistage sampling
Stratified random sample
Draws independent SRSs from homogeneous “groups” or “strata.” Ex: divide the population into age groups.
Cluster samples
Randomly selects large units (clusters) consisting of smaller subunits. Ex: a list of household addresses, then study all individuals in each cluster.
Comparative study
Learns the relationship b/w an explanatory variable and a response variable. Compares a group exposed vs. not exposed to the explanatory factor.
- two types: Experimental and Non-Experimental (observational)
Experimental studies
Investigator assigns exposure to one group and not the other
Nonexperimental Studies
Investigator classifies groups as exposed or nonexposed w/o intervention. aka. observational studies
Explanatory Variable (IV)
Treatment or exposure that explains or predicts change in the response variable.
aka. independent variable (IV)
Response Variable (DV)
Outcome or response being investigated.
aka. dependent variable (DV)
Lurking variables
Extraneous factors
Confounding Variables
Distortion of the association b/w the explanatory variable and the response variable by the influence of extraneous factors.
Factors
Explanatory variables in experiments
Treatment
Specific set of factors applied to subject
Interaction
Factors in combination produce effects that could not be predicted by looking at the effect of the factors separately.
Trials
Experiments involving human subjects. Two types: Controlled and Randomized Controlled
Randomized control trial
Treatment assignment is based on chance. Helps sort out the effects of treatment from those of lurking variables.
Equipoise
Balanced doubt about benefits and risks
Discrete variable
Finite number of values b/w any 2 points
Continuous variable
infinite number of values b/w 2 points
Shape (graph)
Configuration of data points as they appear on a graph. Described in terms of:
- skewness: asymmetry (departure from mirror-image symmetry)
- modality: number of peaks
- kurtosis: “peakedness” of the distribution
Location (graph)
Distribution summarized by its center (Central tendency)
- Mean: center of distribution; the “arithmetic avg.” is the distribution’s balancing point
- Median
- Mode
Depth of data Point
Corresponds to its rank from either the top or bottom of an ordered list of values.
Spread (graph)
Refers to distribution/variability of data points.
Measures of Spread
- Range
- Quartiles
- Stnd. Dev.
- variance
Class intervals
Group data into intervals with equal or unequal spacing before tallying frequencies.
Endpoint convention: ensures each observation falls in exactly one interval; either
- include the left boundary and exclude the right, or
- include the right boundary and exclude the left
Relative Frequency
Proportion equation: frequency count / total.
Expressed in %
Cumulative Frequency
Proportion that falls in or below a certain level.
Equation: running sum of the relative frequencies up to that level.
Expressed in %
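A minimal Python sketch of the two frequency calculations above, using hypothetical counts for four class intervals:

```python
# Relative and cumulative frequency from raw frequency counts.
# Hypothetical counts for four class intervals.
freqs = [5, 10, 20, 15]
total = sum(freqs)

# Relative frequency: each count divided by the total, expressed as %.
rel_freq = [100 * f / total for f in freqs]

# Cumulative frequency: running sum of the relative frequencies.
cum_freq = []
running = 0.0
for rf in rel_freq:
    running += rf
    cum_freq.append(running)

print(rel_freq)   # [10.0, 20.0, 40.0, 30.0]
print(cum_freq)   # [10.0, 30.0, 70.0, 100.0]
```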
Bar Chart
Displays frequencies with bars whose heights correspond to the frequencies.
Best for categorical variables
Histogram
Bar chart for quantitative data with contiguous bars (no gaps); connecting the bar tops with a line gives a frequency polygon.
Best for Quantitative variables
Descriptive Statistics
Set of observations that describe the characteristics of a sample.
ex: Central tendency (mean, median, mode), Variability (Std. Dev., variance, range, quartiles)
Inferential Statistics
Set of statistical techniques that provide predictions about the population based on information in a sample from that population.
Univariate Statistics
Involve one variable at a time (i.e. age, height, weight)
Bivariate statistics
Involve two variables of the sample examined simultaneously (pre/post test)
Multivariate Statistics
Involve 2 or more variables in the same analysis
Stemplot
graphical technique that organizes data in a histogram-like display
mean
Arithmetic average of the data VALUES. Balancing point of a set. Highly susceptible to outliers and skew.
Formula:
- Sample: x̅ = (Σx)/n; Population: µ = (Σx)/N
Functions: 1) predict an individual value drawn at random from the sample, 2) predict a value drawn at random from the population
* Best to pair with the Std. Dev. for symmetrical distributions
Median
Midpoint of a distribution in CASES. More ROBUST (resilient to outliers and skew).
Formula: put values in order, calculate (n+1)/2, and count that many places to the midpoint.
* Best to pair with the IQR for asymmetrical distributions. Always Q2, the 50th percentile.
Mode
Most frequently occurring value in data set.
Useful only in large sets with repeating values.
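The three measures of central tendency above, checked with the standard-library `statistics` module on a hypothetical data set (note how the outlier pulls the mean but not the median):

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 30]            # hypothetical set; 30 is an outlier

mean = statistics.mean(data)             # balancing point; pulled toward 30
median = statistics.median(data)         # midpoint; robust to the outlier
mode = statistics.mode(data)             # most frequently occurring value

print(mean, median, mode)
```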
Variability
Measure of spread. A fundamental interest of behavioral scientists.
Range
Measures the spread of a distribution. Simplest measure of variability: Max − Min. Limitations: known to be biased and highly unstable; increases w/ sample size. *Should always be supplemented with another measure of spread.
Quartile
Intuitive way to describe variability by dividing the data set into 4 segments:
- Q0 (min) = 0%
- Q1 (lower hinge) = 25%
- Q2 (median) = 50%
- Q3 (upper hinge) = 75%
- Q4 (max) = 100%
Find the MEDIAN to identify quartiles.
Hinges
The ordered array “folds” upon itself; the fold points are the hinges (roughly Q1 and Q3).
Interquartile Range
Summary measure of spread that captures the middle 50% of the data points in a set.
- 5-point summary (Q0 - Q4)
IQR = Q3 − Q1 (where Q1 is the median b/w Q0 and Q2, Q3 is the median b/w Q2 and Q4, and Q2 is the overall median)
Not sensitive to extreme values.
Box-and-Whiskers plot
Displays five-point summaries and “potential outliers” in graphical form.
aka. box plot.
box: spans IQR
Fences
Lower fence = Q1 − (1.5)IQR; Upper fence = Q3 + (1.5)IQR
- Values below the lower fence are “lower outside values”
- Values above the upper fence are “upper outside values”
- The smallest value inside the lower fence is the “lower inside value”
- The largest value inside the upper fence is the “upper inside value”
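A short Python sketch of hinge quartiles, IQR, and fences on hypothetical data (quartiles computed as medians of the lower and upper halves, per the IQR card):

```python
import statistics

data = sorted([5, 7, 9, 12, 13, 14, 21, 100])   # hypothetical; 100 is suspect
n = len(data)
half = n // 2

q2 = statistics.median(data)                    # overall median
q1 = statistics.median(data[:half])             # median of lower half
q3 = statistics.median(data[half + n % 2:])     # median of upper half

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outside = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, outside)
```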
Variance
Common measure of spread.
Population: σ^2 = SS/N; Sample: s^2 = SS/(n−1)*
SS = Sum of Squared deviations
*Subtract 1 from n to force a larger variance and SD (makes it an unbiased estimate)
Variability
- Always present average with variability as to not misrepresent data.
- 2 data sets can have the same average but different variability.
Standard Deviation
Common measure of spread. Unbiased estimate for samples (good scientists are CONSERVATIVE!)
Formula: Square root of variance
- Sensitive to outliers and skews
- Useful for making comparisons
- the smaller the SD, the more HOMOGENEOUS the set
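Variance and standard deviation worked by hand in Python on hypothetical scores, dividing SS by (n − 1) as described above:

```python
import math

data = [4, 8, 6, 5, 3, 7]                 # hypothetical scores
n = len(data)
mean = sum(data) / n

ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
sample_var = ss / (n - 1)                 # s^2, unbiased estimate
sample_sd = math.sqrt(sample_var)         # s

print(mean, ss, sample_var, sample_sd)
```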
Chebyshev’s Rule
For data sets: at least 3/4 of the data points lie within two std. devs. of the mean.
Normal Rule
For data sets: applies only to distributions with a particular NORMAL shape.
- 68.3% of data points lie within mean ± 1 std. dev.
- 95.4% of data points lie within mean ± 2 std. devs.
- 99.7% of data points lie within mean ± 3 std. devs.
aka. 68-95-99.7 rule
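The 68-95-99.7 figures can be verified numerically: for a standard Normal, Pr(|Z| ≤ k) = erf(k/√2). A quick Python check:

```python
import math

def within_k_sd(k):
    """Proportion of a Normal distribution within k std. devs. of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))   # prints 68.3, 95.4, 99.7
```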
Properties of the Normal Curve:
- Symmetrical
- unimodal
- bell-shaped
- mean, median, and mode are equal
Symmetrical vs. Asymmetrical Distribution
Symmetrical: Mean = Median
Asymmetrical: Mean ≠ Median
- Positive skew: Mean > Median
- Negative skew: Mean < Median
Sum of Squares
Each data point’s deviation from the data-set mean, squared, then all summed. aka. SS = Σ(X − Xbar)^2
Computational formula: SS = ΣX^2 − (ΣX)^2/n
1) Sum the data points, square the total, then divide by n
2) Square each data point, then sum the squares
3) SS = (step 2) − (step 1)
*Mathematically the same as above; needed for SPSS.
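Both SS formulas give the same answer, which is easy to confirm in Python on a hypothetical set:

```python
data = [3, 5, 7, 9]                      # hypothetical data points
n = len(data)
mean = sum(data) / n

# Definitional formula: sum of squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in data)

# Computational formula: SS = sum(x^2) - (sum(x))^2 / n.
ss_computational = sum(x ** 2 for x in data) - sum(data) ** 2 / n

print(ss_definitional, ss_computational)   # both 20.0
```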
Probability
proportion of times an event is expected to occur.
Between 0 (never) and 1 (always)
Founded on relative frequencies.
Probability: random variable
Numerical quantity that takes on different values depending on chance
Probability: population
set of all possible outcomes for a random variable
Probability: Event
An outcome or set of outcomes for a random variable
Probability: Discrete random variables
Countable set of possible outcomes; fractional units are not possible. Ex: the # of leukemia cases in the US in 1995; the # of successes in n independent treatments.
Probability: Continuous Random variable
Outcome quantities with an unbroken continuum of possible values. Ex: the amount of time it takes to complete a task; the weight or height of a newborn.
4 Properties of probability functions
1) Range of probabilities: individual probabilities are never less than 0 and never more than 1. 0 ≤ Pr(A) ≤ 1
2) Total probability: probabilities in the sample space must sum to 1. Pr(S) = 1
3) Complements: the probability of a complement equals 1 minus the probability of the event. Pr(Ā) = 1 − Pr(A)
4) Disjoint events: events are disjoint if they cannot occur concurrently; then Pr(A or B) = Pr(A) + Pr(B)
Z score
States the number of std. devs. by which the original score lies above or below the mean of a normal curve.
Formula: z = (x − x̅)/s
- The z distribution is aka. the standard Normal curve: mean = 0, s = 1
- Method to interpret a raw score; takes into account the mean value and variability of the set of raw scores.
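A small Python example of converting raw scores to z-scores (hypothetical test scores):

```python
import statistics

data = [70, 80, 90, 100, 110]            # hypothetical raw scores
mean = statistics.mean(data)
s = statistics.stdev(data)               # sample standard deviation

z_scores = [(x - mean) / s for x in data]
print(z_scores)                          # z-scores always average to 0
```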
Types of scores
- Raw score (x): individual observed scores on measured variables
- Deviation score (x − x̅)
- Standard score (z)
Normal Curve
- Bell-shaped, symmetrical, unimodal
- Same Mean, Median, and Mode
- Precise relationship b/w area under the curve and Std. Dev.
Law of Probability
Statistical framework that allows researchers to determine how likely it is that research findings based on sample data are VALID. The proportion of times an event is expected to occur in the population; probability ranges from 0 to 1.
Inference
Act of using data in a sample to make generalizations about its population.
Goals:
- hypothesis testing
- estimate value of population parameters
Statistical Population
Entire collection of values that conclusions are drawn from.
Hypothetical Population
Infinitely large population of potential values that could ensue from a study.
Parameters vs. statistics
Parameter: numerical characteristic of a statistical population (population level)
Statistic: value calculated in a sample (sample level)
- use different symbols (i.e. µ, σ vs. x̅, s for the mean and SD)
Statistic –> statistical inference –> Parameter –> Random selection –> Statistic
Sampling distribution of a mean
The hypothetical distribution of means from all possible samples of size n taken from the same population.
Characteristics:
- follows central limit theorem
- unbiased estimator of population mean.
- Sample means are less variable than the distribution of individual values (square root law)
Central Limit Theorem
Sampling distribution of x̅ tends toward Normality even when the underlying population is not Normal
Note: the sampling distribution also gets narrower as the sample size increases (square root law)
Standard error of the mean (SE)
Standard Deviation of x̅
Formula: SE(x̅) = σ/√n
Law of large numbers: As an SRS gets larger and larger, its sample mean x̅ gets closer and closer to the true value of pop. mean.
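The square root law behind the standard error is easy to see numerically; σ = 15 below is just a hypothetical population SD:

```python
import math

sigma = 15.0                       # hypothetical population SD

for n in (25, 100, 400):
    se = sigma / math.sqrt(n)      # SE shrinks as n grows
    print(n, se)                   # quadrupling n halves the SE
```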
Null hypothesis
Statement of NO difference. H₀: µ = “some number”
- Reject H₀ when H₀ is true –> Type I error (α); when H₀ is false –> correct decision
- Fail to reject H₀ when H₀ is true –> correct decision; when H₀ is false –> Type II error (ß)
Alpha:
- Probability of Type I error
- Chance you are willing to take of mistakenly rejecting a true null hypothesis
Beta:
- Probability of Type II error
- Chance you are willing to take of mistakenly accepting a false null hypothesis
Alternative hypothesis
Statement that claims a difference from null hypothesis.
Ha: µ < or µ > the null value –> one-sided z-test
Ha: µ ≠ the null value –> two-sided z-test
Zstat
Statistical distance of the sample mean x̅ from the hypothesized value µ0; this provides the weight of evidence for or against H₀. Zstat = (x̅ − µ0)/SE(x̅)
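A one-sample z-statistic worked end to end in Python (all numbers hypothetical); the two-sided P-value uses the Normal CDF via `math.erf`:

```python
import math

x_bar, mu_0, sigma, n = 103.0, 100.0, 15.0, 36   # hypothetical inputs

se = sigma / math.sqrt(n)                # standard error of the mean
z_stat = (x_bar - mu_0) / se             # distance of x_bar from mu_0 in SEs

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_two_sided = 2 * (1 - normal_cdf(abs(z_stat)))
print(z_stat, p_two_sided)
```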
Point Estimation
- Provides a single estimate of the parameter
- No info regarding probability of accuracy; best “guesstimate”
Central Limit Theorem
If the population is not Normal, the distribution of sample means approaches a Normal distribution as the sample size gets larger.
Hypothesis Testing Steps
- Define hypotheses: H₀ and Ha
- Test statistic: calculate the SE and the z/t-stat
- Determine the P-value for the z/t-stat
- Decide significance: compare the P-value to the chosen significance level (α). Statistically significant or not?
- State the conclusion
Interval Estimation
Provides a range of values (CI) that seeks to capture the parameter
- Confidence Interval between two limit values.
t-Test
Testing statistical hypotheses about µ when
1) σ is unknown
2) sample size is small (n < 30)
Degrees of Freedom (df)
Value indicating the # of independent pieces of info a sample can provide for purposes of statistical inference.
Determining CI for µ
x̅ ± t(df, 1 − α/2) × SE
The mean difference should fall between the upper and lower bounds.
Ex: 90% CI –> α = .10 –> α/2 = .05 –> 1 − .05 = .95
Look up in the t-table: df and P(.95)
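A sketch of the CI calculation in Python on hypothetical data; the critical value 2.262 is t(df = 9, .975) read from a t-table:

```python
import math
import statistics

data = [12, 14, 11, 13, 15, 12, 14, 13, 11, 15]   # hypothetical, n = 10
n = len(data)
x_bar = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)        # s / sqrt(n)

t_crit = 2.262                 # t for df = n - 1 = 9, two-sided alpha = .05
ci = (x_bar - t_crit * se, x_bar + t_crit * se)
print(x_bar, ci)
```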
Single Sample
Reflects the experience of a single group. NO control group, but results are compared to norms or expected values.
Paired Sample
Uses data from two samples in which each data point in the first sample is matched to a data point in the 2nd sample.
Ex. Pre- and Post-sample from same subject
Independent Samples t-Test
Use when comparing two samples in order to draw inferences about group differences in the population.
- Two levels of a nominal-level variable; the dependent variable approximates interval-scale characteristics. i.e. DV = # TV hrs; IV = males vs. females
- Assumption of equal variances
- The Std. Dev. of the sampling distribution of the difference is the standard error of the difference.
Independent Samples
Uses two samples from separate populations. Data points are unrelated.
Ex: Experimental study with treatment and control groups
ANOVA
One-way analysis of variance
- compares 3 or more groups defined by one factor.
- variation in the response is analyzed to understand group differences; used in place of multiple independent t-tests.
- Ho: µ1= µ2= … = µk
EX: patients assigned to three treatment groups and measured on stress score (DV) in reaction to treatment (IV)
Mean Square Between (MSB)
(ANOVA)
Quantifies variance of group means around the grand mean.
MSB = SSB/ dfB
SSB = n(group Xbar − grand Xbar)^2 + … –> (group mean − grand mean)^2 × group n, summed over groups
- measures variability between the groups compared to the grand mean.
Mean Square Within (MSW)
ANOVA
Quantifies variability of data points in a group around its mean.
MSW = SSW/ dfW
SSW = (x − group Xbar)^2 + … –> (individual point − group mean)^2, summed within each group, then all groups’ SS summed together
- Measures variability within each data group.
F-statistic
(ANOVA)
- Ratio of MSB and MSW.
- A large F-stat suggests the observed mean differences are NOT merely due to random noise.
- Fstat = MSB/MSW
- When converting the F-stat to a P-value, the DF are: numerator dfB / denominator dfW
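MSB, MSW, and F computed by hand in Python for three hypothetical groups:

```python
import statistics

groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]       # hypothetical groups
k = len(groups)
all_points = [x for g in groups for x in g]
grand_mean = statistics.mean(all_points)

# Between groups: group-size-weighted squared distances of group means
# from the grand mean.
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
msb = ssb / (k - 1)                               # dfB = k - 1

# Within groups: squared distances of each point from its own group mean.
ssw = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
msw = ssw / (len(all_points) - k)                 # dfW = N - k

f_stat = msb / msw
print(msb, msw, f_stat)
```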
Levene Test
Tests the assumption of equal variances. Use when comparing two or more groups (samples).
H₀: σ1^2 = σ2^2 = σ3^2
Retain (fail to reject) the null when the p-value is greater than α.
Correlation Coefficient (r)
Strength of a linear relationship.
−1 ≤ r ≤ 1
Strength:
- Close to ±1: all points fall near a line (+1 upward slope, −1 downward slope)
- Close to 0: lack of linear correlation
Direction:
- Upward slope = positive number
- Downward slope = negative number
3 r’s:
- metric…
Coefficient of determination (r2)
Statistic that quantifies the proportion of variance in Y explained by X.
Expressed by converting r2 to a %: x% of the variance of Y is explained by X
Single Regression Line
Expresses the functional relationship b/w X and Y by fitting a line to observed data.
- Observed y = predicted y + residual
- Residual = observed y - predicted y
Least Squares Regression Line: drawn to minimize the sum of squared residuals
Formula: ŷ = a + bx; ŷ = predicted y, a = y-intercept of the regression line, b = slope coefficient
b = r(sy/sx)
a = Ybar − b(Xbar)
Notes:
- Not robust
- b shows the relationship b/w X and Y in the same units as measured; r is a unit-free measure of strength
- X must be IV; Y must be DV
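The slope and intercept formulas above, worked in Python on hypothetical (x, y) pairs:

```python
import math

x = [1, 2, 3, 4, 5]                       # hypothetical explanatory values
y = [2, 4, 5, 4, 5]                       # hypothetical responses
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)            # correlation coefficient
b = r * math.sqrt(syy / sxx)              # slope: b = r * (sy / sx)
a = y_bar - b * x_bar                     # intercept: a = y_bar - b * x_bar

print(r, b, a)                            # b is also sxy / sxx
```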
Confidence Interval for Population Slope
Hypothesis:
- Ho: B = 0
- Ha: B ≠ 0
t-stat = b/ SEb
CI formula: b ± t(n−2, 1 − α/2) × SEb
- If “0” is captured in the CI for the population slope, the slope is NOT sig.
Multiple Regression
Addresses multiple explanatory variables (IVs) in relation to a response variable (DV).
IMPROVES prediction by using two or more variables to predict a dependent variable.
Formula: Y’ = a + b1X1+ b2X2 ….
Kurtosis
Refers to the “peakedness” of a distribution.
- Leptokurtic: narrow peak
- Platykurtic: flat peak (plateau)
Chi-Squared Test
- Measure of association b/w 2 nominal variables
- magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.
- does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
Formula: χ2 = Σ (Observed − Expected)^2 / Expected
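The chi-square statistic computed in Python for a hypothetical goodness-of-fit problem with four categories:

```python
observed = [30, 20, 10, 40]     # hypothetical observed counts
expected = [25, 25, 25, 25]     # expected counts under the null

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)                   # 20.0
```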
PARAMETRIC VERSUS NONPARAMETRIC STATISTICS
- Use nonparametric stats when:
- the parametric assumptions cannot be justified: normal distribution, equal variances, etc.
- the data as gathered are measured on nominal or ordinal scales
Properties of Sampling distribution
- mean of a sampling distribution of means will be the same as the mean of scores in the population (µ).
- Central Limit Theorem
- Allows us to determine the probability that the particular sample obtained will be unrepresentative.
One-Sample Z test
- Used to compare a sample mean to a (hypothesized) population mean and determine how likely (chance) it is that the sample came from that population.
- Compare the probability associated with statistical results (i.e. probability of chance) with a predetermined alpha level.