Statistics Flashcards
sample
A sample is selected to represent the population in a research study; helps answer questions about a population.
variable
characteristic or condition that can change or take on different values.
discrete variables
(such as class size) consist of separate, indivisible categories; a fractional value (e.g., 2.5 students) is not meaningful
continuous variables
(such as time or weight) are infinitely divisible into whatever units a researcher may choose, so any fractional value can be legitimately measured.
goal of experiment
Goal of an experiment is to demonstrate a cause-and-effect relationship between two variables; that is, to show that changing the value of one variable causes changes to occur in a second variable.
IV and DV
In an experiment, the manipulated variable is called the independent variable and the observed variable is the dependent variable.
non-experimental or quasiexperimental
non-experimental or quasi-experimental, are similar to experiments because they also compare groups of scores. These studies do not use a manipulated variable to differentiate the groups. Instead, the variable that differentiates the groups is usually a pre-existing participant variable (such as male/female) or a time variable (such as before/after).
Similar to correlational research because they simply demonstrate and describe relationships
positively skewed
In a positively skewed distribution, the scores tend to pile up on the left side of the distribution with the tail tapering off to the right.
negatively skewed
In a negatively skewed distribution, the scores tend to pile up on the right side and the tail points to the left.
percentile rank
The percentile rank for a particular X value is the percentage of individuals with scores equal to or less than that X value. When an X value is described by its rank, it is called a percentile.
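A quick way to check a percentile rank in code – a minimal Python sketch with made-up scores (scipy's percentileofscore with kind="weak" counts scores at or below X):

```python
from scipy import stats

scores = [2, 3, 5, 5, 6, 8, 9, 9, 10, 12]  # hypothetical scores

# Percentage of individuals with scores equal to or less than X = 8
pr = stats.percentileofscore(scores, 8, kind="weak")
print(f"percentile rank of X=8: {pr:.0f}%")  # 60% of scores are <= 8
```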
nominal
name only, categorical; only permit you to determine whether two individuals are the same or different. (i.e. male/female; diagnosis)
ordinal
rank ordered (e.g. height shortest to tallest); tell you the direction of difference between two individuals.
the Spearman correlation is used with ordinal data (e.g., class rank)
interval
consistent intervals between numbers but no absolute zero (e.g., IQ); identify the direction and magnitude of a difference
ratio
interval plus absolute zero – height in inches; identify the direction and magnitude of differences and allow ratio comparisons of measurements
reliability
same results with repeated administrations
validity
measures what it says it measures- taps into the construct
standard error of the mean (SEM)
measure of variability; the average expected difference between a sample mean and the population mean (or between two sample means, M1 – M2)
confidence interval
a type of interval estimate of a population parameter, used to indicate the reliability of an estimate. Factors that affect the width of a confidence interval include sample size, level of confidence, and population variability. A larger sample size normally leads to a better estimate of the population parameter.
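A minimal sketch of a 95% CI for a population mean, assuming a small hypothetical sample and the t distribution (scipy):

```python
import numpy as np
from scipy import stats

data = np.array([4.8, 5.2, 5.9, 6.1, 5.5, 4.9, 6.3, 5.7])  # hypothetical sample

m = data.mean()
sem = stats.sem(data)  # standard error of the mean (uses n - 1)

# 95% interval estimate for the population mean
lo, hi = stats.t.interval(0.95, df=len(data) - 1, loc=m, scale=sem)
print(f"mean = {m:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Rerun with a larger sample and the interval narrows, matching the point about sample size above.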
sampling error
The discrepancy between a sample statistic and its population parameter is called sampling error.
central tendency in normal distribution
a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores. The goal of central tendency is to identify the single value that is the best representative for the entire set of data. Allows researchers to summarize or condense a large set of data into a single value (thus a descriptive statistic).
mean and a mean in skewed data
Mean: the average; most commonly used; requires scores that are numerical values measured on an interval or ratio scale
When a distribution contains a few extreme scores (or is very skewed), the mean will be pulled toward the extremes (displaced toward the tail).
median
Median: If the scores in a distribution are listed in order from smallest to largest, the median is defined as the midpoint of the list; values measured on an ordinal, interval, or ratio scale
Relatively unaffected by extreme scores
mode
the most frequently occurring category or score in the distribution; any scale of measurement: nominal, ordinal, interval, or ratio
what will be equal in a symmetrical distribution
the mean and median will always be equal
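To see the mean get pulled toward the tail while the median stays put (and the mode for completeness), a small sketch with made-up numbers using Python's statistics module:

```python
import statistics as st

symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]  # one extreme high score (positive skew)

print(st.mean(symmetric), st.median(symmetric))  # 6, 6 -> equal in symmetry
print(st.mean(skewed), st.median(skewed))        # 24, 6 -> mean pulled to tail
print(st.mode([1, 2, 2, 3]))                     # 2: most frequent score
```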
variability
goal is to obtain a measure of how spread out the scores are in a distribution; describes how the scores are scattered around that central point. In the context of inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population. Measuring distance.
range and interquartile range
Range: the distance from the smallest score to the largest score (largest minus smallest)
The interquartile range is the distance covered by the middle 50% of the distribution (the difference between Q1 and Q3).
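A minimal sketch computing both measures with numpy (hypothetical data):

```python
import numpy as np

data = np.array([3, 5, 7, 8, 9, 11, 13, 14, 16, 20])  # hypothetical scores

rng = data.max() - data.min()            # range: largest minus smallest
q1, q3 = np.percentile(data, [25, 75])   # lower and upper quartiles
iqr = q3 - q1                            # distance covered by the middle 50%
print(rng, q1, q3, iqr)
```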
standard deviation
average distance between scores and the mean (the square root of the variance)
variance
average squared deviation of scores from the mean (it is the standard deviation squared)
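The chain SS -> variance -> standard deviation as a small sketch (sample formulas with n - 1; hypothetical scores):

```python
import numpy as np

scores = np.array([2, 4, 6, 8, 10])

deviations = scores - scores.mean()  # distance between each score and the mean
ss = np.sum(deviations ** 2)         # SS: sum of squared deviations
variance = ss / (len(scores) - 1)    # sample variance
sd = np.sqrt(variance)               # standard deviation = sqrt(variance)
print(ss, variance, sd)              # 40.0, 10.0, ~3.16
```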
negative skew vs positive skew outliers
Positive Skew – extreme outlier(s) in the positive/ high end
Negative Skew – extreme outlier(s) in the negative/ low end
implication of skew for parameters
violates the parametric assumptions needed for the parametric tests (t-tests, anova, pearson correlation)
descriptive statistics
methods for organizing and summarizing data
-mean, median, mode, standard deviation, variance
parameter vs statistic
A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.
frequency distribution
organized tabulation showing exactly how many individuals are located in each category on the scale of measurement. A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.
regular frequency distribution
when a frequency distribution table lists all of the individual categories (X values) it is called a regular frequency distribution.
grouped frequency distribution
In a grouped frequency distribution, the X column lists groups of scores, called class intervals, rather than individual values (too many different X values).
why are frequency distribution graphs useful
Frequency distribution graphs are useful because they show the entire set of scores. At a glance, you can determine the highest score, the lowest score, where the scores are centered, most common score.
inferential statistics
methods for using sample data to make general conclusions (inferences) about populations
effect size
measure of the strength of a phenomenon
hypothesis testing
general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results from a research study. A technique to help determine whether a specific treatment has an effect on the individuals in a population.
null hypothesis
H0, always states that the treatment has no effect (no change, no difference). According to the null hypothesis, the population mean after treatment is the same as it was before treatment.
Test statistic in critical region → reject the null = p<0.05 = there is an effect, difference
Test statistic in the body (outside of the critical region) → fail to reject the null = there is no effect/ no difference
alternative hypothesis
a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis
alpha level: establishes a criterion, or “cut-off”, for making a decision about the null hypothesis. The alpha level also determines the risk of a Type I error.
Power - probability that the test will reject the null hypothesis when the null hypothesis is false (find an effect when there is one)
-Influenced by: Alpha level; Sample size; Sensitivity of test; Effect size
type 1 error
occurs when the sample data appear to show a treatment effect when, in fact, there is none. In this case the researcher will reject the null hypothesis and falsely conclude that the treatment has an effect.
what causes type 1 error
caused by unusual, unrepresentative samples. Just by chance the researcher selects an extreme sample with the result that the sample falls in the critical region even though the treatment has no effect.
The hypothesis test is structured so that Type I errors are very unlikely; specifically, the probability of a Type I error is equal to the alpha level.
type 2 error
occurs when the sample does not appear to have been affected by the treatment when, in fact, the treatment does have an effect. In this case, the researcher will fail to reject the null hypothesis and falsely conclude that the treatment does not have an effect.
what causes type 2 errors
commonly the result of a very small treatment effect. Although the treatment does have an effect, it is not large enough to show up in the research study.
directional test or one tailed test
A directional test or a one-tailed test includes the directional prediction in the statement of the hypotheses and in the location of the critical region.
what is recommended that the hypothesis test be accompanied by
effect size
We use Cohen’s d as a standardized measure of effect size. Much like a z-score, Cohen’s d measures the size of the mean difference in terms of the standard deviation (Impact of the IV).
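A minimal sketch of Cohen's d, assuming hypothetical means and a pooled standard deviation:

```python
# Hypothetical treatment study: mean difference in standard-deviation units
m_treatment, m_control = 54.0, 50.0
sd_pooled = 8.0  # assumed pooled standard deviation

d = (m_treatment - m_control) / sd_pooled
print(f"Cohen's d = {d:.2f}")  # 0.50: the means differ by half an SD
```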
what are the three parametric assumptions we made to engage in inferential statistics
1) Independent observations: random selection gives a representative sample; you can't just bring friends to the experiment, because that biases the sample (not representative)
2) Normally distributed: the populations from which samples are selected are normally distributed (if not, these tests can't be run)
3) Homogeneity of variance: the populations from which samples are selected have equal variances (the treatment should shift scores equally, without changing their spread)
independent measures between subjects t test
(2 separate groups)
An independent-measures design can be used to test for mean differences between two distinct populations (such as men versus women) or between two different treatment conditions (such as drug versus no-drug).
repeated measures design t test
single group of individuals is obtained and each individual is measured in both of the treatment conditions being compared. Thus, the data consist of two scores for each individual.
related sample t -test, matched subjects design
each individual in one treatment is matched one-to-one with a corresponding individual in the second treatment
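The three designs above map onto two scipy calls; a sketch with hypothetical scores (ttest_rel covers both repeated-measures and matched-subjects, since both yield paired scores):

```python
from scipy import stats

# Independent-measures (between-subjects): two separate groups
drug    = [12, 14, 11, 15, 13, 16]   # hypothetical scores
no_drug = [10, 11,  9, 12, 10, 13]
t_ind, p_ind = stats.ttest_ind(drug, no_drug)

# Repeated-measures / matched-subjects: two scores per individual (or pair)
pre  = [10, 12,  9, 14, 11]
post = [13, 14, 12, 15, 13]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(f"independent: t={t_ind:.2f}, p={p_ind:.3f}")
print(f"related:     t={t_rel:.2f}, p={p_rel:.3f}")
```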
ANOVA?
comparing 3 or more treatment conditions, more than 1 IV (factor), or an IV with more than 2 levels
what does analysis of variance do in ANOVA
controls the risk of a Type I error in situations where a study is comparing more than two population means
why use post hoc tests
ANOVA simply establishes that differences exist, it does not indicate exactly which treatments are different. Specifically, you must follow the ANOVA with additional tests, called post hoc tests, to determine exactly which treatments are different and which are not.
examples of post hoc tests
The Scheffé test and Tukey's HSD are examples of post hoc tests. They indicate exactly where the difference is.
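A sketch of the ANOVA-then-post-hoc workflow on hypothetical groups (scipy; tukey_hsd needs a recent SciPy version):

```python
from scipy import stats

g1 = [3, 4, 5, 4, 3]   # hypothetical scores in three treatment conditions
g2 = [6, 7, 6, 8, 7]
g3 = [6, 6, 7, 5, 6]

f, p = stats.f_oneway(g1, g2, g3)  # ANOVA: do differences exist somewhere?
print(f"F = {f:.2f}, p = {p:.4f}")

if p < 0.05:
    # post hoc: WHICH pairs of treatments differ (recent SciPy required)
    print(stats.tukey_hsd(g1, g2, g3))
```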
what does the repeated measure design do
the repeated measures design eliminates individual differences from the between treatments variability because the same subjects are used in every treatment condition
MANOVA
ANOVA with the addition of multiple Dependent variables
ANCOVA
ANOVA with a covariate – a variable whose influence is statistically controlled
-covariate: another variable that has a relation to the DV
when are non parametric tests used
used when the parametric assumptions are violated (or the data are NOT interval or ratio level)
chi square tests
tests the shape of the distribution of nominal/categorical data; not a parametric test
goodness of fit in chi square
are there roughly the same number of subjects in each category? OR does the distribution fit a predetermined distribution (e.g., 40% male and 60% female)?
test for independence
similar to correlation in that it looks at the relationship between 2 variables but uses NOMINAL data
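Both chi-square tests in one sketch, with made-up counts (scipy):

```python
from scipy import stats

# Goodness of fit: do observed counts match a predetermined distribution?
observed = [28, 52]   # hypothetical counts (n = 80)
expected = [32, 48]   # 40% male / 60% female of n = 80
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.3f}")

# Test for independence: relationship between two NOMINAL variables
table = [[20, 30],    # rows = variable 1, columns = variable 2
         [25, 15]]
chi2, p, df, exp = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, p = {p:.3f}, df = {df}")
```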
Mann Whitney U
analogous to independent measures t-test
Friedman tests
analogous to repeated measures anova
Wilcoxon test
analogous to repeated measures t-test
Kruskal-Wallis
analogous to independent measures (one-way) anova
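The four analogs above, as one scipy sketch with hypothetical (rank-friendly) data:

```python
from scipy import stats

a = [1, 3, 2, 5, 4]   # hypothetical scores
b = [6, 8, 7, 9, 5]
c = [2, 4, 3, 6, 5]

print(stats.mannwhitneyu(a, b))          # ~ independent-measures t test
print(stats.wilcoxon(a, b))              # ~ repeated-measures t test (paired)
print(stats.kruskal(a, b, c))            # ~ one-way independent-measures ANOVA
print(stats.friedmanchisquare(a, b, c))  # ~ repeated-measures ANOVA
```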
correlation
tests the relationship between 2 variables that occur naturally: relationship only; no cause and effect; determine whether there is a relationship between two variables and to describe the relationship
sign and strength
positive or negative tells nothing about the strength of the relationship but tells about the direction (both go up together or one goes up the other goes down)
-Negative correlation: one variable increases, other decreases
-Positive correlation: one variable increases, the other increases
Strength – between 0 and 1 (closer to 1 = stronger)
restriction of range
can make the relationship seem weaker because you are only getting a small snapshot of the full relationship between the two variables (i.e. looking at the relationship between age and reaction time if you only use 19-22 year olds you won’t find a relationship)
pearson product moment correlation
used for parametric data/ interval or ratio level data with a linear relationship
spearman correlation
used when the relationship is monotonic but not linear, or when data is ordinal/ranked (note: a non-monotonic pattern such as an inverted U is not captured well by either correlation)
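A sketch contrasting the two coefficients on hypothetical data; Spearman ranks the scores first, so it only needs a monotonic trend:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([1.0, 1.4, 2.5, 6.0, 15.0, 40.0])  # monotonic but NOT linear

r, _ = stats.pearsonr(x, y)     # parametric: assumes a linear relationship
rho, _ = stats.spearmanr(x, y)  # rank-based: perfectly monotonic -> rho = 1.0
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```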
regression
used for prediction; a statistical technique that relates a dependent variable to one or more independent (explanatory) variables. A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables
multiple regression
same as regression but using multiple variables to make the prediction (i.e. SAT scores, HS GPA, and quality of essay to predict college GPA)
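A minimal least-squares sketch of multiple regression (hypothetical SAT/HS GPA predictors, numpy only):

```python
import numpy as np

# Hypothetical predictors and outcome
sat   = np.array([1.1, 1.3, 0.9, 1.4, 1.2])   # SAT (rescaled)
hsgpa = np.array([3.0, 3.4, 2.8, 3.8, 3.2])   # high-school GPA
cgpa  = np.array([2.9, 3.3, 2.7, 3.7, 3.1])   # college GPA (the DV)

X = np.column_stack([np.ones_like(sat), sat, hsgpa])  # intercept + predictors
coef, *_ = np.linalg.lstsq(X, cgpa, rcond=None)       # least-squares fit

print("coefficients:", coef.round(2))
print("predicted college GPA:", (X @ coef).round(2))
```

Drop one predictor column and the same code does simple regression.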
path analysis
allows for the evaluation of the causal flow of relationships because a priori predictions are made about the correlations
logistic regression
prediction of group membership (doctor, lawyer, tradesperson) from interval or ratio level predictors (scores usually)
discriminant analysis
same as Logistic regression but predictors can be any kind of variable (dichotomous, interval, ratio, nominal)
statistics
branch of mathematics used to summarize, analyze, and interpret a group of numbers or observations
research method or scientific method
set of systematic techniques used to acquire, modify, and integrate knowledge concerning observable and measurable phenomena
experiments
use of methods and procedures to make observations in which a researcher fully controls the conditions and experiences of participants by applying three required elements of control (manipulation, randomization, and comparison/control) to isolate cause-and-effect relationships between variables
correlations
examines the relationship between variables by measuring pairs of score for each individual. This method can determine whether a relationship exists between variables, but it lacks the appropriate controls needed to demonstrate cause and effect.
quasi experiments
this study does not include a manipulated independent variable and/or lacks a comparison/control group
quasi independent variable
preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups
population
set of all individuals, items, or data of interest. This is the group about which scientists will generalize
sample
set of individuals, items, or data selected from a population of interest.
independent variable
variable that is manipulated in an experiment. This variable remains unchanged (or “independent”) between conditions being observed in an experiment. The specific conditions of the IV are referred to as levels.
dependent variable
the variable that is measured in each group of a study, and it is believed to change in the presence of the independent variable. It is the “presumed effect.”
descriptive statistics
procedures used to summarize, organize, and make sense of a set of scores called data. They are typically presented graphically or in tables.
inferential statistics
procedures used that allow researchers to infer or generalize observations made with samples to the larger population from which they are selected
population parameter
a characteristic (usually numeric) that describes a population. **Will usually never have this!
sample statistic
a characteristic (usually numeric) that describes a sample
nominal
are measurements in which a number is assigned to represent something or someone (a name, has no numerical properties)
ordinal
measurements that convey order or rank alone (does not tell us space between ranks)
interval
measurements that have no true zero and are distributed in equal units (equidistant, ex. Likert Scale)
ratio
measurements that have a true 0 and are distributed in equal units
qualitative
varies by class. This variable is often represented as a label and describes nonnumeric aspects of phenomena.
quantitative
varies by amount. This variable is measured numerically and is often collected by measuring or counting.
continuous
is measured along a continuum, at any place beyond the decimal point, and can thus be measured in fractional units (ex. time)
discrete
measured in whole units or categories that are not distributed along a continuum
central tendency
statistical measures for locating a single score that is most representative or descriptive of all scores in a distribution
mean and when best to use
(also called the average) is the sum of a set of scores in a distribution, divided by the total number of scores summed.
**Best to use the mean in a normal distribution and with interval and ratio scales. In normal distributions, the mean, median, and mode are all the same
median and when best to use
is the middle value in a distribution of data listed in numeric order (aka the 50th percentile)
**Best to use the median with data that has outliers or is skewed, and with ordinal scales
mode and when best to use
is the value in a data set that occurs most often or most frequently (can have multiple modes)
**Best to use the mode when working with a bimodal/multimodal distribution (curve sinks in the middle) and with nominal scales
normal distribution
(bell-shaped) a theoretical distribution in which scores are symmetrically distributed above and below the mean, median, and mode at the center of the distribution
skewed distribution
distribution of scores that includes scores that fall substantially above or below most other scores in a data set
-Positive Skew: a distribution of scores that includes scores that are substantially larger than most other score (tail/flat part will be on the positive/ right side)
-Negative Skew: a distribution of scores that includes scores that are substantially smaller than most other scores (tail/flat part will be on the negative/ left side)
bimodal distribution
a distribution of scores in which two scores occur most often or most frequently (a high hump where one mode is, a dip in the center (mean/median), and then another high hump where the second mode is, refer to pg. 95 in Privitera book for visual so you don’t have to keep listening to me talk about humps)
variability
is a measure of the dispersion or spread of scores in a distribution that ranges from 0 to positive infinity
range
is the difference between the largest value (L) and smallest value (S) in a data set.
interquartile range
is the range of values between the upper (Q3) and the lower (Q1) quartiles of a data set
Lower Quartile: the median value of the lower half of a data set at the 25th percentile of a distribution
Upper Quartile: is the median value of the upper half of the data set at the 75th percentile of a distribution
SS (sum of squares)
is the sum of the squared deviations of scores from their mean. The SS is the numerator in the variance formula
variance
measure of variability for the average squared difference that scores deviate from their mean
standard deviation
measure of variability for the average distance that scores deviate from their mean. It is calculated by taking the square root of the variance.
z score
value on the x-axis of a standard normal distribution. The numerical value of a z score specifies the distance or the number of standard deviations that a value is above or below the mean
characteristics of normal distribution
-Mathematically defined
-Theoretical (no distribution is perfect, it is approximate)
-Mean, median, and mode all in the 50th percentile
-It is symmetrical
-The total area under the curve is equal to 1 or 100%
sampling distribution
a distribution of sample means or sample variances that could be obtained in samples of a given size from the same population
**The sampling distribution of sample means is an unbiased estimator and follows the central limit theorem
unbiased estimator
any sample statistic, such as the sample variance when we divide SS by n-1, obtained from randomly selected samples, that equals the value of its respective population parameter (such as the population variance) on average
central limit theorem
explains that regardless of the distribution of scores in a population, the sampling distribution of sample means selected at random from that population will approach the shape of a normal distribution as the number of samples in the sampling distribution increases
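A simulation sketch: even from a very non-normal (exponential) population, the sample means pile up around the population mean with spread sigma/sqrt(n), which previews the standard error card below:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # heavily skewed

n = 50
sample_means = [rng.choice(population, n).mean() for _ in range(2_000)]

print(np.mean(sample_means))          # ~ population mean (2.0): unbiased
print(np.std(sample_means))           # spread of the sampling distribution
print(population.std() / np.sqrt(n))  # theoretical standard error, sigma/sqrt(n)
```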
standard error of the mean
the standard deviation of a sampling distribution of sample means. It is the standard error, or distance, that sample mean values deviate from the value of the population mean
sampling error
the extent to which sample means selected from the same population differ from the mean of the sampling distribution of sample means (SDSM)
-The question is: if you randomly select one sample mean, how close will it be to the mean of the SDSM?
-Sampling error is therefore a measure of how narrow your SDSM is; a lot of error means a randomly drawn sample mean may fall far from the mean of the SDSM
factors that decrease standard error
The larger the population standard deviation, the larger the standard error
-As numerator gets bigger, quantity as a whole gets bigger
-The wider the distribution, the less likely you are to get a random sample mean that is close to the SDSM
As sample size increases, standard error decreases
-As denominator gets bigger, quantity gets smaller
-Narrower distribution
bigger population = less error
z scores for sample means
tells you the probability of sampling a particular mean from a particular population (NEED the population standard deviation)
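A sketch using hypothetical IQ-style parameters (mu = 100, sigma = 15 assumed known):

```python
from math import sqrt
from scipy import stats

mu, sigma = 100, 15   # known population parameters (assumed)
m, n = 106, 25        # hypothetical sample mean and sample size

sem = sigma / sqrt(n)      # standard error of the mean: 15/5 = 3
z = (m - mu) / sem         # z score for the sample mean: 2.0
p = 1 - stats.norm.cdf(z)  # probability of a mean this high by chance
print(f"z = {z:.2f}, one-tailed p = {p:.4f}")  # ~0.023
```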
hypothesis testing
a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample. In this method, we test a hypothesis by determining the likelihood that a sample statistic would be selected if the hypothesis regarding the population parameter were true
-Goal: to see if differences are significant
null hypothesis
is a statement about a population parameter, such as the population mean, that is assumed to be true
alternative hypothesis
a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis
level of significance
is a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true (p value)
significance
describes a decision made concerning a value stated in the null hypothesis. When the null hypothesis is rejected, we reach significance. When the null hypothesis is retained, we fail to reach significance.
type 1 error
is the probability of rejecting a null hypothesis that is actually true. Researchers directly control the probability of committing this type of error (using alpha). So, saying there is significance and rejecting the null when the null was actually true (false positive)
alpha level
the largest probability of committing a Type I error that we will allow and still decide to reject the null hypothesis (usually .05)
type 2 error
is the probability of retaining a null hypothesis that is actually false. So, retaining the null and saying your results are not significant when really the alternative hypothesis is true (false negative)
power
the probability that a randomly selected sample will show that the null hypothesis is false when the null hypothesis is indeed false
critical value
a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if the null hypothesis is true
z statistic
an inferential statistic used to determine the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis
**To do a z-test, you need a POPULATION standard deviation, which is very unlikely to be known. That is why t tests are more commonly used.
effect size
a statistical measure of the size of an effect in a population, which allows researchers to describe how far scores shifted in the population, or the percent of variance that can be explained by a given variable (Cohen's d)
Cohen’s d effects: Small= d < 0.2, Medium= 0.2 < d < 0.8, Large= d > 0.8
one tailed
alternative hypothesis is stated as specifically greater or less than; the region of rejection is in either the upper or the lower portion of the sampling distribution; makes it easier to reject the null and has greater power BUT is harder to justify (everybody will be like… that's some whack ass research)
two tailed
alternative hypothesis is stated as not equal, region of rejection is upper AND lower portion of sampling distribution, ALWAYS USE TWO TAILS
t statistic
an inferential statistic used to determine the number of standard deviations in a t distribution that a sample mean deviates from the mean value or mean difference stated in the null hypothesis
assumptions for t test
Normality
Random Sampling
Independence
Equal Variances
-(larger s2 divided by smaller s2, should be less than 2 to show equality)
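That last rule of thumb as a two-line check (hypothetical groups):

```python
import numpy as np

g1 = [12, 14, 11, 15, 13]   # hypothetical groups
g2 = [10, 11,  9, 13, 10]

s1, s2 = np.var(g1, ddof=1), np.var(g2, ddof=1)  # sample variances
ratio = max(s1, s2) / min(s1, s2)  # < 2 suggests roughly equal variances
print(f"variance ratio = {ratio:.2f}")
```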
one independent sample t test
a statistical procedure used to compare a mean value measured in a sample to a known value in the population (or some other arbitrary value). It is specifically used to test hypotheses concerning the mean in a single population with an unknown variance.
two independent sample t test
a statistical procedure used to compare the mean difference between two independent groups. This test is specifically used to test hypotheses concerning the difference between two population means, where the variance in one or both populations is unknown
related samples
participants are related
repeated measures design
research design in which the same participants are observed in each treatment
pre post test design
a type of repeated-measures design in which researchers measure a dependent variable from participants before and after a treatment
within subjects design
type of repeated-measures design in which researchers observe the same participants across many treatments but not necessarily before and after a treatment
matched pairs design and the two ways it can be done
research design in which participants are selected and then matched based on common characteristics or traits
Experimentally: ex. measure intelligence and then match the two participants who scored the highest (put one in the drug group and one in the placebo group)
Naturally: ex. testing twins
advantages for selecting related samples
More practical
Minimizes standard error
-Most important advantage, it reduces standard error by removing individual differences and isolating effect
Increases power
disadvantages for selecting related samples
Requires individuals to be exposed to more than one treatment
-The 1st test may influence the score on the 2nd test (aka order effects)
-People get tired the more tests you give them
one sample z test
a statistical procedure used to test hypotheses concerning the mean in a single population with a known variance
independent sample
a type of sample in which different participants are independently observed one time in each group
one independent sample t test
only one group of participants is given the treatment; the group mean is then compared to the population mean or some arbitrary value
two independent sample t test
participants in each group or sample are unrelated: selected from 2 or more populations, or from a single population and randomly assigned to different groups; the groups are then compared to each other
related or dependent sample
participants are related
repeated measures design
is a research design in which the same participants are observed in each treatment. Two types of repeated-measures designs are the pre-post design and the within-subjects design
pre-post design
types of design in which researchers measure a dependent variable for participants before (pre) and after (post) a treatment
within subject design
is a type of repeated-measures design in which researchers observe the same participants across many treatments but not necessarily before and after a treatment
matched pairs design
a within-subjects research design in which participants are selected and then matched, experimentally or naturally, based on common characteristics or traits
Experimentally: ex. Measure intelligence then match the two participants who scored the highest with each other and so on
Naturally: ex. Match based on genetics (twins)
difference score
score or value obtained by subtracting one score from another. In a related-samples t test, difference scores are obtained prior to computing the test statistic
error
in a t test, this refers to any unexplained difference that cannot be attributed to, or caused by, having different treatments. The standard error of the mean is used to measure error or unexplained differences in a statistical design.
levels of the factor
symbolized as K, are the number of groups or different ways in which an independent or quasi-independent variable is observed
analysis of variance (ANOVA)
statistical procedure used to test hypotheses for one or more factors concerning the variance among two or more group means (k ≥ 2), where the variance in one or more populations is unknown
one way between subjects ANOVA
a statistical procedure used to test hypotheses for one factor with two or more levels concerning the variance among group means. This test is used when different participants are observed at each level of a factor and the variance in any one population is unknown.
source of variation
variation that can be measured in a study. In the one-way between-subjects ANOVA, there are two sources of variation: variation attributed to differences between group means and variation attributed to error.
within groups variation
is the variation attributed to mean differences within each group. This source of variation cannot be attributed to, or caused by, having different groups and is therefore called error variation (this is bad variation)
between groups variation
variation attributed to the mean differences between groups (this is good variation). This is where you will see treatment effects.
post hoc test
statistical procedure computed following a significant ANOVA to determine which pair or pairs of group means significantly differ. This test is only necessary when k >2 because multiple comparisons are needed. When k=2, only one comparison is made because only one pair of group means can be compared.
factorial design
research design in which participants are observed across the combination of levels of two or more factors
two way ANOVA
statistical procedure used to test hypotheses concerning the variance of groups created by combining the levels of two factors. This test is used when the variance in any one population is unknown.
1 between 1 within design or mixed design
research design in which different participants are observed at each level of the between-subjects factor and the same participants are observed across the levels of the within-subjects factor.
cell
combination of one level from each factor. Each cell is a group in a research study
complete factorial design
research design in which each level of one factor is combined or crossed with each level of the other factor, with participants observed in each cell or combination of levels
main effect
source of variation associated with mean differences across the levels of a single factor. In the two-way ANOVA, there are two factors and therefore two main effects: one for Factor A and one for Factor B.
interaction
source of variation associated with the variance of group means across the combination of levels of two factors. It is a measure of how cell means at each level of one factor change across the levels of a second factor.
levels
number of different ways a variable is manipulated or measured
conditions
unique combinations of different levels of different independent variables
simple main effects test
are hypothesis tests used to analyze a significant interaction by comparing the mean differences or simple main effects of one factor at each level of a second factor
**Main effects and interactions occur independently of each other
eyeballing
add the cell means, then divide by how many cells; the marginal means will inform you about main effects and the cell means will inform you about interactions
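The same eyeballing as a sketch on a hypothetical 2x2 table of cell means:

```python
import numpy as np

# Hypothetical cell means: rows = levels of Factor A, columns = Factor B
cells = np.array([[10.0, 20.0],
                  [12.0, 14.0]])

print(cells.mean(axis=1))  # marginal means of A -> main effect of A?
print(cells.mean(axis=0))  # marginal means of B -> main effect of B?

# Interaction: does the B1 - B2 difference change across levels of A?
print(cells[:, 0] - cells[:, 1])  # [-10, -2]: not parallel -> interaction
```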
one way vs two way anova
one way: uses one variable (e.g. average heights of plants grown with different fertilizers)
two way: uses two independent variables, also assesses for interaction between the two independent variables (e.g. worker productivity based on gender and department)