Midterm #1 Flashcards
what are the three most commonly used methodological designs for collecting data
- correlational design
- experimental design
- quasi-experimental design
describe a correlational design
- investigates relationships between variables without the researcher controlling or manipulating any of them
- you can't make claims about cause and effect
describe an experimental design
- used to test the causal effect of one variable on another variable
researchers manipulate which level of an independent variable participants are assigned to
- IV = manipulated variable
- DV = variable that changes as a result of IV manipulation
random assignment
participants randomly assigned to receive one of the manipulations
describe a quasi-experimental design
- a combination of correlational and experimental
- participants are assigned to a group based on their level of an already-existing characteristic. Then, their scores on a dependent variable are measured.
ex. I ask my participants whether they studied or didn't study for the final exam, and then I measure their exam grades.
what is a discrete variable
a variable that has specific values and cannot have values in between these
- ask ‘how many’ –> ex. number of children in a household
what is a continuous variable
a variable that can take on fractional values
- ask ‘how much’ –> ex. yearly income
what are the scales of measurement
- nominal
- interval
- ratio
- ordinal
describe nominal
measuring a variable using categories (ex. type of pet)
describe interval
measuring a variable using numerical values; equal intervals between adjacent values (ex. temp in Fahrenheit)
describe ratio
measuring a variable using numerical values; equal intervals between adjacent values and a zero means the absence of the variable (ex. height in inches)
describe ordinal
measuring a variable using rankings; adjacent values are not equally spaced (ex. ranking states by their populations)
what are the two main functions of statistics
- describing / summarizing data
- making inferences from a smaller set of people to a larger set of people
what are descriptive statistics
techniques for organizing a group of numbers so the data can be more easily comprehended
- descriptive numbers (ex. mean, median, and mode)
- tables and graphs
what are inferential statistics
techniques for drawing conclusions about a very large group from a smaller subset of people
- population: a large group of people that a researcher wants to draw conclusions about from their study
- sample: a smaller subset of cases selected from the larger population
what does each column of a frequency table indicate
first column: name of the variable and all possible values from HIGHEST to LOWEST
second column: the frequency of each value in a data set
third column: the cumulative frequency, which is the number of scores at or below a given value
fourth column: the percentage, which is the frequency transformed into a percentage (% = frequency / total number of people *100)
fifth column: the cumulative percentage, which is the cumulative frequencies transformed into percentages (%c = fc / total number of people *100)
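A minimal sketch of how the five columns could be computed, assuming a small made-up list of quiz scores. The function name `frequency_table` and the data are illustrative only.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (value, f, cumulative f, %, cumulative %), highest value first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    for value in sorted(counts, reverse=True):
        f = counts[value]
        # cumulative frequency: number of scores at or below this value
        fc = sum(c for v, c in counts.items() if v <= value)
        rows.append((value, f, fc, 100 * f / n, 100 * fc / n))
    return rows

quiz_scores = [3, 5, 4, 4, 2, 5, 4, 3, 4, 1]  # made-up data
table = frequency_table(quiz_scores)
for row in table:
    print(row)
```

The top row always has a cumulative percentage of 100, because every score is at or below the highest value.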
what variable are bar graphs used for and why
discrete variables because they offer finite categories (ex. high school, associate’s, bachelor’s, etc.)
what variable are histograms used for and why
continuous variables; the bars touch each other to show that the variable is continuous
describe the shapes of a frequency distribution
modality
skewness
kurtosis
describe modality
refers to how many high points (i.e. peaks) it has. peaks represent values with the highest frequency.
- unimodal = one value at highest frequency
- bimodal = two values at highest frequency
- multimodal = three plus values at highest frequency
describe skewness
the measure of how symmetrical a frequency distribution is
symmetrical distribution
- the pattern of frequencies on one half of the distribution is a mirror image of the pattern of frequencies on the other half (‘normal distribution’)
skewed distribution
- the majority of scores “pile up” on one side
- negatively skewed = fewer scores on the negative side of the distribution
- positively skewed = fewer scores on the positive side of the distribution
describe kurtosis
refers to how peaked or flat a frequency distribution is
mesokurtic = typically seen (‘normal distribution’)
platykurtic = flatter than normal distr
leptokurtic = more peaked than normal distr
what is a measure of central tendency
a single score that represents how participants tended to score on a variable
there are three main measures of central tendency:
- mean
- median
- mode
what symbol indicates mean
μ (mu) or M, depending on whether the data come from a population or a sample
μ - represents the average value of a variable in an entire POPULATION
M - used to denote the sample mean. The sample mean is similar to the population mean but is calculated using a sample of data rather than the entire population
where is the mean pulled for a negatively skewed graph
left to right:
mean, median, mode
where is the mean pulled for a symmetrical (normal) distribution
mean, median, and mode all occur in the center
where is the mean pulled for a positively skewed graph
left to right:
mode, median, mean
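A quick check of the positively skewed ordering, using a made-up data set with one extreme high score. The numbers are an assumption for illustration, not course data.

```python
import statistics

# Made-up positively skewed data: most scores are low, one is very high.
incomes = [20, 20, 20, 25, 30, 35, 40, 90]

mode = statistics.mode(incomes)      # most frequent value
median = statistics.median(incomes)  # middle of the ordered scores
mean = statistics.mean(incomes)      # pulled toward the extreme score

print(mode, median, mean)
```

Left to right the values come out as mode, median, mean, because the single extreme score drags the mean toward the tail while the mode stays where the scores pile up.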
variability
refers to how similar or different the scores in a set of data are from each other
what are the four typical measures of variability
- range
- IQR
- variance
- stdev
how to measure IQR
- arrange scores from smallest to largest
- find the median (this is IQ2)
- find the median of scores below IQ2 (this is IQ1)
- find the median of scores above IQ2 (this is IQ3)
- subtract IQ1 from IQ3 (IQR = IQ3 - IQ1)
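The steps above can be sketched as follows, taking the IQR as IQ3 minus IQ1. This assumes the median itself is left out of both halves when the number of scores is odd; other quartile conventions exist and can give slightly different answers. The function name `iqr` is illustrative.

```python
import statistics

def iqr(scores):
    """Return (IQ1, IQ2, IQ3, IQR) using the median-of-halves method."""
    ordered = sorted(scores)          # arrange scores from smallest to largest
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # scores below the median
    upper = ordered[half + n % 2:]    # scores above the median
    q1 = statistics.median(lower)
    q2 = statistics.median(ordered)
    q3 = statistics.median(upper)
    return q1, q2, q3, q3 - q1

print(iqr([7, 1, 3, 9, 5, 11, 13]))
```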
variance
average squared deviation of scores from the mean
stdev
square root of the variance; roughly the average distance of scores from the mean, in the original (non-squared) units
how to calculate degrees of freedom
sample size - 1 (n - 1)
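A small sketch tying variance, stdev, and degrees of freedom together. It assumes the usual convention that population variance divides by N while sample variance divides by the degrees of freedom (n - 1). The scores are made up.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(scores)
mean = sum(scores) / n

# squared deviation of each score from the mean
squared_devs = [(x - mean) ** 2 for x in scores]

population_variance = sum(squared_devs) / n        # divide by N
sample_variance = sum(squared_devs) / (n - 1)      # divide by df = n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

Python's `statistics.pvariance` and `statistics.variance` follow the same two conventions, so they can be used to check hand calculations.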
sample size notation
sample data: n
population data: N
variance notation
sample data: s²
population data: σ²
stdev notation
sample data: s, SD
population data: σ
what are the percentages for a normal distribution
z-scores
-3 to -2 –> 2%
-2 to -1 –> 14%
-1 to 0 –> 34%
where does 68% of the data land
between -1 and 1
where does 96% of the data land
between -2 and 2
where does 100% of the data land
between -3 and 3 (approximately; a very small fraction of scores falls beyond these points)
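The rounded percentages on these cards can be compared with the areas under a standard normal curve. This is a sketch using `statistics.NormalDist` from the Python standard library; the helper name `area` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area(low_z, high_z):
    """Proportion of scores between two z-scores."""
    return standard_normal.cdf(high_z) - standard_normal.cdf(low_z)

for low_z, high_z in [(-3, -2), (-2, -1), (-1, 0), (-1, 1), (-2, 2), (-3, 3)]:
    print(low_z, high_z, round(100 * area(low_z, high_z), 2))
```

The 2%, 14%, and 34% figures are these areas rounded to whole percentages. Adding the rounded figures gives 96% between -2 and 2, while the unrounded area is closer to 95%.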
what does the sign of the z-score tell you
+ z-score = above mean
- z-score = below mean
what does the value of the z-score tell you
how many stdevs the score is away from the mean
what is the purpose of standardizing scores
allows us to compare how people scored on different variables that could have been originally measured in different raw units
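A short sketch of standardizing, assuming the usual formula z = (score - mean) / stdev. The two tests and their means and stdevs are made up to show a comparison across different raw units.

```python
def z_score(score, mean, sd):
    """How many stdevs a score is above (+) or below (-) the mean."""
    return (score - mean) / sd

# Made-up example: the same person takes two tests scored in different units.
math_z = z_score(80, mean=70, sd=10)
verbal_z = z_score(65, mean=50, sd=5)

print(math_z, verbal_z)
```

Although the raw math score (80) is higher than the raw verbal score (65), the verbal score is further above its own mean once both are expressed in stdev units.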
what is probability
the likelihood that an event will occur ranging from 0 to 1
what is the common zone
the area of the distribution where 95% of scores are found when a phenomenon is normally distributed
what is the rare zone
the area of the distribution where only 5% of scores are found when a phenomenon is normally distributed
p < 0.05
it is uncommon to find someone that lands in the rare zone
- probability of finding someone who lands in the rare zone is less than 5%
- probability of landing in the rare zone is notated as p < 0.05 (p-value less than 0.05)
p > 0.05
it is common to find someone that lands in the common zone
- probability of finding someone who lands in the common zone is greater than 5%
- probability of landing in the common zone is notated as p > 0.05 (p-value greater than 0.05)
representative sample
participants have the same attributes as those that exist in the population and in approximately the same proportions, improving generalizability
what are the methods for obtaining a representative sample
- random sample
- large sample size
what is a sampling distribution
a frequency distribution created by calculating all the possible sample means (or another type of statistic) that could be obtained by randomly sampling from a given population
what are some practical issues with sampling
- self-selection bias (occurs when not everyone who is asked to participate agrees to do so)
- sampling error (discrepancy due to random factors between a sample statistic and the population parameter being estimated)
central limit theorem (CLT)
describes characteristics of a sampling distribution of means when the sample size is large and every possible sample is obtained
for CLT, what does it mean if n is large
if n is greater than or equal to 30, the sampling distribution of means is approximately normal, and its mean is equal to the population mean
for CLT, what does it mean if
standard error of the mean
measures how precisely the sample mean estimates the population mean
how does the shape of the sampling distribution of means change based on the shape of the population?
it doesn’t; when n is large, it will be approximately normally distributed no matter the shape of the population.
what is standard error
the STDEV of a sampling distribution.
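A simulation sketch of the sampling distribution of means and its standard error. It assumes the standard formula for the standard error of the mean, population stdev divided by the square root of n, which is not stated on these cards. A skewed (exponential) population is used, and 5000 random samples stand in for "every possible sample".

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

N_PER_SAMPLE = 30    # the "large n" threshold
NUM_SAMPLES = 5000   # stands in for every possible sample

# Skewed population: exponential with mean 1 and stdev 1.
POP_MEAN = 1.0
POP_SD = 1.0

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(N_PER_SAMPLE)]
    sample_means.append(sum(sample) / N_PER_SAMPLE)

mean_of_means = statistics.mean(sample_means)
observed_se = statistics.pstdev(sample_means)    # stdev of the sample means
expected_se = POP_SD / math.sqrt(N_PER_SAMPLE)   # assumed formula: stdev / sqrt(n)

print(mean_of_means, observed_se, expected_se)
```

The mean of the sample means should land close to the population mean, and the stdev of the sample means should land close to the formula value, even though the population itself is far from normal.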
what is a point estimate
a single value that is used to estimate a given population parameter (ex. a sample mean is a point estimate of a population mean)
what are interval estimates
a range of values around a point estimate within which a population parameter is likely to exist
- commonly, this is the 95% confidence interval
what is the 95% confidence interval
if we were to take 100 different samples and compute a 95% confidence interval for each sample, then approximately 95 of the 100 confidence intervals will contain the true mean value (μ)
+/- 1.96 –> these are the standardized values that cover a range of 95% of means around the sample mean
involves a lower and an upper bound
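A sketch of the lower and upper bounds, assuming the population stdev is known so that the interval is the sample mean plus or minus 1.96 standard errors. The function name `ci95` and the numbers are made up.

```python
import math

def ci95(sample_mean, pop_sd, n):
    """95% confidence interval around a sample mean (population stdev known)."""
    standard_error = pop_sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return sample_mean - margin, sample_mean + margin

# Made-up example: sample mean 100, population stdev 15, n = 36.
lower, upper = ci95(100, 15, 36)
print(lower, upper)
```

Here the standard error is 15 / 6 = 2.5, so the margin is 1.96 * 2.5 = 4.9 on each side of the sample mean.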
what is the difference between a single sample z-test and a single sample t-test
A z-test is used to test a Null Hypothesis if the population variance is known, or if the sample size is larger than 30, for an unknown population variance. A t-test is used when the sample size is less than 30 and the population variance is unknown.
what is hypothesis testing
the procedure for testing a hypothesis about a POPULATION using only SAMPLE data
what is a single sample z-test
used to compare a single sample to a population when a population has a known population mean and a known population stdev
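A sketch of the calculation, assuming the usual test statistic z = (sample mean - population mean) / (population stdev / sqrt(n)) and a two-tailed p-value taken from the standard normal curve. The function name and the numbers are made up.

```python
import math
from statistics import NormalDist

def single_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-tailed p) for a single sample z-test."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # two-tailed: area beyond |z| in both tails of the standard normal curve
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Made-up example: sample mean 105, population mean 100, population stdev 15, n = 36.
z, p = single_sample_z(105, 100, 15, 36)
print(z, p)
```

With these numbers z comes out at 2.0, which is beyond 1.96, so p is a little under 0.05.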
two tailed vs. one tailed hypothesis
two-tailed hypothesis:
hypothesis does not state a particular direction of the effect of the independent variable on the dependent variable
one-tailed hypothesis:
hypothesis states a predicted direction of the effect of the independent variable on the dependent variable
how do you check for assumptions for the single sample z-test
use the assumptions table:
- column one = attribute
- column two = assumption
- column three = robustness (robust to violations. ex. even if you don’t have a truly random sample you could still perform the test but you wouldn’t be able to generalize your results)
null hypothesis
(H0) states the expected result if the IV had no effect on the DV (ex. equals the population mean)
alternative hypothesis
(H1) states the expected result if the IV HAS an effect on the DV (ex. does not equal population mean)
are hypotheses about populations or samples
populations
what are decision errors
type I and type II errors
type I error
the researcher concluding that there is an effect or a relationship between variables when in reality there is not (the null hyp. is true)
type II error
the researcher concluding that there is not an effect or a relationship between variables when in reality there is (the null is false)
what is a decision rule
a rule that specifies when to reject the null hypothesis; setting it means deciding how willing you are to make a type I error
how to set a decision rule
- construct the sampling distribution of means that represents all possible means that could be obtained for a certain sample size (if the null is true)
- set how willing you are to make a type I error (the convention used is the alpha (α) value)
how to interpret the data in relation to the hypothesis test
- if the test statistic falls in the rare zone (p < 0.05), reject the null hypothesis; this means the results are statistically significant (if the null is actually true, there's still a 5% chance of landing here and making a type I error)
- if the test statistic lands in the common zone (p > 0.05), fail to reject the null hypothesis; this means that the results are not statistically significant
what does it mean when α = 0.05
that means that the rare zone on the sampling distribution of means is made up of the 5% most extreme sample means that are possible to obtain when the null hypothesis is true
one tailed = 5% is on one side
two tailed = 2.5% is on one side and 2.5% is on the other side
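A sketch of where the rare zone starts for a z-test at alpha = 0.05, using the inverse of the standard normal curve. The variable names are illustrative.

```python
from statistics import NormalDist

ALPHA = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# one tailed: all 5% sits in one tail
one_tailed_critical = standard_normal.inv_cdf(1 - ALPHA)

# two tailed: 2.5% sits in each tail
two_tailed_critical = standard_normal.inv_cdf(1 - ALPHA / 2)

print(round(one_tailed_critical, 3), round(two_tailed_critical, 2))
```

The two-tailed cut-off is further from the mean than the one-tailed cut-off, because each tail only holds half of the rare zone.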
how do you interpret the direction of the effect
why do we calculate effect size
- statistical significance does not tell you the practical importance of the study’s results
- effect size is a way of measuring the size of the effect of the explanatory variable on the DV
- calculate Cohen’s d
what do Cohen’s d values correspond to
when Cohen’s d = …
0.00, the size of the effect is none
0.20, the size of the effect is small
0.50, the size of the effect is medium
0.80 (or greater), the size of the effect is large
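A sketch of Cohen's d for a single sample z-test, assuming d = (sample mean - population mean) / population stdev. The labelling helper treats each benchmark as the lowest d that earns that label, which is one possible reading of the benchmarks and not something the cards state. Both function names are illustrative.

```python
def cohens_d(sample_mean, pop_mean, pop_sd):
    """Difference between the means expressed in stdev units."""
    return (sample_mean - pop_mean) / pop_sd

def effect_size_label(d):
    """Assumed convention: a label applies once |d| reaches its benchmark."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"

# Made-up example: sample mean 105, population mean 100, population stdev 15.
d = cohens_d(105, 100, 15)
label = effect_size_label(d)
print(round(d, 2), label)
```

Unlike the z-test statistic, d does not depend on the sample size, which is why it says something about the size of the effect that significance alone does not.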
what are the six steps of hypothesis testing
- test (pick the right statistical test)
- assumptions (check the assumptions table to make sure you can run the test)
- hypotheses (list the null and alternative hypotheses)
- decision rule (construct the sampling distribution, specify alpha, and determine the critical values)
- calculation (calculate the test statistic)
- interpretation and effect size (interpret the results and report the effect size)