Exam 1 Flashcards
independent variable
variable you manipulate
dependent variable
variable you measure that is dependent on the independent variable
confounding variable
variable you try to control or randomize away
discrete variable
variables that can only take on specific values (whole numbers) (“How many letters are in your name?”)
continuous variable
variables that can take on a full range of values (decimals) (“How tall are you?”)
measurement scales
nominal, ordinal, interval, ratio. Each adds another attribute to the measurement process
nominal
variables that are categories or names. Numbers may be used to code for participants. (Ex: 1 for females, 2 for males)
ordinal
ranks or order (Ex: birth order, class rank)
interval
variables with equal intervals, no meaningful zero (Ex: shoe size, Fahrenheit temperature)
ratio
adds a requirement of a meaningful zero (Ex: height, weight, # of credits)
experiments
studies in which participants are RANDOMLY ASSIGNED to a condition or level of one or more independent variables
descriptive statistics
organize, summarize, and communicate large amounts of numerical observations. (Ex: 40% of pets sold at a pet store were dogs)
inferential statistics
use sample data to draw conclusions about a larger population. Gathering sample data is associated w/ inferential statistics
random assignment
every participant in a study has an equal chance of being assigned to any of the groups or conditions in a study. Helps experiments achieve equality between groups. The distinctive feature of a true experiment. Used whenever possible
true experiments vs. correlational designs
correlational designs look for associations that naturally occur, NO MANIPULATION. True experiments ACTIVELY MANIPULATE something. A true experiment's results are easier to interpret: it uses random assignment and controls for confounding variables, making causal statements easier
quasi-experiments
use intact (pre-existing) groups, so no random assignment, but still actively manipulate something. Used when random assignment is unethical. (Ex: can’t make a group paralyzed, but can look at people already paralyzed) Possible confounding variables.
operational definition
describes the operations or procedures used to measure or manipulate a variable (Ex: use IQ tests to define intelligence)
histograms
look like bar graphs but plot quantitative data in ranges of numerical values. Ex: x-axis = tree height (in ranges), y-axis = number of trees
bar graphs
graphs categorical data. Order doesn't matter. Ex: favorite fruit. x-axis = types of fruit, y-axis = number of people
frequency table
shows how frequent each value occurred. Values are listed in one column, and the numbers of individuals with scores at that value are listed in the second column. Ex: number of candy bars students had after halloween. One column = # of candy bars, Other column = # of students
grouped frequency table
shows frequencies within an interval. For a lot of responses. Rather than each response having its own row, creates bins
frequency histogram
looks like a bar graph, but the y-axis is frequency. Ex: x-axis = weight of Jessica, y-axis = frequency
frequency polygon
forms the shape of a distribution. Connects the points of a frequency histogram with a line, forming a shape. Ex: x-axis = weight of Jessica, y-axis = frequency
positive skew
tail to the right. May represent floor effects. Mean is on the right side of the peak
negative skew
tail to the left. May represent ceiling effects. (Ex: test scores - can’t get higher than 100%). Mean is on the left side of the peak
stem-and-leaf plots
let us view two groups in a single graph more easily than grouped frequency distributions. The stem is the first digit(s) of each score; the leaves are the last digits, and the number of leaves on a stem shows how many individual scores fall there
the three types of chartjunk
moiré vibrations, grids, ducks
moiré vibrations
unintentional optical art on a graph. Gives the impression of movement
grids
grid is behind the chart. Almost looks like graphing paper
ducks
graphics dominate the data on a graph (graphics > data). Obscures the message. Makes reading and interpreting graphs difficult
mode
central point is the most frequent score in a sample. The only option for nominal data
median
central point is where 50% of the scores are above and 50% are below. Less sensitive to outliers. Line up scores in ascending order. W/ an odd # of scores, there is an actual middle number. W/ an even #, average the two middle scores
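The procedure on this card can be sketched in Python (a minimal illustration; the function name is my own, not from the course):

```python
def median(scores):
    """Median: line up scores in ascending order, then take the middle point."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                    # odd count: an actual middle number exists
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even count: average the two middle scores

print(median([3, 1, 7]))     # odd count -> 3
print(median([3, 1, 7, 5]))  # even count -> 4.0
```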
mean
M for a sample, μ (mu) for a population. Central point is the average of a group of scores. Avoid using when there are outliers
measures of variability
range, interquartile range, variance, standard deviation
variance
the average of the squared deviations from the mean. # describes how far a distribution varies around the mean
interquartile range
measure of the distance between the 1st and 3rd quartiles. 1st quartile is the 25th percentile of the data set, 3rd quartile is the 75th percentile of the data set
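A short sketch of the IQR using the standard library (note: `statistics.quantiles`, Python 3.8+, uses the "exclusive" method by default, so hand computations from a textbook may differ slightly):

```python
import statistics

def interquartile_range(data):
    """IQR = 3rd quartile (75th percentile) minus 1st quartile (25th percentile)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return q3 - q1

print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))  # 6.75 - 2.25 = 4.5
```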
What is an advantage of using the interquartile range instead of the range?
The interquartile range may be better to use than the range if there is an outlier in the data set
common biased samples
testimonials, volunteer samples using one person
central tendency to use for ordinal measurement?
median
central tendency to use if there are extreme scores in data
median
central tendency with multiple values
mode (unimodal, bimodal, multimodal)
central tendency that relates to area under the curve
median
value driven central tendency measurement
mean
central tendency measurement that will always yield a zero when deviations are added
mean
variance
the mean of the squared deviations: ∑(X − M)² / N. Shows how much a set of numbers is spread out from its mean. Population: σ². Sample: SD², s², MS
standard deviations
a measure of how spread out numbers are. The square root of the variance: √(∑(X − M)² / N). Population: σ. Sample: SD
do we prefer variance or standard deviation?
we often prefer standard deviation because we can understand it at a glance. The variance is based on squared deviations, so it is too large to interpret in the original units; the standard deviation is the √ of the variance, returning to the original units
z score
the number of standard deviations a score is from the mean. Gives us the ability to convert any variable to a standard distribution, allowing us to make comparisons among variables. z = (X − μ)/σ
z distribution
z=0 is mean. z=1 means 1 standard deviation above the mean. z=-1 means 1 standard deviation below the mean.
transforming z scores to raw scores
Step 1: multiply the z score by the standard deviation of the population. Step 2: add the mean of the population to this product.
x = zσ + μ
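Both directions of the conversion, as a small Python sketch (the μ = 100, σ = 15 values are illustrative, not from the cards):

```python
def z_to_raw(z, mu, sigma):
    """Step 1: multiply the z score by the population SD; Step 2: add the population mean."""
    return z * sigma + mu

def raw_to_z(x, mu, sigma):
    """The number of standard deviations a raw score falls from the mean."""
    return (x - mu) / sigma

# Illustrative IQ-style scale: mu = 100, sigma = 15
print(z_to_raw(2, 100, 15))    # 2 SDs above the mean -> 130
print(raw_to_z(130, 100, 15))  # 130 on this scale -> z = 2.0
```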
μ
mean of population
σ
standard deviation of a population
σm
standard error
N
number in a sample
M
mean of a sample
pie chart
A graph in the shape of a circle, with a slice for every level of the independent variable. The size of each slice represents the proportion of each level
scatterplot
A graph that depicts the relation between two scale variables. The values of each variable are marked along the two axes, and a mark is made to indicate the intersection of the two scores for each participant. The mark is above the participant’s score on the x-axis and across from the score on the y-axis.
When do you use the mode as the best choice for central tendency?
- When one particular score dominates a distribution
- When the distribution is bimodal or multimodal
- When the data are nominal
For a data set that has been skewed due to outliers, what measure of central tendency is most accurate and should be reported?
median
With measures of central tendency, if the distribution is symmetrical or unimodal then…
you can use either the mean, median, or mode (all = the same)
Parameter
a characteristic of a population
Statistic
a characteristic of a sample
type 1 error
Occurs when a researcher rejects a null hypothesis when it is actually true. Considered MORE SERIOUS!
type 2 error
Occurs when a researcher accepts a null hypothesis that is actually false.
What is an example of a type I error?
Accepts the premise that there is a difference when actually there is no difference between groups. (rejecting a null hypothesis when it is actually true)
What is an example of a type 2 error?
Accepts the premise that there is no difference between the groups when a difference actually exists
How to calculate the variance
- Subtract the mean from every score
- Square every deviation from the mean
- Sum all of the squared deviations
- Divide the sum of squares by the total number in the sample
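The four steps above, sketched in Python (population formula with N in the denominator, matching these cards; many texts divide by N − 1 for a sample):

```python
import math

def variance(scores):
    m = sum(scores) / len(scores)            # find the mean
    deviations = [x - m for x in scores]     # 1) subtract the mean from every score
    squared = [d ** 2 for d in deviations]   # 2) square every deviation
    ss = sum(squared)                        # 3) sum all of the squared deviations
    return ss / len(scores)                  # 4) divide by the total number (N)

def standard_deviation(scores):
    """SD is the square root of the variance."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5, sum of squares = 32
print(variance(scores))            # 32 / 8 = 4.0
print(standard_deviation(scores))  # sqrt(4.0) = 2.0
```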
How to calculate a z-score
For a raw score: the score (X) minus the population mean (μ), divided by the standard deviation (σ): z = (X − μ)/σ. For a sample mean: z = (M − μM)/σM, using the standard error
control group
group in an experiment or study that does not receive treatment by the researchers and is then used as a benchmark to measure how the other tested subjects do.
illusory correlation
is the phenomenon of perceiving a relationship between variables (typically people, events, or behaviors) even when no such relationship exists
central limit theorem
as sample size increases, the distribution of means assumes a normal shape. A distribution made up of the means of many samples approximates a normal curve, even if the underlying population is not normally distributed
distribution of means
a distribution composed of many means that are calculated from all possible samples of a given size, all taken from the same population
% of scores that fall between the mean and a z-score of 1?
34% (same with the mean and a z-score of -1)
% of scores that fall between a z-score of 1 and 2?
14% (same with between z=-1 and z=-2)
% of scores that fall between z-score of 2 and 3?
2% (same with between z=-2 and z=-3)
% of scores that fall within 1 z-score?
68%
% of scores that fall within 2 z-scores?
96%
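The rounded percentages on these cards can be checked against Python's built-in normal distribution (`statistics.NormalDist`, Python 3.8+). Note the exact within-2-z figure is about 95%; the 96% on the card comes from adding the rounded 34 + 34 + 14 + 14 pieces:

```python
from statistics import NormalDist

z_dist = NormalDist()  # standard normal: mean 0, SD 1 (the z distribution)

def pct_between(a, b):
    """% of scores falling between z-scores a and b, rounded to a whole percent."""
    return round((z_dist.cdf(b) - z_dist.cdf(a)) * 100)

print(pct_between(0, 1))   # 34
print(pct_between(1, 2))   # 14
print(pct_between(2, 3))   # 2
print(pct_between(-1, 1))  # 68
print(pct_between(-2, 2))  # 95 exact, vs. 96 from the rounded pieces
```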
standard error
the standard deviation of a distribution of means. σm
how to calculate standard error
σm = σ/√N
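The formula as a one-line Python sketch, with a worked example (the σ = 15, N = 9 values are illustrative):

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the distribution of means: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

print(standard_error(15, 9))  # 15 / 3 = 5.0
```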
critical value
a test-statistic value beyond which the null hypothesis can be rejected (cutoff). With p = .05, the most extreme 5% of the curve (2.5% in each tail), z = ±1.96
critical region
the area in the tails in which the null hypothesis can be rejected
p level
the probability used to determine the critical values (cutoffs). Often called alpha
μM
the mean of a distribution of means (the means of all possible samples of a given size from a particular population). μM = μ
standardization
a way to convert individual scores from different normal distributions to a shared normal distribution with a known mean, SD, and percentiles
robust hypothesis tests
produces fairly accurate results even when the data suggest that the population might not meet some of the assumptions
assumptions
the characteristics we ideally require of the population from which we are sampling:
1: scale data
2: random sampling
3: normally distributed
parametric tests
inferential statistical test based on assumptions about a population
nonparametric tests
inferential statistical test not based on assumptions about that population
6 steps of hypothesis testing
1) identify populations, distribution, assumptions, and tests;
2) state the hypotheses;
3) formulate the characteristics of the comparison distribution;
4) identify critical values or cutoffs;
5) calculate the test statistic;
6) make a decision: reject or fail to reject the null hypothesis
find percentile rank
find the z-score on the z table; for a positive z, percentile = 50% + the % between the mean and z; for a negative z, use the % in the tail
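Instead of the paper z table, Python's `NormalDist` produces the same percentile rank (a sketch; the function name is mine):

```python
from statistics import NormalDist

def percentile_rank(z):
    """% of scores falling below a given z-score (area to the left of z)."""
    return NormalDist().cdf(z) * 100

print(round(percentile_rank(1.0)))   # 84th percentile
print(round(percentile_rank(-1.0)))  # 16th percentile (the % in the tail)
```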
When p < 0.05, the findings could be:
statistically significant
One of the best ways to figure out if “dirty data” are displaying the genuine pattern of results or skewing results due to missing or misleading data or outliers is to:
try to replicate your findings
floor effect
prevents a variable from taking values lower than a certain number. (positively skewed)
ceiling effect
prevents a variable from taking values above a certain number (negatively skewed)
effect associated with positive skew
floor effects (prevents a variable from taking values lower than a certain number)
effect associated with negative skew
ceiling effects (prevents a variable from taking values above a certain number)
skewed distribution
one of the tails is pulled away from the center
grouped frequency table vs. stem-and-leaf
stem-and-leaf plots let us view two groups in a single graph easier than grouped frequency distributions
random sample
every member of the population has an equal chance of being selected into the study
expected relative-frequency probability
the likelihood of an event occurring based on the actual outcome of many, many trials
false face validity lie
method seems to measure what it claims to, but does not actually (ex: basing class enjoyment on smiles)
biased scale lie
scaling to skew the results (ex: rank 0 if hate, 2 if uncertain, 3 if like, 4 if love, 5 if obsessed. Few options for disliking)
interpolation lie
assumes there’s a linear relationship between 2 data points. Makes assumptions about the variables IN BETWEEN
extrapolation lie
assumes knowledge outside of the study, and that all data follow the trend. OUTSIDE the data
inaccurate values lie
uses scaling to distort portions of the data
outright lie
making up data
sneaky sample lie
when participants are preselected or self-selected to provide data (ex: only like A students if they like the class)
random sampling
every member of a population has an equal chance of being selected for the study
why is random sampling often not used?
expensive and impractical. Almost impossible to get access to every member of a population