Module 1 Flashcards
A psychological theory says that individual differences in one variable BLANK can be predicted from or causally explained by another variable
- Dependent variable
- Independent variable
Other names for independent variable
Predictor, covariate, explanatory variable, exogenous variable, X
(these terms are not perfectly synonymous, depending on context, but they are essentially
interchangeable with respect to how they are included in statistical models)
Experimentally manipulated/naturally occurring ie. country people come from
Other names for dependent variable:
Outcome, criterion, response variable, endogenous variable, Y
All statistical models are fundamentally ___________
descriptive,
in that they describe the nature of a
dependent variable as a function of one more independent variables or covariates.
Models commonly used for two things
Description
Causal Explanation
Prediction
models are also commonly used for
causal explanation:
The model represents the process(es) by which differences in independent variables influence differences in a dependent variable.
prediction
meaning that observed data is used to develop
a model for how independent variables are related to dependent variables, and then that model is used to predict dependent variable scores in future data.
the primary purpose of machine learning is _____
prediction
For example, a social media company
might use data about a person to predict whether that person is likely to click on an ad for a
product.
This prediction is based on a statistical model developed using data from people who have
already clicked on the ad.
How psychologists misapply the word predict
Yet, psychologists often use language about “prediction” when presenting statistical models that are mainly meant to describe or explain the association between an independent and a dependent
variable.
For example, a researcher might report that a personality trait “predicts” whether adults suffer
from sleep disturbances.
But this “prediction” is likely meant to explain why certain people are pre-disposed to experience sleep disturbances,
and the statistical model is not necessarily going to be applied to future data to determine the
chance that a given person has a sleep disturbance.
But true statistical prediction is not concerned with “why”.
The population of interest
population:
Definition 1: The set of all entities (e.g., people, animals, cities, etc) for which a theory is
intended to apply.
Definition 2: The set of all entities to which a research study generalizes.
Definition 3: The natural (psychological) process that created the observed data.
- The sampling scheme
sample definition
finite subset of entities (or observations) drawn from a particular population.
GRE predictor confusion
GRE is supposed to predict whether a student will successfully complete grad school - GRE scores predictor - success dependent
- Not about causal mechanism
Good GRE score isn’t going to cause you to have a good PhD
- Define operational variables: How are we actually going to observe or measure the conceptual
variables?
Operational variable = conceptual variable + measurement error
Often, independent variables are assumed to be measured without error.
This assumption holds in experimental studies, where participants are assigned to a particular
treatment or control group. Group membership, the independent variable, is known for all
participants (regardless of whether random assignment was used).
But in a lot of psychological research, both independent and dependent variables are
characterized by measurement error. If ignored, measurement error introduces statistical bias in
model estimates.
What are the 3 major features of study design:
population
sample
define operational variables
Continuous variables
have a scale with an infinite number of possible values.
Discrete variables
are categorical; they have a scale with a finite number of possible values.
in psychology - measure many continuous variables on a likert scale which is categorical
Nominal variables
have a scale whose values have arbitrary numerical meaning.
It only makes sense to say whether two observations are equal, but we cannot say that one nominal value is “greater than” or “less than” another.
For example, membership in a treatment or control group might be numerically coded so that 0 = control and 1 = treated, but the specific numerical values chosen are arbitrary.
Ordinal variables
have a scale such that lower values are meaningfully defined to be less than
higher values, but we don’t necessarily know by how much a lower value is less than a higher
value.
a Likert-type item response
frequency distribution
is a representation (either tabular or graphic) of the observed values of a
variable along with the frequency, or number of observations, occurring with each value.
Relative frequency
is the proportion (or percent = proportion × 100) of observations at a given value of a variable.
histogram
is a graph of the frequencies observed at each of several intervals (or bins) along the continuous scale of the variable
Histogram provides frequency within each bin
Distributions of continuous variables are characterized by their
centre, spread, and shape
outlier
is an unusual observation that falls well outside of the range of most of the other observations in the distribution
Outliers can occur because of…
sampling error (the outlying observation comes from a different
population than the other observations),
researcher error (e.g., a data entry mistake was made),
participant error (e.g., the participant did not follow the researcher’s instructions),
or just random chance.