Module 1 Flashcards by Emma R

A psychological theory says that individual differences in one variable (BLANK) can be predicted from or causally explained by another variable (BLANK)

Dependent variable (the one being predicted or explained)
Independent variable (the one doing the predicting or explaining)

Other names for independent variable

Predictor, covariate, explanatory variable, exogenous variable, X

(these terms are not perfectly synonymous, depending on context, but they are essentially
interchangeable with respect to how they are included in statistical models)

Can be experimentally manipulated or naturally occurring (e.g., the country people come from)

Other names for dependent variable:

Outcome, criterion, response variable, endogenous variable, Y

All statistical models are fundamentally ___________

descriptive,

in that they describe the nature of a
dependent variable as a function of one or more independent variables or covariates.

Models are commonly used for three things

Description

Causal Explanation

Prediction

models are also commonly used for

causal explanation:
The model represents the process(es) by which differences in independent variables influence differences in a dependent variable.

prediction

meaning that observed data is used to develop
a model for how independent variables are related to dependent variables, and then that model is used to predict dependent variable scores in future data.

the primary purpose of machine learning is _____

prediction

For example, a social media company
might use data about a person to predict whether that person is likely to click on an ad for a
product.
This prediction is based on a statistical model developed using data from people who have
already clicked on the ad.

How psychologists misapply the word predict

Yet, psychologists often use language about “prediction” when presenting statistical models that are mainly meant to describe or explain the association between an independent and a dependent
variable.

For example, a researcher might report that a personality trait “predicts” whether adults suffer
from sleep disturbances.
But this “prediction” is likely meant to explain why certain people are predisposed to experience sleep disturbances,
and the statistical model is not necessarily going to be applied to future data to determine the
chance that a given person has a sleep disturbance.

But true statistical prediction is not concerned with “why”.

The population of interest

population:
Definition 1: The set of all entities (e.g., people, animals, cities, etc.) to which a theory is
intended to apply.

Definition 2: The set of all entities to which a research study generalizes.

Definition 3: The natural (psychological) process that created the observed data.

The sampling scheme

sample definition

finite subset of entities (or observations) drawn from a particular population.

GRE predictor confusion

The GRE is supposed to predict whether a student will successfully complete grad school: GRE score is the predictor and success is the dependent variable.
- This is not about a causal mechanism.
A good GRE score isn’t going to cause you to have a good PhD.

Define operational variables: How are we actually going to observe or measure the conceptual
variables?

Operational variable = conceptual variable + measurement error

Often, independent variables are assumed to be measured without error.
This assumption holds in experimental studies, where participants are assigned to a particular
treatment or control group. Group membership, the independent variable, is known for all
participants (regardless of whether random assignment was used).
But in a lot of psychological research, both independent and dependent variables are
characterized by measurement error. If ignored, measurement error introduces statistical bias in
model estimates.
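
The bias from ignored measurement error can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the cards: simulated normal data, a hypothetical true slope of 2.0, measurement error with the same variance as the conceptual variable, and a simple least-squares slope as the model estimate.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000
true_slope = 2.0  # hypothetical value chosen for illustration

# Conceptual (error-free) independent variable and the dependent variable.
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * x + rng.gauss(0, 1) for x in x_true]

# Operational variable = conceptual variable + measurement error.
x_observed = [x + rng.gauss(0, 1) for x in x_true]

slope_without_error = ols_slope(x_true, y)
slope_with_error = ols_slope(x_observed, y)
print(round(slope_without_error, 2), round(slope_with_error, 2))
```

Under these assumptions the slope estimated from the error-laden variable comes out at roughly half the true slope, which is one concrete form the bias can take.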

What are the 3 major features of study design:

population

sample

define operational variables

Continuous variables

have a scale with an infinite number of possible values.

Discrete variables

are categorical; they have a scale with a finite number of possible values.

In psychology, many continuous variables are measured on a Likert scale, which is categorical.

Nominal variables

have a scale whose values have arbitrary numerical meaning.

It only makes sense to say whether two observations are equal, but we cannot say that one nominal value is “greater than” or “less than” another.

For example, membership in a treatment or control group might be numerically coded so that 0 = control and 1 = treated, but the specific numerical values chosen are arbitrary.

Ordinal variables

have a scale such that lower values are meaningfully defined to be less than
higher values, but we don’t necessarily know by how much a lower value is less than a higher
value.

Example: a Likert-type item response

frequency distribution

is a representation (either tabular or graphic) of the observed values of a
variable along with the frequency, or number of observations, occurring with each value.

Relative frequency

is the proportion (or percent = proportion × 100) of observations at a given value of a variable.
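
A frequency distribution and its relative frequencies can be tabulated in a few lines. The responses below are made-up data for a hypothetical 5-point item; only the definitions come from the cards.

```python
from collections import Counter

# Hypothetical responses to a 5-point item (made-up data for illustration).
responses = [3, 4, 4, 5, 2, 3, 4, 1, 4, 3]

# Frequency: number of observations at each observed value.
frequency = Counter(responses)

# Relative frequency: proportion of observations at each value.
n = len(responses)
relative_frequency = {value: count / n for value, count in frequency.items()}

# Tabular frequency distribution: value, frequency, proportion, percent.
for value in sorted(frequency):
    proportion = relative_frequency[value]
    print(value, frequency[value], proportion, f"{proportion * 100:.0f}%")
```

The proportions sum to 1 (100%) because every observation falls at exactly one value.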

histogram

is a graph of the frequencies observed at each of several intervals (or bins) along the continuous scale of the variable

Histogram provides frequency within each bin
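
The binning behind a histogram can be sketched as follows. The helper `histogram_counts`, the exam scores, and the choice of equal-width bins of 10 points are all assumptions made for illustration; the cards do not specify how bins are chosen.

```python
def histogram_counts(values, bin_start, bin_width, n_bins):
    """Count the observations falling in each bin [start, start + width)."""
    counts = [0] * n_bins
    for v in values:
        index = (v - bin_start) // bin_width
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical exam scores (made-up data for illustration).
exam_scores = [52, 58, 61, 64, 67, 70, 73, 75, 88, 95]
counts = histogram_counts(exam_scores, bin_start=50, bin_width=10, n_bins=5)

# Crude text histogram: one '#' per observation in the bin.
for i, count in enumerate(counts):
    lower = 50 + 10 * i
    print(f"{lower}-{lower + 9}: {'#' * count}")
```

Changing `bin_width` changes how the same data look, which is why the bin choice is worth reporting alongside the graph.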

Distributions of continuous variables are characterized by their

centre, spread, and shape

outlier

is an unusual observation that falls well outside of the range of most of the other observations in the distribution

Outliers can occur because of…

sampling error (the outlying observation comes from a different
population than the other observations),

researcher error (e.g., a data entry mistake was made),

participant error (e.g., the participant did not follow the researcher’s instructions),

or just random chance.

Exclude outliers caused by which types of error?

researcher or participant error

What is the spread?

The extent of variability, or individual differences, in the variable. E.g., scores are clustered from BLANK to BLANK, but a notable number of people have lower scores.

Unimodal

one general peak

sensitivity analysis

Run the analysis both with and without the outlier, and report both sets of results.
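
A sensitivity analysis might look like the sketch below. The scores are made-up, and treating the value 60 as the suspected outlier is an assumption for illustration; a real analysis would rerun whatever model is being reported, not just the mean and median.

```python
from statistics import mean, median

# Hypothetical scores; the last value is a suspected outlier (made-up data).
scores = [12, 14, 15, 15, 16, 17, 18, 60]
without_outlier = [s for s in scores if s != 60]

# Run the same summary on both versions of the data and report both.
report = {
    "with outlier": (mean(scores), median(scores)),
    "without outlier": (mean(without_outlier), median(without_outlier)),
}

for label, (m, med) in report.items():
    print(f"{label}: mean = {m:.2f}, median = {med}")
```

In this made-up data the mean moves by several points when the outlier is dropped, while the median barely changes.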

descriptive statistics

describe the centre, shape, and spread of a distribution using numerical information

parameter

numerical characteristic of a population

statistic

value calculated from the sample data that estimates a parameter

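Which central tendency measure is higher than the other when the distribution is asymmetric?

The mean gets pulled in the direction of the skewness, so the mean is higher than the median under positive skew and lower than the median under negative skew.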

3 measures of spread or variability

1. variance 2. standard deviation 3. interquartile range

the mean is more affected by blank than blank

the mean is more affected by outliers than the median

standard deviation

represents the average amount that a score differs from the mean of a distribution

Calculate sample SD

1. Deviations from the mean: subtract the mean from each observed score
2. Square the deviations
3. Take the mean of the squared deviations, dividing by N - 1 (this is the sample variance)
4. Take the SQUARE ROOT OF THE VARIANCE

The sample SD is an estimate of the population SD.
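
The steps above can be sketched in code. The function name `sample_sd` and the scores are hypothetical; the result is compared with Python's built-in `statistics.stdev`, which also divides by N - 1.

```python
import math
import statistics


def sample_sd(scores):
    """Sample standard deviation, following the steps on the card."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # 1. deviations from the mean
    squared = [d ** 2 for d in deviations]    # 2. square the deviations
    variance = sum(squared) / (n - 1)         # 3. sample variance (divide by N - 1)
    return math.sqrt(variance)                # 4. square root of the variance


# Hypothetical scores (made-up data for illustration).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(scores))
print(statistics.stdev(scores))  # built-in equivalent for comparison
```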

Sample variance

Mean of the squared deviations (dividing by N - 1); an estimate of the population variance

Why do we divide by N - 1 and not N when calculating the sample SD?

Dividing by N leads to a biased estimate of the population standard deviation; dividing by N - 1 corrects this bias. When we calculate the sample mean we 'use up' one piece of information, so N - 1 is the degrees of freedom associated with a univariate standard deviation.
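
The bias can be checked exactly on a toy example. The sketch below assumes a hypothetical population of four scores and enumerates every equally likely sample of size 2 drawn with replacement. It works with the variance (the squared SD), since that is where the N - 1 correction is exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical tiny population


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / divisor


# Population variance: divide by the population size.
population_variance = variance(population, len(population))

# Every equally likely sample of size 2 drawn with replacement.
n = 2
samples = list(product(population, repeat=n))

# Average of each estimator across all possible samples.
average_dividing_by_n = sum(variance(s, n) for s in samples) / len(samples)
average_dividing_by_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(population_variance, average_dividing_by_n, average_dividing_by_n_minus_1)
```

Across all possible samples, dividing by N - 1 averages out to the population variance, while dividing by N averages out to something smaller.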

Interquartile range

IQR is defined as Q3-Q1

Range of a distribution

difference between the maximum and minimum

Boxplot: top of box, bottom of box, hard line, whiskers

Top of box: Q3
Bottom of box: Q1
Hard line: median (Q2)
Whiskers: max and min
Outliers show up as dots
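
The numbers a boxplot displays can be computed directly. This is a sketch with made-up scores; it assumes the "inclusive" quartile method in Python's `statistics.quantiles`, and other software may use a different quartile convention and give slightly different values.

```python
import statistics

# Hypothetical scores (made-up data for illustration).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles: Q1 (bottom of box), Q2 (median line), Q3 (top of box).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1                            # interquartile range
data_range = max(scores) - min(scores)   # range of the distribution

print(min(scores), q1, q2, q3, max(scores))
print("IQR:", iqr, "Range:", data_range)
```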

A boxplot shows negative skew if

the distance from the median to Q1 is greater than the distance from the median to Q3

probability density functions

describe how likely the values of a variable are. For a continuous variable, probabilities correspond to the area under the curve over a range of values. Used to get the hypothetical probability distribution.

Normal distribution

The normal distribution is a population distribution. Do NOT describe a sample as normal; it makes sense to say that the sample was DRAWN from a normal population distribution. The normal distribution is a function of the population mean and SD.
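
To see how the normal distribution depends only on the population mean and SD, here is a sketch of its density function. The function name `normal_density` and the example values (mean 100, SD 15) are assumptions for illustration; the result is checked against `statistics.NormalDist` from Python's standard library.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Normal probability density at x for population mean mu and SD sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical population: mean 100, SD 15.
mu, sigma = 100, 15
for x in (70, 85, 100, 115, 130):
    print(x, normal_density(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The density is highest at the mean and is the same at equal distances above and below it.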

the normal distribution is a BLANK population distribution

HYPOTHETICAL population distribution. It doesn't make sense to refer to a sample as normal; describe the sample as consistent with a normal distribution.

Mean is known as the blank blank of a population distribution

first moment

what is the first moment of a population distribution

mean

the variance is known as the blank blank blank of a population distribution

second central moment

what is the second central moment of a population distribution

variance

The mean and variance are both a type of BLANK

average: the variance is the average of the squared deviations from the mean

why is the variance called a central moment

Because it is calculated from deviations from the mean (it is centred on the mean)

what is the third central moment of a population distribution

skewness

third central moment

skewness

extent to which the distribution is asymmetric

skewness formula

the numerator is the sum of cubed deviations from the mean

What is the fourth central moment

kurtosis

extent to which the distribution shape is flat (negative kurtosis) or has a steep peak with thick tails (positive kurtosis)

kurtosis

fourth moment of the population distribution

kurtosis formula

raised to the 4th power
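
The cards give only the key ingredient of each formula (cubed deviations for skewness, 4th-power deviations for kurtosis). The sketch below fills in the rest with one common convention, which is an assumption here: each moment is averaged over N and standardized by a power of the variance, and 3 is subtracted from kurtosis so that a normal distribution scores 0. The course formula may use a different sample adjustment.

```python
def central_moment(values, k):
    """Average of the k-th power of deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Standardized third central moment (cubed deviations in the numerator)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Standardized fourth central moment minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical data sets (made-up for illustration).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The symmetric data have zero skewness and negative kurtosis (a flat shape), while the data with one large value have positive skewness.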

Is it worse to have non-zero kurtosis or non-zero skewness?

Non-zero kurtosis is more problematic than skewness. Note that distributions with strong skewness also have non-zero kurtosis.

Module 1 Flashcards

(60 cards)