Module 1 Flashcards

1
Q

A psychological theory says that individual differences in one variable (BLANK) can be predicted from or causally explained by another variable (BLANK)

A
  • Dependent variable
  • Independent variable
2
Q

Other names for independent variable

A

Predictor, covariate, explanatory variable, exogenous variable, X

(these terms are not perfectly synonymous, depending on context, but they are essentially
interchangeable with respect to how they are included in statistical models)

Can be experimentally manipulated or naturally occurring (e.g., the country people come from)

3
Q

Other names for dependent variable:

A

Outcome, criterion, response variable, endogenous variable, Y

4
Q

All statistical models are fundamentally ___________

A

descriptive,

in that they describe the nature of a dependent variable as a function of one or more independent variables or covariates.

5
Q

Models are commonly used for three things

A

Description

Causal Explanation

Prediction

6
Q

models are also commonly used for

A

causal explanation:
The model represents the process(es) by which differences in independent variables influence differences in a dependent variable.

7
Q

prediction

A

Observed data are used to develop a model of how independent variables are related to a dependent variable, and then that model is used to predict dependent variable scores in future data.
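
A minimal sketch of this two-step idea (develop a model from observed data, then apply it to future cases), assuming a simple linear model with a single independent variable. The data values and the helper names `fit_line` and `predict` are hypothetical and chosen only for illustration:

```python
# Hypothetical observed data: x = independent variable, y = dependent variable.
observed_x = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_y = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(x, y):
    """Return the least-squares (intercept, slope) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, new_x):
    """Apply the fitted model to new (future) values of x."""
    return [intercept + slope * xi for xi in new_x]


# Step 1: develop the model from the observed data.
intercept, slope = fit_line(observed_x, observed_y)

# Step 2: use the model to predict scores for future cases.
future_x = [6.0, 10.0]
predicted_y = predict(intercept, slope, future_x)
print(intercept, slope, predicted_y)
```

Nothing in step 2 asks why x and y are related; the fitted line is simply applied to new values of x.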

8
Q

the primary purpose of machine learning is _____

A

prediction

For example, a social media company
might use data about a person to predict whether that person is likely to click on an ad for a
product.
This prediction is based on a statistical model developed using data from people who have
already clicked on the ad.

9
Q

How psychologists misapply the word predict

A

Psychologists often use the language of “prediction” when presenting statistical models that are mainly meant to describe or explain the association between an independent and a dependent variable.

For example, a researcher might report that a personality trait “predicts” whether adults suffer from sleep disturbances.
But this “prediction” is likely meant to explain why certain people are predisposed to experience sleep disturbances; the statistical model is not necessarily going to be applied to future data to determine the chance that a given person has a sleep disturbance.

True statistical prediction is not concerned with “why”.

10
Q

The population of interest

A

population:
Definition 1: The set of all entities (e.g., people, animals, cities, etc.) to which a theory is intended to apply.

Definition 2: The set of all entities to which a research study generalizes.

Definition 3: The natural (psychological) process that created the observed data.

11
Q
The sampling scheme: definition of a sample

A

A finite subset of entities (or observations) drawn from a particular population.

12
Q

GRE predictor confusion

A

The GRE is supposed to predict whether a student will successfully complete grad school: GRE score is the predictor and success is the dependent variable.
This is not about a causal mechanism: a good GRE score isn’t going to cause you to have a good PhD.

13
Q
Define operational variables: How are we actually going to observe or measure the conceptual variables?
A

Operational variable = conceptual variable + measurement error

Often, independent variables are assumed to be measured without error.
This assumption holds in experimental studies, where participants are assigned to a particular
treatment or control group. Group membership, the independent variable, is known for all
participants (regardless of whether random assignment was used).
But in a lot of psychological research, both independent and dependent variables are
characterized by measurement error. If ignored, measurement error introduces statistical bias in
model estimates.
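
A small simulation can make the bias concrete. This is a sketch under stated assumptions, not a general result: one independent variable, a hypothetical true slope of 0.5, and normally distributed measurement error with the same variance as the conceptual variable. In this simple setup the bias appears as a slope that is too small.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_SLOPE = 0.5  # hypothetical effect of the conceptual variable on y
N = 5000


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Conceptual (error-free) independent variable.
conceptual_x = [random.gauss(0, 1) for _ in range(N)]
# Dependent variable generated from the conceptual variable.
y = [TRUE_SLOPE * x + random.gauss(0, 1) for x in conceptual_x]
# Operational variable = conceptual variable + measurement error.
operational_x = [x + random.gauss(0, 1) for x in conceptual_x]

slope_conceptual = slope(conceptual_x, y)
slope_operational = slope(operational_x, y)
print(round(slope_conceptual, 2), round(slope_operational, 2))
```

With these settings the first slope should land near 0.5 and the second near 0.25, because half of the variance in the operational variable is measurement error.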

14
Q

What are the 3 major features of study design?

A

population

sample

define operational variables

15
Q

Continuous variables

A

have a scale with an infinite number of possible values.

16
Q

Discrete variables

A

are categorical; they have a scale with a finite number of possible values.

In psychology, many continuous variables are measured on a Likert scale, which is categorical.

17
Q

Nominal variables

A

have a scale whose values have arbitrary numerical meaning.

It only makes sense to say whether two observations are equal, but we cannot say that one nominal value is “greater than” or “less than” another.

For example, membership in a treatment or control group might be numerically coded so that 0 = control and 1 = treated, but the specific numerical values chosen are arbitrary.

18
Q

Ordinal variables

A

have a scale such that lower values are meaningfully defined to be less than
higher values, but we don’t necessarily know by how much a lower value is less than a higher
value.

Example: a Likert-type item response

19
Q

frequency distribution

A

is a representation (either tabular or graphic) of the observed values of a
variable along with the frequency, or number of observations, occurring with each value.

20
Q

Relative frequency

A

is the proportion (or percent = proportion × 100) of observations at a given value of a variable.
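
As a quick illustration of frequency and relative frequency, here is a minimal sketch that tabulates both for a hypothetical set of responses to a 5-point item (the data are made up):

```python
from collections import Counter

# Hypothetical responses to a 5-point item.
responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

# Frequency: number of observations at each observed value.
frequency = Counter(responses)

# Relative frequency: proportion of observations at each value.
n = len(responses)
relative_frequency = {value: count / n for value, count in frequency.items()}

# Tabular frequency distribution: value, frequency, proportion, percent.
for value in sorted(frequency):
    count = frequency[value]
    proportion = relative_frequency[value]
    print(value, count, proportion, f"{proportion * 100:.0f}%")
```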

21
Q

histogram

A

is a graph of the frequencies observed at each of several intervals (or bins) along the continuous scale of the variable

The histogram gives the frequency within each bin.
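
The counting behind a histogram can be sketched as follows, assuming equal-width bins. The helper name `histogram_counts`, the data, and the choice to include the top edge in the last bin are all illustrative assumptions:

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each of n_bins equal-width bins.

    Bins are [start, start + width), [start + width, start + 2 * width), ...
    A value exactly on the top edge is counted in the last bin.
    """
    counts = [0] * n_bins
    top_edge = start + width * n_bins
    for v in values:
        idx = int((v - start) // width)
        if idx == n_bins and v == top_edge:
            idx = n_bins - 1  # include the top edge in the last bin
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


# Hypothetical scores, binned into 0 to 5 and 5 to 10.
scores = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
counts = histogram_counts(scores, start=0, width=5, n_bins=2)
print(counts)
```

Changing the bin width changes the picture, which is why a histogram reports frequency per bin rather than per value.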

22
Q

Distributions of continuous variables are characterized by their

A

centre, spread, and shape

23
Q

outlier

A

is an unusual observation that falls well outside of the range of most of the other observations in the distribution

24
Q

Outliers can occur because of…

A

sampling error (the outlying observation comes from a different
population than the other observations),

researcher error (e.g., a data entry mistake was made),

participant error (e.g., the participant did not follow the researcher’s instructions),

or just random chance.

25
Q

Which types of error justify excluding an outlier?

A

researcher or participant error

26
Q

what is the spread

A

the extent of variability or individual differences in the variable

E.g., scores are clustered from BLANK to BLANK, but a notable number of people have lower scores

26
Q

Unimodal

A

one general peak

27
Q

sensitivity analysis

A

Run the analysis both with and without the outlier, and report both sets of results
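
A minimal sketch of a sensitivity analysis, assuming the "analysis" is just a mean and a median and that one value has already been flagged as the outlier. The data and the helper name `summarize` are hypothetical:

```python
import statistics

# Hypothetical scores; 40 is the suspected outlier.
scores = [4, 5, 5, 6, 7, 40]
outlier = 40


def summarize(values):
    """Return the mean and median of a list of scores."""
    return {"mean": statistics.mean(values), "median": statistics.median(values)}


# Run the same analysis with and without the outlier.
with_outlier = summarize(scores)
without_outlier = summarize([s for s in scores if s != outlier])

# Report both sets of results side by side.
print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In this made-up example the mean moves much more than the median when the outlier is dropped.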

28
Q

descriptive statistics

A

describe the centre, shape, and spread of a distribution using numerical information

29
Q

parameter

A

numerical characteristic of a population

30
Q

statistic

A

value calculated from the sample data that estimates a parameter

31
Q

When a distribution is asymmetric, which measure of central tendency is higher than the other?

A

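The mean gets pulled in the direction of the skewness: it is higher than the median under positive skew and lower than the median under negative skew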

32
Q

3 measures of spread or variability

A
  1. variance
  2. standard deviation
  3. interquartile range
33
Q

the mean is more affected by blank than blank

A

the mean is more affected by outliers than the median

34
Q

standard deviation

A

represents the average amount that a score differs from the mean of a distribution

35
Q

Calculate sample SD

A
  1. Deviations from the mean: subtract the mean from each observed score
  2. Square the deviations
  3. Take the mean of the squared deviations, dividing by N - 1 (this is the sample variance)
  4. Take the square root of the variance

The sample SD is an estimate of the population SD
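
The calculation can be sketched step by step as follows, using a hypothetical sample and dividing by N - 1 as described elsewhere in this deck. The comparison with `statistics.stdev` is only a sanity check:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Step 1: deviations from the mean.
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Step 2: square the deviations.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: sample variance (sum of squared deviations divided by N - 1).
variance = sum(squared_deviations) / (len(scores) - 1)

# Step 4: the sample SD is the square root of the variance.
sd = math.sqrt(variance)

print(variance, sd, statistics.stdev(scores))
```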

36
Q

Sample variance

A

Mean of the squared deviations from the sample mean (dividing by N - 1 rather than N)

It is an estimate of the population variance

37
Q

Why do we divide by N-1 and not N when calculating the sample SD

A

Dividing by N leads to a biased estimate of the population standard deviation; dividing by N-1 corrects this bias

When we calculate the sample mean, we ‘use up’ one piece of information

N-1 is the degrees of freedom associated with a univariate standard deviation
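
The bias can be seen in a small simulation. This is a sketch under assumptions chosen for illustration: a normal population with SD 2 (so variance 4), samples of size 5, and 20,000 replications. It compares the average variance estimate when dividing by N with the average when dividing by N - 1.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

POP_SD = 2.0          # hypothetical population SD, so population variance is 4.0
SAMPLE_SIZE = 5
REPLICATIONS = 20000

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(REPLICATIONS):
    sample = [random.gauss(0, POP_SD) for _ in range(SAMPLE_SIZE)]
    mean = sum(sample) / SAMPLE_SIZE
    sum_sq = sum((x - mean) ** 2 for x in sample)
    sum_div_n += sum_sq / SAMPLE_SIZE                 # divide by N
    sum_div_n_minus_1 += sum_sq / (SAMPLE_SIZE - 1)   # divide by N - 1

avg_var_div_n = sum_div_n / REPLICATIONS
avg_var_div_n_minus_1 = sum_div_n_minus_1 / REPLICATIONS
print(round(avg_var_div_n, 2), round(avg_var_div_n_minus_1, 2))
```

Dividing by N should average out near 3.2, which is too small, while dividing by N - 1 should average out near the population variance of 4.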

38
Q

Interquartile range

A

IQR is defined as Q3-Q1

39
Q

Range of a distribution

A

difference between the max and min
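
A minimal sketch of the interquartile range and the range for a hypothetical data set. It uses `statistics.quantiles` from the Python standard library; other software may use a slightly different quartile convention and return slightly different Q1 and Q3 values.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1                            # interquartile range: Q3 - Q1
data_range = max(scores) - min(scores)   # range: max - min

print(q1, q2, q3, iqr, data_range)
```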

40
Q

Boxplot

top of box
bottom of box
hard line
whiskers

A

top of box: Q3

bottom of box: Q1

hard line: median (Q2)

whiskers: max and min (excluding outliers)

outliers show up as dots
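
The pieces of a boxplot can be computed directly. This sketch assumes the common convention of flagging points more than 1.5 × IQR beyond the box as outliers; that rule is an assumption here and is not stated on the card. The data are hypothetical.

```python
import statistics

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 30]  # hypothetical data

# Box: Q1 (bottom), median (line), Q3 (top).
q1, median, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Assumed convention (not from the flashcards): 1.5 * IQR fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are drawn as dots.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

# Whiskers reach the min and max of the remaining points.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, whisker_low, whisker_high, outliers)
```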

41
Q

Boxplot is negatively skewed if

A

the distance from the median to Q1 is greater than the distance from the median to Q3

42
Q

probability density functions

A

give the relative likelihood (density) of observing a particular value of a continuous variable; probabilities correspond to areas under the curve

They are used to get the hypothetical probability distribution

43
Q

Normal distribution

A

Normal is a population distribution

Do NOT describe a sample as normal

It does make sense to say that the sample was DRAWN from a normal population distribution

The normal distribution is a function of the population mean and SD
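
To illustrate that the normal curve depends only on the population mean and SD, here is a minimal sketch of the normal density function. The helper name `normal_pdf` and the values mean 100, SD 15 are hypothetical; `statistics.NormalDist` from the standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical population: mean 100, SD 15.
density_at_mean = normal_pdf(100, mu=100, sigma=15)
density_one_sd_above = normal_pdf(115, mu=100, sigma=15)

# Cross-check against the standard library implementation.
print(density_at_mean, NormalDist(100, 15).pdf(100))
print(density_one_sd_above, NormalDist(100, 15).pdf(115))
```

Once the mean and SD are fixed, the whole curve is fixed; the density is highest at the mean and falls off symmetrically on either side.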

44
Q

the normal distribution is a BLANK population distribution

A

HYPOTHETICAL population distribution

It doesn't make sense to refer to a sample as normal

Instead, describe the sample as consistent with a normal distribution

45
Q

Mean is known as the blank blank of a population distribution

A

first moment

46
Q

what is the first moment of a population distribution

A

mean

47
Q

the variance is known as the blank blank blank of a population distribution

A

second central moment

48
Q

what is the second central moment of a population distribution

A

variance

49
Q

The mean and variance are each a kind of BLANK

A

average

  • variance is the average of the squared deviations from the mean
50
Q

why is the variance called a central moment

A

Because it is based on deviations from the mean (the centre of the distribution)

51
Q

what is the third central moment of a population distribution

A

skewness

52
Q

third central moment

A

skewness

53
Q

skewness

A

extent to which the distribution is asymmetric

54
Q

skewness formula

A

the numerator is the sum of cubed deviations from the mean
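
One common moment-based version of the formula can be sketched as follows. This is an assumption about which formula is meant: the average cubed deviation divided by the average squared deviation raised to the power 1.5. Software packages often apply additional small-sample corrections. The data sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_skewed = [1, 1, 1, 1, 10]    # hypothetical data with a long right tail

print(skewness(symmetric), skewness(right_skewed))
```

Cubing keeps the sign of each deviation, so a long right tail gives a positive value and a long left tail gives a negative value.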

54
Q

What is the fourth central moment

A

kurtosis

55
Q

kurtosis

A

extent to which the distribution shape is flat (negative kurtosis) or has a steep peak with thick tails (positive kurtosis)

55
Q

kurtosis

A

fourth central moment of the population distribution

56
Q

kurtosis formula

A

Deviations from the mean are raised to the 4th power
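
A minimal sketch of a moment-based kurtosis calculation. It assumes the "excess" version, which subtracts 3 so that a normal distribution scores zero, in line with the flat (negative) versus peaked with thick tails (positive) description in this deck. The function name and data are hypothetical, and no small-sample correction is applied.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m4 = sum((x - mean) ** 4 for x in values) / n  # average 4th-power deviation
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                            # evenly spread values
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # peak with extreme tails

print(excess_kurtosis(flat), excess_kurtosis(heavy_tails))
```

The flat data set gives a negative value and the peaked, heavy-tailed data set gives a positive one.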

57
Q

is it worse to have non-zero kurtosis or skewness

A

Non-zero kurtosis is more problematic than non-zero skewness

Distributions with strong skewness also have non-zero kurtosis