Statistics Flashcards

1
Q

dichotomous

A

refers to nominal data that only contains two categories

2
Q

continuous

A

refers to interval/ratio data

3
Q

nominal data

A
  • categories
  • no order or direction

example: male/female, democrat/republican, ethnicity

4
Q

ordinal data

A
  • categories
  • ordered, ranking, scaled, etc.

example: low income/medium income/high income, agree/somewhat agree/disagree

5
Q

interval data

A
  • equal, meaningful differences between measurements, but no true zero (something can have a score of less than zero)

example: temperature in Celsius or Fahrenheit

6
Q

ratio data

A
  • equal, meaningful differences between measurements, and a true zero exists (there can’t be a score of less than zero)

example: height, weight, income

7
Q

mean

A
  • average
  • calculated by adding all the scores together and dividing by the number of scores
  • typically the best measure of central tendency
8
Q

median

A
  • the score at which half the people score below and half the people score above
  • considered the best measure of central tendency when the data is skewed or includes extreme scores
9
Q

mode

A
  • the score that occurs the most
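
A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are made up for illustration; the single extreme score (95) shows why the median is preferred when the data includes extreme scores.

```python
from statistics import mean, median, mode

# Hypothetical scores; 95 is an extreme score.
scores = [2, 3, 3, 4, 5, 6, 95]

avg = mean(scores)       # sum of scores / number of scores (about 16.86)
middle = median(scores)  # middle score once sorted (4)
most = mode(scores)      # most frequent score (3)

print(avg, middle, most)
```

The extreme score pulls the mean far above the median, while the median and mode stay near the bulk of the scores.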
10
Q

variance

A

the standard deviation squared

11
Q

range

A

the difference between the highest and lowest value obtained
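
A short sketch of variance and range on made-up scores. It assumes the population formulas (`pstdev`, `pvariance`); sample formulas divide by n - 1 instead.

```python
from statistics import pstdev, pvariance

scores = [4, 8, 6, 5, 7]  # hypothetical scores

sd = pstdev(scores)                      # population standard deviation
variance = pvariance(scores)             # population variance (2 here)
score_range = max(scores) - min(scores)  # highest minus lowest (4 here)

# The variance equals the standard deviation squared (up to rounding).
print(variance, sd ** 2, score_range)
```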

12
Q

the x-axis of a graph represents the…

A

categories (nominal or ordinal data)

13
Q

the y-axis of a graph represents the…

A

frequency

14
Q

skewed distribution

A
  • the data is not equally distributed above and below the mean
15
Q

positive skew

A
  • there is a higher proportion of scores in the lower range of the values
  • graph is high on the left side and slants downwards towards the right side
  • mean is higher, mode is lower (mode < median < mean)
    *think: the mode comes 1st (furthest left) on the graph, so it has the lowest value
16
Q

negative skew

A
  • there is a higher proportion of scores in the higher range of values
  • graph is high on the right side and slants downwards on the left side
  • mean comes first, then mode. So mean is lower and mode is highest.

In a normal distribution, the mean, median, and mode are all equal.
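
The ordering of mean, median, and mode under skew can be checked on small made-up data sets. This is only an illustration; the two lists are hypothetical, and the second is simply the mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical scores with a long right tail (positive skew).
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same scores: long left tail (negative skew).
negative_skew = [15 - x for x in positive_skew]

for name, data in [("positive", positive_skew), ("negative", negative_skew)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

For the positive skew the order is mode (2) < median (3) < mean (4.5); for the negative skew it reverses to mean (10.5) < median (12) < mode (13).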

17
Q

raw scores

A
  • an individual’s score on a test
  • typically a percentage score
  • provides little information (don’t know whether it is good, bad, mediocre, etc.)
  • a percentage score is considered a criterion referenced score
18
Q

criterion referenced vs. norm referenced

A

criterion referenced - how well you know the material

norm referenced - how well you know the material compared to others in the group

19
Q

percentile scores

A
  • the percentage of scores in the group that are less than that score
  • the higher your percentile rank, the better you did in comparison to others

example:
98th percentile = you score better than 98% of the group
5th percentile = you only score better than 5% of the group
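
A small sketch of the definition above (percentage of scores in the group that are less than a given score). The function name `percentile_rank` and the group scores are hypothetical.

```python
def percentile_rank(score, group):
    """Percentage of scores in the group that are less than `score`."""
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

group = [55, 60, 62, 70, 71, 75, 80, 84, 90, 98]  # hypothetical scores

# 8 of the 10 scores are below 90, so 90 sits at the 80th percentile.
print(percentile_rank(90, group))
```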

20
Q

z scores

A
  • mean of 0, standard deviation of 1
  • shape of a z-score distribution is always identical to the shape of the raw score distribution

example:
- a score of +2 = 2 standard deviations above the mean

21
Q

t scores

A
  • mean of 50, standard deviation of 10
22
Q

IQ scores

A
  • mean of 100, standard deviation of 15
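
z scores, T scores, and IQ scores are all linear rescalings of the same information. A rough sketch, assuming the group's own mean and population standard deviation are used to standardize; the helper name `standard_scores` and the data are hypothetical.

```python
from statistics import mean, pstdev

def standard_scores(raw, group):
    """Convert a raw score to z, T, and IQ-style scores."""
    z = (raw - mean(group)) / pstdev(group)  # mean 0, SD 1
    return {
        "z": z,
        "T": 50 + 10 * z,    # mean 50, SD 10
        "IQ": 100 + 15 * z,  # mean 100, SD 15
    }

group = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores: mean 5, SD 2

result = standard_scores(9, group)
print(result)  # z = 2.0, T = 70.0, IQ = 130.0
```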
23
Q

null hypothesis

A
  • there are no differences between the groups
  • the independent variable has NOT had an effect on the dependent variable
  • the goal is to be able to REJECT the null hypothesis (in other words - conclude that there ARE differences between the groups)
24
Q

alternative hypothesis

A
  • there ARE differences between the groups
25
Q

type 1 error

A
  • if the null hypothesis is rejected (i.e. the researcher declares that there ARE differences and the independent variable DID have an effect), but it later turns out that this was a mistake, this is a type 1 error
  • in other words - differences are found, but they do not actually exist
26
Q

alpha

A
  • the size of the rejection region

example:
- if the alpha is 0.05, the rejection region is 5%
- if the alpha is 0.01, the rejection region is 1%

*alpha is the probability of making a type 1 error when the null hypothesis is true, so the larger the alpha, the more likely it is you will make a type 1 error
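
To make the rejection region concrete, here is a sketch that turns alpha into a cutoff. It assumes a two-tailed z test, which the card does not specify; `two_tailed_critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """z cutoff that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z_05 = two_tailed_critical_z(0.05)  # about 1.96
z_01 = two_tailed_critical_z(0.01)  # about 2.58

print(round(z_05, 2), round(z_01, 2))
```

A smaller alpha pushes the cutoff further out, so the rejection region shrinks and the null is harder to reject.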

27
Q

type 2 error

A
  • when the null hypothesis is retained (i.e. the researcher declares that there is NOT a difference between groups and the independent variable DID NOT have an effect), but it later turns out that this was a mistake, this is a type 2 error
  • in other words - differences are not found, but they actually do exist
28
Q

Statistical power

A
  • the ability to reject the null when it is false
  • power is the probability of correctly rejecting a false null hypothesis, i.e. detecting differences between the groups (an effect of the independent variable) that really exist; power = 1 − beta, where beta is the probability of a type 2 error
  • affected by several factors including: homogeneous populations (less variability facilitates effect detection)
29
Q

power is increased when…

A
  • Think of the “Four Cs” to easily remember factors that increase statistical power:
  1. Effect Size (C1): A larger effect size increases power. If the difference or relationship you’re looking for is substantial, it’s easier to detect.
  2. Sample Size (C2): A larger sample size enhances power. A bigger sample provides more information and reduces the impact of variability.
  3. Significance Level (C3): A higher significance level (e.g., using 0.05 instead of 0.01) can increase power, but it also increases the risk of a Type I error. (0.05 vs 0.01)
  4. Consistency (C4): A reduction in variability or noise in the data increases power. This involves controlling extraneous factors that might introduce randomness. Homogeneous populations also do this.

*if alpha increases, beta decreases (and power, which is 1 − beta, increases)
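
The factors above can be seen in a rough power calculation. This sketch assumes a two-tailed, one-sample z test with the effect size expressed in standard deviation units (so less variability means a larger effect size); `z_test_power` is a hypothetical helper, not a formula from the cards.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha):
    """Approximate power of a two-tailed one-sample z test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # edge of the rejection region
    shift = effect_size * sqrt(n)               # how far the true mean moves the test statistic
    upper = 1 - std_normal.cdf(z_crit - shift)  # chance of landing in the upper tail
    lower = std_normal.cdf(-z_crit - shift)     # chance of landing in the lower tail
    return upper + lower

baseline = z_test_power(effect_size=0.5, n=30, alpha=0.05)
print(baseline)
print(z_test_power(0.8, 30, 0.05))  # larger effect size
print(z_test_power(0.5, 60, 0.05))  # larger sample size
print(z_test_power(0.5, 30, 0.01))  # smaller alpha
```

Under these assumptions, power goes up with a larger effect size or sample size and goes down when alpha is lowered from 0.05 to 0.01.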

30
Q

3 commonly asked questions in research

A
  • question of difference
  • question of relationship/prediction
  • question of structure or fit
31
Q

how to select the appropriate test when the research is testing for difference

A

ONE DEPENDENT VARIABLE ONLY
nominal or ordinal data > nonparametric test (Chi-Square)
interval or ratio data > parametric test (t-test, ANOVA)

MORE THAN ONE DEPENDENT VARIABLE
interval or ratio data > MANOVA

**how to tell the NOIR (nominal, ordinal, interval, ratio) variables apart

  1. Nominal Variables: These are categories with no specific order. Think of them as labels or names. Examples: Colors, Types of fruits, Gender.
  2. Ordinal Variables: You can rank them/order them, but you can’t say how much one is “more” than another. Examples: Education levels (High school, Bachelor’s, Master’s), Customer satisfaction ratings (1-star, 2-star, 3-star).
  3. Interval Variables: They have a specific order, and the differences between values are meaningful, but there is no true zero point. Examples: Temperature in Celsius (0°C doesn’t mean no temperature), IQ scores.
  4. Ratio Variables: These have a specific order, meaningful differences, and a true zero point. Examples: Age, Height, Weight.
32
Q

examples of levels of independent variables

A

Hint: ask yourself what groups are being compared?

gender = 1 independent variable, 2 levels (male/female)
treatment = 1 independent variable, 3 levels (CBT, psychodynamic, no treatment)

33
Q

independent vs. correlated groups

A

independent = if people are randomly assigned or the group is based on a pre-existing characteristic (ex - gender)

correlated = group members are measured at more than one point in time, group members are matched prior to their assignment to groups (ex - IQ, income), or there is an inherent relationship (ex - twins, siblings)

34
Q

one-way ANOVA

A
  • statistic of choice when more than two groups are being compared on only one independent variable
  • preferred over a t-test in this situation, because running multiple t-tests increases the likelihood of a type 1 error
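
A quick sketch of why multiple t-tests inflate the type 1 error rate. It assumes the tests are independent, which is a simplification; `familywise_error` is a hypothetical helper name.

```python
from math import comb

def familywise_error(alpha, n_tests):
    """Chance of at least one type 1 error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_groups = 3
n_pairs = comb(n_groups, 2)  # 3 groups need 3 pairwise t-tests

rate = familywise_error(0.05, n_pairs)
print(round(rate, 3))  # about 0.143, versus 0.05 for a single test
```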
35
Q

F ratio

A
  • F ratio = mean square between (MSB) divided by mean square within (MSW). The F ratio compares the variability between the group means to the variability within the groups. It helps determine whether the differences between the groups are statistically significant. In simpler terms, it tells us whether the variation between groups reflects real differences or just random chance.
  • if the F ratio equals or approximates 1.0, the result is not significant
  • an F ratio well above 1.0 (a common rule of thumb is above 2.0) suggests significance; the exact cutoff depends on the degrees of freedom and alpha
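
A worked sketch of the F ratio for a one-way ANOVA, computed by hand on made-up data. The function name `f_ratio` and the three groups are hypothetical.

```python
from statistics import mean

def f_ratio(groups):
    """Mean square between divided by mean square within."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)            # number of groups
    n_total = len(all_scores)  # total number of scores

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical scores
f_value = f_ratio(groups)
print(f_value)  # about 13: between-group variability is far larger than within-group
```

Whether a given F is significant still has to be checked against the critical value for its degrees of freedom (here 2 and 6).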
36
Q

two-way anovas

A
  • when groups are being compared on TWO independent variables, you can either run two separate one-way ANOVAs or do a two-way ANOVA; the two-way ANOVA also tests for an interaction between the two independent variables
37
Q

correlations

A

statistics that depict relationships between variables

38
Q

regressions/analyses

A

statistics that are used to predict

  • in a multiple regression analysis, a negative regression coefficient means the predictor has an inverse relationship with the criterion
39
Q

correlation coefficients

A
  • describe the relationship between X (the predictor) and Y (the criterion) in terms of strength and direction (positive or negative)
  • range in value from -1.0 to +1.0
  • on a graph, the more tightly the data points cluster around a straight line, the stronger the correlation (and vice versa)
40
Q

coefficient of determination

A
  • calculated by squaring the correlation coefficient
  • represents the amount of variability in Y that is shared with/explained by/accounted for by X

example: 25% of variability in income (Y) is explained by education (X), which leaves 75% of the variability in income to be accounted for by other factors
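
A small sketch of the calculation: compute Pearson's r by hand, then square it. The data are made up so that r comes out to 0.5, matching the 25% example; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

x = [1, 2, 3]  # hypothetical predictor scores
y = [1, 3, 2]  # hypothetical criterion scores

r = pearson_r(x, y)
r_squared = r ** 2
print(r, r_squared)  # r = 0.5, so 25% of the variability in Y is explained by X
```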

41
Q

random selection increases ____________________
random assignment increases ____________________

A

external validity

internal validity

42
Q

Cohen’s d

A

Effect size that indicates how the means* of two groups differ in terms of SD units.

  • d = 0.60 indicates a medium effect (Cohen’s guidelines: 0.2 = small, 0.5 = medium, 0.8 = large).
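
A minimal sketch of Cohen's d. It assumes the pooled standard deviation of the two groups is used as the SD unit, which is one common choice and not stated on the card; the data are made up.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between two group means in pooled-SD units."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

treatment = [4, 6, 8]  # hypothetical scores, mean 6, SD 2
control = [3, 5, 7]    # hypothetical scores, mean 5, SD 2

d = cohens_d(treatment, control)
print(d)  # 0.5: the means differ by half a standard deviation
```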
43
Q

Multicollinearity

A

Occurs when scores on one or more explanatory variables are highly correlated with scores on one or more of the other explanatory variables

44
Q

Point biserial

A

Appropriate correlation coefficient when one variable is a true dichotomy (yes/no) and the other is measured on a continuous scale (interval or ratio)

45
Q

Changing alpha from .01 to .05 has what effects on Type 1 & Type 2 error

A

TYPE 1: increases (when null is true, but is rejected)

TYPE 2: decreases (when null is false but is NOT rejected/is maintained)

46
Q

Orthogonal vs oblique factors (factor analysis)

A

Orthogonal= uncorrelated (think don’t care about specialty)
Oblique= correlated (but I do care about obliques)

47
Q

Internal validity of research is threatened by statistical regression when

A

Participants are chosen for inclusion because of their extreme scores on a pretest (low/high)

48
Q

The incremental validity of a new selection test is calculated by subtracting what?

A

Base rate from the positive hit rate

The base rate is the proportion of people who were hired without the new predictor and who obtained high scores on the measure of job performance (criterion). The positive hit rate is the proportion who were hired using the new selection test and who obtained high scores on job performance.
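
The subtraction can be sketched with made-up hiring counts. All numbers and the function name `incremental_validity` are hypothetical.

```python
def incremental_validity(positive_hit_rate, base_rate):
    """Gain in successful hires attributable to the new selection test."""
    return positive_hit_rate - base_rate

# Hypothetical: 20 of 50 people hired without the test were high performers.
base_rate = 20 / 50
# Hypothetical: 26 of 40 people hired with the test were high performers.
positive_hit_rate = 26 / 40

gain = incremental_validity(positive_hit_rate, base_rate)
print(round(gain, 2))  # 0.25: the test adds 25 percentage points
```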

49
Q

Which scales of measurement allow you to conclude that the difference between scores 50 and 51 is equal to the difference between scores 90 and 91 on a test?

A

Interval and ratio (both have equal intervals between adjacent points in a scale)

50
Q

Post hoc test is helpful when?

A

When there are 3 or more levels of the independent variable, the overall finding is statistically significant, and we want to figure out which groups differ from each other.

51
Q

Graph types for nominal, interval and ratio data

A

Nominal: bar graph (gender, eye colour)
Interval/ratio: histograms, line graphs (frequency polygons)

52
Q

Cohen’s kappa coefficient

A

Assesses Inter-rater reliability.

Assesses consistency of ratings assigned by two raters when ratings represent a nominal scale (discrete; yes/no)
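
A sketch of how kappa corrects the raw agreement between two raters for the agreement expected by chance. The ratings are made up and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters on nominal ratings, corrected for chance."""
    n = len(rater_a)
    # Proportion of cases where the raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

rater_a = ["yes", "yes", "no", "no"]  # hypothetical ratings
rater_b = ["yes", "no", "no", "no"]

kappa = cohens_kappa(rater_a, rater_b)
print(kappa)  # 0.5: observed agreement 0.75, chance agreement 0.50
```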

53
Q

Probability sampling (simple random, stratified and systematic)

A

Simple: drawing names from a hat, all have equal chance
Stratified: sorting candles by colour and then picking from each group (strata)
Systematic: like choosing every 5th person from a list (there is a system to choosing)
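
The three probability sampling methods can be sketched with Python's `random` module. The population of 20 IDs, the two colour strata, and the sample sizes are all made up for illustration.

```python
import random

rng = random.Random(0)           # seeded so the sketch is repeatable
population = list(range(1, 21))  # hypothetical IDs 1..20

# Simple random: every member has an equal chance of being drawn.
simple = rng.sample(population, 5)

# Stratified: split into groups (strata), then draw from each group.
strata = {"red": population[:10], "blue": population[10:]}
stratified = [p for members in strata.values() for p in rng.sample(members, 2)]

# Systematic: random starting point, then every 5th member of the list.
start = rng.randrange(5)
systematic = population[start::5]

print(simple, stratified, systematic)
```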

54
Q

Non probability sampling (purposive, convenience and snowball)

A

Purposive (judgemental) sampling: like choosing friends to help bc you know their strengths
Convenience: asking people nearby bc it’s easy, not random
Snowball: friends bring more friends into a study growing like a snowball

55
Q

Reliable change index (RCI)

A

Used to determine whether a change in a client’s scores on an outcome measure administered before and after the client receives treatment is reliable or is attributable to measurement error*
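
A rough sketch of one common way to compute the RCI, the Jacobson and Truax formulation, which the card itself does not spell out: the pre-to-post change is divided by the standard error of the difference. The scores, standard deviation, and reliability below are made up.

```python
from math import sqrt

def reliable_change_index(pre, post, sd, reliability):
    """Pre-to-post change divided by the standard error of the difference."""
    sem = sd * sqrt(1 - reliability)  # standard error of measurement
    se_diff = sqrt(2 * sem ** 2)      # standard error of the difference
    return (post - pre) / se_diff

# Hypothetical: score drops from 40 to 25; measure has SD 10, reliability .84.
rci = reliable_change_index(pre=40, post=25, sd=10, reliability=0.84)
print(round(rci, 2))  # about -2.65
```

Under this formulation, an absolute value above 1.96 is usually taken to mean the change is unlikely to be due to measurement error alone.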

56
Q

What can be altered to maximize the magnitude of a test’s reliability coefficient?

A

1) make the test longer
2) have an unrestricted range of scores (examinees are heterogeneous with regard to the attribute being measured by the test)

57
Q

Wilcoxon signed rank test

A

Non parametric test used to compare the medians of 2 related (paired) data sets (or one data set against a hypothesized value) when the data is ranked*

58
Q

Eigenvalue

A
  • from principal components analysis
  • total variability explained by an orthogonal component
59
Q

Downside to non parametric test

A

They use less of the information in the data, so they are less powerful and LESS LIKELY to detect a false null. They are also set with a lower alpha than parametric tests.

60
Q

Mediator vs moderator vs latent vs suppressor variable

A

Mediator= accounts for/is responsible for relationship between IV and DV (Ellis proposes that beliefs mediate (are responsible for) the impact of an event on our emotional/behavioural responses.)

Moderator: variable that affects the strength of the relationship between two variables (e.g. the size of the correlation between a predictor and criterion differs for older adults vs younger adults)

Latent: theoretical variable that is believed to underlie a measured or observed variable

Suppressor: reduces or conceals the relationship between 2 variables. Removing the effects of this variable increases correlation between two variables