Extra Notes Flashcards

1
Q

How do we report the percentile rank?

A

Do not use a % sign.
Just say, e.g., "The percentile rank of the student who obtained a raw score of 172 is 80."

2
Q

What is the rule of linear transformations?

A
  • If you add or subtract a constant from each value in a distribution,
    – The mean is increased/decreased by that constant
    – The standard deviation is unchanged
  • If you multiply or divide each value in a distribution by a constant,
    – The mean is multiplied/divided by that constant
    – The standard deviation is multiplied/divided by that constant
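
A quick numeric check of these rules, as a minimal Python sketch. The scores and the constants 10 and 3 are made up for illustration; `pstdev` is the population standard deviation from Python's `statistics` module.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up raw scores

shifted = [x + 10 for x in scores]     # add a constant to every score
scaled = [x * 3 for x in scores]       # multiply every score by a constant

# Adding a constant moves the mean but leaves the spread alone.
print(mean(scores), pstdev(scores))
print(mean(shifted), pstdev(shifted))

# Multiplying by a constant rescales both the mean and the spread.
print(mean(scaled), pstdev(scaled))
```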
3
Q

What does it mean for a non-linear relationship to have a Pearson r value of 0?

A

This does not mean there is no relationship between the variables, just that there is no linear relationship!
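
A small illustration, as a sketch with made-up data: Y is completely determined by X (Y = X squared), yet Pearson r comes out as 0 because the relationship is U-shaped rather than linear. The helper `pearson_r` is written here for illustration using the deviation-score formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]    # perfect curvilinear relationship: Y = X^2

r = pearson_r(x, y)
print(r)                   # no LINEAR relationship, even though X fully determines Y
```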

4
Q

What is the effect of outliers?

A

An outlier can have a large effect on the Pearson r, and on the “line of best fit”
It pulls the line over towards the outlier.

5
Q

What is the effect of range restriction?

A

Not looking at the whole range of data may lead to a weaker relationship/r value

Before calculating a correlation coefficient, consider whether the ranges of the two variables are sufficient to show their true relationship.

6
Q

How do you define Pearson r?

A

Pearson r is a measure of the extent to which paired scores occupy the same or opposite positions within their own distributions.

7
Q

How do you interpret Pearson r values?

A

Equal to 0
No relationship

Between 0 and .10
Trivial

Between .10 and .30
Small to medium

Between .30 and .50
Medium to large

Greater than .50
Large to very large

8
Q

What’s one thing to notice when calculating regression Y?

A

Regression constants (ay and by) should be reported to 4 decimal places. This helps retain accuracy in your final answer when using the equation to predict Y.

9
Q

What does Pearson r tell us?

A

Pearson r tells us how helpful the regression line will be in predicting Yi given Xi. For r values between 0 and 1, the regression line will produce moderate errors.

Pearson r also tells us something about how much of the variability in Y is accounted for by (the variability in) X.

10
Q

What is r^2?

A

proportion of the variability of Y accounted for by X

11
Q

In a sample of students, height and weight are correlated with r = .65. What percentage of the variability in weight is accounted for by height in this sample?

A

r^2 = .65^2 = .4225, so 42.25% of the variability in weight is accounted for by height.

12
Q

Pearson r is used when X and Y are both measured on what scales?

A

Interval or ratio scales

13
Q

What other types of correlation coefficients are used?

A

If the relationship is curvilinear, the correlation coefficient eta (η) can be used to describe the strength of the relationship

14
Q

Spearman rank order correlation coefficient rho (rs)

A

one or both variables are measured on an ordinal scale

15
Q

biserial correlation coefficient (rb)

A

one of the variables is interval or ratio and the other is dichotomous

16
Q

phi coefficient (Φ)

A

both variables are dichotomous

17
Q

What is the least-squares regression line?

A

The least-squares regression line is the prediction line that minimizes the total error of prediction, according to the least-squares criterion of minimizing
∑(Y − Y’)^2
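
The criterion can be checked numerically. This is a minimal sketch with made-up data; the helper names `least_squares_line` and `sum_squared_error` are invented for illustration, and the slope and intercept use the usual deviation-score formulas (by = SP / SSx, ay = mean of Y minus by times mean of X).

```python
def least_squares_line(xs, ys):
    """Return (b_y, a_y) for the line Y' = b_y * X + a_y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b_y = sp / ss_x
    a_y = my - b_y * mx
    return b_y, a_y

def sum_squared_error(xs, ys, b, a):
    """Total squared prediction error: sum of (Y - Y')^2."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]        # made-up predictor scores
y = [2, 4, 5, 4, 5]        # made-up criterion scores

b_y, a_y = least_squares_line(x, y)
best = sum_squared_error(x, y, b_y, a_y)

# Nudging either constant away from the least-squares values increases the error.
worse_slope = sum_squared_error(x, y, b_y + 0.1, a_y)
worse_intercept = sum_squared_error(x, y, b_y, a_y + 0.1)

print(round(b_y, 4), round(a_y, 4))
print(best, worse_slope, worse_intercept)
```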

18
Q

Y’ = byX + ay

A

report ay and by to 4 decimal places.

19
Q

Limitations of linear regression

A

Only appropriate:
– for linear relationships
– when the sample you used to calculate the regression line is representative of the sample you want to make predictions about
– within the range of the original variables

20
Q

What is the standard error of estimate? SEE

A

The standard error of estimate (SEE) tells us how much error we can expect on average when we use the regression line.

Example report: We can expect about 68% of actual GPAs to fall within 0.43 points of the prediction.
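
A minimal sketch of the calculation, assuming the common textbook formula SEE = sqrt(∑(Y − Y’)^2 / (N − 2)). The data and the function name `standard_error_of_estimate` are made up for illustration.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys):
    """SEE = sqrt(sum of (Y - Y')^2 / (N - 2)); the N - 2 form is an assumption."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Least-squares slope and intercept for predicting Y from X.
    b_y = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
           / sum((x - mx) ** 2 for x in xs))
    a_y = my - b_y * mx
    ss_resid = sum((y - (b_y * x + a_y)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ss_resid / (n - 2))

x = [1, 2, 3, 4, 5]        # made-up predictor scores
y = [2, 4, 5, 4, 5]        # made-up criterion scores

see = standard_error_of_estimate(x, y)
print(round(see, 2))
```

Reading the result as "about 68% of actual scores fall within 1 SEE of the prediction" assumes the prediction errors are roughly normally distributed with constant spread across X.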

21
Q

What is homoscedasticity?

A

variability in Y stays constant across X values

22
Q

How much has our prediction improved by adding a new predictor variable?

A

You could answer this by looking at:
– how much the standard error of estimate has decreased after adding the new predictor
– how much the proportion of variability accounted for has increased after adding the new predictor

“Total proportion of variability accounted for” (R^2) is the most common measure of a regression model’s “goodness of fit” to the data

23
Q

Why can’t we just add up the r^2 values of the two predictor variables (X1 and X2) to get the total proportion of variability accounted for (R^2)?

A

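Picture the overlapping circles: some of the variability in Y that X1 accounts for is also accounted for by X2. You don’t want to “double count” the overlapping area.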

When X1 and X2 are highly correlated to each other…
… adding X2 to your regression model won’t substantially increase the total proportion of variability in Y accounted for

24
Q

R^2 definition

A

R^2, the multiple coefficient of determination, tells us the proportion of the variance in Y that is accounted for by X1 and X2.

25
Q

Give final answers with two decimal places

A

Regression coefficients (b’s and a’s) should be reported to 4 decimal places. To get an a that is accurate to 4 decimal places, you should keep MORE than 4 decimal places in b when calculating a!

26
Q

Positive skew = right skew

A

Tail to the right; the mean gets pulled toward the tail

27
Q

Negative skew = left skew

A

Tail to the left; the mean gets pulled toward the tail

28
Q

What is the self-selection technique?

A

Self-selection is a sampling technique based on letting people sign up to participate
– convenient
– might end up over-representing people with strong opinions (e.g., restaurant reviews on Yelp), who are more comfortable with the topic you’re studying (e.g., a study on sexuality), who want to be paid for participation, etc.

29
Q

What is quota sampling?

A

Quota sampling is based on determining what proportions of people to sample based on whether they fit some pre-set constraints or categories
– might miss some important categories
– might miss participants who don’t obviously fit pre-set categories

30
Q

What is a random sample?

A

A random sample is one selected from the population by a process that ensures that each possible sample of a given size has an equal chance of being selected.
– True random sampling avoids sampling bias, and should result in a representative sample.
– Random sampling can sometimes be impractical or impossible to achieve, but it can be thought of as the gold standard in sampling strategies

31
Q

How are chances expressed?

A

Chances are probabilities expressed as percentages.
– A probability of .75 = a 75% chance

32
Q

A probability of .75 is equal to?

A

3 to 1 odds

The odds for an event are the probability that the event happens, compared to the probability that it doesn’t happen (here, .75 to .25 = 3 to 1).
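
The conversion between probability, chance, and odds can be sketched in a few lines of Python. The function name `odds_for` is made up for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def odds_for(p):
    """Odds in favor of an event: p(event) / p(not event)."""
    return p / (1 - p)

p = Fraction(3, 4)                # a probability of .75
odds = odds_for(p)                # 3/1, read as "3 to 1"
chance_percent = float(p) * 100   # the same probability expressed as a chance

print(odds, chance_percent)
```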

33
Q

What is a sample space?

A

A sample space contains all possible outcomes of a random process. It is an exhaustive set of events.
– Sample space for coin toss: {heads, tails}
– Sample space for die roll: {1, 2, 3, 4, 5, 6}
p(sample space) = 1

34
Q

Mutually exclusive events are events that cannot both occur together.

A

In other words, one event’s occurrence precludes the other event’s simultaneous occurrence
p(A and B) = 0

35
Q

a priori probabilities

A

– “Before the fact” (without experience)
– Theoretically derived
– Not based on collected data

36
Q

a posteriori probabilities

A

– “After the fact” (with experience)
– Empirically derived
– Based on collected data

Given a large enough sample size, the a posteriori probability should approach the a priori probability
– If it doesn’t, some assumption you used to calculate the a priori probability is off (e.g. you might have a biased coin or loaded die)

37
Q

The general form of the addition rule is:

A

p(A or B) = p(A) + p(B) – p(A and B)
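
A worked check of the addition rule, as a sketch using one roll of a fair die. The events A ("even number") and B ("greater than 4") are chosen for illustration; they overlap at 6, so p(A and B) must be subtracted once.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                        # event A: even number
B = {5, 6}                           # event B: greater than 4

def p(event):
    """a priori probability: favorable outcomes / all equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

p_rule = p(A) + p(B) - p(A & B)      # addition rule
p_direct = p(A | B)                  # counting the outcomes in "A or B" directly

print(p_rule, p_direct)

# For mutually exclusive events the last term is 0, e.g. "even" and "rolls a 1".
print(p(A & {1}))
```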

38
Q

The general form of the multiplication rule is:

A

p(A and B) = p(A) x p(B|A)
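
A worked example of the multiplication rule, as a sketch. The card-drawing and coin-toss scenarios are made up for illustration: drawing two aces in a row without replacement (dependent events), and two heads in a row (independent events, where p(B|A) is simply p(B)).

```python
from fractions import Fraction

# Dependent events: two aces in a row from a 52-card deck, without replacement.
p_a = Fraction(4, 52)            # first card is an ace
p_b_given_a = Fraction(3, 51)    # second card is an ace, given the first was
p_two_aces = p_a * p_b_given_a

# Independent events: heads on two tosses of a fair coin, so p(B|A) = p(B).
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

print(p_two_aces, p_two_heads)
```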

39
Q

Two events are independent if?

A

if the occurrence of one has no effect on the probability of occurrence of the other.

Coin landing heads vs. tails on first toss does not influence probability of coin landing heads vs. tails on second toss
– In this case, since the probability of B is not affected by whether A has occurred or not,
p(B|A) = p(B|not A) = p(B)

40
Q

What is the conjunction fallacy?

A

The conjunction fallacy results from a failure to recognize that the joint probability of A and B is always less than or equal to the probability of A
p(A and B) ≤ p(A)

41
Q

What is the gambler’s fallacy?

A

The “Gambler’s Fallacy” results from a failure to account for the fact that events are independent, i.e. that in this situation
p(B|A) = p(B)

If a certain outcome hasn’t happened for a while, players often assume it’s “due”

42
Q

What is the general principle of hypothesis testing?

A

to evaluate your sample data against what is likely to have been produced by random chance

43
Q

What are some features of binomial distribution?

A

– Involves a discrete variable
– Has two tails (probabilities decrease from center to ends)
– Approximates the shape of a normal curve as N increases
– Is symmetrical when the two outcomes are equally probable (P=Q=.50)

The binomial distribution can also be used when P ≠ Q, but the distribution will not be symmetrical
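
These features can be checked with a short sketch. N = 10 and P = .70 are arbitrary values picked for illustration, and `binomial_pmf` is a helper written here from the standard binomial formula.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k P-events in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 10
fair = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]     # P = Q = .50
biased = [binomial_pmf(k, n, 0.7) for k in range(n + 1)]   # P != Q

def is_symmetric(probs):
    """True if the distribution reads the same from either end."""
    return all(abs(a - b) < 1e-12 for a, b in zip(probs, reversed(probs)))

print(is_symmetric(fair), is_symmetric(biased))
```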

44
Q

What is a decision rule, for whether to reject or retain H0?

A

– If the obtained probability ≤ α, REJECT H0.
– If the obtained probability > α, RETAIN H0.

45
Q

What is the Unicorn principle?

A

If you do not accept H1, it is proper to say that you “retain” or “fail to reject” H0.
It is incorrect to say that you “accept” H0.
– Failing to find an effect is different from proving there is no effect.
– “Absence of evidence is not evidence of absence.”

46
Q

What happens if we make the alpha level more strict (for example, α = .01 instead of .05)?

A

– You reduce the chance of a false alarm (Type I error)
– At the same time, you increase the chance of a miss (Type II error)

47
Q

Describe the sign test, in seven steps

A
  1. Ensure that the conditions are met for using the binomial distribution
  2. Formulate clear, specific null and alternate hypotheses
    – Directional or non-directional H1?
  3. Set your alpha value
  4. Gather data and calculate the number of pluses and minuses obtained
  5. Determine the probability of getting an outcome this extreme or any more extreme
    – An extreme value is one that favors the alternate hypothesis
    – For a non-directional H1, do a two-tailed evaluation
    – For a directional H1, do a one-tailed evaluation
  6. If the probability calculated in Step 5 is:
    ≤ α, reject H0 and accept H1
    > α, retain (or “fail to reject”) H0
  7. Write out a clear, specific conclusion that is consistent with the wording of your original hypotheses
    – For example, “Reject H0 and conclude that a romantic partner’s scent affects sleep efficiency.”
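
Steps 4 to 6 can be sketched in Python. The counts (9 pluses, 1 minus) and α = .05 are made up for illustration, and `sign_test_p` is a hypothetical helper that assumes P = Q = .50 under H0; for the one-tailed case it assumes H1 predicts more pluses.

```python
from math import comb

def sign_test_p(pluses, minuses, two_tailed=True):
    """Probability of an outcome this extreme or more extreme under H0 (P = .50).

    One-tailed evaluation assumes H1 predicts MORE pluses than minuses.
    """
    n = pluses + minuses

    def upper_tail(k_min):
        # p(k_min or more events out of n) when each has probability .50
        return sum(comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

    if two_tailed:
        # Count outcomes as extreme as the larger count, in both directions.
        return min(1.0, 2 * upper_tail(max(pluses, minuses)))
    return upper_tail(pluses)

alpha = 0.05
p_obtained = sign_test_p(pluses=9, minuses=1, two_tailed=True)
reject_h0 = p_obtained <= alpha

print(round(p_obtained, 4), "reject H0" if reject_h0 else "retain H0")
```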
48
Q

What is a sampling distribution? How is it generated?

A

A sampling distribution is generated by
1) Taking repeated, random samples of a specified size (N) from a population (with replacement)
2) Calculating a sample statistic (like the mean) for each sample
3) Making a frequency distribution of that sample statistic
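
The three steps can be simulated, as a rough sketch. The population (faces of a die), N = 25, and 5000 samples are arbitrary choices for illustration; a true sampling distribution uses all possible samples, so the simulation only approximates it.

```python
import random
from collections import Counter
from statistics import mean, pstdev

random.seed(0)                       # fixed seed so the run is repeatable
population = [1, 2, 3, 4, 5, 6]      # made-up raw-score population
N = 25                               # size of each sample
num_samples = 5000

# Steps 1 and 2: draw repeated random samples (with replacement), take each mean.
sample_means = []
for _ in range(num_samples):
    sample = random.choices(population, k=N)
    sample_means.append(sum(sample) / N)

# Step 3: frequency distribution of the sample statistic (means rounded to 0.1).
frequency = Counter(round(m, 1) for m in sample_means)

mean_of_means = mean(sample_means)            # should sit close to the population mean
sd_of_means = pstdev(sample_means)            # should sit close to sigma / sqrt(N)
expected_sd = pstdev(population) / N ** 0.5

print(round(mean_of_means, 2), round(sd_of_means, 2), round(expected_sd, 2))
```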

49
Q

Two features of the sampling distribution of the mean:

A

1) Has a mean equal to the mean of the raw-score population

2) Has a standard deviation equal to the standard deviation of the raw-score population divided by the square root of N

50
Q

Why is sigma X-bar also called the standard error of the mean?

A

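If we could collect data on our entire population, we could answer the research question with perfect accuracy. In practice we usually have only a sample.

Our sample mean provides our best possible guess of the population mean, but we should expect some amount of error in that guess or estimate. Sigma X-bar tells us how large that error typically is, hence the name standard error of the mean.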

51
Q

As N increases, the standard error of the mean

A

Decreases

– Think of each Xbar as an estimate/prediction of μ.
– As N increases, your Xbar is likely to provide a more accurate estimate/prediction of μ.
– The Xbars will cluster more tightly around μ. The sampling distribution will be narrower, and σ Xbar will decrease.
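
A quick numeric illustration, using the fact that the standard error of the mean is σ divided by the square root of N. The value σ = 15 and the sample sizes are made up.

```python
from math import sqrt

sigma = 15                            # assumed population standard deviation
sample_sizes = (4, 16, 64)

# Standard error of the mean for each sample size.
standard_errors = {n: sigma / sqrt(n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(n, se)                      # quadrupling N halves the standard error
```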

52
Q

What does the standard error of the mean measure?

A

The standard error of the mean measures the standard amount of difference (sampling error) between Xbar and μ that we should expect simply by chance.

The larger the sample size, the smaller the standard error of the mean. (LAW OF LARGE NUMBERS)

53
Q

Describe the central limit theorem.

A

Regardless of the shape of the population of raw scores, the sampling distribution of the mean approaches a normal distribution as sample size N increases

N for a sampling distribution is the size of each sample.
If the raw-score population is normally distributed, then the sampling distribution will always be approximately normal.

If the raw-score population is NOT normally distributed, but N ≥ 30, then the sampling distribution will also be approximately normal.

54
Q

Describe the z-test, step by step

A
  1. Check that assumptions are met for the z-test
    – Is the dependent measure on an interval or ratio scale?
    – Is the sampling distribution to which you’ll be comparing your Xbar-obt approximately normal?
      • Parent population should be approximately normal, and/or N ≥ 30 (“Central Limit Theorem”)
  2. Formulate your null and alternate hypotheses
    – Are you using non-directional or directional hypotheses?
  3. Set alpha value
  4. Determine the characteristics of the comparison distribution (i.e., the sampling distribution based on the null hypothesis)
  5. Gather data and calculate Xbar-obt
  6. Calculate the z-score of Xbar-obt
  7. Compare z-obt to z-crit
  8. Make a decision to reject or retain H0
  9. Write out a conclusion
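
Steps 3 to 8 can be sketched in Python for a non-directional test. All of the numbers (μ = 100, σ = 15, N = 36, Xbar-obt = 106, α = .05) are made up for illustration; `NormalDist` comes from Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05                      # step 3: set alpha (two-tailed here)

# Step 4: comparison distribution under H0 (assumed values).
mu_null = 100
sigma = 15
n = 36
sigma_xbar = sigma / sqrt(n)      # standard error of the mean

# Step 5: the obtained sample mean (made up).
xbar_obt = 106

# Step 6: z-score of Xbar-obt within the comparison distribution.
z_obt = (xbar_obt - mu_null) / sigma_xbar

# Step 7: critical z for a non-directional test at this alpha.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 8: reject H0 if z-obt is at least as extreme as z-crit.
reject_h0 = abs(z_obt) >= z_crit

print(round(z_obt, 2), round(z_crit, 2), "reject H0" if reject_h0 else "retain H0")
```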
55
Q

Power can be defined as:

A

• 1–β
• The probability that we will NOT make a Type II error
• The probability of rejecting H0 given that H0 is false
• The probability that our study will be able to detect an effect that truly exists

56
Q

If H0 is actually true in reality, the probability of a Type II error is equal to

A

0

57
Q

4 influences on power

A
  1. More relaxed alpha, higher power. However, this would simultaneously increase the chance of a Type I error.
  2. Larger effect size, higher power
  3. Lower variability, increased power
  4. Increase sample size, increase power
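
The four influences can be illustrated with a sketch for a directional (upper-tailed) z-test. The function name `power_one_tailed_z` and every number below are assumptions made for illustration; each comparison changes one factor from the baseline.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed_z(mu_null, mu_true, sigma, n, alpha):
    """Power of an upper-tailed z-test when the real mean is mu_true (> mu_null)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)
    # How far the true sampling distribution sits from the null one, in SE units.
    shift = (mu_true - mu_null) / (sigma / sqrt(n))
    return 1 - z.cdf(z_crit - shift)

baseline = power_one_tailed_z(100, 105, sigma=15, n=25, alpha=0.05)

relaxed_alpha = power_one_tailed_z(100, 105, sigma=15, n=25, alpha=0.10)
larger_effect = power_one_tailed_z(100, 110, sigma=15, n=25, alpha=0.05)
lower_variability = power_one_tailed_z(100, 105, sigma=10, n=25, alpha=0.05)
larger_sample = power_one_tailed_z(100, 105, sigma=15, n=100, alpha=0.05)

for label, value in [("baseline", baseline),
                     ("relaxed alpha", relaxed_alpha),
                     ("larger effect", larger_effect),
                     ("lower variability", lower_variability),
                     ("larger sample", larger_sample)]:
    print(label, round(value, 2))
```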