stats Flashcards

1
Q

what is an absolute value?

A

the distance of a number from 0 on a number line; it is never negative. if a number is negative, its absolute value is the positive version of that number.

2
Q

what does an exhaustive variable mean?

A

its categories cover everyone: every case fits into some category

3
Q

what does a mutually exclusive variable mean?

A

the categories do not overlap: no one fits into more than one place

4
Q

what is a nominal variable?

A

a variable with unordered categories

5
Q

what is an ordinal variable?

A

a variable with ordered categories and undefined distances between values.

6
Q

what is an interval variable?

A

a variable with defined distances between values and an arbitrary zero (0 DOESN’T mean nothing). Ex. temperature in Celsius.

7
Q

what is a ratio variable?

A

a variable with defined distances between values and a non-arbitrary zero (0 means none of the quantity). Ex. age.

8
Q

describe discrete vs continuous (interval and ratio variables)

A

discrete is measured in whole numbers, whereas continuous is measured in units that are infinitely divisible.

9
Q

variables with only two possible values are called ____

A

dummy, dichotomous or binary variables

10
Q

deterministic causality means…

A

A causes B deterministically if A’s occurrence is always followed by B’s occurrence.

11
Q

probabilistic causality means…

A

A causes B probabilistically if A’s occurrence increases the probability of B’s occurrence.

12
Q

how to determine causality…

A

we have reason to believe that x causes y when…
1. there is an association between x and y
2. x precedes y in time
3. we have eliminated spurious causal linkages
4. we have a plausible explanatory rationale for the causal relationship.

13
Q

The multivariate causal scenarios

A
  1. Indirect or chain causal relationships
  2. Spurious associations
  3. Multiple causality
  4. Statistical interactions
  5. Suppression
14
Q

describe indirect or chain causal relationships

A

x1 causally influences x2, which then causally influences y. (x1 -> x2 -> y)

15
Q

describe spurious associations

A

a third variable x2 causally influences both x1 and y, such that an empirical association exists between x1 and y but the association is not causal.

16
Q

describe multiple causality

A

x1 and x2 have distinct effects on y

17
Q

describe statistical interactions

A

x1 influences y differently for different values of x2

18
Q

describe suppression

A

x1 is related to y through distinct processes that cancel each other out.

19
Q

what does “holding x2 constant”, “controlling for x2”, or “above and beyond x2” mean?

A

what is the effect of x1 with x2 set aside? x2 is removed from the story, so we are asking what x1 uniquely explains.

20
Q

when thinking of OLS regression line

A
  • Y = bX + a
  • b represents the slope of the regression line
  • A positive value of b means that the slope is positive
  • b represents the change in Y for a unit change in X, i.e., if X increases by a value of one, Y changes by an amount of b
  • a represents the value of Y at which the line crosses the Y axis. It is known as the intercept.
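A minimal sketch of how the slope b and intercept a can be computed for one predictor, assuming the usual least-squares formulas. The data (hours studied and quiz scores) and the function name `ols_line` are hypothetical, made up for illustration.

```python
def ols_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how X and Y vary together, divided by how much X varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the least-squares line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return b, a


# Hypothetical data: hours studied (X) and quiz score (Y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
b, a = ols_line(hours, score)
print(f"Y = {b:.2f}X + {a:.2f}")
```

Here b is read as the expected change in score for one extra hour, and a is the predicted score at zero hours.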
21
Q

what does R2 mean

A

R2 = the proportion of total variability in Y that is explained by X
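One way to see "proportion of variability explained" is to compare the leftover (residual) variability around the regression line with the total variability in Y. The sketch below assumes a single predictor and a least-squares line; the data and the name `r_squared` are hypothetical.

```python
def r_squared(xs, ys):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Fit the least-squares line Y = bX + a.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    # Total variability in Y, and the part the line does not explain.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total


# Hypothetical data: hours studied (X) and quiz score (Y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
print(round(r_squared(hours, score), 4))
```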

22
Q

what are beta 1 and beta 2?

A

beta 1 represents the change in Y, in standard deviations, for a one standard deviation change in x1, holding x2 constant.
beta 2 represents the change in Y, in standard deviations, for a one standard deviation change in x2, holding x1 constant.
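For exactly two predictors, the standardized coefficients can be written in terms of the three pairwise correlations. This is a sketch under that assumption (it does not generalize to three or more predictors as written); the function names and the tiny data set are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def standardized_betas(x1, x2, y):
    """Return (beta1, beta2) for a regression of y on x1 and x2."""
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    # Each beta removes the part of its correlation with y that is shared
    # with the other predictor.
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical data in which x1 and x2 are uncorrelated and y = 2*x1 + x2.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [-3, -1, 1, 3]
print(standardized_betas(x1, x2, y))
```

Because both betas are in standard deviation units, they can be compared directly: here x1 has the larger standardized effect.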

23
Q

what are the four strategies we talked about for exploring causality

A
  1. quick and dirty
  2. eliminate confounding
  3. explore mediation
  4. explore statistical interaction
24
Q

describe:
1. quick and dirty multiple causality
2. eliminate confounding
3. explore mediation
4. explore statistical interactions

A
  1. quick and dirty multiple causality: assemble one multiple regression model with lots of variables and see the effect of controlling for them.
  2. eliminate confounding: this is about causality. Eliminate confounding and spuriousness.
  3. explore mediation: involves establishing causation and explaining an association.
  4. explore statistical interactions: involves having a story about _____, brings opinion.
25
Q

what does R2 = 0.0373 mean?

A

About 3.7% of the variability in Y is explained by these demographic factors. This is small.

26
Q

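is P=0.913 or P=0.627 statistically significant?

A

neither: both p values are well above the usual alpha level of 0.05, so we do not reject the null.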

27
Q

what are the 6 aspects of tests of significance

A

test of significance, test statistic, p value, alpha level, conclusion, assumptions

28
Q

test of significance - describe null and alternative hypothesis

A

null: there is no association in the population.
alternative: there is an association in the population.

29
Q

describe p value

A

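the probability of getting a result at least as extreme as the one observed if the null were true. the lower the p value, the more strongly the data contradict the null.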

30
Q

describe alpha level

A

the threshold that the p value must fall below to reject the null. usually 0.05.

31
Q

what is the conclusion after finding p and alpha

A

is the p value less than alpha or not? if yes, reject the null in favour of the alternative; if no, then you don’t reject the null.
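The decision rule can be sketched as a tiny function. The default alpha of 0.05 follows the alpha-level card; the function name `conclusion` and the third p value (0.012) are hypothetical, added only to show the rejecting case.

```python
def conclusion(p_value, alpha=0.05):
    """Compare a p value with the alpha level and state the conclusion."""
    if p_value < alpha:
        return "reject the null in favour of the alternative"
    return "do not reject the null"


# 0.913 and 0.627 come from an earlier card; 0.012 is a made-up example.
for p in (0.913, 0.627, 0.012):
    print(p, "->", conclusion(p))
```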

32
Q

practice question: suppose you produced a 95% confidence interval for a variable from a sample size of n=100. describe two strategies you might follow to produce a narrower confidence interval.

A
  1. increase the sample size (larger n = smaller margin of error)
  2. reduce the confidence level, e.g., to a 90% confidence interval
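A rough sketch of why both strategies work, assuming a confidence interval for a mean with a normal-approximation margin of error of z * sd / sqrt(n). The standard deviation of 10 and the function name `margin_of_error` are hypothetical; the card does not give them.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    # z is the standard normal cutoff leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)


sd = 10  # hypothetical sample standard deviation
print("95%, n=100:", round(margin_of_error(sd, 100), 2))
print("95%, n=400:", round(margin_of_error(sd, 400), 2))        # larger n
print("90%, n=100:", round(margin_of_error(sd, 100, 0.90), 2))  # lower confidence
```

Quadrupling n halves the margin of error, while dropping to 90% confidence narrows the interval at the cost of being wrong more often.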
33
Q

practice question: describe different operationalizations of a UBC student’s GPA at the quantitative, ordinal, nominal, and dichotomous levels of measurement.

A

quantitative = GPA to the nearest tenth, or a percentage
ordinal = letter grades
nominal = none
dichotomous = pass or fail

34
Q

practice question: you are interested in the causal effect of self-esteem on self-rated health. Describe an experimental approach to studying this issue, a cross-sectional approach, and a longitudinal approach.

A

experimental = take a group of 100 and randomly assign 50 to an experimental group and 50 to a control group; have the experimental group spend an hour watching positive content, then measure self-rated health in both groups.
cross-sectional = stop random people in the street and ask them to rate their self-esteem and their self-rated health.
longitudinal = take some people and keep asking them over time so you can track changes.

35
Q

we can explore the association between a categorical variable x and a numerical variable y by comparing?

A

  1. Central tendencies (means or medians)
  2. Variability (standard deviations or interquartile ranges)
  3. Shapes (histograms or boxplots)
for the distributions of Y at each value of X.

36
Q

What is Cramér’s V used for?

A

it measures the strength of a categorical-by-categorical association
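A sketch of how Cramér’s V can be computed from a contingency table, assuming the standard formula V = sqrt(chi-square / (n * (k - 1))), where k is the smaller of the number of rows and columns. The table values and the name `cramers_v` are hypothetical.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Chi-square: compare each observed count with the count expected
    # if the two variables were unrelated.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical 2x2 table: rows are two groups, columns are yes/no answers.
table = [[30, 20],
         [20, 30]]
print(round(cramers_v(table), 2))
```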

37
Q

what are the “association scores” for Cramér’s V

A

0.00 = no association
0.00-0.10 = very weak
0.10-0.20 = weak
0.20-0.30 = modest
0.30-0.40 = strong
0.40 or higher = very strong

38
Q

what is Kendall’s tau-b?

A

tells us the direction and strength of an association between two ordered categorical (ordinal) variables

39
Q

what are the “association scores” for Kendall’s tau-b

A

direction:
range from a low of -1 (a perfect negative association) to a high of 1 (a perfect positive association)
magnitude:
0.00 = no association
0.00-0.10 = very weak
0.10-0.20 = weak
0.20-0.30 = modest
0.30-0.40 = strong
0.40 or higher = very strong
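A sketch of how tau-b can be computed by comparing every pair of cases, assuming the standard tau-b definition that adjusts for ties. The ordinal variables are assumed to be coded as integers in rank order; the data and the name `kendall_tau_b` are hypothetical.

```python
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b from two lists of ordinal codes (e.g., 1 = low, 3 = high)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: not counted anywhere
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1  # the pair is ordered the same way on x and y
            else:
                discordant += 1  # the pair is ordered in opposite ways
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical ordinal data: education level (1-3) and income bracket (1-3).
education = [1, 1, 2, 2, 3, 3]
income = [1, 2, 1, 2, 2, 3]
print(round(kendall_tau_b(education, income), 2))
```

The sign gives the direction, and the absolute value can be read against the magnitude protocol.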

40
Q

what is Pearson’s r

A

Pearson’s r measures the direction and strength of a linear association between two numerical variables. Pearson’s r is also referred to as r, Pearson correlation, correlation, or correlation coefficient.
How closely do the points hug the line? The closer they are, the stronger the association.
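A minimal sketch of the calculation, assuming the usual formula (co-variation of X and Y divided by the square root of the product of their variations). The data and the name `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied (X) and quiz score (Y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
print(round(pearson_r(hours, score), 3))
```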

41
Q

the shapes of Pearson’s r

A

You can get a good sense of the strength of the association from the shape of the cloud of points. Is the cloud of points shaped more like a soccer ball (representing no association), a football (a moderate association) or a cucumber (a strong association)?

42
Q

let’s use the following protocol when interpreting the magnitude (strength) of r:

A

0.00 = no association
0.00-0.10 = very weak
0.10-0.20 = weak
0.20-0.30 = modest
0.30-0.40 = strong
0.40 or higher = very strong
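The protocol can be sketched as a lookup on the absolute value of the coefficient, since strength ignores the sign. The card’s ranges share their endpoints (0.10 appears in both “very weak” and “weak”), so this sketch assumes a boundary value takes the stronger label; the name `strength_label` is hypothetical.

```python
def strength_label(coefficient):
    """Label the magnitude of a correlation-type coefficient."""
    size = abs(coefficient)  # strength ignores direction
    if size == 0:
        return "no association"
    if size < 0.10:
        return "very weak"
    if size < 0.20:
        return "weak"
    if size < 0.30:
        return "modest"
    if size < 0.40:
        return "strong"
    return "very strong"


for r in (0.0, 0.05, -0.25, 0.45):
    print(r, "->", strength_label(r))
```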

43
Q

how to tell the direction of Pearson’s r

A

r is positive when the association is positive and negative when the association is negative; r = 0 represents no linear association at all.

44
Q

what is Spearman’s rho

A

Spearman’s Rho measures the direction and strength of an association between two numerical variables. It replaces the actual data values with ranks and then executes a Pearson’s correlation on the ranked data.
  • Where is this country ranked, lowest to highest?
  • A scatter plot will show whether the association is linear or not
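The two steps (rank the values, then run Pearson’s correlation on the ranks) can be sketched as follows. Tied values are assumed to share the average of their ranks, which is a common convention but is not stated on the card; the function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from lowest (1) to highest; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranked data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical curved but always-increasing relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because only the ordering matters, rho reaches 1 for any relationship that always increases, even when the scatter plot is curved.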

45
Q

OLS regression line

A

A straight line in a two-dimensional space can be represented by the equation Y = bX + a.
In a scatterplot, the equation for the straight line that best represents the pattern of the points can be a useful summary of the nature of the linear association between the two variables.
The OLS line is the one that minimizes the sum of the squared VERTICAL DISTANCES OF THE POINTS FROM THE LINE.

46
Q

R2 provides an indication of the strength of the association. Let’s use the following protocol when interpreting the magnitude of R2.

A

0.00 = no association
0.00-0.01 = very weak
0.01-0.04 = weak
0.04-0.09 = modest
0.09-0.16 = strong
0.16 or higher = very strong

47
Q

how to conduct numerical by numerical (in order)

A
  • Always start with a scatter plot (lowess)
  • If linear, we can use Pearson’s r (test of significance, 95% confidence interval)
  • If linear, we can additionally execute an OLS regression line (b, a, beta, r squared, 95% confidence interval)
  • If curvilinear, we can transform x or y to make it linear OR use Spearman’s rho
  • If it looks crazy/other, make an ordinal version of x or y.
  • Maybe do a contingency table.
48
Q

what is strength

A

strength is how closely points hug the line

49
Q

what is a chi-square used for

A

a test of significance for a categorical-by-categorical association

50
Q

what to mention when describing situation

A

single peak?
skewed?
positive or negative?
curvilinear?
floor or ceiling effect?
linear?

51
Q

explain experimental design

A

Randomly assign study participants to an experimental group or a control group, introduce the treatment X to the experimental group but not the control group; measure Y in each group and compare the scores of Y between the two groups

52
Q

explain cross sectional design

A

Measure X and Y in a group of study participants at a single point in time

53
Q

explain longitudinal design

A

Measure X and Y in a group of study participants at multiple points in time

54
Q

what are the five random sampling strategies

A
  1. simple random sample
  2. systematic random sample
  3. stratified random sample
  4. cluster random sample
  5. multistage random sample
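A sketch of the first three strategies, using the standard textbook definitions (the cards list the names only): a simple random sample draws cases at random from the whole population, a systematic sample takes every k-th case from a random start, and a stratified sample draws at random within each subgroup. The population of student IDs, the strata, and the function names are hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Every case has an equal chance of being picked."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th case from the list."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
students = list(range(100))  # hypothetical student IDs 0-99
strata = {"first_year": students[:50], "upper_year": students[50:]}

print(simple_random_sample(students, 10, rng))
print(systematic_sample(students, 10, rng))
print(stratified_sample(strata, 3, rng))
```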