block 9 + Flashcards

1
Q

What is a “two-way” table?

What’s another name for a two-way table?

A

A table that tabulates one categorical variable against another, used to examine the relationship between the two.

Another name: cross-tabulation.

2
Q

How do you know if an association exists between 2 categorical variables?

A

An association exists if the distribution of one variable varies according to the value of the other.

3
Q

What’s the formula for calculating the expected frequency in each cell?

A

(row total × column total) / overall total
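As a minimal sketch of this formula, the function below computes the expected frequency for every cell of a two-way table. The function name and the 2 × 2 counts are hypothetical and chosen only for illustration.

```python
def expected_frequencies(observed):
    """Return the expected count for each cell of a two-way table.

    Each expected count is (row total x column total) / overall total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall = sum(row_totals)
    return [[r * c / overall for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table: rows = exposed / not exposed, columns = outcome yes / no
observed = [[30, 20],
            [10, 40]]
expected = expected_frequencies(observed)
print(expected)
```

Each row of the result keeps the same row total as the observed table, which is a quick way to check the arithmetic by hand.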

4
Q

What do a large and a small χ² say about the null hypothesis?

A

A large χ² suggests the data do NOT support the null hypothesis, since the observed data are not what we would expect under the null hypothesis.

A small χ² suggests the data are consistent with the null hypothesis, since the observed data are close to what we would expect under the null hypothesis.
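To see why distance from the expected counts drives the statistic, here is a small sketch. It assumes the standard Pearson form, the sum over cells of (observed - expected)² / expected, which this card does not spell out. The two tables are hypothetical.

```python
def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / overall
            stat += (obs - exp) ** 2 / exp
    return stat


# Both hypothetical tables share the same margins, so the same expected counts.
far_from_expected = [[30, 20], [10, 40]]
close_to_expected = [[21, 29], [19, 31]]

print(chi_squared_statistic(far_from_expected))
print(chi_squared_statistic(close_to_expected))
```

The first table sits far from its expected counts and gives a much larger statistic than the second.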

5
Q

What’s the formula for degrees of freedom for a chi squared test?

A
ν = (r - 1) × (c - 1)
r = number of rows in the table
c = number of columns in the table
6
Q

What do the test statistic, χ², and the degrees of freedom together allow us to obtain?

A

The p-value: the probability of obtaining the observed, or more extreme, data if the null hypothesis were true.
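In practice the p-value is read from tables or software. The sketch below is one possible way to compute it with the Python standard library only, using the closed-form upper-tail expressions for integer degrees of freedom. Both function names are hypothetical, and the statistic value is illustrative.

```python
import math


def degrees_of_freedom(rows, cols):
    """(r - 1) x (c - 1) for a table with r rows and c columns."""
    return (rows - 1) * (cols - 1)


def chi_squared_p_value(x2, df):
    """Upper-tail probability P(chi-squared with df degrees of freedom >= x2)."""
    if df < 1 or x2 < 0:
        raise ValueError("df must be >= 1 and x2 must be >= 0")
    half = x2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, then add one term per extra 2 df.
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x2 / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x2 / (2 * r + 1)
    return p


df = degrees_of_freedom(2, 2)          # a 2 x 2 table has 1 degree of freedom
p = chi_squared_p_value(16.67, df)     # illustrative statistic value
print(df, p)
```

A statistical package would normally replace `chi_squared_p_value`; it is written out here only to keep the example self-contained.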

7
Q

What are the degrees of freedom for:

  • chi squared test for independence
  • chi squared test for trend
A

The chi-squared test of independence is designed to provide evidence against the null hypothesis of independence. That means that if there is a sufficiently large dependence of any kind between the two variables, the test will provide evidence against the null.

The chi-squared test for trend is designed to provide evidence against the null hypothesis of no linear trend. So the dependence between the variables needs to be in the form of a linear trend for the test to detect evidence against the null.

While the chi-squared test of independence can be used for a table with any number of rows (r) and columns (c), and has (r - 1) × (c - 1) degrees of freedom, the chi-squared test for trend requires that one of the variables has 2 categories and that the other has categories that can be ordered. The chi-squared test for trend always has 1 degree of freedom because it focuses on one kind of dependence: a linear trend across ordered categories.

8
Q

With what sample sizes can you use χ² (chi-squared)?

A

For 2 × 2 tables:

If the total sample size N > 40, χ² can be used.

If N is between 20 and 40 and the smallest expected value is at least 5, χ² can be used.

For r × c tables (r rows and c columns, larger than 2 × 2):

The χ² test is valid if no more than 20% of the expected values are less than 5 and none are less than 1.
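These rules can be checked mechanically. The sketch below is one possible encoding, with two assumptions that go beyond the card: a 2 × 2 table with N below 20 is treated as not valid (the card states no rule for that case), and any table that is not 2 × 2 is checked against the r × c rule. The function name and the tables are hypothetical.

```python
def chi_squared_is_valid(observed):
    """Apply the sample-size rules of thumb to a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]

    if len(observed) == 2 and len(observed[0]) == 2:
        if n > 40:
            return True
        if 20 <= n <= 40:
            return min(expected) >= 5
        return False  # assumption: no rule is given for N < 20

    # r x c rule: at most 20% of expected values below 5, none below 1
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20 and min(expected) >= 1


print(chi_squared_is_valid([[30, 20], [10, 40]]))       # N = 100
print(chi_squared_is_valid([[12, 3], [13, 2]]))         # N = 30, small expected values
print(chi_squared_is_valid([[1, 1], [1, 1], [20, 20]])) # 3 x 2 with sparse rows
```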

9
Q

What are the six steps necessary to determine whether there is an association between two categorical variables?

A
  1. Display the data as a two-way table
  2. Calculate row or column percentages as appropriate
  3. State the null hypothesis and calculate the chi-squared value
  4. Calculate the degrees of freedom
  5. Refer to the chi-squared distribution to get the p-value
  6. Interpret the p-value
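Step 2 is the one most easily skipped, so here is a small sketch of row percentages for a hypothetical 2 × 2 table. The function name and counts are made up for illustration. Column percentages would follow the same pattern with the table transposed.

```python
def row_percentages(observed):
    """Express each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in observed]


# Hypothetical table: rows = exposed / not exposed, columns = outcome yes / no
observed = [[30, 20],
            [10, 40]]
percentages = row_percentages(observed)
print(percentages)
```

Comparing the rows of percentages shows whether the distribution of the outcome differs between the groups, which is what the chi-squared test then assesses formally.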
10
Q

What is correlation?

A

A method for assessing the degree to which two quantitative variables are associated with each other. It quantifies the strength of that association.

11
Q

What is linear regression?

A

A method that describes the relationship between two variables and predicts the value of one variable given the value of the other.

When one of the variables is thought to depend on the other, it is more appropriate to quantify the relationship between them (quantify = regression) than to measure only the strength of the association.

12
Q

What is a simple and effective way of looking at two quantitative variables?

A

A scatterplot.

13
Q

In a scatterplot, what is usually plotted on the x- and y-axes?

A

It is standard to plot:

The explanatory variable on the x-axis (horizontal axis)
The response variable on the y-axis (vertical axis)

Example:
to examine whether haemoglobin changes with age:

age = the explanatory variable
haemoglobin = the response variable
14
Q

What is a correlation coefficient?

What’s the standard way of calculating it?

A

It measures the degree of linear association between two quantitative variables.

The standard method is to calculate Pearson’s correlation coefficient, denoted r.
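As a rough sketch of how r is computed, the function below uses the usual product-moment formula: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The formula is standard but is not written out on this card. The function name and the age and haemoglobin values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one (age, haemoglobin) pair per individual
age = [20, 30, 40, 50, 60]
haemoglobin = [12.1, 12.8, 13.0, 13.9, 14.2]

r = pearson_r(age, haemoglobin)
print(round(r, 2), "n =", len(age))  # report r to two decimal places, with n
```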

15
Q

Name 2 features of the correlation coefficient.

A
    • measures the scatter of the points around an underlying linear (straight-line) trend
    • can take any value from -1 to +1
16
Q

What does the line look like for a positive and for a negative correlation coefficient?

A

r positive: the line slopes upward; as one variable increases, the other increases.

r negative: the line slopes downward; as one variable increases, the other decreases.

17
Q

What does r = 0 mean?

A

No linear correlation. However, r can be 0 even when there is a strong non-linear relationship, such as a U shape.
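A small sketch of the U-shape case, using made-up data in which y is exactly x squared. The relationship is perfectly determined but not linear, and the correlation coefficient comes out as zero. The helper function is a hypothetical, compact version of the usual Pearson formula.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]       # U-shaped: y depends entirely on x

r_u_shape = pearson_r(x, y)
print(r_u_shape)
```

This is one reason to look at the scatterplot before relying on r.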

18
Q

Definition of Pearson’s correlation coefficient.

A

Measures the scatter of points around an underlying linear trend.

19
Q

All observations must be independent for a correlation analysis. True or false?

A

True. All observations should be independent: only one observation for each variable should come from each individual in the study.

Example where correlation can’t be used: in a study of pregnant women, blood pressure and estrogen levels were observed at different gestational ages, so each woman contributes several observations.

20
Q

What are 3 points to remember in the presentation of correlation?

A

The data should be shown in a scatterplot.
The correlation coefficient, r, should be given to two decimal places.
The number of observations should be stated.

21
Q

Is correlation most useful for generating hypotheses or for showing causation?

A

Generating hypotheses. Correlation can’t show causation.

22
Q

Does it matter what is plotted on the x- and y-axes in correlation and regression studies?

A

It matters in regression, not in correlation, although correlation data are usually plotted with the same convention as regression.

x-axis: explanatory/independent variable
y-axis: response/dependent variable

Example: BP is explained by age, so BP goes on the y-axis and age on the x-axis.

23
Q

For regression, what goes on each axis?

A

x-axis: explanatory/independent variable
y-axis: response/dependent variable

Example: BP is explained by age, so BP goes on the y-axis and age on the x-axis.

24
Q

How is the line of best fit defined in regression?

What does the regression line tell us?

A

It is the line that minimizes the sum of the squared residuals.

The regression line gives an estimate of the average value of y for any value of x.
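A minimal least-squares sketch follows, assuming the usual closed-form solution for one explanatory variable: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The card does not state these formulas. The function names and the age and blood pressure values are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


# Hypothetical data: age (explanatory, x) and blood pressure (response, y)
age = [30, 40, 50, 60, 70]
bp = [118, 124, 131, 135, 142]

intercept, slope = fit_line(age, bp)
predicted_bp_at_55 = intercept + slope * 55   # estimated average BP at age 55
print(intercept, slope, predicted_bp_at_55)
```

Nudging the slope or intercept away from the fitted values and recomputing `sum_squared_residuals` gives a larger total, which is what "line of best fit" means here.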

25
Q

What are 3 assumptions that underlie linear regression?

A

The response variable y has a Normal distribution for each x
The variability of y must be the same across x
The relationship between x and y should be linear

26
Q

When is it valid to use a regression?

What assumptions need to be made?

A

For a regression analysis to estimate a broadly sensible line:

  • The relationship should be at least approximately linear.

For p-values and confidence intervals to be correct, there are further assumptions:

  • Residuals should be spread out about the regression line about as much at the left-hand end as at the right-hand end.
  • Residuals should be approximately normally distributed.
27
Q

What is multivariable analysis?

A

An analysis that looks at one response variable and multiple explanatory variables.

28
Q

What are two ways of analysis to control for confounding?

A
  1. Stratification

Separating study subjects into groups according to their exposure to the confounder.

  2. Regression models

Analysis of the association of a quantitative response variable with several explanatory variables (quantitative or categorical) simultaneously.

29
Q

Why do we need multivariable analysis methods?

A

When considering the effects of explanatory variables one-by-one, results can be confounded.

30
Q

What are 2 techniques for describing sampling uncertainty in estimates?

A
  • confidence intervals
  • hypothesis tests (p-values)

31
Q

What is a confidence interval?

A

A confidence interval for an estimate describes the range of uncertainty in what the true population value might be.

32
Q

What is a hypothesis/significance test?

A

A significance test quantifies the evidence in the data against a null hypothesis, using a p-value.

33
Q

What is the general formula for calculating a confidence interval?

A

CI: estimate ± multiplier × SE(estimate)
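A short sketch of the formula for one common case. The assumptions here are not on the card: a 95% interval for a mean from a large sample, so the multiplier is 1.96 and the standard error is SD / √n. The function name and the numbers are hypothetical.

```python
import math


def confidence_interval(estimate, se, multiplier=1.96):
    """estimate +/- multiplier x SE(estimate); 1.96 assumes a 95% normal-based CI."""
    margin = multiplier * se
    return estimate - margin, estimate + margin


# Hypothetical sample: mean 130, standard deviation 15, n = 100
se_mean = 15 / math.sqrt(100)            # standard error of the mean
low, high = confidence_interval(130, se_mean)
print(low, high)
```

For small samples the multiplier would usually come from the t distribution instead, which makes the interval wider.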

34
Q

What are the general steps for carrying out a significance test?

A
  1. State the null hypothesis
  2. Calculate the test statistic (z or t)
  3. Look up the calculated statistic to find the p-value
  4. Interpret the p-value
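The steps above can be sketched for a z statistic as follows. The assumptions are not on the card: the statistic is (estimate - null value) / SE, and the p-value is two-sided and taken from the standard normal distribution. The function name and the numbers are hypothetical.

```python
import math


def z_test(estimate, null_value, se):
    """Return (z, two-sided p-value) for an estimate against a null value."""
    z = (estimate - null_value) / se                 # step 2: test statistic
    p = math.erfc(abs(z) / math.sqrt(2))             # step 3: two-sided normal tail
    return z, p


# Step 1: null hypothesis that the population mean is 127 (hypothetical)
z, p = z_test(estimate=130, null_value=127, se=1.5)

# Step 4: a smaller p-value means stronger evidence against the null hypothesis
print(z, p)
```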