HYPOTHESIS TESTING - LEARNING OUTCOMES Flashcards
What are the general measures we would look at when studying continuous data with a normal distribution?
We would look at the mean and standard deviation followed by the mean difference between the two groups - preferably with the associated 95% confidence interval.
What does hypothesis testing essentially test for?
Hypothesis testing calculates how likely it is that we would see a difference as large as the one observed between two groups if the null hypothesis (no difference) were actually true - in other words, how likely it is that the observed difference happened by chance.
Summarise hypothesis testing.
We start by specifying the study hypothesis and the null hypothesis. We assume the null hypothesis is true and proceed to calculate the probability of getting the observed difference by chance - this is what is termed the p-value.
What does a small p-value indicate?
A small p-value implies that the observed difference would be unlikely if the null hypothesis were true - a statistically significant difference between our two groups. In this case we can reject the null hypothesis.
What does a large p-value indicate?
A large p-value tells us that there is no evidence of a difference between the groups. In this case we would fail to reject the null hypothesis (not the same as saying we accept the null hypothesis, there may still be an effect there but our study is not powerful enough to detect it).
What is accepted as a small p-value?
Convention is to use p=0.05 as a cut-off. Less than 0.05 we would term a significant difference; more than 0.05 we would say there is no evidence of a difference. However, 0.05 is not a magic figure. It is better to give the actual figure and let readers make up their own minds.
What is a type I error?
A type I error is to reject the null hypothesis when it is actually true. This is essentially a false positive result. The probability of this type of error (when the null hypothesis is true) is the alpha level (the significance cut-off).
What is a type II error?
A type II error is the failure to reject the null hypothesis when it is actually false. This is essentially a false negative result. The type II error rate is very much dependent on the sample size and power of the study.
How do we decide which hypothesis or statistical test to carry out?
It depends on the type of outcome variable, the number of groups you are analysing, and a number of other criteria about the data (e.g. whether the groups are paired or independent, and whether the data follow a normal distribution).
Give an example of a paired sample.
Measurements before and after an event - e.g. heart rate before and after a period of exercise.
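As an illustrative sketch (Python with scipy and made-up heart-rate numbers; the card itself does not name a test), paired data like this would be analysed with a paired test such as the paired t-test:

```python
from scipy import stats

# Hypothetical paired data: heart rate before and after exercise
before = [72, 68, 75, 80, 66]
after = [88, 85, 90, 96, 82]

# ttest_rel treats the samples as paired (same subjects measured twice)
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)
```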
What 3 things does the p-value depend on?
- How big the observed difference is
- Sample size
- Variability of measurement
Sample size and variability in measurement are related to each other in terms of the standard error where standard error is the standard deviation divided by the square root of your sample size.
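A minimal sketch of that relationship in Python with numpy (illustrative only, not from the original card):

```python
import numpy as np

def standard_error(sample):
    """Standard error of the mean = standard deviation / sqrt(n)."""
    sample = np.asarray(sample, dtype=float)
    return sample.std(ddof=1) / np.sqrt(len(sample))

# With the same variability, a bigger sample gives a smaller standard error
small = [5.1, 4.9, 5.3, 5.0, 4.8]
print(standard_error(small))
print(standard_error(small * 4))  # same values repeated: larger n, smaller SE
```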
How is the T-statistic calculated?
T-statistic = Observed mean difference / Standard error of the difference between means
Look this value up in tables of the t-distribution (close to the normal distribution for large samples) or use a stats programme to do this for you.
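A sketch of the same calculation in Python with scipy (made-up data; the flashcards themselves use SPSS):

```python
import numpy as np
from scipy import stats

group1 = np.array([5.1, 4.9, 5.3, 5.0, 4.8])
group2 = np.array([5.6, 5.4, 5.8, 5.5, 5.3])

# By hand: observed mean difference / standard error of the difference
mean_diff = group1.mean() - group2.mean()
se_diff = np.sqrt(group1.var(ddof=1) / len(group1) + group2.var(ddof=1) / len(group2))
print(mean_diff / se_diff)

# scipy computes the same t-statistic and looks up the p-value for you
# (equal_var=False matches the unpooled standard error used above)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(t_stat, p_value)
```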
What assumptions are required for a t-test to be applicable?
- The outcome should be continuous and must follow a normal distribution
- The variance of the two groups is equal. SPSS will test this for you automatically using Levene’s test.
What is the first thing we need to look at in the output table for the difference between two means in SPSS?
We first need to look at and interpret the Levene’s test results before we even think about interpreting the output from a t-test.
For Levene’s test the null hypothesis is that there is no difference in variance between the two groups.
Essentially a significant result for Levene’s test means that there is a difference in variance between the two groups.
If the result from the Levene’s test is not significant then we fail to reject our null hypothesis of no difference between variance and we can use the top line of SPSS output.
(Levene’s test more than 0.05 then use top line).
If the result from the Levene’s test is significant then we reject our null hypothesis of no difference between variance and use the bottom line of SPSS output representing the adjusted t-test that hasn’t assumed equal variance.
(Levene’s test less than 0.05 then use bottom line).
Often the two lines of output will give you the same conclusion, but there may be a difference in confidence intervals.
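The same top-line/bottom-line logic can be mirrored outside SPSS; a sketch in Python with scipy (made-up data, illustrative only):

```python
from scipy import stats

group1 = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group2 = [13.0, 12.4, 14.1, 11.2, 13.6, 12.8]

# Levene's test: null hypothesis = no difference in variance
_, levene_p = stats.levene(group1, group2)

# Levene p > 0.05 -> assume equal variances ("top line");
# Levene p < 0.05 -> Welch's adjusted t-test ("bottom line")
equal_var = levene_p > 0.05
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=equal_var)
print(f"Levene p = {levene_p:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```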
We would want to back up the significance results from a t-test with a measure of effect - which measure of effect would we most likely use for this?
Mean difference with the associated confidence intervals
When our data aren’t drawn from a normal distribution and we therefore can’t use a t-test, we can attempt to use data transformation. Describe the general principles of data transformation.
Firstly you can try transforming the data with an algorithm and then carry out the analysis on the transformed data. You can’t assume the transformation is going to work and give you a variable that follows a normal distribution - you still need to check those assumptions, e.g. plot a histogram to check normality. If the transformed data look ok you can then use a parametric test (e.g. t-test) on the transformed variable.
What sort of data transformations can we try and when are they appropriate?
The transformations we may try are dependent upon what the data look like, e.g.:
- Moderate positive skew: logarithm
- Strong positive skew: reciprocal
- Weak positive skew: square root
- Moderate negative skew: square
- Strong negative skew: cube
- Unequal variation: log, reciprocal, or square root
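A sketch of the workflow in Python with numpy/scipy (simulated positively skewed data; the Shapiro-Wilk test stands in for the histogram check mentioned above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=50)  # positively skewed data

# Moderate positive skew -> try a log transform
transformed = np.log(skewed)

# Never assume the transform worked: re-check normality afterwards
print(stats.shapiro(skewed).pvalue)       # small p: evidence against normality
print(stats.shapiro(transformed).pvalue)  # larger p: consistent with normality
```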
What can we do with continuous data not drawn from a normal population if transforming the data proves to be unsuccessful?
We can use non-parametric tests. These are tests designed so that you don’t have to worry about the underlying distribution of the data.
What are the advantages and disadvantages of using non-parametric tests?
Advantages:
- Make no assumptions about underlying distribution of data
Disadvantages:
- Less powerful than parametric tests
- Difficult to get confidence intervals
How do we describe skewed variables?
If the data are not normally distributed then:
- We need to present the medians, not the means
- We need to present the range or interquartile range, not the standard deviation
- If we are comparing two groups of skewed (non-normal) data then we need to present the difference between the two medians (but we can’t easily get 95% confidence intervals)
(Median and IQR)
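A minimal sketch of these summaries in Python with numpy (made-up skewed data):

```python
import numpy as np

data = [2.0, 2.2, 2.5, 3.1, 3.4, 4.0, 9.7]  # skewed by one large value

median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])  # interquartile range
print(median, q1, q3)  # report these rather than the mean and SD
```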
What is the non-parametric equivalent of the t-test?
The Wilcoxon rank sum test or the Mann-Whitney U test (two equivalent formulations of the same test).
These tests are appropriate if you have two independent groups with a continuous variable that isn’t following a normal distribution.
How does the Wilcoxon rank sum test work?
It ranks the data and then works on the ranks of the data rather than the raw values themselves. Most non-parametric tests work on ranks.
For example:
We have two independent groups, group 1 and group 2, where group 1 is the smaller group.
We rank all observations in ascending order.
The sum of the ranks for group 1 = test statistic T.
Look up T in a Wilcoxon rank sum table of critical values to get the p-value.
In practice you would do this using SPSS.
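In scipy the same test is available as mannwhitneyu; a sketch with made-up data (illustrative only):

```python
from scipy import stats

group1 = [3.1, 2.8, 4.0, 3.5, 2.9]           # the smaller group
group2 = [4.2, 5.1, 3.9, 4.8, 5.5, 4.4]

# Mann-Whitney U test (equivalent to the Wilcoxon rank sum test);
# it works on the ranks of the pooled observations
u_stat, p_value = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(u_stat, p_value)
```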
What is the Kruskal-Wallis test?
Another non-parametric test. It is an extension of the Mann-Whitney test for use when you have more than 2 groups to compare. It is the non-parametric equivalent of one-way ANOVA.
Use it when you have a continuous, skewed outcome variable with more than 2 independent exposure groups.
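A sketch in Python with scipy (three made-up groups, illustrative only):

```python
from scipy import stats

group_a = [2.1, 3.4, 2.8, 3.0]
group_b = [4.5, 4.1, 5.0, 4.7]
group_c = [3.2, 3.9, 3.6, 4.0]

# Kruskal-Wallis: non-parametric comparison of more than 2 independent groups
h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(h_stat, p_value)
```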
What is Spearman’s correlation coefficient?
A non-parametric correlation coefficient for comparing two continuous variables where at least one of them does not follow a normal distribution.
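A sketch in Python with scipy (made-up data showing a monotonic but non-linear relationship):

```python
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 9.2, 15.8, 31.0, 66.5]  # increases with x, but skewed

# Spearman's correlation works on ranks, so normality is not required
rho, p_value = stats.spearmanr(x, y)
print(rho, p_value)
```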
What is the general method of assessing association between two categorical variables?
Chi-squared test
What are we really looking at in a hypothesis test to look at association between two categorical variables?
We are essentially looking at how likely it is that we would get the difference that we have observed (in the odds ratio for example) if the truth was that there was no association between our two variables.
Each percentage in our results table is subject to sampling error. We need to assess whether the differences between them could be due to chance. We conduct a chi-squared test to get a p-value, and this p-value tells us how likely a difference this large is to have occurred by chance if there is truly no association.
Describe the stages of the chi-squared test.
- State the null hypothesis - no association between the two variables.
- Calculate the test statistic - how close are the observed values in the table to the values expected were there no true association?
Expected number = (row total x column total) / overall total
The chi-squared test will compare the expected numbers under the null hypothesis to the numbers we actually got to see if there is a significant difference.
For each cell we then subtract the expected (E) from the observed (O), then square and divide by E:
(O-E)^2 / E
Then sum all cells to give the chi-squared statistic. The larger the value of the chi-squared statistic, the less consistent the data are with the null hypothesis.
- Obtain a p-value - refer value of chi-squared to table of chi-squared distribution.
Degrees of freedom = (number of rows - 1) x (number of columns - 1)
- Interpret the p-value
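The stages above can be worked through by hand or with scipy; a sketch using a hypothetical 2x2 table (illustrative only):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: exposure (rows) vs. outcome (columns)
observed = np.array([[30, 70],
                     [50, 50]])

# Expected number = (row total x column total) / overall total
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot * col_tot / observed.sum()

# Sum of (O - E)^2 / E over all cells
chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)

# scipy gives the same statistic plus the p-value and degrees of freedom
# (correction=False turns off the Yates continuity correction for 2x2 tables)
chi2_stat, p_value, dof, exp = stats.chi2_contingency(observed, correction=False)
print(chi2_stat, p_value, dof)
```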
What is the general formula for calculating the expected numbers under the null hypothesis for a categorical variable results table?
How do we calculate the chi-square test statistic?
Expected number = (row total x column total) / overall total
The chi-squared test will compare the expected numbers under the null hypothesis to the numbers we actually got to see if there is a significant difference.
For each cell we then subtract the expected (E) from the observed (O), then square and divide by E:
(O-E)^2 / E
Then sum all cells to give the chi-squared statistic. The larger the value of the chi-squared statistic, the less consistent the data are with the null hypothesis.